Systematic feature evaluation for gene name recognition

Publication: Contributions to journals / Journal articles / Research / peer-reviewed

Standard

Systematic feature evaluation for gene name recognition. / Hakenberg, Jörg; Bickel, Steffen; Plake, Conrad et al.
In: BMC Bioinformatics, Vol. 6, No. SUPPL.1, S9, 24 May 2005.


Harvard

Hakenberg, J, Bickel, S, Plake, C, Brefeld, U, Zahn, H, Faulstich, L, Leser, U & Scheffer, T 2005, 'Systematic feature evaluation for gene name recognition', BMC Bioinformatics, vol. 6, no. SUPPL.1, S9. https://doi.org/10.1186/1471-2105-6-S1-S9

APA

Hakenberg, J., Bickel, S., Plake, C., Brefeld, U., Zahn, H., Faulstich, L., Leser, U., & Scheffer, T. (2005). Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(SUPPL.1), Article S9. https://doi.org/10.1186/1471-2105-6-S1-S9

Vancouver

Hakenberg J, Bickel S, Plake C, Brefeld U, Zahn H, Faulstich L et al. Systematic feature evaluation for gene name recognition. BMC Bioinformatics. 2005 May 24;6(SUPPL.1):S9. doi: 10.1186/1471-2105-6-S1-S9

Bibtex

@article{145a0b45e5f644dca22d39068f96349a,
title = "Systematic feature evaluation for gene name recognition",
abstract = "In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.",
keywords = "Informatics, Business informatics",
author = "J{\"o}rg Hakenberg and Steffen Bickel and Conrad Plake and Ulf Brefeld and Hagen Zahn and Lukas Faulstich and Ulf Leser and Tobias Scheffer",
year = "2005",
month = may,
day = "24",
doi = "10.1186/1471-2105-6-S1-S9",
language = "English",
volume = "6",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "SUPPL.1",

}

RIS

TY - JOUR

T1 - Systematic feature evaluation for gene name recognition

AU - Hakenberg, Jörg

AU - Bickel, Steffen

AU - Plake, Conrad

AU - Brefeld, Ulf

AU - Zahn, Hagen

AU - Faulstich, Lukas

AU - Leser, Ulf

AU - Scheffer, Tobias

PY - 2005/5/24

Y1 - 2005/5/24

N2 - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.

AB - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=33947304479&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/483dea0a-b292-3915-b21a-ee2b226bc166/

U2 - 10.1186/1471-2105-6-S1-S9

DO - 10.1186/1471-2105-6-S1-S9

M3 - Journal articles

C2 - 15960843

AN - SCOPUS:33947304479

VL - 6

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.1

M1 - S9

ER -
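The abstract describes recursive feature elimination (RFE) layered on an SVM word classifier. The loop below is a minimal sketch and rests on two assumptions that this record does not state. First, it uses a linear SVM trained by Pegasos-style stochastic subgradient descent. Second, it ranks features by squared weight, which is the usual SVM-RFE criterion. The helper names (`train_linear_svm`, `recursive_feature_elimination`, `make_toy_data`) and the toy data are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Linear SVM without bias, trained with Pegasos-style stochastic
    subgradient descent on the regularized hinge loss (labels in {+1, -1})."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only examples inside the margin move w.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def recursive_feature_elimination(X: Sequence[Sequence[float]],
                                  y: Sequence[int], n_keep: int,
                                  step: int = 1) -> List[int]:
    """Retrain on the surviving features, drop the `step` features with the
    smallest squared weight, and repeat until `n_keep` features remain."""
    active = list(range(len(X[0])))  # original indices still in play
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        # Positions in `active`, least important first.
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        drop = set(ranked[:min(step, len(active) - n_keep)])
        active = [f for k, f in enumerate(active) if k not in drop]
    return active


def make_toy_data(n_pairs: int = 20, seed: int = 42
                  ) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy data: feature 0 equals the label, while features 1
    and 2 are noise shared by one positive and one negative example."""
    rng = random.Random(seed)
    X: List[List[float]] = []
    y: List[int] = []
    for _ in range(n_pairs):
        a, b = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
        for label in (1, -1):
            X.append([float(label), a, b])
            y.append(label)
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=1)
print(selected)  # → [0]
```

In this toy setup, feature 0 alone separates the classes, so RFE eliminates the two noise features first. In the paper's setting, the columns would instead be word-level attributes such as prefixes, suffixes, character n-grams, and capitalization patterns. Real experiments would more likely rely on an established SVM solver. For example, scikit-learn's `RFE` wrapper around `LinearSVC` implements the same elimination loop.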
