Systematic feature evaluation for gene name recognition

Publication: Contributions to journals > Journal articles > Research > Peer-reviewed

Standard

Systematic feature evaluation for gene name recognition. / Hakenberg, Jörg; Bickel, Steffen; Plake, Conrad et al.
In: BMC Bioinformatics, Vol. 6, No. SUPPL.1, S9, 24 May 2005.

Harvard

Hakenberg, J, Bickel, S, Plake, C, Brefeld, U, Zahn, H, Faulstich, L, Leser, U & Scheffer, T 2005, 'Systematic feature evaluation for gene name recognition', BMC Bioinformatics, vol. 6, no. SUPPL.1, S9. https://doi.org/10.1186/1471-2105-6-S1-S9

APA

Hakenberg, J., Bickel, S., Plake, C., Brefeld, U., Zahn, H., Faulstich, L., Leser, U., & Scheffer, T. (2005). Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(SUPPL.1), Article S9. https://doi.org/10.1186/1471-2105-6-S1-S9

Vancouver

Hakenberg J, Bickel S, Plake C, Brefeld U, Zahn H, Faulstich L et al. Systematic feature evaluation for gene name recognition. BMC Bioinformatics. 2005 May 24;6(SUPPL.1):S9. doi: 10.1186/1471-2105-6-S1-S9

BibTeX

@article{145a0b45e5f644dca22d39068f96349a,
title = "Systematic feature evaluation for gene name recognition",
abstract = "In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.",
keywords = "Informatics, Business informatics",
author = "J{\"o}rg Hakenberg and Steffen Bickel and Conrad Plake and Ulf Brefeld and Hagen Zahn and Lukas Faulstich and Ulf Leser and Tobias Scheffer",
year = "2005",
month = may,
day = "24",
doi = "10.1186/1471-2105-6-S1-S9",
language = "English",
volume = "6",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "SUPPL.1",

}

RIS

TY - JOUR

T1 - Systematic feature evaluation for gene name recognition

AU - Hakenberg, Jörg

AU - Bickel, Steffen

AU - Plake, Conrad

AU - Brefeld, Ulf

AU - Zahn, Hagen

AU - Faulstich, Lukas

AU - Leser, Ulf

AU - Scheffer, Tobias

PY - 2005/5/24

Y1 - 2005/5/24

N2 - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.

AB - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=33947304479&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/483dea0a-b292-3915-b21a-ee2b226bc166/

U2 - 10.1186/1471-2105-6-S1-S9

DO - 10.1186/1471-2105-6-S1-S9

M3 - Journal articles

C2 - 15960843

AN - SCOPUS:33947304479

VL - 6

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.1

M1 - S9

ER -
