Systematic feature evaluation for gene name recognition

Publication: Contribution to journal, Journal article, Research, peer-reviewed

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer
In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine (SVM), combined with pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination (RFE). Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
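The ideas named in the abstract (sliding-window word features and recursive feature elimination) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature templates, the window size, the toy sentences and labels, and the `drop_fraction` parameter are all assumptions made for illustration, and a simple perceptron stands in for the Support Vector Machine so that the example needs only the Python standard library. The elimination loop follows the usual RFE recipe of training a linear model, ranking features by the magnitude of their weights, and discarding the lowest-ranked ones.

```python
import re
from collections import defaultdict


def token_features(tokens, i, window=1):
    """Binary features for tokens[i] and its neighbours inside the window."""
    feats = set()
    for offset in range(-window, window + 1):
        j = i + offset
        if not 0 <= j < len(tokens):
            continue
        tok = tokens[j]
        # Capitalization pattern: uppercase -> A, lowercase -> a, digit -> 0.
        shape = re.sub(r"[A-Z]", "A", tok)
        shape = re.sub(r"[a-z]", "a", shape)
        shape = re.sub(r"[0-9]", "0", shape)
        feats.add(f"{offset}:shape={shape}")
        feats.add(f"{offset}:prefix={tok[:2].lower()}")
        feats.add(f"{offset}:suffix={tok[-2:].lower()}")
        if offset == 0:
            # Character 3-grams of the word being classified.
            low = tok.lower()
            for k in range(len(low) - 2):
                feats.add(f"0:3gram={low[k:k + 3]}")
    return feats


def train_perceptron(examples, active, epochs=20):
    """Linear model restricted to the active features (stand-in for an SVM)."""
    w = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:  # label is +1 (name) or -1 (other)
            used = feats & active
            score = bias + sum(w[f] for f in used)
            if label * score <= 0:  # misclassified: move weights toward label
                for f in used:
                    w[f] += label
                bias += label
    return w


def rfe(examples, keep, drop_fraction=0.5):
    """Retrain repeatedly, dropping the features with the smallest |weight|."""
    active = set().union(*(feats for feats, _ in examples))
    while len(active) > keep:
        w = train_perceptron(examples, active)
        # Ties are broken by feature name so the result is deterministic.
        ranked = sorted(active, key=lambda f: (abs(w[f]), f))
        n_drop = max(1, min(len(active) - keep,
                            int(len(active) * drop_fraction)))
        active -= set(ranked[:n_drop])
    return active


# Hypothetical toy data: +1 marks a gene/protein name token, -1 anything else.
sentences = [
    (["The", "BRCA1", "gene", "is", "mutated"], [-1, 1, -1, -1, -1]),
    (["Expression", "of", "p53", "was", "reduced"], [-1, -1, 1, -1, -1]),
]
examples = [
    (token_features(tokens, i), label)
    for tokens, labels in sentences
    for i, label in enumerate(labels)
]
selected = rfe(examples, keep=10)
print(sorted(selected))
```

Comparing classification performance at each stage of the loop, rather than only at the end, is what would allow the impact of individual feature groups to be quantified in the way the abstract describes.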
Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL. 1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
