Systematic feature evaluation for gene name recognition

Research output: Journal contributions › Journal articles › Research › peer-review

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer

In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system that applies a Support Vector Machine (SVM) to a sliding window over the sentence, combined with pattern-based post-processing for the recognition of phrases. The performance of such a system depends crucially on the types of features supplied to the classifier, such as prefixes or suffixes, character n-grams, patterns of capitalization, or the classification of preceding or following words. We present a systematic approach to evaluating the performance of different feature sets based on recursive feature elimination (RFE). By systematically reducing the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
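The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact feature definitions, the SVM package, or the elimination schedule, so everything below is an assumption made for illustration. The function names (`word_shape`, `window_features`, `train_linear_svm`, `rfe`), the window size, the 2-character affixes, the Pegasos-style trainer standing in for a full SVM solver, and the `drop_fraction` schedule are all hypothetical.

```python
import re

def word_shape(word):
    """Capitalization pattern, e.g. 'BRCA1' -> 'A0', 'p53' -> 'a0'."""
    shape = re.sub(r"[A-Z]+", "A", word)
    shape = re.sub(r"[a-z]+", "a", shape)
    return re.sub(r"[0-9]+", "0", shape)

def window_features(tokens, i, window=1):
    """Binary features for tokens[i] and its neighbours inside the window."""
    feats = set()
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(tokens):
            feats.add(f"{off:+d}:word={tokens[j].lower()}")
            feats.add(f"{off:+d}:shape={word_shape(tokens[j])}")
    word = tokens[i].lower()
    feats.add(f"prefix={word[:2]}")   # assumed affix length of 2
    feats.add(f"suffix={word[-2:]}")
    feats.update(f"3gram={word[k:k + 3]}" for k in range(len(word) - 2))
    return feats

def train_linear_svm(X, y, features, lam=0.01, epochs=5):
    """Linear SVM without bias, trained by Pegasos-style subgradient descent
    on the hinge loss. X is a list of feature sets, y holds +1/-1 labels."""
    w = {f: 0.0 for f in features}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            margin = label * sum(w[f] for f in x if f in w)
            for f in w:
                w[f] *= 1.0 - 1.0 / t      # regularization shrink (eta * lam = 1/t)
            if margin < 1.0:               # hinge loss is active: move towards label
                eta = 1.0 / (lam * t)
                for f in x:
                    if f in w:
                        w[f] += eta * label
    return w

def rfe(X, y, features, n_keep, drop_fraction=0.5, **svm_args):
    """Recursive feature elimination: retrain, drop the features with the
    smallest absolute weight, and repeat until n_keep features remain."""
    features = list(features)
    eliminated = []
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features, **svm_args)
        ranked = sorted(features, key=lambda f: abs(w[f]))
        n_drop = max(1, int(len(features) * drop_fraction))
        n_drop = min(n_drop, len(features) - n_keep)
        eliminated.extend(ranked[:n_drop])
        features = ranked[n_drop:]
    return features, eliminated
```

In such a setup, each token of a sentence would be turned into a feature set with `window_features(tokens, i)`, labelled +1 or -1 depending on whether it belongs to a gene name, and the resulting examples passed to `rfe`. Evaluating a classifier on held-out data after each elimination round is one way to obtain the kind of performance-versus-feature-count comparison the abstract reports.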

Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL.1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
