Systematic feature evaluation for gene name recognition

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer
In task 1A of the BioCreAtIvE evaluation, participants had to build systems that recognize the words and phrases forming gene or protein names in natural-language sentences. We approach this problem with a word classification system that applies a Support Vector Machine (SVM) to a sliding window over the sentence, combined with pattern-based post-processing to recognize phrases. The performance of such a system depends crucially on the features the classifier considers, such as prefixes or suffixes, character n-grams, capitalization patterns, or the classification of preceding and following words. We present a systematic approach, based on recursive feature elimination (RFE), for evaluating the performance of different feature sets. By systematically reducing the number of features the system uses, we can quantify the impact of each feature set on the results of the word classification problem. This helps us identify descriptive features, learn about the structure of the problem, and design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves performance by 0.7% compared to using the complete set of attributes, and a performance only 2.3% below this maximum can be obtained with fewer than 5% of the features.
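The two ideas in the abstract, sliding-window word features and recursive feature elimination, can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' system: the feature templates (capitalization shape, prefixes/suffixes up to length 3, character 3-grams), the window width, and every function name below are hypothetical. Neighbouring words contribute their surface features rather than their predicted classes, which is a simplification. The SVM is swapped for a plain Pegasos-style stochastic subgradient solver so the sketch needs only the standard library. RFE follows the usual recipe: train a linear model, drop the features with the smallest absolute weight, retrain on what remains, and repeat.

```python
import random


def word_shape(token: str) -> str:
    """Map each character to a class: A = upper case, a = lower case, 0 = digit."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)  # keep punctuation such as '-' or '/'
    return "".join(out)


def token_features(token: str, max_affix: int = 3, n: int = 3) -> set:
    """Capitalization pattern, prefixes/suffixes and character n-grams of one token."""
    feats = {"shape=" + word_shape(token)}
    low = token.lower()
    for k in range(1, min(max_affix, len(low)) + 1):
        feats.add("pre=" + low[:k])
        feats.add("suf=" + low[-k:])
    padded = "^" + low + "$"  # boundary markers make n-grams position-aware
    for j in range(len(padded) - n + 1):
        feats.add("ngram=" + padded[j:j + n])
    return feats


def window_features(tokens: list, i: int, width: int = 2) -> set:
    """Features of token i and its neighbours, each tagged with its relative offset."""
    feats = set()
    for offset in range(-width, width + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats.update(f"{offset}:{f}" for f in token_features(tokens[j]))
        else:
            feats.add(f"{offset}:<pad>")  # window extends past the sentence boundary
    return feats


def train_linear_svm(samples, labels, allowed, lam=0.1, epochs=200, seed=0):
    """Pegasos-style SGD for a linear SVM (no bias term) over sparse binary features.

    Only features in `allowed` are visible, which lets RFE retrain on a subset.
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x = samples[i] & allowed
            margin = labels[i] * sum(w.get(f, 0.0) for f in x)
            for f in w:  # step on the L2 regulariser shrinks every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge-loss subgradient step when the margin is violated
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * labels[i]
    return w


def rfe(samples, labels, n_keep, step=1, **svm_kwargs):
    """Recursive feature elimination: retrain, drop the `step` smallest |w|, repeat."""
    remaining = set().union(*samples)
    while len(remaining) > n_keep:
        w = train_linear_svm(samples, labels, remaining, **svm_kwargs)
        # Features that never received an update have weight 0; ties break by name.
        ranked = sorted(remaining, key=lambda f: (abs(w.get(f, 0.0)), f))
        remaining -= set(ranked[:min(step, len(remaining) - n_keep)])
    return remaining
```

For example, `window_features(["the", "p53", "gene"], 1, width=1)` contains `0:shape=a00`, `-1:pre=t` and `1:suf=ene`. On a toy set where a feature `good` marks every positive word, `bad` marks every negative one, and two noise features `n1`/`n2` occur equally in both classes, `rfe(samples, labels, n_keep=2)` should discard the noise features first and keep `{"good", "bad"}`. A larger `step` trades ranking precision for fewer retraining rounds, which matters when the feature space is large.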
Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue: Suppl. 1
Number of pages: 11
ISSN: 1471-2105
DOIs:
Publication status: Published, 24 May 2005
Externally published: Yes

