Systematic feature evaluation for gene name recognition

Publication: Contributions to journals > Journal articles > Research > peer-reviewed

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer
In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
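The pipeline described in the abstract (sliding-window word features, a linear classifier, recursive feature elimination) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the exact feature templates, the window width, and the toy sentence with its labels are hypothetical, and a simple perceptron stands in for the Support Vector Machine so that the example needs only the standard library.

```python
import re
from collections import defaultdict


def shape(token):
    """Collapse a token into a capitalization/digit pattern, e.g. 'p53' -> 'a0'."""
    s = re.sub(r"[A-Z]+", "A", token)
    s = re.sub(r"[a-z]+", "a", s)
    return re.sub(r"[0-9]+", "0", s)


def token_features(token, offset):
    """Binary features for one token at a given offset inside the window."""
    feats = {
        f"{offset}:shape={shape(token)}",
        f"{offset}:prefix={token[:2].lower()}",
        f"{offset}:suffix={token[-2:].lower()}",
    }
    padded = f"^{token.lower()}$"
    feats.update(f"{offset}:3gram={padded[i:i + 3]}" for i in range(len(padded) - 2))
    return feats


def window_features(tokens, i, width=1):
    """Features of the word at position i plus `width` neighbours on each side."""
    feats = set()
    for off in range(-width, width + 1):
        j = i + off
        tok = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats |= token_features(tok, off)
    return feats


def train_perceptron(examples, epochs=10):
    """Stand-in linear classifier (NOT an SVM); labels are +1 or -1."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in examples:
            score = sum(w[f] for f in feats)
            if y * score <= 0:  # misclassified: move weights towards the label
                for f in feats:
                    w[f] += y
    return w


def rfe(examples, keep, drop_fraction=0.5):
    """Recursive feature elimination: retrain, then drop the lowest-|weight| features."""
    active = set().union(*(f for f, _ in examples))
    while len(active) > keep:
        restricted = [(f & active, y) for f, y in examples]
        w = train_perceptron(restricted)
        ranked = sorted(active, key=lambda f: (abs(w[f]), f))
        n_drop = max(1, min(int(len(active) * drop_fraction), len(active) - keep))
        active -= set(ranked[:n_drop])
    return active


# Toy data: hypothetical labels, +1 marks a word that is part of a gene name.
sentence = "The p53 protein binds BRCA1 in vivo".split()
labels = [-1, 1, -1, -1, 1, -1, -1]
examples = [(window_features(sentence, i), y) for i, y in enumerate(labels)]
selected = rfe(examples, keep=10)
print(sorted(selected))
```

Comparing classifier performance on `selected` against performance on the full feature set, for several values of `keep`, is the kind of measurement the abstract summarizes; the feature types mentioned there that depend on the classification of neighbouring words are left out of this sketch.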
Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL. 1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
