Systematic feature evaluation for gene name recognition

Research output: Journal contributions > Journal articles > Research > peer-review

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer

In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine (SVM), combined with pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination (RFE). Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
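The elimination loop described in the abstract can be illustrated with a minimal, standard-library sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the linear SVM trained by batch subgradient descent on the hinge loss, the removal of exactly one feature per round, the hyperparameters, the hypothetical feature names, and the toy data.

```python
# Minimal RFE sketch (illustrative assumptions only, not the authors' system).

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]      # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                   # only margin violators contribute
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rfe(X, y, names, keep=1):
    """Retrain repeatedly, dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))         # column indices still in use
    eliminated = []                          # feature names, in removal order
    while len(active) > keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_active, y)
        worst = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(names[active.pop(worst)])
    return [names[j] for j in active], eliminated


# Hypothetical token features, encoded as +1 / -1 (0 = feature never fires).
NAMES = ["has_digit", "is_capitalized", "never_fires"]
X = ([[1, 1, 0]] * 3 + [[1, -1, 0]]          # tokens inside a gene name
     + [[-1, -1, 0]] * 3 + [[-1, 1, 0]])     # tokens outside a gene name
y = [1] * 4 + [-1] * 4                       # +1 = gene-name token

kept, eliminated = rfe(X, y, NAMES)
print("kept:", kept)
print("eliminated (first to last):", eliminated)
```

In this toy data `has_digit` agrees with the label on every token, `is_capitalized` disagrees on two of eight tokens, and `never_fires` is always zero, so the perfectly predictive feature is the one that survives. Recording a performance score after each round, rather than only the removal order, would give the kind of feature-count versus performance comparison the abstract reports.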

Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL.1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
