Systematic feature evaluation for gene name recognition

Research output: Journal contributions > Journal articles > Research > peer-review

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer

In task 1A of the BioCreAtIvE evaluation, systems had to recognize words and phrases that form gene or protein names in natural language sentences. We approach this problem by building a word classification system that applies a Support Vector Machine (SVM) to a sliding window over the sentence, combined with pattern-based post-processing for the recognition of phrases. The performance of such a system depends crucially on the types of features made available to the classifier, such as prefixes or suffixes, character n-grams, patterns of capitalization, or the classification of preceding or following words. We present a systematic approach to evaluating the performance of different feature sets based on recursive feature elimination (RFE). By systematically reducing the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
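
The recursive feature elimination loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the synthetic data, the hyperparameters, and the small subgradient-descent trainer standing in for a real SVM solver are all assumptions made for illustration. The sketch repeatedly trains a linear SVM, ranks features by the absolute value of their weight, and drops the lowest-ranked feature until a target number remains.

```python
from itertools import product


def train_linear_svm(X, y, lam=0.01, eta=0.1, steps=200):
    """Minimise lam/2 * |w|^2 + mean hinge loss by full-batch subgradient descent.

    A stand-in for a real SVM solver (assumption); no bias term is fitted.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad[j] += yi * xi[j]
        w = [wj - eta * (lam * wj - gj / n) for wj, gj in zip(w, grad)]
    return w


def recursive_feature_elimination(X, y, names, n_keep, drop_per_round=1):
    """Retrain on the surviving features and drop those with the smallest |weight|."""
    active = list(range(len(names)))  # column indices still in use
    eliminated = []                   # feature names in order of removal
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        ranked = sorted(range(len(active)), key=lambda k: abs(w[k]))
        n_drop = min(drop_per_round, len(active) - n_keep)
        dropped = set(ranked[:n_drop])
        eliminated.extend(names[active[k]] for k in ranked[:n_drop])
        active = [j for k, j in enumerate(active) if k not in dropped]
    return [names[j] for j in active], eliminated


# Hypothetical token-level features, encoded as -1 (absent) / +1 (present).
NAMES = ["has_digit", "has_uppercase", "suffix_ase", "prev_is_the", "even_length"]

# Synthetic data: every combination of the five features. A token is labelled
# as part of a gene name (+1) when most of the first three features fire;
# the last two features carry no information about the label.
X = [list(row) for row in product([-1, 1], repeat=len(NAMES))]
y = [1 if sum(row[:3]) > 0 else -1 for row in X]

kept, eliminated = recursive_feature_elimination(X, y, NAMES, n_keep=3)
print("kept:", kept)
print("eliminated (first to last):", eliminated)
```

On this toy data the two uninformative features receive zero weight and are removed first. In a realistic setting one would measure classification performance after each elimination round, which is how the impact of each feature set can be quantified.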

Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL.1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
