Systematic feature evaluation for gene name recognition

Publication: Contributions to journals > Journal articles > Research > peer-reviewed

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer
In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine (SVM), combined with pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as prefixes or suffixes, character n-grams, patterns of capitalization, or the classification of preceding or following words. We present a systematic approach to evaluating the performance of different feature sets based on recursive feature elimination (RFE). By systematically reducing the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
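To make the feature types named in the abstract concrete, the following is a minimal sketch of how sliding-window features for one token might be generated. It is not the authors' implementation: the function names, the feature-string format, the two-character affix length, the trigram size, and the window radius are all assumptions chosen for illustration. Features that depend on the classifier's own output for neighbouring words are left out.

```python
import re


def shape(token):
    """Collapse a token into a capitalization/digit pattern (hypothetical helper)."""
    pattern = re.sub(r"[A-Z]", "A", token)
    pattern = re.sub(r"[a-z]", "a", pattern)
    pattern = re.sub(r"[0-9]", "0", pattern)
    # Squeeze runs of the same symbol so that "Aaaa" and "Aa" share one shape.
    return re.sub(r"(.)\1+", r"\1", pattern)


def char_ngrams(token, n=3):
    """Character n-grams over the token, padded with boundary markers."""
    padded = f"^{token}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def token_features(token, offset):
    """Binary features of one token, tagged with its offset inside the window."""
    feats = {f"{offset}:shape={shape(token)}"}
    feats.add(f"{offset}:prefix={token[:2].lower()}")  # assumed affix length: 2
    feats.add(f"{offset}:suffix={token[-2:].lower()}")
    feats.update(f"{offset}:ngram={g}" for g in char_ngrams(token.lower()))
    return feats


def window_features(tokens, index, radius=1):
    """Union of the features of the token at `index` and its neighbours."""
    feats = set()
    for offset in range(-radius, radius + 1):
        pos = index + offset
        # Positions outside the sentence are represented by a padding token.
        token = tokens[pos] if 0 <= pos < len(tokens) else "<PAD>"
        feats |= token_features(token, offset)
    return feats


if __name__ == "__main__":
    sentence = "The p53 protein binds DNA".split()
    for feature in sorted(window_features(sentence, 1)):
        print(feature)
```

Each feature string would become one dimension of a sparse binary vector handed to the classifier. Tagging every feature with its window offset keeps, for example, the suffix of the preceding word distinct from the suffix of the word being classified, so a feature-elimination procedure can rank them separately.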
Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL. 1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
