Systematic feature evaluation for gene name recognition

Research output: Journal contributions > Journal articles > Research > peer-review

Authors

  • Jörg Hakenberg
  • Steffen Bickel
  • Conrad Plake
  • Ulf Brefeld
  • Hagen Zahn
  • Lukas Faulstich
  • Ulf Leser
  • Tobias Scheffer

In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system that applies a Support Vector Machine (SVM) to a sliding window over the sentence, combined with pattern-based post-processing for the recognition of phrases. The performance of such a system depends crucially on the types of features offered to the classifier, such as prefixes or suffixes, character n-grams, patterns of capitalization, or the classification of preceding or following words. We present a systematic approach to evaluating the performance of different feature sets based on recursive feature elimination (RFE). By systematically reducing the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
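The two ideas in the abstract, sliding-window word features and recursive feature elimination, can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the window width of one token, the affix length of three characters, and the function names are all assumptions made for illustration, and a simple perceptron stands in for the SVM so that the example needs only the Python standard library. The elimination loop follows the general RFE idea of training a linear model, ranking features by the magnitude of their weights, and discarding the lowest-ranked ones.

```python
def shape(token):
    """Collapsed capitalization pattern, e.g. 'BRCA1' -> 'A0'."""
    out = []
    for ch in token:
        c = "A" if ch.isupper() else "a" if ch.islower() else "0" if ch.isdigit() else "-"
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def token_features(token):
    """Hypothetical feature set: affixes, capitalization shape, character 3-grams."""
    feats = {f"prefix={token[:3]}", f"suffix={token[-3:]}", f"shape={shape(token)}"}
    feats.update(f"3gram={token[i:i + 3]}" for i in range(len(token) - 2))
    return feats


def window_features(tokens, i, width=1):
    """Features of token i plus its neighbours, tagged with the window offset."""
    feats = set()
    for off in range(-width, width + 1):
        j = i + off
        if 0 <= j < len(tokens):
            feats.update(f"{off:+d}:{f}" for f in token_features(tokens[j]))
    return feats


def train_perceptron(examples, features, epochs=10):
    """Stand-in linear learner (NOT an SVM); labels are +1 or -1."""
    w = {f: 0.0 for f in features}
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(w[f] for f in feats if f in w)
            if label * score <= 0:  # misclassified: move weights toward the label
                for f in feats:
                    if f in w:
                        w[f] += label
    return w


def rfe(examples, keep, drop_fraction=0.5):
    """Retrain repeatedly, dropping the features with the smallest |weight|."""
    features = sorted({f for feats, _ in examples for f in feats})
    while len(features) > keep:
        w = train_perceptron(examples, features)
        features.sort(key=lambda f: abs(w[f]), reverse=True)
        n_drop = max(1, int(len(features) * drop_fraction))
        n_drop = min(n_drop, len(features) - keep)
        features = features[:len(features) - n_drop]
    return features
```

For example, with `tokens = ["The", "BRCA1", "gene"]`, calling `window_features(tokens, 1)` yields features such as `+0:shape=A0` for the centre token and `-1:prefix=The` for its left neighbour. Passing a list of `(feature_set, label)` pairs to `rfe(examples, keep=5)` returns the five features that survive elimination. Comparing classifier performance at each elimination step is what would allow the impact of individual feature groups to be quantified.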

Original language: English
Article number: S9
Journal: BMC Bioinformatics
Volume: 6
Issue number: SUPPL.1
Number of pages: 11
ISSN: 1471-2105
Publication status: Published - 24.05.2005
Externally published: Yes
