Systematic feature evaluation for gene name recognition

Publication: Contributions to journals > Journal articles > Research > Peer-reviewed

Standard

Systematic feature evaluation for gene name recognition. / Hakenberg, Jörg; Bickel, Steffen; Plake, Conrad et al.
In: BMC Bioinformatics, Vol. 6, No. SUPPL.1, S9, 24.05.2005.

Harvard

Hakenberg, J, Bickel, S, Plake, C, Brefeld, U, Zahn, H, Faulstich, L, Leser, U & Scheffer, T 2005, 'Systematic feature evaluation for gene name recognition', BMC Bioinformatics, vol. 6, no. SUPPL.1, S9. https://doi.org/10.1186/1471-2105-6-S1-S9

APA

Hakenberg, J., Bickel, S., Plake, C., Brefeld, U., Zahn, H., Faulstich, L., Leser, U., & Scheffer, T. (2005). Systematic feature evaluation for gene name recognition. BMC Bioinformatics, 6(SUPPL.1), Article S9. https://doi.org/10.1186/1471-2105-6-S1-S9

Vancouver

Hakenberg J, Bickel S, Plake C, Brefeld U, Zahn H, Faulstich L et al. Systematic feature evaluation for gene name recognition. BMC Bioinformatics. 2005 May 24;6(SUPPL.1):S9. doi: 10.1186/1471-2105-6-S1-S9

BibTeX

@article{145a0b45e5f644dca22d39068f96349a,
title = "Systematic feature evaluation for gene name recognition",
abstract = "In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.",
keywords = "Informatics, Business informatics",
author = "J{\"o}rg Hakenberg and Steffen Bickel and Conrad Plake and Ulf Brefeld and Hagen Zahn and Lukas Faulstich and Ulf Leser and Tobias Scheffer",
year = "2005",
month = may,
day = "24",
doi = "10.1186/1471-2105-6-S1-S9",
language = "English",
volume = "6",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "SUPPL.1",

}

RIS

TY  - JOUR
T1  - Systematic feature evaluation for gene name recognition
AU  - Hakenberg, Jörg
AU  - Bickel, Steffen
AU  - Plake, Conrad
AU  - Brefeld, Ulf
AU  - Zahn, Hagen
AU  - Faulstich, Lukas
AU  - Leser, Ulf
AU  - Scheffer, Tobias
PY  - 2005/5/24
Y1  - 2005/5/24
N2  - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
AB  - In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
KW  - Informatics
KW  - Business informatics
UR  - http://www.scopus.com/inward/record.url?scp=33947304479&partnerID=8YFLogxK
UR  - https://www.mendeley.com/catalogue/483dea0a-b292-3915-b21a-ee2b226bc166/
U2  - 10.1186/1471-2105-6-S1-S9
DO  - 10.1186/1471-2105-6-S1-S9
M3  - Journal articles
C2  - 15960843
AN  - SCOPUS:33947304479
VL  - 6
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
SN  - 1471-2105
IS  - SUPPL.1
M1  - S9
ER  -
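
The abstract's central technique, recursive feature elimination (RFE), can be illustrated with a minimal sketch. It is not the authors' implementation: a plain perceptron stands in for the Support Vector Machine to avoid external dependencies, one feature is dropped per round (the paper's elimination schedule is not stated in the abstract), and the function names and toy data are hypothetical.

```python
def train_perceptron(rows, labels, active, epochs=10):
    """Fit a linear model using only the active feature indices.

    Assumption: a perceptron replaces the paper's SVM so the sketch
    needs nothing beyond the standard library.
    """
    weights = {j: 0.0 for j in active}
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            score = sum(weights[j] * x[j] for j in active)
            if y * score <= 0:  # misclassified or on the decision boundary
                for j in active:
                    weights[j] += y * x[j]
    return weights


def recursive_feature_elimination(rows, labels, keep=1):
    """Return (surviving feature indices, elimination order)."""
    active = list(range(len(rows[0])))
    eliminated = []
    while len(active) > keep:
        weights = train_perceptron(rows, labels, active)
        # Drop the feature with the smallest absolute weight;
        # ties are broken in favour of the lowest index.
        weakest = min(active, key=lambda j: abs(weights[j]))
        active.remove(weakest)
        eliminated.append(weakest)
    return active, eliminated


# Toy data (hypothetical): feature 0 determines the label,
# feature 1 is uninformative, feature 2 is constant.
rows = [[1, 0, 0], [1, 1, 0], [-1, 0, 0], [-1, 1, 0]]
labels = [1, 1, -1, -1]
survivors, order = recursive_feature_elimination(rows, labels)
print(survivors, order)
```

On this toy data the only informative feature (index 0) survives. Retraining after every removal is what distinguishes RFE from a one-shot ranking by weight magnitude.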
