Index
- The MIT Press
abbreviations, 12, 15, 19, 55, 69
ABNER system, 73, 76
abstract concepts, 44
abstracts, 3, 5–6, 10–13, 18, 31, 37, 53, 70–71, 88, 91, 94–96, 103–104, 108
accuracy, 24–25, 59, 72, 76, 82–84, 104
active learning, 90
ad hoc information retrieval, 19–20, 33, 35, 94
adjacent labels, 67
adjectives, 14–16, 58–59
adverbial modifiers, 23
adverse drug reactions, 3, 83, 110
algebraic, 42, 44–45
alignment in HMMs, 65
alphabetical characters, 20
alphanumeric characters, 62, 66
Alzheimer’s disease, 103
ambiguity, 7, 12, 15, 17–18, 24, 32, 55, 68–69, 104, 108
AMENDA database, 103–104
amino acids, 11, 62
anatomic terms, 27
annotation, 14, 29, 60, 72, 79, 88, 96, 109, 113
annotators, 80, 88
appositives, 17
Arabidopsis, 103
arguments, in relation-extraction rules, 72–73, 75
Arrowsmith system, 107
-ase, as a suffix, 14, 62
assays, 112, 114
assembly, 16
assessment, 94. See also empirical evaluation
Association for Computing Machinery, 92
attributes in an ontology, 28. See also features
authors, 18, 25
automatically extracted information, 31, 103
background knowledge, 2, 31
base forms, 21
base noun phrase, 23
Bayes. See naïve Bayes
Begin/Internal/Other (B/I/O) representation, 60
benchmark, 112. See also evaluation
Bernoulli distributions, 43, 50
bigrams, 21–22, 36. See also n-grams
binary relations, 69, 74
binary vectors, 49
binary weighting, 39
BIND database, 35
binding, 17–18, 29, 72, 75
bioassays, 112
BioCreative, 92, 94–97, 113
bio-entities, 77, 99, 101
BioGRID database, 35
biological process, 29–30
biomedical connections, 105
biomedical informatics, 2
biomedical knowledge, 10, 26
BioNLP Shared Tasks, 74–75, 92, 97
BLAST. See PSI-BLAST system
blood pressure, 16, 36, 38
Boolean combination of terms, 33
Boolean queries, 34–38, 52
Boolean search, 34, 104
boundaries. See sentence boundaries; token boundaries
BRCA, 5–8, 34–35, 55
breast cancer, 5, 55
BRENDA database, 103–104
browsers, 99–100
cancer, 5, 25, 34, 55
canonical dictionaries, 57–58
canonical forms, 57, 95
canonical identifiers, 67–68
capitalization, 12, 20, 57, 66
captions, 11, 18–19. See also figures; images
case normalization, 20
catalysis, 13, 16–17, 21–22, 29–30, 54
categorization. See text categorization
categorization status value, 46
category labels, 45
causes, in events, 75–76
C. elegans, 103
cellular components, 11, 29, 73, 104, 111. See also compartments in a cell; subcellular localization
CHAR database, 105
characterization of genes and proteins, 105, 109–110
characters, 12, 19–20, 22, 59, 62, 71
chemicals, 100–101
chunking, 23–26
citations, 1, 10, 26
classes, 15, 20, 28, 45–47, 50–52, 56, 61–62, 83–84, 88–89
classification, 34, 45–47, 49, 60, 62–63, 75, 80, 84, 90–91, 97, 107
classifiers, 45–46, 49–52, 66, 71, 90–91, 97, 105, 112
clinical notes, 114
clinicians, 3, 5
clustering, 34, 46, 90, 107–109, 112
combination of singular values, 44
combinations of adjacent labels, 67
combinations of terms, 20, 33, 38, 44
combinations of words, 21
commas, 19
common reference corpora, 91
communities, 2
community-wide evaluations, 8, 74
compartments in a cell, 11, 53, 69. See also cellular components; subcellular localization
completeness, 9, 56, 83
complexes, 35, 71
Comprehensive Microbial Resource, 103
conditional independence, 47–48, 52
conditional probabilities, 48, 51, 60, 64, 66
conditional random fields (CRFs), 63, 66–67
confusion matrix, 84
CoNLL-10 shared task, 92
consistency of data, 80, 89, 96
context, for resolving gene and protein names, 5, 55, 57, 63, 68, 71
context of entities and terms, 21, 23, 44, 101–102
controlled evaluation, 79. See also evaluation
controlled vocabularies, 10, 26, 28–29, 31, 37
co-occurrence approach, 70
co-occurrences, 70, 101, 104, 106
co-references, 17
corpora, 14, 23, 33, 53, 60, 70, 78, 83, 89, 91–94, 97–98
cosine-based similarity, 41–42, 68, 111–112
critical assessment of information extraction in biology. See BioCreative
cross validation, 89–90
curation. See database curation
curation bottleneck, 9
curators. See database curators
cytosol, 13, 16, 21–22, 54, 74
database curation, 8, 35, 49, 52, 67, 77, 92–95, 102–105, 108, 110, 112–113
database curators, 4, 9, 35, 37, 48, 52, 54, 80, 90–93, 102–103, 105, 110–111, 113
Database of Interacting Proteins (DIP), 35
databases, 4, 8–11, 19, 35, 38, 48–50, 52–53, 67–68, 80, 94, 100, 103–105, 111–112
database schemas, 53
data instances, 51, 81, 83, 90
data integration...