In lieu of an abstract, here is a brief excerpt of the content:

Notes

Letter to the Editor (Nature 361, January 14, 1993, p. 121)

Sir — What is the longest word spelled out in the sequence of a protein in the protein sequence database using the one-letter code for amino acids? None of the extensive literature devoted to this problem has taken a truly systematic approach; the longest word found to date contains only seven letters.

We have matched the entire Oxford English Dictionary (second edition, 20 volumes, 572,728,830 characters, with information content close to that of the human genome) against the entire SwissProt protein sequence database (version 23). Using the Patricia tree data structure, the matching consumed only 23 minutes of computational time.

We found two words with nine characters: "hidalgism" (the manner or practice of a Hidalgo) entered the English dictionary via a citation from 1887, and appears at positions 247-255 of the integrase of bacteriophage lambda (acquisition number P03700). "Ensilists" (the plural of ensilist, one who preserves his crops by ensilage) entered the dictionary via a citation from 1883, and appears at positions 81-89 of the PRRB protein from Escherichia coli (acquisition number P17222).

In addition to being the longest strings appearing simultaneously in the English and protein languages, these are candidates for the most unusable pieces of information simultaneously in lexicography and biochemistry. Their discovery does, however, demonstrate the power of these data structures in handling large amounts of information.

More Protein Talk (Nature 361, February 25, 1993, p. 694)

Sir — Gonnet and Benner have searched for the longest English word in the protein sequence databank. But what of other languages? Given that the ownership of the longest peptide word will undoubtedly become a source of intense national pride, I thought it wise to investigate.
I performed a similar analysis to Gonnet and Benner, using a standard hashing algorithm to search the SwissProt databank with a multilingual word list of 1.3 million words from Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Spanish, Swedish, and Esperanto. . . . Apart from English, four of the others provided nine-letter words: ansvarlig (a Danish word meaning 'liable') in entry HX_YEAST at position 85; haletante (French for 'breathless') in KlCo_XENLA at 145; saltsilda (Norwegian for 'salted herring') in PAI1_BOVIN at 271; and stillassi (perfect subjunctive of Italian stillare, 'to drip') in STE2_YEAST at 207. Although I did not find any other English words of nine letters or more . . . , the search did turn up a ten-letter Italian word, annidavate, . . . the past imperfect tense of annidare 'to nest'. . . . The race is now on for the next longest word. How long will we have to wait before Germany scoops the honours with the possible 27-letter peptide word for 'social sciences', Gesellschaftswissenschaften?

More Word Play (Nature 362, April 15, 1993, p. 595)

Sir — In the light of recent searches for words from languages in protein sequences transcribed by the single-letter amino acid code, ... we report our findings from the circumsporozoite protein of the malaria parasite Plasmodium falciparum. This protein is expressed at a high level on the surface of the infective sporozoite stage of the parasite and elicits a strong but relatively ineffective T-cell-independent immune response on the part of the host. Thus it has been argued that the function of the circumsporozoite protein is precisely to elicit such an immune attack. Appropriately, we find in the carboxy-terminal region of the molecule the English phrase 'KICKME'. . . . ...
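The kind of search these letters describe can be illustrated with a minimal sketch. It is not the correspondents' code: the first letter used a Patricia tree and the second a "standard hashing algorithm", and neither gives implementation details. The version below assumes a plain Python hash set of words and a sliding window over each sequence. The function names `spellable` and `find_words`, the toy word list, and the toy entries `TOY1` and `TOY2` are all hypothetical and invented for illustration; they are not real database records.

```python
# The 20 standard amino acids in the one-letter code.
# Letters such as J, O and U never occur, so many words cannot be spelled.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def spellable(word):
    """True if every letter of the word is a standard one-letter residue code."""
    return bool(word) and set(word.upper()) <= AMINO_ACIDS


def find_words(words, sequences, min_len=3):
    """Return (word, entry, start, end) hits, positions 1-based and inclusive.

    words: iterable of dictionary words (hypothetical input).
    sequences: mapping of entry name -> protein sequence string.
    """
    # Hash set of candidate words, keeping only those that could occur at all.
    lexicon = {w.upper() for w in words if len(w) >= min_len and spellable(w)}
    lengths = sorted({len(w) for w in lexicon}, reverse=True)
    hits = []
    for entry, seq in sequences.items():
        seq = seq.upper()
        for n in lengths:
            # Slide a window of width n and test each substring by hash lookup.
            for i in range(len(seq) - n + 1):
                window = seq[i:i + n]
                if window in lexicon:
                    hits.append((window, entry, i + 1, i + n))
    # Longest words first, then alphabetical for a stable order.
    hits.sort(key=lambda h: (-len(h[0]), h[0], h[1], h[2]))
    return hits


# Toy data only: invented sequences, not real database entries.
toy_words = ["kickme", "kick", "hidalgism", "jump"]
toy_sequences = {"TOY1": "MAKICKMEQ", "TOY2": "GGHIDALGISMGG"}
for word, entry, start, end in find_words(toy_words, toy_sequences):
    print(f"{word} in {entry} at {start}-{end}")
```

On the toy data, the nine-letter word is reported at positions 3-11, which matches the convention in the letters that a nine-letter word spans nine consecutive residue positions. A trie-based structure such as the Patricia tree named in the first letter would avoid rescanning each sequence once per word length, at the cost of a more involved implementation.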


Additional Information

pp. 214-215