Statistical Methods for Identifying Local Dialectal Terms from GPS-Tagged Documents

Paul Cook; Bo Han; Timothy Baldwin

doi:10.1353/dic.2014.0020

Dictionaries: Journal of the Dictionary Society of North America

Dictionary Society of North America
Number 35, 2014
pp. 248-271

Abstract

Corpora of documents whose metadata includes GPS coordinates have recently become widely available through online social media such as Twitter. This has created opportunities for statistical corpus methods that describe the geographical spread of words, but such techniques do not appear to be widely used in corpus linguistics and lexicography. This paper presents several methods for describing the spread of a set of points, corresponding to documents containing a given word, and applies the methods to a corpus of GPS-tagged tweets from Twitter. In experiments on known regionalisms, we show that these methods could be used to help identify such expressions. We analyze the words in the corpus identified as having the most geographically restricted usage and identify some expressions that appear to be previously undocumented regionalisms with highly localized usage.
