Statistics for linguistics with R: A practical introduction
This is a fortunate time to be a linguist seeking the right tools for taking your research in a more statistical and quantitative direction. The past few years have seen the arrival of several excellent books that are designed not merely as general introductions to statistics or data analysis but are specifically tailored to the interests and needs of linguists (Baayen 2008, Gries 2009 and the volume under review, Johnson 2008). All of these books promote R, a multiplatform, open-source software package that functions as a programming language, a statistical environment, a graphics package, and a superb all-around tool for data processing, storage, and manipulation (www.r-project.org). Anyone interested in learning more about R, corpus linguistics, and statistical methods in linguistic research will be well served by these books and will find the two by Stefan Th. Gries (G) to be indispensable learning tools. G's Quantitative corpus linguistics with R (2009) and the current volume, Statistics for linguistics with R, provide excellent introductions to statistics and quantitative methods in linguistics, to the R environment, to corpus linguistics, and to the ways of thinking and of formulating hypotheses that quantitative linguistic research requires.
This volume is intended as an introductory textbook both to the use of statistics in linguistic research and to the kinds of questions one asks in developing, formulating, and testing hypotheses. The book consists of five content chapters and an epilogue, and it is accompanied by a website and a newsgroup, along with downloadable code, exercises, data, and an answer key. In Ch. 1, 'Some fundamentals of empirical research', the author introduces the reader to the collection and analysis of experimental and corpus data. A particularly strong point of the book is that G provides numerous models and templates for conducting the kind of research he advocates, much as he does in Gries 2009 and in his article on more rigorous methods in corpus linguistics (Gries 2006).
G then covers the 'Fundamentals of R' (Ch. 2), guiding the reader through setting up R and writing the first few functions and lines of code. This introduction is thorough enough to give the reader confidence and comfort in working with the R environment and managing data structures. R is freely available for Windows, Mac, and Linux, but the discussion here is somewhat biased toward Windows users. Although the learning curve for becoming a proficient R user is steep, G's and Baayen's books are more than sufficient to teach you what you need to know in order to do serious linguistic research with R. As a programming language, R is so useful and powerful that one might even forgo learning other programming languages such as Python, at least for a time. G is extremely encouraging throughout. His book 'aims to help you do scientific quantitative research' (2), and it succeeds in reaching this goal. Along the way, G acknowledges the frustration readers may experience (54) and keeps their morale up.
As an introduction to R itself, Gries 2009 is more comprehensive, but the current book has the task of introducing the reader to statistical techniques and how to employ them. In Chs. 3-5, G takes the reader on a whirlwind tour of the statistical tests and methods that linguists will find useful. Most tests are demonstrated on a variety of actual linguistic datasets available on the companion website, making the discussion and plots that much more relevant to linguists. This approach, which encourages readers to think about how the various methods might apply to their own research rather than merely to absorb material from a general introductory statistics text, appreciably increases the value of the book. G presents a few equations throughout the book, but the level of math required is no higher than what is commonly covered in high school and introductory college math courses. G shows...