Reviewed by Elisabeth Morand
Biernat Eric and Lutz Michel, 2017, Data science : fondamentaux et études de cas, Machine learning avec Python et R [Data science: fundamentals and case studies, machine learning with Python and R], Paris, Eyrolles, 296 p.

This book is for anyone wishing to discover data science. As the subtitle indicates, it includes an introduction to machine learning(1) through practice. The book is composed of three parts, the last of which is devoted exclusively to practical case studies – an effective teaching tool presenting the practical aspects of machine learning algorithms – while the first two parts establish the fundamental elements.

The book is composed of short, titled information sheets or sections, which makes it easy to move between concepts. The first sheets define terms – a useful service even for experienced researchers. The sheet on logistic regression, for example, offers a different perspective from those currently presented in statistics or social science methods manuals.

The book teaches methods to beginners but is also useful in substantiating ongoing analyses. Its three parts correspond chronologically to the three major phases to be followed in producing an analysis: problem definition, method choice and implementation.

In Part I, which is quite short, the authors specify the types of problems the book is designed to solve as well as the tools needed to do so. This part contains a chapter on programs that also works to clarify the book’s coverage.

Part II presents machine learning analysis methods over 14 chapters. It follows the usual order of books introducing a complex topic, separately presenting standard analytic methods and more contemporary ones. The latter include random forests, gradient boosting, and the support vector machine (SVM), all of which may raise questions for novices. The section on random forests offers a wonderfully clear presentation that covers all the essentials in no more than ten pages.

Part III is the most important, though it relies heavily on the other two, especially Part I. It describes in detail a substantial number of practical cases chosen by the authors from the Kaggle web platform, and so gives readers direct access to the data sets and programs. The methods presented in Part II are supplemented here with practical methods, including a presentation of online learning (p. 219), which can be used if the data set is large enough.

The authors also point out the strengths and weaknesses of the two components of data science, namely statistics and information technology, explaining how the former helps in understanding developments in the latter. They rightly favor wider diffusion of visualization methods – a point on which there can be no disagreement. But it would have been useful to specify how the methods complement each other and the different types of assistance they can provide. It is also regrettable that French scientific language was not used. English terms are everywhere despite the existence of French equivalents.

The far-ranging bibliography, emphasizing recent books and articles but including classic, fundamental studies, is another of the book’s strengths. Relevant references are conveniently listed directly after each information sheet or section. Overall, the book is an excellent toolbox for newcomers to machine learning and is also quite pleasant to read. Readers will benefit from the authors’ long professional experience as both teachers and users of the methods they present.

Footnotes

1. Machine or statistical learning refers to the development, design, application and analysis of methods that enable machines (in the broad sense) to process data systematically and so to use classic algorithms to carry out difficult tasks or resolve problems.
