Natural Language Processing, or NLP for short, is the study of computational methods for working with speech and text data.
The field is dominated by the statistical paradigm, and machine learning methods are used to develop predictive models.
In this post, you will discover the top books that you can read to get started with natural language processing.
After reading this post, you will know:
- The top books for practical natural language processing.
- The top textbooks for the theoretical foundations of natural language processing.
- The NLP books I have on my shelf.
Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.
Top Practical Books on Natural Language Processing
As practitioners, we do not always have to reach for a textbook when getting started on a new topic.
Although there are fewer practical books on NLP than textbooks, I have tried to pick the top 3 books that will help you get started and bring NLP methods to your machine learning project.
1. Natural Language Processing with Python
Written by Steven Bird, Ewan Klein and Edward Loper.
This book provides an introduction to NLP using the Python stack for practitioners.
The book focuses on using the NLTK Python library, which is very popular for common NLP tasks.
Code examples in the book are in the Python programming language.
Contents include:
- Language Processing and Python
- Accessing Text Corpora and Lexical Resources
- Processing Raw Text
- Writing Structured Programs
- Categorizing and Tagging Words
- Learning to Classify Text
- Extracting Information from Text
- Analyzing Sentence Structure
- Building Feature-Based Grammars
- Analyzing the Meaning of Sentences
- Managing Linguistic Data
This book is perfect if you are looking to get into classical NLP using the go-to NLTK platform.
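To give a feel for the kind of classical task covered by chapters like "Processing Raw Text", here is a minimal sketch of tokenizing text and counting word frequencies. It is not taken from the book and deliberately uses only the Python standard library rather than NLTK; the function names `tokenize` and `top_words` and the sample sentence are my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog sat too."
print(tokenize(sample))
print(top_words(sample, 2))
```

A library such as NLTK offers more careful tokenizers and ready-made corpora, but the basic idea of turning raw text into countable tokens is the same.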
Resources
- Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit
- Natural Language Processing with Python (free version)
2. Taming Text
Written by Grant Ingersoll, Thomas Morton and Drew Farris.
Notably, Grant Ingersoll is a cofounder of the Apache Mahout project.
This book provides an introduction to a range of NLP problems and the tools used to tackle them, such as Apache Solr, Apache OpenNLP, and Apache Mahout.
Code examples are in Java.
It may be more suited to developers getting started with larger enterprise-grade NLP tools on work projects.
Contents include:
- Getting Started Taming Text
- Foundations of Taming Text
- Searching
- Fuzzy String Matching
- Identifying People, Places and Things
- Clustering Text
- Classification, Categorization and Tagging
- Building an Example Question Answering System
- Untaming Text: Exploring the Next Frontier
Resources
- Taming Text: How to Find, Organize, and Manipulate It
- Book homepage
- Book GitHub Repository (code and data)
3. Text Mining with R
Written by Julia Silge and David Robinson.
This book demonstrates statistical natural language processing methods on a range of modern applications.
Code examples are in R.
Code focuses on the “tidy” principles by Hadley Wickham (paper) and the tidytext package by the authors.
Of the three books, this is the most recently published and has a more practical and modern feel to the demonstrations.
Contents include:
- The Tidy Text Format
- Sentiment Analysis with Tidy Data
- Analyzing Word and Document Frequency: tf-idf
- Relationships Between Words: N-grams and Correlations
- Converting to and from Nontidy Formats
- Topic Modeling
- Case Study: Comparing Twitter Archives
- Case Study: Mining NASA Metadata
- Case Study: Analyzing Usenet Text
Resources
- Text Mining with R: A Tidy Approach
- Book Homepage (and book for free)
- Book GitHub Repository (code and data)
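The tf-idf statistic named in the contents above can be sketched in a few lines. The book's examples are in R; this is a plain Python illustration under one common definition (term frequency as a share of the document's tokens, inverse document frequency as the natural log of total documents over documents containing the term), and it is not the authors' code. The function name `tf_idf` and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf for each term in each tokenized document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Count, for every term, how many documents contain it at least once.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already split into tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)
print(scores[2])
```

Under this definition, a word that appears in every document (here "the") scores zero, while a word confined to a single document (here "bird") scores highest in that document.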
Do you know of other great practical books on natural language processing?
Let me know in the comments.
Top Textbooks on Natural Language Processing
There are a ton of textbooks on natural language processing and on specific sub-topics.
In this section, I have tried to focus on what I (and the general consensus) see as the best books on the topic for beginners, e.g. undergraduate or graduate students and practitioners looking to step deeper into the theory.
I have tried to pick a mix of general NLP books as well as books on highly studied topics like translation and speech.
The first two books in this section are essentially canon for NLP students.
1. Foundations of Statistical Natural Language Processing
Written by Christopher Manning and Hinrich Schütze.
Notably, Christopher Manning teaches NLP at Stanford and is behind the CS224n: Natural Language Processing with Deep Learning course.
This book provides an introduction to statistical methods for natural language processing covering both the required linguistics and the newer (at the time, circa 1999) statistical methods.
This book provides a strong foundation to better grasp the newer methods and encodings.
Contents include:
- Introduction
- Mathematical Foundations
- Linguistic Essentials
- Corpus-Based Work
- Collocations
- Statistical Inference: n-gram Models over Sparse Data
- Word Sense Disambiguation
- Lexical Acquisition
- Markov Models
- Part-of-Speech Tagging
- Probabilistic Context Free Grammars
- Probabilistic Parsing
- Statistical Alignment and Machine Translation
- Clustering
- Topics in Information Retrieval
- Text Categorization
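As a taste of the statistical approach in chapters such as "Statistical Inference: n-gram Models over Sparse Data", here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing, so that unseen word pairs still receive a small non-zero probability. It is not taken from the book; the function names, the `<s>` and `</s>` boundary markers, and the toy sentences are assumptions for illustration.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams over tokenized sentences, padded with boundary markers."""
    context_counts = Counter()  # how often each word appears as the first word of a bigram
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, cur, context_counts, bigram_counts, vocab):
    """Estimate P(cur | prev) with add-one smoothing over the vocabulary."""
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))


# Hypothetical toy training data.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
context_counts, bigram_counts, vocab = train_bigram_model(sentences)

print(bigram_prob("the", "cat", context_counts, bigram_counts, vocab))  # seen pair
print(bigram_prob("the", "sat", context_counts, bigram_counts, vocab))  # unseen pair
```

Add-one smoothing is the simplest way to deal with sparse counts. It tends to give too much probability to unseen events on real data, which is why textbooks go on to cover more refined estimators.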
2. Speech and Language Processing
Written by Daniel Jurafsky and James Martin.
This book provides coverage of NLP from both speech and text perspectives with a strong focus on applications (one in each chapter).
Coverage of the topic feels exhaustive.
Contents include:
- Introduction
- Regular Expressions and Automata
- Words and Transducers
- N-grams
- Part-of-Speech Tagging
- Hidden Markov and Maximum Entropy Models
- Phonetics
- Speech Synthesis
- Automatic Speech Recognition
- Speech Recognition: Advanced Topics
- Computational Phonology
- Formal Grammars of English
- Syntactic Parsing
- Statistical Parsing
- Features and Unification
- Language and Complexity
- The Representation of Meaning
- Computational Semantics
- Lexical Semantics
- Computational Lexical Semantics
- Computational Discourse
- Information Extraction
- Question Answering and Summarization
- Dialog and Conversational Agents
- Machine Translation
4. Statistical Machine Translation
Written by Philipp Koehn.
This book provides an introduction to the topic of statistical machine translation, a subfield of NLP.
Contents include:
- Introduction
- Words, Sentences, Corpora
- Probability Theory
- Word-Based Models
- Phrase-Based Models
- Decoding
- Language Models
- Evaluation
- Discriminative Training
- Integrating Linguistic Information
- Tree-Based Methods
5. Statistical Methods for Speech Recognition
Written by Frederick Jelinek.
This book provides an introduction to the topic of statistical speech recognition, another subfield of NLP that saw an overhaul in the 1990s with statistical approaches.
Contents include:
- The Speech Recognition Problem
- Hidden Markov Models
- The Acoustic Model
- Basic Language Modeling
- The Viterbi Search
- Hypothesis Search on a Tree and the Fast Match
- Elements of Information Theory
- The Complexity of Tasks – The Quality of Language Models
- The Expectation-Maximization Algorithm and Its Consequences
- Decision Trees and Tree Language Models
- Phonetics from Orthography: Spelling-to-Base Form Mappings
- Triphones and Allophones
- Maximum Entropy Probability Estimation and Language Models
- Three Applications of Maximum Entropy Estimation to Language Modeling
- Estimation of Probabilities from Counts and the Back-Off Method
NLP Books that I Own
I like to have a mixture of practical and reference texts on my shelf.
The hard part of NLP (for me) is simply the large number of sub-problems and the specialized terminology and theory used.
For this reason I have the following 3 NLP textbooks on my shelf:
- Natural Language Processing with Python
- Foundations of Statistical Natural Language Processing
- Neural Network Methods in Natural Language Processing
I also really like the look of:
I recommend choosing the NLP books that are right for you and your needs or project.
Let me know which books you chose or own.
Leave a comment below.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Top NLP Books
- Best Sellers in Natural Language Processing on Amazon
- Popular Natural Language Processing Books on GoodReads
Quora
- What are some books that people interested in NLP must read?
- What are the best books on NLP?
- What is the best Natural Language Processing textbook(s)?
- Best NLP books
Summary
In this post, you discovered the top books on natural language processing.
Specifically, you learned:
- The top books for practical natural language processing.
- The top textbooks for the theoretical foundations of natural language processing.
- The NLP books I have on my shelf.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.