The Lansing Area R Users Group (laRUG) brings together R users in the Lansing area for discussions on data science (analysis and predictive modeling), big data, and all things R. This group provides a relaxed environment to exchange ideas and discuss R. Whether you are new to R (and programming), a statistician, or an advanced user, we are the group for you.
May 2017 Notes on Text Mining and Data Science
The May 2017 meetup focused on R packages for text mining and the types of analysis they make possible.
Several text mining packages, including tidytext, were discussed. Throughout the discussion, tidytext was seen as the cleanest way to work with text data.
The RTextTools package is another package for performing text mining and sentiment analysis.
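A minimal sketch of the tidy workflow discussed above, assuming the dplyr and tidytext packages are installed. The two-document corpus and the object names (`docs`, `tokens`, `word_counts`, `sentiment_counts`) are hypothetical. The sentiment step uses tidytext's lexicon-based approach (joining words to the Bing lexicon bundled with tidytext), which is a different technique from the supervised classification that RTextTools performs.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration
docs <- tibble(
  doc  = c("review_a", "review_b"),
  text = c("Text mining in R is excellent. I love text mining!",
           "The demo was awful.")
)

# Tidy text: one token (word) per row. unnest_tokens() lowercases and
# strips punctuation by default.
tokens <- docs %>%
  unnest_tokens(word, text)

# Word frequency per document after removing common English stop words
word_counts <- tokens %>%
  anti_join(stop_words, by = "word") %>%
  count(doc, word, sort = TRUE)

# Lexicon-based sentiment: keep only words found in the Bing lexicon
# (bundled with tidytext), then count positive/negative words per document
sentiment_counts <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(doc, sentiment)

print(word_counts)
print(sentiment_counts)
```

Printing both tables shows `text` and `mining` counted twice in `review_a`, two positive Bing words (`excellent`, `love`) in `review_a`, and one negative word (`awful`) in `review_b`. Sentiment is computed on the unfiltered tokens because some stop-word lists contain sentiment-bearing words.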
- tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools (CRAN)
- tm: Text Mining Package (CRAN)
- RTextTools: Automatic Text Classification via Supervised Learning (CRAN; no longer developed)
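Since both tm and tidytext came up, here is a hedged sketch of moving data between them, assuming dplyr, tidytext, and tm are installed (`cast_dtm()` needs tm at run time). The toy counts and object names are hypothetical. `cast_dtm()` turns tidy one-row-per-document-term counts into a tm `DocumentTermMatrix`, and tidytext's `tidy()` method turns the matrix back into a tibble.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts in tidy form: one row per document-word pair
counts <- tibble(
  doc  = c("d1", "d1", "d2"),
  word = c("text", "mining", "text"),
  n    = c(2L, 1L, 3L)
)

# Tidy -> tm DocumentTermMatrix (rows = documents, columns = terms)
dtm <- cast_dtm(counts, doc, word, n)

# DocumentTermMatrix -> tidy tibble with document, term, and count columns
back <- tidy(dtm)

print(tm::nDocs(dtm))
print(tm::nTerms(dtm))
print(back)
```

The matrix is stored sparsely, so `tidy()` returns only the three nonzero document-term entries; the `d2`/`mining` cell, which is zero, does not come back as a row.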
Tutorials built around tidytext are an excellent resource for learning how to clean and analyze text. The following vignettes and tutorials were the basis for our discussion.
- Introduction to tidytext
  - Tidy Term Frequency and Inverse Document Frequency (tf-idf)
  - Converting to and from Document-Term Matrices and Corpus objects
  - Tidy Topic Modeling
- UC Business Analytics R Programming Guide
  - Tidying Text & Word Frequency
  - Sentiment Analysis
  - Term vs. Document Frequency
  - Word Relationships
  - Converting Between Tidy and Non-tidy Formats
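The tf-idf statistic covered in these vignettes weights a word by how often it appears in a document, discounted by how many documents contain it. A minimal sketch with tidytext's `bind_tf_idf()`, assuming dplyr and tidytext are installed; the counts and object names are invented for illustration. tidytext computes tf as n divided by the document's total word count, and idf as the natural log of (number of documents / number of documents containing the word).

```r
library(dplyr)
library(tidytext)

# Hypothetical per-document word counts, invented for illustration
word_counts <- tibble(
  doc  = c("d1", "d1", "d2", "d2"),
  word = c("text", "model", "text", "sentiment"),
  n    = c(3L, 1L, 2L, 2L)
)

# bind_tf_idf(tbl, term, document, n) appends tf, idf, and tf_idf columns:
#   tf  = n / total words in the document
#   idf = log(documents / documents containing the word)
tf_idf <- word_counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

print(tf_idf)
```

Because `text` appears in both documents, its idf is log(2/2) = 0, so its tf-idf is 0 however often it occurs; `sentiment` (0.5 × log 2 ≈ 0.347) ranks above `model` (0.25 × log 2 ≈ 0.173).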