R has emerged as a preferred programming language in a wide range of data-intensive disciplines; for example, O'Reilly Media's 2014 Data Science Salary Survey found that R is the most popular programming language among data scientists. The goal of this course is to teach applied and theoretical aspects of R programming for data sciences. Topics will cover generic programming language concepts as they are implemented in high-level languages such as R. Course content focuses on the design and implementation of R programs to meet routine and specialized data manipulation, management, and analysis objectives. Attention will also be given to mastering the concepts and tools necessary for implementing reproducible research.
What is R?
An open-source environment for statistical computing and graphics, freely available for Windows, Mac OS X, and Linux
A full-featured programming language that can handle essentially any computing task
In particular, a scripting language (with similarities to Matlab and Python) that supports reproducibility and task automation
Why Learn R?
R is one of the highest paid IT skills
R is the most-used data science language after SQL
R is used by 70% of data miners
R is #15 of all programming languages
R is growing faster than any other data science language
R is the #1 Google search for Advanced Analytics software
R has more than 2 million users worldwide
R is used by statisticians, scientists, and social scientists, and has the widest statistical functionality of any software
R users add functionality via packages all the time
R can interact with other software, databases, the operating system, the web, etc.
FOR/STT 875 is delivered entirely online through the course management system D2L. It will be an active, project-based learning environment that focuses on:
History and overview of R
Installation and configuration of the R programming environment
Basic language elements and data structures
R + knitr + Markdown + GitHub
Data input/output
Data storage formats
Subsetting objects
Vectorization
Control structures
Functions
Scoping rules
Loop functions
Graphics and visualization
Grammar of data manipulation (dplyr and related tools)
Debugging/profiling
Statistical simulation
Registration
FOR/STT 875: R Programming for Data Sciences is available to undergraduate, graduate, and lifelong education students. There are no prerequisite or co-requisite courses.
MSU Students
Undergraduates can enroll in one of two ways:
Honors College students: contact the MSU Forestry undergraduate advisor, ForAdvis@msu.edu, to register.
All other undergraduates: contact course instructor Dr. Andrew Finley for permission to enroll.
Graduates
MSU students can enroll through the online Schedule of Courses starting in spring 2017. You must enroll in FOR 875; however, if you would like to take the course as STT 875 instead, contact your advisor to have the course allocation switched to STT.
Both Lifelong undergraduate and Lifelong graduate students can enroll in the course. Because FOR/STT 875 is a graduate class, Lifelong undergraduates must contact the MSU Forestry undergraduate advisor, ForAdvis@msu.edu, to request an override.