R Programming for Data Sciences

FOR/STT 875

R has emerged as a preferred programming language in a wide range of data-intensive disciplines (e.g., O'Reilly Media's 2014 Data Science Salary Survey found that R is the most popular programming language among data scientists). The goal of this course is to teach applied and theoretical aspects of R programming for data sciences. Topics will cover generic programming language concepts as they are implemented in high-level languages such as R. Course content focuses on the design and implementation of R programs to meet routine and specialized data manipulation, management, and analysis objectives. Attention will also be given to mastering the concepts and tools necessary for implementing reproducible research.

What is R?

  • An open source environment for statistical computing and graphics, freely available for Windows, Mac OS X, and Linux
  • A full-featured, general-purpose programming language
    • In particular, it is a scripting language (with similarities to Matlab and Python) that supports reproducibility and task automation

Why Learn R?

  • R is one of the highest paid IT skills
  • R is the most-used data science language after SQL
  • R is used by 70% of data miners
  • R is #15 of all programming languages
  • R is growing faster than any other data science language
  • R is the #1 Google search for Advanced Analytics software
  • R has more than 2 million users worldwide
  • R is used by statisticians, scientists, and social scientists, and has the widest statistical functionality of any software
  • R users add functionality via packages all the time
  • R can interact with other software, databases, the operating system, the web, etc.

Tentative Syllabus

View the proposed syllabus for FOR/STT 875.

Course Structure

FOR/STT 875 is delivered entirely online through the course management system D2L. It is an active, project-based learning environment that covers the following topics:

  • History and overview of R
  • Installation and configuration of the R programming environment
  • Basic language elements and data structures
  • R+Knitr+Markdown+GitHub
  • Data input/output
  • Data storage formats
  • Subsetting objects
  • Vectorization
  • Control structures
  • Functions
  • Scoping rules
  • Loop functions
  • Graphics and visualization
  • Grammar of data manipulation (dplyr and related tools)
  • Debugging/profiling
  • Statistical simulation

 

Registration

FOR/STT 875: R Programming for Data Sciences is available for undergraduate, graduate and lifelong education students. There are no prerequisite or co-requisite courses.  

MSU Students

  • Undergraduates can enroll in one of two ways:
    • Honors College students, contact the MSU Forestry undergraduate advisor, ForAdvis@msu.edu, for registration.
    • All other undergraduates, contact course instructor Dr. Andrew Finley for permission to enroll.
  • Graduates
    • MSU students can enroll through the online Schedule of Courses starting in spring 2017. You must enroll in FOR 875; however, if you would like to take the course as STT 875 instead, contact your advisor to have the course allocation switched to STT.

Non-MSU Students

If you are not an MSU student and would like to take FOR/STT 875, you can apply to MSU as a Lifelong Education student. The application is free, is one page long, and does not require essays or additional materials. Learn more about MSU Lifelong Education at the Office of the Registrar Lifelong Education page.

  • Both Lifelong undergraduates and graduates can enroll in the course. Because FOR/STT 875 is a graduate class, Lifelong undergraduates must contact the MSU Forestry undergraduate advisor, ForAdvis@msu.edu, to request an override. 

Contact

Andrew Finley

Professor, Forest Management and Modeling
finleya@msu.edu
517-432-7219