See R Tutorials in Data Science created by professors Shonda Kuiper (Grinnell), Laura Chihara (Carleton), and Adam Loy (Lawrence)
The Stat2Labs website, created by Shonda Kuiper, is a repository of project-based curricular materials for teaching data science.
Introduction to Project Proposal
As national interest in learning from “big data” continues to grow, there remains a gap in data science education. The Committee on Applied and Theoretical Statistics (CATS) stated that “domain scientists do not know what is possible to do with their data, and technologists do not know the domain, so there is an expertise gap.” Though there is no single agreed-upon definition of data science, it is typically described as the intersection of mathematics, statistics, computer science, and domain knowledge.
Many academic institutions have responded by developing data science or analytics programs. This typically involves creating a new department and hiring several new faculty. We propose a more conservative approach. By working collaboratively across disciplines and across institutions, we will develop a series of interdisciplinary case studies that can be used in a variety of blended as well as brick-and-mortar Associated Colleges of the Midwest (ACM) courses. We believe these activities could provide a new pedagogical model for teaching students from all disciplines how to make data-based decisions with relevant, real-world data.
Note: Content has been adapted from the project proposal.
Making decisions with data is becoming an essential skill in almost any area of study. Data are now easily accessible, and numerous software programs are available to conduct almost any data manipulation that a researcher can imagine. To address these changes in our data-rich society, the new Guidelines for Undergraduate Programs in Statistical Science (ASA 2014) are advocating for fundamentally different pedagogies with an increased emphasis on data science. A 2011 report by McKinsey & Company states, “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the knowhow to use the analysis of big data to make effective decisions.”
Data science draws heavily on skills residing at the intersection of mathematics, statistics, and computer science (MSCS). Mout (2013) used a Venn diagram to emphasize that the disciplines needed in data science include “dramatically different skills, education, and training with very little overlap.”
The emergence of big data has also greatly affected statistics education. The 2014 American Statistical Association Guidelines for Undergraduate Programs in Statistical Science (2014 ASA Guidelines) advocate fundamentally different pedagogies with an increased emphasis on data science. We propose to create collaborative course materials in data science that provide new pedagogical models that could dramatically transform how students think about data-based decision making.
Courses that address data science and that are grounded in research questions from multiple disciplines are needed. At Carleton, Lawrence, and Grinnell, there are active discussions about developing a data science course. While there are strong exemplars at the graduate level, few materials are available at the undergraduate level.
Statistics programs (if they exist) are at various stages of development at most liberal arts colleges. For example, Grinnell offers a statistics concentration, Carleton has a statistics track within the mathematics major, and Lawrence offers some statistics courses (within the mathematics department) but does not offer a statistics minor or major. Thus, how data science is introduced into any curriculum will vary from school to school. Since the ultimate goal is to have materials that can be used at schools across the country, a collaborative approach to designing these case studies will ensure that the materials are portable and adaptable to a wide variety of settings.
Goals
This proposed work would dramatically improve our colleges’ ability to address the 2014 ASA Guidelines by developing new materials that address modern data analysis and data science. More specifically, the guidelines state that students interested in a career in data analysis should have:
Facility with programming languages (such as R or Python) and database systems,
The ability to access and manipulate data in various ways,
The ability to perform algorithmic problem-solving,
A clear understanding of ethical standards,
Experience working with complex data, and
The ability to communicate complex statistical methods in basic terms to diverse audiences as well as visualize results in an accessible manner.
As the need for people with strong analytical skills continues to grow, it is imperative that we consider new ways to train students to make effective decisions with large, real-world data sets.
Our primary goal is to work collaboratively to develop, implement, and evaluate multiple activities that incorporate data science and modern statistical methods. We will write these case studies for both introductory-level statistics courses and more advanced data science or statistics courses. As such, we will need to be attentive to the exact framing and wording of these activities to ensure that they are accessible to a diverse group of students.
Simply teaching students how to conduct statistical analyses of carefully vetted and cleaned textbook datasets does not provide the statistical thinking students need to make sound decisions with today’s data. Thus, there is a need for educational materials that bridge the gap from smaller, focused textbook problems to real-world problems. To bridge this gap, the new activities will incorporate core statistical issues such as working with messy data, creating advanced data visualizations, and evaluating data relevance and reliability.
Activities
We will create six to nine case studies that will:
Address the 2014 ASA Guidelines,
Be accessible online (usable in an online, blended, or brick-and-mortar course),
Provide strong connections to primary research,
Provide instructor resources to make it easy for instructors to incorporate the materials into a variety of courses,
Explore ways to support the increasing need to expose students to real data that is “messy” or “big,” and
Address the ethical and societal issues that have arisen due to big data.
A primary advantage of working collaboratively to create multiple case studies is that they will be highly adaptable. At all three colleges, these materials will be incorporated into current statistics courses, with the expectation that they may also be combined to form a new data science course, which each college is currently discussing plans to create.
Several ACM schools are currently struggling with over-enrollments in statistics and computer science courses, with limited ability to develop new courses in data science. By creating materials that can be easily incorporated into existing undergraduate courses, we will enable ACM faculty to provide our students with relevant activities that use modern data analysis techniques.
Proposed Timeline
June 1, 2016 – August 1, 2016: Initial development of each case study
Bi-weekly conference calls
Development of a website to publicly host materials
Review and revision of each activity as it is shared among team members
Development of instructor resources corresponding to each activity
August 1, 2016 – January 1, 2017: Class testing and assessment
Materials will be class tested at all three institutions
Students and faculty will be asked to provide feedback; students will also be asked to complete an anonymous online survey
Survey data will be aggregated and analyzed
January 1, 2017 – August 30, 2017: Revision and Dissemination
Materials will be modified according to feedback from the previous semester
Additional class testing may take place at some institutions, depending upon what courses are taught
Materials will be publicly posted, and statistics educators from other institutions will be invited to class test and evaluate the materials
Dissemination
Dissemination Strategies
After class testing these materials at multiple ACM locations, we plan to post them on a publicly shared website. Dr. Kuiper maintains the Stat2Labs website (web.grinnell.edu/individuals/kuipers/stat2labs/), a website designed exclusively for instructors that has had over 20,000 visits since July 2013. It was awarded the MERLOT Classics Award in Statistics in 2012 by the Multimedia Educational Resource for Learning and Online Teaching (MERLOT) Statistics Editorial Board. This award is given once a year for the best peer-reviewed online resources designed to enhance teaching and learning. The results from this project will be linked to the Stat2Labs website to further dissemination efforts.
Dissemination of this material will include at least one presentation at a major conference such as the United States Conference on Teaching Statistics (USCOTS), the International Conference on Teaching Statistics (ICOTS), or a meeting of the National Council of Teachers of Mathematics (NCTM). This work may also result in a publication in a journal such as Journal of Statistics Education, Statistics Education Research Journal, Statistics Teachers Network, or STEM Education Research Journal. Finally, this could lead to an NSF grant proposal for a blended data analysis and visualization course at the introductory or intermediate level.
Resources & Materials
Kuiper, S., Ismay, C., Lamb, M., Young, A., Cross, G., and Miller, B. 2015. Associated Colleges of the Midwest Faculty Career Enhancement (ACM FaCE) grant. Harnessing Big Data: Planning for collaborative courses in data science.
CATS. 2014. Training Students to Extract Value from Big Data: Summary of a Workshop (2014), prepublication. Washington, D.C.: The National Academies Press.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., and Byers, A. H. 2011. “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey Global Institute (http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation; accessed August 10, 2015).
Mout, M. 2013. What is Wrong with the Definition of Data Science. KDnuggets Blog. (http://www.kdnuggets.com/2013/12/what-is-wrong-with-definition-data-science.html; accessed August 20, 2015).