What you will learn
Students completing this course will be skilled in the following areas: Data Analysis, Hypothesis Testing, Data Visualization, Metric Development, Process Control, Machine Learning, Modeling, and Optimization. Students will learn to do these analyses using Python and R.
This is an instructor-led or instructor-supported training course for individuals who want to start a career in data analysis and data science. It prepares students for job opportunities in various industries, including manufacturing, finance, insurance, health care, and retail.
After completing this course, students will be able to:
- Mine datasets for better understanding
- Create metrics and implement monitoring plans
- Create models for prediction and planning
- Implement Machine Learning algorithms
- Use regression analysis to explain relationships
- Create visualizations
- Test various hypotheses in a designed experiment
- Prepare and deliver findings reports to all audiences
Week 1 | Basic Statistics:
Students will learn the fundamentals needed to be successful throughout the rest of the program. Topics covered here are probability, Bayes' theorem, variable types, descriptive statistics, common distributions, and statistical inference.
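As an illustration of the kind of calculation covered this week, here is a minimal sketch of Bayes' theorem applied to a screening test. The function name `posterior` and the example rates (1% prevalence, 95% sensitivity, 5% false-positive rate) are assumptions chosen for illustration; they are not taken from the course materials.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of total probability: overall chance of seeing a positive result.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(C | +) = P(+ | C) * P(C) / P(+)
    return sensitivity * prior / p_positive


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
    print(f"P(condition | positive) = {p:.3f}")
```

With these assumed rates the posterior is only about 16%, which shows why a low prior matters even when the test itself is accurate.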
Week 2 | Programming Foundations in Python:
Students will learn the fundamentals of programming using the Python language. Topics covered here are algorithms, Boolean logic, data types, data structures, object-oriented programming, best practices, and debugging.
Week 3 | Databases:
Students will learn the fundamentals of organizing and extracting data using SQL and NoSQL databases.
Week 4 | Statistical Programming in R:
Students will learn the fundamentals of using the statistical software package R, including loading data, accessing libraries to utilize functions, and data manipulation. R will be used throughout the course to conduct analyses.
Week 5 | Metrics and Data Processing:
Students will learn the fundamentals of creating and monitoring metrics, and will be exposed to common practices in contemporary business settings. The principles of statistical process control will be taught and practiced. Other methods of monitoring data, such as CUSUM charts and moving-average charts, will also be taught and practiced.
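The monitoring ideas named above can be sketched in a few lines of Python. This is a simplified illustration, not the course's own method: the function names are hypothetical, the control limits use the sample standard deviation of a baseline period (a production individuals chart would more often estimate sigma from the moving range), and the CUSUM shown is only the upper one-sided statistic with an assumed allowance `k`.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits estimated from baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # simplification: sample standard deviation
    return center - n_sigma * spread, center, center + n_sigma * spread


def out_of_control(values, lower, upper):
    """Return the indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def cusum_upper(values, target, k):
    """Upper one-sided CUSUM: accumulate deviations above target + k, floored at 0."""
    s, path = 0.0, []
    for v in values:
        s = max(0.0, s + (v - target - k))
        path.append(s)
    return path


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    lower, center, upper = control_limits([10, 11, 9, 10, 10])
    print(out_of_control([10, 13, 9.5, 7], lower, upper))
    print(cusum_upper([10, 11, 12, 10], target=10, k=0.5))
```

A CUSUM signal is usually raised when the accumulated value crosses a decision threshold; the threshold is left out here because its choice depends on the process being monitored.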
Week 6 | Data Wrangling and Visualization:
Students will learn the fundamentals of manipulating data to facilitate analysis. In addition, several common tools for visualization will be taught and practiced. Supporting metrics and measures that accompany the visualizations will be used.
Week 7 | Intermediate Statistics:
Students will learn to use hypothesis testing as part of the scientific method, and will learn and practice various basic scenarios for hypothesis testing, including one-sample z- and t-tests, two-sample tests (paired and unpaired), analysis of variance, one- and two-proportion tests, and the chi-square test for independence.
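To make the one-sample case concrete, here is a minimal sketch using only the Python standard library. The function names and the sample data are assumptions for illustration. The z-test assumes the population standard deviation `sigma` is known; the t function returns only the test statistic, because the standard library does not provide the t distribution needed for a p-value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z(sample, mu0, sigma):
    """Two-sided one-sample z-test with known population sigma.

    Returns (z statistic, p-value).
    """
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


def one_sample_t_statistic(sample, mu0):
    """t statistic for a one-sample t-test (sigma estimated from the sample)."""
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical measurements
    z, p = one_sample_z(data, mu0=4, sigma=3)
    print(f"z = {z:.3f}, p = {p:.4f}")
    print(f"t = {one_sample_t_statistic(data, mu0=4):.3f}")
```

In practice a statistics package would supply the t-distribution p-value; the sketch only shows how the test statistics are formed from the sample mean, the hypothesized mean, and the standard error.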
Week 8 | Machine Learning and Modeling:
Students will learn the fundamentals and practices of several machine learning techniques, including clustering, decision trees, random forests, and Bayesian networks, and will understand the difference between supervised and unsupervised learning. In addition to machine learning, students will learn useful modeling techniques, including linear regression, non-linear regression, logistic regression, and stepwise regression.
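As a small example of the simplest modeling technique listed above, the following sketch fits a straight line by ordinary least squares using the closed-form formulas. The function name `fit_line` and the data points are hypothetical and chosen so the result is easy to check by hand; a real analysis would normally use a statistics or machine learning library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + 2x.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"slope = {slope}, intercept = {intercept}")
```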
Week 9 | Intro to Big Data:
Students will learn the fundamentals and history of big data, and will practice with exercises in distributed computing. Other popular big data tools will be introduced.
Weeks 10-12 | Group Project:
Students will learn to complete a thorough data mining, analysis, and modeling exercise in a group setting.