Wednesdays 7:00 p.m. to 9:40 p.m.
Dr. Ugo Etudo
Modality: RONA
Course Description and Objectives
This course offers an in-depth, hands-on exploration of information technologies for analyzing data that is too large for sequential processing. It introduces the Hadoop environment and its architecture, along with the MapReduce software library and its later evolutions. The course will be taught using computing resources provisioned by either Google Cloud Platform or Amazon Web Services, which will expose you to key features of these widely used cloud platforms. The first half of the course focuses on importing data from heterogeneous sources into big data platforms (extract-transform-load, or ETL) using high-level scripting languages. The second half focuses on using big data analytics tools for data mining and text analytics, with an emphasis on well-developed, well-supported machine learning libraries.
You should be prepared to write code. We will use both the Scala and Java programming languages. Although the course does not formally require prior experience in these specific languages, some programming experience is a must. You will also work in the Unix shell and should be prepared to perform most activities from the command line.
Learning Objectives
Upon completion of this course, students should be able to:
- Understand the Hadoop environment: its history, its structure, and the many applications that run natively on it.
- Write simple MapReduce programs in Java and execute those programs in Hadoop.
- Understand and use the Pig data-flow scripting language (Pig Latin) to explore, transform, and store data for higher-level analytics.
- Understand the Hive environment and HiveQL; students will be expected to write HiveQL scripts.
- Prepare data for, and execute, several popular data mining and machine learning algorithms.
Course Prerequisites
Undergraduate courses in database systems (e.g., INFO 364) and programming (e.g., INFO 350), or any of INFO 610, INFO 630, SCMA 645, or SCMA 648.