Syllabus for CS:4980:002 Spring 2015 (also listed as 22C:196:002)
Course Title: Big Data Technologies
Course Meeting Times: TTh 9:30am–10:45am in MLH 105 starting 20 January, last day 8 May, with no meetings on 3 February, 5 February, 17 March, 19 March (subject to change during semester, will be updated on this page)
Course Instructor: Ted Herman
Office Hours: MWF 10:30-11:30 in MLH 201M or "by appointment"
- 3 weeks on MapReduce and related frameworks, tools and programming.
- 2 weeks on alternative parallel programming frameworks.
- 2 weeks about general parallel programming.
- 6 weeks of exploring literature, in-class student presentations, project presentations.
Helpful Background: The areas of cloud computing, big data systems and infrastructure, and even data science fuse aspects of algorithms, databases, distributed operating systems and statistical methods. It's not reasonable to expect everyone to know everything, but here are some of the most helpful background courses one might have taken. A course on operating systems, especially one with some performance measurement, helps prepare the student to read papers. A course on databases will save time in understanding the terminology and concepts of processing large datasets. A course on algorithms and some advanced application programming will help in learning parallel programming. A course on networking could help in understanding the distributed operation of the cloud. And finally, some experience working with a Unix/Linux or POSIX command shell, remote login via SSH, Java and Python will save much time and frustration.
Graduate students. Graduate students are expected to read technical literature (recent conference papers), online documents or programming material related to cloud computing and handling large data. They will be evaluated by (a) quizzes, (b) written reports, (c) programming homework, and (d) in-class presentations on papers. Graduate students will study both theoretical and practical systems aspects of Big Data Technologies.
Undergraduate students. For undergraduate students, there will be (a) quizzes, (b) written reports, and (c) programming assignments.
All students: there is no required textbook and no final examination in this course. Grading will be by curve, using +/- grades, with separate curves for graduate and undergraduate students. For some programming assignments, services using Amazon's cloud will be required: each student is expected to get (and pay for) an individual AWS account. Undergraduate students will mainly concentrate on using cloud computing systems for parallel programming, but a local server may be provided for warm-up exercises.
Quizzes (either written in class or oral during office hours) are meant to check student understanding and how well the student is keeping up with the lectures and assigned reading material. For graduate students, grades will be based equally on presentations/reports and programming assignments (quizzes effectively confirm student performance, and will essentially be a scaling factor for the final curve). For undergraduate students, programming and projects will contribute around 80% of the score, with the balance determined by quizzes, project debriefing and presentations on projects.
General Policies and Expectations
See the Syllabus Insert for standard college-wide policies. In particular, for this course, students are expected to be the original authors of work they submit or present. Hence collaboration, obtaining solutions from outside resources (other students, experts, tutors, programs found on the internet), and other similar shortcuts are not allowed except by permission of the instructor.
University deadlines: visit the registrar site, click on calendars, then academic deadlines; then scroll by month or get a PDF by choosing Spring 2015.
Other Boilerplate Required Information
The DEO for Computer Science is Prof. Segre.
The administrative home for this course is CLAS.