Syllabus for CS:5630 Spring 2017

Course Title: Cloud Computing Technologies

Lectures: 11:00A - 12:15P TTh C121 PBB

Instructor: Ted Herman (ted-herman@uiowa.edu, office 201M MacLean Hall, 335-2833, Office Hours Tuesday & Thursday 10am-11am, Friday 11am-noon, and by appointment)

Teaching Assistant: Farley Lai (farley-lai@uiowa.edu, office 101N MacLean Hall, Office Hours Wednesday 2pm-3:30 and Thursday 12:30pm-2:00)

Department Chair: The DEO of Computer Science is Professor Alberto Segre, whose office is room 14G, MacLean Hall.

Topic Plan for Semester

  1. Introduction to the theory of parallel computing. It's possible to extend the RAM model used for teaching algorithms to a multiprocessor model, the PRAM, which has as many processors as we like, all running at precisely the same rate and sharing a common memory. Unrealistic, but a good start to imagining what is possible in an ideal parallel computing world.
  2. Asynchronous parallel computing. What happens when processes share a common memory space, but do not all run at the same speed? Less than ideal, but more realistic, and resembling common multicore architectures.
  3. Message-passing, asynchronous parallel computing. What if the processes do not share a common memory, but instead can send each other messages? This situation resembles a cluster of computers on a local area network.
  4. Special purpose architectures for parallel computing. The most practical example today is found in multicore GPUs.
  5. Software for a cluster of computers: the MPI package (popular in some research institutions with numerical-type algorithm needs); this software is typically used in "high performance computing" applications.
  6. Alternative software for a cluster of computers: non-MPI approaches.
  7. Functional programming and parallel computing: a sequence of vector operations can be expressed as the composition of functions, and this can be parallelized by familiar ideas.
  8. The MapReduce paradigm and the Hadoop implementation of MapReduce.

  9. Needs of Big Data and the impact of databases. One attraction of the cloud is the potential to hold a vast amount of data, which needs to be organized for scalability of applications, reliability and accessibility. This is a rich topic which could take up an entire semester itself, so much has been researched and done in this area.
  10. The Spark framework for cloud and cluster programming.
  11. Infrastructure topics. A look "under the hood" at selected parts of how the cloud is implemented: virtual machines, containers, specialized file systems, worries about continuous operation in the face of flaky network behavior, and optimization. There are also other applications found in cloud computing (databases, caching subsystems, machine learning, streaming content, social networks, and more) which influence the current infrastructure of the cloud.

    The following subtopics start with software most related to MapReduce and Spark, then expand to other parts of cloud infrastructure.

    1. Numbers everyone should know, performance and memory.
    2. Clusters, Data Centers, Content Delivery Networks, operating distributed systems.
    3. Google File System, the Hadoop File System, and Resilient Distributed Datasets.
    4. Replicated Databases and Distributed Transactions.
    5. Consensus, Atomic Broadcast, and the CAP Theorem.
    6. SQL vs NoSQL, ACID vs BASE
    7. BigTable, Dynamo, Cassandra, and column-store databases.

    8. Virtual Machines, Containers, and Dynamic Code Execution.
    9. The three V's of Big Data, large streams of data, complex event processing, and data queues in the cloud.
  12. Command-line basics. Generally speaking, working with cloud systems uses file transfer, directories, file and directory permissions and ownership, ssh and scp (with public/private keys), and vendor-specific tools and CLI (Command Line Interface) recipes that students will need to use. Different students do these things in different ways, especially editing text and code files. Unavoidably, these practical considerations will take up class time.
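
As a small illustration of topics 7 and 8 above, the sketch below expresses a word count as a composition of a map step, a grouping ("shuffle") step, and a reduce step. It is plain single-process Python, not Hadoop or Spark code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. In a real framework, the map and reduce steps would run in parallel across the machines of a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    """Compose the three steps: map, then shuffle, then reduce."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the cluster and the cloud"]))
```

Because map_phase works on each line independently and reduce_phase works on each key independently, both steps could be distributed without changing the result, which is the idea the MapReduce paradigm builds on.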

Textbooks

Given the broad range of topics above, there isn't a single textbook that covers it all. In the past, a few students trying to learn MapReduce and Hadoop bought technical guides ("Hadoop for Dummies", "Hadoop: the Definitive Guide") -- these books actually had misleading directions, so when students got into trouble following the book recipes, it was difficult to know what to do. There are now several books on Spark, which we will use later in the course, but these also are not recommended. There are enough online documents describing the programming frameworks used in this course that an experienced student should not need to purchase a textbook.

Background and Resources

Programming exercises in this course depend on using several languages, including Python, Java, and Scala. Few students will have used Scala before, so we will take a couple of days to cover enough Scala for the needs of the project. Java and Scala interoperate nicely, so familiarity with Java is quite helpful. Both Hadoop and Spark can now be programmed with Python, though not all features are accessible using Python.

Students generally don't have too much difficulty with the programming needs of a project. Where students may stumble is a lack of experience with command-line (shell/console) interaction with a system. Students who have only programmed using IDEs (Eclipse, IntelliJ, Netbeans, Visual Studio) will need to make extra effort when dealing with command-line aspects. Copying data to and from the cloud, logging in to virtual machines in the cloud, setting permissions on files and directories -- these are generally done by Linux commands, so be prepared to master at least a subset of the command-line/shell environment.

Both Hadoop and Spark can be installed on a laptop for practice, before trying something in the cloud. A server for this course with Hadoop and Spark already installed will be available by remote login, for more practice. Later in the semester, each student is expected to complete a more ambitious project (beyond simple homework), and for this students will use services of a real cloud provider (such as Amazon Web Services or Google Compute).

Most content for this course will be posted to this website, which is a private Wiki. In cases where some confidentiality is advisable, material will be put on ICON (icon.uiowa.edu). Also, ICON will be the place for submitting work, viewing grades, and some miscellaneous announcements.

Expectations

Evaluation Criteria

These gradable criteria have been set up on the ICON site for the course.

Final Exam and No Class Days

There will be no final exam; however, there will be final reports or presentations on projects.

No Class Days: Tuesday 31 January, Tuesday 14 March, Thursday 16 March.

Syllabus Insert: Standard Iowa Required Syllabus Items

Administrative Home

The College of Liberal Arts and Sciences is the administrative home of this course and governs matters such as the add/drop deadlines, the second-grade-only option, and other related issues. Different colleges may have different policies. Questions may be addressed to 120 Schaeffer Hall, or see the CLAS Academic Policies Handbook at http://clas.uiowa.edu/students/handbook

Electronic Communication

University policy specifies that students are responsible for all official correspondences sent to their University of Iowa e-mail address (@uiowa.edu). Faculty and students should use this account for correspondences (Operations Manual, III.15.2, k.11).

Accommodations for Disabilities

The University of Iowa is committed to providing an educational experience that is accessible to all students. A student may request academic accommodations for a disability (which includes but is not limited to mental health, attention, learning, vision, and physical or health-related conditions). A student seeking academic accommodations should first register with Student Disability Services and then meet with the course instructor privately in the instructor's office to make particular arrangements. Reasonable accommodations are established through an interactive process between the student, instructor, and SDS. See http://sds.studentlife.uiowa.edu/ for information.

Academic Honesty

All CLAS students or students taking classes offered by CLAS have, in essence, agreed to the College's Code of Academic Honesty: "I pledge to do my own academic work and to excel to the best of my abilities, upholding the IOWA Challenge. I promise not to lie about my academic work, to cheat, or to steal the words or ideas of others; nor will I help fellow students to violate the Code of Academic Honesty." Any student committing academic misconduct is reported to the College and placed on disciplinary probation or may be suspended or expelled (CLAS Academic Policies Handbook).

Making a Suggestion or a Complaint

Students with a suggestion or complaint should first visit with the instructor (and the course supervisor), and then with the departmental DEO. Complaints must be made within six months of the incident (CLAS Academic Policies Handbook).

Understanding Sexual Harassment

Sexual harassment subverts the mission of the University and threatens the well-being of students, faculty, and staff. All members of the UI community have a responsibility to uphold this mission and to contribute to a safe environment that enhances learning. Incidents of sexual harassment should be reported immediately. See the UI Office of the Sexual Misconduct Response Coordinator for assistance, definitions, and the full University policy.

Reacting Safely to Severe Weather

In severe weather, class members should seek appropriate shelter immediately, leaving the classroom if necessary. The class will continue if possible when the event is over. For more information on Hawk Alert and the siren warning system, visit the Department of Public Safety website.

Syllabus (last edited 2017-01-20 19:43:56 by Ted Herman)