Lecture Summaries

This list is not meant to serve as lecture notes; it just gives a rough idea of what went on each day.

03 May

Moving on to BigTable, HBase, NoSQL, and the future of Hadoop and Big Data.

01 May

Responding to a class request, this lecture looks at the Hadoop File System, the Google File System, and even the file system of the Disco Project. These are interesting not only because they support things like MapReduce, but because they show how to engineer distributed file systems for large data that doesn't follow traditional database models.

26 Apr

Upon class request, most of the lecture was about Erlang (programming language) and why it is generating excitement for highly concurrent computing. I summarized, from memory, some of the features/lessons from this presentation. Along the way, we happened to glance briefly at the Disco Project, which is built on Erlang and includes an alternative to the Hadoop File System called the Disco File System, as well as a Python interface (it could be an interesting alternative to Hadoop). Practically oriented students might also want to check out the tools that Instagram uses.

24 Apr

About Cascading. A comparison mzG4k

19 Apr

Start on Cloud Languages and Frameworks. Just for curiosity's sake: AWS

17 Apr

Covering the impossibility of 1-fault resiliency in tasks other than consensus.

12 Apr

Showed rough equivalence between consensus and atomic broadcast, which provides another way to understand the limitations of scale and fault tolerance for consensus problems.

5 Apr

Continue with Impossibility Results. Finished a proof from this paper. Another big data source.

3 Apr

Impossibility Results; final project; (big data example)

29 Mar

Shared Memory topics, such as consistency of read/write, and atomicity.

27 Mar

Continue with Concurrent Transactions. Aside: Connected Components in MapReduce

22 Mar

Announce Third MapReduce Homework, and begin Concurrent Transactions coverage.

20 Mar

Shortest Path in MapReduce was the main topic.
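One common way to phrase single-source shortest paths in MapReduce is as repeated relaxation rounds, sketched below in plain Python. This is an illustration and not necessarily the formulation used in lecture: the in-memory `run_round` driver stands in for a real Hadoop job, all function names and the `GRAPH`/`DIST` tags are hypothetical, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def sssp_map(node, record):
    """Map step: re-emit the node's own data and offer distances to neighbours."""
    dist, edges = record
    yield node, ("GRAPH", edges)      # pass the adjacency list along
    yield node, ("DIST", dist)        # keep the current best distance
    if dist < math.inf:
        for nbr, weight in edges:
            yield nbr, ("DIST", dist + weight)


def sssp_reduce(node, values):
    """Reduce step: keep the adjacency list and the smallest offered distance."""
    edges, best = [], math.inf
    for tag, payload in values:
        if tag == "GRAPH":
            edges = payload
        else:
            best = min(best, payload)
    return node, (best, edges)


def run_round(state):
    """One map/shuffle/reduce round, simulated in memory (assumption: no Hadoop)."""
    groups = defaultdict(list)        # stands in for the shuffle phase
    for node, record in state.items():
        for key, value in sssp_map(node, record):
            groups[key].append(value)
    return dict(sssp_reduce(key, values) for key, values in groups.items())


def shortest_paths(graph, source):
    """Repeat rounds until no distance changes. Assumes non-negative weights."""
    state = {v: (0 if v == source else math.inf, edges)
             for v, edges in graph.items()}
    while True:
        new_state = run_round(state)
        if new_state == state:
            return {v: dist for v, (dist, _) in new_state.items()}
        state = new_state
```

For example, `shortest_paths({"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": [("c", 1)], "c": []}, "s")` settles after a few rounds. The number of rounds grows with the number of edges on the longest shortest path, which is the usual cost of this style of iteration.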

8 Mar

How to parallelize a shortest-path computation in graphs (networks): PRAM and MapReduce.

6 Mar

Discuss Hadoop and present Craigs Tickets example.

1 Mar

Going over solution to homework; second Quiz.

28 Feb

More on maximum in O(1) time. Also, start of 3-coloring algorithm.

23 Feb

Applying the roots of a forest algorithm to parallel prefix. Also, an O(1) time parallel algorithm for the maximum of an array, but only if the model permits concurrent writes of the same value to a given location; otherwise it seems we can only get an O(lg n) time algorithm.

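As a rough illustration of the O(1)-time maximum, the Python sketch below simulates a common-write CRCW PRAM sequentially. It is an assumption that the lecture used this particular scheme with one processor per pair of elements; the name `crcw_max` and the `loser` array are made up for the example. Every processor that writes to `loser[i]` writes the same value, `True`, which is why concurrent writes of a common value are enough.

```python
def crcw_max(a):
    """Sequentially simulate a constant-step common-CRCW maximum (illustrative)."""
    n = len(a)
    if n == 0:
        raise ValueError("empty array")
    # Parallel step 1: one conceptual processor per index clears its flag.
    loser = [False] * n
    # Parallel step 2: one conceptual processor per pair (i, j).
    # All writers to loser[i] write the same value, so no conflict arises.
    for i in range(n):
        for j in range(n):
            # Ties are broken by index so that exactly one element survives.
            if a[i] < a[j] or (a[i] == a[j] and i > j):
                loser[i] = True
    # Parallel step 3: the single unmarked index holds the maximum.
    for i in range(n):
        if not loser[i]:
            return a[i]
```

The nested loops stand in for n² processors acting in one parallel step, so the sequential simulation takes O(n²) time even though the PRAM algorithm takes a constant number of steps.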
21 Feb

In more depth, the current homework (details on the framework); then look at PRAM algorithm for finding roots of a forest.

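A standard PRAM technique for finding the roots of a forest is pointer jumping, sketched here as a sequential simulation. Whether the lecture used exactly this formulation is an assumption, as are the function name `find_roots` and the convention that a root points to itself.

```python
def find_roots(parent):
    """Pointer jumping on a forest given as a parent array.

    Assumed convention: parent[i] == i marks a root, and the input is a
    valid forest (no cycles other than the self-loops at roots).
    Returns (root_of_each_node, number_of_jumping_rounds).
    """
    p = list(parent)
    n = len(p)
    rounds = 0
    while True:
        # One synchronous parallel step: every node reads its grandparent
        # from the old array, then all nodes write at once.
        nxt = [p[p[i]] for i in range(n)]
        if nxt == p:
            return p, rounds
        p = nxt
        rounds += 1
```

Each round halves the remaining distance from a node to its root, so a tree of depth d needs about lg d rounds with one processor per node.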
16 Feb

More on Hadoop, MapReduce Applications, and a bit of PRAM algorithms.

14 Feb

A bit on the current assignment. Start of MapReduce Applications and more of Hadoop.

9 Feb

Start of MapReduce and a little of Hadoop.

7 Feb

Going over quiz; then, introducing the Parallel Random Access Machine model. Started some PRAM Notes. Sketched an O(lg n) parallel sort algorithm on the PRAM model.

2 Feb

Pipelining of addition, computing a series of summations: the input is a sequence of vectors and the output is a sequence of their sums. A new vector enters the pipeline at each consecutive time unit, and after a delay of log(n), the corresponding sum emerges from the pipeline.

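The pipeline can be pictured as a binary adder tree with a latch after each level. The sketch below simulates that picture under stated assumptions: n is taken to be the vector length, restricted here to a power of two, and the name `pipelined_sums` and the latch list `regs` are hypothetical.

```python
def pipelined_sums(vectors):
    """Simulate a log2(n)-stage adder-tree pipeline, one vector per time unit.

    Assumes all vectors share the same length n, a power of two (n >= 2).
    Returns a list of (emerge_time, sum) pairs.
    """
    if not vectors:
        return []
    n = len(vectors[0])
    if n < 2 or n & (n - 1):
        raise ValueError("vector length must be a power of two, at least 2")
    depth = n.bit_length() - 1        # log2(n) stages
    regs = [None] * depth             # regs[k]: output latch of stage k
    feed = iter(vectors)
    out, t = [], 0
    while len(out) < len(vectors):
        # Update back to front so each stage consumes last tick's value.
        for k in range(depth - 1, -1, -1):
            src = next(feed, None) if k == 0 else regs[k - 1]
            if src is None:
                regs[k] = None        # pipeline bubble
            else:
                # Add adjacent pairs, halving the number of partial sums.
                regs[k] = [src[i] + src[i + 1] for i in range(0, len(src), 2)]
        t += 1
        if regs[-1] is not None:
            out.append((t, regs[-1][0]))
    return out
```

With three vectors of length 4, the first sum emerges at time 2 (the log2(4) delay) and one further sum emerges on each following tick.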
2 Feb

Quiz over Linda.

31 Jan

No Class.

26 Jan

Going over notes on Linda Trapezoid, then a look at systolic array computation of a matrix product C=AB; also a very brief look at convolution on a systolic array, as used in engineering a Finite Impulse Response filter.
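To make the systolic computation of C=AB concrete, here is a small simulation of one common design, an output-stationary grid in which cell (i, j) accumulates C[i][j] while rows of A move right and columns of B move down. That the lecture used this particular layout is an assumption, and the names `systolic_matmul`, `a_reg`, and `b_reg` are hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array for C = A B (illustrative).

    Row i of A enters from the left delayed by i ticks; column j of B enters
    from the top delayed by j ticks. Returns (C, number_of_ticks).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    a_reg = [[0] * p for _ in range(n)]   # values travelling left to right
    b_reg = [[0] * p for _ in range(n)]   # values travelling top to bottom
    steps = n + m + p - 2
    for t in range(steps):
        # Visit cells from the far corner so each reads its neighbour's old value.
        for i in range(n - 1, -1, -1):
            for j in range(p - 1, -1, -1):
                if j == 0:
                    k = t - i             # skewed feed of row i of A
                    a_reg[i][j] = A[i][k] if 0 <= k < m else 0
                else:
                    a_reg[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j             # skewed feed of column j of B
                    b_reg[i][j] = B[k][j] if 0 <= k < m else 0
                else:
                    b_reg[i][j] = b_reg[i - 1][j]
                # Each cell multiplies what it currently holds and accumulates.
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C, steps
```

The skewed input timing is what makes matching operands A[i][k] and B[k][j] meet in cell (i, j) at tick i + j + k, so every cell only ever talks to its immediate neighbours.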

25 Jan

Not a lecture summary, but a useful example: Linda Trapezoid.

24 Jan

Continue with MPI. Starting to look at dataflow and systolic array models of processing.

19 Jan

Take a look at PVM and MPI, e.g., mpiguide.pdf

Aside: some background for MPI


17 Jan

Introduction of parallel computing as a topic, and how it is motivated by current trends in graphics (GPUs), multicore architecture, Cloud Computing, and Big Data. Motivations for Big Data include scientific sources (like astronomy, genomics, etc.), social networks, search engines, business analytics, electronic commerce with data warehousing, and military and surveillance applications. Many traditional forms of parallel computing are relevant and we will cover a few of those. For exercises, we propose to use Hadoop and MapReduce; later in the semester we can look at limits of distributed and parallel systems, for instance the CAP theorem. Notes: 17jan.pdf

Lecture Summaries (last edited 2014-05-25 18:05:35 by localhost)