About the Course

Cloud computing has recently attracted considerable attention from research and industry for applications that can be parallelized on shared-nothing architectures and need elastic scalability. As a consequence, new data management requirements have emerged, along with multiple solutions to address them. This course examines the principles behind data management in the cloud and discusses actual cloud data management systems that are currently in use or under development. Topics range from novel data processing paradigms (MapReduce, SCOPE, DryadLINQ) to commercial cloud data management platforms (Google Bigtable, Microsoft Azure, Amazon S3 and Dynamo, Yahoo PNUTS) and open-source NoSQL databases (Cassandra, MongoDB, Neo4j). The world of cloud data management is currently very diverse and heterogeneous, so the course will also report on efforts to classify, compare, and benchmark the various approaches and systems. Students will gain broad knowledge of the current state of the art in cloud data management and, through a course project, practical experience with a specific system.

Live Lecture Zoom Link

Registration and other generic inquiries:

We will not be managing a waitlist for the course and we have no control over registration.
Registration will be on a first-come, first-served basis. The CS Department staff usually release more spots in bulk every so often.

Prerequisites:

Optional (highly recommended) background: CS 411 or any relevant database course
Programming: For projects, you will do significant application and web programming in a host language of your choice (e.g., C, C++, Java, PHP, Python). We will not cover programming-specific issues in this course.

Textbook:

Guy Harrison, Next Generation Databases: NoSQL, NewSQL, and Big Data
Read the required textbook readings before lectures, and study them more carefully after class. Our lectures are intended to provide a roadmap for your reading; with the limited lecture time, we may not be able to cover everything in the readings.

Grading Summary:

Class component	Percent	Notes
Class Participation	10%
Reading Summaries	15%
Assignments	25%	4 assignments (all have the same weight)
Project	50%	Semester-long group project

Final grading (tentative):

In this course, we will be assigning +/- letter grades.

Total	Grade
90-100	A (A-, A, A+)
80-89	B (B-, B, B+)
70-79	C (C-, C, C+)
60-69	D (D-, D, D+)

You will receive the better of two grades: the one given by the scale above, or the one given by a regular Gaussian curve using this rule, with a mean around B+.
This course may contain both graduate and advanced undergraduate students. We will grade each group of students on a separate curve.
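
The arithmetic behind the weights and the tentative scale can be sketched as follows. This is only an illustration, not the official grade calculation: the function names and the sample scores are hypothetical, each cutoff is assumed to be an inclusive lower bound, and neither the +/- modifiers nor the Gaussian curve is modelled.

```python
# Component weights from the Grading Summary table (percent of the final total).
WEIGHTS = {
    "participation": 10,
    "reading_summaries": 15,
    "assignments": 25,
    "project": 50,
}

# Lower bounds of the letter bands in the tentative scale.
# Assumption: a total at or above a cutoff falls into that band.
BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def assignment_score(assignment_percents):
    """Average the assignments, which all carry the same weight."""
    return sum(assignment_percents) / len(assignment_percents)


def weighted_total(scores):
    """Combine per-component percentages (0-100) into a 0-100 total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_band(total):
    """Map a total to its band on the absolute scale (no +/-, no curve)."""
    for cutoff, band in BANDS:
        if total >= cutoff:
            return band
    return "below the listed scale"  # the syllabus lists nothing under 60


# Hypothetical student scores, for illustration only.
scores = {
    "participation": 100,
    "reading_summaries": 90,
    "assignments": assignment_score([80, 90, 100, 70]),
    "project": 88,
}
total = weighted_total(scores)
print(total, letter_band(total))
```

Because the project carries half of the total, a few points on the project move the final total as much as twice that many points on the assignments.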

A. Lectures & Attendance (10%)

Students are responsible for anything that transpires during a class. Class attendance is strongly recommended. If you are unable to attend, I would appreciate it if you let me know in advance. There are also “participation points”, which generally correspond to class discussion questions or activities and are graded on a “check-off” basis.

B. Homework Assignments (25%)

There will be both assignments and projects for the course, generally due on Thursday. I will try to post homework at least a week before it is due.

Homework submission will be through Canvas.

Assignments are individual work.
Collaboration is NOT allowed when working on the assignments.
Discussions are allowed only if they are limited to high-level concepts and general ideas. Discussions cannot involve answers to the homework questions. Checking answers or parts of solutions with peers is not allowed. Sharing answers on any public or private electronic platform, including but not limited to email, messenger apps, Facebook groups, and Discord chat, is not allowed.
If you discussed questions with your classmates, you must include their names and the questions you discussed. Not including students' names will be considered a violation of the course's academic integrity policy. This rule applies to all individual homework assignments, including MPs.
You should reference (as comments in your code) any code or concepts copied from Stack Overflow or any other online resource. However, at least 80% of the code you turn in must be your own.
You may submit regrade requests within the time frame listed on Campuswire. Unless stated otherwise, we typically allow up to one week after the homework grades are released.
Uploading your assignment questions to public platforms (e.g., shared drives, Course Hero) is prohibited. Such uploads are copyright infringements and possible violations of academic honesty. We will handle these cases strictly.

C. Reading Summaries (15%)

There will be reading assignments, with short summaries due before class.

D. Project (50%)

There will be a semester-long project, which involves significant database application programming. The project will be structured around several milestones due during the semester, leading to a demo and write-up near the end of the semester. Details and policies for the project will be documented separately. Please note that projects are group-based assignments, and they still follow academic honesty guidelines. Your group should not exchange or discuss code with other groups. All rules listed in Section I - C - (3)~(5) of this syllabus apply to project assignments.

4-credit project (Option for Graduate Students)

Graduate students MAY take this course for 4 credit units. (Undergraduates take this course for three credit hours.) For the extra unit, you will complete an additional project (a literature review); that is, you will work on both project tracks.

IN PROGRESS and Subject to change. Lecture notes will be posted on the day of the lecture.

Schedule

Week	Date	Topic	Assigned	Due	Required Reading	Optional Reading
Week 1	1/18	Course Info, Introduction to Cloud Computing
Week 1	1/20	Introduction to Cloud DM, Challenges	Assignment 1			Above the Clouds: A Berkeley View of Cloud Computing
Week 2	1/25	Challenges, App Characteristics			Ch. 1, Three Database Revolutions	D. Abadi: Data Management in the Cloud: Limitations and Opportunities. IEEE Data Eng Bull. 2009
Week 2	1/27	Basics: Data Models			Ch. 2, Google, Big Data and Hadoop	R. Cattell: Scalable SQL and NoSQL Data Stores. SIGMOD Rec. 2010
Week 3	2/1	Basics: Data Models, Basics: Consistency	Project 0	Assignment 1		D. Terry. Replicated Data Consistency Explained Through Baseball
Week 3	2/3	Basics: Consistency, GCP Intro	Assignment 2		Ch. 3, Sharding, Amazon, and the Birth of NoSQL	D. Abadi: Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story
Week 4	2/8	Basics: Consistency			Ch. 10, Data Models and Storage
Week 4	2/10	Basics: File Systems	Project 1	Project 0		S. Ghemawat, et al. The Google File System. SOSP 2003
Week 5	2/15	Basics: File Systems			Chapter 9: Consistency Models
Week 5	2/17	Basics: File Systems, Basics: Map-Reduce			Case Study. GFS: Evolution on Fast-forward
Week 6	2/22	Basics: Map-Reduce		Assignment 2		M. Stonebraker, et al. MapReduce and Parallel DBMSs: Friends or Foes?
Week 6	2/24	Basics: Map-Reduce, Map-Reduce Versus DBMS			J. Dean, S. Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI
Week 7	3/1	Map-Reduce Versus DBMS, Cloud DBs: Key-Value (Amazon Dynamo)			A. Pavlo, et al. A comparison of approaches to large-scale data analysis. SIGMOD 2009.
Week 7	3/3	Cloud DBs: Key-Value (Amazon Dynamo)	Project 2	Project 1
Week 8	3/8	Cloud DBs: Key-Value (Amazon Dynamo)			DeCandia, et al. Dynamo: Amazon's highly available key-value store. SOSP 2007.
Week 8	3/10	Cloud DBs: Document (MongoDB)		Midpoint Meetings this week	Chapter 4: Document Databases	Chapter 6, pp 110-115: MongoDB Sharding and Replication, Chapter 11, pp 173-175: MongoDB
	3/12-3/20	Spring Break
Week 9	3/22	Cloud DBs: Document (MongoDB); Cloud DBs: Column Family (Bigtable)	Assignment 3
Week 9	3/24	Cloud DBs: Column Family (Bigtable); Spark demo			F. Chang, et al.: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2), 2008.	Chapter 6, pp 115-119: HBase, Chapter 11, pp 171-173: HBase
Week 10	3/29	Cloud DBs: Column Family (Bigtable); Data Processing: Spark			M. Zaharia, et al.: Apache Spark: a Unified Engine for Big Data Processing. CACM October 2016.	M. Zaharia, et al.
Week 10	3/31	Data Processing: Spark	Project 3	Project 2
Week 11	4/5	Graph Model (Neo4j)	Assignment 4		Chapter 5: Graph Databases
Week 11	4/7	Graph Model (Neo4j); Data Processing: Hive; Hive Demo			A. Thusoo, et al.: Hive-A Petabyte Scale Data Warehouse Using Hadoop. ICDE, pp. 996-1005, 2010.	J. Camacho-Rodríguez, et al. Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing, SIGMOD 2019
Week 12	4/12	Data Processing: Hive		Assignment 3
Week 12	4/14	Data Processing: Pig Latin			C. Olston, et al.: Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD 2008.
Week 13	4/19	Data Processing: Pig Latin; Data Processing: VoltDB			Chapter 7: The End of Disk?	S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker: OLTP through the looking glass, and what we found there. SIGMOD 2008.
Week 13	4/21	Data Processing: VoltDB		Assignment 4
Week 14	4/26	Graph Processing: Pregel and Giraph			G. Malewicz, et al. Pregel: A System for Large-Scale Graph Processing. SIGMOD 2010	A. Ching, et al. One Trillion Edges: Graph Processing at Facebook-Scale. VLDB 2015.
Week 14	4/28	Project Presentations		Project 3
Week 15	5/3	Project Presentations
Week 15	5/4	Project Presentations, 1		Project 3 Final Slides
