

Module Title
Big Data Programming

Keywords
Parallel data processing, programming design patterns, Hadoop ecosystem

Reference: CMM705
SCQF Level: SCQF 11
SCQF Points: 15
ECTS Points: 7.5
Created: March 2016
Approved: May 2016
Amended:
Version No.: 1


This version is no longer current. A later version of this module is available.

Prerequisites for Module

None except for course entry requirements.

Corequisite Modules

None.

Precluded Modules

None.

Aims of Module

To provide a general overview of MapReduce design patterns for processing large data sets and to develop specialised knowledge in big data stream processing and scalable real-time architectures.


Learning Outcomes for Module

On completion of this module, students are expected to be able to:

1. Discuss, compare and contrast the advantages and disadvantages of applying specific big data design patterns to a given real-world big data programming task.
2. Configure a distributed architecture for big data deployment.
3. Design, implement and evaluate scalable program solutions using a big data computation framework.
4. Identify offerings from the Hadoop ecosystem and other related big data technologies that are relevant to a given big data problem.

Indicative Module Content

1. Java programming primer to prepare for big data design patterns.
2. HDFS and Hadoop architecture for big data.
3. Case studies on how MapReduce programming design patterns (e.g. summarisation, filtering, data organisation and joins) can be used to address various real-world problems in processing and analysing large data sets.
4. The concepts offered and supported in Spark, and how they contrast with the Hadoop offering.
5. Technologies such as Spark Streaming and Storm for big data stream processing.


Indicative Student Workload (hours, part time)

Contact Hours
Laboratories: 24
Lectures: 24

Directed Study
Coursework Preparation: 25
Directed Study: 34

Private Study
Private Study: 43

Mode of Delivery

This is a lecture-based module, supplemented with practical sessions, in which a number of big data technologies are used to teach students how to design and implement MapReduce programs guided by design patterns and case studies. The Hadoop ecosystem will be studied, and other offerings such as Apache Spark will be explored.

Assessment Plan

Learning Outcomes Assessed
Component 1: 1, 2, 3, 4

Component 1: This is a coursework assignment consisting of two parts: a MapReduce project worth 70% of the total module assessment, and a Hadoop ecosystem project worth 30% of the total module assessment.

Indicative Bibliography

1. MINER, D. and SHOOK, A., 2012. MapReduce Design Patterns. O'Reilly.
2. KARAU, H. and KONWINSKI, A., 2015. Learning Spark. O'Reilly.
3. WHITE, T., 2011. Hadoop: The Definitive Guide (2nd edition). O'Reilly.
4. PERERA, S. and GUNARATHNE, T., 2013. Hadoop MapReduce Cookbook. Packt Publishing.
5. MATLOFF, N., 2015. Parallel Computing for Data Science: With Examples in R, C++ and CUDA. CRC Press.



Robert Gordon University, Garthdee House, Aberdeen, AB10 7QB, Scotland, UK: a Scottish charity, registration No. SC013781