MODULE DESCRIPTOR
Module Title
Data Engineering
Reference: CM2606 Version: 2
Created: February 2024 SCQF Level: SCQF 8
Approved: July 2020 SCQF Points: 15
Amended: April 2024 ECTS Points: 7.5

Aims of Module
To provide mechanisms and interfaces to platforms that facilitate the flow of, and access to, the information needed to harvest big data, by developing the understanding necessary to organise datasets and create accurate models, and to design and develop platforms for use as business frameworks.

Learning Outcomes for Module
On completion of this module, students are expected to be able to:
1 Practice data engineering design principles, practices and standards for managing big data.
2 Compare state-of-the-art data cleaning, organisation and integration methodologies.
3 Plan a cloud-based process (AWS) for a data engineering task.
4 Practice the Consistency, Availability and Partition Tolerance (CAP) theorem and its utility in developing different types of databases.
5 Undertake an Extract, Transform and Load (ETL/ELT) process tailored to a specific data science requirement.

Indicative Module Content
Introduction to Data Engineering: Technologies and Tools to be used. OLTP Concepts: Design Methodologies, Normalization, Difference between OLTP and OLAP. Data Warehousing and OLAP Concepts: Introduction to Data Warehousing (Comparison with other DBs, Design and Schemas, Data Cube and OLAP Operations, Fact Tables, KPIs, Extending to OLAM). ETL Techniques: ETL Introduction, Extraction, Transformation (Cleaning and Conforming Data), Loading to Data Platforms, Scheduling and Construction Process. Data Pipelines: Introduction to Data Pipelines, Automation, Optimization and Scalability of Data Pipelines. Modern Trends: Real-Time Analysis, Data Lakes, Self-Service Data Platforms, On-Premises (HDFS) vs Cloud Platforms (mainly AWS, GCP and Azure).

Module Delivery
The module will be delivered through a series of lectures and tutorial sessions. Theoretical concepts and main principles will be introduced in the lectures, where students will also be given in-class exercises to apply and test their theoretical knowledge. The tutorial sessions will consist of practical exercises that apply these principles to real-world problems.

Indicative Student Workload Full Time Part Time
Contact Hours 48 N/A
Non-Contact Hours 102 N/A
Placement/Work-Based Learning Experience [Notional] Hours N/A N/A
TOTAL 150 N/A
Actual Placement hours for professional, statutory or regulatory body    

ASSESSMENT PLAN
If a major/minor model is used and box is ticked, % weightings below are indicative only.
Component 1
Type: Coursework Weighting: 100% Outcomes Assessed: 1, 2, 3, 4, 5
Description: Individual Coursework covering all learning outcomes.

MODULE PERFORMANCE DESCRIPTOR
Explanatory Text
The calculation of the overall grade for this module is based on 100% weighting of C1. An overall minimum grade of D is required to pass this module.
Module Grade Minimum Requirements to achieve Module Grade:
A The student needs to achieve an A in C1.
B The student needs to achieve a B in C1.
C The student needs to achieve a C in C1.
D The student needs to achieve a D in C1.
E The student needs to achieve an E in C1.
F The student needs to achieve an F in C1.
NS Non-submission of work by published deadline or non-attendance for examination

Module Requirements
Prerequisites for Module: CM1603 or equivalent.
Corequisites for Module: None.
Precluded Modules: None.

INDICATIVE BIBLIOGRAPHY
1 Kimball, R. and Ross, M. 2013. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd ed. John Wiley & Sons Inc.
2 Silvers, F. 2008. Building and Maintaining a Data Warehouse. CRC Press.
3 Inmon, W., Strauss, D. and Neushloss, G. 2008. DW 2.0: The Architecture for the Next Generation of Data Warehousing (Morgan Kaufmann Series in Data Management Systems). 1st ed. Morgan Kaufmann.
4 Kimball, R. and Caserta, J. 2007. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. Wiley.
5 Inmon, B. 2016. Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump. Technics Publications.


Robert Gordon University, Garthdee House, Aberdeen, AB10 7QB, Scotland, UK: a Scottish charity, registration No. SC013781