Module Database Search

This Version is No Longer Current
The latest version of this module is available here

MODULE DESCRIPTOR
Module Title
Text Analytics
Reference	CMM706	Version	2
Created	October 2017	SCQF Level	SCQF 11
Approved	May 2016	SCQF Points	15
Amended	November 2017	ECTS Points	7.5

Aims of Module
To provide students with a comprehensive understanding of the main principles and practices underlying the retrieval, extraction and mining of text data and the skills to create systems for a variety of information types in differing search environments.

Learning Outcomes for Module
On completion of this module, students are expected to be able to:
1	Critically appraise extraction and search models in information retrieval and Natural Language Processing in relation to big data case studies.
2	Critically evaluate current research and advanced scholarship in IR and NLP, their role and alternative directions for big data projects.
3	Combine methods from NLP, topic modelling and text mining toolkits to develop new extraction processes for real-world tasks.
4	Plan a comparative study to evaluate and interpret results from designing and developing information retrieval and extraction systems for big data.

Indicative Module Content
Comparative analysis of information retrieval and visualisation methods. Text extraction, tokenisation, stemming, bag-of-words, n-gram, statistical language models, vector representations and topic models. Word sense disambiguation, phrase and named entity recognition, POS tagging, shallow parsing, syntax and dependency parsing. Document similarity, clustering and classification, information extraction, sentiment analysis using lexicon-based techniques. Case studies on text classification, topic modelling applied to news articles, intelligent search and browse, sentiment analysis and social media mining.
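By way of illustration only, several of the topics listed above (tokenisation, n-grams, bag-of-words and document similarity) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not module material: the function names and the two sample sentences are hypothetical, and cosine similarity over raw term counts is an assumed choice of similarity measure.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Map each distinct token to its count in the document."""
    return Counter(tokens)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    # Counter returns 0 for terms missing from bag_b.
    dot = sum(count * bag_b[term] for term, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
doc_a = "Search engines retrieve documents."
doc_b = "Search engines rank documents."

tokens_a = tokenise(doc_a)
tokens_b = tokenise(doc_b)

print(ngrams(tokens_a, 2))  # bigrams of the first document
print(cosine_similarity(bag_of_words(tokens_a), bag_of_words(tokens_b)))
```

The two sample sentences share three of their four terms, so the similarity score is high but below 1. Toolkits of the kind used in the laboratory sessions would normally add stemming, stop-word handling and term weighting on top of this.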

Module Delivery
This is a lecture-based course, supplemented with laboratory sessions in which state-of-the-art extraction and retrieval toolkits are applied to varied case studies. Tutorials are used to initiate discussion of research papers from the field, supplementing the lectures.

Indicative Student Workload	Full Time	Part Time
Contact Hours	N/A	48
Non-Contact Hours	N/A	102
Placement/Work-Based Learning Experience [Notional] Hours	N/A	N/A
TOTAL	N/A	150
Actual Placement hours for professional, statutory or regulatory body

ASSESSMENT PLAN
If a major/minor model is used and box is ticked, % weightings below are indicative only.
Component 1
Type:	Coursework	Weighting:	100%	Outcomes Assessed:	1, 2, 3, 4
Description:	Coursework consisting of a written report on the state-of-the-art in a chosen area of information retrieval or text mining research (40%), a class presentation (10%), and a comparative analysis evaluating methods and systems from NLP, topic modelling and text mining toolkits (50%).

MODULE PERFORMANCE DESCRIPTOR
Explanatory Text
The student must achieve at least a grade D in C1 to pass the module.
Module Grade	Minimum Requirements to achieve Module Grade:
A	The student needs to achieve an A in C1.
B	The student needs to achieve a B in C1.
C	The student needs to achieve a C in C1.
D	The student needs to achieve a D in C1.
E	The student needs to achieve an E in C1.
F	The student needs to achieve an F in C1.
NS	Non-submission of work by published deadline or non-attendance for examination

Module Requirements
Prerequisites for Module	None except for course entry requirements.
Corequisites for module	None.
Precluded Modules	None.

INDICATIVE BIBLIOGRAPHY
1	MANNING, C., RAGHAVAN, P. and SCHÜTZE, H., 2008. Introduction to Information Retrieval. Cambridge University Press.
2	RUSSELL, M.A., 2013. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. 2nd Edition. O’Reilly Media.
3	MANNING, C. and SCHÜTZE, H., 1999. Foundations of Statistical Natural Language Processing. MIT Press.
4	BIRD, S., KLEIN, E., and LOPER, E., 2009. Natural Language Processing with Python. O’Reilly Media.
5	GABER, M.M., COCEA, M., WIRATUNGA, N. and GOKER, A., 2015. Advances in Social Media Analysis. Springer.
6	CROFT, W.B., METZLER, D. and STROHMAN, T., 2015. Search Engines: Information Retrieval in Practice. Pearson Education Inc. http://ciir.cs.umass.edu/irbook/

Robert Gordon University, Aberdeen
