SOC 553 Introduction to Text Mining and Statistical Natural Language Processing Syllabus
The syllabus below describes a recent offering of the course, but it may not be completely up to date. For current details about this course, please contact the course coordinator. Course coordinators are listed on the course listing for undergraduate courses and graduate courses.
Text Books
Required
Sholom M. Weiss, Nitin Indurkhya, and Tong Zhang, Fundamentals of Predictive Text Mining, Springer, 2010, ISBN 978-1-84996-225-4
Recommended
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN 978-0-262-13360-1
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008, ISBN 978-0-521-86571-5
Steven Pinker, Words and Rules, Perennial/HarperCollins, 2000, ISBN 978-0-060-95840-4
Week-by-Week Schedule
Week
Topics Covered
Reading
Assignments
1
Overview, Problem Types, Text vs. Data Mining
chap 1, appendix A
Respond to the following Questions and Exercises in 1.11: 1-4. Install software. Read manuals (tmsk.pdf, riktext.pdf) and learn to use the software by week 4.
2
Collect, Standardize, Tokenize, Generate Vectors, Term Frequency-Inverse Document Frequency (tf-idf)
sections 2.1-2.5
Assignment 1: Create a term-document spreadsheet by hand, using the algorithms in Figures 2.3, 2.4, 2.5, and 2.7, for the assignment documents.
3
Sentence Boundaries, Part-of-Speech Tagging, Word Sense Disambiguation, Full Sentence Parsing
sections 2.6-2.12
4
Application of software to extract results of Chapter 2 topics
5
Classification: Nearest Neighbor, Decision Rules/Trees
chap 3 through 3.4.4
Respond to the following Questions and Exercises in 3.9: 5-6
6
Classification: Probabilistic, Weighted Scores, Evaluation
sections 3.4.5-3.6
Respond to the following Questions and Exercises in 3.9: 8-9, 12
7
Midterm
chap 1-3
8
Information Retrieval
chap 4
Respond to the following Questions and Exercises in 4.1: 1-4
9
Document Collection Structure: Similarity, Clustering, Evaluation
chap 5
Respond to the following Questions and Exercises in 5.8: 11-13
10
Information Retrieval and Extraction
chap 6
Respond to the following Questions and Exercises in 6.8: 3-6
Assignment 2: Apply the algorithm from Figure 2.8 by hand to the results of Assignment 1. Also generate parse trees for these sentences. Finish learning the software and respond to the following Questions and Exercises in 2.15: 1-6
11
Mixed Text and Data from Databases, WWW, and other Hybrid Sources
chap 7
Respond to the following Questions and Exercises in 7.8: 5-7
12
Applications
chap 8
Research Project: Find a report on an application not listed in the text and describe it in the same way as the text's descriptions, including problem, solution overview, methods and procedures, and deployment.
13
Advanced Topics: Summarization, Active Learning
chap 9