SOC 553 Introduction to Text Mining and Statistical Natural Language Processing Syllabus

The syllabus below describes a recent offering of the course, but it may not be completely up to date. For current details about this course, please contact the course coordinator. Course coordinators are listed in the course listings for undergraduate and graduate courses.

Textbooks

Required
Sholom M. Weiss, Nitin Indurkhya, and Tong Zhang, Fundamentals of Predictive Text Mining, Springer, 2010, ISBN 978-1-84996-225-4

Recommended
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN 978-0-262-13360-1
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008, ISBN 978-0-521-86571-5
Steven Pinker, Words and Rules, Perennial/HarperCollins, 2000, ISBN 978-0-060-95840-4

Week-by-Week Schedule

Week 1
Topics Covered: Overview, Problem Types, Text vs. Data Mining
Reading: chap 1, appendix A
Assignments: Respond to the following Questions and Exercises in 1.11: 1-4. Install the software. Read the manuals (tmsk.pdf, riktext.pdf) and learn to use the software by week 4.

Week 2
Topics Covered: Collect, Standardize, Tokenize, Generate Vectors, Term Frequency-Inverse Document Frequency (tf-idf)
Reading: sections 2.1-2.5
Assignments: Assignment 1: Create a term-document spreadsheet "by hand" using the algorithms in Figures 2.3, 2.4, 2.5, and 2.7 for the assignment documents.
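
The week 2 pipeline (standardize, tokenize, build term-document vectors, weight with tf-idf) can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's algorithm: it assumes a lowercase regular-expression tokenizer and the common weighting tf * log(N / df), whereas Figures 2.3, 2.4, 2.5, and 2.7 may use different tokenization rules, weighting, or normalization. The function names tokenize, term_document_counts, and tf_idf are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Standardize by lowercasing, then keep runs of letters and digits.
    # This tokenizer is an assumption chosen for simplicity.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_counts(docs):
    # One Counter of raw term frequencies per document.
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    counts = term_document_counts(docs)
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # Weight = raw term frequency * log(N / df); a term found in every
    # document gets weight 0 because log(N / N) = 0.
    matrix = [[c[t] * math.log(n_docs / df[t]) for t in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the dog barked at the dog"]
    vocab, matrix = tf_idf(docs)
    print(vocab)
    for row in matrix:
        print([round(w, 3) for w in row])
```

Each row is a document and each column a vocabulary term, which mirrors the layout of a hand-built term-document spreadsheet; comparing the two is one way to catch arithmetic slips.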

Week 3
Topics Covered: Sentence Boundaries, Part-of-Speech Tagging, Word Sense Disambiguation, Full Sentence Parsing
Reading: sections 2.6-2.12
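
Sentence boundary detection, the first week 3 topic, is often introduced with a rule-based splitter. The sketch below is a hypothetical illustration, not the method described in sections 2.6-2.12: it treats ".", "!", and "?" as candidate boundaries, accepts a candidate only when it is followed by whitespace and a capital letter or quote (or by the end of the text), and rejects a period that follows a word in a small, assumed abbreviation list. The names ABBREVIATIONS and split_sentences are hypothetical.

```python
import re

# Hypothetical, deliberately small abbreviation list (lowercase, final period removed).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "etc", "vs", "e.g", "i.e"}


def split_sentences(text):
    """Split text into sentences using simple punctuation and capitalization rules."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+", text):
        end = match.end()
        words_before = text[start:match.start()].split()
        last_word = words_before[-1].lower() if words_before else ""
        # A single period directly after a known abbreviation is not a boundary.
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue
        following = text[end:]
        # Accept the boundary at the end of the text, or before whitespace
        # followed by a capital letter or an opening quote.
        if not following.strip() or re.match(r"\s+[A-Z\"']", following):
            sentences.append(text[start:end].strip())
            start = end
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He sat down! Was it late?"))
```

A known weakness of rules like these is a sentence that really does end with an abbreviation (for example "etc."), which gets merged with the next sentence; statistical boundary detectors are one way to handle such cases.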

Week 4
Topics Covered: Applying the software to extract results for the Chapter 2 topics

Week 5
Topics Covered: Classification: Nearest Neighbor, Decision Rules/Trees
Reading: chap 3 through section 3.4.4
Assignments: Respond to the following Questions and Exercises in 3.9: 5-6
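
Nearest-neighbor classification labels a new document with the majority label of its most similar training documents. This is a minimal sketch, assuming cosine similarity over raw term-count vectors and a simple majority vote among the top k neighbors; the similarity measure and voting scheme in chap 3 may differ, and cosine and knn_classify are hypothetical names.

```python
import math
from collections import Counter


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query, training, k=3):
    # training is a list of (Counter, label) pairs.
    # Rank training documents by similarity to the query, most similar first.
    ranked = sorted(training, key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # most_common orders equal counts by first appearance, so a tie goes to
    # the label of the more similar neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (Counter("ball goal team".split()), "sports"),
        (Counter("goal score team win".split()), "sports"),
        (Counter("vote election party".split()), "politics"),
        (Counter("party policy vote".split()), "politics"),
    ]
    print(knn_classify(Counter("team score goal".split()), training))
```

In this toy example the query shares terms only with the two sports documents, so two of its three nearest neighbors vote "sports".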

Week 6
Topics Covered: Classification: Probabilistic, Weighted Scores, Evaluation
Reading: sections 3.4.5-3.6
Assignments: Respond to the following Questions and Exercises in 3.9: 8-9, 12
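
For the evaluation topic, the standard precision, recall, F1, and accuracy measures for a two-class problem can be computed directly from predicted and true labels. This sketch assumes a single designated positive class; section 3.6 may use additional or differently defined measures, and evaluate is a hypothetical helper.

```python
def evaluate(predicted, actual, positive):
    """Return precision, recall, F1, and accuracy for one designated positive class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)  # true positives
    fp = sum(1 for p, a in pairs if p == positive and a != positive)  # false positives
    fn = sum(1 for p, a in pairs if p != positive and a == positive)  # false negatives
    correct = sum(1 for p, a in pairs if p == a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    predicted = ["spam", "spam", "ham", "ham", "spam"]
    actual = ["spam", "ham", "ham", "spam", "spam"]
    print(evaluate(predicted, actual, "spam"))
```

In the usage example there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3, while accuracy is 3/5.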

Week 7
Topics Covered: Midterm
Reading: chap 1-3

Week 8
Topics Covered: Information Retrieval
Reading: chap 4
Assignments: Respond to the following Questions and Exercises in 4.1: 1-4
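
Retrieval systems usually look up query terms in an inverted index instead of scanning every document. This is a minimal sketch of that data structure with a Boolean AND query, assuming a simple lowercase tokenizer; ranked retrieval, which chap 4 may emphasize, would additionally score the matching documents by similarity to the query. The names build_inverted_index and and_query are hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Map each term to the sorted list of document ids (positions in docs) containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokens(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def and_query(index, query):
    # Return the ids of documents that contain every query term.
    terms = tokens(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = [
        "Text mining finds patterns",
        "Data mining and text retrieval",
        "Retrieval of documents",
    ]
    index = build_inverted_index(docs)
    print(and_query(index, "text mining"))
```

The query only touches the posting lists of its own terms, which is why an index like this scales to large collections better than scanning each document.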

Week 9
Topics Covered: Document Collection Structure: Similarity, Clustering, Evaluation
Reading: chap 5
Assignments: Respond to the following Questions and Exercises in 5.8: 11-13
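
k-means is one widely used method for clustering a document collection. The sketch below is an illustration under stated assumptions rather than the procedure in chap 5: it clusters small numeric vectors with squared Euclidean distance, initializes the centroids deterministically from the first k points, and stops when assignments no longer change or after a fixed number of iterations. For documents, the vectors would typically be tf-idf rows, often compared with cosine similarity instead. The names kmeans and squared_distance are hypothetical.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed deterministic initialization: the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0)]
    print(kmeans(points, k=2))
```

Because k-means can converge to different clusterings from different starting centroids, a fixed initialization like this one is convenient for reproducible examples but is not how it would usually be run in practice.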

Week 10
Topics Covered: Information Retrieval and Extraction
Reading: chap 6
Assignments: Respond to the following Questions and Exercises in 6.8: 3-6

Assignment 2: Apply the algorithm in Figure 2.8 "by hand" to the results of Assignment 1. Also generate parse trees for these sentences. Finish learning the software and respond to the following Questions and Exercises in 2.15: 1-6.

Week 11
Topics Covered: Mixed Text and Data from Databases, the WWW, and Other Hybrid Sources
Reading: chap 7
Assignments: Respond to the following Questions and Exercises in 7.8: 5-7

Week 12
Topics Covered: Applications
Reading: chap 8
Assignments: Research Project: Find a report on an application not listed in the text and describe it in the same way as the text's application descriptions, covering the problem, solution overview, methods and procedures, and deployment.

Week 13
Topics Covered: Advanced Topics: Summarization, Active Learning
Reading: chap 9