EZ Study


Intro: What is Text Mining? Text Miner? How?
Dig for diamonds in the textual data sea!


• What Is Text Mining?
Text mining uncovers the underlying themes or concepts that are contained in large document collections.
Text mining applications have two phases:
1) Exploring the textual data for its content (descriptive mining).
2) Using the discovered information to improve existing processes (predictive mining).

Descriptive mining involves discovering the themes and concepts that exist in a textual collection.

For example, many companies collect customers' comments from sources that include the Web, e-mail, and contact centers. Mining the textual comments includes providing detailed information about the terms, phrases, and other entities in the textual collection; clustering the documents into meaningful groups; and reporting the concepts that are discovered in the clusters. Results from descriptive mining enable you to better understand the textual collection.
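
The descriptive steps above (term counts, grouping documents, reporting the terms found in each group) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how SAS Text Miner works internally: the sample comments, the stopword list, the `cluster` helper, and the 0.3 similarity threshold are all hypothetical, and the greedy single-pass grouping stands in for a real clustering algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical, very small stopword list for the illustration.
STOPWORDS = {"the", "a", "is", "was", "my", "and", "to", "it", "i", "for", "of", "in", "on"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass grouping: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"terms": Counter, "members": [document indexes]}
    for i, doc in enumerate(docs):
        vec = Counter(tokenize(doc))
        best = max(clusters, key=lambda c: cosine(vec, c["terms"]), default=None)
        if best is not None and cosine(vec, best["terms"]) >= threshold:
            best["terms"].update(vec)      # add this document's terms to the cluster
            best["members"].append(i)
        else:
            clusters.append({"terms": vec, "members": [i]})
    return clusters

# Made-up customer comments, for illustration only.
comments = [
    "The delivery was late and the package arrived damaged",
    "Late delivery again, package damaged",
    "Billing charged my card twice",
    "I was charged twice on my card, billing error",
]

clusters = cluster(comments)
for c in clusters:
    # Report which comments fell into the group and its most frequent terms.
    top_terms = [term for term, _ in c["terms"].most_common(3)]
    print(c["members"], top_terms)
```

With these sample comments, the two delivery complaints fall into one group and the two billing complaints into another, and the most frequent terms in each group serve as a rough label for its theme.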

[Figure: Concept linking, sample output from text mining]

Predictive mining involves classifying the documents into categories and using the information that is implicit in the text for decision making.

For example, you might want to identify the customers who ask standard questions so that they receive an automated answer. You might also want to predict whether a customer is likely to buy again, or whether you should spend more effort to keep that customer.

Predictive modeling involves examining past data to predict future results. Suppose you have a customer data set that contains information about past buying behaviors, along with customer comments.

You could build a predictive model to score new customers, that is, to analyze new customers based on the data from past customers. As another example, if you are a researcher for a pharmaceutical company, you know that hand-coding adverse reactions from doctors' reports in a clinical study is a laborious, error-prone job.

Instead, you could create a model from all your historical textual data, noting which doctors' reports correspond to which adverse reactions. Once the model is built, new records can be scored automatically as they come in. You would only have to examine the "hard-to-classify" examples and let the computer handle the rest.
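
The scoring workflow described above can be sketched as follows. This is a minimal illustration, assuming a multinomial naive Bayes classifier with add-one smoothing as the model; the text does not say which model is used, and the sample reports, the category labels, the `score` helper, and the 0.75 confidence cutoff are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_score = math.log(n_docs / total_docs)
            for term in tokenize(text):
                if term in self.vocab:  # ignore words never seen in training
                    log_score += math.log((counts[term] + 1) / denom)
            log_scores[label] = log_score
        # Convert log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

def score(model, text, min_confidence=0.75):
    """Return (label, confidence); flag low-confidence records for manual review."""
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    if probs[label] < min_confidence:
        return "NEEDS_REVIEW", probs[label]
    return label, probs[label]

# Made-up historical reports with hand-coded categories, for illustration only.
reports = [
    "patient reported severe headache and dizziness",
    "headache persisted with mild dizziness",
    "patient developed skin rash and itching",
    "itching and red rash on arms",
]
labels = ["neurological", "neurological", "dermatological", "dermatological"]

model = NaiveBayes().fit(reports, labels)
print(score(model, "new headache and dizziness after dose"))
print(score(model, "patient felt unwell"))
```

The first new record shares several terms with the "neurological" reports and is labeled automatically. The second contains almost no informative terms, so its confidence falls below the cutoff and it is routed to a person, which mirrors the "hard-to-classify" case in the text.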
