EZ Study

What Is SAS Text Miner? Overview
Dig for diamonds in the textual data sea!

  What is Data Mining?
  What is Data Mining SEMMA process? Pattern discovery applications?

  What is Text Mining? Descriptive mining & predictive mining?
  What is SAS Text Miner? Text Parsing, Text Filter?
  What is Sentiment Analysis? Text Mining application areas?
  SAS Text Mining Tutorial by Examples

• What Is SAS Text Miner?

SAS Text Miner is a plug-in for the SAS Enterprise Miner environment. SAS Enterprise Miner provides a rich set of data mining tools that facilitate the prediction aspect of text mining. The integration of SAS Text Miner within SAS Enterprise Miner combines textual data with traditional data mining variables.

Text mining nodes can be embedded in a SAS Enterprise Miner process flow diagram.
SAS Text Miner supports various sources of textual data: local text files, text as observations in SAS data sets or external databases, and files on the Web.

The Text Miner node encompasses the parsing and exploration aspects of text mining and prepares data for predictive mining and further exploration using other SAS Enterprise Miner nodes. The Text Miner node enables you to analyze structured text information, and combine the structured output of a Text Miner node with other structured data as desired.


The Text Miner node is highly customizable and enables you to choose among a variety of parsing options. It is possible to parse documents for detailed information about the terms, phrases, and other entities in the collection. You can also cluster documents into meaningful groups and report concepts that you discover in the clusters. You can use the Text Miner node in an environment that enables you to interact with the collection. Sorting, searching, filtering (subsetting), and finding similar terms or documents all enhance the exploration process.
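To make "finding similar documents" concrete, here is a minimal Python sketch. It is not how SAS Text Miner computes similarity. It assumes a plain bag-of-words representation and cosine similarity, and the helper names `cosine` and `most_similar` are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar(query_index, docs):
    """Return the index of the document closest to docs[query_index]."""
    # Assumption: whitespace tokenization, no stemming or stop lists.
    vectors = [Counter(d.lower().split()) for d in docs]
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]

docs = [
    "billing error on my invoice",
    "invoice shows a billing error",
    "package arrived damaged",
]
nearest = most_similar(0, docs)  # index of the document most like docs[0]
```

A real collection would normally be parsed and weighted before any similarity is computed; the raw counts here only keep the sketch short.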

Also available are the Text Parsing, Text Filter, and Text Topic nodes. Each of these nodes performs a specific task of the text mining process. The Text Parsing node performs the same parsing operations as the Text Miner node and can be configured in much the same way.

The Text Filter node enables you to remove terms that are deemed to have low information value or occur in too few documents to be relevant. The Text Topic node creates a set of topics based on the most highly correlated terms in the document collection. This is similar to the process of clustering the document collection that is done in the Text Miner node.
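The idea of dropping terms that occur in too few documents can be sketched in a few lines of Python. This is only an illustration of document-frequency filtering, not the Text Filter node's implementation. The function name `filter_rare_terms` and the `min_docs` threshold are assumptions made for the example.

```python
from collections import Counter

def filter_rare_terms(docs, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.lower().split()))
    kept = {term for term, n in doc_freq.items() if n >= min_docs}
    # Rebuild each document without the rare terms, preserving word order.
    return [[t for t in doc.lower().split() if t in kept] for doc in docs]

docs = [
    "late delivery of the order",
    "order arrived late",
    "refund for damaged order",
]
filtered = filter_rare_terms(docs, min_docs=2)
```

With this toy collection only "late" and "order" appear in two or more documents, so every other term is removed.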

The extensive parsing capabilities of the Text Miner and Text Parsing nodes include:
• stemming
• automatic recognition of multi-word terms
• normalization of various entities such as dates, currencies, percentages, and years
• part-of-speech tagging
• extraction of entities: organizations, products, Social Security numbers (SSN), times, titles, ...
• support for synonyms
• language-specific analysis for Arabic, Chinese, Dutch, etc.
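Two of the capabilities listed above, synonym support and entity recognition, can be illustrated with a toy Python parser. It is a sketch under simplifying assumptions and does not reproduce SAS Text Miner's parsing. The synonym table, the three regular expressions, and the labels `PERCENT`, `CURRENCY`, `DATE`, and `TERM` are all hypothetical.

```python
import re

# Hypothetical synonym table: maps variant terms to one canonical term.
SYNONYMS = {"automobile": "car", "cars": "car"}

# Hypothetical entity patterns; real parsers recognize far more formats.
PATTERNS = [
    ("PERCENT", re.compile(r"^\d+(\.\d+)?%$")),
    ("CURRENCY", re.compile(r"^\$\d+(\.\d+)?$")),
    ("DATE", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]

def parse(text):
    """Return (token, label) pairs for a whitespace-tokenized text."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")  # drop surrounding punctuation
        if not word:
            continue
        for label, pattern in PATTERNS:
            if pattern.match(word):
                tokens.append((word, label))
                break
        else:
            # Not an entity: map synonyms to their canonical term.
            tokens.append((SYNONYMS.get(word, word), "TERM"))
    return tokens

parsed = parse("Automobile prices rose 12.5% to $30000 on 2011-03-01.")
```

In `parsed`, "automobile" is replaced by its canonical term "car", and the percentage, currency amount, and date each carry an entity label instead of being treated as ordinary terms.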


SAS Text Miner also provides a SAS macro called %TMFILTER. This macro performs a text preprocessing step: it creates SAS data sets from documents that reside in your file system or on Web pages. These documents can exist in a number of proprietary formats.
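The general idea behind this preprocessing step, turning a folder of documents into a table with one row per document, can be sketched in Python. This is an analogy only: it is not %TMFILTER, it handles plain text files only, and the function name `documents_to_rows`, its parameters, and the column names are hypothetical.

```python
import tempfile
from pathlib import Path

def documents_to_rows(directory, ext=".txt", num_chars=60):
    """Build one row per file under `directory` with the given extension."""
    rows = []
    for path in sorted(Path(directory).rglob(f"*{ext}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({
            "name": path.name,          # file name
            "uri": str(path),           # where the full document lives
            "chars": len(text),         # document length in characters
            "text": text[:num_chars],   # truncated preview of the content
        })
    return rows

# Small demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First complaint about billing.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Second note.", encoding="utf-8")
    rows = documents_to_rows(tmp, num_chars=15)
```

Each row keeps a short preview plus a pointer back to the source file, so later steps can work with a compact table while the full documents stay where they are.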

SAS Text Miner is a very flexible tool that can solve a variety of problems. Here are some examples of tasks that can be accomplished using SAS Text Miner:

• filtering e-mail
• grouping documents by topic into predefined categories
• routing news items
• clustering analysis of research papers in a database
• clustering analysis of survey data
• clustering analysis of customer complaints and comments
• predicting stock market prices from business news announcements
• predicting customer satisfaction from customer comments
• predicting costs, based on call center logs

