What Is SAS Text Miner? Overview
Dig for diamonds in the textual data sea!
• What is Data Mining?
• What is Data Mining SEMMA process? Pattern discovery applications?
• What is Text Mining? Descriptive mining & predictive mining?
• What is SAS Text Miner? Text Parsing, Text Filter?
• What is Sentiment Analysis? Text Mining application areas?
• SAS Text Mining Tutorial by Examples
• What Is SAS Text Miner?
SAS Text Miner is a plug-in for the SAS Enterprise Miner environment. SAS Enterprise
Miner provides a rich set of data mining tools that facilitate the prediction aspect of text
mining. The integration of SAS Text Miner within SAS Enterprise Miner combines textual
data with traditional data mining variables.
Text mining nodes can be embedded in a SAS Enterprise Miner process flow diagram.
SAS Text Miner supports various sources of textual data: local text files, text as
observations in SAS data sets or external databases, and files on the Web.
The Text Miner node encompasses the parsing and exploration aspects of text mining and
prepares data for predictive mining and further exploration using other SAS Enterprise
Miner nodes. The Text Miner node enables you to analyze structured text information, and
combine the structured output of a Text Miner node with other structured data as desired.
The Text Miner node is highly customizable and enables you to choose among a variety
of parsing options. It is possible to parse documents for detailed information about the
terms, phrases, and other entities in the collection. You can also cluster documents into
meaningful groups and report concepts that you discover in the clusters. You can use the
Text Miner node in an environment that enables you to interact with the collection. Sorting,
searching, filtering (subsetting), and finding similar terms or documents all enhance the
exploration process.
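To make "finding similar documents" concrete, here is a minimal Python sketch of one common approach: comparing term-count vectors with cosine similarity. It is only an illustration of the idea and makes no claim about how the Text Miner node computes similarity internally. The function names and sample documents are invented for this example.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query_index, docs):
    """Return the indices of the other documents, most similar first."""
    vectors = [term_counts(d) for d in docs]
    others = [i for i in range(len(docs)) if i != query_index]
    return sorted(others,
                  key=lambda i: cosine(vectors[query_index], vectors[i]),
                  reverse=True)

# Hypothetical sample collection (not from SAS documentation).
docs = [
    "The printer jams on every page",
    "Paper jams in the printer tray",
    "My invoice shows the wrong amount",
]
ranking = most_similar(0, docs)  # document 1 ranks ahead of document 2
```

In this toy collection, the two printer complaints share three terms, so they rank as each other's nearest neighbors, while the invoice complaint shares only the word "the".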
Also available are the Text Parsing, Text Filter, and Text Topic nodes. Each of these nodes
performs a specific task of the text mining process. The Text Parsing node performs the
same parsing operations as the Text Miner node and can be configured in much the same
way.
The Text Filter node enables you to remove terms that are deemed to have low
information value or occur in too few documents to be relevant. The Text Topic node
creates a set of topics based on the most highly correlated terms in the document collection.
This is similar to the process of clustering the document collection that is done in the Text
Miner node.
The Text Miner and Text Parsing nodes have extensive parsing capabilities, including:
• automatic recognition of multi-word terms
• normalization of various entities such as dates, currencies, percentages, and years
• part-of-speech tagging
• extraction of entities: organizations, products, SSN, time, titles, ...
• support for synonyms
• language-specific analysis for Arabic, Chinese, Dutch, etc.
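The parsing and filtering ideas described above can be sketched in a few lines of Python. This is a simplified illustration, not the SAS implementation: the multi-word term table, the synonym table, and the `min_docs` threshold are all hypothetical values chosen for the example. The sketch merges known multi-word terms, maps synonyms to a parent term, and then drops terms that occur in too few documents.

```python
import re
from collections import Counter

# Hypothetical lookup tables, invented for this illustration.
MULTIWORD_TERMS = {("call", "center"): "call center"}
SYNONYMS = {"automobile": "car", "cars": "car"}

def parse(text):
    """Tokenize, merge known multi-word terms, and map synonyms to a parent term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD_TERMS:
            terms.append(MULTIWORD_TERMS[pair])
            i += 2
        else:
            terms.append(SYNONYMS.get(tokens[i], tokens[i]))
            i += 1
    return terms

def filter_terms(parsed_docs, min_docs=2):
    """Keep only the terms that occur in at least min_docs documents."""
    doc_freq = Counter(term for doc in parsed_docs for term in set(doc))
    return {term for term, n in doc_freq.items() if n >= min_docs}

docs = [
    "The call center sold two cars",
    "An automobile was returned to the call center",
    "The car needs service",
]
parsed = [parse(d) for d in docs]
kept = filter_terms(parsed, min_docs=2)
```

Here "cars" and "automobile" are both counted as "car", and "call center" is treated as a single term. A real filter would usually also remove very common words such as "the", for example with a stop list.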
SAS Text Miner also enables you to use a SAS macro that is called %TMFILTER. This
macro accomplishes a text preprocessing step and enables SAS data sets to be created from
documents that reside in your file system or on Web pages. These documents can exist in
a number of proprietary formats.
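As a rough analogy for this preprocessing step, the Python sketch below walks a directory and builds one table row per document. It is not the %TMFILTER macro and does not convert proprietary formats; it reads plain-text files only. The function name `documents_to_rows`, the column names, and the sample files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def documents_to_rows(directory, extensions=(".txt",)):
    """Return one row (a dict) per matching file under the directory."""
    rows = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            rows.append({"name": path.name, "size": len(text), "text": text})
    return rows

# Demonstration on a temporary directory with two text files and one
# file that is skipped because of its extension.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First complaint.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Second complaint, longer.", encoding="utf-8")
    Path(tmp, "notes.bin").write_bytes(b"\x00\x01")
    rows = documents_to_rows(tmp)
```

Each row plays the role of one observation in the resulting data set, with the document text stored alongside basic metadata.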
SAS Text Miner is a very flexible tool that can solve a variety of problems. Here are some
examples of tasks that can be accomplished using SAS Text Miner:
• filtering e-mail
• grouping documents by topic into predefined categories
• routing news items
• clustering analysis of research papers in a database
• clustering analysis of survey data
• clustering analysis of customer complaints and comments
• predicting stock market prices from business news announcements
• predicting customer satisfaction from customer comments
• predicting costs, based on call center logs