Data Mining SEMMA process?
The Essence of Data Mining?
"the discovery of interesting, unexpected, or valuable structures in large data!"
-- Prof. David Hand (Imperial College, London)
• What is Data Mining?
• What is Data Mining SEMMA process? Pattern discovery applications?
• What is Text Mining? Descriptive mining & predictive mining?
• What is SAS Text Miner? Text Parsing, Text Filter?
• What is Sentiment Analysis? Text Mining application areas?
• SAS Text Mining Tutorial by Examples
SEMMA: Sample, Explore, Modify, Model, Assess
– You sample the data by creating one or more data tables. The samples should be large enough to
contain the significant information, yet small enough to process.
– You explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.
– You modify the data by creating, selecting, and transforming the variables to focus the model selection process.
– You model the data by using the analytical tools to search for a combination of the data that
reliably predicts a desired outcome.
– You assess competing predictive models (build charts to evaluate the usefulness and reliability
of the findings from the data mining process).
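The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how SAS implements SEMMA: the function name semma_sketch, the 70/30 split, the single numeric input, and the nearest-class-mean model are all hypothetical choices made to keep the example small.

```python
import random
import statistics


def standardize(x, mu, sigma):
    """Rescale a raw value to zero mean and unit spread."""
    return (x - mu) / sigma


def semma_sketch(rows, sample_size=200, seed=0):
    """rows is a list of (x, label) pairs; returns (summary, predict, accuracy)."""
    rng = random.Random(seed)

    # Sample: draw a manageable random subset and hold part of it back.
    sample = rng.sample(rows, k=min(sample_size, len(rows)))
    cut = (7 * len(sample)) // 10
    train, holdout = sample[:cut], sample[cut:]

    # Explore: basic summary statistics of the input variable.
    xs = [x for x, _ in train]
    summary = {"n": len(xs), "min": min(xs), "max": max(xs),
               "mean": statistics.mean(xs)}

    # Modify: standardize the input using training statistics only.
    mu = summary["mean"]
    sigma = statistics.pstdev(xs) or 1.0
    z_train = [(standardize(x, mu, sigma), y) for x, y in train]

    # Model: a nearest-class-mean classifier (an assumed, deliberately simple model).
    labels = {y for _, y in z_train}
    centers = {label: statistics.mean([z for z, y in z_train if y == label])
               for label in labels}

    def predict(x):
        z = standardize(x, mu, sigma)
        return min(centers, key=lambda label: abs(z - centers[label]))

    # Assess: accuracy on the held-out part of the sample.
    hits = sum(predict(x) == y for x, y in holdout)
    accuracy = hits / len(holdout)
    return summary, predict, accuracy


# Synthetic data: two well-separated groups (made up for this demo).
gen = random.Random(42)
rows = [(gen.gauss(0.0, 1.0), 0) for _ in range(500)]
rows += [(gen.gauss(6.0, 1.0), 1) for _ in range(500)]
summary, predict, accuracy = semma_sketch(rows)
print(summary["n"], round(accuracy, 2))
```

In practice the Assess step would compare several competing models on the same holdout data; the sketch scores only one.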
Introduction to Data Mining and Knowledge Discovery, by Herb Edelstein, an internationally recognized
expert in data mining and client-server computing who consults to both computer vendors and users,
is available from the official website: twocrows.com.
Successful Pattern discovery applications:
• Data reduction is the most ubiquitous application: exploiting patterns in data to create a more compact
representation of the original. Though vastly broader in scope, data reduction includes analytic methods
such as cluster analysis.
• Novelty detection methods seek unique or previously unobserved data patterns. The methods find
application in business, science, and engineering. Business applications include fraud detection,
warranty claims analysis, and general business process monitoring.
• Profiling is a by-product of reduction methods such as cluster analysis. The idea is to create rules that
isolate clusters or segments, often based on demographic or behavioral measurements. A marketing
analyst might develop profiles of a customer database to describe the consumers of a company’s products.
• Market basket analysis, or association rule discovery, is used to analyze streams of transactions data
(for example, market baskets) for combinations of items that occur (or do not occur) more (or less)
commonly than expected. Retailers can use this as a way to identify interesting combinations of
purchases or as predictors of customer segments.
• Sequence analysis (also called path analysis) extends market basket analysis by adding a time
dimension. In this way, transactions data is examined for sequences of items that occur (or do not occur)
more (or less) commonly than expected. A Webmaster might use sequence analysis to identify patterns
or problems of navigation through a Web site.
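Two of the applications above, market basket analysis and sequence analysis, come down to counting. The sketch below is a minimal illustration, not a full association-rule or sequence-mining algorithm. It measures "more (or less) commonly than expected" with lift, the observed co-occurrence rate divided by the rate expected if the two items were independent, and it counts page-to-page transitions in order. The function names and the sample baskets and sessions are made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(a, b): (support, lift)} for every item pair that co-occurs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Co-occurrence rate expected if a and b were bought independently.
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        results[(a, b)] = (support, support / expected)
    return results


def transition_counts(sessions):
    """Count ordered (from_page, to_page) steps across browsing sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical data for illustration only.
baskets = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["milk"],
    ["bread", "milk"],
]
for pair, (support, lift) in sorted(pair_lift(baskets).items()):
    print(pair, round(support, 2), round(lift, 2))

sessions = [
    ["home", "search", "product"],
    ["home", "product"],
    ["home", "search", "home"],
]
print(transition_counts(sessions).most_common(1))
```

A lift above 1 marks a combination that occurs more often than expected (bread with butter here), and a lift below 1 marks one that occurs less often (bread with milk). Pairs that never co-occur, such as butter with milk, do not appear in the result at all, so finding "do not occur" combinations would need an extra pass over all candidate pairs.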