EZ Study


Data Mining SEMMA process?
The Essence of Data Mining?
"the discovery of interesting, unexpected, or valuable structures in large data!"
                                              -- Prof. David Hand (Imperial College, London)

SEMMA stands for Sample, Explore, Modify, Model, Assess:

• Sample – You sample the data by creating one or more data tables. The samples should be large enough to contain the significant information, yet small enough to process.

• Explore – You explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.

• Modify – You modify the data by creating, selecting, and transforming the variables to focus the model selection process.

• Model – You model the data by using the analytical tools to search for a combination of the data that reliably predicts a desired outcome.

• Assess – You assess competing predictive models by building charts that evaluate the usefulness and reliability of the findings from the data mining process.
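The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how SAS implements SEMMA: the synthetic data, the function names `semma_demo` and `mse`, the 400/100 split, and the choice of a straight-line model against a mean-only baseline are all hypothetical.

```python
import random
import statistics


def mse(pairs, predict):
    """Mean squared error of predict(x) against y over (x, y) pairs."""
    return statistics.fmean((y - predict(x)) ** 2 for x, y in pairs)


def semma_demo(seed=0):
    rng = random.Random(seed)

    # Hypothetical "large" source table: y is roughly 2*x plus noise.
    population = []
    for _ in range(10_000):
        x = rng.uniform(0, 10)
        population.append((x, 2.0 * x + rng.gauss(0, 1)))

    # Sample: a subset big enough to be informative, small enough to process.
    sample = rng.sample(population, 500)
    train, holdout = sample[:400], sample[400:]

    # Explore: simple summary statistics of the training rows.
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)

    # Modify: create a transformed variable (x centred on its mean).
    centred = [(x - mean_x, y) for x, y in train]

    # Model: least-squares slope of y on the centred input.
    slope = (sum(cx * (y - mean_y) for cx, y in centred)
             / sum(cx * cx for cx, _ in centred))

    def linear(x):
        return mean_y + slope * (x - mean_x)

    def baseline(x):
        return mean_y  # competing model: always predict the mean

    # Assess: compare the competing models on rows not used for fitting.
    return {
        "slope": slope,
        "mse_linear": mse(holdout, linear),
        "mse_baseline": mse(holdout, baseline),
    }


result = semma_demo()
print(result)
```

In this sketch the Assess step is just a comparison of holdout errors; in practice it would more likely be a chart such as a lift or ROC curve.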

Further reading: Introduction to Data Mining and Knowledge Discovery, by Herb Edelstein, an internationally recognized expert in data mining, data warehousing, and client-server computing who consults to both computer vendors and users. It is available from the official website: twocrows.com.


Successful pattern discovery applications:

• Data reduction is the most ubiquitous application: exploiting patterns in data to create a more compact representation of the original. Though vastly broader in scope, data reduction includes analytic methods such as cluster analysis.

• Novelty detection methods seek unique or previously unobserved data patterns. The methods find application in business, science, and engineering. Business applications include fraud detection, warranty claims analysis, and general business process monitoring.

• Profiling is a by-product of reduction methods such as cluster analysis. The idea is to create rules that isolate clusters or segments, often based on demographic or behavioral measurements. A marketing analyst might develop profiles of a customer database to describe the consumers of a company’s products.

• Market basket analysis, or association rule discovery, is used to analyze streams of transaction data (for example, market baskets) for combinations of items that occur (or do not occur) more (or less) commonly than expected. Retailers can use this as a way to identify interesting combinations of purchases or as predictors of customer segments.

• Sequence analysis, also called path analysis, extends market basket analysis by adding a time dimension. Transaction data is examined for sequences of items that occur (or do not occur) more (or less) commonly than expected. A Webmaster might use sequence analysis to identify patterns or problems of navigation through a Web site.
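To make "more (or less) commonly than expected" concrete for the last two applications, here is a minimal sketch. It assumes the usual lift measure for item pairs (observed co-occurrence divided by what independence would predict) and plain counts of consecutive page transitions for the sequence case. The function names `pair_lift` and `transition_counts` and the toy baskets and sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Lift of each co-occurring item pair: P(a and b) / (P(a) * P(b)).

    Lift above 1 means the pair occurs together more often than expected
    if the items were independent; below 1 means less often. Pairs that
    never co-occur are not listed.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): count * n / (item_counts[a] * item_counts[b])
        for (a, b), count in pair_counts.items()
    }


def transition_counts(sessions):
    """Count ordered page-to-page steps, keeping the time dimension."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["eggs"],
]
lift = pair_lift(baskets)

sessions = [
    ["home", "search", "product"],
    ["home", "product"],
    ["home", "search", "home"],
]
steps = transition_counts(sessions)
```

On this toy data the bread and butter pair has a lift of 2, so the two items appear together twice as often as independence would predict. The transition counts treat ("home", "search") and ("search", "home") as different steps, and that ordering is what separates sequence analysis from the basket case.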
