EZ Study


What is Data Mining, Text Mining? and How?

• "The sexiest job in the next 10 years will be statisticians!"

  What is Data Mining?
  What is Data Mining SEMMA process? Pattern discovery applications?

  What is Text Mining? Descriptive mining & predictive mining?
  What is SAS Text Miner? Text Parsing, Text Filter?
  What is Sentiment Analysis? Text Mining application areas?
  SAS Text Mining Tutorial by Examples

  The meaning of mining
Data mining is the process of uncovering hidden patterns in large amounts of data. A typical goal is a good predictive model, one that gives you knowledge you can act on, for example to target customers, control risk, or identify fraud.
Predictive modeling is a fundamental data mining task. It takes training data made up of several input variables and one target variable, then builds a model that tries to predict the target from the inputs.

After this model is developed, it can be applied to new data that is similar to the training data, but which doesn't contain the target. If the model is successful, it will predict the values of the missing target from the values of the new inputs.
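The train-then-apply workflow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended method: it uses a nearest-centroid classifier, and the data, the column meanings (monthly spend, late payments) and the function names fit_centroids and predict are all made up for the example.

```python
from collections import defaultdict
from math import dist

def fit_centroids(rows, targets):
    """Learn one centroid (mean input vector) per target class."""
    groups = defaultdict(list)
    for row, target in zip(rows, targets):
        groups[target].append(row)
    return {
        target: tuple(sum(col) / len(members) for col in zip(*members))
        for target, members in groups.items()
    }

def predict(centroids, row):
    """Predict the class whose centroid is closest to the new inputs."""
    return min(centroids, key=lambda target: dist(centroids[target], row))

# Hypothetical training data: inputs are (monthly_spend, late_payments),
# the target is a risk label.
train_inputs = [(200, 0), (250, 1), (900, 5), (1000, 6)]
train_targets = ["low", "low", "high", "high"]

model = fit_centroids(train_inputs, train_targets)

# New data has the same inputs but no target column.
new_inputs = [(220, 0), (950, 4)]
predictions = [predict(model, row) for row in new_inputs]
print(predictions)  # ['low', 'high']
```

The model only sees the target during training. On the new rows it has to fill in the missing target from the inputs alone, which is exactly the situation described above.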

If you want to understand data mining in more depth, here are some highly recommended books.
 

  Top 5 Deadly Sins of Data Mining

1. Not asking the right questions.
2. Not fully understanding the problem.
3. Underestimating data preparation.
4. Ignoring what's not there.
5. Falling in love with your models.

6. Going it alone.
7. Using bad data.

  Top 5 Virtues of Data Mining

1. Define the problem.
2. Prepare the data, use domain knowledge.
3. Be open to new methods and models. Keep the toolbox open.
4. Be aware of data quality: check for outliers and missing data, and create dummy variables where needed.
5. Use models, not just associations.
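The data quality checks named in the list above (missing data, outliers, dummy variables) could look roughly like this in Python. It is a sketch under stated assumptions: median imputation and Tukey's 1.5 x IQR rule are just two common choices among many, and the sample values and the helper names impute_missing, flag_outliers and dummy_variables are hypothetical.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]

def dummy_variables(categories):
    """Turn one categorical column into 0/1 indicator columns."""
    levels = sorted(set(categories))
    return [{f"is_{level}": int(c == level) for level in levels}
            for c in categories]

# Hypothetical raw columns: one missing age and one implausible age.
ages = [30, 32, None, 31, 33, 400]
regions = ["east", "west", "east", "north", "west", "east"]

clean_ages = impute_missing(ages)
outlier_flags = flag_outliers(clean_ages)
region_dummies = dummy_variables(regions)

print(clean_ages)         # [30, 32, 32, 31, 33, 400]
print(outlier_flags)      # [False, False, False, False, False, True]
print(region_dummies[0])  # {'is_east': 1, 'is_north': 0, 'is_west': 0}
```

Whether a flagged value should be dropped, capped, or kept is a domain question, which is where the second virtue (use domain knowledge) comes in.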

  Modern Trends in Data Mining by Professor Hastie (Stanford)   PDF Slides

Mining of Massive Datasets by Prof. Rajaraman and Jeff Ullman (Stanford).
  Introduction: Data Mining for Prediction
  Example: Credit Risk Assessment, Netflix Challenge competition
  More Examples: Email or Spam? Microarray Cancer Data
  Ideal Predictions, Implementation with Training Data
  Nearest Neighbor Averaging, Kernel smoothing
  Structured Models: Linear Models, Overfitting and Model Assessment
  K-Fold Cross-Validation, Cross-Validation Error Curve
  Modern Structured Models in Data Mining: Generalized Additive Models
  Neural Networks, Support Vector Machines, Properties of SVMs
  Classification and Regression Trees, Ensemble Methods and Boosting
  Gradient Boosting - AdaBoost Stumps for Classification, Boosting on SPAM
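
Two of the topics in the outline above, nearest neighbor averaging and K-fold cross-validation, fit together in a small example. The Python below is an illustrative sketch only and is not taken from the slides: the data is synthetic (a noisy sine curve), and the function names knn_average and cross_validation_error are invented for the example.

```python
import math
import random

def knn_average(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cross_validation_error(data, k_neighbors, n_folds=5):
    """Estimate mean squared prediction error by K-fold cross-validation."""
    squared_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point is held out; the rest is used for training.
        held_out = data[fold::n_folds]
        training = [p for i, p in enumerate(data) if i % n_folds != fold]
        for x, y in held_out:
            prediction = knn_average(training, x, k_neighbors)
            squared_errors.append((prediction - y) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Synthetic data: a sine curve plus Gaussian noise (fixed seed).
rng = random.Random(0)
data = [(i / 10, math.sin(i / 10) + rng.gauss(0, 0.2)) for i in range(30)]

# Cross-validation error curve over candidate neighborhood sizes.
candidates = [1, 3, 5, 9, 15]
error_curve = {k: cross_validation_error(data, k) for k in candidates}
best_k = min(error_curve, key=error_curve.get)

for k, err in error_curve.items():
    print(f"k={k:2d}  CV error={err:.4f}")
print("best k:", best_k)
```

A very small k follows the noise (overfitting) and a very large k averages away the signal, so the cross-validation error curve is one way to pick a value in between.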

