ADVANCED DATA MINING
UNIT 1
1) Data Mining Tasks
Data mining tasks specify the kinds of patterns that can be mined. Based on the kind of data to be mined, the functions involved in data mining fall into two categories:
· Descriptive
· Classification and Prediction
Descriptive
Descriptive functions deal with the general properties of the data in the database. The descriptive functions are:
· Class/Concept Description
· Mining of Frequent Patterns
· Mining of Associations
· Mining of Correlations
· Mining of Clusters
Class/Concept Description
Class/concept description refers to data associated with classes or concepts. For example, in a company, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. They can be derived in the following two ways:
· Data Characterization - This refers to summarizing the data of the class under study. The class under study is called the target class.
· Data Discrimination - This refers to comparing the general features of the target class with those of one or more predefined contrasting classes.
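A minimal sketch of characterization versus discrimination, assuming a tiny hypothetical customer table. The records, the field layout, and the function names characterize and discriminate are illustrative only and are not part of any library.

```python
from statistics import mean

# Hypothetical customer records: (customer_id, age, annual_spend, segment)
customers = [
    ("c1", 34, 5200, "big spender"),
    ("c2", 45, 6100, "big spender"),
    ("c3", 29, 4800, "big spender"),
    ("c4", 23, 900, "budget spender"),
    ("c5", 52, 1200, "budget spender"),
    ("c6", 31, 700, "budget spender"),
]


def characterize(records, target):
    """Data characterization: summarize the general features of the target class."""
    rows = [r for r in records if r[3] == target]
    return {
        "count": len(rows),
        "avg_age": mean(r[1] for r in rows),
        "avg_spend": mean(r[2] for r in rows),
    }


def discriminate(records, target, contrast):
    """Data discrimination: place the target class's features next to a contrasting class's."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: (t[key], c[key]) for key in ("avg_age", "avg_spend")}


print(characterize(customers, "big spender"))
print(discriminate(customers, "big spender", "budget spender"))
```

Here the two classes have similar average ages but very different average spend, which is the kind of contrast discrimination is meant to surface.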
2) Mining Frequent Patterns
Frequent patterns are patterns that occur frequently in transactional data. The kinds of frequent patterns are:
· Frequent Itemset - A set of items that frequently appear together, for example, milk and bread.
· Frequent Subsequence - A sequence of patterns that occurs frequently, such as purchasing a camera followed by a memory card.
· Frequent Substructure - A substructure refers to a structural form, such as a graph, tree, or lattice, which may be combined with itemsets or subsequences.
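A brute-force sketch of frequent itemset counting, assuming hypothetical baskets, a minimum support threshold of 0.6, and itemsets of at most two items. Support here means the fraction of transactions that contain the itemset. Real miners such as Apriori or FP-growth prune the search rather than enumerating every combination, so this is not either of those algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of size 1..max_size and keep those whose support
    (fraction of transactions containing it) is at least min_support."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            # sorted() gives each itemset one canonical tuple form
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


for itemset, support in sorted(frequent_itemsets(transactions, 0.6).items()):
    print(itemset, support)
```

With these baskets only bread, milk, and the pair (bread, milk) reach the 0.6 threshold.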
3) Mining of Associations
Associations are used in retail sales to identify items that are frequently purchased together. Association mining is the process of uncovering relationships among data and determining association rules.
For example, a retailer may generate an association rule showing that milk is sold with bread 70% of the time, while biscuits are sold with bread only 30% of the time.
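The retail example can be read as rule confidence: confidence(bread -> milk) = support(bread and milk) / support(bread). The sketch below assumes hypothetical baskets constructed so that the two rules come out at 70% and 30%; the helper names count, support, and confidence are illustrative.

```python
# Hypothetical baskets as (items, number_of_times_seen), built so that
# bread -> milk has 70% confidence and bread -> biscuits has 30%.
basket_counts = [
    ({"bread", "milk"}, 4),
    ({"bread", "milk", "biscuits"}, 1),
    ({"bread", "milk", "eggs"}, 2),
    ({"bread", "biscuits"}, 2),
    ({"bread"}, 1),
    ({"milk"}, 2),
]
transactions = [set(items) for items, times in basket_counts for _ in range(times)]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A and B) / support(A), computed from raw counts."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(f"support(bread, milk)      = {support({'bread', 'milk'}, transactions):.2f}")
print(f"confidence(bread -> milk) = {confidence({'bread'}, {'milk'}, transactions):.0%}")
print(f"confidence(bread -> biscuits) = {confidence({'bread'}, {'biscuits'}, transactions):.0%}")
```

Support says how common the pattern is overall; confidence says how often the consequent appears given the antecedent, which is what the 70% and 30% figures describe.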
Mining of Correlations
Correlation mining is a kind of additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two itemsets, to determine whether they have a positive, negative, or no effect on each other.
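One common correlation measure for itemsets is lift, lift(A, B) = P(A and B) / (P(A) * P(B)): a value above 1 suggests positive correlation, below 1 negative correlation, and exactly 1 independence. The sketch assumes ten hypothetical baskets; chi-square is another measure that could be used instead.

```python
# Hypothetical baskets as (items, number_of_times_seen); 10 transactions in total
basket_counts = [
    ({"milk", "bread"}, 4),
    ({"bread", "coffee"}, 1),
    ({"milk"}, 1),
    ({"coffee"}, 4),
]
transactions = [set(items) for items, times in basket_counts for _ in range(times)]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)); a | b is the union of the item sets,
    so count(a | b) counts transactions containing both A and B."""
    n = len(transactions)
    return count(a | b, transactions) * n / (count(a, transactions) * count(b, transactions))


def correlation_type(a, b, transactions):
    # Integer comparison avoids floating-point noise right at lift == 1
    n = len(transactions)
    joint = count(a | b, transactions) * n
    expected = count(a, transactions) * count(b, transactions)
    if joint > expected:
        return "positive"
    if joint < expected:
        return "negative"
    return "independent"


print(lift({"milk"}, {"bread"}, transactions), correlation_type({"milk"}, {"bread"}, transactions))
print(lift({"bread"}, {"coffee"}, transactions), correlation_type({"bread"}, {"coffee"}, transactions))
```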
Mining of Clusters
A cluster refers to a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.
4) Classification and Prediction
Classification is the process of finding a model that describes data classes or concepts. The purpose is to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data, and it can be presented in the following forms:
· Classification (IF-THEN) Rules
· Decision Trees
· Mathematical Formulae
· Neural Networks
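To make "a model derived from training data, presented as IF-THEN rules" concrete, here is a sketch of a decision stump: a one-attribute rule learned from labelled examples. The income figures, the labels, and the helper names learn_stump and classify are hypothetical, and a stump is only the simplest possible decision tree, not a full classifier.

```python
from collections import Counter

# Hypothetical labelled training data: (annual_income_in_thousands, class_label)
training = [(25, "budget"), (32, "budget"), (38, "budget"),
            (47, "big"), (55, "big"), (70, "big")]


def learn_stump(data):
    """Learn a one-attribute IF-THEN rule (a decision stump).

    Every midpoint between consecutive distinct values is tried as a threshold;
    each side predicts its majority label, and the threshold with the fewest
    training errors wins. Assumes at least two distinct values.
    """
    values = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(y for x, y in data if x <= t)
        right = Counter(y for x, y in data if x > t)
        left_label, left_hits = left.most_common(1)[0]
        right_label, right_hits = right.most_common(1)[0]
        errors = len(data) - left_hits - right_hits
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, threshold, label_le, label_gt = best
    return threshold, label_le, label_gt


def classify(rule, income):
    """Apply the learned rule to an object whose class label is unknown."""
    threshold, label_le, label_gt = rule
    return label_le if income <= threshold else label_gt


rule = learn_stump(training)
print(f"IF income <= {rule[0]} THEN {rule[1]} ELSE {rule[2]}")
print(classify(rule, 60))  # label predicted for an unseen customer
```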
The functions involved are:
· Classification - It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e., data objects whose class labels are known.
· Prediction - It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on available data.
· Outlier Analysis - Outliers may be defined as data objects that do not comply with the general behaviour or model of the available data.
· Evolution Analysis - Evolution analysis describes and models regularities or trends for objects whose behaviour changes over time.
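Since regression is named as the usual tool for numeric prediction, here is a sketch of simple linear regression fitted by ordinary least squares. The monthly sales figures and the fit_line helper are hypothetical; real projects would typically use a statistics or machine learning library.

```python
from statistics import fmean

# Hypothetical history: month number -> sales in thousands
months = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept, slope


intercept, slope = fit_line(months, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * month")
print(f"predicted sales for month 6: {intercept + slope * 6:.2f}")
```

The prediction for month 6 is a numeric value rather than a class label, which is the distinction the list above draws between prediction and classification.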
5) Cluster Analysis
A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped into one cluster and dissimilar objects are grouped into other clusters.
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
Points to Remember
· A cluster of data objects can be treated as one group.
· While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.
· The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
Applications of Cluster Analysis
· Cluster analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing.
· Clustering can help marketers discover distinct groups in their customer base and characterize these customer groups based on purchasing patterns.
· In biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
· Clustering also helps identify areas of similar land use in an earth observation database. It also helps identify groups of houses in a city according to house type, value, and geographic location.
· Clustering also helps classify documents on the web for information discovery.
· Clustering is also used in outlier detection applications such as credit card fraud detection.
Clustering Methods
Clustering methods can be classified into the following categories:
- Partitioning Method
- Hierarchical Method
- Density-based Method
- Grid-Based Method
- Model-Based Method
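As an example of a partitioning method, here is a sketch of k-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop changing. The sample points and the naive "first k points" initialization are assumptions made to keep the run deterministic; practical implementations usually use smarter initialization such as k-means++.

```python
from math import dist

# Hypothetical 2-D data with two visible groups
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1.5), (8.5, 9)]


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

The other families in the list (hierarchical, density-based, grid-based, model-based) group objects differently and are not covered by this sketch.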
6) Outlier Analysis
· An outlier is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism.
· Examples: an unusual credit card purchase; in sports, Michael Jordan, Wayne Gretzky, ...
· Outliers are different from noise data.
  - Noise is random error or variance in a measured variable.
  - Noise should be removed before outlier detection.
· Outliers are interesting because they violate the mechanism that generates the normal data.
· Outlier detection vs. novelty detection: at an early stage, a new pattern is treated as an outlier, but later it may be merged into the model.
· Applications:
  - Credit card fraud detection
  - Telecom fraud detection
  - Customer segmentation
  - Medical analysis
· There are three types of outliers:
  - Global outliers
  - Contextual outliers
  - Collective outliers
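A sketch of one simple way to flag global outliers: compute each value's z-score (distance from the mean in standard deviations) and flag values above a threshold. The spending figures and the threshold of 2.5 are assumptions. With only nine values a z-score cannot grow very large, so the threshold is lower than the often-quoted 3. This approach does not detect contextual or collective outliers, which need context attributes or groups of objects.

```python
from statistics import mean, pstdev

# Hypothetical daily card spend for one customer, with one unusually large purchase
daily_spend = [20, 25, 22, 30, 27, 24, 26, 23, 500]


def global_outliers(values, threshold=2.5):
    """Flag values whose z-score |v - mean| / std (population std) exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(global_outliers(daily_spend))
```

Because a single extreme value also inflates the mean and standard deviation, robust variants (for example, based on the median) are often preferred in practice.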