ADVANCED DATA MINING UNIT 2
Topics covered: classification by backpropagation, support vector machines, classification using frequent patterns (including constructing a decision tree from frequent patterns), genetic algorithms, the rough set approach, and fuzzy set approaches.
1) Classification by backpropagation:
Backpropagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function.
The backpropagation learning algorithm can be divided into two phases: propagation and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network, in order to generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural network, using the training pattern's target, in order to generate the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight-synapse, follow these steps:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Subtract a ratio (percentage) of the gradient from the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training is. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.
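The two phases above can be sketched for a tiny 2-2-1 sigmoid network. The layer sizes, learning rate, and squared-error loss below are illustrative choices, not prescribed by the text.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A tiny 2-2-1 network; each neuron's weight list ends with a bias term.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

def train_step(x, target, lr=0.5):
    # Phase 1a: forward propagation of the input, producing activations.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    # Phase 1b: backward propagation, producing output and hidden deltas.
    delta_out = (y - target) * y * (1 - y)
    delta_hid = [delta_out * w_out[i] * h[i] * (1 - h[i]) for i in range(2)]
    # Phase 2: gradient = delta * input activation; subtract lr * gradient.
    for i in range(2):
        w_out[i] -= lr * delta_out * h[i]
    w_out[2] -= lr * delta_out  # the bias input activation is 1
    for i in range(2):
        for j in range(2):
            w_hidden[i][j] -= lr * delta_hid[i] * x[j]
        w_hidden[i][2] -= lr * delta_hid[i]
    return 0.5 * (y - target) ** 2  # squared-error loss

losses = [train_step([1.0, 0.0], 1.0) for _ in range(50)]
print(losses[0], losses[-1])
```

Repeating the step on one pattern shows the loss shrinking as the weights move opposite the gradient.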
2) Support vector machines:
In machine learning, support vector machines (SVMs, also support vector networks[1]) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.
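The margin idea can be sketched with a minimal linear SVM trained by stochastic subgradient descent on the hinge loss (a Pegasos-style update). The toy data, learning rate, and regularization constant are invented for illustration; a real implementation would use an established solver.

```python
# Toy linearly separable data: label +1 when x0 + x1 > 0, else -1.
data = [((2.0, 1.5), 1), ((1.0, 2.0), 1), ((1.5, 0.5), 1),
        ((-2.0, -1.0), -1), ((-1.0, -1.5), -1), ((-0.5, -2.0), -1)]

w = [0.0, 0.0]
b = 0.0
lr, lam = 0.1, 0.01  # step size and regularization strength (assumed values)

# Subgradient descent on the regularized hinge loss:
# minimize lam*||w||^2 + average of max(0, 1 - y*(w.x + b)) over the data.
for epoch in range(200):
    for x, y in data:
        margin = y * (w[0] * x[0] + w[1] * x[1] + b)
        if margin < 1:   # inside the margin: push the hyperplane away
            w[0] += lr * (y * x[0] - 2 * lam * w[0])
            w[1] += lr * (y * x[1] - 2 * lam * w[1])
            b += lr * y
        else:            # safely classified: only apply weight decay
            w[0] -= lr * 2 * lam * w[0]
            w[1] -= lr * 2 * lam * w[1]

def predict(x):
    # Side of the separating hyperplane decides the category.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

print(w, b, [predict(x) for x, _ in data])
```

The regularization term is what pushes the gap to be as wide as possible rather than merely separating.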
3) Classification using frequent patterns:
The classification model is built in the feature space of single features as well as frequent patterns, i.e., the data are mapped to a higher-dimensional space.
• Feature combinations can capture more underlying semantics than single features.
• Example: word phrases can improve the accuracy of document classification.
• FP-based classification has been applied to many problems:
– Graph classification
– Time series classification
– Protein classification
– Text classification
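The mapping into the combined feature space can be illustrated as follows. The items and the mined patterns are made up for the example; in practice the patterns would come from a frequent-pattern miner such as Apriori or FP-growth.

```python
# Each record is a set of items; the frequent patterns are itemsets assumed
# to have been mined beforehand (hard-coded here for illustration).
records = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
single_features = ["a", "b", "c", "d"]
frequent_patterns = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

def to_feature_vector(record):
    # Single features first, then one indicator per frequent pattern:
    # the record is mapped into the higher-dimensional combined space.
    singles = [1 if f in record else 0 for f in single_features]
    patterns = [1 if p <= record else 0 for p in frequent_patterns]
    return singles + patterns

vectors = [to_feature_vector(r) for r in records]
print(vectors[0])  # [1, 1, 1, 0, 1, 1, 1]
```

Any standard classifier can then be trained on these combined vectors, letting it exploit feature combinations as well as single features.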
4) Other classification methods:
Tree-based frequent patterns
• [Fan et al. 2008] proposed building a decision tree in the space of frequent patterns as an alternative to the two-phase approach of [Cheng et al. 2007].
• Arguments against the two-phase approach:
1. The number of frequent patterns can be too large for effective feature selection in the second phase.
2. It is difficult to set the optimal min_sup threshold:
– A low min_sup generates too many candidates and is very slow.
– A high min_sup may miss some important patterns.
• The tree-based method instead proceeds as follows:
– Apply frequent pattern mining on the whole data.
– Select the best frequent pattern P to divide the data into two sets: one containing P and the other not.
– Repeat the procedure on each subset until a stopping condition is satisfied.
• Decreasing the support with smaller partitions makes the algorithm able to mine patterns with very low global support.
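The "select the best frequent pattern" step can be sketched as follows: pick the pattern whose contains/does-not-contain split most reduces class entropy. Information gain is one plausible scoring choice, and the records and patterns below are invented; this is not the exact criterion of the cited papers.

```python
import math

def entropy(labels):
    # Shannon entropy of a class-label list.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_pattern_split(records, labels, patterns):
    # Score each pattern by the information gain of splitting the data into
    # records containing the pattern and records not containing it.
    base = entropy(labels)
    best, best_gain = None, 0.0
    for p in patterns:
        left = [y for r, y in zip(records, labels) if p <= r]
        right = [y for r, y in zip(records, labels) if not p <= r]
        if not left or not right:
            continue  # pattern does not actually divide the data
        n = len(labels)
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best_gain:
            best, best_gain = p, gain
    return best

records = [{"a", "b"}, {"a", "b", "c"}, {"c"}, {"c", "d"}]
labels = ["pos", "pos", "neg", "neg"]
patterns = [{"a"}, {"c"}, {"a", "b"}]
print(best_pattern_split(records, labels, patterns))  # {'a'}
```

Recursing on each of the two subsets, with patterns re-mined per partition, yields the decision tree over frequent patterns.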
5) Genetic algorithms:
The idea of genetic algorithms is derived from natural evolution. In a genetic algorithm, an initial population is created first. This initial population consists of randomly generated rules. We can represent each rule by a string of bits.
For example, suppose that in a given training set the samples are described by two Boolean attributes, A1 and A2, and that this training set contains two classes, C1 and C2. We can encode the rule IF A1 AND NOT A2 THEN C2 as the bit string 100, where the two leftmost bits represent the attributes A1 and A2, respectively. Likewise, the rule IF NOT A1 AND NOT A2 THEN C1 can be encoded as 001.
Note: If an attribute has K values where K > 2, then we can use K bits to encode the attribute's values. The classes are encoded in the same manner.
Points to remember:
• Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules in the current population, as well as offspring of these rules.
• The fitness of a rule is assessed by its classification accuracy on a set of training samples.
• Genetic operators such as crossover and mutation are applied to create offspring.
• In crossover, substrings from a pair of rules are swapped to form a new pair of rules.
• In mutation, randomly selected bits in a rule's string are inverted.
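The encoding and the two genetic operators can be sketched directly on the bit strings from the example above. The crossover point and mutated position are passed in explicitly here for clarity; a real genetic algorithm would choose them at random.

```python
# Rules encoded as bit strings, as in the text: the two leftmost bits
# represent attributes A1 and A2, the last bit represents the class.
rule1 = "100"   # IF A1 AND NOT A2 THEN C2
rule2 = "001"   # IF NOT A1 AND NOT A2 THEN C1

def crossover(r1, r2, point):
    # Swap the substrings after the crossover point to form a new pair.
    return r1[:point] + r2[point:], r2[:point] + r1[point:]

def mutate(rule, position):
    # Invert one selected bit of the rule's string.
    flipped = "1" if rule[position] == "0" else "0"
    return rule[:position] + flipped + rule[position + 1:]

child1, child2 = crossover(rule1, rule2, 1)
print(child1, child2)      # 101 000
print(mutate(child1, 0))   # 001
```

Fitness evaluation (classification accuracy on training samples) would then decide which of these strings survive into the next population.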
6) Rough set approach:
To discover structural relationships within imprecise and noisy data, we can use rough sets.
Note: This approach can only be applied to discrete-valued attributes. Therefore, continuous-valued attributes must be discretized before it is used.
Rough set theory is based on the establishment of equivalence classes within the given training data. The tuples that form an equivalence class are indiscernible; that is, the samples are identical with respect to the attributes describing the data.
In given real-world data, there are some classes which cannot be distinguished in terms of the available attributes. We can use rough sets to roughly define such classes.
For a given class C, the rough set definition is approximated by two sets, as follows:
• Lower approximation of C - the lower approximation of C consists of all the data tuples that, based on the knowledge of the attributes, are certain to belong to class C.
• Upper approximation of C - the upper approximation of C consists of all the tuples that, based on the knowledge of the attributes, cannot be described as not belonging to C.
The following diagram shows the upper and lower approximation of class C:
Fuzzy set approaches
Fuzzy set theory is also called possibility theory. It was proposed by Lotfi Zadeh in 1965. This approach is an alternative to two-valued logic. The theory allows us to work at a high level of abstraction and provides a means of dealing with imprecise measurements of data.
Fuzzy set theory also allows us to deal with vague or inexact facts. For example, membership in the set of high incomes is inexact (e.g., if $50,000 is high, then what about $49,000 and $48,000?). Unlike a traditional crisp set, where an element either belongs to S or to its complement, in fuzzy set theory an element can belong to more than one fuzzy set.
For example, the income value $49,000 belongs to both the medium and high fuzzy sets, but to differing degrees. Fuzzy set notation for this income value is as follows:
m_medium_income($49K) = 0.15 and m_high_income($49K) = 0.96
where m is the membership function, operating on the fuzzy sets medium_income and high_income, respectively. This notation can be shown diagrammatically as follows:
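Membership functions like m_medium_income and m_high_income can also be sketched in code. The piecewise-linear breakpoints below are assumptions made for the sketch, so the grades they produce differ from the 0.15 and 0.96 quoted above.

```python
def membership_high_income(income):
    # Piecewise-linear membership for the fuzzy set "high income":
    # 0 below $40K, rising linearly to 1 at $50K (illustrative thresholds).
    if income <= 40000:
        return 0.0
    if income >= 50000:
        return 1.0
    return (income - 40000) / 10000.0

def membership_medium_income(income):
    # Triangular membership for "medium income", peaking at $35K and
    # vanishing by $50K (again, illustrative thresholds).
    if income <= 20000 or income >= 50000:
        return 0.0
    if income <= 35000:
        return (income - 20000) / 15000.0
    return (50000 - income) / 15000.0

# $49,000 belongs to both fuzzy sets, to differing degrees:
print(membership_medium_income(49000), membership_high_income(49000))
```

An element's grade in each set lies in [0, 1], and nothing forces the grades across sets to sum to 1, which is exactly what lets one income belong to both "medium" and "high".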
7) Rough sets (formal definition):
In computer science, a rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., a conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set. In the standard version of rough set theory (Pawlak 1991), the lower- and upper-approximation sets are crisp sets, but in other variations the approximating sets may be fuzzy sets.
Information system framework
Let I = (U, A) be an information system (attribute-value system), where U is a non-empty set of finite objects (the universe) and A is a non-empty, finite set of attributes such that a : U → V_a for every a ∈ A. V_a is the set of values that attribute a may take. The information table assigns a value a(x) from V_a to each attribute a and object x in the universe U.
With any P ⊆ A there is an associated equivalence relation IND(P):
IND(P) = {(x, y) ∈ U × U | ∀a ∈ P, a(x) = a(y)}
The relation IND(P) is called a P-indiscernibility relation. The partition of U is the family of all equivalence classes of IND(P) and is denoted by U/IND(P) (or U/P).
If (x, y) ∈ IND(P), then x and y are indiscernible (or indistinguishable) by attributes from P.
Example: equivalence-class structure
For example, consider the
following information table:
Sample Information System

Object | P1 | P2 | P3 | P4 | P5
O1     |  1 |  2 |  0 |  1 |  1
O2     |  1 |  2 |  0 |  1 |  1
O3     |  2 |  0 |  0 |  1 |  0
O4     |  0 |  0 |  1 |  2 |  1
O5     |  2 |  1 |  0 |  2 |  1
O6     |  0 |  0 |  1 |  2 |  2
O7     |  2 |  0 |  0 |  1 |  0
O8     |  0 |  1 |  2 |  2 |  1
O9     |  2 |  1 |  0 |  2 |  2
O10    |  2 |  0 |  0 |  1 |  0
When the full set of attributes P = {P1, P2, P3, P4, P5} is considered, we see that we have the following seven equivalence classes:
{O1, O2}, {O3, O7, O10}, {O4}, {O5}, {O6}, {O8}, {O9}
Definition of a rough set
Let X ⊆ U be a target set that we wish to represent using attribute subset P; that is, we are told that an arbitrary set of objects X comprises a single class, and we wish to express this class (i.e., this subset) using the equivalence classes induced by attribute subset P. In general, X cannot be expressed exactly, because the set may include and exclude objects which are indistinguishable on the basis of attributes P.
For example, consider the target set X = {O1, O2, O3, O4}, and let attribute subset P = {P1, P2, P3, P4, P5}, the full available set of features. It will be noted that the set X cannot be expressed exactly, because in U/P, objects O3, O7, and O10 are indiscernible. Thus, there is no way to represent any set X which includes O3 but excludes objects O7 and O10.
However, the target set X can be approximated using only the information contained within P by constructing the P-lower and P-upper approximations of X:
P-lower: PX = {x | [x]_P ⊆ X} = {O1, O2, O4}
P-upper: P̄X = {x | [x]_P ∩ X ≠ ∅} = {O1, O2, O3, O4, O7, O10}
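Using the sample information system above, the equivalence classes and the lower and upper approximations of the target set X = {O1, O2, O3, O4} can be computed directly. The dictionary encoding of the table, with object names O1-O10 and attribute positions for P1-P5, is my labeling of its rows and columns.

```python
# The sample information system: ten objects described by five attributes.
table = {
    "O1": (1, 2, 0, 1, 1), "O2": (1, 2, 0, 1, 1),
    "O3": (2, 0, 0, 1, 0), "O4": (0, 0, 1, 2, 1),
    "O5": (2, 1, 0, 2, 1), "O6": (0, 0, 1, 2, 2),
    "O7": (2, 0, 0, 1, 0), "O8": (0, 1, 2, 2, 1),
    "O9": (2, 1, 0, 2, 2), "O10": (2, 0, 0, 1, 0),
}

def equivalence_classes(attrs):
    # Group objects that are indiscernible on the chosen attribute indices.
    classes = {}
    for obj, values in table.items():
        key = tuple(values[i] for i in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def approximations(target, attrs):
    classes = equivalence_classes(attrs)
    # Lower: union of classes entirely inside the target.
    lower = set().union(*(c for c in classes if c <= target))
    # Upper: union of classes that intersect the target at all.
    upper = set().union(*(c for c in classes if c & target))
    return lower, upper

X = {"O1", "O2", "O3", "O4"}
P = [0, 1, 2, 3, 4]  # indices of all five attributes
lower, upper = approximations(X, P)
print(sorted(lower), sorted(upper))
```

The run reproduces the seven equivalence classes and the approximations stated above: the class {O3, O7, O10} straddles X, so it is excluded from the lower approximation but included in the upper one.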
8) Fuzzy sets:
Fuzzy sets are sets whose elements have degrees of membership. Fuzzy sets were introduced by Lotfi A. Zadeh[1] and Dieter Klaua[2] in 1965 as an extension of the classical notion of set. At the same time, Salii (1965) defined a more general kind of structure called L-relations, which he studied in an abstract algebraic context.
A fuzzy set is a pair (A, m) where A is a set and m : A → [0, 1]. For each x ∈ A, the value m(x) is called the grade of membership of x in (A, m). For a finite set A = {x_1, ..., x_n}, the fuzzy set (A, m) is often denoted by {m(x_1)/x_1, ..., m(x_n)/x_n}.
Let x ∈ A. Then x is called not included in the fuzzy set (A, m) if m(x) = 0, x is called fully included if m(x) = 1, and x is called a fuzzy member if 0 < m(x) < 1.[6] The set {x ∈ A | m(x) > 0} is called the support of (A, m) and the set {x ∈ A | m(x) = 1} is called its kernel. The function m is called the membership function of the fuzzy set (A, m).
Fuzzy logic
Main article: Fuzzy logic
As an extension of the case of multi-valued logic, valuations of propositional variables into a set of membership degrees can be thought of as membership functions mapping predicates into fuzzy sets (or more formally, into an ordered set of fuzzy pairs, called a fuzzy relation). With these valuations, many-valued logic can be extended to allow for fuzzy premises from which graded conclusions may be drawn.[8]
This extension is sometimes called "fuzzy logic in the narrow sense" as opposed to "fuzzy logic in the wider sense," which originated in the engineering fields of automated control and knowledge engineering, and which encompasses many topics involving fuzzy sets and "approximate reasoning."[9]
Industrial applications of fuzzy sets in the context of "fuzzy logic in the wider sense" can be found at fuzzy logic.
Sometimes, more general variants of the notion of fuzzy set are used, with membership functions taking values in a (fixed or variable) algebra or structure L of a given kind; usually it is required that L be at least a poset or lattice. These are usually called L-fuzzy sets, to distinguish them from those valued over the unit interval. The usual membership functions with values in [0, 1] are then called [0, 1]-valued membership functions.
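A finite fuzzy set and its support, kernel, and membership statuses can be sketched directly from the definitions above. The element names and grades are invented for the example.

```python
# A finite fuzzy set (A, m) represented as a dict mapping each element of A
# to its grade of membership m(x) in [0, 1].
m = {"x1": 0.0, "x2": 0.3, "x3": 1.0, "x4": 0.7}

support = {x for x, g in m.items() if g > 0}    # elements with m(x) > 0
kernel = {x for x, g in m.items() if g == 1.0}  # elements with m(x) = 1

def status(x):
    # Classify x as in the definitions: not included, fully included,
    # or a fuzzy member with 0 < m(x) < 1.
    g = m[x]
    if g == 0:
        return "not included"
    if g == 1:
        return "fully included"
    return "fuzzy member"

print(sorted(support), sorted(kernel), status("x2"))
```

The kernel is always a subset of the support; the elements strictly between them are exactly the fuzzy members.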