- Decision Trees
- Used to decide what category an element, represented by a vector of attribute values, belongs to.
- The attribute values must be discrete
- The categories are Boolean (which can be used to model non-Boolean)
- Example - "Will the movie be a success?"
```
CountryOfOrigin
├─ USA → BigStar
│         ├─ yes → Success
│         └─ no  → Failure
├─ Europe → Genre
│            ├─ comedy  → Success
│            ├─ romance → Failure
│            └─ scifi   → Failure
└─ Other → Failure
```

- Decision tree learning
- Supervised learning of trees, based on training data
- The goal is to build a tree that is as shallow as possible, to make classifications as quick as possible (e.g., the example above has depth two although three attributes are available).
- The best-known algorithm is ID3
- Builds the tree top-down, at each node selecting the attribute that provides the most information gain.

- Information gain
- Information gain is the difference between the entropy before and after a decision.
- Entropy =

`-pP * log2(pP) - pN * log2(pN)`

where `pP` is the proportion of positive (training) examples and `pN` is the proportion of negative (training) examples.

- Entropy is minimal (0) when all examples are positive or negative, maximal (1) when half are positive and half are negative.
- The entropy of a partition (a set of subsets) is the sum of the subset entropies, each weighted by the subset's share of the examples.
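The entropy formula can be sketched in a few lines of Python (the function name `entropy` is my own):

```python
import math

def entropy(p_pos, p_neg):
    # Entropy of a Boolean-labelled set, given the proportions of
    # positive (p_pos) and negative (p_neg) examples; 0 * log2(0)
    # is treated as 0 by convention.
    return sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy(0.5, 0.5))  # maximal: half positive, half negative -> 1.0
print(entropy(1.0, 0.0))  # minimal: all examples positive -> 0.0
```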
- Example:
```
Film  CountryOfOrigin  BigStar  Genre    Success/Failure
----  ---------------  -------  -------  ---------------
 1    USA              yes      scifi    Success
 2    USA              no       comedy   Failure
 3    USA              yes      comedy   Success
 4    Europe           no       comedy   Success
 5    Europe           yes      scifi    Failure
 6    Europe           yes      romance  Failure
 7    Australia        yes      comedy   Failure
 8    Brazil           no       scifi    Failure
 9    Europe           yes      comedy   Success
10    USA              yes      comedy   Success
```

The entropy of the full set is 1.0, because there are 5 successes and 5 failures.

- Information gain for CountryOfOrigin:

Entropy(USA) = -0.75 * log2(0.75) - 0.25 * log2(0.25) = 0.811

Entropy(Europe) = 1.0 (2 of each)

Entropy(Other) = 0.0 (2 Failures)

Entropy(CountryOfOrigin) = 0.4 * 0.811 + 0.4 * 1.0 + 0.2 * 0.0 = 0.7244

InformationGain(CountryOfOrigin) = 1.0 - 0.7244 = 0.2756

- Information gain for BigStar: 0.035
- Information gain for Genre: 0.174
- Best decision is CountryOfOrigin
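These numbers can be checked with a short Python sketch (the helper names are my own, and Australia and Brazil are grouped as `Other`, matching the tree above):

```python
import math

# The ten films from the table: (CountryOfOrigin, BigStar, Genre, outcome).
FILMS = [
    ("USA",    "yes", "scifi",   "Success"),
    ("USA",    "no",  "comedy",  "Failure"),
    ("USA",    "yes", "comedy",  "Success"),
    ("Europe", "no",  "comedy",  "Success"),
    ("Europe", "yes", "scifi",   "Failure"),
    ("Europe", "yes", "romance", "Failure"),
    ("Other",  "yes", "comedy",  "Failure"),  # Australia
    ("Other",  "no",  "scifi",   "Failure"),  # Brazil
    ("Europe", "yes", "comedy",  "Success"),
    ("USA",    "yes", "comedy",  "Success"),
]

def set_entropy(rows):
    # Entropy of a set of examples, from its Success/Failure proportions.
    p = sum(1 for r in rows if r[-1] == "Success") / len(rows)
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(rows, attr):
    # Entropy before the decision, minus the weighted sum of the
    # entropies of the subsets the attribute splits the rows into.
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * set_entropy(subset)
    return set_entropy(rows) - remainder

for i, name in enumerate(("CountryOfOrigin", "BigStar", "Genre")):
    print(name, round(information_gain(FILMS, i), 4))
```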

- To build the tree, repeatedly split the remaining leaves (always choosing the attribute with the most information gain) until no entropy is left
- For USA films, information gain for BigStar:

Entropy(yes) = 0.0 (all Success)

Entropy(no) = 0.0 (all Failure)

Entropy(USA:BigStar) = 0.0

InformationGain(USA:BigStar) = 0.811 - 0.0 = 0.811

- For USA films, information gain for Genre:

Entropy(scifi) = 0.0 (all Success)

Entropy(comedy) = -(2/3) * log2(2/3) - (1/3) * log2(1/3) = 0.92

Entropy(romance) = NA (there are no USA romance films)

Entropy(USA:Genre) = 0.25 * 0.0 + 0.75 * 0.92 + 0.00 * NA = 0.69

InformationGain(USA:Genre) = 0.811 - 0.69 = 0.121

- For USA films, the best decision is BigStar, and no entropy is left
- For Europe films, best decision is Genre, and no entropy is left
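The subtree numbers can be reproduced the same way on just the USA rows (a sketch; the helper names are my own):

```python
import math

# The four USA films from the table: (BigStar, Genre, outcome).
USA = [
    ("yes", "scifi",  "Success"),
    ("no",  "comedy", "Failure"),
    ("yes", "comedy", "Success"),
    ("yes", "comedy", "Success"),
]

def ent(rows):
    # Entropy from the Success proportion of a set of examples.
    p = sum(1 for r in rows if r[-1] == "Success") / len(rows)
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

def gain(rows, attr):
    # Entropy before the split minus the weighted subset entropies.
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * ent(subset)
    return ent(rows) - remainder

print(round(gain(USA, 0), 3))  # BigStar: 0.811, a perfect split
print(round(gain(USA, 1), 3))  # Genre: roughly 0.12
```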

- Once the decision tree is built, new data can be classified (quickly).
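Putting the pieces together, here is a minimal ID3 sketch in Python; the nested-dict tree representation and all helper names are my own, and Australia/Brazil are grouped as `Other` as in the tree above:

```python
import math
from collections import Counter

def entropy_of(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, attributes):
    # rows: list of (attribute_dict, label) pairs.
    # Returns a label (leaf) or {"attr": name, "branches": {value: subtree}}.
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]                             # pure node: no entropy left
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority

    def gain(attr):
        # Information gain of splitting rows on attr.
        remainder = 0.0
        for value in {r[attr] for r, _ in rows}:
            subset = [label for r, label in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy_of(subset)
        return entropy_of(labels) - remainder

    best = max(attributes, key=gain)                 # most information gain
    rest = [a for a in attributes if a != best]
    return {"attr": best,
            "branches": {value: id3([(r, label) for r, label in rows
                                     if r[best] == value], rest)
                         for value in {r[best] for r, _ in rows}}}

def classify(tree, example):
    # Follow decisions down the tree until a leaf label is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

COLUMNS = ("CountryOfOrigin", "BigStar", "Genre")
DATA = [  # the training table above
    (("USA", "yes", "scifi"), "Success"),    (("USA", "no", "comedy"), "Failure"),
    (("USA", "yes", "comedy"), "Success"),   (("Europe", "no", "comedy"), "Success"),
    (("Europe", "yes", "scifi"), "Failure"), (("Europe", "yes", "romance"), "Failure"),
    (("Other", "yes", "comedy"), "Failure"), (("Other", "no", "scifi"), "Failure"),
    (("Europe", "yes", "comedy"), "Success"),(("USA", "yes", "comedy"), "Success"),
]
rows = [(dict(zip(COLUMNS, values)), label) for values, label in DATA]
tree = id3(rows, list(COLUMNS))
print(tree["attr"])  # root decision: CountryOfOrigin
print(classify(tree, dict(zip(COLUMNS, ("Europe", "no", "comedy")))))  # Success
```

Run on the example table, this reproduces the depth-two tree from the notes: CountryOfOrigin at the root, BigStar under USA, Genre under Europe.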