# Decision Trees

• Decision Trees
• Used to decide which category an element, represented by a vector of attribute values, belongs to.
• The attribute values must be discrete.
• The categories are Boolean (which can also be used to model non-Boolean categories).
• Example - "Will the movie be a success?"
```
                      CountryOfOrigin
                     /        |        \
                 USA/   Europe|         \Other
                   /          |          \
            BigStar         Genre       Failure
             /   \          /  |  \
         yes/     \no comedy/  |   \scifi
           /       \       / romance\
     Success  Failure Success Failure Failure
```
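The tree above can be encoded directly as nested dictionaries and walked to classify an example. A minimal sketch (names follow the figure; note the simplifying assumption that the caller maps countries other than USA and Europe to the literal value "Other"):

```python
# The example tree: inner nodes are dicts naming the attribute to test,
# leaves are category strings.
tree = {
    "attribute": "CountryOfOrigin",
    "branches": {
        "USA": {
            "attribute": "BigStar",
            "branches": {"yes": "Success", "no": "Failure"},
        },
        "Europe": {
            "attribute": "Genre",
            "branches": {"comedy": "Success", "romance": "Failure", "scifi": "Failure"},
        },
        "Other": "Failure",
    },
}

def classify(tree, example):
    """Walk the tree, following the branch for each tested attribute,
    until a leaf (a category string) is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

print(classify(tree, {"CountryOfOrigin": "USA", "BigStar": "yes", "Genre": "scifi"}))
# -> Success
```

Note that classifying a USA film never looks at Genre: each path only tests the attributes on that branch.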

• Decision tree learning
• Supervised learning of trees, based on training data
• The goal is to build a tree that is as shallow as possible, so that classifications are as quick as possible (e.g., the example above has depth two although three attributes are available).
• The best-known algorithm is ID3.
• ID3 builds the tree top-down, at each node selecting the attribute that provides the most information gain.
• Information gain
• Information gain is the difference between the entropy before and after a decision.
• Entropy = -pP * log2(pP) - pN * log2(pN), where
pP is the proportion of positive (training) examples and
pN is the proportion of negative (training) examples.
• Entropy is minimal (0) when all examples are positive or negative, maximal (1) when half are positive and half are negative.
• The entropy of a partition (a set of sets) is the weighted sum of the entropies of the subsets, each weighted by its share of the examples.
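The two definitions above can be sketched in a few lines (a minimal sketch; the function names are made up for illustration):

```python
import math

def entropy(p_pos, p_neg):
    """Entropy of a Boolean-labelled set, given the proportions of
    positive and negative examples (0 * log2(0) is taken as 0)."""
    h = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            h -= p * math.log2(p)
    return h

def weighted_entropy(subsets):
    """Entropy of a partition: the sum of subset entropies, each
    weighted by its share of the examples.
    `subsets` is a list of (n_pos, n_neg) counts."""
    total = sum(p + n for p, n in subsets)
    return sum((p + n) / total * entropy(p / (p + n), n / (p + n))
               for p, n in subsets if p + n > 0)

print(entropy(0.5, 0.5))    # maximal: 1.0
print(entropy(1.0, 0.0))    # minimal: 0.0
print(entropy(0.75, 0.25))  # e.g. 3 positives and 1 negative: ~0.811
```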
• Example:
```Film CountryOfOrigin BigStar Genre Success/Failure
--------------------------------------------------
1    USA             yes     scifi    Success
2    USA             no      comedy   Failure
3    USA             yes     comedy   Success
4    Europe          no      comedy   Success
5    Europe          yes     scifi    Failure
6    Europe          yes     romance  Failure
7    Australia       yes     comedy   Failure
8    Brazil          no      scifi    Failure
9    Europe          yes     comedy   Success
10   USA             yes     comedy   Success
```
The entropy is 1.0, because there are 5 successes and 5 failures.
• Information gain for CountryOfOrigin:
Entropy(USA) = -0.75 * log2(0.75) - 0.25 * log2(0.25) = 0.811
Entropy(Europe) = 1.0 (2 of each)
Entropy(Other) = 0.0 (2 Failures)
Entropy(CountryOfOrigin) = 0.4 * 0.811 + 0.4 * 1.0 + 0.2 * 0.0 = 0.7244
InformationGain(CountryOfOrigin) = 1.0 - 0.7244 = 0.2756
• Information gain for BigStar: 0.035
• Information gain for Genre: 0.174
• Best decision is CountryOfOrigin
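Recomputing the three gains from the raw counts in the table can be sketched as follows (a minimal sketch; `DATA` and the helper names are made up for illustration). Note that Australia and Brazil are kept as separate values here rather than grouped as "Other"; since each is a pure singleton subset with zero entropy, the weighted entropy, and therefore the gain, comes out the same:

```python
import math
from collections import defaultdict

# The ten training films from the table above: (country, big_star, genre, success).
DATA = [
    ("USA", "yes", "scifi", True),
    ("USA", "no", "comedy", False),
    ("USA", "yes", "comedy", True),
    ("Europe", "no", "comedy", True),
    ("Europe", "yes", "scifi", False),
    ("Europe", "yes", "romance", False),
    ("Australia", "yes", "comedy", False),
    ("Brazil", "no", "scifi", False),
    ("Europe", "yes", "comedy", True),
    ("USA", "yes", "comedy", True),
]
ATTRS = {"CountryOfOrigin": 0, "BigStar": 1, "Genre": 2}

def entropy(examples):
    """Entropy of a list of examples, from the proportions of labels."""
    n = len(examples)
    h = 0.0
    for label in (True, False):
        p = sum(1 for e in examples if e[3] == label) / n
        if p > 0:
            h -= p * math.log2(p)
    return h

def information_gain(examples, attr_index):
    """Entropy before the split minus the weighted entropy after it."""
    groups = defaultdict(list)
    for e in examples:
        groups[e[attr_index]].append(e)
    after = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(examples) - after

for name, i in ATTRS.items():
    print(f"{name}: {information_gain(DATA, i):.3f}")
```

CountryOfOrigin comes out highest, matching the choice above (the decimals differ slightly from hand calculations that use rounded intermediate entropies).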
• To build the tree, repeatedly expand the leaves in the same way until no entropy is left.
• For USA films, information gain for BigStar:
Entropy(yes) = 0.0 (all Success)
Entropy(no) = 0.0 (all Failure)
Entropy(USA:BigStar) = 0.0
InformationGain(USA:BigStar) = 0.811 - 0.0 = 0.811
• For USA films, information gain for Genre:
Entropy(scifi) = 0.0 (all Success)
Entropy(comedy) = -0.67 * log2(0.67) - 0.33 * log2(0.33) = 0.92
Entropy(romance) = undefined (no USA film is a romance, so the branch contributes nothing)
Entropy(USA:Genre) = 0.25 * 0.0 + 0.75 * 0.92 = 0.69
InformationGain(USA:Genre) = 0.811 - 0.69 = 0.121
• For USA films, the best decision is BigStar, and no entropy is left
• For European films, the best decision is Genre, and no entropy is left
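The whole procedure — pick the highest-gain attribute, split the examples on its values, and recurse until a subset is pure — can be sketched as a small ID3 implementation (a minimal sketch under the assumptions of these notes: Boolean categories and discrete attributes; `FILMS` and the function names are made up for illustration):

```python
import math
from collections import defaultdict

def entropy(examples):
    """Entropy of a list of (attributes, label) pairs."""
    n = len(examples)
    h = 0.0
    for label in (True, False):
        p = sum(1 for _, lbl in examples if lbl == label) / n
        if p > 0:
            h -= p * math.log2(p)
    return h

def id3(examples, attributes):
    """examples: list of (attr_dict, bool_label); attributes: names still usable.
    Returns a leaf string or a {"attribute": ..., "branches": ...} node."""
    labels = {lbl for _, lbl in examples}
    if len(labels) == 1:                      # pure subset: no entropy left
        return "Success" if labels.pop() else "Failure"
    if not attributes:                        # attributes exhausted: majority vote
        pos = sum(1 for _, lbl in examples if lbl)
        return "Success" if pos >= len(examples) - pos else "Failure"

    def gain(attr):                           # information gain of splitting on attr
        groups = defaultdict(list)
        for ex in examples:
            groups[ex[0][attr]].append(ex)
        after = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
        return entropy(examples) - after

    best = max(attributes, key=gain)          # highest information gain wins
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[0][best]].append(ex)
    rest = [a for a in attributes if a != best]
    return {"attribute": best,
            "branches": {v: id3(g, rest) for v, g in groups.items()}}

# The ten training films from the table above.
FILMS = [
    ({"CountryOfOrigin": "USA", "BigStar": "yes", "Genre": "scifi"}, True),
    ({"CountryOfOrigin": "USA", "BigStar": "no", "Genre": "comedy"}, False),
    ({"CountryOfOrigin": "USA", "BigStar": "yes", "Genre": "comedy"}, True),
    ({"CountryOfOrigin": "Europe", "BigStar": "no", "Genre": "comedy"}, True),
    ({"CountryOfOrigin": "Europe", "BigStar": "yes", "Genre": "scifi"}, False),
    ({"CountryOfOrigin": "Europe", "BigStar": "yes", "Genre": "romance"}, False),
    ({"CountryOfOrigin": "Australia", "BigStar": "yes", "Genre": "comedy"}, False),
    ({"CountryOfOrigin": "Brazil", "BigStar": "no", "Genre": "scifi"}, False),
    ({"CountryOfOrigin": "Europe", "BigStar": "yes", "Genre": "comedy"}, True),
    ({"CountryOfOrigin": "USA", "BigStar": "yes", "Genre": "comedy"}, True),
]

tree = id3(FILMS, ["CountryOfOrigin", "BigStar", "Genre"])
print(tree["attribute"])  # -> CountryOfOrigin
```

Running this reproduces the walkthrough: CountryOfOrigin at the root, BigStar under USA, and Genre under Europe (with Australia and Brazil as separate pure leaves rather than a merged "Other" branch).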
• Once the decision tree is built, new data can be classified (quickly).