Example:
Film  CountryOfOrigin  BigStar  Genre    Success/Failure
--------------------------------------------------------
1     USA              yes      scifi    Success
2     USA              no       comedy   Failure
3     USA              yes      comedy   Success
4     Europe           no       comedy   Success
5     Europe           yes      scifi    Failure
6     Europe           yes      romance  Failure
7     Australia        yes      comedy   Failure
8     Brazil           no       scifi    Failure
9     Europe           yes      comedy   Success
10    USA              yes      comedy   Success
The entropy of the whole set is 1.0, because there are 5 successes and 5 failures:
-0.5 * log2(0.5) - 0.5 * log2(0.5) = 1.0
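The entropy figure above can be checked with a few lines of Python. This is only a minimal sketch; the helper name `entropy_from_counts` is made up for illustration and does not come from the notes.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    # Classes with a zero count are skipped (0 * log2(0) is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 5 successes and 5 failures in the full set of films
print(entropy_from_counts([5, 5]))
```

The same helper can be applied to any subset, for example `entropy_from_counts([3, 1])` for a group with 3 successes and 1 failure.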
- Information gain for CountryOfOrigin:
  Entropy(USA) = -0.75 * log2(0.75) - 0.25 * log2(0.25) = 0.811 (3 Successes, 1 Failure)
  Entropy(Europe) = 1.0 (2 of each)
  Entropy(Other) = 0.0 (Australia and Brazil: 2 Failures)
  Entropy(CountryOfOrigin) = 0.4 * 0.811 + 0.4 * 1.0 + 0.2 * 0.0 = 0.7244
    (each group's entropy weighted by its share of the 10 films)
  InformationGain(CountryOfOrigin) = 1.0 - 0.7244 = 0.2756
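- Information gain for BigStar: 0.035
  (yes: 4 Successes, 3 Failures; no: 1 Success, 2 Failures)
- Information gain for Genre: 0.174
  (scifi: 1 Success, 2 Failures; comedy: 4 Successes, 2 Failures; romance: 1 Failure)
- Best decision is CountryOfOrigin, because it has the highest information gain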
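The comparison above can be reproduced with a short Python sketch. The names `FILMS`, `ATTRIBUTES`, `entropy` and `information_gain` are illustrative choices, not part of the notes. One assumption to flag: the sketch keeps Australia and Brazil as separate groups instead of merging them into "Other"; since both groups contain only failures (entropy 0), the gain comes out the same.

```python
import math
from collections import Counter, defaultdict

# (CountryOfOrigin, BigStar, Genre, outcome) for films 1-10, copied from the table.
FILMS = [
    ("USA", "yes", "scifi", "Success"),
    ("USA", "no", "comedy", "Failure"),
    ("USA", "yes", "comedy", "Success"),
    ("Europe", "no", "comedy", "Success"),
    ("Europe", "yes", "scifi", "Failure"),
    ("Europe", "yes", "romance", "Failure"),
    ("Australia", "yes", "comedy", "Failure"),
    ("Brazil", "no", "scifi", "Failure"),
    ("Europe", "yes", "comedy", "Success"),
    ("USA", "yes", "comedy", "Success"),
]
# Column index of each attribute within a row (hypothetical layout for this sketch).
ATTRIBUTES = {"CountryOfOrigin": 0, "BigStar": 1, "Genre": 2}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the outcomes minus the weighted entropy after splitting on column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # collect outcomes per attribute value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted

gains = {name: information_gain(FILMS, col) for name, col in ATTRIBUTES.items()}
best = max(gains, key=gains.get)  # attribute with the largest gain

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Best decision:", best)
```

Running it prints the gain for each attribute and the attribute chosen for the split. The last digit may differ slightly from a hand calculation, because the code does not round intermediate entropies such as 0.811.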