Decision tree learning is a method of inductive inference for approximating discrete-valued target functions. It is widely used, robust to noisy data, and a practical method for learning disjunctive expressions.

Appropriate Problems for Decision Tree Learning

Decision tree learning is generally best suited to problems with the following characteristics: instances are represented by attribute-value pairs, the target function has discrete output values, disjunctive descriptions may be required, and the training data may contain errors or missing attribute values.

Once learned, a decision tree can also be re-represented as a set of if-then rules to improve readability.

Decision Tree Representation

A decision tree is an arrangement of tests that provides an appropriate classification at every step in an analysis.

"In general, decision trees represent a disjunction of conjunctions of constraints on the attribute-values of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions" (Mitchell, 1997, p.53).

More specifically, decision trees classify instances by sorting them down the tree from the root node to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.

An instance is classified by starting at the root node of the decision tree, testing the attribute specified by this node, and then moving down the branch corresponding to the instance's value for that attribute. The process is repeated at the node reached on that branch, and so on, until a leaf node is reached.
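The classification procedure just described can be sketched in a few lines of Python, again using the hypothetical nested-dict representation from the sketch above (an assumption, not the text's own notation):

# Same hypothetical representation: {attribute: {attribute value: subtree-or-label}}.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, instance):
    """Sort an instance down the tree: test the attribute at the current node,
    follow the branch matching the instance's value, repeat until a leaf."""
    node = tree
    while isinstance(node, dict):             # internal node
        (attribute, branches), = node.items()
        node = branches[instance[attribute]]  # descend the branch for this value
    return node                               # the leaf holds the class label

print(classify(tree, {"Outlook": "Rain", "Wind": "Weak"}))   # -> Yes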

[Diagram: decision tree representation]

Occam's Razor (Specialized to Decision Trees)

"The world is inherently simple. Therefore the smallest decision tree that is consistent
with the samples is the one that is most likely to identify unknown objects correctly." (

[Diagram: example decision tree]

Given m attributes, a decision tree may have a maximum height of m, since each attribute is tested at most once along any path from the root to a leaf.

Rather than building all possible trees, measuring the size of each, and choosing the smallest tree that fits the data, we use Quinlan's ID3 algorithm, which constructs the tree greedily by selecting, at each node, the attribute whose test provides the highest information gain.
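As a rough illustration of that greedy strategy, here is a simplified Python sketch of the ID3 idea: measure the entropy of the class labels, pick the attribute with the highest information gain, split the examples on its values, and recurse. It assumes discrete attributes and a dataset given as a list of dicts; the names examples, attributes, and target are placeholders, and details of the full algorithm (missing values, pruning, continuous attributes) are omitted.

from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of the class-label distribution over a set of examples."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Expected entropy reduction from splitting the examples on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    """Greedy tree construction: pick the highest-gain attribute, split, recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # pure node: return the single label
        return labels[0]
    if not attributes:                        # no tests left: return the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    branches = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best], target)
    return {best: branches}

# Usage sketch with hypothetical PlayTennis-style data:
# tree = id3(examples, attributes=["Outlook", "Humidity", "Wind"], target="PlayTennis")

The resulting nested-dict tree has the same shape as the hypothetical representation used in the earlier sketches, so the classify function above can be applied to it directly.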