An introduction to machine learning with decision trees


Decision trees are one of the most popular machine learning methods used to solve classification and regression problems. In this post we’ll look specifically at classification: first how to train a decision tree classifier, then a closer look at how the algorithm learns and generates decision rules.

Technically speaking, a decision tree is used to build a model that predicts the class (y) of an object from its observed features (x) by learning decision rules (f(x)) inferred from prior examples (the training data). You can learn more about this in my previous blog post on supervised learning.

More simply, the decision tree algorithm tries to solve a problem by representing it as a tree. For example, given a dataset about user experience, a classification algorithm will generate a set of rules or questions it can use to predict whether a user will convert.

Why choose decision trees?

Decision trees are popular for several reasons, including…

  • They mimic human-level thinking when making decisions

  • They are transparent, allowing users to see the logic applied when making the decision (unlike black box algorithms such as neural networks)

  • They are easy to understand: even complex models can be made clearer by visualising them

An example

There are many reasons why you might want to set up a classification procedure: sorting objects into classes, making decisions based on the information available, or forecasting a class based on historical information. This simple decision tree classifies whether a given day is suitable for playing tennis based on the weather conditions…

Image credit: Princeton University
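
A decision tree can also be read as a set of nested if/else rules. The sketch below is only an illustration: it assumes the tree follows the widely used play-tennis example (outlook, then humidity or wind), and the function name `play_tennis` and the attribute values are hypothetical rather than taken from the figure.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """Hypothetical rules, assuming the commonly used play-tennis tree."""
    # Root node: the first question is about the outlook.
    if outlook == "overcast":
        return "yes"  # leaf node: no further questions needed
    if outlook == "sunny":
        # Internal node: on sunny days the decision depends on humidity.
        return "yes" if humidity == "normal" else "no"
    if outlook == "rain":
        # Internal node: on rainy days the decision depends on wind.
        return "yes" if wind == "weak" else "no"
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_tennis("sunny", "high", "weak"))  # no
```

Each `if` corresponds to a node in the tree and each `return` to a leaf, which is why the logic is so easy to inspect.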

Training the classifier:

There are a few steps to go through when training a decision tree classifier. The first step is to split the data into training and test subsets. The test subset is hidden from the classifier during training and later used to evaluate it.

Next, train the classifier on the training data, then use the trained classifier to predict the classes for the test data.

Since the actual classes for the test data are known, they can be compared with the predictions to evaluate the classifier using an accuracy score (the percentage of correct predictions) and a confusion matrix (the counts of correct and incorrect predictions for each class). The final step is to visualise the decision tree.
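
The steps above could be sketched as follows. This post doesn't name a library, so the use of scikit-learn here is an assumption, as are the iris dataset (a stand-in for your own data), the 70/30 split, the `max_depth=3` limit and the fixed `random_state`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris stands in for your own features (X) and classes (y).
iris = load_iris()
X, y = iris.data, iris.target

# Step 1: split the data, holding back 30% as a hidden test subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Step 2: train the classifier on the training subset only.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Step 3: predict the classes for the test subset.
y_pred = clf.predict(X_test)

# Step 4: compare predictions with the actual classes.
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
print(matrix)

# Step 5: visualise the learned rules as indented text.
print(export_text(clf, feature_names=list(iris.feature_names)))
```

Limiting the depth keeps the printed tree small enough to read; a deeper tree may fit the training data more closely but is harder to interpret.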

How the algorithm learns:

Decision trees mimic human decision-making when it comes to solving classification problems. They use the attributes available to generate a series of questions or criteria that split the training data into progressively smaller subsets, until the subsets cannot be split further.

These 4 points outline how the algorithm learns…

  1. It starts by identifying the attribute that is the best predictor of the class and places this at the top of the tree, also known as the root node.

    a) Statistical measures such as information gain are used to determine the order in which attributes are used. Information gain provides a measure of the expected reduction in uncertainty that results from splitting the dataset on a given attribute.

  2. It then generates a criterion for separating the data based on the selected attribute and the relative frequency with which each class occurs in the dataset, expressed formally as the prior probability distribution.

  3. It then forms branches that split the dataset into subsets, known as internal nodes.

    a) The classifier uses the Gini index to score how good a split is based on the purity of the resulting subsets. A perfect split, where every subset contains a single class, scores 0, whilst the worst case for two classes, a 50:50 mix, scores 0.5.

  4. It then repeats steps 1 to 3 on each subset until it reaches a subset that cannot be divided further, known as a leaf node.
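
To make the two measures mentioned above concrete, here is a minimal sketch in plain Python. The six-row dataset and the function names (`entropy`, `gini`, `information_gain`, `split_by`) are made up for illustration, and information gain is computed in the usual way as the entropy before the split minus the size-weighted entropy after it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Uncertainty (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: 0 for a pure subset, 0.5 for a 50:50 two-class mix."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_by(rows, labels, attribute):
    """Group the class labels by each row's value for the given attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return groups


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the size-weighted entropy after it."""
    groups = split_by(rows, labels, attribute)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Hypothetical toy data: should we play, given the outlook and wind?
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "overcast", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

# The attribute with the highest information gain becomes the root node.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)  # outlook
```

In this toy data, splitting on outlook leaves two of the three subsets completely pure, so it removes far more uncertainty than splitting on wind and is chosen first.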

How to use decision trees in your business

In a world where organisations hold so much data on their customers, decision trees offer a way of extracting useful information from datasets and are frequently used to build models that segment customers based on their behaviour and the desired action.

As far back as 2007, a study (Lee et al.) applied a decision tree model to investigate the relationship between customers’ needs and the success of online shopping. The study classified users into two categories: people who rarely shop online and people who frequently shop online. For people who rarely shop online, the model suggested that one of the most important factors was how urgently a customer needed to purchase a product. For those who shop online frequently, the main factor was price. In a practical sense, this application of decision trees may help an online retailer predict the likelihood of someone completing an online purchase.