Having one or two algorithms that perform reasonably well on a problem is a good start, but sometimes you need the best result you can get given the time and resources you have available.
In this post, you will review methods you can use to squeeze out extra performance and improve the results you are getting from machine learning algorithms.
When tuning algorithms you must have high confidence in the results given by your test harness. This means using techniques that reduce the variance of the performance measure you use to assess algorithm runs. I suggest cross validation with a reasonably high number of folds (the exact number depends on your dataset); a minimal sketch of such a harness follows the list of strategies below. The three strategies you will learn about in this post are:
- Algorithm Tuning
- Ensembles
- Extreme Feature Engineering
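For example, a minimal harness along these lines might look like the sketch below, assuming scikit-learn; the generated dataset and the random forest are placeholders, so substitute your own data and models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder dataset; substitute your own features X and labels y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

model = RandomForestClassifier(random_state=7)

# 10-fold cross validation gives a lower-variance estimate of performance
# than a single train/test split.
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print("accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```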
Algorithm Tuning
The place to start is to get better results from algorithms that you already know perform well on your problem. You can do this by exploring and fine tuning the configuration for those algorithms.
Machine learning algorithms are parameterized and modification of those parameters can influence the outcome of the learning process. Think of each algorithm parameter as a dimension on a graph with the values of a given parameter as a point along the axis. Three parameters would be a cube of possible configurations for the algorithm, and n-parameters would be an n-dimensional hypercube of possible configurations for the algorithm.
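To make that geometry concrete, the short sketch below enumerates every point in a three-parameter grid; the parameter names are illustrative and not tied to any particular algorithm:

```python
from itertools import product

# Each parameter is an axis; every combination of values is one point
# in the configuration space.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],  # axis 1: 3 values
    "max_depth": [3, 5, 7],             # axis 2: 3 values
    "n_estimators": [100, 500],         # axis 3: 2 values
}

# Three axes give a 3 x 3 x 2 grid of 18 candidate configurations.
names = list(param_grid)
for values in product(*param_grid.values()):
    print(dict(zip(names, values)))
```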
The objective of algorithm tuning is to find the best point or points in that hypercube for your problem. You will be optimizing against your test harness, so again, you cannot overstate the importance of spending the time to build a trusted test harness.
You can approach this search problem by using automated methods that impose a grid on the possibility space and sample regions where good algorithm configurations might lie. You can then use those points in an optimization algorithm to zoom in on the best performance.
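A sketch of this coarse-grid-then-zoom approach, again assuming scikit-learn; the SVM and its parameter ranges are placeholder choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder dataset; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# A coarse grid over two SVM parameters; widen or narrow it for your problem.
coarse_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}

# Every grid point is scored with the same 10-fold harness as before.
search = GridSearchCV(SVC(), coarse_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)

# To zoom in, repeat with a finer grid centered on the winning point,
# e.g. if C=10, gamma=0.01 wins, try C in [5, 10, 20] and
# gamma in [0.005, 0.01, 0.02].
```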
You can repeat this process with a number of well-performing methods and explore the best you can achieve with each. I strongly advise automating the process and keeping it reasonably coarse grained, as you can quickly reach points of diminishing returns (fractional percentage performance increases) that may not translate to the production system.
The more tuned the parameters of an algorithm are, the more biased the algorithm will be to the training data and test harness. This strategy can be effective, but it can also lead to more fragile models that overfit your test harness and don't perform as well in practice.
Ensembles
Ensemble methods are concerned with combining the results of multiple methods in order to get improved results. Ensemble methods work well when you have multiple “good enough” models that specialize in different parts of the problem.
There are many ways to achieve this. Three ensemble strategies you can explore are listed below, with a short code sketch after the list:
- Bagging: Known more formally as Bootstrap Aggregation, where the same algorithm is given different perspectives on the problem by being trained on different subsets of the training data.
- Boosting: Models are trained in sequence on the same training data, with each new model focusing on the examples that the previous models got wrong.
- Blending: Known more formally as Stacked Generalization or Stacking, where the predictions of a variety of models are taken as input to a new model that learns how to combine them into an overall prediction.
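Here is a minimal sketch of all three strategies, assuming scikit-learn; the dataset, base models, and meta-model are illustrative choices only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

# Bagging: one algorithm, many bootstrap samples of the training data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=7)

# Boosting: models built in sequence, each correcting its predecessors.
boosting = GradientBoostingClassifier(random_state=7)

# Stacking (blending): a meta-model learns to combine base predictions.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("logreg", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    scores = cross_val_score(model, X, y, cv=10)
    print("%s: %.3f" % (name, scores.mean()))
```

Note that the stacking meta-model here is deliberately simple; a linear model is a common choice because it combines predictions without adding much extra capacity to overfit.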
It is a good idea to get into ensemble methods after you have exhausted more traditional methods. There are two good reasons for this: ensembles are generally more complex than traditional methods, and the traditional methods give you a good baseline from which you can improve and draw models for your ensembles.
Summary
In this post you learned about three strategies for getting improved results from machine learning algorithms on your problem:
- Algorithm Tuning, where discovering the best models is treated like a search problem through model parameter space.
- Ensembles, where the predictions made by multiple models are combined.
- Extreme Feature Engineering, where the attribute decomposition and aggregation seen in data preparation is pushed to the limits.
Resources
If you are looking to dive deeper into this subject, take a look at the resources below.
- Machine Learning for Hackers, Chapter 12: Model Comparison
- Data Mining: Practical Machine Learning Tools and Techniques, Chapter 7: Transformations: Engineering the input and output
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Chapter 16: Ensemble Learning