Working on a problem, you are always looking to get the most out of the data that you have available. You want the best accuracy you can get.
Typically, the biggest wins come from better understanding the problem you are solving. This is why I stress spending so much time up front defining your problem, analyzing the data, and preparing datasets for your models.
A key part of data preparation is creating transforms of the dataset, such as rescaled attribute values and attributes decomposed into their constituent parts, all with the intention of exposing more useful structure to the modeling algorithms.
An important suite of methods to employ when preparing the dataset are automatic feature selection algorithms. In this post you will discover feature selection, the benefits of simple feature selection and how to make best use of these algorithms in Weka on your dataset.
Kick-start your project with my new book Machine Learning Mastery With Weka, including step-by-step tutorials and clear screenshots for all examples.
Let’s get started.
Not All Attributes Are Equal
Whether you select and gather sample data yourself or whether it is provided to you by domain experts, the selection of attributes is critically important. It is important because it can mean the difference between successfully and meaningfully modeling the problem and not.
Misleading
Including redundant attributes can be misleading to modeling algorithms. Instance-based methods such as k-nearest neighbor use small neighborhoods in the attribute space to determine classification and regression predictions. These predictions can be greatly skewed by redundant attributes.
Overfitting
Keeping irrelevant attributes in your dataset can result in overfitting. Decision tree algorithms like C4.5 seek to make optimal splits in attribute values. Those attributes that are more correlated with the prediction are split on first. Deeper in the tree, less relevant and irrelevant attributes are used to make prediction decisions that may only be beneficial by chance in the training dataset. This overfitting of the training data can negatively affect the modeling power of the method and cripple the predictive accuracy.
It is important to remove redundant and irrelevant attributes from your dataset before evaluating algorithms. This task should be tackled in the Prepare Data step of the applied machine learning process.
Feature Selection
Feature selection, or attribute selection, is the process of automatically searching for the best subset of attributes in your dataset. The notion of “best” is relative to the problem you are trying to solve, but typically means highest accuracy.
A useful way to think about the problem of selecting attributes is a state-space search. The search space is discrete and consists of all possible combinations of attributes you could choose from the dataset. The objective is to navigate through the search space and locate the best or a good enough combination that improves performance over selecting all attributes.
Three key benefits of performing feature selection on your data are:
- Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise.
- Improves Accuracy: Less misleading data means modeling accuracy improves.
- Reduces Training Time: Less data means that algorithms train faster.
Attribute Selection in Weka
Weka provides an attribute selection tool. The process is separated into two parts:
- Attribute Evaluator: Method by which attribute subsets are assessed.
- Search Method: Method by which the space of possible subsets is searched.
Attribute Evaluator
The Attribute Evaluator is the method by which a subset of attributes is assessed. For example, a subset may be assessed by building a model from it and evaluating the accuracy of that model.
Some examples of attribute evaluation methods are:
- CfsSubsetEval: Values subsets of attributes that correlate highly with the class value and have low correlation with each other.
- ClassifierSubsetEval: Assesses subsets using a predictive algorithm and another dataset that you specify.
- WrapperSubsetEval: Assesses subsets using a classifier that you specify and n-fold cross-validation.
Search Method
The Search Method is the structured way in which the search space of possible attribute subsets is navigated based on the subset evaluation. Baseline methods include Random Search and Exhaustive Search, although graph search algorithms such as Best First Search are also popular.
Some examples of search methods are:
- Exhaustive: Tests all combinations of attributes.
- BestFirst: Uses a best-first search strategy to navigate attribute subsets.
- GreedyStepwise: Uses a forward (additive) or backward (subtractive) step-wise strategy to navigate attribute subsets.
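To make the pairing of Attribute Evaluator and Search Method concrete, here is a minimal sketch using the Weka Java API that combines CfsSubsetEval with BestFirst. The dataset file name (iris.arff) and the assumption that the class is the last attribute are placeholders for illustration; swap in your own dataset.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectAttributesExample {
    public static void main(String[] args) throws Exception {
        // Load the dataset and mark the last attribute as the class (assumption)
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Pair an Attribute Evaluator with a Search Method
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());

        // Run the search and report the selected subset
        selector.SelectAttributes(data);
        System.out.println(selector.toResultsString());
    }
}
```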
How to Use Attribute Selection in Weka
In this section I want to share with you three clever ways of using attribute selection in Weka.
1. Explore Attribute Selection
When you are just starting out with attribute selection I recommend playing with a few of the methods in the Weka Explorer.
Load your dataset and click the “Select attributes” tab. Try out different Attribute Evaluators and Search Methods on your dataset and review the results in the output window.
The idea is to get a feeling and build up an intuition for 1) how many and 2) which attributes are selected for your problem. You could use this information going forward into either or both of the next steps.
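If you prefer to do this exploration from code rather than the Explorer, one option is to rank the individual attributes. The sketch below uses InfoGainAttributeEval with the Ranker search to print an importance score for each attribute; the file name is again an illustrative assumption.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ExploreAttributeRanking {
    public static void main(String[] args) throws Exception {
        // Load the dataset, assuming the class is the last attribute
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Rank each attribute by its information gain with respect to the class
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(new Ranker());

        // Run the ranking and print the scores and attribute order
        selector.SelectAttributes(data);
        System.out.println(selector.toResultsString());
    }
}
```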
2. Prepare Data with Attribute Selection
The next step would be to use attribute selection as part of your data preparation step.
There is a filter you can use when preprocessing your dataset that will run an attribute selection scheme and then trim your dataset to only the selected attributes. The filter is called “AttributeSelection” under the Supervised Attribute filters.
You can then save the dataset for use in experiments when spot checking algorithms.
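As a rough sketch of that preparation step outside the GUI, the filter version of attribute selection (weka.filters.supervised.attribute.AttributeSelection) can be applied to the data and the reduced dataset saved to a new ARFF file. The input and output file names are placeholders.

```java
import java.io.File;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class PrepareDataWithAttributeSelection {
    public static void main(String[] args) throws Exception {
        // Load the dataset, assuming the class is the last attribute
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Configure the AttributeSelection filter with an evaluator and a search method
        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new CfsSubsetEval());
        filter.setSearch(new BestFirst());
        filter.setInputFormat(data);

        // Apply the filter, keeping only the selected attributes (plus the class)
        Instances reduced = Filter.useFilter(data, filter);

        // Save the trimmed dataset for later experiments
        ArffSaver saver = new ArffSaver();
        saver.setInstances(reduced);
        saver.setFile(new File("diabetes-selected.arff"));
        saver.writeBatch();
    }
}
```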
3. Run Algorithms with Attribute Selection
Finally, there is one more clever way you can use attribute selection, and that is to couple it with the algorithm directly.
There is a meta algorithm you can run and include in experiments that selects attributes before running the algorithm. The algorithm is called “AttributeSelectedClassifier” under the “meta” group of algorithms. You can configure this algorithm to use your algorithm of choice as well as the Attribute Evaluator and Search Method of your choosing.
You can include multiple versions of this meta algorithm configured with different variations and configurations of the attribute selection scheme and see how they compare to each other.
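Here is a minimal sketch of that idea in the Weka Java API: an AttributeSelectedClassifier wrapping a base classifier, configured with CfsSubsetEval and BestFirst, and evaluated with 10-fold cross-validation. The choice of J48 as the base classifier and the dataset file name are illustrative assumptions.

```java
import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectedClassifierExample {
    public static void main(String[] args) throws Exception {
        // Load the dataset, assuming the class is the last attribute
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Wrap the base classifier so attribute selection happens inside each training fold
        AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
        classifier.setClassifier(new J48());
        classifier.setEvaluator(new CfsSubsetEval());
        classifier.setSearch(new BestFirst());

        // Estimate performance with 10-fold cross-validation
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(classifier, data, 10, new Random(1));
        System.out.println(evaluation.toSummaryString());
    }
}
```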
Summary
In this post you discovered feature selection as a suite of methods that can increase model accuracy, decrease model training time and reduce overfitting.
You also discovered that feature selection methods are built into Weka and you learned three clever ways for using feature selection methods on your dataset in Weka, namely by exploring, preparing data and in coupling it with your algorithm in a meta classifier.
Wikipedia has a good entry on Feature Selection.
If you are looking for the next step, I recommend the book Feature Extraction: Foundations and Applications. It is a collection of articles by academics covering a range of issues on and related to feature selection. It’s pricey but well worth it because of the difference the methods it describes can make in solving your problem.