Friday 26 April 2024

How to Use Machine Learning Algorithms in Weka

A big benefit of using the Weka platform is the large number of machine learning algorithms it supports.

The more algorithms you can try on your problem, the more you will learn about it and the closer you are likely to get to discovering the one or few algorithms that perform best.

In this post you will discover the machine learning algorithms supported by Weka.

After reading this post you will know:

  • The different types of machine learning algorithms supported and key algorithms to try in Weka.
  • How algorithms can be configured in Weka and how to save and load good algorithm configurations.
  • How to learn more about the machine learning algorithms supported by Weka.

    Weka Machine Learning Algorithms

    Weka has a lot of machine learning algorithms. This is great; it is one of the big benefits of using Weka as a platform for machine learning.

    A downside is that it can be a little overwhelming to know which algorithms to use, and when. Also, the algorithms have names that may not be familiar to you, even if you know the algorithms themselves from other contexts.

    In this section we will start by looking at some well-known algorithms supported by Weka. What you learn here applies to the machine learning algorithms used across the Weka platform, but the Explorer is the best place to learn about them because they are all available in one place.

    1. Open the Weka GUI Chooser.
    2. Click the “Explorer” button to open the Weka explorer.
    3. Open a dataset, such as the Pima Indians dataset from the data/diabetes.arff file in your Weka installation.
    4. Click “Classify” to open the Classify tab.

    The Classify tab of the Explorer is where you can learn about the various algorithms and explore predictive modeling.

    You can choose a machine learning algorithm by clicking the “Choose” button.

    Weka Choose a Machine Learning Algorithm

    Clicking on the “Choose” button presents you with a list of machine learning algorithms to choose from. They are divided into a number of main groups:

    • bayes: Algorithms that use Bayes' Theorem in some core way, like Naive Bayes.
    • functions: Algorithms that estimate a function, like Linear Regression.
    • lazy: Algorithms that use lazy learning, like k-Nearest Neighbors.
    • meta: Algorithms that use or combine multiple algorithms, like Ensembles.
    • misc: Implementations that do not neatly fit into the other groups, like running a saved model.
    • rules: Algorithms that use rules, like One Rule.
    • trees: Algorithms that use decision trees, like Random Forest.
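    These group names are not arbitrary: in Weka's Java API each group corresponds to a package under weka.classifiers (for example, J48 lives at weka.classifiers.trees.J48). The sketch below, which does not need Weka on the classpath, reads the group back out of a fully qualified class name; the groupOf helper is hypothetical and written only for illustration.

```java
import java.util.List;

/** Illustrative helper (not part of Weka): maps a classifier's class name to its "Choose" group. */
final class WekaGroups {

    private static final String PREFIX = "weka.classifiers.";

    /** Returns the group, e.g. "trees" for "weka.classifiers.trees.J48". */
    static String groupOf(String className) {
        if (!className.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a Weka classifier name: " + className);
        }
        String rest = className.substring(PREFIX.length());
        int dot = rest.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No group in: " + className);
        }
        // The first package segment after weka.classifiers is the group name.
        return rest.substring(0, dot);
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "weka.classifiers.bayes.NaiveBayes",
                "weka.classifiers.functions.LinearRegression",
                "weka.classifiers.lazy.IBk",
                "weka.classifiers.meta.Bagging",
                "weka.classifiers.rules.OneR",
                "weka.classifiers.trees.RandomForest");
        for (String name : names) {
            System.out.println(groupOf(name) + " <- " + name);
        }
    }
}
```

    Knowing the package is handy when you later move from the Explorer to the command line or Java code, where classifiers are referred to by their full class names.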

    The tab is called “Classify” and the algorithms are listed under an overarching group called “Classifiers”. Nevertheless, Weka supports both classification (predict a category) and regression (predict a numeric value) predictive modeling problems.

    The type of problem you are working with is defined by the variable you wish to predict. On the “Classify” tab this is selected below the test options. By default, Weka selects the last attribute in your dataset. If the attribute is nominal, then Weka assumes you are working on a classification problem. If the attribute is numeric, Weka assumes you are working on a regression problem.
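    The default rule just described can be sketched in a few lines. This is a simplified illustration, not Weka's ARFF loader: it assumes plain "@attribute name type" lines with unquoted names, treats a "{...}" type as nominal and numeric/real/integer as numeric, and ignores other ARFF types. In Weka's own Java API you make the choice explicit by calling Instances.setClassIndex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: last attribute is the class; nominal => classification, numeric => regression. */
final class ProblemType {

    enum Kind { CLASSIFICATION, REGRESSION }

    /** Returns the type part of every "@attribute" line, in declaration order. */
    static List<String> attributeTypes(String arffHeader) {
        List<String> types = new ArrayList<>();
        for (String line : arffHeader.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("@attribute")) {
                // Split into "@attribute", the name, and the rest (the type).
                String[] parts = trimmed.split("\\s+", 3);
                types.add(parts.length == 3 ? parts[2].trim() : "");
            }
        }
        return types;
    }

    /** Decides the problem type from the last declared attribute, mirroring the default choice. */
    static Kind kindOfLastAttribute(String arffHeader) {
        List<String> types = attributeTypes(arffHeader);
        if (types.isEmpty()) {
            throw new IllegalArgumentException("No @attribute lines found");
        }
        String classType = types.get(types.size() - 1);
        if (classType.startsWith("{")) {
            return Kind.CLASSIFICATION; // nominal: a set of category labels
        }
        String t = classType.toLowerCase(Locale.ROOT);
        if (t.equals("numeric") || t.equals("real") || t.equals("integer")) {
            return Kind.REGRESSION;
        }
        throw new IllegalArgumentException("Unsupported class type: " + classType);
    }

    public static void main(String[] args) {
        // Illustrative header only, not the full contents of any shipped dataset.
        String header = String.join("\n",
                "@relation example",
                "@attribute plas numeric",
                "@attribute age numeric",
                "@attribute class {tested_negative,tested_positive}",
                "@data");
        System.out.println(kindOfLastAttribute(header)); // CLASSIFICATION
    }
}
```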

    Weka Choose an Output Attribute to Predict

    This is important because the type of problem you are working on determines which algorithms you can use. For example, if you are working on a classification problem, you cannot use regression algorithms like Linear Regression. On the other hand, if you are working on a regression problem, you cannot use classification algorithms like Logistic Regression.

    Note that if you are confused by the word “regression”, that is OK. It is confusing. Regression is a historical word from statistics. It used to mean making a model for a numerical output (to regress). Today it refers both to the task of predicting a numerical value and to the names of some algorithms.

    Weka will gray out algorithms that do not support your chosen problem type. Many machine learning algorithms can be used for both classification and regression, so you will have access to a large suite of algorithms regardless of your chosen problem.
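    The graying out is essentially a filter over each algorithm's capabilities. Weka reads these from each classifier (its getCapabilities() method); the sketch below instead uses a small, hand-written capability table that is an illustrative assumption covering only the three algorithms named in this post, so check the “Capabilities” button for the real values.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of filtering algorithms by problem type using a hypothetical capability table. */
final class AvailableAlgorithms {

    enum Problem { CLASSIFICATION, REGRESSION }

    // Hand-written for illustration; Weka derives this from each classifier instead.
    static final Map<String, Set<Problem>> SUPPORTS = new LinkedHashMap<>();

    static {
        SUPPORTS.put("functions.LinearRegression", EnumSet.of(Problem.REGRESSION));
        SUPPORTS.put("functions.Logistic", EnumSet.of(Problem.CLASSIFICATION));
        SUPPORTS.put("lazy.IBk", EnumSet.of(Problem.CLASSIFICATION, Problem.REGRESSION));
    }

    /** Returns the algorithms that would stay selectable (not grayed out) for the problem. */
    static List<String> usableFor(Problem problem) {
        return SUPPORTS.entrySet().stream()
                .filter(e -> e.getValue().contains(problem))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Problem p : Problem.values()) {
            System.out.println(p + ": " + usableFor(p));
        }
    }
}
```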

    Weka Algorithms Unavailable For Some Problem Types

    Which Algorithm To Use

    Generally, when working on a machine learning problem you cannot know which algorithm will be the best for your problem beforehand.

    If you had enough information to know which algorithm would achieve the best performance, you probably would not be doing applied machine learning. You would be doing something else like statistics.

    The solution, therefore, is to try a suite of algorithms on your problem and see what works best. Try a handful of powerful algorithms, then double down on the 1-to-3 algorithms that perform best. They will give you an idea of the general type of algorithms that perform well, or of learning strategies that may be better than average at picking out the hidden structure in your data.
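    Once you have a score for each algorithm you tried (for example, the cross-validated accuracy reported in the Explorer), picking the ones to double down on is a simple ranking. A minimal sketch, assuming higher scores are better; the topK name is hypothetical and the scores in the usage example are placeholders, not real results.

```java
import java.util.List;
import java.util.Map;
import java.util.Comparator;
import java.util.stream.Collectors;

/** Sketch: keep the k best-scoring algorithms from a spot-check. */
final class SpotCheck {

    /** Returns the k highest-scoring names; ties are broken alphabetically for determinism. */
    static List<String> topK(Map<String, Double> scores, int k) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder scores for illustration only.
        Map<String, Double> scores = Map.of(
                "trees.J48", 0.70,
                "functions.Logistic", 0.78,
                "lazy.IBk", 0.72,
                "bayes.NaiveBayes", 0.76);
        System.out.println(topK(scores, 3)); // [functions.Logistic, bayes.NaiveBayes, lazy.IBk]
    }
}
```

    For error measures such as RMSE, where lower is better, drop the reverseOrder() comparator.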

    Some of the machine learning algorithms in Weka have non-standard names. You may already know the names of some machine learning algorithms, but feel confused by the names of the algorithms in Weka.

    Below is a list of 10 top machine learning algorithms you should consider trying on your problem, including both their standard name and the name used in Weka.

    Linear Machine Learning Algorithms

    Linear algorithms assume that the predicted attribute is a linear combination of the input attributes (for Logistic Regression, it is the log-odds of the predicted class that is modeled this way).

    • Linear Regression: functions.LinearRegression
    • Logistic Regression: functions.Logistic
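    To make the linear assumption concrete, here is how a fitted model of each kind turns inputs into a prediction. This is a generic sketch for the two-class case with made-up weights, not Weka's code; Weka learns the weights from your training data.

```java
/** Sketch of prediction with a linear model and a two-class logistic model. */
final class LinearModels {

    /** Linear regression: bias + sum of weight_i * x_i. */
    static double linear(double bias, double[] weights, double[] x) {
        if (weights.length != x.length) {
            throw new IllegalArgumentException("weights and x must have the same length");
        }
        double sum = bias;
        for (int i = 0; i < x.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum;
    }

    /** Logistic regression: the same linear combination passed through the sigmoid, giving P(class = 1). */
    static double logistic(double bias, double[] weights, double[] x) {
        return 1.0 / (1.0 + Math.exp(-linear(bias, weights, x)));
    }

    public static void main(String[] args) {
        double[] w = {2.0, 3.0};   // made-up weights
        double[] x = {1.0, 1.0};   // one input row
        System.out.println(linear(1.0, w, x));   // 6.0
        System.out.println(logistic(0.0, new double[]{0.0}, new double[]{5.0})); // 0.5
    }
}
```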

    Nonlinear Machine Learning Algorithms

    Nonlinear algorithms do not make strong assumptions about the relationship between the input attributes and the output attribute being predicted.

    • Naive Bayes: bayes.NaiveBayes
    • Decision Tree (specifically the C4.5 variety): trees.J48
    • k-Nearest Neighbors (also called KNN): lazy.IBk
    • Support Vector Machines (also called SVM): functions.SMO
    • Neural Network: functions.MultilayerPerceptron
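    As an example of how a nonlinear method can avoid assuming a fixed functional form, the core idea behind k-Nearest Neighbors (lazy.IBk) fits in a few lines: store the training data and, at prediction time, take a majority vote among the k closest rows. This is a generic textbook sketch using unscaled Euclidean distance; it omits the options and details of Weka's own IBk implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Generic k-nearest-neighbors classifier sketch (not Weka's IBk). */
final class Knn {

    /** Euclidean distance between two rows of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Predicts the majority label among the k training rows closest to the query. */
    static String predict(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Order training rows from nearest to farthest.
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], query)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, idx.length); j++) {
            int count = votes.merge(labels[idx[j]], 1, Integer::sum);
            if (count > bestCount) { // ties go to the label that reached the count first
                best = labels[idx[j]];
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        String[] labels = {"a", "a", "b", "b"};
        System.out.println(predict(train, labels, new double[]{0.2, 0.3}, 3)); // a
    }
}
```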

    Ensemble Machine Learning Algorithms

    Ensemble methods combine the predictions from multiple models in order to make more robust predictions.

    • Random Forest: trees.RandomForest
    • Bootstrap Aggregation (also called Bagging): meta.Bagging
    • Stacked Generalization (also called Stacking or Blending): meta.Stacking
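    The two ingredients of Bagging can be sketched generically: each member model is trained on a bootstrap sample (rows drawn with replacement), and the members' class predictions are then combined. This sketch uses a simple hard majority vote as the combiner; it is an illustration of the idea, and Weka's meta.Bagging may combine its members' outputs differently. Stacking, by contrast, trains a further model to learn how to combine the members.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Generic sketch of bootstrap sampling and majority voting (not Weka's implementation). */
final class EnsembleSketch {

    /** Draws n row indices with replacement: one bootstrap sample of an n-row dataset. */
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rng.nextInt(n);
        }
        return idx;
    }

    /** Combines class predictions by majority vote; ties go to whichever label reaches the winning count first. */
    static String majorityVote(List<String> predictions) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String p : predictions) {
            int c = counts.merge(p, 1, Integer::sum);
            if (c > bestCount) {
                best = p;
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sample = bootstrapIndices(5, new Random(1));
        System.out.println(java.util.Arrays.toString(sample));
        System.out.println(majorityVote(List.of("yes", "no", "yes"))); // yes
    }
}
```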

    Weka has an extensive array of ensemble methods, perhaps one of the largest available across all of the popular machine learning frameworks.

    If you are looking for an area of Weka to specialize in, I would point to its support for ensemble techniques, which, besides ease of use, is a true source of power in the platform.

    Machine Learning Algorithm Configuration

    Once you have chosen a machine learning algorithm, you can configure it.

    Configuration is optional, but highly recommended. Weka cleverly chooses sensible defaults for each machine learning algorithm, meaning that you can select an algorithm and start using it immediately without knowing much about it.

    To get the best results from an algorithm, you should configure it to behave ideally for your problem.

    How do you configure an algorithm for your problem?

    Again, this is an open question that cannot be answered beforehand. Some algorithms do have heuristics that can guide you, but they are not a silver bullet. The real answer is to systematically test a suite of standard configurations for a given algorithm on your problem.
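    One way to build such a suite is to enumerate combinations of a few option values. The sketch below produces command-line option strings for J48 using its -C (pruning confidence, default 0.25) and -M (minimum instances per leaf, default 2) options; the particular values chosen are illustrative, not recommendations. With Weka on the classpath, such a string could be split with weka.core.Utils.splitOptions and passed to the classifier's setOptions method.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: enumerate a small grid of J48 configurations as option strings. */
final class ConfigGrid {

    /** Builds one option string per combination of pruning confidence (-C) and minimum leaf size (-M). */
    static List<String> j48Options(double[] confidences, int[] minLeafSizes) {
        List<String> configs = new ArrayList<>();
        for (double c : confidences) {
            for (int m : minLeafSizes) {
                configs.add("-C " + c + " -M " + m);
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        // Illustrative values: the defaults plus a more heavily pruned variant.
        for (String options : j48Options(new double[]{0.25, 0.1}, new int[]{2, 5})) {
            System.out.println(options);
        }
    }
}
```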

    You can configure a machine learning algorithm in Weka by clicking on its name after you have selected it. This will launch a window that displays all of the configuration details for the algorithm.

    Weka Configure a Machine Learning Algorithm

    You can learn more about the meaning of each configuration option by hovering your mouse over each option which will display a tooltip describing the configuration option.

    Some options give you a limited set of values to choose from; others take integer or real values. Use both experimentation and research to come up with 3-to-5 standard configurations of an algorithm to try on your problem.

    A pro tip is to save your standard algorithm configurations to a file. Click the “Save” button at the bottom of the algorithm configuration window and enter a filename that clearly labels the algorithm and the type of configuration you are saving. You can load an algorithm configuration later in the Weka Explorer, the Experimenter and elsewhere in Weka. This is most valuable once you settle on a suite of standard algorithm configurations that you want to reuse from problem to problem.
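    If you also drive Weka from code or scripts, a complementary approach is to keep your standard configurations as named option strings in a plain-text file. The sketch below uses java.util.Properties for this; it is an alternative, hypothetical scheme and does not attempt to reproduce the file written by Weka's “Save” button. The key names are illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Sketch: save and load named algorithm configurations as option strings. */
final class ConfigStore {

    /** Writes every name=options pair to the file, replacing its contents. */
    static void save(Properties configs, Path file) {
        try (Writer out = Files.newBufferedWriter(file)) {
            configs.store(out, "Standard algorithm configurations");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the name=options pairs back from the file. */
    static Properties load(Path file) {
        Properties configs = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            configs.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return configs;
    }

    public static void main(String[] args) {
        Properties configs = new Properties();
        configs.setProperty("J48.default", "-C 0.25 -M 2");      // illustrative names and values
        configs.setProperty("J48.morePruning", "-C 0.1 -M 5");
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "standard-configs.properties");
        save(configs, file);
        System.out.println(load(file).getProperty("J48.default")); // -C 0.25 -M 2
    }
}
```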

    You can adopt and use the configuration for the algorithm by clicking the “OK” button on the algorithm configuration window.

    Get More Information on Algorithms

    Weka provides more information about each supported machine learning algorithm.

    On the algorithm configuration window, you will notice two buttons to learn more about the algorithm.

    More Information

    Clicking the “More” button will show a window that summarizes the implementation of the algorithm and all of the algorithm's configuration properties.

    Weka More Information About an Algorithm

    This is useful to get a fuller idea of how the algorithm works and how to configure it. It also often includes references to the books or papers on which the implementation of the algorithm was based. These can be good resources to track down and review in order to get a better idea of how to get the most from a given algorithm.

    Reading up on how to better configure an algorithm is not something to do as a beginner because it can feel a little overwhelming, but it is a pro tip that will help you learn more and faster later on when you have more experience with applied machine learning.

    Algorithm Capabilities

    Clicking on the “Capabilities” button will provide you with a snapshot of the capabilities of the algorithm.

    Weka Capabilities for an Algorithm

    Most importantly, this is useful to get an idea of how the algorithm handles missing data and any other important expectations it may have on your problem.

    Reviewing this information can give you ideas on how to create new and different views on your data in order to lift performance for one or more algorithms.
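    For example, if an algorithm's capabilities show that it does not handle missing values well, one new view of the data is a copy in which missing numeric values are replaced by the column mean. Weka provides this kind of transformation as the filter weka.filters.unsupervised.attribute.ReplaceMissingValues; the stdlib-only sketch below shows the idea for numeric columns, assuming missing values are represented as NaN.

```java
import java.util.Arrays;

/** Sketch: mean imputation for numeric columns, with missing values encoded as NaN. */
final class MeanImpute {

    /** Returns a copy where each NaN is replaced by its column's mean over non-missing values. */
    static double[][] imputeMeans(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int cols = rows[0].length;
        double[] sums = new double[cols];
        int[] counts = new int[cols];
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                if (!Double.isNaN(row[j])) {
                    sums[j] += row[j];
                    counts[j]++;
                }
            }
        }
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = Arrays.copyOf(rows[i], cols); // leave the input untouched
            for (int j = 0; j < cols; j++) {
                // A column with no observed values at all is left as NaN.
                if (Double.isNaN(out[i][j]) && counts[j] > 0) {
                    out[i][j] = sums[j] / counts[j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, Double.NaN}, {3.0, 4.0}};
        System.out.println(Arrays.deepToString(imputeMeans(data))); // [[1.0, 4.0], [3.0, 4.0]]
    }
}
```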

    Summary

    In this post you discovered the support for machine learning algorithms in the Weka machine learning workbench.

    Specifically, you learned:

    • That Weka has a large selection of machine learning algorithms to choose from for classification and regression problems.
    • That you can easily configure each machine learning algorithm and save and load a set of standard configurations.
    • That you can dive deeper into the details of a given algorithm and even discover the sources on which it was based in order to learn how to get the best performance from it.

    Do you have any questions about machine learning algorithms in Weka or about this post? Ask your questions in the comments and I will do my best to answer.
