
Friday, 17 January 2025

How to Learn Python for Machine Learning

Python has become a de facto lingua franca for machine learning. It is not a difficult language to learn, but if you are not particularly familiar with the language, there are some tips that can help you learn faster or better.

In this post, you will discover what the right way to learn a programming language is and how to get help. After reading this post, you will know:

  • The right mentality to learn Python for use in machine learning
  • Good resources to learn Python
  • How to find answers for questions related to Python

How to Learn Python

There are many ways to learn a language, whether a natural language like English or a programming language like Python. Babies learn a language by listening and mimicking. Gradually, as they pick up the patterns and some vocabulary, they can make up their own sentences. By contrast, when college students learn Latin, they probably start with grammar rules: singular and plural, indicative and subjunctive, nominative and accusative. Then they can build up to forming a sentence in Latin.

Similarly, when learning Python or any programming language, you can either read other people’s code, try to understand it, and then modify it, or you can learn the language rules and build up a program from scratch. The latter is beneficial if your ultimate goal is to work on the language itself, such as writing a Python interpreter. But usually, the former approach gets you results faster.

My suggestion is to learn from examples first, but to strengthen your foundation by revisiting the language rules from time to time. Let’s look at an example from Wikipedia:
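
A minimal secant-method sketch along the lines of that example follows. It is written for illustration and is not necessarily the exact Wikipedia listing; the test function f_example(x) = x ** 2 - 612, the starting points 10 and 30, and the iteration count of 5 are assumptions chosen for the demo.

```python
def secant_method(f, x0, x1, iterations):
    """Return an approximate root of f using the secant method."""
    for _ in range(iterations):
        # Draw a line through (x0, f(x0)) and (x1, f(x1)); x2 is where it crosses zero
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        # Keep the two most recent estimates for the next step
        x0, x1 = x1, x2
    return x1


def f_example(x):
    return x ** 2 - 612


root = secant_method(f_example, 10, 30, 5)
print("Root:", root)
```

With these inputs the printed root should be very close to the square root of 612, about 24.7386.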

This Python code implements the secant method to find a root of a function. If you are new to Python, look at the example and see how much you can understand. If you have prior knowledge of other programming languages, you would probably guess that def defines a function. But if you do not, you might feel confused. In that case, it is best to start with a beginner’s book on programming to learn about the concepts of functions, variables, loops, etc.

The next thing you might try is to modify the functions. For example, what if we use Newton’s method instead of the secant method to find the root? You might guess how to modify the update equation that computes the next estimate. What about the bisection method? You would need to add a statement like if f(x2) > 0 to decide which way to go. If we look at the function f_example, we see the symbol **. This is the exponent operator, meaning x to the power of 2 there. But should it be read as (x**2) - 612 or x**(2 - 612)? You would need to go back and check the language manual for the operator precedence hierarchy.
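
The two modifications discussed above can be sketched like this. The function names newton_method and bisection_method, the derivative helper df_example, and the iteration counts are illustrative assumptions. The bisection version assumes f(lo) < 0 < f(hi), which is why a single if f(mid) > 0 test is enough to pick the half to keep.

```python
def f_example(x):
    return x ** 2 - 612


def df_example(x):
    # Derivative of f_example, needed by Newton's method
    return 2 * x


def newton_method(f, df, x0, iterations):
    """Newton's method: use the true slope df(x) instead of the secant slope."""
    x = x0
    for _ in range(iterations):
        x = x - f(x) / df(x)
    return x


def bisection_method(f, lo, hi, iterations):
    """Bisection: halve [lo, hi] while keeping the sign change inside it.
    Assumes f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid  # root lies in the left half
        else:
            lo = mid  # root lies in the right half
    return (lo + hi) / 2


newton_root = newton_method(f_example, df_example, 30, 6)
bisection_root = bisection_method(f_example, 10, 30, 40)
print(newton_root, bisection_root)

# Operator precedence: ** binds tighter than binary -, so x ** 2 - 612 means (x ** 2) - 612
print(2 ** 3 - 1)  # 7, not 2 ** (3 - 1) == 4
```

Newton’s method needs the derivative but converges in very few steps; bisection only needs a sign change but gains roughly one binary digit per iteration.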

Therefore, even with a short example like this, you can learn a lot of language features. By learning from more examples, you can deduce the syntax, get used to the idiomatic way of coding, and do some work even if you cannot explain it in detail.

What to Avoid

If you decide to learn Python, you will very likely want to learn from a book. Just picking up any beginner’s book on Python from your local library should work. But as you read, keep the bigger picture of your learning goal in mind. Do some exercises while you read, try out the code from the book, and make up your own. It is not a bad idea to skip some pages; reading a book cover to cover may not be the most efficient way to learn. Avoid drilling too deep into a single topic, because this will make you lose track of the bigger goal of using Python to do useful things. Topics such as multithreading, network sockets, and object-oriented programming can be treated as advanced topics for later.

Python as a language is decoupled from its interpreter or compiler, so different interpreters may behave a bit differently. The standard interpreter from python.org is CPython, also called the reference implementation. A common alternative is PyPy. Regardless of which one you use, you should learn Python 3 rather than Python 2, as the latter is an obsolete dialect that reached its end of life in 2020. But bear in mind that Python gained its momentum with Python 2, and you may still see quite a lot of Python 2 code around.
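
To check which interpreter and version you are running, the standard library’s platform and sys modules are enough. A short sketch:

```python
import platform
import sys

# Name of the implementation running this code, e.g. "CPython" or "PyPy"
impl = platform.python_implementation()
print(impl)

# sys.version_info is a named tuple with major, minor, micro, ... fields
print(sys.version_info.major, sys.version_info.minor)

# Fail early if the script is accidentally run with Python 2
if sys.version_info < (3,):
    raise RuntimeError("This code needs Python 3")
```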

Resources

Reading Resources

If you cannot go to the library to pick up a printed book, you can make use of some online resources instead. I would highly recommend beginners read The Python Tutorial. It is short but guides you through different aspects of the language. It lets you take a peek at what Python can do and how to do it.

After the tutorial, you should keep the Python Language Reference and the Python Library Reference handy. You will refer to them from time to time to check the syntax and look up how functions are used. Do not force yourself to memorize every function.

Programming Environment

macOS may come with a Python interpreter, but you will likely want to install a newer version. On Windows, it is common to use Anaconda instead of installing just the Python interpreter. If installing an IDE and the Python programming environment feels like too much hassle, you might consider Google Colab, which lets you write Python programs in a “notebook” format. Indeed, many machine learning projects are developed in Jupyter notebooks, as they let us quickly explore different approaches to a problem and visually verify the results.

You can also use the online shell at https://www.python.org/shell/ to try out a short snippet. The downside compared to Google Colab is that you cannot save your work.

Asking for Help

When you start from an example you saw in a book and modify it, you might break the code and make it fail to run. This is especially true in machine learning examples, where many lines of code cover data collection, preprocessing, building a model, training, validation, prediction, and finally presenting the result visually. When your code produces an error, the first thing to do is pinpoint the few lines that caused it. Check the output of each step to make sure it is in the correct format, or roll back your code to find which change introduced the error.
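
A minimal sketch of checking each step’s output, assuming a toy pipeline: load_data, preprocess, and check_step are hypothetical names standing in for your own code, and the data is made up for illustration.

```python
def load_data():
    # Hypothetical data-collection step: rows of (feature1, feature2, label)
    return [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 4.0, 0)]


def preprocess(rows):
    # Split each row into a feature vector and a label
    X = [list(row[:2]) for row in rows]
    y = [row[2] for row in rows]
    return X, y


def check_step(name, X, y):
    # Stop with a readable message as soon as a step produces malformed output
    # (assert statements are skipped when Python runs with -O)
    assert len(X) == len(y), f"{name}: {len(X)} samples but {len(y)} labels"
    assert all(len(row) == len(X[0]) for row in X), f"{name}: rows have different lengths"
    print(f"{name}: {len(X)} samples, {len(X[0])} features each")


X, y = preprocess(load_data())
check_step("preprocess", X, y)
```

Calling a check like this after every stage narrows a failure down to the first stage whose output looks wrong.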

It is important to make mistakes and learn from them. When you experiment with syntax this way, you will encounter error messages from time to time. If you try to make sense of them, it will be easier to figure out what caused the error. If the error comes from a library you are using, it almost always pays to double-check your syntax against the library’s documentation.
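
As a small illustration of reading an error message, the sketch below makes a deliberate mistake (passing strings to a hypothetical average function) and prints only the last line of the traceback, which names the exception type and the operation that failed.

```python
import traceback


def average(values):
    # Hypothetical helper used only to trigger an error
    return sum(values) / len(values)


try:
    average(["1", "2", "3"])  # deliberate mistake: strings instead of numbers
except TypeError:
    # The last line of a traceback states the exception type and what went wrong
    last_line = traceback.format_exc().strip().splitlines()[-1]
    print(last_line)
```

Here the message points at adding an int to a str, which leads straight to the cause: sum() starts from 0 and cannot add strings to it.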

If you are still confused, try searching for it on the internet. If you are using Google, one trick is to put the entire error message in a pair of double quotes when you search. Sometimes, searching on Stack Overflow gives you better answers.

Further Readings

Here are some pointers for beginners. As mentioned above, the Python Tutorial is a good start. This is especially true at the time of this writing, as Python 3.9 was recently released and introduced some new syntax. Printed books are usually not as up to date as the official online tutorial.
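
For a taste of what changed, here are a few additions introduced in Python 3.9: the dict merge operator |, str.removesuffix(), and built-in collection types used as generics in annotations. The variable names and values are made up for illustration, and the snippet needs Python 3.9 or later.

```python
# Requires Python 3.9 or later
defaults = {"lr": 0.01, "epochs": 10}
overrides = {"epochs": 20}

# New in 3.9: | merges two dicts; the right-hand side wins on duplicate keys
config = defaults | overrides

# New in 3.9: str.removeprefix() and str.removesuffix()
name = "model_v2.pkl".removesuffix(".pkl")


# New in 3.9: built-in list can be used as a generic type in annotations
def mean(values: list[float]) -> float:
    return sum(values) / len(values)


print(config, name, mean([1.0, 2.0, 3.0]))
```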

There are many primer-level books for Python. Some short ones that I know of are:

Slightly more advanced learners may want to see more examples of getting something done. A cookbook-style book can help a lot, as you learn not only the syntax and language tricks but also the different libraries that get things done.

Summary

In this post, you learned how to study Python and which resources can help you get started. A goal-oriented approach to study can help you get results more quickly. However, as always, you need to spend significant time on it before you become proficient.

Thursday, 16 January 2025

Lagrange Multiplier Approach with Inequality Constraints

In a previous post, we introduced the method of Lagrange multipliers to find local minima or local maxima of a function with equality constraints. The same method can be applied to those with inequality constraints as well.

In this tutorial, you will discover the method of Lagrange multipliers applied to find the local minimum or maximum of a function when inequality constraints are present, optionally together with equality constraints.

After completing this tutorial, you will know:

  • How to find points of local maximum or minimum of a function with inequality constraints
  • The method of Lagrange multipliers with inequality constraints

Let’s get started.

Photo by Christine Roy, some rights reserved.

Prerequisites

For this tutorial, we assume that you already have reviewed:

as well as

You can review these concepts by clicking on the links above.

Constrained Optimization and Lagrangians

Extending from our previous post, a constrained optimization problem can generally be written as

$$
\begin{aligned}
\min_X \quad & f(X) \\
\text{subject to} \quad & g(X) = 0 \\
& h(X) \ge 0 \\
& k(X) \le 0
\end{aligned}
$$

where $X$ is a scalar or vector value. Here, $g(X) = 0$ is the equality constraint, and $h(X) \ge 0$ and $k(X) \le 0$ are inequality constraints. Note that we always use $\ge$ and $\le$ rather than $>$ and $<$ in optimization problems, because the former define a closed set in mathematics, within which we should look for the value of $X$. There can be many constraints of each type in an optimization problem.

The equality constraints are easy to handle, but the inequality constraints are not. Therefore, one way to make them easier to tackle is to convert the inequalities into equalities by introducing slack variables:

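$$
\begin{aligned}
\min_X \quad & f(X) \\
\text{subject to} \quad & g(X) = 0 \\
& h(X) - s^2 = 0 \\
& k(X) + t^2 = 0
\end{aligned}
$$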

When something is negative, adding a certain positive quantity to it will make it equal to zero, and vice versa. That quantity is the slack variable; the $s^2$ and $t^2$ above are examples. We deliberately write the slack terms as squares $s^2$ and $t^2$ to denote that they must not be negative.
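
A small sketch of the slack-variable idea, assuming scalar constraint values: a real slack $s$ with $h(X) - s^2 = 0$ exists exactly when $h(X) \ge 0$, and a real $t$ with $k(X) + t^2 = 0$ exists exactly when $k(X) \le 0$. The helper names are hypothetical.

```python
import math


def slack_for_ge(h_value):
    """Slack s with h - s**2 = 0 for a constraint h(X) >= 0, or None if violated."""
    return math.sqrt(h_value) if h_value >= 0 else None


def slack_for_le(k_value):
    """Slack t with k + t**2 = 0 for a constraint k(X) <= 0, or None if violated."""
    return math.sqrt(-k_value) if k_value <= 0 else None


print(slack_for_ge(4.0))   # feasible with room to spare: s = 2.0
print(slack_for_ge(0.0))   # active constraint: s = 0.0
print(slack_for_ge(-1.0))  # violated: no real s exists, prints None
print(slack_for_le(-9.0))  # feasible: t = 3.0
```

A zero slack marks an active constraint, which is exactly the distinction the complementary slackness condition relies on.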

With the slack variables introduced, we can use the Lagrange multipliers approach to solve it, in which the Lagrangian is defined as:

$$
L(X, \lambda, \theta, \phi) = f(X) - \lambda g(X) - \theta\left(h(X) - s^2\right) + \phi\left(k(X) + t^2\right)
$$

It is useful to know that, for the optimal solution $X^*$ to the problem, each inequality constraint either holds with equality (in which case the slack variable is zero) or does not. Inequality constraints that hold with equality are called active constraints; the others are called inactive constraints. In this sense, you can consider the equality constraints to be always active.

The Complementary Slackness Condition

We need to know whether a constraint is active because of the Karush-Kuhn-Tucker (KKT) conditions. Precisely, the KKT conditions describe what happens when $X^*$ is the optimal solution to a constrained optimization problem:

  1. The gradient of the Lagrangian function is zero
  2. All constraints are satisfied
  3. The inequality constraints satisfy the complementary slackness condition

The most important of these is the complementary slackness condition. We learned that an optimization problem with equality constraints can be solved using Lagrange multipliers, because the gradient of the Lagrangian is zero at the optimal solution. The complementary slackness condition extends this to inequality constraints by saying that at the optimal solution $X^*$, either the Lagrange multiplier is zero or the corresponding inequality constraint is active.
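
One way to see where this condition comes from, using the slack-variable Lagrangian $L(X, \lambda, \theta, \phi)$ defined above, is to set its derivatives with respect to the slack variables to zero:

```latex
\begin{aligned}
\frac{\partial L}{\partial s} &= 2\theta s = 0
  \quad\Longrightarrow\quad \theta = 0 \ \text{or}\ s = 0 \ \ (\text{i.e. } h(X^*) = 0) \\
\frac{\partial L}{\partial t} &= 2\phi t = 0
  \quad\Longrightarrow\quad \phi = 0 \ \text{or}\ t = 0 \ \ (\text{i.e. } k(X^*) = 0)
\end{aligned}
```

Multiplying the first equation by $s/2$ and using $s^2 = h(X^*)$ gives the compact form $\theta\, h(X^*) = 0$; likewise $t^2 = -k(X^*)$ gives $\phi\, k(X^*) = 0$.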

The complementary slackness condition helps us explore the different cases in solving the optimization problem. It is best explained with an example.

Example 1: Mean-variance portfolio optimization

This is an example from finance. Suppose we have 1 dollar to split between two different investments whose returns are modeled as a bivariate Gaussian distribution. How much should we invest in each to minimize the overall variance in return?

This optimization problem, also known as Markowitz mean-variance portfolio optimization, is formulated as:

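$$
\begin{aligned}
\min \quad & f(w_1, w_2) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12} \\
\text{subject to} \quad & w_1 + w_2 = 1 \\
& 0 \le w_1 \le 1
\end{aligned}
$$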

where the last line, the two inequalities $w_1 \ge 0$ and $w_1 \le 1$, bounds the weight of each investment to between 0 and 1 dollar. Let’s assume $\sigma_1^2 = 0.25$, $\sigma_2^2 = 0.10$, and $\sigma_{12} = 0.15$. Then the Lagrangian function is defined as:

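$$
L(w_1, w_2, \lambda, \theta, \phi) = 0.25 w_1^2 + 0.1 w_2^2 + 0.3 w_1 w_2 - \lambda(w_1 + w_2 - 1) - \theta(w_1 - s^2) - \phi(w_1 - 1 + t^2)
$$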

and we have the gradients:

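$$
\begin{aligned}
\frac{\partial L}{\partial w_1} &= 0.5 w_1 + 0.3 w_2 - \lambda - \theta - \phi \\
\frac{\partial L}{\partial w_2} &= 0.2 w_2 + 0.3 w_1 - \lambda \\
\frac{\partial L}{\partial \lambda} &= 1 - w_1 - w_2 \\
\frac{\partial L}{\partial \theta} &= s^2 - w_1 \\
\frac{\partial L}{\partial \phi} &= 1 - w_1 - t^2
\end{aligned}
$$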

From this point onward, the complementary slackness condition has to be considered. We have two slack variables $s$ and $t$, and the corresponding Lagrange multipliers are $\theta$ and $\phi$. We now have to consider whether a slack variable is zero (in which case the corresponding inequality constraint is active) or the Lagrange multiplier is zero (the constraint is inactive). There are four possible cases:

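  1. $\theta = \phi = 0$ and $s^2 > 0$, $t^2 > 0$
  2. $\theta \ne 0$ but $\phi = 0$, and $s^2 = 0$, $t^2 > 0$
  3. $\theta = 0$ but $\phi \ne 0$, and $s^2 > 0$, $t^2 = 0$
  4. $\theta \ne 0$ and $\phi \ne 0$, and $s^2 = t^2 = 0$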

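For case 1, using $\partial L/\partial\lambda = 0$, $\partial L/\partial w_1 = 0$ and $\partial L/\partial w_2 = 0$, we get

$$
\begin{aligned}
w_2 &= 1 - w_1 \\
0.5 w_1 + 0.3 w_2 &= \lambda \\
0.3 w_1 + 0.2 w_2 &= \lambda
\end{aligned}
$$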

from which we get $w_1 = -1$, $w_2 = 2$, $\lambda = 0.1$. But with $\partial L/\partial\theta = 0$, we get $s^2 = -1$, which has no solution ($s^2$ cannot be negative). Thus this case is infeasible.

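For case 2, with $\partial L/\partial\theta = 0$ we get $w_1 = 0$. Hence from $\partial L/\partial\lambda = 0$, we know $w_2 = 1$. With $\partial L/\partial w_2 = 0$, we find $\lambda = 0.2$, and from $\partial L/\partial w_1 = 0$ we get $\theta = 0.1$. In this case, the objective function is 0.1.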

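For case 3, with $\partial L/\partial\phi = 0$ we get $w_1 = 1$. Hence from $\partial L/\partial\lambda = 0$, we know $w_2 = 0$. With $\partial L/\partial w_2 = 0$, we get $\lambda = 0.3$, and from $\partial L/\partial w_1 = 0$ we get $\phi = 0.2$. In this case, the objective function is 0.25.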

For case 4, we get $w_1 = 0$ from $\partial L/\partial\theta = 0$ but $w_1 = 1$ from $\partial L/\partial\phi = 0$. Hence this case is infeasible.

Comparing the objective function from case 2 and case 3, we see that the value from case 2 is lower. Hence that is taken as our solution to the optimization problem, with the optimal solution attained at $w_1 = 0$, $w_2 = 1$.

As an exercise, you can retry the above with $\sigma_{12} = -0.15$. The solution would be 0.0038, attained when $w_1 = 5/13$, with the two inequality constraints inactive.
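
The four-case analysis and the exercise can both be checked with a short script. This is a sketch under the setup above with hypothetical function names; like the text, it picks the feasible case with the lowest objective rather than also checking the signs of the multipliers.

```python
def portfolio_cases(var1, var2, cov):
    """Candidates for the four complementary-slackness cases of
    min var1*w1**2 + var2*w2**2 + 2*cov*w1*w2  s.t.  w1 + w2 = 1, 0 <= w1 <= 1.
    Returns {case: (w1, objective)}, with None for infeasible cases."""
    def objective(w1):
        w2 = 1 - w1
        return var1 * w1**2 + var2 * w2**2 + 2 * cov * w1 * w2

    cases = {}
    # Case 1: theta = phi = 0. dL/dw1 = dL/dw2 = lambda with w2 = 1 - w1
    # gives w1 = (var2 - cov) / (var1 + var2 - 2*cov); it must lie strictly inside (0, 1).
    denom = var1 + var2 - 2 * cov
    w1 = (var2 - cov) / denom if denom > 0 else None
    cases[1] = (w1, objective(w1)) if w1 is not None and 0 < w1 < 1 else None
    # Case 2: s = 0, so w1 >= 0 is active: w1 = 0
    cases[2] = (0.0, objective(0.0))
    # Case 3: t = 0, so w1 <= 1 is active: w1 = 1
    cases[3] = (1.0, objective(1.0))
    # Case 4: would need w1 = 0 and w1 = 1 at the same time
    cases[4] = None
    return cases


def best_case(var1, var2, cov):
    feasible = {k: v for k, v in portfolio_cases(var1, var2, cov).items() if v is not None}
    k = min(feasible, key=lambda c: feasible[c][1])
    return k, feasible[k][0], feasible[k][1]


print(best_case(0.25, 0.10, 0.15))   # the worked example
print(best_case(0.25, 0.10, -0.15))  # the exercise
```

The first call should select case 2 with objective 0.1, and the second case 1 with $w_1 \approx 5/13$ and objective about 0.0038.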


Example 2: Water-filling algorithm

This is an example from communication engineering. If we have a channel (say, a wireless bandwidth) in which the noise power is $N$ and the signal power is $S$, the channel capacity (in bits per second) is proportional to $\log_2(1 + S/N)$. If we have $k$ similar channels, each with its own noise and signal level, the total capacity of all channels is the sum $\sum_i \log_2(1 + S_i/N_i)$.

Assume we are using a battery that can supply only 1 watt of power, and this power has to be distributed among the $k$ channels (denoted as $p_1, \dots, p_k$). Each channel may have a different attenuation, so in the end the signal power is discounted by a gain $g_i$ for each channel. Then the maximum total capacity we can achieve using these $k$ channels is formulated as an optimization problem

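$$
\begin{aligned}
\max \quad & f(p_1, \dots, p_k) = \sum_{i=1}^{k} \log_2\left(1 + \frac{g_i p_i}{n_i}\right) \\
\text{subject to} \quad & \sum_{i=1}^{k} p_i = 1 \\
& p_1, \dots, p_k \ge 0
\end{aligned}
$$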

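For convenience of differentiation, we notice that $\log_2 x = \log x / \log 2$ and $\log(1 + g_i p_i/n_i) = \log(n_i + g_i p_i) - \log(n_i)$. Since dividing by the positive constant $\log 2$ and dropping the constant terms $\log(n_i)$ do not change where the maximum is attained, the objective function can be replaced with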

$$
f(p_1, \dots, p_k) = \sum_{i=1}^{k} \log(n_i + g_i p_i)
$$

Assume we have $k = 3$ channels with noise levels 1.0, 0.9, and 1.0 respectively, and channel gains 0.9, 0.8, and 0.7. Then the optimization problem is

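$$
\begin{aligned}
\max \quad & f(p_1, p_2, p_3) = \log(1 + 0.9 p_1) + \log(0.9 + 0.8 p_2) + \log(1 + 0.7 p_3) \\
\text{subject to} \quad & p_1 + p_2 + p_3 = 1 \\
& p_1, p_2, p_3 \ge 0
\end{aligned}
$$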

We have three inequality constraints here. The Lagrangian function is defined as

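$$
\begin{aligned}
L(p_1, p_2, p_3, \lambda, \theta_1, \theta_2, \theta_3) = {} & \log(1 + 0.9 p_1) + \log(0.9 + 0.8 p_2) + \log(1 + 0.7 p_3) \\
& - \lambda(p_1 + p_2 + p_3 - 1) \\
& - \theta_1(p_1 - s_1^2) - \theta_2(p_2 - s_2^2) - \theta_3(p_3 - s_3^2)
\end{aligned}
$$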

The gradient is therefore

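$$
\begin{aligned}
\frac{\partial L}{\partial p_1} &= \frac{0.9}{1 + 0.9 p_1} - \lambda - \theta_1 \\
\frac{\partial L}{\partial p_2} &= \frac{0.8}{0.9 + 0.8 p_2} - \lambda - \theta_2 \\
\frac{\partial L}{\partial p_3} &= \frac{0.7}{1 + 0.7 p_3} - \lambda - \theta_3 \\
\frac{\partial L}{\partial \lambda} &= 1 - p_1 - p_2 - p_3 \\
\frac{\partial L}{\partial \theta_1} &= s_1^2 - p_1 \\
\frac{\partial L}{\partial \theta_2} &= s_2^2 - p_2 \\
\frac{\partial L}{\partial \theta_3} &= s_3^2 - p_3
\end{aligned}
$$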

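But now we have 3 slack variables, so we have to consider 8 cases:

  1. $\theta_1 = \theta_2 = \theta_3 = 0$, hence none of $s_1^2, s_2^2, s_3^2$ are zero
  2. $\theta_1 = \theta_2 = 0$ but $\theta_3 \ne 0$, hence only $s_3^2 = 0$
  3. $\theta_1 = \theta_3 = 0$ but $\theta_2 \ne 0$, hence only $s_2^2 = 0$
  4. $\theta_2 = \theta_3 = 0$ but $\theta_1 \ne 0$, hence only $s_1^2 = 0$
  5. $\theta_1 = 0$ but $\theta_2, \theta_3$ non-zero, hence only $s_2^2 = s_3^2 = 0$
  6. $\theta_2 = 0$ but $\theta_1, \theta_3$ non-zero, hence only $s_1^2 = s_3^2 = 0$
  7. $\theta_3 = 0$ but $\theta_1, \theta_2$ non-zero, hence only $s_1^2 = s_2^2 = 0$
  8. all of $\theta_1, \theta_2, \theta_3$ are non-zero, hence $s_1^2 = s_2^2 = s_3^2 = 0$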

We can immediately tell that case 8 is infeasible: from $\partial L/\partial\theta_i = 0$ we get $p_1 = p_2 = p_3 = 0$, which cannot satisfy $\partial L/\partial\lambda = 0$.

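For case 1, we have

$$
\frac{0.9}{1 + 0.9 p_1} = \frac{0.8}{0.9 + 0.8 p_2} = \frac{0.7}{1 + 0.7 p_3} = \lambda
$$

from $\partial L/\partial p_1 = \partial L/\partial p_2 = \partial L/\partial p_3 = 0$. Together with $p_3 = 1 - p_1 - p_2$ from $\partial L/\partial\lambda = 0$, we find the solution to be $p_1 = 0.444$, $p_2 = 0.430$, $p_3 = 0.126$, and the objective function $f(p_1, p_2, p_3) = 0.639$.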

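For case 2, we have $p_3 = 0$ from $\partial L/\partial\theta_3 = 0$. Further, using $p_2 = 1 - p_1$ from $\partial L/\partial\lambda = 0$, and

$$
\frac{0.9}{1 + 0.9 p_1} = \frac{0.8}{0.9 + 0.8 p_2} = \lambda
$$

from $\partial L/\partial p_1 = \partial L/\partial p_2 = 0$, we can solve for $p_1 = 0.507$ and $p_2 = 0.493$. The objective function is $f(p_1, p_2, p_3) = 0.634$.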

Similarly, in case 3, $p_2 = 0$ and we solve for $p_1 = 0.659$ and $p_3 = 0.341$, with the objective function $f(p_1, p_2, p_3) = 0.574$.

In case 4, we have $p_1 = 0$, $p_2 = 0.652$, $p_3 = 0.348$, and the objective function $f(p_1, p_2, p_3) = 0.570$.

In case 5, we have $p_2 = p_3 = 0$ and hence $p_1 = 1$. Thus the objective function is $f(p_1, p_2, p_3) = 0.536$.

Similarly, in case 6 and case 7, we have $p_2 = 1$ and $p_3 = 1$ respectively. The objective function attains 0.531 and 0.425 respectively.
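
The seven feasible cases can be reproduced with a short enumeration. In this sketch (hypothetical names, 0-based channel indices, so channel 0 is $p_1$), each case fixes which channels are forced to zero power; for the remaining channels, $\partial L/\partial p_i = 0$ with $\theta_i = 0$ gives $p_i = \mu - n_i/g_i$ with $\mu = 1/\lambda$, and the power budget determines $\mu$.

```python
import math
from itertools import combinations

noise = [1.0, 0.9, 1.0]
gain = [0.9, 0.8, 0.7]
floors = [n / g for n, g in zip(noise, gain)]  # n_i / g_i for each channel


def objective(p):
    return sum(math.log(n + g * pi) for n, g, pi in zip(noise, gain, p))


results = {}
for size in (3, 2, 1):
    for free in combinations(range(3), size):
        # Channels outside `free` have s_i = 0, hence p_i = 0.
        # Free channels satisfy g_i / (n_i + g_i * p_i) = lambda, i.e. p_i = mu - n_i / g_i
        # with mu = 1 / lambda; the budget sum(p_i) = 1 fixes mu.
        mu = (1.0 + sum(floors[i] for i in free)) / size
        p = [mu - floors[i] if i in free else 0.0 for i in range(3)]
        if all(pi >= 0 for pi in p):  # discard cases that would need negative power
            results[free] = (p, objective(p))

for free, (p, value) in results.items():
    print(free, [round(x, 3) for x in p], round(value, 3))

best = max(results, key=lambda k: results[k][1])
print("best case (channels with positive power):", best)
```

The empty selection (case 8) is never generated, matching the argument that it cannot meet the power budget.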

Comparing all these cases, we find that the objective function attains its maximum value in case 1. Hence the solution to this optimization problem is $p_1 = 0.444$, $p_2 = 0.430$, $p_3 = 0.126$, with $f(p_1, p_2, p_3) = 0.639$.
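
The case-by-case analysis generalizes to the classic water-filling rule: every channel with positive power satisfies $g_i/(n_i + g_i p_i) = \lambda$, so $p_i = \max(0, \mu - n_i/g_i)$ for a common water level $\mu = 1/\lambda$ chosen to use up the power budget. The sketch below finds $\mu$ by bisection instead of enumerating cases; the function name, tolerance, and search bracket are assumptions of this sketch.

```python
import math


def water_filling(noise, gain, total_power=1.0, tol=1e-12):
    """Split total_power across channels to maximize sum(log(n_i + g_i * p_i)).

    Each channel has a floor n_i / g_i; the optimum is p_i = max(0, mu - n_i / g_i)
    for a common water level mu, found here by bisection so that sum(p_i) = total_power.
    """
    floors = [n / g for n, g in zip(noise, gain)]
    lo = min(floors)                # at this level no power is used
    hi = max(floors) + total_power  # at this level at least total_power is used
    while hi - lo > tol:
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - f) for f in floors)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = (lo + hi) / 2
    return [max(0.0, mu - f) for f in floors]


noise = [1.0, 0.9, 1.0]
gain = [0.9, 0.8, 0.7]
p = water_filling(noise, gain)
value = sum(math.log(n + g * pi) for n, g, pi in zip(noise, gain, p))
print([round(x, 3) for x in p], round(value, 3))
```

With the numbers from this example, the result should agree with case 1 above.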

Extensions and Further Reading

While in the above example we introduced slack variables into the Lagrangian function, some books prefer not to add slack variables but instead restrict the Lagrange multipliers for inequality constraints to be non-negative. In that case, you may see the Lagrangian function written as

$$
L(X, \lambda, \theta, \phi) = f(X) - \lambda g(X) - \theta h(X) + \phi k(X)
$$

but with the requirement $\theta \ge 0$ and $\phi \ge 0$.
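
As a sketch of this form, the checker below evaluates the KKT conditions for the portfolio example with $h(w) = w_1 \ge 0$ and $k(w) = w_1 - 1 \le 0$. The function name is hypothetical; the multipliers $\lambda = 0.2$, $\theta = 0.1$, $\phi = 0$ are the ones found in case 2, and the stationarity entries should be zero up to rounding.

```python
def kkt_residuals(w1, w2, lam, theta, phi):
    """KKT conditions for min 0.25*w1**2 + 0.1*w2**2 + 0.3*w1*w2
    s.t. w1 + w2 = 1, h = w1 >= 0, k = w1 - 1 <= 0,
    using L = f - lam*g - theta*h + phi*k with theta, phi >= 0."""
    h, k = w1, w1 - 1
    return {
        # Gradient of L with respect to w1 and w2 (should be 0)
        "stationarity_w1": 0.5 * w1 + 0.3 * w2 - lam - theta + phi,
        "stationarity_w2": 0.2 * w2 + 0.3 * w1 - lam,
        # Equality constraint g(w) = w1 + w2 - 1 (should be 0)
        "equality": w1 + w2 - 1,
        "primal_feasible": h >= 0 and k <= 0,
        "dual_feasible": theta >= 0 and phi >= 0,
        # theta * h and phi * k (both should be 0)
        "complementary_slackness": (theta * h, phi * k),
    }


print(kkt_residuals(0.0, 1.0, lam=0.2, theta=0.1, phi=0.0))
```

Here the sign requirement $\theta \ge 0$ is checked explicitly instead of being encoded through a squared slack variable.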

The Lagrangian function is also useful in primal-dual approaches for finding the maximum or minimum. This is particularly helpful if the objective or constraints are non-linear, in which case the solution may not be easily found.

Some books that cover this topic are:

Summary

In this tutorial, you discovered how the method of Lagrange multipliers can be applied to inequality constraints. Specifically, you learned:

  • Lagrange multipliers and the Lagrange function in the presence of inequality constraints
  • How to use KKT conditions to solve an optimization problem when inequality constraints are given