Subscribe Us

Header Ads

Perceptron Algorithm | Pegasos Algorithm | Support Vector Machine | Linear Classifier Among Dataset | Deep Learning | Machine Learning


Perceptron Algorithm is one of the most used algorithm in classifying data.It frequently seems little tough while learning first time.

In today's post , We'll try to understand it well in brief with full explanation. 

Perceptron Algorithm is used whenever given dataset can be classified into two parts or it has only two labels.(One can consider it as +1, -1)

There are three version of Perceptron Algorithm:

  1. Simple Perceptron 
  2. Average Percepron
  3. Pegasos Algorithm or Support Vector  Machine
Relax! Terms are sounding dangerous but they are really very easy.Let me explain you.

1.Simple Perceptron:

The single target of perceptron algorithm is to find a Linear classifier( say Linear equation) on one side of which are all positively labelled point and on another side all negatively labelled points are there.

As we know any linear equation can be written as,

Y=θ.X + θ0

Using Perceptron Our Aim is to find those value of θ vector and θ0 , with which formed line divide our dataset into two part , one part of which will be positively labelled and another part will be negatively labelled.See below pic:

  • It is also an iterative method.We'll start with θ = 0 vector and θ0=0
  • In each iteration ,for every data point ,

        •  if ((yi)*(θ.xi0)<=0) :
          • Update (θ=θold + (yi.xi))
                           (θ0+ yi)

Reason For above updates can be derived using stochastic gradient descent algorithm.We'll derive above updation.If you are not familiar with Stochastic Gradient Descent ,Click Below Link.

2. Averaged Perceptron :

               In Averaged Perceptron , Our Averaged Perceptron Function returns the average value of θ and θ0 ,updated in every iteration.While in "Only Perceptron Algorithm" , We return the last updated value of θ and θ0 .

See the Below Code of a sample implementation


3. Pegasos Algorithm/Support Vector Machine:

In above two algorithm, we have taken care of getting best linear classifier, Which can well classify our dataset, but what if there are many classifier existing that can classify the dataset in two part.See Below pic

In above picture, It can be seen quite clearly that all the classifier line is well classifying our training set.

If you think little wisely, you can say the classifier with a high margin boundary will be said to best fit.

In such a case we should choose, which have large margin(d) on both side (positively labelled and negatively labelled) . 

Now, there are two things :
  1. Loss : Loss on misclassifying a data
  2. Regularization : Responsible for maintaining large boundary
Remember we are concerned only about its correct classification.If our classifier classifies any point wrongly , we will give some penalty to our classifier.
Also Notice if any point will be classified correctly and not on decision boundary, then :

label*Ypredicted >0

Since we want no point inside margin boundary too, We'll give penalty, when it will inside margin boundary too.So there will be no loss if :


Otherwise we'll give some penalty to our model, this is known as Hinge Loss, and it is defined as:

H(z)=max{0,1-z}, z=(yi.Xi+θ0))

means , Loss will be zero if z>1 else 1-z

So, Our Objective function will be 

J = Loss + Regularization

J(θ,θ0) = (1/n)[Σin=1H(yi.Xi0)) + (λ/2)||θ||]

Our target is to find those value of θ,θ0 For which value of objective function is least or we have to minimize Objective function.We'll do it using Stochastic gradient descent Algorithm

So, in each iteration we will consider only one point and will calculate cost only for that and update.

J(θ,θ0) = H(yi.Xi0i)) + (λ/2)||θ||
Since, ∂H(z)/∂θ = {0 if z>1 ,-∂z/∂θ otherwise}
So,the iteration formula from stochastic Gradient Descent:
if z<=1:

θnewold - η[-yi.x+λ.θold]
θnewold +ηyi.xi  -η.λ.θold
θonewoold  η.yi


θnewold  λ.θold
We will make η getting decreased as iteration will increase.
Try to implement and Uderstand Below code

If we remove the regularization term , and take η=1 , it becomes Simple Perceptron Algorithm.

Post a comment