## STOCHASTIC GRADIENT DESCENT

Stochastic Gradient Descent (SGD) is a more efficient algorithm than Gradient Descent when we have to deal with big data. Where the dataset is huge, Stochastic Gradient Descent is used.

In our previous post, we already discussed Gradient Descent (Click Here) in detail. In this post, we will try to understand Stochastic Gradient Descent. Both are almost the same; the only difference comes while iterating.

In Gradient Descent, we had four things:

- Feature vector (X)
- Label (Y)
- Cost function (J)
- Predicted value (Y_p)

θ represents the coefficient/weight vector for the feature vector, and θ_0 is the offset parameter:

**Y_p = θ·X + θ_0**

**θ_new = θ_old − (η · ∂J/∂θ)**
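For instance, the prediction Y_p = θ·X + θ_0 is just a dot product plus the offset. A tiny sketch (all the numbers here are invented purely for illustration):

```python
import numpy as np

theta = np.array([0.5, -1.0, 2.0])  # coefficient / weight vector (hypothetical values)
theta_0 = 0.25                      # offset parameter (hypothetical value)
X = np.array([1.0, 2.0, 3.0])       # one feature vector (hypothetical values)

Y_p = theta @ X + theta_0           # Y_p = theta . X + theta_0
```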

The single difference between Gradient Descent and Stochastic Gradient Descent comes while iterating.

In Gradient Descent, we sum up the losses over all the given data points and take the average in our cost function, something like this:

**J = (1/n) Σ Loss(Y^i, Y_p^i)**

**∂J/∂θ = (1/n) ∂/∂θ (Σ Loss(Y^i, Y_p^i))**
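A minimal sketch of this full-batch update in Python, assuming a squared-error loss (the post leaves Loss generic) and hypothetical data:

```python
import numpy as np

def gradient_descent(X, Y, eta=0.5, epochs=2000):
    """Full-batch gradient descent for Y_p = theta.X + theta_0.

    Uses mean squared error as the loss -- an illustrative choice,
    since the post leaves Loss() generic.
    """
    n, d = X.shape
    theta = np.zeros(d)   # coefficient / weight vector
    theta_0 = 0.0         # offset parameter
    for _ in range(epochs):
        Y_p = X @ theta + theta_0               # predictions for ALL n points
        error = Y_p - Y
        grad_theta = (2.0 / n) * (X.T @ error)  # (1/n) * d/dtheta of summed loss
        grad_theta_0 = (2.0 / n) * error.sum()
        theta = theta - eta * grad_theta        # theta_new = theta_old - eta * dJ/dtheta
        theta_0 = theta_0 - eta * grad_theta_0
    return theta, theta_0
```

Note that every single update touches all n points, which is exactly what becomes expensive when n is huge.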

And in each iteration, calculating ∂J/∂θ means considering all the points. Stochastic Gradient Descent makes this easy: in each iteration we choose an index **i ∈ {1, 2, 3, ..., n}** at random and calculate the gradient for that single point only, instead of summing up and taking the average, and then update **θ**:

**∂J/∂θ = ∂/∂θ Loss(Y^i, Y_p^i)**

**θ_new = θ_old − (η · ∂J/∂θ)**

So, in the Stochastic Gradient Descent method, in every iteration we update the value of θ according to a single random point only.
As Stochastic means 'random'.
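The single-random-point update above can be sketched like this, again assuming a squared-error loss for illustration:

```python
import numpy as np

def sgd(X, Y, eta=0.1, iterations=5000, seed=0):
    """Stochastic gradient descent: each iteration uses ONE random point.

    Same linear model and squared-error loss as the sketch above
    (an assumed, illustrative choice of loss).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)   # coefficient / weight vector
    theta_0 = 0.0         # offset parameter
    for _ in range(iterations):
        i = rng.integers(n)              # choose i in {0, ..., n-1} at random
        y_p = X[i] @ theta + theta_0     # prediction for that single point
        error = y_p - Y[i]
        theta = theta - eta * 2.0 * error * X[i]  # gradient of one point's loss only
        theta_0 = theta_0 - eta * 2.0 * error
    return theta, theta_0
```

Each update now costs O(d) instead of O(n·d), which is why SGD scales to huge datasets; the price is that individual updates are noisy, since each one follows only a single point.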
