Feature Scaling With Python

Salman Faroz
3 min read · Feb 23, 2019

We use feature scaling, sometimes called data normalisation, on the independent variables during data preprocessing.

It rescales the independent variables so that their values fall within a specific range.

Why do we do this?

Not all of the values in a data set fall within the same range; age and income, for instance, live on very different scales. Bringing features onto a common scale helps cut down on computing cost.

Feature scaling can also accelerate an algorithm's convergence rate. Whether it matters depends on the algorithm: methods that rely on distances between data points are affected by scaling, while methods that do not are unaffected.
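
Here is a toy illustration of why distance-based methods care. The age and income figures, and the assumed feature ranges, are invented purely for the example:

```python
import numpy as np

# Two people described by (age in years, income in dollars)
a = np.array([25.0, 50_000.0])
b = np.array([35.0, 52_000.0])

# Unscaled Euclidean distance: income dominates completely,
# and the 10-year age gap barely registers
print(np.linalg.norm(a - b))  # ~2000.02

# After min-max scaling each feature to [0, 1] (assuming age spans
# 20-70 and income spans 20k-120k in the full data set), both
# features contribute on comparable terms
a_scaled = np.array([(25 - 20) / 50, (50_000 - 20_000) / 100_000])
b_scaled = np.array([(35 - 20) / 50, (52_000 - 20_000) / 100_000])
print(np.linalg.norm(a_scaled - b_scaled))  # ~0.2010
```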

Where to use it?

K-Means, K-Nearest Neighbours, Principal Component Analysis, and algorithms optimised with Gradient Descent all need their features scaled; many other classification algorithms benefit from scaling too.

Where is it not needed?

Naive Bayes, Linear Discriminant Analysis, and Tree-Based models are not affected by feature scaling.

In short: distance-based algorithms need scaled features; the rest do not.

How to do this?

There are two common ways: Min-Max Normalisation and Standardisation.

Min-Max Normalisation

This technique re-scales each independent variable so that its values fall between 0 and 1.

Min-max normalisation formula: X_scaled = (X − X_min) / (X_max − X_min)

We will use the Iris data set. It is widely available online, and it also ships with scikit-learn, so no separate download is needed here.

Dependencies: NumPy, Pandas, and scikit-learn.

Importing the data set and splitting off the independent variables as x:
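
A minimal sketch, assuming we load Iris through scikit-learn rather than from a downloaded CSV:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Load Iris and keep the four independent variables as x
iris = load_iris()
x = pd.DataFrame(iris.data, columns=iris.feature_names)

# Min-max normalisation: rescale every feature to [0, 1]
min_max = MinMaxScaler()
x_minmax = min_max.fit_transform(x)

print(x_minmax.min(), x_minmax.max())  # 0.0 1.0
```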

After scaling, every value lies between 0 and 1.

Standardisation

It is the most commonly used technique. First, we centre the data so that each feature has a mean of zero.

Then we divide each feature by its standard deviation.

This re-scales each feature so that it has zero mean and unit variance.

Standardisation formula: X_scaled = (X − μ) / σ, where μ is the mean and σ is the standard deviation of the feature.
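
A minimal sketch, reusing the x we loaded above:

```python
from sklearn.preprocessing import StandardScaler

# Standardisation: centre each feature to zero mean,
# then divide by its standard deviation (unit variance)
std = StandardScaler()
x_std = std.fit_transform(x)

print(x_std.mean(axis=0).round(4))  # ~0 for every feature
print(x_std.min(), x_std.max())     # roughly -2.4389 and 3.1146
```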

For the Iris data, the standardised values now range from −2.4389 to 3.1146.

Conclusion: feature scaling gives us better results.
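
One way to check this on your own data is to compare a distance-based classifier with and without scaling. This is only a sketch: on Iris the four features are already on similar scales, so the gap between the two scores may be small, but on mixed-scale data such as age and income it is usually much larger:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# The same distance-based classifier, without and with scaling
knn_raw = KNeighborsClassifier()
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier())

print(cross_val_score(knn_raw, iris.data, iris.target).mean())
print(cross_val_score(knn_scaled, iris.data, iris.target).mean())
```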
