How to convert categorical data to mathematics?

4 min read · Jan 12, 2021

Hello reader,

Categorical data is often textual in nature. Machine learning models are mathematical models, so they require numbers to work with. This is one of the main reasons we need to convert categorical data into numerical form before we can feed it to a machine learning model.

Now, let's look at some feature engineering techniques that are used to convert categorical data into numerical data.

1. OneHot Encoding Technique

In this technique, we convert the categorical data into columns of 0s and 1s.
This technique creates a new feature for every category, so it is most useful when the feature has only a few unique values.
- Here we performed the one-hot encoding technique on the ‘Sex’ feature.

It creates two new features, female and male, arranged in alphabetical order.
Wherever the ‘Sex’ feature has the category female, the new ‘female’ feature is set to 1, otherwise 0.
Similarly, wherever the category is male, the new ‘male’ feature is set to 1, otherwise 0.
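
To make this concrete, here is a minimal sketch using pandas. The tiny DataFrame is made-up illustration data, and using `pd.get_dummies` is my assumption about one convenient way to do it, not necessarily the only way.

```python
import pandas as pd

# Made-up toy data; only the 'Sex' column name comes from the example above.
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# get_dummies creates one 0/1 column per category, sorted alphabetically.
one_hot = pd.get_dummies(df["Sex"], dtype=int)

print(one_hot)
```

Each row has a 1 in exactly one of the new columns, matching the category it had in ‘Sex’.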

2. Ordinal Label Encoding Technique

This technique is applicable to ordinal categorical data.
Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known.
For example:

Suppose we have an education feature with the values SSC, HSC, Diploma, Bachelor, Master, and Ph.D. We can easily rank these: Ph.D. = rank 1, Master = rank 2, Bachelor = rank 3, and so on.

On the ‘Week Day’ column/feature we perform ordinal label encoding by assigning a rank to each day of the week.
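
A minimal sketch of this idea in pandas could look like the following. The sample rows and the specific ranking (Monday = 1 through Sunday = 7) are assumptions chosen for illustration; any ranking that respects the natural order would work the same way.

```python
import pandas as pd

# Made-up toy data; only the 'Week Day' column name comes from the text.
df = pd.DataFrame({"Week Day": ["Monday", "Wednesday", "Sunday", "Monday"]})

# Assumed ranking: Monday = 1 ... Sunday = 7.
weekday_rank = {
    "Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4,
    "Friday": 5, "Saturday": 6, "Sunday": 7,
}

# Replace each day name with its rank.
df["Week Day Rank"] = df["Week Day"].map(weekday_rank)

print(df)
```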

3. Count Frequency Encoding Technique

This technique is useful when the feature has values that occur frequently.
Here we replace each category value with its frequency count.
It does not create a new feature, so it does not increase the feature space.
If two categories have the same frequency count, both receive the same value, so the model cannot tell them apart.

Applying the Count Frequency Encoding Technique
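
Here is a small sketch of count frequency encoding in pandas. The ‘City’ column and its values are hypothetical and only serve to show the mechanics.

```python
import pandas as pd

# Hypothetical toy feature, used only for illustration.
df = pd.DataFrame({"City": ["Pune", "Mumbai", "Pune", "Delhi", "Pune", "Mumbai"]})

# Count how often each category occurs.
counts = df["City"].value_counts()

# Replace each category with its count; no new columns per category are needed.
df["City_count"] = df["City"].map(counts)

print(df)
```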

4. Mean Encoding Technique

First, find the mean of the Target/Output feature for each category of the feature (expressed as a percentage, this shows how each category relates to the target).
Now, replace each category value with its mean with respect to the target feature.
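
The two steps above can be sketched in pandas as follows. The ‘Cabin’ and ‘Survived’ column names and all values are assumptions for illustration; substitute your own feature and target.

```python
import pandas as pd

# Made-up toy data: 'Cabin' is the categorical feature, 'Survived' the target.
df = pd.DataFrame({
    "Cabin": ["A", "B", "A", "C", "B", "A"],
    "Survived": [1, 0, 0, 1, 1, 1],
})

# Step 1: mean of the target for each category.
means = df.groupby("Cabin")["Survived"].mean()

# Step 2: replace each category with its target mean.
df["Cabin_mean"] = df["Cabin"].map(means)

print(df)
```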

5. Target Guided Encoding Technique

First, find the mean of the Target/Output feature for each category of the feature.
After that, sort the categories by this mean in ascending order.
Now, apply ordinal label encoding by assigning a rank/number to each category (each cabin, in this example).

Applying Target Guided Ordinal Encoding Technique
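
A minimal pandas sketch of these steps is shown below. The data is made up, the ‘Survived’ target name is an assumption, and starting the ranks at 1 is also my choice for the example.

```python
import pandas as pd

# Made-up toy data: 'Cabin' is the categorical feature, 'Survived' the target.
df = pd.DataFrame({
    "Cabin": ["A", "B", "A", "C", "B", "A"],
    "Survived": [1, 0, 0, 1, 1, 1],
})

# Step 1: mean of the target for each category.
means = df.groupby("Cabin")["Survived"].mean()

# Step 2: order the categories by that mean, ascending.
ordered = means.sort_values().index

# Step 3: assign a rank to each category (starting at 1, an assumption).
rank = {category: i for i, category in enumerate(ordered, start=1)}
df["Cabin_rank"] = df["Cabin"].map(rank)

print(df)
```

The category with the lowest target mean gets the lowest rank, so the encoded numbers grow with the target.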

6. Probability Encoding Technique

First, find the probability ratio:
Find the mean of the Target/Output feature for each category of the feature.
Then, subtract the mean from 1.
Probability ratio = mean / (1 - mean)
Now, replace each category value with its probability ratio.

Applying the Probability Ratio Encoding Technique
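
As a rough sketch, the calculation can be written in pandas like this, dividing each category's target mean by one minus that mean (following the "subtract the mean from 1" step). The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up toy data: 'Cabin' is the categorical feature, 'Survived' the target.
df = pd.DataFrame({
    "Cabin": ["A", "A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Survived": [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
})

# Mean of the target for each category (probability of target = 1).
p = df.groupby("Cabin")["Survived"].mean()

# Ratio of that probability to its complement.
ratio = p / (1 - p)

# Replace each category with its ratio.
df["Cabin_ratio"] = df["Cabin"].map(ratio)

print(ratio)
```

Note that a category whose mean is exactly 1 has a zero denominator, and pandas returns `inf` for it, so such categories need separate handling.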

I hope this helps you build your knowledge. In this blog, I covered “How to convert categorical data into mathematics?” and tried to include the theory behind all the conversion techniques. For a practical implementation, please take a look at my GitHub repository, where I explain all the code line by line.

Make sure your data is free of missing values. Here are my notebooks that help you handle missing values.

Written by Rushikesh Lavate