K-fold Cross Validation

Md Khaled Hasan
3 min readJan 7, 2021

Let’s say we want to predict the mortality of a COVID-19 patient. We have a data set like the one below, with an outcome column.

Table 1: COVID19 patient data

Then a new patient arrives, like the one in the table below, and we need to predict the patient’s mortality to support medical decision-making.

Table 2: New COVID19 patient data

We can use different Machine Learning (ML) classification algorithms such as Logistic Regression (LR), Naïve Bayes, Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF). But how do we know which algorithm will give the best prediction? Cross validation answers that question: it evaluates how well a model performs.

In a typical ML workflow we can evaluate a model in two ways.

Fig 1: Model Evaluation

Let’s understand the setup and its problems with a real-life example. Imagine we want to teach a student maths and then test him.

Option 1: Imagine we teach a kid 100 maths problems and test him on the same 100 problems. The problem is that the kid already knows all 100, so he answers them perfectly and the accuracy is 100%. But we have no idea how he would perform on any problem outside of those 100.
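The flaw in Option 1 can be sketched in a few lines of pure Python. Here a “model” simply memorizes its training examples (the data and helper names are hypothetical, just for illustration):

```python
# Hypothetical training data: 100 problems (keys) with their answers (values).
train_set = {i: i % 2 for i in range(100)}

def memorizing_model(question):
    # This "model" learns nothing general; it only looks the answer up.
    return train_set[question]

# Testing on the SAME 100 problems it memorized: accuracy is trivially 100%.
correct = sum(memorizing_model(q) == a for q, a in train_set.items())
accuracy = correct / len(train_set)
print(accuracy)  # 1.0
```

The perfect score tells us nothing about performance on an unseen problem, which is exactly the issue the kid-and-maths analogy describes.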

Fig 2: Training and testing on the same 100 problems

Option 2: Imagine we take a different approach: we teach the kid 70 of the 100 problems and test him on the remaining 30. The problem is that the 70 training problems might all be algebra while the 30 test problems are trigonometry. Then the kid will perform very badly. So this is not a good evaluation method either.

Fig 3: Training on 70 problems while testing on 30

Both problems can be solved with K-fold cross validation. In this method we divide the 100 problems into K folds. If we divide them into 5 folds, it’s called 5-fold cross validation, and so on.

Let’s say we are doing 5-fold cross validation. We divide the 100 problems into 5 equal folds, each containing 20 problems. Then we run 5 iterations, each time training on 4 of the 5 folds and testing on the remaining one.

Fig 4: Dividing 100 problems into 5 folds
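The fold-splitting step can be sketched without any library (the item list here is a stand-in for real data):

```python
# Split 100 items into 5 equal folds of 20 each.
items = list(range(100))  # hypothetical data set of 100 examples
k = 5
fold_size = len(items) // k
folds = [items[i * fold_size:(i + 1) * fold_size] for i in range(k)]

print([len(f) for f in folds])  # [20, 20, 20, 20, 20]
```

In practice the data is usually shuffled before splitting, so each fold mixes all kinds of problems rather than one topic.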

Iteration 1: In the 1st iteration, we train on folds 2–5, test the model on fold 1, and save the accuracy.
Iteration 2: In the 2nd iteration, we train on folds 1, 3, 4, and 5, test on fold 2, and save the accuracy.

We repeat the process until all 5 folds have served as the test set. After the iterations, we average the saved accuracies, which indicates how good the model is.
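The full 5-iteration loop described above can be sketched as follows. `evaluate` is a hypothetical placeholder for training a model on the training folds and measuring accuracy on the held-out fold:

```python
def evaluate(train_items, test_items):
    # Placeholder: a real implementation would fit a model on train_items
    # and return its accuracy on test_items. We return a fixed value here.
    return 0.8

# Hypothetical data set of 100 examples, split into 5 folds of 20.
items = list(range(100))
k = 5
fold_size = len(items) // k
folds = [items[i * fold_size:(i + 1) * fold_size] for i in range(k)]

accuracies = []
for i in range(k):
    test_fold = folds[i]                                          # hold out fold i
    train_folds = [x for j, f in enumerate(folds) if j != i for x in f]
    accuracies.append(evaluate(train_folds, test_fold))           # save accuracy

cv_score = sum(accuracies) / k  # the averaged accuracy is the model's CV score
print(cv_score)
```

Running this same loop once per candidate algorithm (LR, SVM, RF, and so on) and comparing the averaged scores is how cross validation helps pick the best model.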

For more please visit: shorturl.at/bKLR3

