Training set and testing set are common terms in machine learning and data science. This post introduces the following concepts in a short and simple way.
- What is a dataset in machine learning?
- What is a machine learning model?
- Why do you need separate data for training and testing?
- What is a Training set?
- What is a Testing set?
- Intuitive explanation of the Training set and Testing set
What is a Dataset in machine learning?
For any machine learning problem, the first step is data collection. The data you collect for your machine learning task is called the “Dataset”.
Example: collecting historical home prices to build a house price prediction model.
What is a Machine learning model?
The next step in machine learning is building a model. A model is a program written so that the machine can learn from data by itself. If you are a beginner reading this post, you can think of the model as a black box that does the task for you: for example, a face recognition model or a house price prediction model.
Why do you need separate data for Training and Testing?
Your model is now ready to learn from the data you pass to it. A clever thing to do is to NOT let the model learn from all the data you have. To evaluate whether the model is doing well, we will test it in the next step, and testing it on the same data it has already learnt from tells us very little: the model will almost certainly perform well on data it has already seen.
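To see why testing on already-seen data is misleading, here is a minimal sketch. The house areas, the prices, and the deliberately naive `memorizing_model` are all made up for illustration; a real model would not work this way.

```python
# Hypothetical data: house area (sq ft) -> price
seen_houses = {600: 60_000, 800: 80_000, 1000: 100_000}
unseen_houses = {700: 70_000, 900: 90_000}

def memorizing_model(area):
    """A deliberately bad 'model' that only recalls prices it has seen."""
    return seen_houses.get(area)  # returns None for any area it never saw

def accuracy(model, houses):
    """Fraction of houses whose price the model predicts exactly."""
    correct = sum(1 for area, price in houses.items() if model(area) == price)
    return correct / len(houses)

print(accuracy(memorizing_model, seen_houses))    # 1.0
print(accuracy(memorizing_model, unseen_houses))  # 0.0
```

The perfect score on the seen houses says nothing about how the model handles new houses, which is what we actually care about.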
To make the test meaningful, we set aside a smaller portion of the dataset for the model to be tested on later. Setting aside literally means not showing this portion of the dataset to the model until it has finished learning.
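Setting aside a portion of the dataset could look roughly like the sketch below. The `split_dataset` helper, the 20% test fraction, and the house data are assumptions chosen for illustration, not a prescribed recipe.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out a fraction as the testing set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy, so the original dataset is left untouched
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    test_set = shuffled[:test_size]       # smaller portion, hidden from the model
    training_set = shuffled[test_size:]   # bigger portion, used for learning
    return training_set, test_set

# Hypothetical dataset: (area in sq ft, price) pairs
houses = [(600 + 100 * i, 50_000 + 20_000 * i) for i in range(10)]
training_set, test_set = split_dataset(houses)
print(len(training_set), len(test_set))  # 8 2
```

Shuffling before splitting is a common precaution, so that the held-out portion is not just the last few rows collected.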
What is a Training set?
Here comes the model learning part. In this stage, we pass the bigger portion of the dataset to the model to learn from. This portion of the data is called the Training set, because it is the data the model is trained on.
What is a Testing set?
Once the model completes learning on the training set, it is time to evaluate its performance. For this, we use the smaller portion of the data that we have already set aside. This data, which the model has never seen, is called the Testing set, because it is the data the model is tested on.
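As a rough sketch of both stages together, the example below trains a very simple model on a training set and then measures its error on a testing set. The straight-line model, the `fit_line` and `mean_absolute_error` helpers, and all the numbers are assumptions made up for illustration.

```python
def fit_line(points):
    """Least-squares fit of price = slope * area + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(model, points):
    """Average gap between predicted and actual prices."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)

# Hypothetical (area in sq ft, price) pairs
training_set = [(600, 62_000), (800, 79_000), (1000, 101_000), (1200, 118_000),
                (1400, 142_000), (1600, 159_000), (1800, 181_000), (2000, 198_000)]
test_set = [(900, 91_000), (1500, 149_000)]  # never shown during training

model = fit_line(training_set)                       # learning stage
test_error = mean_absolute_error(model, test_set)    # evaluation stage
print(f"Average error on unseen houses: {test_error:.0f}")
```

Only `training_set` is passed to `fit_line`; `test_set` is touched only when measuring the error.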
Intuitive explanation of the Training set and Testing set
This can be compared to a student who studies for an exam with the help of textbooks and is then tested with an exam question paper that the student has never seen before. The textbook is the training set, and the exam question paper is the testing set.