Scikit Learn

1. Loading the dataset

The data set we will use is the famous “20 Newsgroups” data set, a collection of approximately 20,000 newsgroup documents spread across 20 different newsgroups. scikit-learn ships a loader for it, so we don’t need to download it by hand.
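A minimal sketch of the loading step, assuming the standard `fetch_20newsgroups` loader. The wrapper name `load_newsgroups` and the fixed seed are illustrative choices, not part of the original text.

```python
from sklearn.datasets import fetch_20newsgroups


def load_newsgroups(subset="train", categories=None):
    """Return one split of 20 Newsgroups as a scikit-learn Bunch.

    The first call downloads the archive and caches it locally;
    later calls read from that cache.
    """
    return fetch_20newsgroups(
        subset=subset,          # "train", "test", or "all"
        categories=categories,  # None keeps all 20 newsgroups
        shuffle=True,
        random_state=42,        # arbitrary fixed seed for repeatable order
    )
```

Calling `twenty_train = load_newsgroups("train")` gives an object whose `data` attribute holds the raw posts, `target` the integer labels, and `target_names` the newsgroup names.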

2. Extracting features from text files

Finding TF-IDF
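The two feature steps can be sketched as follows: first a bag-of-words count matrix, then a TF-IDF rescaling of it. The three short documents are a hypothetical stand-in for the newsgroup posts so the example runs without a download.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy stand-in for the newsgroup posts (hypothetical examples).
docs = [
    "the rocket launch was delayed",
    "the hockey game was delayed",
    "rocket engines and orbit",
]

# Step 1: bag of words. Each column is a vocabulary term and
# each row holds the raw term counts for one document.
count_vect = CountVectorizer()
X_counts = count_vect.fit_transform(docs)

# Step 2: TF-IDF. Counts are reweighted so that terms appearing
# in many documents (such as "the") contribute less than rare ones.
tfidf_transformer = TfidfTransformer()
X_tfidf = tfidf_transformer.fit_transform(X_counts)

print(X_counts.shape)  # (number of documents, vocabulary size)
print(X_tfidf.shape)   # same shape, different weights
```

With the real data, `docs` would be the `data` attribute of the loaded training set.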

3. Running ML algorithms

This trains the Naive Bayes (NB) classifier on the training data we provided.
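A small sketch of the training step, assuming `MultinomialNB` as the Naive Bayes variant (a common choice for word-count features). The four training sentences and their labels are hypothetical toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Toy training set (hypothetical), labelled with two newsgroup names.
train_docs = [
    "the rocket reached orbit",
    "nasa launched a new rocket",
    "the team won the hockey game",
    "a great hockey goal tonight",
]
train_labels = ["sci.space", "sci.space",
                "rec.sport.hockey", "rec.sport.hockey"]

# Turn text into TF-IDF features, then fit the classifier on them.
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train = tfidf_transformer.fit_transform(count_vect.fit_transform(train_docs))
clf = MultinomialNB().fit(X_train, train_labels)

# New text must go through the same fitted transformers:
# transform() only, never fit_transform().
new_docs = ["rocket orbit", "hockey game"]
X_new = tfidf_transformer.transform(count_vect.transform(new_docs))
predicted = clf.predict(X_new)
print(list(zip(new_docs, predicted)))
```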

Building a pipeline: We can do all of the above with less code by chaining the steps into a single pipeline.
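One way such a pipeline might look, using scikit-learn's `Pipeline` class. The step names `vect`, `tfidf`, and `clf` are arbitrary labels, and the training sentences are hypothetical toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training set (hypothetical).
train_docs = [
    "the rocket reached orbit",
    "nasa launched a new rocket",
    "the team won the hockey game",
    "a great hockey goal tonight",
]
train_labels = ["sci.space", "sci.space",
                "rec.sport.hockey", "rec.sport.hockey"]

# Each step is a (name, estimator) pair; fit() runs them in order.
text_clf = Pipeline([
    ("vect", CountVectorizer()),     # text -> term counts
    ("tfidf", TfidfTransformer()),   # counts -> TF-IDF weights
    ("clf", MultinomialNB()),        # TF-IDF -> predicted label
])
text_clf.fit(train_docs, train_labels)

# The pipeline accepts raw text directly at prediction time.
predicted = text_clf.predict(["rocket orbit", "hockey game"])
print(predicted)
```

Because the pipeline owns the fitted vectorizer and transformer, there is no risk of applying mismatched preprocessing to new text.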

Performance of NB classifier
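Accuracy here is the fraction of held-out documents whose predicted label matches the true one. The sketch below computes it on a hypothetical toy split; with the real data, the test documents and labels would come from the `test` subset of the data set.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy train and test splits (hypothetical).
train_docs = [
    "the rocket reached orbit",
    "nasa launched a new rocket",
    "the team won the hockey game",
    "a great hockey goal tonight",
]
train_labels = ["sci.space", "sci.space",
                "rec.sport.hockey", "rec.sport.hockey"]
test_docs = ["a rocket in orbit", "hockey game tonight", "new rocket engines"]
test_labels = ["sci.space", "rec.sport.hockey", "sci.space"]

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
]).fit(train_docs, train_labels)

# Accuracy = mean of the element-wise "prediction equals truth" checks.
predicted = text_clf.predict(test_docs)
accuracy = np.mean(predicted == np.array(test_labels))
print(f"accuracy: {accuracy:.3f}")
```

The same number is available as `text_clf.score(test_docs, test_labels)`.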

To improve the accuracy, we can change the algorithm.
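The text does not say which algorithm replaces Naive Bayes. As an assumption, the sketch below swaps in a linear support vector machine trained with stochastic gradient descent (`SGDClassifier` with hinge loss), a common choice for text classification. The hyperparameter values and training sentences are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy training set (hypothetical).
train_docs = [
    "the rocket reached orbit",
    "nasa launched a new rocket",
    "the team won the hockey game",
    "a great hockey goal tonight",
]
train_labels = ["sci.space", "sci.space",
                "rec.sport.hockey", "rec.sport.hockey"]

# Only the last step changes; the feature extraction stays the same.
text_clf_svm = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(
        loss="hinge",      # hinge loss gives a linear SVM (assumed choice)
        penalty="l2",
        alpha=1e-3,        # regularization strength, illustrative value
        random_state=42,   # fixed seed so repeated runs agree
    )),
])
text_clf_svm.fit(train_docs, train_labels)

predicted = text_clf_svm.predict(["rocket orbit", "hockey game"])
print(predicted)
```

Keeping the classifier as the final pipeline step means any other scikit-learn estimator can be tried by changing a single line.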

Tuning the model to increase accuracy even more. This will take a long time to run.
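A hedged sketch of tuning with `GridSearchCV`, which tries every combination in a parameter grid and scores each by cross-validation. The grid values, the two-fold split, the Naive Bayes pipeline, and the toy corpus are all assumptions made for illustration. On the full data set the search is slow because each combination is refit once per fold.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training set (hypothetical), four documents per class.
train_docs = [
    "the rocket reached orbit",
    "nasa launched a new rocket",
    "the shuttle docked in orbit",
    "rocket engines for the moon mission",
    "the team won the hockey game",
    "a great hockey goal tonight",
    "the goalie saved the game",
    "hockey playoffs start this season",
]
train_labels = ["sci.space"] * 4 + ["rec.sport.hockey"] * 4

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Keys follow the pattern <step name>__<parameter name>.
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. unigrams + bigrams
    "tfidf__use_idf": [True, False],        # with or without IDF reweighting
    "clf__alpha": [1e-2, 1e-3],             # smoothing strength
}

# cv=2 keeps the toy example small; larger data allows more folds.
gs_clf = GridSearchCV(text_clf, parameters, cv=2)
gs_clf.fit(train_docs, train_labels)

print(gs_clf.best_score_)   # mean cross-validated accuracy of the best setting
print(gs_clf.best_params_)  # the winning parameter combination
```

Passing `n_jobs=-1` to `GridSearchCV` runs the candidate fits in parallel on all available cores, which helps with the long run time on the full data.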

🎊🎊🎊 An accuracy of 90%. Much better!