
Python Machine Learning: Not getting the proper output of the model

I am trying to build a machine learning model with scikit-learn that predicts a product's category from the product description entered by the user. However, the program returns the right category for some inputs and a wrong one for others.

Here is my code:

```python
import pandas as pd
import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_excel('D:\\android\\testdata2.xlsx')
X = df['Product Description']
Y = df['Category']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.25, random_state=4)

# Manual version of the pipeline below: word counts -> TF-IDF weights -> Naive Bayes
count_vector = CountVectorizer()
X_train_count = count_vector.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_count)
clf = MultinomialNB().fit(X_train_tfidf, Y_train)

# The same three steps wrapped in a Pipeline, so raw text can be passed to predict()
text_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', MultinomialNB())])
text_clf = text_clf.fit(X_train, Y_train)
joblib.dump(text_clf, 'model.pkl')

X_test1 = ['MULTIVAC']
predicted = text_clf.predict(X_test1)
# Note: this scores the held-out split X_test, so proab[0] belongs to the
# first row of X_test, not to X_test1
proab = text_clf.predict_proba(X_test)
print(str(predicted))
print(max(proab[0]))
```
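One thing worth checking in the code above: `predict_proba(X_test)` scores the held-out test split, so `proab[0]` is the probability vector of the first row of `X_test`, not of `X_test1`. That would likely explain why the same value, 0.09326544037258762, is printed for different inputs. Below is a minimal sketch that scores the same input passed to `predict()` and lists the most probable categories. The toy descriptions and the `top_k` helper are hypothetical stand-ins for the Excel data, not part of the original code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the Excel sheet
descriptions = [
    "IMPLANT MAMMAIRE ANATOMIQUE", "IMPLANT MAMMAIRE ROND",
    "BISTOURI ELECTRIQUE", "ELECTRODE BISTOURI",
    "SET FISTULE HEMODIALYSE", "SET BRANCHEMENT FISTULE",
]
categories = (["IMPLANT MAMMAIRE"] * 2
              + ["INSTRUMENT ELECTROCHIRURGIE"] * 2
              + ["SET BRANCHEMENT DEBRANCHEMENT HEMODIALYSE FISTULE"] * 2)

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(descriptions, categories)


def top_k(model, text, k=3):
    """Hypothetical helper: the k most probable (category, probability) pairs for one description."""
    proba = model.predict_proba([text])[0]  # same input as predict(), not X_test
    order = proba.argsort()[::-1][:k]       # indices sorted by descending probability
    return [(model.classes_[i], float(proba[i])) for i in order]


for text in ["IMPLANT MAMMAIRE ANATOMIQUE", "IMPLANT"]:
    print(text, "->", text_clf.predict([text])[0], top_k(text_clf, text))
```

Looking at the second and third candidates, and not only the winner, makes it easier to see when the model is uncertain between categories.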

Here is the data that I am using: model-data
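Before changing the model, it may help to see how many rows each category has and to measure accuracy on the split that `train_test_split` already holds out (the code above never evaluates on `X_test`). The sketch below uses a hypothetical toy DataFrame in place of the Excel file. It swaps in `TfidfVectorizer`, which combines `CountVectorizer` and `TfidfTransformer` in one step, and adds `stratify=` so category proportions are similar in both splits.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for pd.read_excel('D:\\android\\testdata2.xlsx')
df = pd.DataFrame({
    "Product Description": [
        "IMPLANT MAMMAIRE ANATOMIQUE", "IMPLANT MAMMAIRE ROND",
        "PROTHESE MAMMAIRE GEL", "IMPLANT MAMMAIRE TEXTURE",
        "BISTOURI ELECTRIQUE", "ELECTRODE BISTOURI",
        "PINCE BIPOLAIRE", "ELECTRODE BOULE",
        "SET FISTULE HEMODIALYSE", "SET BRANCHEMENT FISTULE",
        "SET DEBRANCHEMENT HEMODIALYSE", "AIGUILLE FISTULE",
    ],
    "Category": ["IMPLANT MAMMAIRE"] * 4
    + ["INSTRUMENT ELECTROCHIRURGIE"] * 4
    + ["SET BRANCHEMENT DEBRANCHEMENT HEMODIALYSE FISTULE"] * 4,
})

# Rows per category: categories with only a few examples give Naive Bayes little to learn from
counts = df["Category"].value_counts()
print(counts)

# stratify keeps the category proportions similar in the training and test splits
X_train, X_test, Y_train, Y_test = train_test_split(
    df["Product Description"], df["Category"],
    test_size=0.25, random_state=4, stratify=df["Category"],
)
model = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(X_train, Y_train)
acc = accuracy_score(Y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```

On the real data, a held-out accuracy number gives a more reliable picture than checking a few hand-picked inputs.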

The output changes depending on the test input. For example:

I/P: X_test1=[ 'IMPLANT MAMMAIRE ANATOMIQUE ']
O/P:['IMPLANT MAMMAIRE']
0.09326544037258762

But when I shorten the input to just 'IMPLANT', the prediction changes to a wrong category, which should not happen:

I/P: X_test1=[ 'IMPLANT ']
O/P:['INSTRUMENT ELECTROCHIRURGIE']
0.09326544037258762
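One way to see why a single word can land in an unexpected category: with only one known token, the TF-IDF row has a single non-zero entry equal to 1.0 after L2 normalization, so `MultinomialNB` ranks the categories by log P(category) + log P(token | category). If "implant" occurs under several categories, whichever one gives it the highest smoothed weight wins. The sketch below computes those scores from the fitted `class_log_prior_` and `feature_log_prob_` attributes. The toy data (including the "IMPLANT DENTAIRE" category) and the `explain_single_token` helper are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: the word "IMPLANT" occurs under two different categories
descriptions = [
    "IMPLANT MAMMAIRE ANATOMIQUE", "PROTHESE MAMMAIRE RONDE",
    "IMPLANT DENTAIRE TITANE", "IMPLANT DENTAIRE CONIQUE",
    "BISTOURI ELECTRIQUE", "ELECTRODE BISTOURI",
]
categories = (["IMPLANT MAMMAIRE"] * 2 + ["IMPLANT DENTAIRE"] * 2
              + ["INSTRUMENT ELECTROCHIRURGIE"] * 2)

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
]).fit(descriptions, categories)


def explain_single_token(pipe, token):
    """Hypothetical helper: per-category score MultinomialNB assigns to a one-word query."""
    vocab = pipe.named_steps["vect"].vocabulary_
    nb = pipe.named_steps["clf"]
    j = vocab[token.lower()]  # CountVectorizer lowercases by default; KeyError if never seen
    # A one-token TF-IDF row is L2-normalized to 1.0, so the joint log-likelihood
    # reduces to log P(category) + log P(token | category).
    scores = nb.class_log_prior_ + nb.feature_log_prob_[:, j]
    return dict(zip(nb.classes_, scores))


scores = explain_single_token(text_clf, "IMPLANT")
for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:8.3f}  {label}")
print("predict():", text_clf.predict(["IMPLANT"])[0])
```

Running the same check on the real pipeline would show which categories "implant" is most strongly associated with in the training split.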

Another example. The expected output is:

I/P: X_test1=[ 'SET FISTULE GANT']
O/P:['SET BRANCHEMENT DEBRANCHEMENT HEMODIALYSE FISTULE']
0.09326544037258762

But the actual output is:

I/P: X_test1=[ 'SET FISTULE GANT']
O/P:['BEVACIZUMAB']
0.08333333333333333
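Another possible cause is vocabulary coverage. `CountVectorizer` silently drops tokens it did not see during `fit`, and if none of an input's tokens are known, the row is all zeros and `MultinomialNB` falls back to the class prior, that is, the most frequent category. Whether "GANT" or "MULTIVAC" is actually missing from the real training split can't be told from the question. The sketch below, with hypothetical toy data and a hypothetical `unknown_tokens` helper, shows how to check which tokens are ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; BEVACIZUMAB is deliberately the most frequent category
descriptions = [
    "SET FISTULE HEMODIALYSE", "SET BRANCHEMENT FISTULE",
    "BEVACIZUMAB 100MG", "BEVACIZUMAB 400MG", "BEVACIZUMAB FLACON",
]
categories = (["SET BRANCHEMENT DEBRANCHEMENT HEMODIALYSE FISTULE"] * 2
              + ["BEVACIZUMAB"] * 3)

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
]).fit(descriptions, categories)


def unknown_tokens(vectorizer, text):
    """Hypothetical helper: tokens of `text` that the fitted vectorizer will ignore."""
    analyze = vectorizer.build_analyzer()  # same lowercasing and tokenizing as in fit
    return [tok for tok in analyze(text) if tok not in vectorizer.vocabulary_]


vect = text_clf.named_steps["vect"]
for query in ["SET FISTULE GANT", "GANT"]:
    print(query, "| unknown:", unknown_tokens(vect, query),
          "| predicted:", text_clf.predict([query])[0])
```

In this toy setup, "GANT" on its own has no known tokens, so the prediction is simply the largest category in the training data.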
