Multi-label classification on a large dataset with over 600 labels

I'm trying to train a multi-label text classifier on a dataset of about 1 million rows. After cleaning the data, I'm using a sparse matrix of Word2Vec features (feature size 300).

The columns I have are: 1. ID, 2. Dictionary, 3. Label.

The dictionary size varies from 10 keys to 900 keys.

The steps I followed on the Dictionary column are:

  1. Converting the dictionary to a string
  2. Keeping only the useful tokens from the string
  3. Removing stopwords
  4. Stemming the words
  5. Training a Word2Vec model with feature size 300
  6. Word2Vec feature generation
  7. Label encoding
  8. Converting the feature vectors to a NumPy array
  9. Converting the NumPy array to a sparse matrix of shape (1114220, 300)
  10. Trying a OneVsRest model for training
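To make the pipeline concrete, here is a minimal runnable sketch of steps 1-9 on toy data. The record layout, the helper names, the stopword list, and the 4-dimensional stand-in vectors are all illustrative assumptions; in the real pipeline the lookup would come from a trained gensim Word2Vec model with a vector size of 300, and stemming is skipped here for brevity:

```python
import re
import numpy as np
from scipy.sparse import csr_matrix

STOPWORDS = {"the", "a", "an", "of", "and"}  # placeholder list

def dict_to_tokens(d):
    """Steps 1-3: flatten the dict to a string, keep word tokens,
    drop stopwords."""
    text = " ".join(f"{k} {v}" for k, v in d.items())
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def doc_vector(tokens, wv, dim):
    """Step 6: average the word vectors of the tokens -- one common way
    to turn per-word Word2Vec vectors into a fixed-size document feature."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy stand-in for a trained model's keyed vectors (dim=4 instead of 300).
rng = np.random.default_rng(0)
wv = {w: rng.normal(size=4) for w in ["server", "error", "disk", "timeout"]}

records = [{"msg": "disk error", "src": "server"}, {"msg": "timeout"}]
X = np.vstack([doc_vector(dict_to_tokens(r), wv, 4) for r in records])
X_sparse = csr_matrix(X)   # steps 8-9 (dense vectors gain little from this)
print(X_sparse.shape)      # (2, 4)
```

Note that averaged Word2Vec vectors are dense, so wrapping them in a sparse matrix (step 9) doesn't actually save memory.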

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

onevsrest = OneVsRestClassifier(SVC(probability=True), n_jobs=-1)
onevsrest.fit(sparse_matrix, df.labels)

I ran this model for nearly two days before the process was killed automatically.

I also tried logistic regression:

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(penalty='l1', C=1, dual=False, solver='saga', n_jobs=-1)
lr.fit(sparse_matrix, df.labels)


I still faced the same issue: the model keeps training for about two days and then gets killed.
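One lighter-weight variant I'm wondering about (not yet run on the full data) is keeping the same OneVsRest wrapper but swapping the kernel SVC, with its probability calibration, for LinearSVC, which fits one linear model per label and tends to scale much better on large sample counts. The snippet below is just a sketch on toy data to show the shape of the call; the sizes and labels are placeholders, not the real dataset:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X_toy = rng.rand(60, 5)    # stand-in for the (1114220, 300) feature matrix
y_toy = np.arange(60) % 3  # stand-in for the encoded labels

# One binary LinearSVC per label, trained in parallel.
clf = OneVsRestClassifier(LinearSVC(), n_jobs=-1)
clf.fit(X_toy, y_toy)
print(clf.predict(X_toy).shape)  # (60,)
```

Would this kind of linear one-vs-rest setup be a reasonable substitute at this scale?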

Am I doing something wrong, or is there a better way to approach this type of problem?
