How to index categorical variables in Python with a Spark RDD

I have an RDD in Spark with no header row. How do I index the categorical variables? The first three rows are shown below; the raw data can be found at https://www.kaggle.com/c/titanic/data

print train4.take(3)

[[u'1', u'0', u'3', u'"Braund', u'male', 22.0, 1.0, 0.0, u'A/5 21171', 7.25, u'', u'S', u' Mr', u' Owen Harris"'], [u'2', u'1', u'1', u'"Cumings', u'female', 38.0, 1.0, 0.0, u'PC 17599', 71.2833, u'C85', u'C', u' Mrs', u' John Bradley (Florence Briggs Thayer)"'], [u'3', u'1', u'3', u'"Heikkinen', u'female', 26.0, 0.0, 0.0, u'STON/O2. 3101282', 7.925, u'', u'S', u' Miss', u' Laina"']]
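One possible approach, offered as a minimal sketch rather than a confirmed answer: build a dictionary from each distinct category to an integer, then replace the value in every row. The column positions 4 (sex) and 11 (embarked) are read off the sample output above, and the helper names `build_index` and `encode_row` are made up for illustration. The helpers are plain Python so they can be checked on the sample rows without a cluster; the commented lines show how they might be applied to `train4`, assuming `sc` is the active SparkContext.

```python
def build_index(values):
    """Map each distinct category to an integer (sorted so the result is reproducible)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode_row(row, col_maps):
    """Return a copy of row with each categorical column replaced by its index."""
    out = list(row)
    for col, mapping in col_maps.items():
        out[col] = mapping[row[col]]
    return out


# The three rows from the question, used here as local test data.
rows = [
    [u'1', u'0', u'3', u'"Braund', u'male', 22.0, 1.0, 0.0,
     u'A/5 21171', 7.25, u'', u'S', u' Mr', u' Owen Harris"'],
    [u'2', u'1', u'1', u'"Cumings', u'female', 38.0, 1.0, 0.0,
     u'PC 17599', 71.2833, u'C85', u'C', u' Mrs',
     u' John Bradley (Florence Briggs Thayer)"'],
    [u'3', u'1', u'3', u'"Heikkinen', u'female', 26.0, 0.0, 0.0,
     u'STON/O2. 3101282', 7.925, u'', u'S', u' Miss', u' Laina"'],
]

# Assumed positions of the categorical columns (sex, embarked).
CATEGORICAL_COLS = [4, 11]

col_maps = {c: build_index(r[c] for r in rows) for c in CATEGORICAL_COLS}
encoded = [encode_row(r, col_maps) for r in rows]

# On the RDD itself (assumes `train4` from the question and a SparkContext `sc`):
# col_maps = {c: build_index(train4.map(lambda r, c=c: r[c]).distinct().collect())
#             for c in CATEGORICAL_COLS}
# bc = sc.broadcast(col_maps)  # ship the small lookup tables to the executors once
# indexed = train4.map(lambda r: encode_row(r, bc.value))
```

Collecting the distinct values to the driver is only reasonable when a column has few categories, as sex and embarked do. If the RDD is converted to a DataFrame, `pyspark.ml.feature.StringIndexer` does a similar job, though by default it orders labels by frequency rather than alphabetically.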
