PySpark: Convert PySpark DataFrame — Error “must return a pyspark.sql.DataFrame”

I have a sample PySpark DataFrame (shown below). I want to convert it to a pandas DataFrame so that I can apply pandas and NLTK methods to it.

ID   Text      Number
A    hello     456
C    goodbye   862
F    yes       111
G    no        323

I tried the code below:

def function_1(input_df):
    df = input_df
    df = df.toPandas()
    return df

I get the following error when I run the code:

You returned a pandas.DataFrame in a pyspark workbook. You must return a pyspark.sql.DataFrame in this workbook.
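The error suggests the workbook checks the function's return type: whatever pandas or NLTK work happens inside the function, the value returned must be a `pyspark.sql.DataFrame`. One common pattern is to convert to pandas, process, and convert back with `spark.createDataFrame` before returning. A minimal sketch of that round trip, assuming the workbook provides an active `SparkSession` (here passed in as `spark`; the lowercasing step is a placeholder for the real NLTK processing):

```python
import pandas as pd

def process_text(pdf: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for the pandas/NLTK work; here we just lowercase "Text".
    out = pdf.copy()
    out["Text"] = out["Text"].str.lower()
    return out

def function_1(input_df, spark):
    # Spark -> pandas for the pandas-side processing,
    # then pandas -> Spark so the workbook receives the
    # pyspark.sql.DataFrame it requires.
    pdf = process_text(input_df.toPandas())
    return spark.createDataFrame(pdf)
```

Keeping the pandas logic in its own function (`process_text`) also makes it easy to test without a Spark cluster.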
