Keywords lookup in Spark Scala based on position

I have two files, as shown below.

Keyword file

spark
scala
hive

Content file

this is spark.
this can be scala and spark.
this is hive.

My aim is to look up the keywords in each line of the content file. For each line I need only the keyword that occurs last (i.e. even if the line contains two keywords, I should take only the one that appears last), and then write a CSV file so the data can be loaded into a Hive table.

Expected output

"this is spark.","spark"
"this can be scala and spark.","spark"
"this is hive.","hive"

My content file has millions of rows. What is the most efficient way to produce this output?
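
To make the requirement concrete, here is a minimal sketch of the per-line lookup in plain Scala. The names `KeywordLookup`, `lastKeyword` and `toCsvRow` are hypothetical, and the matching rule (case-sensitive substring search, no word boundaries) is an assumption, since the rule is not pinned down above.

```scala
object KeywordLookup {
  // Returns the keyword whose last occurrence starts furthest to the right
  // in `line`, or None when no keyword occurs in the line.
  // Assumption: plain case-sensitive substring matching.
  def lastKeyword(line: String, keywords: Seq[String]): Option[String] = {
    val hits = keywords
      .filter(_.nonEmpty)                       // an empty keyword would match everywhere
      .map(k => (k, line.lastIndexOf(k)))       // -1 when the keyword is absent
      .filter { case (_, pos) => pos >= 0 }
    if (hits.isEmpty) None else Some(hits.maxBy(_._2)._1)
  }

  // Formats one output row as two double-quoted CSV fields,
  // doubling any embedded double quotes.
  def toCsvRow(line: String, keyword: String): String = {
    def quote(s: String): String = "\"" + s.replace("\"", "\"\"") + "\""
    quote(line) + "," + quote(keyword)
  }
}
```

If the keyword file is small (assumed here), one way to apply this in Spark is to collect the keywords on the driver, share them with `sc.broadcast`, and run `lastKeyword` over the content lines in a single `flatMap` pass, which avoids a join or shuffle over the millions of content rows. The resulting rows could then be written with `saveAsTextFile`.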
