I have two files, shown below.
Keyword file
spark
scala
hive
Content file
this is spark.
this can be scala and spark.
this is hive.
My aim is to look up the keywords in each line of the content file. For each line I need only the last keyword that occurs (i.e., even if a line contains two keywords, I should take only the one that occurs last), and then create a CSV file to load the data into a Hive table.
Expected output
"this is spark.","spark"
"this can be scala and spark.","spark"
"this is hive.","hive"
My content file has millions of rows. What is the best, most optimized way to produce this output?
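To make the requirement concrete, here is a minimal single-machine sketch of the matching rule in plain Python. It is not a tuned solution. It assumes case-sensitive substring matching, that lines containing no keyword are skipped, and that the keyword list is non-empty and small enough to fit in one regular expression. The function names (`build_matcher`, `last_keyword`, `write_matches`) are made up for illustration.

```python
import csv
import io
import re


def build_matcher(keywords):
    # Compile one alternation for all keywords so each line is scanned once.
    # Longer keywords come first so a longer keyword is preferred over a
    # shorter one that starts at the same position.
    ordered = sorted(set(keywords), key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered))


def last_keyword(line, matcher):
    # Walk the non-overlapping matches left to right and keep the final one.
    last = None
    for m in matcher.finditer(line):
        last = m.group(0)
    return last


def write_matches(lines, keywords, out):
    matcher = build_matcher(keywords)
    # QUOTE_ALL wraps every field in double quotes, as in the expected output.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        kw = last_keyword(line, matcher)
        if kw is not None:  # assumption: lines without any keyword are dropped
            writer.writerow([line, kw])


# Sample data taken from the question.
keywords = ["spark", "scala", "hive"]
content = ["this is spark.", "this can be scala and spark.", "this is hive."]

buf = io.StringIO()
write_matches(content, keywords, buf)
print(buf.getvalue(), end="")
```

Because the lines are consumed one at a time, the content file can be streamed rather than loaded whole. For millions of rows on a cluster, the same per-line function could presumably be applied inside a distributed map, with the small keyword list shipped to every worker.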