How can I read an entire input file in Hadoop streaming using Python?

I have a shell script, a.sh, that runs a Python mapper, map.py, through Hadoop streaming. I can read the input file line by line through sys.stdin. How can I read the entire input file into a single string? I need to perform string manipulation over multiple files (XML format).

a.sh

hadoop jar --- \
-files map.py \
-input <input_dir> \
-output <output_dir> \
-mapper "python map.py" \
-reducer NONE

map.py

import sys
stream = sys.stdin  # Iterating over this yields one line at a time. How can I read the entire file?
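
One possible approach, offered as a minimal sketch rather than a definitive answer: calling read() on sys.stdin returns everything the mapper receives as a single string. The function names read_entire_input and main are hypothetical, and the size report stands in for the real XML processing. Note the assumption that each mapper sees a whole file: Hadoop streaming feeds a mapper its input split, so a file large enough to be split across several mappers would arrive in pieces.

```python
import sys


def read_entire_input(stream=sys.stdin):
    """Return everything available on the stream as one string."""
    return stream.read()


def main():
    # Hypothetical mapper body: slurp the whole input, then emit one record.
    data = read_entire_input()
    # Placeholder for the XML string manipulation; here we only report the size.
    sys.stdout.write("chars\t%d\n" % len(data))
```

To use this as map.py, the file would end with a call to main(). If whole files must reach a single mapper intact, one commonly used alternative is to make the job input a list of file paths and have the mapper fetch each file itself (for example with hadoop fs -cat), which avoids depending on how the input is split.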
