I have a shell script, a.sh, that runs the Python mapper map.py through Hadoop streaming. I am able to read the input file line by line through sys.stdin. How can I read the entire input file into a single string? I need to perform string manipulation across multiple input files, which are in XML format.
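In Python, a text file object such as `sys.stdin` can be consumed in one call with `.read()`. It returns everything up to end-of-file as one string, whereas iterating over the object yields one line at a time. Below is a minimal sketch; the helper names `read_all` and `read_all_from_lines` are hypothetical, and `io.StringIO` stands in for `sys.stdin` so that it runs locally. One caveat: under Hadoop streaming, a mapper's standard input is its input split, so a file larger than the split size may be spread across several mappers.

```python
import io
from typing import TextIO


def read_all(stream: TextIO) -> str:
    # .read() with no size argument consumes the stream up to EOF and
    # returns a single string, newlines included.
    return stream.read()


def read_all_from_lines(stream: TextIO) -> str:
    # Same result built from line-by-line iteration: each line keeps its
    # trailing "\n", so joining with "" reassembles the original text.
    return "".join(stream)


sample = "<doc>\n  <item>1</item>\n</doc>\n"
print(read_all(io.StringIO(sample)) == read_all_from_lines(io.StringIO(sample)))  # True
```

In map.py this amounts to `text = sys.stdin.read()`.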
a.sh
hadoop jar --- \
-files map.py \
-input <input_dir> \
-output <output_dir> \
-mapper "python map.py" \
-reducer NONE
map.py
import sys
stream = sys.stdin  # iterating over this yields one line at a time; how can I read the entire file?
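Below is a sketch of a complete mapper that works on the whole input split as one string rather than line by line. It assumes each XML file is small enough to fit in a single split, so that one mapper receives a whole, well-formed document. It also assumes that Hadoop streaming exposes the job property `mapreduce.map.input.file` as the environment variable `mapreduce_map_input_file` (dots replaced by underscores), which should be checked on your cluster. `summarize_xml` is a hypothetical placeholder for the actual string manipulation: it parses the document with the standard library's `xml.etree.ElementTree` and reports the root tag and element count.

```python
import io
import os
import xml.etree.ElementTree as ET
from typing import TextIO


def summarize_xml(text: str) -> str:
    # Hypothetical "string manipulation": parse the whole XML document held
    # in `text` and report its root tag plus the total number of elements.
    root = ET.fromstring(text)
    element_count = sum(1 for _ in root.iter())  # root.iter() includes root
    return f"{root.tag}\t{element_count}"


def run(stream: TextIO, out: TextIO) -> None:
    # Read the mapper's whole input split into one string.
    text = stream.read()
    if not text.strip():
        return  # empty split: emit nothing
    # Assumption: the current file name is exported as mapreduce_map_input_file;
    # "unknown" is used if the variable is absent (e.g. when testing locally).
    source = os.environ.get("mapreduce_map_input_file", "unknown")
    # By default, streaming treats the text before the first tab as the key.
    out.write(f"{source}\t{summarize_xml(text)}\n")


# Local check with an in-memory stream standing in for stdin.
buf = io.StringIO()
run(io.StringIO("<doc>\n  <item>1</item>\n  <item>2</item>\n</doc>\n"), buf)
print(buf.getvalue(), end="")
```

In map.py itself, the entry point would be `run(sys.stdin, sys.stdout)` (after `import sys`) in place of the local check. If the files are larger than one split, each mapper sees only a fragment and `ET.fromstring` will fail, so the files would need to be kept unsplit.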