We have 25 Executors running, each with 5 cores(cpus).
our data is 10 Million records
We have flatMap function which runs for every record.
Our understanding from documentation of SPARK is that each executor represents one JVM, is this correct?
However, from our observation SPARK is creating 10 Million JVMs for processing of flatMap
Kindly clarify our doubt.
EDIT
We are loading 3rd party Context in static block.
our assumption was that it will be loaded once for every JVM.
we run AWR report and found that context loading queries are running for 10 Million times.
Hence, we are assuming that SPARK is creating JVM for each record in flatmap
This flatmap is executed when we call count operation on output dataframe of flatmap.
Comments
Post a Comment