We have 25 Executors running, each with 5 cores(cpus).
our data is 10 Million records
We have flatMap
function which runs for every record.
Our understanding from documentation of SPARK is that each executor represents one JVM, is this correct?
However, from our observation SPARK is creating 10 Million JVMs for processing of flatMap
Kindly clarify our doubt.
EDIT
We are loading 3rd party Context in static block.
our assumption was that it will be loaded once for every JVM.
we run AWR report and found that context loading queries are running for 10 Million times.
Hence, we are assuming that SPARK is creating JVM for each record in flatmap
This flatmap is executed when we call count
operation on output dataframe
of flatmap
.
Comments
Post a Comment