
Posted to user@spark.apache.org by Rohit Verma <ro...@rokittech.com> on 2016/11/18 04:21:24 UTC

Is selecting different datasets from the same Parquet file blocking?

Hi

I have a dataset with 10 columns, created by reading a Parquet file.
I want to perform some operations on each column.

I create 10 datasets, one per column, as dsBig.select(col).

When I submit these 10 jobs, will they block each other, since all of them read from the same Parquet file? In other words, is selecting different datasets from the same Parquet file blocking?

Would it be better to cache the dataset first, i.e. to read it as:
dsBig.cache().select(col1)

Regards
Rohit