Posted to user@drill.apache.org by Edmon Begoli <eb...@gmail.com> on 2015/11/22 05:05:31 UTC

Benefits of parquet partitioning for non-restrictive, aggregate queries?

Hey guys,

Are there any benefits to generic partitioning for non-restrictive count(*)
queries with Drill, when the Parquet files are partitioned on some base
criteria (by state, month, etc.)?

Let's say I am running:

select count(*) from dfs.tmp.`claims_parquet`;

where I have both a plain and a partitioned version of claims_parquet.

For example, would there be any scatter-gather parallelisation?

(We are about to benchmark this, but I would like to know the theory behind
it too.)

Thank you,
Edmon