Posted to user@drill.apache.org by rahul challapalli <ch...@gmail.com> on 2015/08/26 21:21:09 UTC

No of files created by CTAS auto partition feature

Drillers,

I executed the query below on TPCH SF100 with Drill, and it took ~2 hours to
complete on a 2-node cluster.

alter session set `planner.width.max_per_node` = 4;
alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
create table lineitem partition by (l_shipdate, l_receiptdate) as select *
from dfs.`/drill/testdata/tpch100/lineitem`;

The query below returned 75780, so I expected Drill to create the same number
of files, or maybe a little more. But Drill created so many files that a
"hadoop fs -count" command failed with a "GC overhead limit exceeded" error.
(I did not change the default parquet block size.)

select count(*) from (select l_shipdate, l_receiptdate from
dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate, l_receiptdate)
sub;
+---------+
| EXPR$0  |
+---------+
| 75780   |
+---------+
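
(To double-check that I really am on the default parquet block size, this is
the kind of lookup I would run; I am assuming the option is named
store.parquet.block-size here, going from memory:)

select * from sys.options where name = 'store.parquet.block-size';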


Any thoughts on why Drill is creating so many files?

- Rahul

Re: No of files created by CTAS auto partition feature

Posted by rahul challapalli <ch...@gmail.com>.
Thanks for clarifying.

In the spill directory under a specific minor fragment, I saw files with
different sizes: a few files are just 50 MB each, while a few are 1.5 GB.
It's not clear to me why the file sizes are so different.

- Rahul


Re: No of files created by CTAS auto partition feature

Posted by Steven Phillips <st...@dremio.com>.
That's not really how it works. The only "spilling" to disk occurs during
External Sort, and the spill files are not created based on partition.

What makes you think it is spilling prematurely?


Re: No of files created by CTAS auto partition feature

Posted by rahul challapalli <ch...@gmail.com>.
Steven, Jason:

Below is my understanding of when we should spill to disk while performing
a sort. Let me know if I am missing anything.

alter session set `planner.width.max_per_node` = 4;
alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
(~8GB)
create table lineitem partition by (l_shipdate, l_receiptdate) as select *
from dfs.`/drill/testdata/tpch100/lineitem`;

1. The above query creates 4 minor fragments, and each minor fragment gets
~2GB for the sort phase.
2. Once a minor fragment consumes ~2GB of memory, it starts spilling each
partition into a separate file on disk.
3. The spilled files would be of different sizes.
4. Now if it is a regular CTAS (with no partition by clause; see the sketch
just below), each spilled file should be approximately ~2GB in size.
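
A sketch of the plain CTAS I mean in point 4 (the table name here is just
illustrative):

create table lineitem_noparts as select *
from dfs.`/drill/testdata/tpch100/lineitem`;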

I just have a hunch that we are spilling a little early :)

- Rahul



Re: No of files created by CTAS auto partition feature

Posted by rahul challapalli <ch...@gmail.com>.
Jason,

What you described is exactly my understanding.

I did kick off a run after setting `store.partition.hash_distribute`. It is
still running. I am expecting the number of files to be slightly more than or
equal to 75780. (The default parquet block size should be sufficient for
most of the partitions.)

- Rahul




Re: No of files created by CTAS auto partition feature

Posted by Jason Altekruse <al...@gmail.com>.
I feel like there is a little misunderstanding here.

Rahul, did you try setting the option that Steven suggested?
`store.partition.hash_distribute`
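
A minimal sketch of what I mean, assuming you want it on for the session (it
is a boolean option):

alter session set `store.partition.hash_distribute` = true;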

This will cause a re-distribution of the data so that the rows that belong
in a particular partition will all be written by a single writer. They will
not necessarily all be in one file, as we have a limit on file sizes and I
don't think we cap partition size.

The default behavior is not to re-distribute, because it is expensive. This
however means that every fragment will write out a file for whichever keys
appear in the data that ends up at that fragment.

If there is a large number of fragments and the data is spread out pretty
randomly, then there is a reasonable case for turning on this option to
co-locate data in a single partition to a single writer to reduce the
number of smaller files. There is no magic formula for when it is best to
turn on this option, but in most cases it will reduce the number of files
produced.




Re: No of files created by CTAS auto partition feature

Posted by rahul challapalli <ch...@gmail.com>.
Well, this is for generating some test data.

- Rahul


Re: No of files created by CTAS auto partition feature

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Looks like Drill is doing the partitioning as requested then. May not be optimal though.

Is there a reason why you want to subpartition this much? You may be better off just partitioning by l_shipdate (not shipmate, autocorrect got me there). Or use columns with much lower cardinality to test subpartitioning.

—Andries


Re: No of files created by CTAS auto partition feature

Posted by rahul challapalli <ch...@gmail.com>.
Steven,

You were right. The count is 606240 which is 8*75780.


Stefan & Andries,

Below is the distinct count, or cardinality:

select count(*) from (select l_shipdate, l_receiptdate from
dfs.`/drill/testdata/tpch100/
lineitem` group by l_shipdate, l_receiptdate) sub;
+---------+
| EXPR$0  |
+---------+
| 75780   |
+---------+

- Rahul






Re: No of files created by CTAS auto partition feature

Posted by Andries Engelbrecht <ae...@maprtech.com>.
What is the distinct count for these columns? IIRC TPC-H has at least 5 years of data irrespective of SF, so you are requesting a lot of partitions. 76K sounds about right for 5 years of TPCH shipmate and correlating receipt date data; note that your query doesn’t count the actual files.

Try to partition just on the shipmate column first.
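
Roughly something like this sketch (the new table name is only an example):

create table lineitem_by_shipdate partition by (l_shipdate) as select *
from dfs.`/drill/testdata/tpch100/lineitem`;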

—Andries



Re: No of files created by CTAS auto partition feature

Posted by Stefán Baxter <st...@activitystream.com>.
Hi,

Is it possible that the combinations of values for (l_shipdate,
l_receiptdate) have a very high cardinality?
I would think you are creating partition files for a small subset of the
data.

Please keep in mind that I know nothing about TPCH SF100 and only a little
about Drill :).

Regards,
 -Stefan


Re: No of files created by CTAS auto partition feature

Posted by Steven Phillips <sm...@apache.org>.
It would be helpful if you could figure out what the file count is. But
here are some thoughts:

What is the value of the option:
store.partition.hash_distribute

If it is false, which it is by default, then every fragment will
potentially have data in every partition. In this case, that could increase
the number of files by a factor of 8.
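
You can check the current value with a query along these lines (a sketch
against the sys.options table):

select * from sys.options where name = 'store.partition.hash_distribute';

The factor of 8 is just my guess at your writer parallelism: 2 nodes with
`planner.width.max_per_node` = 4 would give 8 minor fragments, and 8 x 75780
distinct partition keys could mean up to 606240 files.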
