Posted to dev@drill.apache.org by Aman Sinha <am...@apache.org> on 2015/09/09 19:54:21 UTC

Directory and file based partition pruning

Currently, partition pruning gets all the file names in the table and
applies the pruning.  Suppose the files are spread out over several
directories and there is a filter on dirN; in that case this is not
efficient, both in terms of elapsed time and memory usage.  This has been
seen in a few use cases recently.

We should ideally perform the pruning in two steps: first get only the
top-level directory names and apply the directory filter, then get the file
names within the surviving directories and apply the remaining filters.
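
As a rough illustration of the two-step idea (editor's sketch: the
FileSystemView, listDirectories, and listFiles helpers are hypothetical
stand-ins, not Drill's actual partition-pruning classes):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class TwoStepPruningSketch {

  /** Hypothetical view of the file system, not Drill's API. */
  interface FileSystemView {
    List<String> listDirectories(String tablePath);
    List<String> listFiles(String directoryPath);
  }

  /**
   * Step 1: apply the directory filter (e.g. on dirN) to directory names only.
   * Step 2: list files only under the surviving directories and apply the
   * remaining filters.
   */
  public static List<String> prune(FileSystemView fs,
                                   String tablePath,
                                   Predicate<String> dirFilter,
                                   Predicate<String> fileFilter) {
    List<String> selected = new ArrayList<>();
    for (String dir : fs.listDirectories(tablePath)) {
      if (!dirFilter.test(dir)) {
        continue;        // pruned: files below are never listed or materialized
      }
      for (String file : fs.listFiles(dir)) {
        if (fileFilter.test(file)) {
          selected.add(file);
        }
      }
    }
    return selected;
  }
}

The saving comes from never listing, and never building value vectors for,
the files under directories that fail the dirN predicate.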

I will create a JIRA for this enhancement but let me know your thoughts...

Aman

Re: Directory and file based partition pruning

Posted by Jacques Nadeau <ja...@dremio.com>.
I'm guessing that the issue is metadata reading in that case. We've seen
the problem before when reading hundreds of thousands of files, since
Parquet metadata is fairly large.

Given the multiple firings, do we know the time for a single completion? It
seems strange that the partition operation in interpreted mode, even with
100,000 files, would take very long. If it does, I'm wondering whether
anyone has looked at a profiler to see where the time is spent.
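
To make the cost concrete, gathering metadata for a selection behaves
roughly like the sketch below (editor's illustration: FooterReader is a
stand-in for whatever Parquet footer-reading call the scan actually uses),
so the work grows linearly with the number of files:

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MetadataCostSketch {

  /** Hypothetical stand-in for the real Parquet footer reader. */
  interface FooterReader {
    Object readFooter(FileSystem fs, Path file) throws IOException;
  }

  /** Walks the table directory and reads one (fairly large) footer per file. */
  static int readAllFooters(FileSystem fs, Path dir, FooterReader reader)
      throws IOException {
    int count = 0;
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        count += readAllFooters(fs, status.getPath(), reader);
      } else if (status.getPath().getName().endsWith(".parquet")) {
        reader.readFooter(fs, status.getPath());
        count++;
      }
    }
    return count;
  }
}

With hundreds of thousands of files that is hundreds of thousands of footer
reads before pruning can even start.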
On Sep 10, 2015 8:31 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:

> I got the impression of Java heap memory because one customer
> complained about running into out of heap memory, when they are
> dealing with pruning large number of files. Is it possible that the rule
> put the value vector in the direct memory, but also uses object reference
> which is proportional to the # of files. That might explain why they
> run into out of heap memory.
>
>
>
> On Thu, Sep 10, 2015 at 6:25 PM, Aman Sinha <as...@maprtech.com> wrote:
> > Yes, it is a good point about multiple invocations of the PruneScan rule.
> > The other point about using Java heap is not correct.  The rule does
> > off-heap allocation using memory buffer from QueryContext and in the
> > finally block releases the memory.
> >
> > Aman
> >
> > On Thu, Sep 10, 2015 at 6:18 PM, Jinfeng Ni <ji...@gmail.com>
> wrote:
> >
> >> I opened DRILL-3765 for the multiple rule execution issue:
> >>
> >> https://issues.apache.org/jira/browse/DRILL-3765
> >>
> >>
> >> On Thu, Sep 10, 2015 at 5:34 PM, Jinfeng Ni <ji...@gmail.com>
> wrote:
> >> > Seems to me one important reason we hit out of heap memory for
> partition
> >> > prune rule is that the rule itself is invoked multiple times, even the
> >> > filter has been pushed into scan in the first call.
> >> >
> >> > I tried with a simple unit test
> >> > TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(), here is
> >> the #
> >> > of frequency of partition rules that are fired in Calcite trace
> >> >
> >> >  #_rule_fire,  rule name
> >> >
> >> >  4 [PruneScanRule:Filter_On_Project_Parquet]
> >> >  4 [PruneScanRule:Filter_On_Project]
> >> >
> >> >  2 [PruneScanRule:Filter_On_Scan_Parquet]
> >> >  2 [PruneScanRule:Filter_On_Scan]
> >> >
> >> > Setting a breaking point in PruneScanRule where it calls the
> interpreter
> >> to
> >> > evaluate the expression, I could see that the code stops 6 times in
> that
> >> > point; meaning that Drill will have to build the vector containing the
> >> > filenames at least 6 times.  That would cause lots of heap memory
> >> > consumption, if gc does not kick in to release the memory used in the
> >> prior
> >> > rule's execution.
> >> >
> >> > I think making the partition pruning multiple phases will help to
> reduce
> >> the
> >> > memory consumption. But for now, it seems important to avoid the
> repeated
> >> > and unnecessary rule execution.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Sep 10, 2015 at 4:42 PM, Aman Sinha <as...@maprtech.com>
> wrote:
> >> >>
> >> >> Agree on the N phased approach.  I have filed a JIRA for the
> >> enhancement:
> >> >>  DRILL-3759.
> >> >> Regarding the simplification of the expression tree logic..did you
> mean
> >> >> the
> >> >> logic in FindPartitionConditions  or the Interpreter ?
> >> >> Perhaps you can add comments in the JIRA with some explanation.  I
> am in
> >> >> favor of simplification where possible.
> >> >>
> >> >> On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com>
> >> >> wrote:
> >> >>
> >> >> > Makes sense.
> >> >> >
> >> >> > Is there a way we can do this with lazy materialization rather than
> >> >> > writing complex expression tree logic? I hate having all this custom
> >> >> > expression tree manipulation logic.
> >> >> >
> >> >> > Also, it seems like this should be N-phased rather than two-phased,
> >> >> > where N is the number of directories below the base path.
> >> >> >
> >> >> > Thoughts?
> >> >> > On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org>
> wrote:
> >> >> >
> >> >> > > Currently, partition pruning gets all file names in the table and
> >> >> > > applies
> >> >> > > the pruning.  Suppose the files are spread out over several
> >> >> > > directories
> >> >> > and
> >> >> > > there is a filter  on dirN,  this is not efficient - both in
> terms
> >> of
> >> >> > > elapsed time and memory usage.  This has been seen in a few use
> >> cases
> >> >> > > recently.
> >> >> > >
> >> >> > > We should ideally perform the pruning in 2 steps:  first get the
> >> >> > top-level
> >> >> > > directory names only and apply the directory filter, then get the
> >> >> > filenames
> >> >> > > within that directory and apply remaining filters.
> >> >> > >
> >> >> > > I will create a JIRA for this enhancement but let me know your
> >> >> > thoughts...
> >> >> > >
> >> >> > > Aman
> >> >> > >
> >> >> >
> >> >
> >> >
> >>
>

Re: Directory and file based partition pruning

Posted by Jinfeng Ni <ji...@gmail.com>.
I got the impression of Java heap memory because one customer complained
about running out of heap memory when pruning a large number of files. Is
it possible that the rule puts the value vectors in direct memory, but also
uses object references whose count is proportional to the # of files? That
might explain why they run out of heap memory.



On Thu, Sep 10, 2015 at 6:25 PM, Aman Sinha <as...@maprtech.com> wrote:
> Yes, it is a good point about multiple invocations of the PruneScan rule.
> The other point about using Java heap is not correct.  The rule does
> off-heap allocation using memory buffer from QueryContext and in the
> finally block releases the memory.
>
> Aman
>
> On Thu, Sep 10, 2015 at 6:18 PM, Jinfeng Ni <ji...@gmail.com> wrote:
>
>> I opened DRILL-3765 for the multiple rule execution issue:
>>
>> https://issues.apache.org/jira/browse/DRILL-3765
>>
>>
>> On Thu, Sep 10, 2015 at 5:34 PM, Jinfeng Ni <ji...@gmail.com> wrote:
>> > Seems to me one important reason we hit out of heap memory for partition
>> > prune rule is that the rule itself is invoked multiple times, even the
>> > filter has been pushed into scan in the first call.
>> >
>> > I tried with a simple unit test
>> > TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(), here is
>> the #
>> > of frequency of partition rules that are fired in Calcite trace
>> >
>> >  #_rule_fire,  rule name
>> >
>> >  4 [PruneScanRule:Filter_On_Project_Parquet]
>> >  4 [PruneScanRule:Filter_On_Project]
>> >
>> >  2 [PruneScanRule:Filter_On_Scan_Parquet]
>> >  2 [PruneScanRule:Filter_On_Scan]
>> >
>> > Setting a breaking point in PruneScanRule where it calls the interpreter
>> to
>> > evaluate the expression, I could see that the code stops 6 times in that
>> > point; meaning that Drill will have to build the vector containing the
>> > filenames at least 6 times.  That would cause lots of heap memory
>> > consumption, if gc does not kick in to release the memory used in the
>> prior
>> > rule's execution.
>> >
>> > I think making the partition pruning multiple phases will help to reduce
>> the
>> > memory consumption. But for now, it seems important to avoid the repeated
>> > and unnecessary rule execution.
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Sep 10, 2015 at 4:42 PM, Aman Sinha <as...@maprtech.com> wrote:
>> >>
>> >> Agree on the N phased approach.  I have filed a JIRA for the
>> enhancement:
>> >>  DRILL-3759.
>> >> Regarding the simplification of the expression tree logic..did you mean
>> >> the
>> >> logic in FindPartitionConditions  or the Interpreter ?
>> >> Perhaps you can add comments in the JIRA with some explanation.  I am in
>> >> favor of simplification where possible.
>> >>
>> >> On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com>
>> >> wrote:
>> >>
>> >> > Makes sense.
>> >> >
>> >> > Is there a way we can do this with lazy materialization rather than
>> >> > writing complex expression tree logic? I hate having all this custom
>> >> > expression tree manipulation logic.
>> >> >
>> >> > Also, it seems like this should be N-phased rather than two-phased,
>> >> > where N is the number of directories below the base path.
>> >> >
>> >> > Thoughts?
>> >> > On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:
>> >> >
>> >> > > Currently, partition pruning gets all file names in the table and
>> >> > > applies
>> >> > > the pruning.  Suppose the files are spread out over several
>> >> > > directories
>> >> > and
>> >> > > there is a filter  on dirN,  this is not efficient - both in terms
>> of
>> >> > > elapsed time and memory usage.  This has been seen in a few use
>> cases
>> >> > > recently.
>> >> > >
>> >> > > We should ideally perform the pruning in 2 steps:  first get the
>> >> > top-level
>> >> > > directory names only and apply the directory filter, then get the
>> >> > filenames
>> >> > > within that directory and apply remaining filters.
>> >> > >
>> >> > > I will create a JIRA for this enhancement but let me know your
>> >> > thoughts...
>> >> > >
>> >> > > Aman
>> >> > >
>> >> >
>> >
>> >
>>

Re: Directory and file based partition pruning

Posted by Aman Sinha <as...@maprtech.com>.
Yes, it is a good point about the multiple invocations of the PruneScan
rule. The other point, about using the Java heap, is not correct: the rule
does its allocation off-heap, using a memory buffer from the QueryContext,
and releases that memory in the finally block.
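
The pattern being described is roughly the following (editor's sketch:
Allocator and PartitionVector are generic stand-ins for Drill's
BufferAllocator and value vector classes, not their exact signatures):

import java.util.List;

public class OffHeapPruningSketch {

  /** Stand-in for the allocator obtained from the QueryContext. */
  interface Allocator {
    PartitionVector allocatePartitionVector(int fileCount);
  }

  /** A vector backed by direct (off-heap) memory. */
  interface PartitionVector extends AutoCloseable {
    void set(int index, String value);
    @Override
    void close();            // releases the off-heap buffer
  }

  /** Populate the partition-column vector off heap and always release it. */
  static void evaluatePruning(Allocator allocator, List<String> fileNames) {
    PartitionVector vector = allocator.allocatePartitionVector(fileNames.size());
    try {
      for (int i = 0; i < fileNames.size(); i++) {
        vector.set(i, fileNames.get(i));
      }
      // ... hand the vector to the interpreter to evaluate the filter ...
    } finally {
      vector.close();        // off-heap memory is freed even if evaluation throws
    }
  }
}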

Aman

On Thu, Sep 10, 2015 at 6:18 PM, Jinfeng Ni <ji...@gmail.com> wrote:

> I opened DRILL-3765 for the multiple rule execution issue:
>
> https://issues.apache.org/jira/browse/DRILL-3765
>
>
> On Thu, Sep 10, 2015 at 5:34 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> > Seems to me one important reason we hit out of heap memory for partition
> > prune rule is that the rule itself is invoked multiple times, even the
> > filter has been pushed into scan in the first call.
> >
> > I tried with a simple unit test
> > TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(), here is
> the #
> > of frequency of partition rules that are fired in Calcite trace
> >
> >  #_rule_fire,  rule name
> >
> >  4 [PruneScanRule:Filter_On_Project_Parquet]
> >  4 [PruneScanRule:Filter_On_Project]
> >
> >  2 [PruneScanRule:Filter_On_Scan_Parquet]
> >  2 [PruneScanRule:Filter_On_Scan]
> >
> > Setting a breaking point in PruneScanRule where it calls the interpreter
> to
> > evaluate the expression, I could see that the code stops 6 times in that
> > point; meaning that Drill will have to build the vector containing the
> > filenames at least 6 times.  That would cause lots of heap memory
> > consumption, if gc does not kick in to release the memory used in the
> prior
> > rule's execution.
> >
> > I think making the partition pruning multiple phases will help to reduce
> the
> > memory consumption. But for now, it seems important to avoid the repeated
> > and unnecessary rule execution.
> >
> >
> >
> >
> >
> > On Thu, Sep 10, 2015 at 4:42 PM, Aman Sinha <as...@maprtech.com> wrote:
> >>
> >> Agree on the N phased approach.  I have filed a JIRA for the
> enhancement:
> >>  DRILL-3759.
> >> Regarding the simplification of the expression tree logic..did you mean
> >> the
> >> logic in FindPartitionConditions  or the Interpreter ?
> >> Perhaps you can add comments in the JIRA with some explanation.  I am in
> >> favor of simplification where possible.
> >>
> >> On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >>
> >> > Makes sense.
> >> >
> >> > Is there a way we can do this with lazy materialization rather than
> >> > writing complex expression tree logic? I hate having all this custom
> >> > expression tree manipulation logic.
> >> >
> >> > Also, it seems like this should be N-phased rather than two-phased,
> >> > where N is the number of directories below the base path.
> >> >
> >> > Thoughts?
> >> > On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:
> >> >
> >> > > Currently, partition pruning gets all file names in the table and
> >> > > applies
> >> > > the pruning.  Suppose the files are spread out over several
> >> > > directories
> >> > and
> >> > > there is a filter  on dirN,  this is not efficient - both in terms
> of
> >> > > elapsed time and memory usage.  This has been seen in a few use
> cases
> >> > > recently.
> >> > >
> >> > > We should ideally perform the pruning in 2 steps:  first get the
> >> > top-level
> >> > > directory names only and apply the directory filter, then get the
> >> > filenames
> >> > > within that directory and apply remaining filters.
> >> > >
> >> > > I will create a JIRA for this enhancement but let me know your
> >> > thoughts...
> >> > >
> >> > > Aman
> >> > >
> >> >
> >
> >
>

Re: Directory and file based partition pruning

Posted by Jinfeng Ni <ji...@gmail.com>.
I opened DRILL-3765 for the multiple rule execution issue:

https://issues.apache.org/jira/browse/DRILL-3765


On Thu, Sep 10, 2015 at 5:34 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> Seems to me one important reason we hit out of heap memory for partition
> prune rule is that the rule itself is invoked multiple times, even the
> filter has been pushed into scan in the first call.
>
> I tried with a simple unit test
> TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(), here is the #
> of frequency of partition rules that are fired in Calcite trace
>
>  #_rule_fire,  rule name
>
>  4 [PruneScanRule:Filter_On_Project_Parquet]
>  4 [PruneScanRule:Filter_On_Project]
>
>  2 [PruneScanRule:Filter_On_Scan_Parquet]
>  2 [PruneScanRule:Filter_On_Scan]
>
> Setting a breaking point in PruneScanRule where it calls the interpreter to
> evaluate the expression, I could see that the code stops 6 times in that
> point; meaning that Drill will have to build the vector containing the
> filenames at least 6 times.  That would cause lots of heap memory
> consumption, if gc does not kick in to release the memory used in the prior
> rule's execution.
>
> I think making the partition pruning multiple phases will help to reduce the
> memory consumption. But for now, it seems important to avoid the repeated
> and unnecessary rule execution.
>
>
>
>
>
> On Thu, Sep 10, 2015 at 4:42 PM, Aman Sinha <as...@maprtech.com> wrote:
>>
>> Agree on the N phased approach.  I have filed a JIRA for the enhancement:
>>  DRILL-3759.
>> Regarding the simplification of the expression tree logic..did you mean
>> the
>> logic in FindPartitionConditions  or the Interpreter ?
>> Perhaps you can add comments in the JIRA with some explanation.  I am in
>> favor of simplification where possible.
>>
>> On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>
>> > Makes sense.
>> >
>> > Is there a way we can do this with lazy materialization rather than writing
>> > complex expression tree logic? I hate having all this custom expression
>> > tree manipulation logic.
>> >
>> > Also, it seems like this should be N-phased rather than two-phased, where
>> > N is the number of directories below the base path.
>> >
>> > Thoughts?
>> > On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:
>> >
>> > > Currently, partition pruning gets all file names in the table and
>> > > applies
>> > > the pruning.  Suppose the files are spread out over several
>> > > directories
>> > and
>> > > there is a filter  on dirN,  this is not efficient - both in terms of
>> > > elapsed time and memory usage.  This has been seen in a few use cases
>> > > recently.
>> > >
>> > > We should ideally perform the pruning in 2 steps:  first get the
>> > top-level
>> > > directory names only and apply the directory filter, then get the
>> > filenames
>> > > within that directory and apply remaining filters.
>> > >
>> > > I will create a JIRA for this enhancement but let me know your
>> > thoughts...
>> > >
>> > > Aman
>> > >
>> >
>
>

Re: Directory and file based partition pruning

Posted by Jinfeng Ni <ji...@gmail.com>.
It seems to me that one important reason we hit out-of-heap-memory errors
in the partition prune rule is that the rule itself is invoked multiple
times, even after the filter has been pushed into the scan in the first
call.

I tried with a simple unit test,
TestPartitionFilter:testPartitionFilter1_Parquet_from_CTAS(); here is how
often each partition rule fired in the Calcite trace:

 #_rule_fire,  rule name

 4 [PruneScanRule:Filter_On_Project_Parquet]
 4 [PruneScanRule:Filter_On_Project]

 2 [PruneScanRule:Filter_On_Scan_Parquet]
 2 [PruneScanRule:Filter_On_Scan]

Setting a breakpoint in PruneScanRule where it calls the interpreter to
evaluate the expression, I could see that execution stops at that point 6
times, meaning that Drill has to build the vector containing the file names
at least 6 times. That causes a lot of heap memory consumption if GC does
not kick in to release the memory used by the prior rule executions.

I think making partition pruning multi-phased will help reduce the memory
consumption, but for now it seems important to avoid the repeated and
unnecessary rule executions.
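
One possible shape for that guard (editor's sketch: the isAlreadyPruned()
flag is illustrative, not the real DrillScanRel/GroupScan API):

public class PruneGuardSketch {

  /** Hypothetical view of the scan rel that the pruning rule matched. */
  interface PrunedAwareScan {
    boolean isAlreadyPruned();   // set once a pruned file selection is installed
  }

  /**
   * Intended for the top of the rule's onMatch(): skip the expensive file-name
   * vector construction and interpreter evaluation when an earlier firing has
   * already pushed the partition filter into this scan.
   */
  static boolean shouldSkipPruning(PrunedAwareScan scan) {
    return scan.isAlreadyPruned();
  }
}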





On Thu, Sep 10, 2015 at 4:42 PM, Aman Sinha <as...@maprtech.com> wrote:

> Agree on the N phased approach.  I have filed a JIRA for the enhancement:
>  DRILL-3759.
> Regarding the simplification of the expression tree logic..did you mean the
> logic in FindPartitionConditions  or the Interpreter ?
> Perhaps you can add comments in the JIRA with some explanation.  I am in
> favor of simplification where possible.
>
> On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > Makes sense.
> >
> > Is there a way we can do this with lazy materialization rather than writing
> > complex expression tree logic? I hate having all this custom expression
> > tree manipulation logic.
> >
> > Also, it seems like this should be N-phased rather than two-phased, where N
> > is the number of directories below the base path.
> >
> > Thoughts?
> > On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:
> >
> > > Currently, partition pruning gets all file names in the table and
> applies
> > > the pruning.  Suppose the files are spread out over several directories
> > and
> > > there is a filter  on dirN,  this is not efficient - both in terms of
> > > elapsed time and memory usage.  This has been seen in a few use cases
> > > recently.
> > >
> > > We should ideally perform the pruning in 2 steps:  first get the
> > top-level
> > > directory names only and apply the directory filter, then get the
> > filenames
> > > within that directory and apply remaining filters.
> > >
> > > I will create a JIRA for this enhancement but let me know your
> > thoughts...
> > >
> > > Aman
> > >
> >
>

Re: Directory and file based partition pruning

Posted by Aman Sinha <as...@maprtech.com>.
Agree on the N-phased approach.  I have filed a JIRA for the enhancement:
DRILL-3759.
Regarding the simplification of the expression tree logic, did you mean the
logic in FindPartitionConditions or the Interpreter?
Perhaps you can add comments in the JIRA with some explanation.  I am in
favor of simplification where possible.

On Wed, Sep 9, 2015 at 10:39 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Makes sense.
>
> Is there a way we can do this with lazy materialization rather than writing
> complex expression tree logic? I hate having all this custom expression
> tree manipulation logic.
>
> Also, it seems like this should be N-phased rather than two-phased, where N
> is the number of directories below the base path.
>
> Thoughts?
> On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:
>
> > Currently, partition pruning gets all file names in the table and applies
> > the pruning.  Suppose the files are spread out over several directories
> and
> > there is a filter  on dirN,  this is not efficient - both in terms of
> > elapsed time and memory usage.  This has been seen in a few use cases
> > recently.
> >
> > We should ideally perform the pruning in 2 steps:  first get the
> top-level
> > directory names only and apply the directory filter, then get the
> filenames
> > within that directory and apply remaining filters.
> >
> > I will create a JIRA for this enhancement but let me know your
> thoughts...
> >
> > Aman
> >
>

Re: Directory and file based partition pruning

Posted by Jacques Nadeau <ja...@dremio.com>.
Makes sense.

Is there a way we can do this with lazy materialization rather than writing
complex expression tree logic? I hate having all this custom expression
tree manipulation logic.

Also, it seems like this should be N-phased rather than two-phased, where N
is the number of directories below the base path.
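
A rough sketch of that N-phased idea (editor's illustration: the Listing
helper and method names are hypothetical, not Drill's planner API), where
each directory level is pruned before the next level is even listed:

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class NPhasedPruningSketch {

  /** Hypothetical file system view. */
  interface Listing {
    List<String> listChildren(String path);   // subdirectories or files of path
    boolean isDirectory(String path);
  }

  /**
   * levelFilter.test(level, name) decides whether a directory at the given
   * depth (dir0 = 0, dir1 = 1, ...) survives; only surviving directories are
   * expanded in the next phase.
   */
  static List<String> prune(Listing fs, String basePath,
                            BiPredicate<Integer, String> levelFilter) {
    List<String> files = new ArrayList<>();
    List<String> frontier = new ArrayList<>();
    frontier.add(basePath);
    for (int level = 0; !frontier.isEmpty(); level++) {
      List<String> next = new ArrayList<>();
      for (String path : frontier) {
        for (String child : fs.listChildren(path)) {
          if (fs.isDirectory(child)) {
            if (levelFilter.test(level, child)) {  // phase `level` prunes this layer
              next.add(child);
            }
          } else {
            files.add(child);                      // file under surviving directories
          }
        }
      }
      frontier = next;
    }
    return files;
  }
}

Each pass of the outer loop is one phase, so directories eliminated at level
k contribute no listing or pruning work at level k + 1.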

Thoughts?
On Sep 9, 2015 10:54 AM, "Aman Sinha" <am...@apache.org> wrote:

> Currently, partition pruning gets all file names in the table and applies
> the pruning.  Suppose the files are spread out over several directories and
> there is a filter  on dirN,  this is not efficient - both in terms of
> elapsed time and memory usage.  This has been seen in a few use cases
> recently.
>
> We should ideally perform the pruning in 2 steps:  first get the top-level
> directory names only and apply the directory filter, then get the filenames
> within that directory and apply remaining filters.
>
> I will create a JIRA for this enhancement but let me know your thoughts...
>
> Aman
>