You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Adam Gilmore <dr...@gmail.com> on 2015/04/30 08:06:30 UTC

Accessing QueryContext inside a StoragePluginOptimizerRule

Hi guys,

I'm trying to work out how I could access the QueryContext inside
a StoragePluginOptimizerRule.

I've basically implemented the Parquet pushdown filtering, but I really
need to access the session settings (for whether or not we're using the new
Parquet reader so we can completely pushdown a filter and also for
providing a setting to enable/disable pushdown filters).

I've looked in a fair bit of detail, but there doesn't seem to be a way to
access this.

If this is not the way I should be implementing it, can anyone make some
suggestions?  I really want to do full pushdown when using the new Parquet
reader, but I have no easy way to detect it's going to be used.

P.S. In cases where we "fall back" to the new Parquet reader, I don't do a
full pushdown (as detecting that is done further down than the planner).
This could be fixed in the future, but for now I'm happy for just when that
setting is true.

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Adam Gilmore <dr...@gmail.com>.
No worries - just posted it to https://reviews.apache.org/r/33836/ and
updated the JIRA accordingly.

On Mon, May 4, 2015 at 11:26 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Hey Adam,  just saw that.  Can you post to reviewboard? If you don't have
> an account,  you can easily register for one.
> On May 4, 2015 8:25 AM, "Adam Gilmore" <dr...@gmail.com> wrote:
>
> > Just wanted to let you know I've uploaded the patch:
> >
> > https://issues.apache.org/jira/browse/DRILL-1950
> >
> > Be great to get some feedback or start a review board so I can see what
> > needs to be done to get it merged in!
> >
> > On Fri, May 1, 2015 at 12:13 PM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> > > Yes.  Once per query is fine.  Theoretically we could fire the rule
> many
> > > times inside a single query and I was saying we preferably cache in the
> > > context of a single query.
> > >
> > > On Thu, Apr 30, 2015 at 5:57 PM, Adam Gilmore <dr...@gmail.com>
> > > wrote:
> > >
> > > > We'll need to look it up at least once per query, though, right?
> > Because
> > > > session variables can change query to query.  Would you suggest I
> > cached
> > > > the actual setting(s) as soon as the rule is instantiated?  The rule
> is
> > > > only called once or twice per query, I think.
> > > >
> > > > On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau <ja...@apache.org>
> > > > wrote:
> > > >
> > > > > Creating it at rule creation time is good.  The key is we want to
> > cache
> > > > it
> > > > > so we don't have to look it up every single time the rule fires.  I
> > > think
> > > > > your point about specific versus general and PlannerSettings makes
> > > sense.
> > > > >
> > > > > On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <
> dragoncurve@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Moreover, I'm a tad wary about using PlannerSettings as they seem
> > > very
> > > > > > generic (i.e. not specific to a particular storage/format plugin,
> > for
> > > > > > example).  I've used "store.parquet.enable_pushdown_filter" as
> the
> > > > > option,
> > > > > > as it's very specific to the Parquet format plugin.  I imagine I
> > can
> > > > > still
> > > > > > use getOptions() to get that option from PlannerSettings, but it
> > > feels
> > > > a
> > > > > > bit more like we should be using QueryContext like PruneScanRule
> > > uses.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <
> > dragoncurve@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I actually patched the StorageEnginePlugin and FormatPlugin to
> > pass
> > > > > > > QueryContext right through the chain of getting optimizer rules
> > (as
> > > > is
> > > > > > the
> > > > > > > case for getting basic rules, etc.).  This seemed to me like
> the
> > > most
> > > > > > > logical approach and aligned with something like your
> > > PruneScanRule.
> > > > > > >
> > > > > > > Do you think I should revert back and use the example you
> gave?​
> > > > > > >
> > > > > > > P.S. The pushdown filter has improved performance significantly
> > for
> > > > > > > certain types of queries.  It mostly comes back to how well the
> > > data
> > > > is
> > > > > > > ordered and how often it can exclude row groups.  Ideally, we
> > > should
> > > > be
> > > > > > > excluding individual pages as well, but from what I can see
> even
> > > the
> > > > > > reader
> > > > > > > from the Parquet library does not yet do this.
> > > > > > >
> > > > > > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <
> > > jacques@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> PlannerSettings is the primary we expose settings that need to
> > be
> > > > > > >> interrogated during planning (e.g. in an optimizer rule).  You
> > can
> > > > get
> > > > > > >> ahold of this by doing:
> > > > > > >>
> > > > > > >> PlannerSettings settings =
> > > > > > PrelUtil.getPlannerSettings(call.getPlanner());
> > > > > > >>
> > > > > > >> PlannerSettings then has access to session settings.
> > > > > > >>
> > > > > > >> You can see an example of this at [1]
> > > > > > >>
> > > > > > >> I'm excited to see the impact of this.  Look forward to seeing
> > the
> > > > > > patch!
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <
> > > > dragoncurve@gmail.com
> > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi guys,
> > > > > > >> >
> > > > > > >> > I'm trying to work out how I could access the QueryContext
> > > inside
> > > > > > >> > a StoragePluginOptimizerRule.
> > > > > > >> >
> > > > > > >> > I've basically implemented the Parquet pushdown filtering,
> > but I
> > > > > > really
> > > > > > >> > need to access the session settings (for whether or not
> we're
> > > > using
> > > > > > the
> > > > > > >> new
> > > > > > >> > Parquet reader so we can completely pushdown a filter and
> also
> > > for
> > > > > > >> > providing a setting to enable/disable pushdown filters).
> > > > > > >> >
> > > > > > >> > I've looked in a fair bit of detail, but there doesn't seem
> to
> > > be
> > > > a
> > > > > > way
> > > > > > >> to
> > > > > > >> > access this.
> > > > > > >> >
> > > > > > >> > If this is not the way I should be implementing it, can
> anyone
> > > > make
> > > > > > some
> > > > > > >> > suggestions?  I really want to do full pushdown when using
> the
> > > new
> > > > > > >> Parquet
> > > > > > >> > reader, but I have no easy way to detect it's going to be
> > used.
> > > > > > >> >
> > > > > > >> > P.S. In cases where we "fall back" to the new Parquet
> reader,
> > I
> > > > > don't
> > > > > > >> do a
> > > > > > >> > full pushdown (as detecting that is done further down than
> the
> > > > > > planner).
> > > > > > >> > This could be fixed in the future, but for now I'm happy for
> > > just
> > > > > when
> > > > > > >> that
> > > > > > >> > setting is true.
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Jacques Nadeau <ja...@apache.org>.
Hey Adam,  just saw that.  Can you post to reviewboard? If you don't have
an account,  you can easily register for one.
On May 4, 2015 8:25 AM, "Adam Gilmore" <dr...@gmail.com> wrote:

> Just wanted to let you know I've uploaded the patch:
>
> https://issues.apache.org/jira/browse/DRILL-1950
>
> Be great to get some feedback or start a review board so I can see what
> needs to be done to get it merged in!
>
> On Fri, May 1, 2015 at 12:13 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > Yes.  Once per query is fine.  Theoretically we could fire the rule many
> > times inside a single query and I was saying we preferably cache in the
> > context of a single query.
> >
> > On Thu, Apr 30, 2015 at 5:57 PM, Adam Gilmore <dr...@gmail.com>
> > wrote:
> >
> > > We'll need to look it up at least once per query, though, right?
> Because
> > > session variables can change query to query.  Would you suggest I
> cached
> > > the actual setting(s) as soon as the rule is instantiated?  The rule is
> > > only called once or twice per query, I think.
> > >
> > > On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > >
> > > > Creating it at rule creation time is good.  The key is we want to
> cache
> > > it
> > > > so we don't have to look it up every single time the rule fires.  I
> > think
> > > > your point about specific versus general and PlannerSettings makes
> > sense.
> > > >
> > > > On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <dragoncurve@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Moreover, I'm a tad wary about using PlannerSettings as they seem
> > very
> > > > > generic (i.e. not specific to a particular storage/format plugin,
> for
> > > > > example).  I've used "store.parquet.enable_pushdown_filter" as the
> > > > option,
> > > > > as it's very specific to the Parquet format plugin.  I imagine I
> can
> > > > still
> > > > > use getOptions() to get that option from PlannerSettings, but it
> > feels
> > > a
> > > > > bit more like we should be using QueryContext like PruneScanRule
> > uses.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <
> dragoncurve@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > I actually patched the StorageEnginePlugin and FormatPlugin to
> pass
> > > > > > QueryContext right through the chain of getting optimizer rules
> (as
> > > is
> > > > > the
> > > > > > case for getting basic rules, etc.).  This seemed to me like the
> > most
> > > > > > logical approach and aligned with something like your
> > PruneScanRule.
> > > > > >
> > > > > > Do you think I should revert back and use the example you gave?​
> > > > > >
> > > > > > P.S. The pushdown filter has improved performance significantly
> for
> > > > > > certain types of queries.  It mostly comes back to how well the
> > data
> > > is
> > > > > > ordered and how often it can exclude row groups.  Ideally, we
> > should
> > > be
> > > > > > excluding individual pages as well, but from what I can see even
> > the
> > > > > reader
> > > > > > from the Parquet library does not yet do this.
> > > > > >
> > > > > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <
> > jacques@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> PlannerSettings is the primary we expose settings that need to
> be
> > > > > >> interrogated during planning (e.g. in an optimizer rule).  You
> can
> > > get
> > > > > >> ahold of this by doing:
> > > > > >>
> > > > > >> PlannerSettings settings =
> > > > > PrelUtil.getPlannerSettings(call.getPlanner());
> > > > > >>
> > > > > >> PlannerSettings then has access to session settings.
> > > > > >>
> > > > > >> You can see an example of this at [1]
> > > > > >>
> > > > > >> I'm excited to see the impact of this.  Look forward to seeing
> the
> > > > > patch!
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <
> > > dragoncurve@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi guys,
> > > > > >> >
> > > > > >> > I'm trying to work out how I could access the QueryContext
> > inside
> > > > > >> > a StoragePluginOptimizerRule.
> > > > > >> >
> > > > > >> > I've basically implemented the Parquet pushdown filtering,
> but I
> > > > > really
> > > > > >> > need to access the session settings (for whether or not we're
> > > using
> > > > > the
> > > > > >> new
> > > > > >> > Parquet reader so we can completely pushdown a filter and also
> > for
> > > > > >> > providing a setting to enable/disable pushdown filters).
> > > > > >> >
> > > > > >> > I've looked in a fair bit of detail, but there doesn't seem to
> > be
> > > a
> > > > > way
> > > > > >> to
> > > > > >> > access this.
> > > > > >> >
> > > > > >> > If this is not the way I should be implementing it, can anyone
> > > make
> > > > > some
> > > > > >> > suggestions?  I really want to do full pushdown when using the
> > new
> > > > > >> Parquet
> > > > > >> > reader, but I have no easy way to detect it's going to be
> used.
> > > > > >> >
> > > > > >> > P.S. In cases where we "fall back" to the new Parquet reader,
> I
> > > > don't
> > > > > >> do a
> > > > > >> > full pushdown (as detecting that is done further down than the
> > > > > planner).
> > > > > >> > This could be fixed in the future, but for now I'm happy for
> > just
> > > > when
> > > > > >> that
> > > > > >> > setting is true.
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Adam Gilmore <dr...@gmail.com>.
Just wanted to let you know I've uploaded the patch:

https://issues.apache.org/jira/browse/DRILL-1950

Be great to get some feedback or start a review board so I can see what
needs to be done to get it merged in!

On Fri, May 1, 2015 at 12:13 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Yes.  Once per query is fine.  Theoretically we could fire the rule many
> times inside a single query and I was saying we preferably cache in the
> context of a single query.
>
> On Thu, Apr 30, 2015 at 5:57 PM, Adam Gilmore <dr...@gmail.com>
> wrote:
>
> > We'll need to look it up at least once per query, though, right?  Because
> > session variables can change query to query.  Would you suggest I cached
> > the actual setting(s) as soon as the rule is instantiated?  The rule is
> > only called once or twice per query, I think.
> >
> > On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> > > Creating it at rule creation time is good.  The key is we want to cache
> > it
> > > so we don't have to look it up every single time the rule fires.  I
> think
> > > your point about specific versus general and PlannerSettings makes
> sense.
> > >
> > > On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <dr...@gmail.com>
> > > wrote:
> > >
> > > > Moreover, I'm a tad wary about using PlannerSettings as they seem
> very
> > > > generic (i.e. not specific to a particular storage/format plugin, for
> > > > example).  I've used "store.parquet.enable_pushdown_filter" as the
> > > option,
> > > > as it's very specific to the Parquet format plugin.  I imagine I can
> > > still
> > > > use getOptions() to get that option from PlannerSettings, but it
> feels
> > a
> > > > bit more like we should be using QueryContext like PruneScanRule
> uses.
> > > >
> > > > What do you think?
> > > >
> > > > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <dragoncurve@gmail.com
> >
> > > > wrote:
> > > >
> > > > > I actually patched the StorageEnginePlugin and FormatPlugin to pass
> > > > > QueryContext right through the chain of getting optimizer rules (as
> > is
> > > > the
> > > > > case for getting basic rules, etc.).  This seemed to me like the
> most
> > > > > logical approach and aligned with something like your
> PruneScanRule.
> > > > >
> > > > > Do you think I should revert back and use the example you gave?​
> > > > >
> > > > > P.S. The pushdown filter has improved performance significantly for
> > > > > certain types of queries.  It mostly comes back to how well the
> data
> > is
> > > > > ordered and how often it can exclude row groups.  Ideally, we
> should
> > be
> > > > > excluding individual pages as well, but from what I can see even
> the
> > > > reader
> > > > > from the Parquet library does not yet do this.
> > > > >
> > > > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <
> jacques@apache.org>
> > > > > wrote:
> > > > >
> > > > >> PlannerSettings is the primary we expose settings that need to be
> > > > >> interrogated during planning (e.g. in an optimizer rule).  You can
> > get
> > > > >> ahold of this by doing:
> > > > >>
> > > > >> PlannerSettings settings =
> > > > PrelUtil.getPlannerSettings(call.getPlanner());
> > > > >>
> > > > >> PlannerSettings then has access to session settings.
> > > > >>
> > > > >> You can see an example of this at [1]
> > > > >>
> > > > >> I'm excited to see the impact of this.  Look forward to seeing the
> > > > patch!
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > > > >>
> > > > >>
> > > > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <
> > dragoncurve@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi guys,
> > > > >> >
> > > > >> > I'm trying to work out how I could access the QueryContext
> inside
> > > > >> > a StoragePluginOptimizerRule.
> > > > >> >
> > > > >> > I've basically implemented the Parquet pushdown filtering, but I
> > > > really
> > > > >> > need to access the session settings (for whether or not we're
> > using
> > > > the
> > > > >> new
> > > > >> > Parquet reader so we can completely pushdown a filter and also
> for
> > > > >> > providing a setting to enable/disable pushdown filters).
> > > > >> >
> > > > >> > I've looked in a fair bit of detail, but there doesn't seem to
> be
> > a
> > > > way
> > > > >> to
> > > > >> > access this.
> > > > >> >
> > > > >> > If this is not the way I should be implementing it, can anyone
> > make
> > > > some
> > > > >> > suggestions?  I really want to do full pushdown when using the
> new
> > > > >> Parquet
> > > > >> > reader, but I have no easy way to detect it's going to be used.
> > > > >> >
> > > > >> > P.S. In cases where we "fall back" to the new Parquet reader, I
> > > don't
> > > > >> do a
> > > > >> > full pushdown (as detecting that is done further down than the
> > > > planner).
> > > > >> > This could be fixed in the future, but for now I'm happy for
> just
> > > when
> > > > >> that
> > > > >> > setting is true.
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Jacques Nadeau <ja...@apache.org>.
Yes.  Once per query is fine.  Theoretically we could fire the rule many
times inside a single query and I was saying we preferably cache in the
context of a single query.

On Thu, Apr 30, 2015 at 5:57 PM, Adam Gilmore <dr...@gmail.com> wrote:

> We'll need to look it up at least once per query, though, right?  Because
> session variables can change query to query.  Would you suggest I cached
> the actual setting(s) as soon as the rule is instantiated?  The rule is
> only called once or twice per query, I think.
>
> On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > Creating it at rule creation time is good.  The key is we want to cache
> it
> > so we don't have to look it up every single time the rule fires.  I think
> > your point about specific versus general and PlannerSettings makes sense.
> >
> > On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <dr...@gmail.com>
> > wrote:
> >
> > > Moreover, I'm a tad wary about using PlannerSettings as they seem very
> > > generic (i.e. not specific to a particular storage/format plugin, for
> > > example).  I've used "store.parquet.enable_pushdown_filter" as the
> > option,
> > > as it's very specific to the Parquet format plugin.  I imagine I can
> > still
> > > use getOptions() to get that option from PlannerSettings, but it feels
> a
> > > bit more like we should be using QueryContext like PruneScanRule uses.
> > >
> > > What do you think?
> > >
> > > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <dr...@gmail.com>
> > > wrote:
> > >
> > > > I actually patched the StorageEnginePlugin and FormatPlugin to pass
> > > > QueryContext right through the chain of getting optimizer rules (as
> is
> > > the
> > > > case for getting basic rules, etc.).  This seemed to me like the most
> > > > logical approach and aligned with something like your PruneScanRule.
> > > >
> > > > Do you think I should revert back and use the example you gave?​
> > > >
> > > > P.S. The pushdown filter has improved performance significantly for
> > > > certain types of queries.  It mostly comes back to how well the data
> is
> > > > ordered and how often it can exclude row groups.  Ideally, we should
> be
> > > > excluding individual pages as well, but from what I can see even the
> > > reader
> > > > from the Parquet library does not yet do this.
> > > >
> > > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <ja...@apache.org>
> > > > wrote:
> > > >
> > > >> PlannerSettings is the primary we expose settings that need to be
> > > >> interrogated during planning (e.g. in an optimizer rule).  You can
> get
> > > >> ahold of this by doing:
> > > >>
> > > >> PlannerSettings settings =
> > > PrelUtil.getPlannerSettings(call.getPlanner());
> > > >>
> > > >> PlannerSettings then has access to session settings.
> > > >>
> > > >> You can see an example of this at [1]
> > > >>
> > > >> I'm excited to see the impact of this.  Look forward to seeing the
> > > patch!
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > > >>
> > > >>
> > > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <
> dragoncurve@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > Hi guys,
> > > >> >
> > > >> > I'm trying to work out how I could access the QueryContext inside
> > > >> > a StoragePluginOptimizerRule.
> > > >> >
> > > >> > I've basically implemented the Parquet pushdown filtering, but I
> > > really
> > > >> > need to access the session settings (for whether or not we're
> using
> > > the
> > > >> new
> > > >> > Parquet reader so we can completely pushdown a filter and also for
> > > >> > providing a setting to enable/disable pushdown filters).
> > > >> >
> > > >> > I've looked in a fair bit of detail, but there doesn't seem to be
> a
> > > way
> > > >> to
> > > >> > access this.
> > > >> >
> > > >> > If this is not the way I should be implementing it, can anyone
> make
> > > some
> > > >> > suggestions?  I really want to do full pushdown when using the new
> > > >> Parquet
> > > >> > reader, but I have no easy way to detect it's going to be used.
> > > >> >
> > > >> > P.S. In cases where we "fall back" to the new Parquet reader, I
> > don't
> > > >> do a
> > > >> > full pushdown (as detecting that is done further down than the
> > > planner).
> > > >> > This could be fixed in the future, but for now I'm happy for just
> > when
> > > >> that
> > > >> > setting is true.
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Adam Gilmore <dr...@gmail.com>.
We'll need to look it up at least once per query, though, right?  Because
session variables can change query to query.  Would you suggest I cached
the actual setting(s) as soon as the rule is instantiated?  The rule is
only called once or twice per query, I think.

On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Creating it at rule creation time is good.  The key is we want to cache it
> so we don't have to look it up every single time the rule fires.  I think
> your point about specific versus general and PlannerSettings makes sense.
>
> On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <dr...@gmail.com>
> wrote:
>
> > Moreover, I'm a tad wary about using PlannerSettings as they seem very
> > generic (i.e. not specific to a particular storage/format plugin, for
> > example).  I've used "store.parquet.enable_pushdown_filter" as the
> option,
> > as it's very specific to the Parquet format plugin.  I imagine I can
> still
> > use getOptions() to get that option from PlannerSettings, but it feels a
> > bit more like we should be using QueryContext like PruneScanRule uses.
> >
> > What do you think?
> >
> > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <dr...@gmail.com>
> > wrote:
> >
> > > I actually patched the StorageEnginePlugin and FormatPlugin to pass
> > > QueryContext right through the chain of getting optimizer rules (as is
> > the
> > > case for getting basic rules, etc.).  This seemed to me like the most
> > > logical approach and aligned with something like your PruneScanRule.
> > >
> > > Do you think I should revert back and use the example you gave?​
> > >
> > > P.S. The pushdown filter has improved performance significantly for
> > > certain types of queries.  It mostly comes back to how well the data is
> > > ordered and how often it can exclude row groups.  Ideally, we should be
> > > excluding individual pages as well, but from what I can see even the
> > reader
> > > from the Parquet library does not yet do this.
> > >
> > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > >
> > >> PlannerSettings is the primary we expose settings that need to be
> > >> interrogated during planning (e.g. in an optimizer rule).  You can get
> > >> ahold of this by doing:
> > >>
> > >> PlannerSettings settings =
> > PrelUtil.getPlannerSettings(call.getPlanner());
> > >>
> > >> PlannerSettings then has access to session settings.
> > >>
> > >> You can see an example of this at [1]
> > >>
> > >> I'm excited to see the impact of this.  Look forward to seeing the
> > patch!
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > >>
> > >>
> > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dragoncurve@gmail.com
> >
> > >> wrote:
> > >>
> > >> > Hi guys,
> > >> >
> > >> > I'm trying to work out how I could access the QueryContext inside
> > >> > a StoragePluginOptimizerRule.
> > >> >
> > >> > I've basically implemented the Parquet pushdown filtering, but I
> > really
> > >> > need to access the session settings (for whether or not we're using
> > the
> > >> new
> > >> > Parquet reader so we can completely pushdown a filter and also for
> > >> > providing a setting to enable/disable pushdown filters).
> > >> >
> > >> > I've looked in a fair bit of detail, but there doesn't seem to be a
> > way
> > >> to
> > >> > access this.
> > >> >
> > >> > If this is not the way I should be implementing it, can anyone make
> > some
> > >> > suggestions?  I really want to do full pushdown when using the new
> > >> Parquet
> > >> > reader, but I have no easy way to detect it's going to be used.
> > >> >
> > >> > P.S. In cases where we "fall back" to the new Parquet reader, I
> don't
> > >> do a
> > >> > full pushdown (as detecting that is done further down than the
> > planner).
> > >> > This could be fixed in the future, but for now I'm happy for just
> when
> > >> that
> > >> > setting is true.
> > >> >
> > >>
> > >
> > >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Jacques Nadeau <ja...@apache.org>.
Creating it at rule creation time is good.  The key is we want to cache it
so we don't have to look it up every single time the rule fires.  I think
your point about specific versus general and PlannerSettings makes sense.

On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore <dr...@gmail.com> wrote:

> Moreover, I'm a tad wary about using PlannerSettings as they seem very
> generic (i.e. not specific to a particular storage/format plugin, for
> example).  I've used "store.parquet.enable_pushdown_filter" as the option,
> as it's very specific to the Parquet format plugin.  I imagine I can still
> use getOptions() to get that option from PlannerSettings, but it feels a
> bit more like we should be using QueryContext like PruneScanRule uses.
>
> What do you think?
>
> On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <dr...@gmail.com>
> wrote:
>
> > I actually patched the StorageEnginePlugin and FormatPlugin to pass
> > QueryContext right through the chain of getting optimizer rules (as is
> the
> > case for getting basic rules, etc.).  This seemed to me like the most
> > logical approach and aligned with something like your PruneScanRule.
> >
> > Do you think I should revert back and use the example you gave?​
> >
> > P.S. The pushdown filter has improved performance significantly for
> > certain types of queries.  It mostly comes back to how well the data is
> > ordered and how often it can exclude row groups.  Ideally, we should be
> > excluding individual pages as well, but from what I can see even the
> reader
> > from the Parquet library does not yet do this.
> >
> > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> >> PlannerSettings is the primary we expose settings that need to be
> >> interrogated during planning (e.g. in an optimizer rule).  You can get
> >> ahold of this by doing:
> >>
> >> PlannerSettings settings =
> PrelUtil.getPlannerSettings(call.getPlanner());
> >>
> >> PlannerSettings then has access to session settings.
> >>
> >> You can see an example of this at [1]
> >>
> >> I'm excited to see the impact of this.  Look forward to seeing the
> patch!
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> >>
> >>
> >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dr...@gmail.com>
> >> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I'm trying to work out how I could access the QueryContext inside
> >> > a StoragePluginOptimizerRule.
> >> >
> >> > I've basically implemented the Parquet pushdown filtering, but I
> really
> >> > need to access the session settings (for whether or not we're using
> the
> >> new
> >> > Parquet reader so we can completely pushdown a filter and also for
> >> > providing a setting to enable/disable pushdown filters).
> >> >
> >> > I've looked in a fair bit of detail, but there doesn't seem to be a
> way
> >> to
> >> > access this.
> >> >
> >> > If this is not the way I should be implementing it, can anyone make
> some
> >> > suggestions?  I really want to do full pushdown when using the new
> >> Parquet
> >> > reader, but I have no easy way to detect it's going to be used.
> >> >
> >> > P.S. In cases where we "fall back" to the new Parquet reader, I don't
> >> do a
> >> > full pushdown (as detecting that is done further down than the
> planner).
> >> > This could be fixed in the future, but for now I'm happy for just when
> >> that
> >> > setting is true.
> >> >
> >>
> >
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Adam Gilmore <dr...@gmail.com>.
Moreover, I'm a tad wary about using PlannerSettings as they seem very
generic (i.e. not specific to a particular storage/format plugin, for
example).  I've used "store.parquet.enable_pushdown_filter" as the option,
as it's very specific to the Parquet format plugin.  I imagine I can still
use getOptions() to get that option from PlannerSettings, but it feels a
bit more like we should be using QueryContext like PruneScanRule uses.

What do you think?

On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore <dr...@gmail.com> wrote:

> I actually patched the StorageEnginePlugin and FormatPlugin to pass
> QueryContext right through the chain of getting optimizer rules (as is the
> case for getting basic rules, etc.).  This seemed to me like the most
> logical approach and aligned with something like your PruneScanRule.
>
> Do you think I should revert back and use the example you gave?​
>
> P.S. The pushdown filter has improved performance significantly for
> certain types of queries.  It mostly comes back to how well the data is
> ordered and how often it can exclude row groups.  Ideally, we should be
> excluding individual pages as well, but from what I can see even the reader
> from the Parquet library does not yet do this.
>
> On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
>> PlannerSettings is the primary we expose settings that need to be
>> interrogated during planning (e.g. in an optimizer rule).  You can get
>> ahold of this by doing:
>>
>> PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
>>
>> PlannerSettings then has access to session settings.
>>
>> You can see an example of this at [1]
>>
>> I'm excited to see the impact of this.  Look forward to seeing the patch!
>>
>> [1]
>>
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
>>
>>
>> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dr...@gmail.com>
>> wrote:
>>
>> > Hi guys,
>> >
>> > I'm trying to work out how I could access the QueryContext inside
>> > a StoragePluginOptimizerRule.
>> >
>> > I've basically implemented the Parquet pushdown filtering, but I really
>> > need to access the session settings (for whether or not we're using the
>> new
>> > Parquet reader so we can completely pushdown a filter and also for
>> > providing a setting to enable/disable pushdown filters).
>> >
>> > I've looked in a fair bit of detail, but there doesn't seem to be a way
>> to
>> > access this.
>> >
>> > If this is not the way I should be implementing it, can anyone make some
>> > suggestions?  I really want to do full pushdown when using the new
>> Parquet
>> > reader, but I have no easy way to detect it's going to be used.
>> >
>> > P.S. In cases where we "fall back" to the new Parquet reader, I don't
>> do a
>> > full pushdown (as detecting that is done further down than the planner).
>> > This could be fixed in the future, but for now I'm happy for just when
>> that
>> > setting is true.
>> >
>>
>
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Adam Gilmore <dr...@gmail.com>.
I actually patched the StorageEnginePlugin and FormatPlugin to pass
QueryContext right through the chain of getting optimizer rules (as is the
case for getting basic rules, etc.).  This seemed to me like the most
logical approach and aligned with something like your PruneScanRule.

Do you think I should revert back and use the example you gave?​

P.S. The pushdown filter has improved performance significantly for certain
types of queries.  It mostly comes back to how well the data is ordered and
how often it can exclude row groups.  Ideally, we should be excluding
individual pages as well, but from what I can see even the reader from the
Parquet library does not yet do this.

On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <ja...@apache.org> wrote:

> PlannerSettings is the primary we expose settings that need to be
> interrogated during planning (e.g. in an optimizer rule).  You can get
> ahold of this by doing:
>
> PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());
>
> PlannerSettings then has access to session settings.
>
> You can see an example of this at [1]
>
> I'm excited to see the impact of this.  Look forward to seeing the patch!
>
> [1]
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
>
>
> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dr...@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > I'm trying to work out how I could access the QueryContext inside
> > a StoragePluginOptimizerRule.
> >
> > I've basically implemented the Parquet pushdown filtering, but I really
> > need to access the session settings (for whether or not we're using the
> new
> > Parquet reader so we can completely pushdown a filter and also for
> > providing a setting to enable/disable pushdown filters).
> >
> > I've looked in a fair bit of detail, but there doesn't seem to be a way
> to
> > access this.
> >
> > If this is not the way I should be implementing it, can anyone make some
> > suggestions?  I really want to do full pushdown when using the new
> Parquet
> > reader, but I have no easy way to detect it's going to be used.
> >
> > P.S. In cases where we "fall back" to the new Parquet reader, I don't do
> a
> > full pushdown (as detecting that is done further down than the planner).
> > This could be fixed in the future, but for now I'm happy for just when
> that
> > setting is true.
> >
>

Re: Accessing QueryContext inside a StoragePluginOptimizerRule

Posted by Jacques Nadeau <ja...@apache.org>.
PlannerSettings is the primary we expose settings that need to be
interrogated during planning (e.g. in an optimizer rule).  You can get
ahold of this by doing:

PlannerSettings settings = PrelUtil.getPlannerSettings(call.getPlanner());

PlannerSettings then has access to session settings.

You can see an example of this at [1]

I'm excited to see the impact of this.  Look forward to seeing the patch!

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82


On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dr...@gmail.com>
wrote:

> Hi guys,
>
> I'm trying to work out how I could access the QueryContext inside
> a StoragePluginOptimizerRule.
>
> I've basically implemented the Parquet pushdown filtering, but I really
> need to access the session settings (for whether or not we're using the new
> Parquet reader so we can completely pushdown a filter and also for
> providing a setting to enable/disable pushdown filters).
>
> I've looked in a fair bit of detail, but there doesn't seem to be a way to
> access this.
>
> If this is not the way I should be implementing it, can anyone make some
> suggestions?  I really want to do full pushdown when using the new Parquet
> reader, but I have no easy way to detect it's going to be used.
>
> P.S. In cases where we "fall back" to the new Parquet reader, I don't do a
> full pushdown (as detecting that is done further down than the planner).
> This could be fixed in the future, but for now I'm happy for just when that
> setting is true.
>