Posted to dev@calcite.apache.org by Josh Wills <jw...@cloudera.com> on 2014/08/01 01:46:38 UTC

reading the same table multiple times within a query

I've been playing with Optiq, and I ran into something that I can't quite
figure out how to do: successfully run a query that requires the engine to
do multiple passes over a table, something like a self-join or doing a
query that has multiple sub-selects over the same table. The queries always
compile and execute; they just don't return any results, as if the table
had zero rows in it the second time the system tried to read it. My
Enumerators support resetting, so I was just curious what I was doing wrong.

Thanks,
Josh

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: reading the same table multiple times within a query

Posted by Josh Wills <jw...@cloudera.com>.
Hey Julian,

Thanks for your help; I found and fixed the bug in my Enumerator
implementation.

Best,
Josh


On Thu, Jul 31, 2014 at 5:07 PM, Julian Hyde <ju...@gmail.com> wrote:

> There aren’t currently any physical operators that make use of restarts.
> The logical operators that use variables and restarts are
> NestedLoopsJoinRel and CorrelatorRel; today these get implemented via a
> decorrelation rewrite. We could in principle add an
> EnumerableNestedLoopsJoinRel. Its implementation would call
> Enumerator.reset() on the inner enumerator between outer loop iterations.
> I’m guessing you’ve done something like that.
>
> Usually when you do restarts there is a correlation variable involved
> (like a bind variable but set and used within the statement, rather than by
> the end-user), otherwise the relation will return the same thing every
> time. If so, make sure that the variable is set before the first execution,
> and assigned a new value between executions. A null variable could account
> for your missing rows.
>
> Try running with -Doptiq.debug=true and the generated code will appear on
> stdout. If you need more help debugging you could post that to this list.
>
> Another possible cause is a bug in an implementation of Enumerator,
> especially one based on a source such as an Iterator, which doesn’t
> support reset.
>
> Julian
>
>



Re: reading the same table multiple times within a query

Posted by Julian Hyde <ju...@gmail.com>.
There aren’t currently any physical operators that make use of restarts. The logical operators that use variables and restarts are NestedLoopsJoinRel and CorrelatorRel; today these get implemented via a decorrelation rewrite. We could in principle add an EnumerableNestedLoopsJoinRel. Its implementation would call Enumerator.reset() on the inner enumerator between outer-loop iterations. I’m guessing you’ve done something like that.
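
The EnumerableNestedLoopsJoinRel mentioned above does not exist yet, so the following is only a rough sketch of the loop it describes, not Optiq code. Assumptions: `Enumerator` is a local stand-in declaring the same four methods as linq4j's interface (`current`, `moveNext`, `reset`, `close`) so the example compiles on its own, and `ListEnumerator` and `NestedLoopsJoinSketch` are hypothetical names. The inner input is reset once per outer row, which is what lets a self-join read the same table repeatedly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Local stand-in mirroring linq4j's Enumerator; not the real Optiq type.
interface Enumerator<T> extends AutoCloseable {
  T current();
  boolean moveNext();
  void reset();
  @Override
  void close();
}

// Hypothetical enumerator over an in-memory list; reset() rewinds it.
final class ListEnumerator<T> implements Enumerator<T> {
  private final List<T> rows;
  private int index = -1; // -1 means "before the first row"

  ListEnumerator(List<T> rows) {
    this.rows = rows;
  }

  public T current() {
    if (index < 0 || index >= rows.size()) {
      throw new IllegalStateException("no current row");
    }
    return rows.get(index);
  }

  public boolean moveNext() {
    if (index < rows.size()) {
      index++;
    }
    return index < rows.size();
  }

  public void reset() {
    index = -1;
  }

  public void close() {}
}

// Hypothetical nested-loops join: the inner input is rewound once per outer row.
final class NestedLoopsJoinSketch {
  static <L, R> List<String> join(
      Enumerator<L> outer, Enumerator<R> inner, BiPredicate<L, R> condition) {
    List<String> result = new ArrayList<>();
    try {
      while (outer.moveNext()) {
        L left = outer.current();
        inner.reset(); // restart the inner input for this outer row
        while (inner.moveNext()) {
          R right = inner.current();
          if (condition.test(left, right)) {
            result.add(left + "|" + right);
          }
        }
      }
    } finally {
      outer.close();
      inner.close();
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("a", "b", "c");
    // A self-join: the same table is scanned once per outer row.
    List<String> pairs = join(
        new ListEnumerator<String>(table),
        new ListEnumerator<String>(table),
        (l, r) -> l.compareTo(r) < 0);
    System.out.println(pairs); // prints [a|b, a|c, b|c]
  }
}
```

Resetting before every inner pass, including the first, keeps the join independent of whatever position the inner enumerator starts in.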

Usually when you do restarts there is a correlation variable involved (like a bind variable but set and used within the statement, rather than by the end-user), otherwise the relation will return the same thing every time. If so, make sure that the variable is set before the first execution, and assigned a new value between executions. A null variable could account for your missing rows.
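
To make the correlation-variable point concrete, here is a hedged sketch. `CorrelationVariable`, `DeptFilterEnumerator` and `CorrelationSketch` are hypothetical names, and `Enumerator` is again a local stand-in for linq4j's interface. The inner relation reads the variable while producing rows, so the outer loop has to assign it before each restart; if it never does, the variable stays null and the inner relation returns no rows at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Local stand-in mirroring linq4j's Enumerator; not the real Optiq type.
interface Enumerator<T> extends AutoCloseable {
  T current();
  boolean moveNext();
  void reset();
  @Override
  void close();
}

// Hypothetical holder for a value that is set and read inside one statement.
final class CorrelationVariable<V> {
  private V value; // null until the outer loop assigns it

  void set(V value) {
    this.value = value;
  }

  V get() {
    return value;
  }
}

// Inner relation: employee rows {deptId, name} whose deptId equals the variable.
final class DeptFilterEnumerator implements Enumerator<String[]> {
  private final List<String[]> emps;
  private final CorrelationVariable<String> deptId;
  private Iterator<String[]> iterator;
  private String[] current;

  DeptFilterEnumerator(List<String[]> emps, CorrelationVariable<String> deptId) {
    this.emps = emps;
    this.deptId = deptId;
    reset();
  }

  public String[] current() {
    return current;
  }

  public boolean moveNext() {
    String key = deptId.get();
    while (iterator.hasNext()) {
      String[] row = iterator.next();
      // A null variable matches no row, like "deptId = NULL" in SQL.
      if (key != null && key.equals(row[0])) {
        current = row;
        return true;
      }
    }
    current = null;
    return false;
  }

  public void reset() {
    iterator = emps.iterator();
    current = null;
  }

  public void close() {}
}

final class CorrelationSketch {
  // For each department, restarts the inner relation with the variable bound to it.
  static List<String> run(List<String> depts, List<String[]> emps, boolean bindVariable) {
    CorrelationVariable<String> deptId = new CorrelationVariable<>();
    List<String> out = new ArrayList<>();
    try (DeptFilterEnumerator inner = new DeptFilterEnumerator(emps, deptId)) {
      for (String dept : depts) {
        if (bindVariable) {
          deptId.set(dept); // assign before each execution of the inner relation
        }
        inner.reset();
        while (inner.moveNext()) {
          out.add(dept + ":" + inner.current()[1]);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> depts = Arrays.asList("10", "20");
    List<String[]> emps = Arrays.asList(
        new String[] {"10", "alice"},
        new String[] {"20", "bob"},
        new String[] {"10", "carol"});
    System.out.println(run(depts, emps, true));  // prints [10:alice, 10:carol, 20:bob]
    System.out.println(run(depts, emps, false)); // prints []
  }
}
```

The second call compiles and runs without error yet returns nothing, which matches the "no results" symptom described in the question.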

Try running with -Doptiq.debug=true and the generated code will appear on stdout. If you need more help debugging you could post that to this list.

Another possible cause is a bug in an implementation of Enumerator, especially one based on a source such as an Iterator, which doesn’t support reset.
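
The symptom in the original question (rows on the first pass, none on the second) is consistent with an Enumerator whose reset() cannot actually rewind its source. A minimal sketch, assuming a locally declared `Enumerator` stand-in and hypothetical class names: `IteratorEnumerator` wraps a single `java.util.Iterator` and has nothing to rewind, while `IterableEnumerator` keeps the `Iterable` and asks it for a fresh `Iterator` on every reset().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Local stand-in mirroring linq4j's Enumerator; not the real Optiq type.
interface Enumerator<T> extends AutoCloseable {
  T current();
  boolean moveNext();
  void reset();
  @Override
  void close();
}

// Buggy: a java.util.Iterator cannot be rewound, so reset() has no effect.
final class IteratorEnumerator<T> implements Enumerator<T> {
  private final Iterator<T> iterator;
  private T current;

  IteratorEnumerator(Iterator<T> iterator) {
    this.iterator = iterator;
  }

  public T current() {
    return current;
  }

  public boolean moveNext() {
    if (iterator.hasNext()) {
      current = iterator.next();
      return true;
    }
    return false;
  }

  public void reset() {
    // Bug: nothing to rewind; the second pass sees an exhausted iterator.
  }

  public void close() {}
}

// Fixed: keeps the Iterable and obtains a fresh Iterator on every reset().
final class IterableEnumerator<T> implements Enumerator<T> {
  private final Iterable<T> source;
  private Iterator<T> iterator;
  private T current;

  IterableEnumerator(Iterable<T> source) {
    this.source = source;
    reset();
  }

  public T current() {
    return current;
  }

  public boolean moveNext() {
    if (iterator.hasNext()) {
      current = iterator.next();
      return true;
    }
    return false;
  }

  public void reset() {
    iterator = source.iterator();
    current = null;
  }

  public void close() {}
}

final class ResetSketch {
  // Counts the rows seen on two passes, calling reset() before each pass.
  static int[] countTwoPasses(Enumerator<?> e) {
    int[] counts = new int[2];
    for (int pass = 0; pass < 2; pass++) {
      e.reset();
      while (e.moveNext()) {
        counts[pass]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("x", "y", "z");
    System.out.println(Arrays.toString(
        countTwoPasses(new IteratorEnumerator<String>(table.iterator())))); // prints [3, 0]
    System.out.println(Arrays.toString(
        countTwoPasses(new IterableEnumerator<String>(table)))); // prints [3, 3]
  }
}
```

Whatever the source, reset() has to return the enumerator to the state it was in before the first moveNext(); for a one-shot source that means re-opening it or buffering its rows.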

Julian


On Jul 31, 2014, at 4:46 PM, Josh Wills <jw...@cloudera.com> wrote:

> I've been playing with Optiq, and I ran into something that I can't quite
> figure out how to do: successfully run a query that requires the engine to
> do multiple passes over a table, something like a self-join or doing a
> query that has multiple sub-selects over the same table. The queries always
> compile and execute, they just don't return any results, as if the table
> had zero rows in it the second time the system tried to read it. My
> Enumerators support resetting, so I was just curious what I was doing wrong.
> 
> Thanks,
> Josh