You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Frankie Bollaert <fr...@ngdata.com> on 2016/01/11 16:54:51 UTC

Underflow issue with big IN clauses

Hi all,

Having worked on a bug related to IN clauses, I believe I found an issue
which has not been reported yet.  When executing a query with more then 19
id's in the IN clause, an underflow exception is thrown.

I created a Jira ticket for this:
https://issues.apache.org/jira/browse/CALCITE-1051


But besides this, I was wondering about the reason why a large IN clause is
converted into a join to an inline table.  (This happens in the
SqlToRelConverter).

More specifically how does this optimise the query execution?  How would a
storage engine be able to handle a join to an inline table more
efficiently?  Is this not a storage engine specific optimisation?


Thank you !
Frankie

Re: Underflow issue with big IN clauses

Posted by Julian Hyde <jh...@apache.org>.
There are a couple of reasons why we convert to an inline table. An inline table is can often be represented as an array or a list (it depends on the engine we are generating for) and can then be loaded into the "build" side of a hash table for a hash join. This is beneficial if the number of values in the IN clause is large (in the thousands, say) but the other side is still larger. The table is populated once, and membership in the IN clause is a probe into the hash table (which basically is just computing a hash function and looking into an array). Much more efficient than evaluating thousands of branches in a large "OR" expression

The other reason is that as expression size increases, many of the operations that occur during query planning become more expensive. We probably have operations that grow with O(n log n) or O(n ^ 2) with the expression size. Moving these expressions into an inline table keeps the query planning process tractable.

Julian

> On Jan 11, 2016, at 7:54 AM, Frankie Bollaert <fr...@ngdata.com> wrote:
> 
> Hi all,
> 
> Having worked on a bug related to IN clauses, I believe I found an issue
> which has not been reported yet.  When executing a query with more then 19
> id's in the IN clause, an underflow exception is thrown.
> 
> I created a Jira ticket for this:
> https://issues.apache.org/jira/browse/CALCITE-1051
> 
> 
> But besides this, I was wondering about the reason why a large IN clause is
> converted into a join to an inline table.  (This happens in the
> SqlToRelConverter).
> 
> More specifically how does this optimise the query execution?  How would a
> storage engine be able to handle a join to an inline table more
> efficiently?  Is this not a storage engine specific optimisation?
> 
> 
> Thank you !
> Frankie