You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Andrei Sereda <an...@sereda.cc> on 2018/10/24 16:44:48 UTC

scrolling in ElasticSearch adapter

Hello,


I was thinking about adding [scrolling functionality](
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html)
to elastic search adapter. Since scrolling has non-negligible effect on the
cluster it should be selectively enabled on per query basis. So, likely,
user has to explicitly set "scroll flag" somewhere.

Most natural way seems in SQL. [Calcite sql grammar](
https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
(unused to my knowledge). There were also discussions about adding hints to
Calcite.

### Examples
```sql
-- special sql keyword ?
SCROLL select * from elastic;

-- assuming hints are available in calcite
/* HINT: scroll */ select * from elastic;
```

What people think about this use-case ? Are there better ideas ?

Regards,
Andrei.

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Hi Christian,

Agree that (scroll) SQL keyword is an overkill. Need to check how JDBC
context can be accessed in RelNode(s).
If I remember correctly, relational algebra can also be evaluated outside
JDBC using RelBuilders (can't find an example right now).

Andrei.

On Wed, Oct 24, 2018 at 2:20 PM Christian Beikov <ch...@gmail.com>
wrote:

> Hey,
>
> not sure if this should be an SQL keyword. JDBC specifies various
> constants that can be used at statement creation time:
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
>
> Not sure though if or how these configurations are accessible for data
> stores or dialects, but IMO using these would be the proper way.
>
> Regards
>
> Christian
>
> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > Hello,
> >
> >
> > I was thinking about adding [scrolling functionality](
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> )
> > to elastic search adapter. Since scrolling has non-negligible effect on
> the
> > cluster it should be selectively enabled on per query basis. So, likely,
> > user has to explicitly set "scroll flag" somewhere.
> >
> > Most natural way seems in SQL. [Calcite sql grammar](
> > https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> > (unused to my knowledge). There were also discussions about adding hints
> to
> > Calcite.
> >
> > ### Examples
> > ```sql
> > -- special sql keyword ?
> > SCROLL select * from elastic;
> >
> > -- assuming hints are available in calcite
> > /* HINT: scroll */ select * from elastic;
> > ```
> >
> > What people think about this use-case ? Are there better ideas ?
> >
> > Regards,
> > Andrei.
> >
>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
> It seems to me that this behavior is tight with the notion of a transaction and the various isolation levels defined by the standard. If the transaction isolation level is for example "serializable" then executing the same query in the same transaction multiple times should not return different results (no dirty/phantom/repeatable reads). So for serializable transactions consistent snapshot should be the default .I think.

Yes it is closely related to transactions. Do you suggest enabling
scrolling based on [JDBC
transactions](https://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html)
? If so,
elastic doesn't natively supports transactions. Scrollable results are
"serializable" only for single query not for multiple (which might
confuse users using transactions on the client side).


> I would say that this is a bug. If the SQL does not specify a limit then the result set should bring back all documents (possibly in multiple batches depending on the fetch size).

I tend to agree, but pagination doesn't work for all ES aggregations
(only [for terms](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions)).
On the top of that scrolling doesn't work with aggregations.

> Hints and cursors should not return different results.

Agree

On Wed, Oct 24, 2018 at 6:16 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>
> Hi Andrei,
>
> Andrei>Scrolling (in elastic) does not only mean “open a cursor” but also
> iterate
> over consistent snapshot...
>
> It seems to me that this behavior is tight with the notion of a transaction
> and the various isolation levels defined by the standard. If the
> transaction isolation level is  for example "serializable" then executing
> the same query in the same transaction multiple times should not return
> different results (no dirty/phantom/repeatable reads).
> So for serializable transactions consistent snapshot should be the default
> I think.
>
> Andrei>There is one more “issue”. Currently select * from elastic returns
> at most
> 10 rows (in calcite)...
>
> I would say that this is a bug. If the SQL does not specify a limit then
> the result set should bring back all documents (possibly in multiple
> batches depending on the fetch size).
>
> Andrei> I’m not at ease with returning different results based on hints or
> cursor settings.
>
> Hints and cursors should not return different results. They only affect how
> the results are obtained from the server and delivered to the client. The
> client may decide to iterate through all/some of them but that does not
> change the fact that the server must be able to provide the complete result.
>
> Best,
> Stamatis
>
> Στις Τετ, 24 Οκτ 2018 στις 9:39 μ.μ., ο/η Andrei Sereda <an...@sereda.cc>
> έγραψε:
>
> > Hi Julian,
> >
> > Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> > over consistent snapshot. From docs:
> >
> > The results that are returned from a scroll request reflect the state of
> > the index at the time that the initial search request was made, like a
> > snapshot in time. Subsequent changes to documents (index, update or delete)
> > will only affect later search requests.
> >
> > So pagination (fetch / offset) can’t exactly replicate this functionality.
> >
> > The problem with scrolling (in elastic) is that it is expensive and can’t
> > (shouldn’t) be enabled it by default.
> >
> > There is one more “issue”. Currently select * from elastic returns at most
> > 10 rows (in calcite). This is consistent with elastic behaviour which
> > limits result set to 10 documents (unless size is specified). When
> > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > <
> > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > >
> > or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> > returning different results based on hints or cursor settings.
> >
> > Andrei.
> >
> > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com>
> > wrote:
> >
> > > It seems to me that Elasticsearch scroll means return a cursor - a
> > > collection of rows that you iterate over, and you may not read all of
> > them.
> > > This is the default operation of JDBC.
> > >
> > > So, I guess we need to give the user a way to signal their intent to read
> > > all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > > hints[1] seem close to this. We would want the hints to be acted upon by
> > > both the optimizer and the JDBC transport.
> > >
> > > Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > > retrieve different pieces of a large result set in separate statements or
> > > (using query parameters) executions. It would be useful if the server
> > could
> > > be given a hint to cache a statement across page requests.
> > >
> > > Julian
> > >
> > > [1]
> > > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > >
> > > > On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > christian.beikov@gmail.com> wrote:
> > > >
> > > > Hey,
> > > >
> > > > not sure if this should be an SQL keyword. JDBC specifies various
> > > constants that can be used at statement creation time:
> > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > >
> > > > Not sure though if or how these configurations are accessible for data
> > > stores or dialects, but IMO using these would be the proper way.
> > > >
> > > > Regards
> > > >
> > > > Christian
> > > >
> > > >> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > >> Hello,
> > > >>
> > > >>
> > > >> I was thinking about adding [scrolling functionality](
> > > >>
> > >
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > )
> > > >> to elastic search adapter. Since scrolling has non-negligible effect
> > on
> > > the
> > > >> cluster it should be selectively enabled on per query basis. So,
> > likely,
> > > >> user has to explicitly set "scroll flag" somewhere.
> > > >>
> > > >> Most natural way seems in SQL. [Calcite sql grammar](
> > > >> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> > > >> (unused to my knowledge). There were also discussions about adding
> > > hints to
> > > >> Calcite.
> > > >>
> > > >> ### Examples
> > > >> ```sql
> > > >> -- special sql keyword ?
> > > >> SCROLL select * from elastic;
> > > >>
> > > >> -- assuming hints are available in calcite
> > > >> /* HINT: scroll */ select * from elastic;
> > > >> ```
> > > >>
> > > >> What people think about this use-case ? Are there better ideas ?
> > > >>
> > > >> Regards,
> > > >> Andrei.
> > > >>
> > >
> >

Re: scrolling in ElasticSearch adapter

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Andrei,

Andrei>Scrolling (in elastic) does not only mean “open a cursor” but also
iterate
over consistent snapshot...

It seems to me that this behavior is tight with the notion of a transaction
and the various isolation levels defined by the standard. If the
transaction isolation level is  for example "serializable" then executing
the same query in the same transaction multiple times should not return
different results (no dirty/phantom/repeatable reads).
So for serializable transactions consistent snapshot should be the default
I think.

Andrei>There is one more “issue”. Currently select * from elastic returns
at most
10 rows (in calcite)...

I would say that this is a bug. If the SQL does not specify a limit then
the result set should bring back all documents (possibly in multiple
batches depending on the fetch size).

Andrei> I’m not at ease with returning different results based on hints or
cursor settings.

Hints and cursors should not return different results. They only affect how
the results are obtained from the server and delivered to the client. The
client may decide to iterate through all/some of them but that does not
change the fact that the server must be able to provide the complete result.

Best,
Stamatis

Στις Τετ, 24 Οκτ 2018 στις 9:39 μ.μ., ο/η Andrei Sereda <an...@sereda.cc>
έγραψε:

> Hi Julian,
>
> Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> over consistent snapshot. From docs:
>
> The results that are returned from a scroll request reflect the state of
> the index at the time that the initial search request was made, like a
> snapshot in time. Subsequent changes to documents (index, update or delete)
> will only affect later search requests.
>
> So pagination (fetch / offset) can’t exactly replicate this functionality.
>
> The problem with scrolling (in elastic) is that it is expensive and can’t
> (shouldn’t) be enabled it by default.
>
> There is one more “issue”. Currently select * from elastic returns at most
> 10 rows (in calcite). This is consistent with elastic behaviour which
> limits result set to 10 documents (unless size is specified). When
> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> <
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> >
> or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> returning different results based on hints or cursor settings.
>
> Andrei.
>
> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com>
> wrote:
>
> > It seems to me that Elasticsearch scroll means return a cursor - a
> > collection of rows that you iterate over, and you may not read all of
> them.
> > This is the default operation of JDBC.
> >
> > So, I guess we need to give the user a way to signal their intent to read
> > all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > hints[1] seem close to this. We would want the hints to be acted upon by
> > both the optimizer and the JDBC transport.
> >
> > Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > retrieve different pieces of a large result set in separate statements or
> > (using query parameters) executions. It would be useful if the server
> could
> > be given a hint to cache a statement across page requests.
> >
> > Julian
> >
> > [1]
> > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> >
> > > On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > christian.beikov@gmail.com> wrote:
> > >
> > > Hey,
> > >
> > > not sure if this should be an SQL keyword. JDBC specifies various
> > constants that can be used at statement creation time:
> > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > >
> > > Not sure though if or how these configurations are accessible for data
> > stores or dialects, but IMO using these would be the proper way.
> > >
> > > Regards
> > >
> > > Christian
> > >
> > >> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > >> Hello,
> > >>
> > >>
> > >> I was thinking about adding [scrolling functionality](
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > )
> > >> to elastic search adapter. Since scrolling has non-negligible effect
> on
> > the
> > >> cluster it should be selectively enabled on per query basis. So,
> likely,
> > >> user has to explicitly set "scroll flag" somewhere.
> > >>
> > >> Most natural way seems in SQL. [Calcite sql grammar](
> > >> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> > >> (unused to my knowledge). There were also discussions about adding
> > hints to
> > >> Calcite.
> > >>
> > >> ### Examples
> > >> ```sql
> > >> -- special sql keyword ?
> > >> SCROLL select * from elastic;
> > >>
> > >> -- assuming hints are available in calcite
> > >> /* HINT: scroll */ select * from elastic;
> > >> ```
> > >>
> > >> What people think about this use-case ? Are there better ideas ?
> > >>
> > >> Regards,
> > >> Andrei.
> > >>
> >
>

Re: scrolling in ElasticSearch adapter

Posted by Julian Hyde <jh...@apache.org>.
You’re proposing lazy consensus. We don’t do that for commits. (Yeah, we use a mixture of CTR and RTC. But adding lazy consensus muddies things too much, I think.)

For this change, I think you should do RTC, and require a +1 from at least one committer. Therefore, someone please review https://issues.apache.org/jira/browse/CALCITE-2651 <https://issues.apache.org/jira/browse/CALCITE-2651>.


> On Nov 14, 2018, at 1:27 PM, Andrei Sereda <an...@sereda.cc> wrote:
> 
> Thanks, all, for your input.
> 
> Following this discussion, I have prepared a PR
> https://github.com/apache/calcite/pull/919
> 
> Let me know if you agree with the approach. If there are no objections /
> comments, I will commit it in a couple of days.
> 
> Propagation of Statement.setFetchSize(int) into actual query will be done
> in separate commit (I’m still checking DataContext usages)
> 
> On Fri, Oct 26, 2018 at 10:45 AM Kevin Risden <kr...@apache.org> wrote:
> 
>> #1 sounds reasonable.
>> #2 will have to see what that means for actual aggregations. Sounds like it
>> could work.
>> #3 not sure what is meant here. I think fetch size is a hint and not
>> required. I wouldn't default to 10k though.
>> 
>> Kevin Risden
>> 
>> 
>> On Thu, Oct 25, 2018 at 4:25 PM Andrei Sereda <an...@sereda.cc> wrote:
>> 
>>> There is new Composite Aggregation
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
>>>> 
>>> (still in beta) which allows pagination with aggregates. It is available
>> on
>>> versions >= 5.6 (unfortunately we have to support 2.6+).
>>> 
>>> To sum up:
>>> 
>>>   1. For non-aggregate queries always use scrolling or search-after by
>>>   default (not pagination) .
>>>   2. For aggregates use pagination. Either Terms Aggregation (with
>>>   partitioning) or Composite Aggregation.
>>>   3. If possible Retrieve batch size from JDBC api (known as
>>>   Statement.setFetchSize(int)). If not default to 10k ?
>>> 
>>> Does it sound reasonable ?
>>> 
>>> 
>>> On Thu, Oct 25, 2018 at 3:43 PM
>>> Kevin Risden
>>> <kr...@apache.org> wrote:
>>> 
>>>> What I was saying is that scrolling is the only way to ensure you get
>>>> correct results coming back from Elasticsearch if you are going to do
>>> more
>>>> processing.
>>>> 
>>>> 
>>>> 
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>>>> 
>>>> As long as you delete the scroll at the end its not the end of the
>> world.
>>>> There is no deep paging unless you use scrolling or a search after
>> which
>>> is
>>>> basically the same idea as scrolling
>>>> 
>>>> 
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
>>>> .
>>>> 
>>>> I would say that scrolling should be the default behavior and then
>> should
>>>> look at optimizing for aggregations, smaller result sets, etc. If you
>>> look
>>>> at scrolling as an optimization the default results won't be correct
>> (ie:
>>>> only returning 10 results when there are a lot more). For default
>>>> pagination there is a limit of 10000 results anyway unless you use
>>>> scrolling or search after.
>>>> 
>>>> Kevin Risden
>>>> 
>>>> 
>>>> On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <an...@sereda.cc>
>> wrote:
>>>> 
>>>>> Hi Kevin,
>>>>> 
>>>>> You suggest to use scrolling for all elastic queries ? Even when
>> there
>>>>> are predicates ?  Some questions :
>>>>> 
>>>>> 1) Scrolling has a runtime overhead for elastic cluster. Having it
>>>>> enabled by default (against vendor recommendation) is risky. Does it
>>>>> (not) cause issues in Solr ?
>>>>> 2) Scrolling does not work with aggregates, which means I'll have to
>>>>> fallback to pagination (for whole result set).
>>>>> 3) When result set is small scrolling / pagination is not necessary.
>>>>> 
>>>>> Having said that I don't see any way to automatically enable /
>> disable
>>>>> scrolling based on RelTree (pls let me know if there is one).
>>>>> 
>>>>> You seem to agree with Stamatis that `select * from elastic` (1)
>>>>> should return all documents (rows) not just 10 (ES default). If so,
>>>>> one either has to use pagination or scrolling.
>>>>> What about queries `select count(*), foo from elastic group by foo`
>>>>> (2) (term aggregation). In this case only pagination works. If
>> calcite
>>>>> returns all rows for (1) it should do the same for (2).  Should one
>>>>> use pagination with aggregates (2) and scrolling for simple queries
>>>>> (1) ?
>>>>> What if user specifies a high `LIMIT` (eg 10M) should we batch the
>>>> result ?
>>>>> Should user explicitly enable scrolling (eg. JDBC api) ? If so,
>> should
>>>>> one throw an exception when scrolling and aggregation is used or
>>>>> quietly default to pagination ?
>>>>> 
>>>>> Thanks everybody for your input, it helps.
>>>>> 
>>>>> Regards,
>>>>> Andrei.
>>>>> 
>>>>> 
>>>>> On Thu, Oct 25, 2018 at 2:08 PM
>>>>> Kevin Risden
>>>>> <kr...@apache.org> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> There is one more “issue”. Currently select * from elastic
>> returns
>>> at
>>>>> most
>>>>>>> 10 rows (in calcite). This is consistent with elastic behaviour
>>> which
>>>>>>> limits result set to 10 documents (unless size is specified).
>>>>>> 
>>>>>> 
>>>>>> In Solr land for the Calcite integration it uses the /export
>> handler
>>> or
>>>>>> streaming expressions when there is no limit specified (these are
>>> just
>>>>> like
>>>>>> scrolling or called CursorMark in Solr). It falls back to /select
>>> with
>>>>> the
>>>>>> default limit of 10 rows.
>>>>>> 
>>>>>> I think that scroll should be the default almost always since it is
>>> the
>>>>>> only way to return all the results. Even with a limit = 100000
>> would
>>>>> cause
>>>>>> problems with the default query handler. Paging deep into the
>> result
>>>> set
>>>>> is
>>>>>> not something that Lucene (Elasticsearch or Solr) does well.
>>>>>> 
>>>>>> 
>>>>>> Kevin Risden
>>>>>> 
>>>>>> 
>>>>>> On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>> 
>>>>>>> Do you need to generate a different plan (i.e. a different tree
>> of
>>>>>>> RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
>>>>>>> inconvenient that you don’t know until execute time whether they
>>> want
>>>>>>> scrolling. A possible solution would be to generate TWO plans -
>> one
>>>>>>> scrolling, one non-scrolling inside the prepared statement - and
>>> pick
>>>>> which
>>>>>>> one based on the runtime context.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>>> On Oct 25, 2018, at 1:42 AM, Christian Beikov <
>>>>>>> christian.beikov@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hey Andrei,
>>>>>>>> 
>>>>>>>> I don't have an answer for how you can access these settings
>> from
>>>>> within
>>>>>>> the adapter nor how one could do that via RelNodes but the
>>> suggestion
>>>>> to
>>>>>>> use DataContext for that purpose sounds reasonable. Maybe someone
>>>> else
>>>>> has
>>>>>>> an idea?
>>>>>>>> 
>>>>>>>> Anyway, since these are settings that don't affect the general
>>>>> semantics
>>>>>>> of the query/statement and also usually require a special API to
>> be
>>>>> used,
>>>>>>> I'd rather see these aspects not end up in the query string.
>>>>>>>> 
>>>>>>>> Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
>>>>>>>>> Christian,
>>>>>>>>> 
>>>>>>>>> I like TYPE_SCROLL_INSENSITIVE / fetchSize in
>> PreparedStatement
>>>>>>>>> generally but have some reservations (questions) :
>>>>>>>>> 
>>>>>>>>> How to pass resultSetType / fetchSize from PreparedStatement
>> to
>>>>>>> RelNodes ?
>>>>>>>>> What if user doesn’t use JDBC (eg. RelBuilders) ?
>>>>>>>>> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
>>>>>>>>> <ch...@gmail.com> wrote:
>>>>>>>>>> In JDBC one can configure a fetch size which would reflect
>> the
>>>>> amount
>>>>>>> of
>>>>>>>>>> rows to be fetched initially, but also subsequently.
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
>>>>>>>>>> 
>>>>>>>>>> According to what you are writing, ES behvior is what
>>>>>>>>>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view
>>>> that
>>>>>>> isn't
>>>>>>>>>> affected by changes.
>>>>>>>>>> 
>>>>>>>>>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2,
>>> R3,
>>>>> R4,
>>>>>>>>>> ... and view R1, R2, then R3 is deleted and you fetch the
>> next
>>>>> rows,
>>>>>>> you
>>>>>>>>>> wouldn't see R3.
>>>>>>>>>> 
>>>>>>>>>> According to the JDBC spec
>>>>>>>>>> (
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
>>>>>>>>>> ) you don't have to support all modes. Usually, user code
>>> doesn't
>>>>> use
>>>>>>>>>> scrolling that much, but at least forward scrolling makes
>>> sense.
>>>>>>>>>> 
>>>>>>>>>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
>>>>>>>>>>> Hi Julian,
>>>>>>>>>>> 
>>>>>>>>>>> Scrolling (in elastic) does not only mean “open a cursor”
>> but
>>>> also
>>>>>>> iterate
>>>>>>>>>>> over consistent snapshot. From docs:
>>>>>>>>>>> 
>>>>>>>>>>> The results that are returned from a scroll request reflect
>>> the
>>>>> state
>>>>>>> of
>>>>>>>>>>> the index at the time that the initial search request was
>>> made,
>>>>> like a
>>>>>>>>>>> snapshot in time. Subsequent changes to documents (index,
>>> update
>>>>> or
>>>>>>> delete)
>>>>>>>>>>> will only affect later search requests.
>>>>>>>>>>> 
>>>>>>>>>>> So pagination (fetch / offset) can’t exactly replicate this
>>>>>>> functionality.
>>>>>>>>>>> 
>>>>>>>>>>> The problem with scrolling (in elastic) is that it is
>>> expensive
>>>>> and
>>>>>>> can’t
>>>>>>>>>>> (shouldn’t) be enabled it by default.
>>>>>>>>>>> 
>>>>>>>>>>> There is one more “issue”. Currently select * from elastic
>>>>> returns at
>>>>>>> most
>>>>>>>>>>> 10 rows (in calcite). This is consistent with elastic
>>> behaviour
>>>>> which
>>>>>>>>>>> limits result set to 10 documents (unless size is
>> specified).
>>>> When
>>>>>>>>>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
>>>>>>>>>>> <
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
>>>>>>>> 
>>>>>>>>>>> or SQL hint) does it mean return whole elastic index ? I’m
>> not
>>>> at
>>>>>>> ease with
>>>>>>>>>>> returning different results based on hints or cursor
>> settings.
>>>>>>>>>>> 
>>>>>>>>>>> Andrei.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <
>>>>> jhyde.apache@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> It seems to me that Elasticsearch scroll means return a
>>> cursor
>>>> -
>>>>> a
>>>>>>>>>>>> collection of rows that you iterate over, and you may not
>>> read
>>>>> all
>>>>>>> of them.
>>>>>>>>>>>> This is the default operation of JDBC.
>>>>>>>>>>>> 
>>>>>>>>>>>> So, I guess we need to give the user a way to signal their
>>>>> intent to
>>>>>>> read
>>>>>>>>>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and
>>>>> ALL_ROWS
>>>>>>>>>>>> hints[1] seem close to this. We would want the hints to be
>>>> acted
>>>>>>> upon by
>>>>>>>>>>>> both the optimizer and the JDBC transport.
>>>>>>>>>>>> 
>>>>>>>>>>>> Related is pagination. SQL has FETCH and OFFSET, which
>> allow
>>>> you
>>>>> to
>>>>>>>>>>>> retrieve different pieces of a large result set in separate
>>>>>>> statements or
>>>>>>>>>>>> (using query parameters) executions. It would be useful if
>>> the
>>>>>>> server could
>>>>>>>>>>>> be given a hint to cache a statement across page requests.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
>>>>>>>>>>>> christian.beikov@gmail.com> wrote:
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> not sure if this should be an SQL keyword. JDBC specifies
>>>>> various
>>>>>>>>>>>> constants that can be used at statement creation time:
>>>>>>>>>>>> 
>>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
>>>>>>>>>>>>> Not sure though if or how these configurations are
>>> accessible
>>>>> for
>>>>>>> data
>>>>>>>>>>>> stores or dialects, but IMO using these would be the proper
>>>> way.
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Christian
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I was thinking about adding [scrolling functionality](
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>>>>>>>>>>>> )
>>>>>>>>>>>>>> to elastic search adapter. Since scrolling has
>>> non-negligible
>>>>>>> effect on
>>>>>>>>>>>> the
>>>>>>>>>>>>>> cluster it should be selectively enabled on per query
>>> basis.
>>>>> So,
>>>>>>> likely,
>>>>>>>>>>>>>> user has to explicitly set "scroll flag" somewhere.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
>>>>>>>>>>>>>> https://calcite.apache.org/docs/reference.html) has
>>> `SCROLL`
>>>>>>> keyword
>>>>>>>>>>>>>> (unused to my knowledge). There were also discussions
>> about
>>>>> adding
>>>>>>>>>>>> hints to
>>>>>>>>>>>>>> Calcite.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ### Examples
>>>>>>>>>>>>>> ```sql
>>>>>>>>>>>>>> -- special sql keyword ?
>>>>>>>>>>>>>> SCROLL select * from elastic;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -- assuming hints are available in calcite
>>>>>>>>>>>>>> /* HINT: scroll */ select * from elastic;
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What people think about this use-case ? Are there better
>>>> ideas
>>>>> ?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Andrei.
>>>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Thanks, all, for your input.

Following this discussion, I have prepared a PR
https://github.com/apache/calcite/pull/919

Let me know if you agree with the approach. If there are no objections /
comments, I will commit it in a couple of days.

Propagation of Statement.setFetchSize(int) into actual query will be done
in separate commit (I’m still checking DataContext usages)

On Fri, Oct 26, 2018 at 10:45 AM Kevin Risden <kr...@apache.org> wrote:

> #1 sounds reasonable.
> #2 will have to see what that means for actual aggregations. Sounds like it
> could work.
> #3 not sure what is meant here. I think fetch size is a hint and not
> required. I wouldn't default to 10k though.
>
> Kevin Risden
>
>
> On Thu, Oct 25, 2018 at 4:25 PM Andrei Sereda <an...@sereda.cc> wrote:
>
> > There is new Composite Aggregation
> > <
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> > >
> > (still in beta) which allows pagination with aggregates. It is available
> on
> > versions >= 5.6 (unfortunately we have to support 2.6+).
> >
> > To sum up:
> >
> >    1. For non-aggregate queries always use scrolling or search-after by
> >    default (not pagination) .
> >    2. For aggregates use pagination. Either Terms Aggregation (with
> >    partitioning) or Composite Aggregation.
> >    3. If possible Retrieve batch size from JDBC api (known as
> >    Statement.setFetchSize(int)). If not default to 10k ?
> >
> > Does it sound reasonable ?
> >
> >
> > On Thu, Oct 25, 2018 at 3:43 PM
> > Kevin Risden
> > <kr...@apache.org> wrote:
> >
> > > What I was saying is that scrolling is the only way to ensure you get
> > > correct results coming back from Elasticsearch if you are going to do
> > more
> > > processing.
> > >
> > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > >
> > > As long as you delete the scroll at the end its not the end of the
> world.
> > > There is no deep paging unless you use scrolling or a search after
> which
> > is
> > > basically the same idea as scrolling
> > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
> > > .
> > >
> > > I would say that scrolling should be the default behavior and then
> should
> > > look at optimizing for aggregations, smaller result sets, etc. If you
> > look
> > > at scrolling as an optimization the default results won't be correct
> (ie:
> > > only returning 10 results when there are a lot more). For default
> > > pagination there is a limit of 10000 results anyway unless you use
> > > scrolling or search after.
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <an...@sereda.cc>
> wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > You suggest to use scrolling for all elastic queries ? Even when
> there
> > > > are predicates ?  Some questions :
> > > >
> > > > 1) Scrolling has a runtime overhead for elastic cluster. Having it
> > > > enabled by default (against vendor recommendation) is risky. Does it
> > > > (not) cause issues in Solr ?
> > > > 2) Scrolling does not work with aggregates, which means I'll have to
> > > > fallback to pagination (for whole result set).
> > > > 3) When result set is small scrolling / pagination is not necessary.
> > > >
> > > > Having said that I don't see any way to automatically enable /
> disable
> > > > scrolling based on RelTree (pls let me know if there is one).
> > > >
> > > > You seem to agree with Stamatis that `select * from elastic` (1)
> > > > should return all documents (rows) not just 10 (ES default). If so,
> > > > one either has to use pagination or scrolling.
> > > > What about queries `select count(*), foo from elastic group by foo`
> > > > (2) (term aggregation). In this case only pagination works. If
> calcite
> > > > returns all rows for (1) it should do the same for (2).  Should one
> > > > use pagination with aggregates (2) and scrolling for simple queries
> > > > (1) ?
> > > > What if user specifies a high `LIMIT` (eg 10M) should we batch the
> > > result ?
> > > > Should user explicitly enable scrolling (eg. JDBC api) ? If so,
> should
> > > > one throw an exception when scrolling and aggregation is used or
> > > > quietly default to pagination ?
> > > >
> > > > Thanks everybody for your input, it helps.
> > > >
> > > > Regards,
> > > > Andrei.
> > > >
> > > >
> > > > On Thu, Oct 25, 2018 at 2:08 PM
> > > > Kevin Risden
> > > > <kr...@apache.org> wrote:
> > > > >
> > > > > >
> > > > > > There is one more “issue”. Currently select * from elastic
> returns
> > at
> > > > most
> > > > > > 10 rows (in calcite). This is consistent with elastic behaviour
> > which
> > > > > > limits result set to 10 documents (unless size is specified).
> > > > >
> > > > >
> > > > > In Solr land for the Calcite integration it uses the /export
> handler
> > or
> > > > > streaming expressions when there is no limit specified (these are
> > just
> > > > like
> > > > > scrolling or called CursorMark in Solr). It falls back to /select
> > with
> > > > the
> > > > > default limit of 10 rows.
> > > > >
> > > > > I think that scroll should be the default almost always since it is
> > the
> > > > > only way to return all the results. Even with a limit = 100000
> would
> > > > cause
> > > > > problems with the default query handler. Paging deep into the
> result
> > > set
> > > > is
> > > > > not something that Lucene (Elasticsearch or Solr) does well.
> > > > >
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > >
> > > > > On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org>
> > wrote:
> > > > >
> > > > > > Do you need to generate a different plan (i.e. a different tree
> of
> > > > > > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> > > > > > inconvenient that you don’t know until execute time whether they
> > want
> > > > > > scrolling. A possible solution would be to generate TWO plans -
> one
> > > > > > scrolling, one non-scrolling inside the prepared statement - and
> > pick
> > > > which
> > > > > > one based on the runtime context.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> > > > > > christian.beikov@gmail.com> wrote:
> > > > > > >
> > > > > > > Hey Andrei,
> > > > > > >
> > > > > > > I don't have an answer for how you can access these settings
> from
> > > > within
> > > > > > the adapter nor how one could do that via RelNodes but the
> > suggestion
> > > > to
> > > > > > use DataContext for that purpose sounds reasonable. Maybe someone
> > > else
> > > > has
> > > > > > an idea?
> > > > > > >
> > > > > > > Anyway, since these are settings that don't affect the general
> > > > semantics
> > > > > > of the query/statement and also usually require a special API to
> be
> > > > used,
> > > > > > I'd rather see these aspects not end up in the query string.
> > > > > > >
> > > > > > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> > > > > > >> Christian,
> > > > > > >>
> > > > > > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in
> PreparedStatement
> > > > > > >> generally but have some reservations (questions) :
> > > > > > >>
> > > > > > >> How to pass resultSetType / fetchSize from PreparedStatement
> to
> > > > > > RelNodes ?
> > > > > > >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> > > > > > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> > > > > > >> <ch...@gmail.com> wrote:
> > > > > > >>> In JDBC one can configure a fetch size which would reflect
> the
> > > > amount
> > > > > > of
> > > > > > >>> rows to be fetched initially, but also subsequently.
> > > > > > >>>
> > > > > >
> > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> > > > > > >>>
> > > > > > >>> According to what you are writing, ES behvior is what
> > > > > > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view
> > > that
> > > > > > isn't
> > > > > > >>> affected by changes.
> > > > > > >>>
> > > > > > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2,
> > R3,
> > > > R4,
> > > > > > >>> ... and view R1, R2, then R3 is deleted and you fetch the
> next
> > > > rows,
> > > > > > you
> > > > > > >>> wouldn't see R3.
> > > > > > >>>
> > > > > > >>> According to the JDBC spec
> > > > > > >>> (
> > > > > >
> > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > > > > > >>> ) you don't have to support all modes. Usually, user code
> > doesn't
> > > > use
> > > > > > >>> scrolling that much, but at least forward scrolling makes
> > sense.
> > > > > > >>>
> > > > > > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > > > > >>>> Hi Julian,
> > > > > > >>>>
> > > > > > >>>> Scrolling (in elastic) does not only mean “open a cursor”
> but
> > > also
> > > > > > iterate
> > > > > > >>>> over consistent snapshot. From docs:
> > > > > > >>>>
> > > > > > >>>> The results that are returned from a scroll request reflect
> > the
> > > > state
> > > > > > of
> > > > > > >>>> the index at the time that the initial search request was
> > made,
> > > > like a
> > > > > > >>>> snapshot in time. Subsequent changes to documents (index,
> > update
> > > > or
> > > > > > delete)
> > > > > > >>>> will only affect later search requests.
> > > > > > >>>>
> > > > > > >>>> So pagination (fetch / offset) can’t exactly replicate this
> > > > > > functionality.
> > > > > > >>>>
> > > > > > >>>> The problem with scrolling (in elastic) is that it is
> > expensive
> > > > and
> > > > > > can’t
> > > > > > >>>> (shouldn’t) be enabled it by default.
> > > > > > >>>>
> > > > > > >>>> There is one more “issue”. Currently select * from elastic
> > > > returns at
> > > > > > most
> > > > > > >>>> 10 rows (in calcite). This is consistent with elastic
> > behaviour
> > > > which
> > > > > > >>>> limits result set to 10 documents (unless size is
> specified).
> > > When
> > > > > > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > > > > >>>> <
> > > > > >
> > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > > > > > >
> > > > > > >>>> or SQL hint) does it mean return whole elastic index ? I’m
> not
> > > at
> > > > > > ease with
> > > > > > >>>> returning different results based on hints or cursor
> settings.
> > > > > > >>>>
> > > > > > >>>> Andrei.
> > > > > > >>>>
> > > > > > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <
> > > > jhyde.apache@gmail.com>
> > > > > > wrote:
> > > > > > >>>>
> > > > > > >>>>> It seems to me that Elasticsearch scroll means return a
> > cursor
> > > -
> > > > a
> > > > > > >>>>> collection of rows that you iterate over, and you may not
> > read
> > > > all
> > > > > > of them.
> > > > > > >>>>> This is the default operation of JDBC.
> > > > > > >>>>>
> > > > > > >>>>> So, I guess we need to give the user a way to signal their
> > > > intent to
> > > > > > read
> > > > > > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and
> > > > ALL_ROWS
> > > > > > >>>>> hints[1] seem close to this. We would want the hints to be
> > > acted
> > > > > > upon by
> > > > > > >>>>> both the optimizer and the JDBC transport.
> > > > > > >>>>>
> > > > > > >>>>> Related is pagination. SQL has FETCH and OFFSET, which
> allow
> > > you
> > > > to
> > > > > > >>>>> retrieve different pieces of a large result set in separate
> > > > > > statements or
> > > > > > >>>>> (using query parameters) executions. It would be useful if
> > the
> > > > > > server could
> > > > > > >>>>> be given a hint to cache a statement across page requests.
> > > > > > >>>>>
> > > > > > >>>>> Julian
> > > > > > >>>>>
> > > > > > >>>>> [1]
> > > > > > >>>>>
> > > > > >
> > > >
> > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > > > > > >>>>>
> > > > > > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > > > > >>>>> christian.beikov@gmail.com> wrote:
> > > > > > >>>>>> Hey,
> > > > > > >>>>>>
> > > > > > >>>>>> not sure if this should be an SQL keyword. JDBC specifies
> > > > various
> > > > > > >>>>> constants that can be used at statement creation time:
> > > > > > >>>>>
> > > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > > > > >>>>>> Not sure though if or how these configurations are
> > accessible
> > > > for
> > > > > > data
> > > > > > >>>>> stores or dialects, but IMO using these would be the proper
> > > way.
> > > > > > >>>>>> Regards
> > > > > > >>>>>>
> > > > > > >>>>>> Christian
> > > > > > >>>>>>
> > > > > > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > > > > >>>>>>> Hello,
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> I was thinking about adding [scrolling functionality](
> > > > > > >>>>>>>
> > > > > > >>>>>
> > > > > >
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > > > > >>>>> )
> > > > > > >>>>>>> to elastic search adapter. Since scrolling has
> > non-negligible
> > > > > > effect on
> > > > > > >>>>> the
> > > > > > >>>>>>> cluster it should be selectively enabled on per query
> > basis.
> > > > So,
> > > > > > likely,
> > > > > > >>>>>>> user has to explicitly set "scroll flag" somewhere.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> > > > > > >>>>>>> https://calcite.apache.org/docs/reference.html) has
> > `SCROLL`
> > > > > > keyword
> > > > > > >>>>>>> (unused to my knowledge). There were also discussions
> about
> > > > adding
> > > > > > >>>>> hints to
> > > > > > >>>>>>> Calcite.
> > > > > > >>>>>>>
> > > > > > >>>>>>> ### Examples
> > > > > > >>>>>>> ```sql
> > > > > > >>>>>>> -- special sql keyword ?
> > > > > > >>>>>>> SCROLL select * from elastic;
> > > > > > >>>>>>>
> > > > > > >>>>>>> -- assuming hints are available in calcite
> > > > > > >>>>>>> /* HINT: scroll */ select * from elastic;
> > > > > > >>>>>>> ```
> > > > > > >>>>>>>
> > > > > > >>>>>>> What people think about this use-case ? Are there better
> > > ideas
> > > > ?
> > > > > > >>>>>>>
> > > > > > >>>>>>> Regards,
> > > > > > >>>>>>> Andrei.
> > > > > > >>>>>>>
> > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: scrolling in ElasticSearch adapter

Posted by Kevin Risden <kr...@apache.org>.
#1 sounds reasonable.
#2 will have to see what that means for actual aggregations. Sounds like it
could work.
#3 not sure what is meant here. I think fetch size is a hint and not
required. I wouldn't default to 10k though.

Kevin Risden


On Thu, Oct 25, 2018 at 4:25 PM Andrei Sereda <an...@sereda.cc> wrote:

> There is new Composite Aggregation
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> >
> (still in beta) which allows pagination with aggregates. It is available on
> versions >= 5.6 (unfortunately we have to support 2.6+).
>
> To sum up:
>
>    1. For non-aggregate queries always use scrolling or search-after by
>    default (not pagination) .
>    2. For aggregates use pagination. Either Terms Aggregation (with
>    partitioning) or Composite Aggregation.
>    3. If possible Retrieve batch size from JDBC api (known as
>    Statement.setFetchSize(int)). If not default to 10k ?
>
> Does it sound reasonable ?
>
>
> On Thu, Oct 25, 2018 at 3:43 PM
> Kevin Risden
> <kr...@apache.org> wrote:
>
> > What I was saying is that scrolling is the only way to ensure you get
> > correct results coming back from Elasticsearch if you are going to do
> more
> > processing.
> >
> >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> >
> > As long as you delete the scroll at the end its not the end of the world.
> > There is no deep paging unless you use scrolling or a search after which
> is
> > basically the same idea as scrolling
> >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
> > .
> >
> > I would say that scrolling should be the default behavior and then should
> > look at optimizing for aggregations, smaller result sets, etc. If you
> look
> > at scrolling as an optimization the default results won't be correct (ie:
> > only returning 10 results when there are a lot more). For default
> > pagination there is a limit of 10000 results anyway unless you use
> > scrolling or search after.
> >
> > Kevin Risden
> >
> >
> > On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <an...@sereda.cc> wrote:
> >
> > > Hi Kevin,
> > >
> > > You suggest to use scrolling for all elastic queries ? Even when there
> > > are predicates ?  Some questions :
> > >
> > > 1) Scrolling has a runtime overhead for elastic cluster. Having it
> > > enabled by default (against vendor recommendation) is risky. Does it
> > > (not) cause issues in Solr ?
> > > 2) Scrolling does not work with aggregates, which means I'll have to
> > > fallback to pagination (for whole result set).
> > > 3) When result set is small scrolling / pagination is not necessary.
> > >
> > > Having said that I don't see any way to automatically enable / disable
> > > scrolling based on RelTree (pls let me know if there is one).
> > >
> > > You seem to agree with Stamatis that `select * from elastic` (1)
> > > should return all documents (rows) not just 10 (ES default). If so,
> > > one either has to use pagination or scrolling.
> > > What about queries `select count(*), foo from elastic group by foo`
> > > (2) (term aggregation). In this case only pagination works. If calcite
> > > returns all rows for (1) it should do the same for (2).  Should one
> > > use pagination with aggregates (2) and scrolling for simple queries
> > > (1) ?
> > > What if user specifies a high `LIMIT` (eg 10M) should we batch the
> > result ?
> > > Should user explicitly enable scrolling (eg. JDBC api) ? If so, should
> > > one throw an exception when scrolling and aggregation is used or
> > > quietly default to pagination ?
> > >
> > > Thanks everybody for your input, it helps.
> > >
> > > Regards,
> > > Andrei.
> > >
> > >
> > > On Thu, Oct 25, 2018 at 2:08 PM
> > > Kevin Risden
> > > <kr...@apache.org> wrote:
> > > >
> > > > >
> > > > > There is one more “issue”. Currently select * from elastic returns
> at
> > > most
> > > > > 10 rows (in calcite). This is consistent with elastic behaviour
> which
> > > > > limits result set to 10 documents (unless size is specified).
> > > >
> > > >
> > > > In Solr land for the Calcite integration it uses the /export handler
> or
> > > > streaming expressions when there is no limit specified (these are
> just
> > > like
> > > > scrolling or called CursorMark in Solr). It falls back to /select
> with
> > > the
> > > > default limit of 10 rows.
> > > >
> > > > I think that scroll should be the default almost always since it is
> the
> > > > only way to return all the results. Even with a limit = 100000 would
> > > cause
> > > > problems with the default query handler. Paging deep into the result
> > set
> > > is
> > > > not something that Lucene (Elasticsearch or Solr) does well.
> > > >
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org>
> wrote:
> > > >
> > > > > Do you need to generate a different plan (i.e. a different tree of
> > > > > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> > > > > inconvenient that you don’t know until execute time whether they
> want
> > > > > scrolling. A possible solution would be to generate TWO plans - one
> > > > > scrolling, one non-scrolling inside the prepared statement - and
> pick
> > > which
> > > > > one based on the runtime context.
> > > > >
> > > > > Julian
> > > > >
> > > > > > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> > > > > christian.beikov@gmail.com> wrote:
> > > > > >
> > > > > > Hey Andrei,
> > > > > >
> > > > > > I don't have an answer for how you can access these settings from
> > > within
> > > > > the adapter nor how one could do that via RelNodes but the
> suggestion
> > > to
> > > > > use DataContext for that purpose sounds reasonable. Maybe someone
> > else
> > > has
> > > > > an idea?
> > > > > >
> > > > > > Anyway, since these are settings that don't affect the general
> > > semantics
> > > > > of the query/statement and also usually require a special API to be
> > > used,
> > > > > I'd rather see these aspects not end up in the query string.
> > > > > >
> > > > > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> > > > > >> Christian,
> > > > > >>
> > > > > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> > > > > >> generally but have some reservations (questions) :
> > > > > >>
> > > > > >> How to pass resultSetType / fetchSize from PreparedStatement to
> > > > > RelNodes ?
> > > > > >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> > > > > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> > > > > >> <ch...@gmail.com> wrote:
> > > > > >>> In JDBC one can configure a fetch size which would reflect the
> > > amount
> > > > > of
> > > > > >>> rows to be fetched initially, but also subsequently.
> > > > > >>>
> > > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> > > > > >>>
> > > > > >>> According to what you are writing, ES behvior is what
> > > > > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view
> > that
> > > > > isn't
> > > > > >>> affected by changes.
> > > > > >>>
> > > > > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2,
> R3,
> > > R4,
> > > > > >>> ... and view R1, R2, then R3 is deleted and you fetch the next
> > > rows,
> > > > > you
> > > > > >>> wouldn't see R3.
> > > > > >>>
> > > > > >>> According to the JDBC spec
> > > > > >>> (
> > > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > > > > >>> ) you don't have to support all modes. Usually, user code
> doesn't
> > > use
> > > > > >>> scrolling that much, but at least forward scrolling makes
> sense.
> > > > > >>>
> > > > > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > > > >>>> Hi Julian,
> > > > > >>>>
> > > > > >>>> Scrolling (in elastic) does not only mean “open a cursor” but
> > also
> > > > > iterate
> > > > > >>>> over consistent snapshot. From docs:
> > > > > >>>>
> > > > > >>>> The results that are returned from a scroll request reflect
> the
> > > state
> > > > > of
> > > > > >>>> the index at the time that the initial search request was
> made,
> > > like a
> > > > > >>>> snapshot in time. Subsequent changes to documents (index,
> update
> > > or
> > > > > delete)
> > > > > >>>> will only affect later search requests.
> > > > > >>>>
> > > > > >>>> So pagination (fetch / offset) can’t exactly replicate this
> > > > > functionality.
> > > > > >>>>
> > > > > >>>> The problem with scrolling (in elastic) is that it is
> expensive
> > > and
> > > > > can’t
> > > > > >>>> (shouldn’t) be enabled it by default.
> > > > > >>>>
> > > > > >>>> There is one more “issue”. Currently select * from elastic
> > > returns at
> > > > > most
> > > > > >>>> 10 rows (in calcite). This is consistent with elastic
> behaviour
> > > which
> > > > > >>>> limits result set to 10 documents (unless size is specified).
> > When
> > > > > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > > > >>>> <
> > > > >
> > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > > > > >
> > > > > >>>> or SQL hint) does it mean return whole elastic index ? I’m not
> > at
> > > > > ease with
> > > > > >>>> returning different results based on hints or cursor settings.
> > > > > >>>>
> > > > > >>>> Andrei.
> > > > > >>>>
> > > > > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <
> > > jhyde.apache@gmail.com>
> > > > > wrote:
> > > > > >>>>
> > > > > >>>>> It seems to me that Elasticsearch scroll means return a
> cursor
> > -
> > > a
> > > > > >>>>> collection of rows that you iterate over, and you may not
> read
> > > all
> > > > > of them.
> > > > > >>>>> This is the default operation of JDBC.
> > > > > >>>>>
> > > > > >>>>> So, I guess we need to give the user a way to signal their
> > > intent to
> > > > > read
> > > > > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and
> > > ALL_ROWS
> > > > > >>>>> hints[1] seem close to this. We would want the hints to be
> > acted
> > > > > upon by
> > > > > >>>>> both the optimizer and the JDBC transport.
> > > > > >>>>>
> > > > > >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow
> > you
> > > to
> > > > > >>>>> retrieve different pieces of a large result set in separate
> > > > > statements or
> > > > > >>>>> (using query parameters) executions. It would be useful if
> the
> > > > > server could
> > > > > >>>>> be given a hint to cache a statement across page requests.
> > > > > >>>>>
> > > > > >>>>> Julian
> > > > > >>>>>
> > > > > >>>>> [1]
> > > > > >>>>>
> > > > >
> > >
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > > > > >>>>>
> > > > > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > > > >>>>> christian.beikov@gmail.com> wrote:
> > > > > >>>>>> Hey,
> > > > > >>>>>>
> > > > > >>>>>> not sure if this should be an SQL keyword. JDBC specifies
> > > various
> > > > > >>>>> constants that can be used at statement creation time:
> > > > > >>>>>
> > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > > > >>>>>> Not sure though if or how these configurations are
> accessible
> > > for
> > > > > data
> > > > > >>>>> stores or dialects, but IMO using these would be the proper
> > way.
> > > > > >>>>>> Regards
> > > > > >>>>>>
> > > > > >>>>>> Christian
> > > > > >>>>>>
> > > > > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > > > >>>>>>> Hello,
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> I was thinking about adding [scrolling functionality](
> > > > > >>>>>>>
> > > > > >>>>>
> > > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > > > >>>>> )
> > > > > >>>>>>> to elastic search adapter. Since scrolling has
> non-negligible
> > > > > effect on
> > > > > >>>>> the
> > > > > >>>>>>> cluster it should be selectively enabled on per query
> basis.
> > > So,
> > > > > likely,
> > > > > >>>>>>> user has to explicitly set "scroll flag" somewhere.
> > > > > >>>>>>>
> > > > > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> > > > > >>>>>>> https://calcite.apache.org/docs/reference.html) has
> `SCROLL`
> > > > > keyword
> > > > > >>>>>>> (unused to my knowledge). There were also discussions about
> > > adding
> > > > > >>>>> hints to
> > > > > >>>>>>> Calcite.
> > > > > >>>>>>>
> > > > > >>>>>>> ### Examples
> > > > > >>>>>>> ```sql
> > > > > >>>>>>> -- special sql keyword ?
> > > > > >>>>>>> SCROLL select * from elastic;
> > > > > >>>>>>>
> > > > > >>>>>>> -- assuming hints are available in calcite
> > > > > >>>>>>> /* HINT: scroll */ select * from elastic;
> > > > > >>>>>>> ```
> > > > > >>>>>>>
> > > > > >>>>>>> What people think about this use-case ? Are there better
> > ideas
> > > ?
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> Andrei.
> > > > > >>>>>>>
> > > > >
> > > > >
> > >
> >
>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
There is new Composite Aggregation
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html>
(still in beta) which allows pagination with aggregates. It is available on
versions >= 5.6 (unfortunately we have to support 2.6+).

To sum up:

   1. For non-aggregate queries always use scrolling or search-after by
   default (not pagination) .
   2. For aggregates use pagination. Either Terms Aggregation (with
   partitioning) or Composite Aggregation.
   3. If possible Retrieve batch size from JDBC api (known as
   Statement.setFetchSize(int)). If not default to 10k ?

Does it sound reasonable ?


On Thu, Oct 25, 2018 at 3:43 PM Kevin Risden <kr...@apache.org> wrote:

> What I was saying is that scrolling is the only way to ensure you get
> correct results coming back from Elasticsearch if you are going to do more
> processing.
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>
> As long as you delete the scroll at the end its not the end of the world.
> There is no deep paging unless you use scrolling or a search after which is
> basically the same idea as scrolling
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
> .
>
> I would say that scrolling should be the default behavior and then should
> look at optimizing for aggregations, smaller result sets, etc. If you look
> at scrolling as an optimization the default results won't be correct (ie:
> only returning 10 results when there are a lot more). For default
> pagination there is a limit of 10000 results anyway unless you use
> scrolling or search after.
>
> Kevin Risden
>
>
> On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <an...@sereda.cc> wrote:
>
> > Hi Kevin,
> >
> > You suggest to use scrolling for all elastic queries ? Even when there
> > are predicates ?  Some questions :
> >
> > 1) Scrolling has a runtime overhead for elastic cluster. Having it
> > enabled by default (against vendor recommendation) is risky. Does it
> > (not) cause issues in Solr ?
> > 2) Scrolling does not work with aggregates, which means I'll have to
> > fallback to pagination (for whole result set).
> > 3) When result set is small scrolling / pagination is not necessary.
> >
> > Having said that I don't see any way to automatically enable / disable
> > scrolling based on RelTree (pls let me know if there is one).
> >
> > You seem to agree with Stamatis that `select * from elastic` (1)
> > should return all documents (rows) not just 10 (ES default). If so,
> > one either has to use pagination or scrolling.
> > What about queries `select count(*), foo from elastic group by foo`
> > (2) (term aggregation). In this case only pagination works. If calcite
> > returns all rows for (1) it should do the same for (2).  Should one
> > use pagination with aggregates (2) and scrolling for simple queries
> > (1) ?
> > What if user specifies a high `LIMIT` (eg 10M) should we batch the
> result ?
> > Should user explicitly enable scrolling (eg. JDBC api) ? If so, should
> > one throw an exception when scrolling and aggregation is used or
> > quietly default to pagination ?
> >
> > Thanks everybody for your input, it helps.
> >
> > Regards,
> > Andrei.
> >
> >
> > On Thu, Oct 25, 2018 at 2:08 PM
> > Kevin Risden
> > <kr...@apache.org> wrote:
> > >
> > > >
> > > > There is one more “issue”. Currently select * from elastic returns at
> > most
> > > > 10 rows (in calcite). This is consistent with elastic behaviour which
> > > > limits result set to 10 documents (unless size is specified).
> > >
> > >
> > > In Solr land for the Calcite integration it uses the /export handler or
> > > streaming expressions when there is no limit specified (these are just
> > like
> > > scrolling or called CursorMark in Solr). It falls back to /select with
> > the
> > > default limit of 10 rows.
> > >
> > > I think that scroll should be the default almost always since it is the
> > > only way to return all the results. Even with a limit = 100000 would
> > cause
> > > problems with the default query handler. Paging deep into the result
> set
> > is
> > > not something that Lucene (Elasticsearch or Solr) does well.
> > >
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> wrote:
> > >
> > > > Do you need to generate a different plan (i.e. a different tree of
> > > > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> > > > inconvenient that you don’t know until execute time whether they want
> > > > scrolling. A possible solution would be to generate TWO plans - one
> > > > scrolling, one non-scrolling inside the prepared statement - and pick
> > which
> > > > one based on the runtime context.
> > > >
> > > > Julian
> > > >
> > > > > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> > > > christian.beikov@gmail.com> wrote:
> > > > >
> > > > > Hey Andrei,
> > > > >
> > > > > I don't have an answer for how you can access these settings from
> > within
> > > > the adapter nor how one could do that via RelNodes but the suggestion
> > to
> > > > use DataContext for that purpose sounds reasonable. Maybe someone
> else
> > has
> > > > an idea?
> > > > >
> > > > > Anyway, since these are settings that don't affect the general
> > semantics
> > > > of the query/statement and also usually require a special API to be
> > used,
> > > > I'd rather see these aspects not end up in the query string.
> > > > >
> > > > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> > > > >> Christian,
> > > > >>
> > > > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> > > > >> generally but have some reservations (questions) :
> > > > >>
> > > > >> How to pass resultSetType / fetchSize from PreparedStatement to
> > > > RelNodes ?
> > > > >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> > > > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> > > > >> <ch...@gmail.com> wrote:
> > > > >>> In JDBC one can configure a fetch size which would reflect the
> > amount
> > > > of
> > > > >>> rows to be fetched initially, but also subsequently.
> > > > >>>
> > > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> > > > >>>
> > > > >>> According to what you are writing, ES behvior is what
> > > > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view
> that
> > > > isn't
> > > > >>> affected by changes.
> > > > >>>
> > > > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3,
> > R4,
> > > > >>> ... and view R1, R2, then R3 is deleted and you fetch the next
> > rows,
> > > > you
> > > > >>> wouldn't see R3.
> > > > >>>
> > > > >>> According to the JDBC spec
> > > > >>> (
> > > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > > > >>> ) you don't have to support all modes. Usually, user code doesn't
> > use
> > > > >>> scrolling that much, but at least forward scrolling makes sense.
> > > > >>>
> > > > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > > >>>> Hi Julian,
> > > > >>>>
> > > > >>>> Scrolling (in elastic) does not only mean “open a cursor” but
> also
> > > > iterate
> > > > >>>> over consistent snapshot. From docs:
> > > > >>>>
> > > > >>>> The results that are returned from a scroll request reflect the
> > state
> > > > of
> > > > >>>> the index at the time that the initial search request was made,
> > like a
> > > > >>>> snapshot in time. Subsequent changes to documents (index, update
> > or
> > > > delete)
> > > > >>>> will only affect later search requests.
> > > > >>>>
> > > > >>>> So pagination (fetch / offset) can’t exactly replicate this
> > > > functionality.
> > > > >>>>
> > > > >>>> The problem with scrolling (in elastic) is that it is expensive
> > and
> > > > can’t
> > > > >>>> (shouldn’t) be enabled it by default.
> > > > >>>>
> > > > >>>> There is one more “issue”. Currently select * from elastic
> > returns at
> > > > most
> > > > >>>> 10 rows (in calcite). This is consistent with elastic behaviour
> > which
> > > > >>>> limits result set to 10 documents (unless size is specified).
> When
> > > > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > > >>>> <
> > > >
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > > > >
> > > > >>>> or SQL hint) does it mean return whole elastic index ? I’m not
> at
> > > > ease with
> > > > >>>> returning different results based on hints or cursor settings.
> > > > >>>>
> > > > >>>> Andrei.
> > > > >>>>
> > > > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <
> > jhyde.apache@gmail.com>
> > > > wrote:
> > > > >>>>
> > > > >>>>> It seems to me that Elasticsearch scroll means return a cursor
> -
> > a
> > > > >>>>> collection of rows that you iterate over, and you may not read
> > all
> > > > of them.
> > > > >>>>> This is the default operation of JDBC.
> > > > >>>>>
> > > > >>>>> So, I guess we need to give the user a way to signal their
> > intent to
> > > > read
> > > > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and
> > ALL_ROWS
> > > > >>>>> hints[1] seem close to this. We would want the hints to be
> acted
> > > > upon by
> > > > >>>>> both the optimizer and the JDBC transport.
> > > > >>>>>
> > > > >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow
> you
> > to
> > > > >>>>> retrieve different pieces of a large result set in separate
> > > > statements or
> > > > >>>>> (using query parameters) executions. It would be useful if the
> > > > server could
> > > > >>>>> be given a hint to cache a statement across page requests.
> > > > >>>>>
> > > > >>>>> Julian
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>>>
> > > >
> > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > > > >>>>>
> > > > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > > >>>>> christian.beikov@gmail.com> wrote:
> > > > >>>>>> Hey,
> > > > >>>>>>
> > > > >>>>>> not sure if this should be an SQL keyword. JDBC specifies
> > various
> > > > >>>>> constants that can be used at statement creation time:
> > > > >>>>>
> > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > > >>>>>> Not sure though if or how these configurations are accessible
> > for
> > > > data
> > > > >>>>> stores or dialects, but IMO using these would be the proper
> way.
> > > > >>>>>> Regards
> > > > >>>>>>
> > > > >>>>>> Christian
> > > > >>>>>>
> > > > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > > >>>>>>> Hello,
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> I was thinking about adding [scrolling functionality](
> > > > >>>>>>>
> > > > >>>>>
> > > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > > >>>>> )
> > > > >>>>>>> to elastic search adapter. Since scrolling has non-negligible
> > > > effect on
> > > > >>>>> the
> > > > >>>>>>> cluster it should be selectively enabled on per query basis.
> > So,
> > > > likely,
> > > > >>>>>>> user has to explicitly set "scroll flag" somewhere.
> > > > >>>>>>>
> > > > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> > > > >>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> > > > keyword
> > > > >>>>>>> (unused to my knowledge). There were also discussions about
> > adding
> > > > >>>>> hints to
> > > > >>>>>>> Calcite.
> > > > >>>>>>>
> > > > >>>>>>> ### Examples
> > > > >>>>>>> ```sql
> > > > >>>>>>> -- special sql keyword ?
> > > > >>>>>>> SCROLL select * from elastic;
> > > > >>>>>>>
> > > > >>>>>>> -- assuming hints are available in calcite
> > > > >>>>>>> /* HINT: scroll */ select * from elastic;
> > > > >>>>>>> ```
> > > > >>>>>>>
> > > > >>>>>>> What people think about this use-case ? Are there better
> ideas
> > ?
> > > > >>>>>>>
> > > > >>>>>>> Regards,
> > > > >>>>>>> Andrei.
> > > > >>>>>>>
> > > >
> > > >
> >
>

Re: scrolling in ElasticSearch adapter

Posted by Kevin Risden <kr...@apache.org>.
What I was saying is that scrolling is the only way to ensure you get
correct results coming back from Elasticsearch if you are going to do more
processing.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

As long as you delete the scroll at the end its not the end of the world.
There is no deep paging unless you use scrolling or a search after which is
basically the same idea as scrolling
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
.

I would say that scrolling should be the default behavior and then should
look at optimizing for aggregations, smaller result sets, etc. If you look
at scrolling as an optimization the default results won't be correct (ie:
only returning 10 results when there are a lot more). For default
pagination there is a limit of 10000 results anyway unless you use
scrolling or search after.

Kevin Risden


On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <an...@sereda.cc> wrote:

> Hi Kevin,
>
> You suggest to use scrolling for all elastic queries ? Even when there
> are predicates ?  Some questions :
>
> 1) Scrolling has a runtime overhead for elastic cluster. Having it
> enabled by default (against vendor recommendation) is risky. Does it
> (not) cause issues in Solr ?
> 2) Scrolling does not work with aggregates, which means I'll have to
> fallback to pagination (for whole result set).
> 3) When result set is small scrolling / pagination is not necessary.
>
> Having said that I don't see any way to automatically enable / disable
> scrolling based on RelTree (pls let me know if there is one).
>
> You seem to agree with Stamatis that `select * from elastic` (1)
> should return all documents (rows) not just 10 (ES default). If so,
> one either has to use pagination or scrolling.
> What about queries `select count(*), foo from elastic group by foo`
> (2) (term aggregation). In this case only pagination works. If calcite
> returns all rows for (1) it should do the same for (2).  Should one
> use pagination with aggregates (2) and scrolling for simple queries
> (1) ?
> What if user specifies a high `LIMIT` (eg 10M) should we batch the result ?
> Should user explicitly enable scrolling (eg. JDBC api) ? If so, should
> one throw an exception when scrolling and aggregation is used or
> quietly default to pagination ?
>
> Thanks everybody for your input, it helps.
>
> Regards,
> Andrei.
>
>
> On Thu, Oct 25, 2018 at 2:08 PM
> Kevin Risden
> <kr...@apache.org> wrote:
> >
> > >
> > > There is one more “issue”. Currently select * from elastic returns at
> most
> > > 10 rows (in calcite). This is consistent with elastic behaviour which
> > > limits result set to 10 documents (unless size is specified).
> >
> >
> > In Solr land for the Calcite integration it uses the /export handler or
> > streaming expressions when there is no limit specified (these are just
> like
> > scrolling or called CursorMark in Solr). It falls back to /select with
> the
> > default limit of 10 rows.
> >
> > I think that scroll should be the default almost always since it is the
> > only way to return all the results. Even with a limit = 100000 would
> cause
> > problems with the default query handler. Paging deep into the result set
> is
> > not something that Lucene (Elasticsearch or Solr) does well.
> >
> >
> > Kevin Risden
> >
> >
> > On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> wrote:
> >
> > > Do you need to generate a different plan (i.e. a different tree of
> > > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> > > inconvenient that you don’t know until execute time whether they want
> > > scrolling. A possible solution would be to generate TWO plans - one
> > > scrolling, one non-scrolling inside the prepared statement - and pick
> which
> > > one based on the runtime context.
> > >
> > > Julian
> > >
> > > > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> > > christian.beikov@gmail.com> wrote:
> > > >
> > > > Hey Andrei,
> > > >
> > > > I don't have an answer for how you can access these settings from
> within
> > > the adapter nor how one could do that via RelNodes but the suggestion
> to
> > > use DataContext for that purpose sounds reasonable. Maybe someone else
> has
> > > an idea?
> > > >
> > > > Anyway, since these are settings that don't affect the general
> semantics
> > > of the query/statement and also usually require a special API to be
> used,
> > > I'd rather see these aspects not end up in the query string.
> > > >
> > > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> > > >> Christian,
> > > >>
> > > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> > > >> generally but have some reservations (questions) :
> > > >>
> > > >> How to pass resultSetType / fetchSize from PreparedStatement to
> > > RelNodes ?
> > > >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> > > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> > > >> <ch...@gmail.com> wrote:
> > > >>> In JDBC one can configure a fetch size which would reflect the
> amount
> > > of
> > > >>> rows to be fetched initially, but also subsequently.
> > > >>>
> > >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> > > >>>
> > > >>> According to what you are writing, ES behvior is what
> > > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that
> > > isn't
> > > >>> affected by changes.
> > > >>>
> > > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3,
> R4,
> > > >>> ... and view R1, R2, then R3 is deleted and you fetch the next
> rows,
> > > you
> > > >>> wouldn't see R3.
> > > >>>
> > > >>> According to the JDBC spec
> > > >>> (
> > >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > > >>> ) you don't have to support all modes. Usually, user code doesn't
> use
> > > >>> scrolling that much, but at least forward scrolling makes sense.
> > > >>>
> > > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > >>>> Hi Julian,
> > > >>>>
> > > >>>> Scrolling (in elastic) does not only mean “open a cursor” but also
> > > iterate
> > > >>>> over consistent snapshot. From docs:
> > > >>>>
> > > >>>> The results that are returned from a scroll request reflect the
> state
> > > of
> > > >>>> the index at the time that the initial search request was made,
> like a
> > > >>>> snapshot in time. Subsequent changes to documents (index, update
> or
> > > delete)
> > > >>>> will only affect later search requests.
> > > >>>>
> > > >>>> So pagination (fetch / offset) can’t exactly replicate this
> > > functionality.
> > > >>>>
> > > >>>> The problem with scrolling (in elastic) is that it is expensive
> and
> > > can’t
> > > >>>> (shouldn’t) be enabled it by default.
> > > >>>>
> > > >>>> There is one more “issue”. Currently select * from elastic
> returns at
> > > most
> > > >>>> 10 rows (in calcite). This is consistent with elastic behaviour
> which
> > > >>>> limits result set to 10 documents (unless size is specified). When
> > > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > >>>> <
> > >
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > > >
> > > >>>> or SQL hint) does it mean return whole elastic index ? I’m not at
> > > ease with
> > > >>>> returning different results based on hints or cursor settings.
> > > >>>>
> > > >>>> Andrei.
> > > >>>>
> > > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <
> jhyde.apache@gmail.com>
> > > wrote:
> > > >>>>
> > > >>>>> It seems to me that Elasticsearch scroll means return a cursor -
> a
> > > >>>>> collection of rows that you iterate over, and you may not read
> all
> > > of them.
> > > >>>>> This is the default operation of JDBC.
> > > >>>>>
> > > >>>>> So, I guess we need to give the user a way to signal their
> intent to
> > > read
> > > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and
> ALL_ROWS
> > > >>>>> hints[1] seem close to this. We would want the hints to be acted
> > > upon by
> > > >>>>> both the optimizer and the JDBC transport.
> > > >>>>>
> > > >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you
> to
> > > >>>>> retrieve different pieces of a large result set in separate
> > > statements or
> > > >>>>> (using query parameters) executions. It would be useful if the
> > > server could
> > > >>>>> be given a hint to cache a statement across page requests.
> > > >>>>>
> > > >>>>> Julian
> > > >>>>>
> > > >>>>> [1]
> > > >>>>>
> > >
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > > >>>>>
> > > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > >>>>> christian.beikov@gmail.com> wrote:
> > > >>>>>> Hey,
> > > >>>>>>
> > > >>>>>> not sure if this should be an SQL keyword. JDBC specifies
> various
> > > >>>>> constants that can be used at statement creation time:
> > > >>>>>
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > >>>>>> Not sure though if or how these configurations are accessible
> for
> > > data
> > > >>>>> stores or dialects, but IMO using these would be the proper way.
> > > >>>>>> Regards
> > > >>>>>>
> > > >>>>>> Christian
> > > >>>>>>
> > > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > >>>>>>> Hello,
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> I was thinking about adding [scrolling functionality](
> > > >>>>>>>
> > > >>>>>
> > >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > >>>>> )
> > > >>>>>>> to elastic search adapter. Since scrolling has non-negligible
> > > effect on
> > > >>>>> the
> > > >>>>>>> cluster it should be selectively enabled on per query basis.
> So,
> > > likely,
> > > >>>>>>> user has to explicitly set "scroll flag" somewhere.
> > > >>>>>>>
> > > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> > > >>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> > > keyword
> > > >>>>>>> (unused to my knowledge). There were also discussions about
> adding
> > > >>>>> hints to
> > > >>>>>>> Calcite.
> > > >>>>>>>
> > > >>>>>>> ### Examples
> > > >>>>>>> ```sql
> > > >>>>>>> -- special sql keyword ?
> > > >>>>>>> SCROLL select * from elastic;
> > > >>>>>>>
> > > >>>>>>> -- assuming hints are available in calcite
> > > >>>>>>> /* HINT: scroll */ select * from elastic;
> > > >>>>>>> ```
> > > >>>>>>>
> > > >>>>>>> What people think about this use-case ? Are there better ideas
> ?
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>> Andrei.
> > > >>>>>>>
> > >
> > >
>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Hi Kevin,

You suggest to use scrolling for all elastic queries ? Even when there
are predicates ?  Some questions :

1) Scrolling has a runtime overhead for elastic cluster. Having it
enabled by default (against vendor recommendation) is risky. Does it
(not) cause issues in Solr ?
2) Scrolling does not work with aggregates, which means I'll have to
fallback to pagination (for whole result set).
3) When result set is small scrolling / pagination is not necessary.

Having said that I don't see any way to automatically enable / disable
scrolling based on RelTree (pls let me know if there is one).

You seem to agree with Stamatis that `select * from elastic` (1)
should return all documents (rows) not just 10 (ES default). If so,
one either has to use pagination or scrolling.
What about queries `select count(*), foo from elastic group by foo`
(2) (term aggregation). In this case only pagination works. If calcite
returns all rows for (1) it should do the same for (2).  Should one
use pagination with aggregates (2) and scrolling for simple queries
(1) ?
What if user specifies a high `LIMIT` (eg 10M) should we batch the result ?
Should user explicitly enable scrolling (eg. JDBC api) ? If so, should
one throw an exception when scrolling and aggregation is used or
quietly default to pagination ?

Thanks everybody for your input, it helps.

Regards,
Andrei.


On Thu, Oct 25, 2018 at 2:08 PM Kevin Risden <kr...@apache.org> wrote:
>
> >
> > There is one more “issue”. Currently select * from elastic returns at most
> > 10 rows (in calcite). This is consistent with elastic behaviour which
> > limits result set to 10 documents (unless size is specified).
>
>
> In Solr land for the Calcite integration it uses the /export handler or
> streaming expressions when there is no limit specified (these are just like
> scrolling or called CursorMark in Solr). It falls back to /select with the
> default limit of 10 rows.
>
> I think that scroll should be the default almost always since it is the
> only way to return all the results. Even with a limit = 100000 would cause
> problems with the default query handler. Paging deep into the result set is
> not something that Lucene (Elasticsearch or Solr) does well.
>
>
> Kevin Risden
>
>
> On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> wrote:
>
> > Do you need to generate a different plan (i.e. a different tree of
> > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> > inconvenient that you don’t know until execute time whether they want
> > scrolling. A possible solution would be to generate TWO plans - one
> > scrolling, one non-scrolling inside the prepared statement - and pick which
> > one based on the runtime context.
> >
> > Julian
> >
> > > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> > christian.beikov@gmail.com> wrote:
> > >
> > > Hey Andrei,
> > >
> > > I don't have an answer for how you can access these settings from within
> > the adapter nor how one could do that via RelNodes but the suggestion to
> > use DataContext for that purpose sounds reasonable. Maybe someone else has
> > an idea?
> > >
> > > Anyway, since these are settings that don't affect the general semantics
> > of the query/statement and also usually require a special API to be used,
> > I'd rather see these aspects not end up in the query string.
> > >
> > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> > >> Christian,
> > >>
> > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> > >> generally but have some reservations (questions) :
> > >>
> > >> How to pass resultSetType / fetchSize from PreparedStatement to
> > RelNodes ?
> > >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> > >> <ch...@gmail.com> wrote:
> > >>> In JDBC one can configure a fetch size which would reflect the amount
> > of
> > >>> rows to be fetched initially, but also subsequently.
> > >>>
> > https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> > >>>
> > >>> According to what you are writing, ES behvior is what
> > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that
> > isn't
> > >>> affected by changes.
> > >>>
> > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> > >>> ... and view R1, R2, then R3 is deleted and you fetch the next rows,
> > you
> > >>> wouldn't see R3.
> > >>>
> > >>> According to the JDBC spec
> > >>> (
> > https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > >>> ) you don't have to support all modes. Usually, user code doesn't use
> > >>> scrolling that much, but at least forward scrolling makes sense.
> > >>>
> > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > >>>> Hi Julian,
> > >>>>
> > >>>> Scrolling (in elastic) does not only mean “open a cursor” but also
> > iterate
> > >>>> over consistent snapshot. From docs:
> > >>>>
> > >>>> The results that are returned from a scroll request reflect the state
> > of
> > >>>> the index at the time that the initial search request was made, like a
> > >>>> snapshot in time. Subsequent changes to documents (index, update or
> > delete)
> > >>>> will only affect later search requests.
> > >>>>
> > >>>> So pagination (fetch / offset) can’t exactly replicate this
> > functionality.
> > >>>>
> > >>>> The problem with scrolling (in elastic) is that it is expensive and
> > can’t
> > >>>> (shouldn’t) be enabled it by default.
> > >>>>
> > >>>> There is one more “issue”. Currently select * from elastic returns at
> > most
> > >>>> 10 rows (in calcite). This is consistent with elastic behaviour which
> > >>>> limits result set to 10 documents (unless size is specified). When
> > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > >>>> <
> > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > >
> > >>>> or SQL hint) does it mean return whole elastic index ? I’m not at
> > ease with
> > >>>> returning different results based on hints or cursor settings.
> > >>>>
> > >>>> Andrei.
> > >>>>
> > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com>
> > wrote:
> > >>>>
> > >>>>> It seems to me that Elasticsearch scroll means return a cursor - a
> > >>>>> collection of rows that you iterate over, and you may not read all
> > of them.
> > >>>>> This is the default operation of JDBC.
> > >>>>>
> > >>>>> So, I guess we need to give the user a way to signal their intent to
> > read
> > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > >>>>> hints[1] seem close to this. We would want the hints to be acted
> > upon by
> > >>>>> both the optimizer and the JDBC transport.
> > >>>>>
> > >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > >>>>> retrieve different pieces of a large result set in separate
> > statements or
> > >>>>> (using query parameters) executions. It would be useful if the
> > server could
> > >>>>> be given a hint to cache a statement across page requests.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>> [1]
> > >>>>>
> > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > >>>>>
> > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > >>>>> christian.beikov@gmail.com> wrote:
> > >>>>>> Hey,
> > >>>>>>
> > >>>>>> not sure if this should be an SQL keyword. JDBC specifies various
> > >>>>> constants that can be used at statement creation time:
> > >>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > >>>>>> Not sure though if or how these configurations are accessible for
> > data
> > >>>>> stores or dialects, but IMO using these would be the proper way.
> > >>>>>> Regards
> > >>>>>>
> > >>>>>> Christian
> > >>>>>>
> > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> I was thinking about adding [scrolling functionality](
> > >>>>>>>
> > >>>>>
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > >>>>> )
> > >>>>>>> to elastic search adapter. Since scrolling has non-negligible
> > effect on
> > >>>>> the
> > >>>>>>> cluster it should be selectively enabled on per query basis. So,
> > likely,
> > >>>>>>> user has to explicitly set "scroll flag" somewhere.
> > >>>>>>>
> > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> > >>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> > keyword
> > >>>>>>> (unused to my knowledge). There were also discussions about adding
> > >>>>> hints to
> > >>>>>>> Calcite.
> > >>>>>>>
> > >>>>>>> ### Examples
> > >>>>>>> ```sql
> > >>>>>>> -- special sql keyword ?
> > >>>>>>> SCROLL select * from elastic;
> > >>>>>>>
> > >>>>>>> -- assuming hints are available in calcite
> > >>>>>>> /* HINT: scroll */ select * from elastic;
> > >>>>>>> ```
> > >>>>>>>
> > >>>>>>> What people think about this use-case ? Are there better ideas ?
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Andrei.
> > >>>>>>>
> >
> >

Re: scrolling in ElasticSearch adapter

Posted by Kevin Risden <kr...@apache.org>.
>
> There is one more “issue”. Currently select * from elastic returns at most
> 10 rows (in calcite). This is consistent with elastic behaviour which
> limits result set to 10 documents (unless size is specified).


In Solr land for the Calcite integration it uses the /export handler or
streaming expressions when there is no limit specified (these are just like
scrolling or called CursorMark in Solr). It falls back to /select with the
default limit of 10 rows.

I think that scroll should be the default almost always since it is the
only way to return all the results. Even with a limit = 100000 would cause
problems with the default query handler. Paging deep into the result set is
not something that Lucene (Elasticsearch or Solr) does well.


Kevin Risden


On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> wrote:

> Do you need to generate a different plan (i.e. a different tree of
> RelNodes) for scrolling vs non-scrolling? If so, it’s certainly
> inconvenient that you don’t know until execute time whether they want
> scrolling. A possible solution would be to generate TWO plans - one
> scrolling, one non-scrolling inside the prepared statement - and pick which
> one based on the runtime context.
>
> Julian
>
> > On Oct 25, 2018, at 1:42 AM, Christian Beikov <
> christian.beikov@gmail.com> wrote:
> >
> > Hey Andrei,
> >
> > I don't have an answer for how you can access these settings from within
> the adapter nor how one could do that via RelNodes but the suggestion to
> use DataContext for that purpose sounds reasonable. Maybe someone else has
> an idea?
> >
> > Anyway, since these are settings that don't affect the general semantics
> of the query/statement and also usually require a special API to be used,
> I'd rather see these aspects not end up in the query string.
> >
> > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> >> Christian,
> >>
> >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> >> generally but have some reservations (questions) :
> >>
> >> How to pass resultSetType / fetchSize from PreparedStatement to
> RelNodes ?
> >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> >> <ch...@gmail.com> wrote:
> >>> In JDBC one can configure a fetch size which would reflect the amount
> of
> >>> rows to be fetched initially, but also subsequently.
> >>>
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> >>>
> >>> According to what you are writing, ES behvior is what
> >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that
> isn't
> >>> affected by changes.
> >>>
> >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> >>> ... and view R1, R2, then R3 is deleted and you fetch the next rows,
> you
> >>> wouldn't see R3.
> >>>
> >>> According to the JDBC spec
> >>> (
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> >>> ) you don't have to support all modes. Usually, user code doesn't use
> >>> scrolling that much, but at least forward scrolling makes sense.
> >>>
> >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> >>>> Hi Julian,
> >>>>
> >>>> Scrolling (in elastic) does not only mean “open a cursor” but also
> iterate
> >>>> over consistent snapshot. From docs:
> >>>>
> >>>> The results that are returned from a scroll request reflect the state
> of
> >>>> the index at the time that the initial search request was made, like a
> >>>> snapshot in time. Subsequent changes to documents (index, update or
> delete)
> >>>> will only affect later search requests.
> >>>>
> >>>> So pagination (fetch / offset) can’t exactly replicate this
> functionality.
> >>>>
> >>>> The problem with scrolling (in elastic) is that it is expensive and
> can’t
> >>>> (shouldn’t) be enabled it by default.
> >>>>
> >>>> There is one more “issue”. Currently select * from elastic returns at
> most
> >>>> 10 rows (in calcite). This is consistent with elastic behaviour which
> >>>> limits result set to 10 documents (unless size is specified). When
> >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> >>>> <
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> >
> >>>> or SQL hint) does it mean return whole elastic index ? I’m not at
> ease with
> >>>> returning different results based on hints or cursor settings.
> >>>>
> >>>> Andrei.
> >>>>
> >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com>
> wrote:
> >>>>
> >>>>> It seems to me that Elasticsearch scroll means return a cursor - a
> >>>>> collection of rows that you iterate over, and you may not read all
> of them.
> >>>>> This is the default operation of JDBC.
> >>>>>
> >>>>> So, I guess we need to give the user a way to signal their intent to
> read
> >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> >>>>> hints[1] seem close to this. We would want the hints to be acted
> upon by
> >>>>> both the optimizer and the JDBC transport.
> >>>>>
> >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> >>>>> retrieve different pieces of a large result set in separate
> statements or
> >>>>> (using query parameters) executions. It would be useful if the
> server could
> >>>>> be given a hint to cache a statement across page requests.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1]
> >>>>>
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> >>>>>
> >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> >>>>> christian.beikov@gmail.com> wrote:
> >>>>>> Hey,
> >>>>>>
> >>>>>> not sure if this should be an SQL keyword. JDBC specifies various
> >>>>> constants that can be used at statement creation time:
> >>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >>>>>> Not sure though if or how these configurations are accessible for
> data
> >>>>> stores or dialects, but IMO using these would be the proper way.
> >>>>>> Regards
> >>>>>>
> >>>>>> Christian
> >>>>>>
> >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>>
> >>>>>>> I was thinking about adding [scrolling functionality](
> >>>>>>>
> >>>>>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> >>>>> )
> >>>>>>> to elastic search adapter. Since scrolling has non-negligible
> effect on
> >>>>> the
> >>>>>>> cluster it should be selectively enabled on per query basis. So,
> likely,
> >>>>>>> user has to explicitly set "scroll flag" somewhere.
> >>>>>>>
> >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> >>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> keyword
> >>>>>>> (unused to my knowledge). There were also discussions about adding
> >>>>> hints to
> >>>>>>> Calcite.
> >>>>>>>
> >>>>>>> ### Examples
> >>>>>>> ```sql
> >>>>>>> -- special sql keyword ?
> >>>>>>> SCROLL select * from elastic;
> >>>>>>>
> >>>>>>> -- assuming hints are available in calcite
> >>>>>>> /* HINT: scroll */ select * from elastic;
> >>>>>>> ```
> >>>>>>>
> >>>>>>> What people think about this use-case ? Are there better ideas ?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Andrei.
> >>>>>>>
>
>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Julian,

> Do you need to generate a different plan (i.e. a different tree of RelNodes) for scrolling vs non-scrolling?
The plan is same. I just need to construct a different ES query and
batching Enumerator

On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> wrote:
>
> Do you need to generate a different plan (i.e. a different tree of RelNodes) for scrolling vs non-scrolling? If so, it’s certainly inconvenient that you don’t know until execute time whether they want scrolling. A possible solution would be to generate TWO plans - one scrolling, one non-scrolling inside the prepared statement - and pick which one based on the runtime context.
>
> Julian
>
> > On Oct 25, 2018, at 1:42 AM, Christian Beikov <ch...@gmail.com> wrote:
> >
> > Hey Andrei,
> >
> > I don't have an answer for how you can access these settings from within the adapter nor how one could do that via RelNodes but the suggestion to use DataContext for that purpose sounds reasonable. Maybe someone else has an idea?
> >
> > Anyway, since these are settings that don't affect the general semantics of the query/statement and also usually require a special API to be used, I'd rather see these aspects not end up in the query string.
> >
> > Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> >> Christian,
> >>
> >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> >> generally but have some reservations (questions) :
> >>
> >> How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
> >> What if user doesn’t use JDBC (eg. RelBuilders) ?
> >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> >> <ch...@gmail.com> wrote:
> >>> In JDBC one can configure a fetch size which would reflect the amount of
> >>> rows to be fetched initially, but also subsequently.
> >>> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> >>>
> >>> According to what you are writing, ES behvior is what
> >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
> >>> affected by changes.
> >>>
> >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> >>> ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
> >>> wouldn't see R3.
> >>>
> >>> According to the JDBC spec
> >>> (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> >>> ) you don't have to support all modes. Usually, user code doesn't use
> >>> scrolling that much, but at least forward scrolling makes sense.
> >>>
> >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> >>>> Hi Julian,
> >>>>
> >>>> Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> >>>> over consistent snapshot. From docs:
> >>>>
> >>>> The results that are returned from a scroll request reflect the state of
> >>>> the index at the time that the initial search request was made, like a
> >>>> snapshot in time. Subsequent changes to documents (index, update or delete)
> >>>> will only affect later search requests.
> >>>>
> >>>> So pagination (fetch / offset) can’t exactly replicate this functionality.
> >>>>
> >>>> The problem with scrolling (in elastic) is that it is expensive and can’t
> >>>> (shouldn’t) be enabled it by default.
> >>>>
> >>>> There is one more “issue”. Currently select * from elastic returns at most
> >>>> 10 rows (in calcite). This is consistent with elastic behaviour which
> >>>> limits result set to 10 documents (unless size is specified). When
> >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> >>>> <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
> >>>> or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> >>>> returning different results based on hints or cursor settings.
> >>>>
> >>>> Andrei.
> >>>>
> >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:
> >>>>
> >>>>> It seems to me that Elasticsearch scroll means return a cursor - a
> >>>>> collection of rows that you iterate over, and you may not read all of them.
> >>>>> This is the default operation of JDBC.
> >>>>>
> >>>>> So, I guess we need to give the user a way to signal their intent to read
> >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> >>>>> hints[1] seem close to this. We would want the hints to be acted upon by
> >>>>> both the optimizer and the JDBC transport.
> >>>>>
> >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> >>>>> retrieve different pieces of a large result set in separate statements or
> >>>>> (using query parameters) executions. It would be useful if the server could
> >>>>> be given a hint to cache a statement across page requests.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1]
> >>>>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> >>>>>
> >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> >>>>> christian.beikov@gmail.com> wrote:
> >>>>>> Hey,
> >>>>>>
> >>>>>> not sure if this should be an SQL keyword. JDBC specifies various
> >>>>> constants that can be used at statement creation time:
> >>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >>>>>> Not sure though if or how these configurations are accessible for data
> >>>>> stores or dialects, but IMO using these would be the proper way.
> >>>>>> Regards
> >>>>>>
> >>>>>> Christian
> >>>>>>
> >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>>
> >>>>>>> I was thinking about adding [scrolling functionality](
> >>>>>>>
> >>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> >>>>> )
> >>>>>>> to elastic search adapter. Since scrolling has non-negligible effect on
> >>>>> the
> >>>>>>> cluster it should be selectively enabled on per query basis. So, likely,
> >>>>>>> user has to explicitly set "scroll flag" somewhere.
> >>>>>>>
> >>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
> >>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> >>>>>>> (unused to my knowledge). There were also discussions about adding
> >>>>> hints to
> >>>>>>> Calcite.
> >>>>>>>
> >>>>>>> ### Examples
> >>>>>>> ```sql
> >>>>>>> -- special sql keyword ?
> >>>>>>> SCROLL select * from elastic;
> >>>>>>>
> >>>>>>> -- assuming hints are available in calcite
> >>>>>>> /* HINT: scroll */ select * from elastic;
> >>>>>>> ```
> >>>>>>>
> >>>>>>> What people think about this use-case ? Are there better ideas ?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Andrei.
> >>>>>>>
>

Re: scrolling in ElasticSearch adapter

Posted by Julian Hyde <jh...@apache.org>.
Do you need to generate a different plan (i.e. a different tree of RelNodes) for scrolling vs non-scrolling? If so, it’s certainly inconvenient that you don’t know until execute time whether they want scrolling. A possible solution would be to generate TWO plans - one scrolling, one non-scrolling inside the prepared statement - and pick which one based on the runtime context.

Julian

> On Oct 25, 2018, at 1:42 AM, Christian Beikov <ch...@gmail.com> wrote:
> 
> Hey Andrei,
> 
> I don't have an answer for how you can access these settings from within the adapter nor how one could do that via RelNodes but the suggestion to use DataContext for that purpose sounds reasonable. Maybe someone else has an idea?
> 
> Anyway, since these are settings that don't affect the general semantics of the query/statement and also usually require a special API to be used, I'd rather see these aspects not end up in the query string.
> 
> Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
>> Christian,
>> 
>> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
>> generally but have some reservations (questions) :
>> 
>> How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
>> What if user doesn’t use JDBC (eg. RelBuilders) ?
>> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
>> <ch...@gmail.com> wrote:
>>> In JDBC one can configure a fetch size which would reflect the amount of
>>> rows to be fetched initially, but also subsequently.
>>> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
>>> 
>>> According to what you are writing, ES behvior is what
>>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
>>> affected by changes.
>>> 
>>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
>>> ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
>>> wouldn't see R3.
>>> 
>>> According to the JDBC spec
>>> (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
>>> ) you don't have to support all modes. Usually, user code doesn't use
>>> scrolling that much, but at least forward scrolling makes sense.
>>> 
>>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
>>>> Hi Julian,
>>>> 
>>>> Scrolling (in elastic) does not only mean “open a cursor” but also iterate
>>>> over consistent snapshot. From docs:
>>>> 
>>>> The results that are returned from a scroll request reflect the state of
>>>> the index at the time that the initial search request was made, like a
>>>> snapshot in time. Subsequent changes to documents (index, update or delete)
>>>> will only affect later search requests.
>>>> 
>>>> So pagination (fetch / offset) can’t exactly replicate this functionality.
>>>> 
>>>> The problem with scrolling (in elastic) is that it is expensive and can’t
>>>> (shouldn’t) be enabled it by default.
>>>> 
>>>> There is one more “issue”. Currently select * from elastic returns at most
>>>> 10 rows (in calcite). This is consistent with elastic behaviour which
>>>> limits result set to 10 documents (unless size is specified). When
>>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
>>>> <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
>>>> or SQL hint) does it mean return whole elastic index ? I’m not at ease with
>>>> returning different results based on hints or cursor settings.
>>>> 
>>>> Andrei.
>>>> 
>>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:
>>>> 
>>>>> It seems to me that Elasticsearch scroll means return a cursor - a
>>>>> collection of rows that you iterate over, and you may not read all of them.
>>>>> This is the default operation of JDBC.
>>>>> 
>>>>> So, I guess we need to give the user a way to signal their intent to read
>>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
>>>>> hints[1] seem close to this. We would want the hints to be acted upon by
>>>>> both the optimizer and the JDBC transport.
>>>>> 
>>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
>>>>> retrieve different pieces of a large result set in separate statements or
>>>>> (using query parameters) executions. It would be useful if the server could
>>>>> be given a hint to cache a statement across page requests.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>>>>> 
>>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
>>>>> christian.beikov@gmail.com> wrote:
>>>>>> Hey,
>>>>>> 
>>>>>> not sure if this should be an SQL keyword. JDBC specifies various
>>>>> constants that can be used at statement creation time:
>>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
>>>>>> Not sure though if or how these configurations are accessible for data
>>>>> stores or dialects, but IMO using these would be the proper way.
>>>>>> Regards
>>>>>> 
>>>>>> Christian
>>>>>> 
>>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> 
>>>>>>> I was thinking about adding [scrolling functionality](
>>>>>>> 
>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>>>>> )
>>>>>>> to elastic search adapter. Since scrolling has non-negligible effect on
>>>>> the
>>>>>>> cluster it should be selectively enabled on per query basis. So, likely,
>>>>>>> user has to explicitly set "scroll flag" somewhere.
>>>>>>> 
>>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
>>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
>>>>>>> (unused to my knowledge). There were also discussions about adding
>>>>> hints to
>>>>>>> Calcite.
>>>>>>> 
>>>>>>> ### Examples
>>>>>>> ```sql
>>>>>>> -- special sql keyword ?
>>>>>>> SCROLL select * from elastic;
>>>>>>> 
>>>>>>> -- assuming hints are available in calcite
>>>>>>> /* HINT: scroll */ select * from elastic;
>>>>>>> ```
>>>>>>> 
>>>>>>> What people think about this use-case ? Are there better ideas ?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Andrei.
>>>>>>> 


Re: scrolling in ElasticSearch adapter

Posted by Christian Beikov <ch...@gmail.com>.
Hey Andrei,

I don't have an answer for how you can access these settings from within 
the adapter nor how one could do that via RelNodes but the suggestion to 
use DataContext for that purpose sounds reasonable. Maybe someone else 
has an idea?

Anyway, since these are settings that don't affect the general semantics 
of the query/statement and also usually require a special API to be 
used, I'd rather see these aspects not end up in the query string.

Am 25.10.2018 um 02:15 schrieb Andrei Sereda:
> Christian,
>
> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> generally but have some reservations (questions) :
>
> How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
> What if user doesn’t use JDBC (eg. RelBuilders) ?
> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> <ch...@gmail.com> wrote:
>> In JDBC one can configure a fetch size which would reflect the amount of
>> rows to be fetched initially, but also subsequently.
>> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
>>
>> According to what you are writing, ES behvior is what
>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
>> affected by changes.
>>
>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
>> ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
>> wouldn't see R3.
>>
>> According to the JDBC spec
>> (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
>> ) you don't have to support all modes. Usually, user code doesn't use
>> scrolling that much, but at least forward scrolling makes sense.
>>
>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
>>> Hi Julian,
>>>
>>> Scrolling (in elastic) does not only mean “open a cursor” but also iterate
>>> over consistent snapshot. From docs:
>>>
>>> The results that are returned from a scroll request reflect the state of
>>> the index at the time that the initial search request was made, like a
>>> snapshot in time. Subsequent changes to documents (index, update or delete)
>>> will only affect later search requests.
>>>
>>> So pagination (fetch / offset) can’t exactly replicate this functionality.
>>>
>>> The problem with scrolling (in elastic) is that it is expensive and can’t
>>> (shouldn’t) be enabled it by default.
>>>
>>> There is one more “issue”. Currently select * from elastic returns at most
>>> 10 rows (in calcite). This is consistent with elastic behaviour which
>>> limits result set to 10 documents (unless size is specified). When
>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
>>> <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
>>> or SQL hint) does it mean return whole elastic index ? I’m not at ease with
>>> returning different results based on hints or cursor settings.
>>>
>>> Andrei.
>>>
>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:
>>>
>>>> It seems to me that Elasticsearch scroll means return a cursor - a
>>>> collection of rows that you iterate over, and you may not read all of them.
>>>> This is the default operation of JDBC.
>>>>
>>>> So, I guess we need to give the user a way to signal their intent to read
>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
>>>> hints[1] seem close to this. We would want the hints to be acted upon by
>>>> both the optimizer and the JDBC transport.
>>>>
>>>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
>>>> retrieve different pieces of a large result set in separate statements or
>>>> (using query parameters) executions. It would be useful if the server could
>>>> be given a hint to cache a statement across page requests.
>>>>
>>>> Julian
>>>>
>>>> [1]
>>>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>>>>
>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
>>>> christian.beikov@gmail.com> wrote:
>>>>> Hey,
>>>>>
>>>>> not sure if this should be an SQL keyword. JDBC specifies various
>>>> constants that can be used at statement creation time:
>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
>>>>> Not sure though if or how these configurations are accessible for data
>>>> stores or dialects, but IMO using these would be the proper way.
>>>>> Regards
>>>>>
>>>>> Christian
>>>>>
>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>> I was thinking about adding [scrolling functionality](
>>>>>>
>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>>>> )
>>>>>> to elastic search adapter. Since scrolling has non-negligible effect on
>>>> the
>>>>>> cluster it should be selectively enabled on per query basis. So, likely,
>>>>>> user has to explicitly set "scroll flag" somewhere.
>>>>>>
>>>>>> Most natural way seems in SQL. [Calcite sql grammar](
>>>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
>>>>>> (unused to my knowledge). There were also discussions about adding
>>>> hints to
>>>>>> Calcite.
>>>>>>
>>>>>> ### Examples
>>>>>> ```sql
>>>>>> -- special sql keyword ?
>>>>>> SCROLL select * from elastic;
>>>>>>
>>>>>> -- assuming hints are available in calcite
>>>>>> /* HINT: scroll */ select * from elastic;
>>>>>> ```
>>>>>>
>>>>>> What people think about this use-case ? Are there better ideas ?
>>>>>>
>>>>>> Regards,
>>>>>> Andrei.
>>>>>>

Re: scrolling in ElasticSearch adapter

Posted by Stamatis Zampetakis <za...@gmail.com>.
For the sake of the discussion, I outline below a few ideas which may be of
some use.

Andrei>..elastic doesn't natively supports transactions. Scrollable results
are "serializable" only for single query not for multiple (which might
confuse users using transactions on the client side).

One option is to implement the transaction mechanism on the layer above. I
am not sure how Elastic works but possibly you could store various
information like IndexSearchers snapshots in a (new) subclass of
CalciteConnectionImpl and then pass it along using an appropriate
DataContext. This looks like a lot of work and I guess is out of the scope
of a simple adapter.

On the other hand, you could possibly rely on the semantics of
con.setAutoCommit(false)
method of JDBC (a statement is completed, and therefore committed, right
after it is executed) and indicate that setting con.setAutoCommit(true) is
not supported.

Andrei>I tend to agree, but pagination doesn't work for all ES aggregations

In principle, you could decide not to push all agregations into Elastic and
let them in the EnumerableConvention (or another convention). It would be
less performant I suppose but possibly cleaner in terms of semantics.

Andrei>How to pass resultSetType / fetchSize from PreparedStatement to
RelNodes ? What if user doesn’t use JDBC (eg. RelBuilders)

The first thing that comes in mind is to exploit the DataContext but I
don't know if this is convenient for you.




Στις Πέμ, 25 Οκτ 2018 στις 2:15 π.μ., ο/η Andrei Sereda <an...@sereda.cc>
έγραψε:

> Christian,
>
> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> generally but have some reservations (questions) :
>
> How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
> What if user doesn’t use JDBC (eg. RelBuilders) ?
> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> <ch...@gmail.com> wrote:
> >
> > In JDBC one can configure a fetch size which would reflect the amount of
> > rows to be fetched initially, but also subsequently.
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> >
> > According to what you are writing, ES behvior is what
> > TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
> > affected by changes.
> >
> > IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> > ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
> > wouldn't see R3.
> >
> > According to the JDBC spec
> > (
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > ) you don't have to support all modes. Usually, user code doesn't use
> > scrolling that much, but at least forward scrolling makes sense.
> >
> > Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > Hi Julian,
> > >
> > > Scrolling (in elastic) does not only mean “open a cursor” but also
> iterate
> > > over consistent snapshot. From docs:
> > >
> > > The results that are returned from a scroll request reflect the state
> of
> > > the index at the time that the initial search request was made, like a
> > > snapshot in time. Subsequent changes to documents (index, update or
> delete)
> > > will only affect later search requests.
> > >
> > > So pagination (fetch / offset) can’t exactly replicate this
> functionality.
> > >
> > > The problem with scrolling (in elastic) is that it is expensive and
> can’t
> > > (shouldn’t) be enabled it by default.
> > >
> > > There is one more “issue”. Currently select * from elastic returns at
> most
> > > 10 rows (in calcite). This is consistent with elastic behaviour which
> > > limits result set to 10 documents (unless size is specified). When
> > > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > <
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> >
> > > or SQL hint) does it mean return whole elastic index ? I’m not at ease
> with
> > > returning different results based on hints or cursor settings.
> > >
> > > Andrei.
> > >
> > > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com>
> wrote:
> > >
> > >> It seems to me that Elasticsearch scroll means return a cursor - a
> > >> collection of rows that you iterate over, and you may not read all of
> them.
> > >> This is the default operation of JDBC.
> > >>
> > >> So, I guess we need to give the user a way to signal their intent to
> read
> > >> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > >> hints[1] seem close to this. We would want the hints to be acted upon
> by
> > >> both the optimizer and the JDBC transport.
> > >>
> > >> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > >> retrieve different pieces of a large result set in separate
> statements or
> > >> (using query parameters) executions. It would be useful if the server
> could
> > >> be given a hint to cache a statement across page requests.
> > >>
> > >> Julian
> > >>
> > >> [1]
> > >>
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > >>
> > >>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > >> christian.beikov@gmail.com> wrote:
> > >>> Hey,
> > >>>
> > >>> not sure if this should be an SQL keyword. JDBC specifies various
> > >> constants that can be used at statement creation time:
> > >> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > >>> Not sure though if or how these configurations are accessible for
> data
> > >> stores or dialects, but IMO using these would be the proper way.
> > >>> Regards
> > >>>
> > >>> Christian
> > >>>
> > >>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > >>>> Hello,
> > >>>>
> > >>>>
> > >>>> I was thinking about adding [scrolling functionality](
> > >>>>
> > >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > >> )
> > >>>> to elastic search adapter. Since scrolling has non-negligible
> effect on
> > >> the
> > >>>> cluster it should be selectively enabled on per query basis. So,
> likely,
> > >>>> user has to explicitly set "scroll flag" somewhere.
> > >>>>
> > >>>> Most natural way seems in SQL. [Calcite sql grammar](
> > >>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> keyword
> > >>>> (unused to my knowledge). There were also discussions about adding
> > >> hints to
> > >>>> Calcite.
> > >>>>
> > >>>> ### Examples
> > >>>> ```sql
> > >>>> -- special sql keyword ?
> > >>>> SCROLL select * from elastic;
> > >>>>
> > >>>> -- assuming hints are available in calcite
> > >>>> /* HINT: scroll */ select * from elastic;
> > >>>> ```
> > >>>>
> > >>>> What people think about this use-case ? Are there better ideas ?
> > >>>>
> > >>>> Regards,
> > >>>> Andrei.
> > >>>>
>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Christian,

I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
generally but have some reservations (questions) :

How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
What if user doesn’t use JDBC (eg. RelBuilders) ?
On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
<ch...@gmail.com> wrote:
>
> In JDBC one can configure a fetch size which would reflect the amount of
> rows to be fetched initially, but also subsequently.
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
>
> According to what you are writing, ES behvior is what
> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
> affected by changes.
>
> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
> wouldn't see R3.
>
> According to the JDBC spec
> (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> ) you don't have to support all modes. Usually, user code doesn't use
> scrolling that much, but at least forward scrolling makes sense.
>
> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > Hi Julian,
> >
> > Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> > over consistent snapshot. From docs:
> >
> > The results that are returned from a scroll request reflect the state of
> > the index at the time that the initial search request was made, like a
> > snapshot in time. Subsequent changes to documents (index, update or delete)
> > will only affect later search requests.
> >
> > So pagination (fetch / offset) can’t exactly replicate this functionality.
> >
> > The problem with scrolling (in elastic) is that it is expensive and can’t
> > (shouldn’t) be enabled it by default.
> >
> > There is one more “issue”. Currently select * from elastic returns at most
> > 10 rows (in calcite). This is consistent with elastic behaviour which
> > limits result set to 10 documents (unless size is specified). When
> > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
> > or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> > returning different results based on hints or cursor settings.
> >
> > Andrei.
> >
> > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:
> >
> >> It seems to me that Elasticsearch scroll means return a cursor - a
> >> collection of rows that you iterate over, and you may not read all of them.
> >> This is the default operation of JDBC.
> >>
> >> So, I guess we need to give the user a way to signal their intent to read
> >> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> >> hints[1] seem close to this. We would want the hints to be acted upon by
> >> both the optimizer and the JDBC transport.
> >>
> >> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> >> retrieve different pieces of a large result set in separate statements or
> >> (using query parameters) executions. It would be useful if the server could
> >> be given a hint to cache a statement across page requests.
> >>
> >> Julian
> >>
> >> [1]
> >> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> >>
> >>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> >> christian.beikov@gmail.com> wrote:
> >>> Hey,
> >>>
> >>> not sure if this should be an SQL keyword. JDBC specifies various
> >> constants that can be used at statement creation time:
> >> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >>> Not sure though if or how these configurations are accessible for data
> >> stores or dialects, but IMO using these would be the proper way.
> >>> Regards
> >>>
> >>> Christian
> >>>
> >>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >>>> Hello,
> >>>>
> >>>>
> >>>> I was thinking about adding [scrolling functionality](
> >>>>
> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> >> )
> >>>> to elastic search adapter. Since scrolling has non-negligible effect on
> >> the
> >>>> cluster it should be selectively enabled on per query basis. So, likely,
> >>>> user has to explicitly set "scroll flag" somewhere.
> >>>>
> >>>> Most natural way seems in SQL. [Calcite sql grammar](
> >>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> >>>> (unused to my knowledge). There were also discussions about adding
> >> hints to
> >>>> Calcite.
> >>>>
> >>>> ### Examples
> >>>> ```sql
> >>>> -- special sql keyword ?
> >>>> SCROLL select * from elastic;
> >>>>
> >>>> -- assuming hints are available in calcite
> >>>> /* HINT: scroll */ select * from elastic;
> >>>> ```
> >>>>
> >>>> What people think about this use-case ? Are there better ideas ?
> >>>>
> >>>> Regards,
> >>>> Andrei.
> >>>>

Re: scrolling in ElasticSearch adapter

Posted by Christian Beikov <ch...@gmail.com>.
In JDBC one can configure a fetch size which would reflect the amount of 
rows to be fetched initially, but also subsequently. 
https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)

According to what you are writing, ES behvior is what 
TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't 
affected by changes.

IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4, 
... and view R1, R2, then R3 is deleted and you fetch the next rows, you 
wouldn't see R3.

According to the JDBC spec 
(https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int) 
) you don't have to support all modes. Usually, user code doesn't use 
scrolling that much, but at least forward scrolling makes sense.

Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> Hi Julian,
>
> Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> over consistent snapshot. From docs:
>
> The results that are returned from a scroll request reflect the state of
> the index at the time that the initial search request was made, like a
> snapshot in time. Subsequent changes to documents (index, update or delete)
> will only affect later search requests.
>
> So pagination (fetch / offset) can’t exactly replicate this functionality.
>
> The problem with scrolling (in elastic) is that it is expensive and can’t
> (shouldn’t) be enabled it by default.
>
> There is one more “issue”. Currently select * from elastic returns at most
> 10 rows (in calcite). This is consistent with elastic behaviour which
> limits result set to 10 documents (unless size is specified). When
> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
> or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> returning different results based on hints or cursor settings.
>
> Andrei.
>
> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:
>
>> It seems to me that Elasticsearch scroll means return a cursor - a
>> collection of rows that you iterate over, and you may not read all of them.
>> This is the default operation of JDBC.
>>
>> So, I guess we need to give the user a way to signal their intent to read
>> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
>> hints[1] seem close to this. We would want the hints to be acted upon by
>> both the optimizer and the JDBC transport.
>>
>> Related is pagination. SQL has FETCH and OFFSET, which allow you to
>> retrieve different pieces of a large result set in separate statements or
>> (using query parameters) executions. It would be useful if the server could
>> be given a hint to cache a statement across page requests.
>>
>> Julian
>>
>> [1]
>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>>
>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
>> christian.beikov@gmail.com> wrote:
>>> Hey,
>>>
>>> not sure if this should be an SQL keyword. JDBC specifies various
>> constants that can be used at statement creation time:
>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
>>> Not sure though if or how these configurations are accessible for data
>> stores or dialects, but IMO using these would be the proper way.
>>> Regards
>>>
>>> Christian
>>>
>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
>>>> Hello,
>>>>
>>>>
>>>> I was thinking about adding [scrolling functionality](
>>>>
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
>> )
>>>> to elastic search adapter. Since scrolling has non-negligible effect on
>> the
>>>> cluster it should be selectively enabled on per query basis. So, likely,
>>>> user has to explicitly set "scroll flag" somewhere.
>>>>
>>>> Most natural way seems in SQL. [Calcite sql grammar](
>>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
>>>> (unused to my knowledge). There were also discussions about adding
>> hints to
>>>> Calcite.
>>>>
>>>> ### Examples
>>>> ```sql
>>>> -- special sql keyword ?
>>>> SCROLL select * from elastic;
>>>>
>>>> -- assuming hints are available in calcite
>>>> /* HINT: scroll */ select * from elastic;
>>>> ```
>>>>
>>>> What people think about this use-case ? Are there better ideas ?
>>>>
>>>> Regards,
>>>> Andrei.
>>>>

Re: scrolling in ElasticSearch adapter

Posted by Andrei Sereda <an...@sereda.cc>.
Hi Julian,

Scrolling (in elastic) does not only mean “open a cursor” but also iterate
over consistent snapshot. From docs:

The results that are returned from a scroll request reflect the state of
the index at the time that the initial search request was made, like a
snapshot in time. Subsequent changes to documents (index, update or delete)
will only affect later search requests.

So pagination (fetch / offset) can’t exactly replicate this functionality.

The problem with scrolling (in elastic) is that it is expensive and can’t
(shouldn’t) be enabled it by default.

There is one more “issue”. Currently select * from elastic returns at most
10 rows (in calcite). This is consistent with elastic behaviour which
limits result set to 10 documents (unless size is specified). When
returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
<https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
or SQL hint) does it mean return whole elastic index ? I’m not at ease with
returning different results based on hints or cursor settings.

Andrei.

On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jh...@gmail.com> wrote:

> It seems to me that Elasticsearch scroll means return a cursor - a
> collection of rows that you iterate over, and you may not read all of them.
> This is the default operation of JDBC.
>
> So, I guess we need to give the user a way to signal their intent to read
> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> hints[1] seem close to this. We would want the hints to be acted upon by
> both the optimizer and the JDBC transport.
>
> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> retrieve different pieces of a large result set in separate statements or
> (using query parameters) executions. It would be useful if the server could
> be given a hint to cache a statement across page requests.
>
> Julian
>
> [1]
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>
> > On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> christian.beikov@gmail.com> wrote:
> >
> > Hey,
> >
> > not sure if this should be an SQL keyword. JDBC specifies various
> constants that can be used at statement creation time:
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >
> > Not sure though if or how these configurations are accessible for data
> stores or dialects, but IMO using these would be the proper way.
> >
> > Regards
> >
> > Christian
> >
> >> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >> Hello,
> >>
> >>
> >> I was thinking about adding [scrolling functionality](
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> )
> >> to elastic search adapter. Since scrolling has non-negligible effect on
> the
> >> cluster it should be selectively enabled on per query basis. So, likely,
> >> user has to explicitly set "scroll flag" somewhere.
> >>
> >> Most natural way seems in SQL. [Calcite sql grammar](
> >> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> >> (unused to my knowledge). There were also discussions about adding
> hints to
> >> Calcite.
> >>
> >> ### Examples
> >> ```sql
> >> -- special sql keyword ?
> >> SCROLL select * from elastic;
> >>
> >> -- assuming hints are available in calcite
> >> /* HINT: scroll */ select * from elastic;
> >> ```
> >>
> >> What people think about this use-case ? Are there better ideas ?
> >>
> >> Regards,
> >> Andrei.
> >>
>

Re: scrolling in ElasticSearch adapter

Posted by Julian Hyde <jh...@gmail.com>.
It seems to me that Elasticsearch scroll means return a cursor - a collection of rows that you iterate over, and you may not read all of them. This is the default operation of JDBC. 

So, I guess we need to give the user a way to signal their intent to read all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS hints[1] seem close to this. We would want the hints to be acted upon by both the optimizer and the JDBC transport.

Related is pagination. SQL has FETCH and OFFSET, which allow you to retrieve different pieces of a large result set in separate statements or (using query parameters) executions. It would be useful if the server could be given a hint to cache a statement across page requests.

Julian

[1] https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924

> On Oct 24, 2018, at 11:19 AM, Christian Beikov <ch...@gmail.com> wrote:
> 
> Hey,
> 
> not sure if this should be an SQL keyword. JDBC specifies various constants that can be used at statement creation time: https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> 
> Not sure though if or how these configurations are accessible for data stores or dialects, but IMO using these would be the proper way.
> 
> Regards
> 
> Christian
> 
>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
>> Hello,
>> 
>> 
>> I was thinking about adding [scrolling functionality](
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html)
>> to elastic search adapter. Since scrolling has non-negligible effect on the
>> cluster it should be selectively enabled on per query basis. So, likely,
>> user has to explicitly set "scroll flag" somewhere.
>> 
>> Most natural way seems in SQL. [Calcite sql grammar](
>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
>> (unused to my knowledge). There were also discussions about adding hints to
>> Calcite.
>> 
>> ### Examples
>> ```sql
>> -- special sql keyword ?
>> SCROLL select * from elastic;
>> 
>> -- assuming hints are available in calcite
>> /* HINT: scroll */ select * from elastic;
>> ```
>> 
>> What people think about this use-case ? Are there better ideas ?
>> 
>> Regards,
>> Andrei.
>> 

Re: scrolling in ElasticSearch adapter

Posted by Christian Beikov <ch...@gmail.com>.
Hey,

not sure if this should be an SQL keyword. JDBC specifies various 
constants that can be used at statement creation time: 
https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html

Not sure though if or how these configurations are accessible for data 
stores or dialects, but IMO using these would be the proper way.

Regards

Christian

Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> Hello,
>
>
> I was thinking about adding [scrolling functionality](
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html)
> to elastic search adapter. Since scrolling has non-negligible effect on the
> cluster it should be selectively enabled on per query basis. So, likely,
> user has to explicitly set "scroll flag" somewhere.
>
> Most natural way seems in SQL. [Calcite sql grammar](
> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> (unused to my knowledge). There were also discussions about adding hints to
> Calcite.
>
> ### Examples
> ```sql
> -- special sql keyword ?
> SCROLL select * from elastic;
>
> -- assuming hints are available in calcite
> /* HINT: scroll */ select * from elastic;
> ```
>
> What people think about this use-case ? Are there better ideas ?
>
> Regards,
> Andrei.
>