You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Andrei Sereda <an...@sereda.cc> on 2018/12/22 00:14:09 UTC

ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Hello,

Upon indexing, elastic allocates a unique _id
<https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html>
for each document (unless specified by the user). Currently when mapping
between elastic result and calcite return type we query only _source or
fields attributes. _id is not part those but exposed at higher level in
query response (see SearchHit
<https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html>
).

Currently  _MAP['foo'] becomes _source.get('foo') (or fields.get('foo')).

Do you think we should make _MAP['_id'] available implicitly ?

I have a couple of use-cases where one needs to know document ID.

Please note this makes sense only for non-aggregate queries.

Regards,

Andrei.

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Andrei Sereda <an...@sereda.cc>.

Thanks, Stamatis. I wanted to confirm that adding one more mapping is not
redundant (this information might have existed somewhere).

I am not sure if you really need to keep the mapping between EXPR and ITEM
or just the field names

Do you mean having a List instead of a Map ? Currently, I’m doing the
latter. The mapping is between String and String.

/**
      * Keeps mapping between calcite expression identifier (like
{@code EXPR$0}) and
      * original item call like {@code _MAP['foo.bar']} ({@code
foo.bar} really).
      * This information otherwise might be lost during query translation.
      *
      * @see SqlStdOperatorTable#ITEM
      */
     final List<Map.Entry<String, String>> expressionItemMap = new
ArrayList<>();


On Mon, Dec 31, 2018 at 9:28 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> I would say that the plan you provided is the physical plan but this does
> not change the point that
> I was trying to make, which is that you have all the necessary information
> in the plan.
>
> Andrei>Does it mean one has to manually keep mapping between EXPR and ITEM
> somewhere ?
>
> I would say yes. I am not sure if you really need to keep the mapping
> between EXPR and ITEM or just the field names but certainly it is up to you
> to store the information that you need somewhere.
> The right place I guess would
> be org.apache.calcite.adapter.elasticsearch.ElasticsearchRel.Implementor.
> Note
> that you could make also Implementor#visitChild method return a result
> instead of being void and add there necessary information.
>
> Andrei>Is there a better way ?
>
> An alternative solution (not sure if it is better) would be to put
> necessary information in the Elastic operators when moving from the logical
> plan to the physical plan by having custom rules. This would require
> possibly many changes so I guess in the end it is isn't worth it.
>
> Andrei>In Sort Rel (ElasticsearchSort). How to know that sort0=[$2]
> (EXPR$2) is sort on ITEM($0, 'c') ?
>
> Sorting is position based so you just have to know that field c is in
> position $2. Again you can exploit information that you added in
> Implementor.
>
>
>
> Στις Σάβ, 29 Δεκ 2018 στις 7:00 μ.μ., ο/η Andrei Sereda <an...@sereda.cc>
> έγραψε:
>
> > Let’s take an example of a simple query:
> >
> > -- note _id fieldselect  _MAP['a'], _MAP['_id']  from elastic order by
> > _MAP['c']
> >
> > Logical plan is as follows:
> >
> > ElasticsearchToEnumerableConverter
> >   ElasticsearchSort(sort0=[$2], dir0=[ASC])
> >     ElasticsearchProject(EXPR$0=[ITEM($0, 'a')], EXPR$1=[ITEM($0,
> > '_id')], EXPR$2=[ITEM($0, 'c')])
> >       ElasticsearchTableScan(table=[[elastic, foo]])
> >
> > Questions
> >
> >    1. In ElasticsearchEnumerators
> >    <
> >
> https://github.com/asereda-gs/calcite/blob/elastic-idfield/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchEnumerators.java#L46
> > >
> >    I need to know if requested field is a meta field (_id) or not (to
> >    correctly fetch it from payload). Does it mean one has to manually
> keep
> >    mapping between EXPR and ITEM somewhere ? Is there a better way ?
> >    2. In Sort Rel (ElasticsearchSort). How to know that sort0=[$2]
> (EXPR$2)
> >    is sort on ITEM($0, 'c') ? Mapping above (1) will help.
> >
> > Thanks for your help.
> >
> > On Sat, Dec 29, 2018 at 3:08 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> > > I think a concrete example of a query with its respective logical plan
> > > could clarify a few things. I would expect that the logical plan
> contains
> > > all the necessary information to decide which fields need to be fetched
> > > from Elastic without requiring additional aliases. If that information
> is
> > > lost when passing to the physical plan then we could possibly add some
> > > rules and/or operators to ensure this doesn't happen.
> > >
> > > Στις Παρ, 28 Δεκ 2018 στις 11:37 μ.μ., ο/η Andrei Sereda
> > <andrei@sereda.cc
> > > >
> > > έγραψε:
> > >
> > > > Other than that it seems to me _MAP[‘_id’] will become a RexCall
> > similar
> > > to
> > > > ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field
> > name
> > > by
> > > > implementing a RexVisitor.
> > > >
> > > > Yes. That’s what I’m doing (see isItem
> > > > <
> > > >
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85
> > > > >
> > > > )
> > > >
> > > > I don’t see clearly why you need to set explicitly aliases but
> probably
> > > is
> > > > because I am missing implementation details related with the adapter
> > > > itself.
> > > >
> > > > For physical plan I get a list of expressions / projections (via
> > > > RelDataType
> > > > / getRowType) but at some point I need to know if (for example)
> EXPR$0
> > ==
> > > > ITEM($0, '_id') (_id is a system field and needs to be fetched
> > > differently
> > > > from payload). RelDataType doesn’t have item information.
> > > >
> > > > That’s the only reason I’m manually saving this mapping. Perhaps
> there
> > > is a
> > > > better way to do it ?
> > > >
> > > > On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <
> zabetak@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Andrei,
> > > > >
> > > > > I tried to have a look in the PR but I am not familiar with the
> > Elastic
> > > > > adapter code so I am missing a lot of context.
> > > > >
> > > > > In general, I think that relying on the on the field names coming
> > from
> > > a
> > > > > RelDataType it is not safe. As you observed, it can get messed up
> > quite
> > > > > easily from many parts of Calcite.
> > > > > It is usually better to try to reason using the position and/or the
> > > type
> > > > of
> > > > > the field rather than its name.
> > > > >
> > > > > Other than that it seems to me _MAP['_id'] will become a RexCall
> > > similar
> > > > to
> > > > > ITEM($0, '_ID') so it appears to me that you can retrieve the field
> > > name
> > > > by
> > > > > implementing a RexVisitor.
> > > > > I don't see clearly why you need to set explicitly aliases but
> > probably
> > > > is
> > > > > because I am missing implementation details related with the
> adapter
> > > > > itself.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <
> > > jhyde@apache.org>
> > > > > έγραψε:
> > > > >
> > > > > > I added comments to the case. I did not look at the PR, sorry.
> > > > > >
> > > > > > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc>
> > > wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Can somebody pls take a look at the following PR:
> > > > > > >
> > > > > > > PR: https://github.com/apache/calcite/pull/982
> > > > > > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > > > > > >
> > > > > > > Specifically, I would like to understand if adding explicit
> > mapping
> > > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > > > > > >
> > > > > > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a
> > > > correct
> > > > > > > (and recommended) approach. The reason I need it is because the
> > > > > original
> > > > > > > item name (eg. `_MAP['foo.bar']`) is lost after converter
> > operation
> > > > (ie
> > > > > > I'm
> > > > > > > not able to derive it from `input.getRowType()`)
> > > > > > >
> > > > > > > ```sql
> > > > > > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id'
> and
> > > > > > 'foo.bar'
> > > > > > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > > > > > >
> > > > > > > -- Query below preserves attribute information
> > > > > > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar']
> as
> > > > > > > \"foo.bar\" from elastic)
> > > > > > > ```
> > > > > > >
> > > > > > > Many thanks,
> > > > > > > Andrei.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <
> > > > zabetak@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Andrei,
> > > > > > >>
> > > > > > >> Many data management systems have internal columns which the
> > users
> > > > may
> > > > > > be
> > > > > > >> able to query (see for example Postgres [1]). It doesn't sound
> > > > > > unnatural to
> > > > > > >> allow users access these fields. On the other hand such system
> > > > columns
> > > > > > do
> > > > > > >> not appear in SELECT * queries and I suppose they are pruned
> out
> > > by
> > > > > > >> projections early on if they are not used by the query. I
> guess
> > > you
> > > > > > could
> > > > > > >> take a similar approach in the ElasticSearch adapter.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Stamatis
> > > > > > >>
> > > > > > >> [1]
> > > https://www.postgresql.org/docs/current/ddl-system-columns.html
> > > > > > >>
> > > > > > >>
> > > > > > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > > > > > <an...@sereda.cc>
> > > > > > >> έγραψε:
> > > > > > >>
> > > > > > >>> Hello,
> > > > > > >>>
> > > > > > >>> Upon indexing, elastic allocates a unique _id
> > > > > > >>> <
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > > > > > >>>>
> > > > > > >>> for each document (unless specified by the user). Currently
> > when
> > > > > > mapping
> > > > > > >>> between elastic result and calcite return type we query only
> > > > _source
> > > > > or
> > > > > > >>> fields attributes. _id is not part those but exposed at
> higher
> > > > level
> > > > > in
> > > > > > >>> query response (see SearchHit
> > > > > > >>> <
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > > > > > >>>>
> > > > > > >>> ).
> > > > > > >>>
> > > > > > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > > > > > fields.get('foo')).
> > > > > > >>>
> > > > > > >>> Do you think we should make _MAP['_id'] available implicitly
> ?
> > > > > > >>>
> > > > > > >>> I have a couple of use-cases where one needs to know document
> > ID.
> > > > > > >>>
> > > > > > >>> Please note this makes sense only for non-aggregate queries.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>>
> > > > > > >>> Andrei.
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Stamatis Zampetakis <za...@gmail.com>.

I would say that the plan you provided is the physical plan but this does
not change the point that
I was trying to make, which is that you have all the necessary information
in the plan.

Andrei>Does it mean one has to manually keep mapping between EXPR and ITEM
somewhere ?

I would say yes. I am not sure if you really need to keep the mapping
between EXPR and ITEM or just the field names but certainly it is up to you
to store the information that you need somewhere.
The right place I guess would
be org.apache.calcite.adapter.elasticsearch.ElasticsearchRel.Implementor.  Note
that you could make also Implementor#visitChild method return a result
instead of being void and add there necessary information.

Andrei>Is there a better way ?

An alternative solution (not sure if it is better) would be to put
necessary information in the Elastic operators when moving from the logical
plan to the physical plan by having custom rules. This would require
possibly many changes so I guess in the end it is isn't worth it.

Andrei>In Sort Rel (ElasticsearchSort). How to know that sort0=[$2]
(EXPR$2) is sort on ITEM($0, 'c') ?

Sorting is position based so you just have to know that field c is in
position $2. Again you can exploit information that you added in
Implementor.



Στις Σάβ, 29 Δεκ 2018 στις 7:00 μ.μ., ο/η Andrei Sereda <an...@sereda.cc>
έγραψε:

> Let’s take an example of a simple query:
>
> -- note _id fieldselect  _MAP['a'], _MAP['_id']  from elastic order by
> _MAP['c']
>
> Logical plan is as follows:
>
> ElasticsearchToEnumerableConverter
>   ElasticsearchSort(sort0=[$2], dir0=[ASC])
>     ElasticsearchProject(EXPR$0=[ITEM($0, 'a')], EXPR$1=[ITEM($0,
> '_id')], EXPR$2=[ITEM($0, 'c')])
>       ElasticsearchTableScan(table=[[elastic, foo]])
>
> Questions
>
>    1. In ElasticsearchEnumerators
>    <
> https://github.com/asereda-gs/calcite/blob/elastic-idfield/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchEnumerators.java#L46
> >
>    I need to know if requested field is a meta field (_id) or not (to
>    correctly fetch it from payload). Does it mean one has to manually keep
>    mapping between EXPR and ITEM somewhere ? Is there a better way ?
>    2. In Sort Rel (ElasticsearchSort). How to know that sort0=[$2] (EXPR$2)
>    is sort on ITEM($0, 'c') ? Mapping above (1) will help.
>
> Thanks for your help.
>
> On Sat, Dec 29, 2018 at 3:08 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
> > I think a concrete example of a query with its respective logical plan
> > could clarify a few things. I would expect that the logical plan contains
> > all the necessary information to decide which fields need to be fetched
> > from Elastic without requiring additional aliases. If that information is
> > lost when passing to the physical plan then we could possibly add some
> > rules and/or operators to ensure this doesn't happen.
> >
> > Στις Παρ, 28 Δεκ 2018 στις 11:37 μ.μ., ο/η Andrei Sereda
> <andrei@sereda.cc
> > >
> > έγραψε:
> >
> > > Other than that it seems to me _MAP[‘_id’] will become a RexCall
> similar
> > to
> > > ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field
> name
> > by
> > > implementing a RexVisitor.
> > >
> > > Yes. That’s what I’m doing (see isItem
> > > <
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85
> > > >
> > > )
> > >
> > > I don’t see clearly why you need to set explicitly aliases but probably
> > is
> > > because I am missing implementation details related with the adapter
> > > itself.
> > >
> > > For physical plan I get a list of expressions / projections (via
> > > RelDataType
> > > / getRowType) but at some point I need to know if (for example) EXPR$0
> ==
> > > ITEM($0, '_id') (_id is a system field and needs to be fetched
> > differently
> > > from payload). RelDataType doesn’t have item information.
> > >
> > > That’s the only reason I’m manually saving this mapping. Perhaps there
> > is a
> > > better way to do it ?
> > >
> > > On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <zabetak@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Andrei,
> > > >
> > > > I tried to have a look in the PR but I am not familiar with the
> Elastic
> > > > adapter code so I am missing a lot of context.
> > > >
> > > > In general, I think that relying on the on the field names coming
> from
> > a
> > > > RelDataType it is not safe. As you observed, it can get messed up
> quite
> > > > easily from many parts of Calcite.
> > > > It is usually better to try to reason using the position and/or the
> > type
> > > of
> > > > the field rather than its name.
> > > >
> > > > Other than that it seems to me _MAP['_id'] will become a RexCall
> > similar
> > > to
> > > > ITEM($0, '_ID') so it appears to me that you can retrieve the field
> > name
> > > by
> > > > implementing a RexVisitor.
> > > > I don't see clearly why you need to set explicitly aliases but
> probably
> > > is
> > > > because I am missing implementation details related with the adapter
> > > > itself.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <
> > jhyde@apache.org>
> > > > έγραψε:
> > > >
> > > > > I added comments to the case. I did not look at the PR, sorry.
> > > > >
> > > > > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc>
> > wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Can somebody pls take a look at the following PR:
> > > > > >
> > > > > > PR: https://github.com/apache/calcite/pull/982
> > > > > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > > > > >
> > > > > > Specifically, I would like to understand if adding explicit
> mapping
> > > > > > <
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > > > > >
> > > > > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a
> > > correct
> > > > > > (and recommended) approach. The reason I need it is because the
> > > > original
> > > > > > item name (eg. `_MAP['foo.bar']`) is lost after converter
> operation
> > > (ie
> > > > > I'm
> > > > > > not able to derive it from `input.getRowType()`)
> > > > > >
> > > > > > ```sql
> > > > > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> > > > > 'foo.bar'
> > > > > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > > > > >
> > > > > > -- Query below preserves attribute information
> > > > > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > > > > > \"foo.bar\" from elastic)
> > > > > > ```
> > > > > >
> > > > > > Many thanks,
> > > > > > Andrei.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <
> > > zabetak@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Andrei,
> > > > > >>
> > > > > >> Many data management systems have internal columns which the
> users
> > > may
> > > > > be
> > > > > >> able to query (see for example Postgres [1]). It doesn't sound
> > > > > unnatural to
> > > > > >> allow users access these fields. On the other hand such system
> > > columns
> > > > > do
> > > > > >> not appear in SELECT * queries and I suppose they are pruned out
> > by
> > > > > >> projections early on if they are not used by the query. I guess
> > you
> > > > > could
> > > > > >> take a similar approach in the ElasticSearch adapter.
> > > > > >>
> > > > > >> Best,
> > > > > >> Stamatis
> > > > > >>
> > > > > >> [1]
> > https://www.postgresql.org/docs/current/ddl-system-columns.html
> > > > > >>
> > > > > >>
> > > > > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > > > > <an...@sereda.cc>
> > > > > >> έγραψε:
> > > > > >>
> > > > > >>> Hello,
> > > > > >>>
> > > > > >>> Upon indexing, elastic allocates a unique _id
> > > > > >>> <
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > > > > >>>>
> > > > > >>> for each document (unless specified by the user). Currently
> when
> > > > > mapping
> > > > > >>> between elastic result and calcite return type we query only
> > > _source
> > > > or
> > > > > >>> fields attributes. _id is not part those but exposed at higher
> > > level
> > > > in
> > > > > >>> query response (see SearchHit
> > > > > >>> <
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > > > > >>>>
> > > > > >>> ).
> > > > > >>>
> > > > > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > > > > fields.get('foo')).
> > > > > >>>
> > > > > >>> Do you think we should make _MAP['_id'] available implicitly ?
> > > > > >>>
> > > > > >>> I have a couple of use-cases where one needs to know document
> ID.
> > > > > >>>
> > > > > >>> Please note this makes sense only for non-aggregate queries.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>>
> > > > > >>> Andrei.
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Andrei Sereda <an...@sereda.cc>.

Let’s take an example of a simple query:

-- note _id fieldselect  _MAP['a'], _MAP['_id']  from elastic order by _MAP['c']

Logical plan is as follows:

ElasticsearchToEnumerableConverter
  ElasticsearchSort(sort0=[$2], dir0=[ASC])
    ElasticsearchProject(EXPR$0=[ITEM($0, 'a')], EXPR$1=[ITEM($0,
'_id')], EXPR$2=[ITEM($0, 'c')])
      ElasticsearchTableScan(table=[[elastic, foo]])

Questions

   1. In ElasticsearchEnumerators
   <https://github.com/asereda-gs/calcite/blob/elastic-idfield/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchEnumerators.java#L46>
   I need to know if requested field is a meta field (_id) or not (to
   correctly fetch it from payload). Does it mean one has to manually keep
   mapping between EXPR and ITEM somewhere ? Is there a better way ?
   2. In Sort Rel (ElasticsearchSort). How to know that sort0=[$2] (EXPR$2)
   is sort on ITEM($0, 'c') ? Mapping above (1) will help.

Thanks for your help.

On Sat, Dec 29, 2018 at 3:08 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> I think a concrete example of a query with its respective logical plan
> could clarify a few things. I would expect that the logical plan contains
> all the necessary information to decide which fields need to be fetched
> from Elastic without requiring additional aliases. If that information is
> lost when passing to the physical plan then we could possibly add some
> rules and/or operators to ensure this doesn't happen.
>
> Στις Παρ, 28 Δεκ 2018 στις 11:37 μ.μ., ο/η Andrei Sereda <andrei@sereda.cc
> >
> έγραψε:
>
> > Other than that it seems to me _MAP[‘_id’] will become a RexCall similar
> to
> > ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field name
> by
> > implementing a RexVisitor.
> >
> > Yes. That’s what I’m doing (see isItem
> > <
> >
> https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85
> > >
> > )
> >
> > I don’t see clearly why you need to set explicitly aliases but probably
> is
> > because I am missing implementation details related with the adapter
> > itself.
> >
> > For physical plan I get a list of expressions / projections (via
> > RelDataType
> > / getRowType) but at some point I need to know if (for example) EXPR$0 ==
> > ITEM($0, '_id') (_id is a system field and needs to be fetched
> differently
> > from payload). RelDataType doesn’t have item information.
> >
> > That’s the only reason I’m manually saving this mapping. Perhaps there
> is a
> > better way to do it ?
> >
> > On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> > > Hi Andrei,
> > >
> > > I tried to have a look in the PR but I am not familiar with the Elastic
> > > adapter code so I am missing a lot of context.
> > >
> > > In general, I think that relying on the on the field names coming from
> a
> > > RelDataType it is not safe. As you observed, it can get messed up quite
> > > easily from many parts of Calcite.
> > > It is usually better to try to reason using the position and/or the
> type
> > of
> > > the field rather than its name.
> > >
> > > Other than that it seems to me _MAP['_id'] will become a RexCall
> similar
> > to
> > > ITEM($0, '_ID') so it appears to me that you can retrieve the field
> name
> > by
> > > implementing a RexVisitor.
> > > I don't see clearly why you need to set explicitly aliases but probably
> > is
> > > because I am missing implementation details related with the adapter
> > > itself.
> > >
> > > Best,
> > > Stamatis
> > >
> > > Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <
> jhyde@apache.org>
> > > έγραψε:
> > >
> > > > I added comments to the case. I did not look at the PR, sorry.
> > > >
> > > > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc>
> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > Can somebody pls take a look at the following PR:
> > > > >
> > > > > PR: https://github.com/apache/calcite/pull/982
> > > > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > > > >
> > > > > Specifically, I would like to understand if adding explicit mapping
> > > > > <
> > > >
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > > > >
> > > > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a
> > correct
> > > > > (and recommended) approach. The reason I need it is because the
> > > original
> > > > > item name (eg. `_MAP['foo.bar']`) is lost after converter operation
> > (ie
> > > > I'm
> > > > > not able to derive it from `input.getRowType()`)
> > > > >
> > > > > ```sql
> > > > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> > > > 'foo.bar'
> > > > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > > > >
> > > > > -- Query below preserves attribute information
> > > > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > > > > \"foo.bar\" from elastic)
> > > > > ```
> > > > >
> > > > > Many thanks,
> > > > > Andrei.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <
> > zabetak@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hi Andrei,
> > > > >>
> > > > >> Many data management systems have internal columns which the users
> > may
> > > > be
> > > > >> able to query (see for example Postgres [1]). It doesn't sound
> > > > unnatural to
> > > > >> allow users access these fields. On the other hand such system
> > columns
> > > > do
> > > > >> not appear in SELECT * queries and I suppose they are pruned out
> by
> > > > >> projections early on if they are not used by the query. I guess
> you
> > > > could
> > > > >> take a similar approach in the ElasticSearch adapter.
> > > > >>
> > > > >> Best,
> > > > >> Stamatis
> > > > >>
> > > > >> [1]
> https://www.postgresql.org/docs/current/ddl-system-columns.html
> > > > >>
> > > > >>
> > > > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > > > <an...@sereda.cc>
> > > > >> έγραψε:
> > > > >>
> > > > >>> Hello,
> > > > >>>
> > > > >>> Upon indexing, elastic allocates a unique _id
> > > > >>> <
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > > > >>>>
> > > > >>> for each document (unless specified by the user). Currently when
> > > > mapping
> > > > >>> between elastic result and calcite return type we query only
> > _source
> > > or
> > > > >>> fields attributes. _id is not part those but exposed at higher
> > level
> > > in
> > > > >>> query response (see SearchHit
> > > > >>> <
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > > > >>>>
> > > > >>> ).
> > > > >>>
> > > > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > > > fields.get('foo')).
> > > > >>>
> > > > >>> Do you think we should make _MAP['_id'] available implicitly ?
> > > > >>>
> > > > >>> I have a couple of use-cases where one needs to know document ID.
> > > > >>>
> > > > >>> Please note this makes sense only for non-aggregate queries.
> > > > >>>
> > > > >>> Regards,
> > > > >>>
> > > > >>> Andrei.
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Stamatis Zampetakis <za...@gmail.com>.

I think a concrete example of a query with its respective logical plan
could clarify a few things. I would expect that the logical plan contains
all the necessary information to decide which fields need to be fetched
from Elastic without requiring additional aliases. If that information is
lost when passing to the physical plan then we could possibly add some
rules and/or operators to ensure this doesn't happen.

Στις Παρ, 28 Δεκ 2018 στις 11:37 μ.μ., ο/η Andrei Sereda <an...@sereda.cc>
έγραψε:

> Other than that it seems to me _MAP[‘_id’] will become a RexCall similar to
> ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field name by
> implementing a RexVisitor.
>
> Yes. That’s what I’m doing (see isItem
> <
> https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85
> >
> )
>
> I don’t see clearly why you need to set explicitly aliases but probably is
> because I am missing implementation details related with the adapter
> itself.
>
> For physical plan I get a list of expressions / projections (via
> RelDataType
> / getRowType) but at some point I need to know if (for example) EXPR$0 ==
> ITEM($0, '_id') (_id is a system field and needs to be fetched differently
> from payload). RelDataType doesn’t have item information.
>
> That’s the only reason I’m manually saving this mapping. Perhaps there is a
> better way to do it ?
>
> On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
> > Hi Andrei,
> >
> > I tried to have a look in the PR but I am not familiar with the Elastic
> > adapter code so I am missing a lot of context.
> >
> > In general, I think that relying on the on the field names coming from a
> > RelDataType it is not safe. As you observed, it can get messed up quite
> > easily from many parts of Calcite.
> > It is usually better to try to reason using the position and/or the type
> of
> > the field rather than its name.
> >
> > Other than that it seems to me _MAP['_id'] will become a RexCall similar
> to
> > ITEM($0, '_ID') so it appears to me that you can retrieve the field name
> by
> > implementing a RexVisitor.
> > I don't see clearly why you need to set explicitly aliases but probably
> is
> > because I am missing implementation details related with the adapter
> > itself.
> >
> > Best,
> > Stamatis
> >
> > Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <jh...@apache.org>
> > έγραψε:
> >
> > > I added comments to the case. I did not look at the PR, sorry.
> > >
> > > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Can somebody pls take a look at the following PR:
> > > >
> > > > PR: https://github.com/apache/calcite/pull/982
> > > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > > >
> > > > Specifically, I would like to understand if adding explicit mapping
> > > > <
> > >
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > > >
> > > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a
> correct
> > > > (and recommended) approach. The reason I need it is because the
> > original
> > > > item name (eg. `_MAP['foo.bar']`) is lost after converter operation
> (ie
> > > I'm
> > > > not able to derive it from `input.getRowType()`)
> > > >
> > > > ```sql
> > > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> > > 'foo.bar'
> > > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > > >
> > > > -- Query below preserves attribute information
> > > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > > > \"foo.bar\" from elastic)
> > > > ```
> > > >
> > > > Many thanks,
> > > > Andrei.
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <
> zabetak@gmail.com
> > >
> > > > wrote:
> > > >
> > > >> Hi Andrei,
> > > >>
> > > >> Many data management systems have internal columns which the users
> may
> > > be
> > > >> able to query (see for example Postgres [1]). It doesn't sound
> > > unnatural to
> > > >> allow users access these fields. On the other hand such system
> columns
> > > do
> > > >> not appear in SELECT * queries and I suppose they are pruned out by
> > > >> projections early on if they are not used by the query. I guess you
> > > could
> > > >> take a similar approach in the ElasticSearch adapter.
> > > >>
> > > >> Best,
> > > >> Stamatis
> > > >>
> > > >> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
> > > >>
> > > >>
> > > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > > <an...@sereda.cc>
> > > >> έγραψε:
> > > >>
> > > >>> Hello,
> > > >>>
> > > >>> Upon indexing, elastic allocates a unique _id
> > > >>> <
> > > >>>
> > > >>
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > > >>>>
> > > >>> for each document (unless specified by the user). Currently when
> > > mapping
> > > >>> between elastic result and calcite return type we query only
> _source
> > or
> > > >>> fields attributes. _id is not part those but exposed at higher
> level
> > in
> > > >>> query response (see SearchHit
> > > >>> <
> > > >>>
> > > >>
> > >
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > > >>>>
> > > >>> ).
> > > >>>
> > > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > > fields.get('foo')).
> > > >>>
> > > >>> Do you think we should make _MAP['_id'] available implicitly ?
> > > >>>
> > > >>> I have a couple of use-cases where one needs to know document ID.
> > > >>>
> > > >>> Please note this makes sense only for non-aggregate queries.
> > > >>>
> > > >>> Regards,
> > > >>>
> > > >>> Andrei.
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Andrei Sereda <an...@sereda.cc>.

Other than that it seems to me _MAP[‘_id’] will become a RexCall similar to
ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field name by
implementing a RexVisitor.

Yes. That’s what I’m doing (see isItem
<https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85>
)

I don’t see clearly why you need to set explicitly aliases but probably is
because I am missing implementation details related with the adapter itself.

For physical plan I get a list of expressions / projections (via RelDataType
/ getRowType) but at some point I need to know if (for example) EXPR$0 ==
ITEM($0, '_id') (_id is a system field and needs to be fetched differently
from payload). RelDataType doesn’t have item information.

That’s the only reason I’m manually saving this mapping. Perhaps there is a
better way to do it ?

On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hi Andrei,
>
> I tried to have a look in the PR but I am not familiar with the Elastic
> adapter code so I am missing a lot of context.
>
> In general, I think that relying on the on the field names coming from a
> RelDataType it is not safe. As you observed, it can get messed up quite
> easily from many parts of Calcite.
> It is usually better to try to reason using the position and/or the type of
> the field rather than its name.
>
> Other than that it seems to me _MAP['_id'] will become a RexCall similar to
> ITEM($0, '_ID') so it appears to me that you can retrieve the field name by
> implementing a RexVisitor.
> I don't see clearly why you need to set explicitly aliases but probably is
> because I am missing implementation details related with the adapter
> itself.
>
> Best,
> Stamatis
>
> Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <jh...@apache.org>
> έγραψε:
>
> > I added comments to the case. I did not look at the PR, sorry.
> >
> > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc> wrote:
> > >
> > > Hello,
> > >
> > > Can somebody pls take a look at the following PR:
> > >
> > > PR: https://github.com/apache/calcite/pull/982
> > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > >
> > > Specifically, I would like to understand if adding explicit mapping
> > > <
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > >
> > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> > > (and recommended) approach. The reason I need it is because the
> original
> > > item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie
> > I'm
> > > not able to derive it from `input.getRowType()`)
> > >
> > > ```sql
> > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> > 'foo.bar'
> > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > >
> > > -- Query below preserves attribute information
> > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > > \"foo.bar\" from elastic)
> > > ```
> > >
> > > Many thanks,
> > > Andrei.
> > >
> > >
> > >
> > >
> > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <zabetak@gmail.com
> >
> > > wrote:
> > >
> > >> Hi Andrei,
> > >>
> > >> Many data management systems have internal columns which the users may
> > be
> > >> able to query (see for example Postgres [1]). It doesn't sound
> > unnatural to
> > >> allow users access these fields. On the other hand such system columns
> > do
> > >> not appear in SELECT * queries and I suppose they are pruned out by
> > >> projections early on if they are not used by the query. I guess you
> > could
> > >> take a similar approach in the ElasticSearch adapter.
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> > >> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
> > >>
> > >>
> > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > <an...@sereda.cc>
> > >> έγραψε:
> > >>
> > >>> Hello,
> > >>>
> > >>> Upon indexing, elastic allocates a unique _id
> > >>> <
> > >>>
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > >>>>
> > >>> for each document (unless specified by the user). Currently when
> > mapping
> > >>> between elastic result and calcite return type we query only _source
> or
> > >>> fields attributes. _id is not part those but exposed at higher level
> in
> > >>> query response (see SearchHit
> > >>> <
> > >>>
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > >>>>
> > >>> ).
> > >>>
> > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > fields.get('foo')).
> > >>>
> > >>> Do you think we should make _MAP['_id'] available implicitly ?
> > >>>
> > >>> I have a couple of use-cases where one needs to know document ID.
> > >>>
> > >>> Please note this makes sense only for non-aggregate queries.
> > >>>
> > >>> Regards,
> > >>>
> > >>> Andrei.
> > >>>
> > >>
> >
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hi Andrei,

I tried to have a look in the PR but I am not familiar with the Elastic
adapter code so I am missing a lot of context.

In general, I think that relying on the on the field names coming from a
RelDataType it is not safe. As you observed, it can get messed up quite
easily from many parts of Calcite.
It is usually better to try to reason using the position and/or the type of
the field rather than its name.

Other than that it seems to me _MAP['_id'] will become a RexCall similar to
ITEM($0, '_ID') so it appears to me that you can retrieve the field name by
implementing a RexVisitor.
I don't see clearly why you need to set explicitly aliases but probably is
because I am missing implementation details related with the adapter itself.

Best,
Stamatis

Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <jh...@apache.org>
έγραψε:

> I added comments to the case. I did not look at the PR, sorry.
>
> > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc> wrote:
> >
> > Hello,
> >
> > Can somebody pls take a look at the following PR:
> >
> > PR: https://github.com/apache/calcite/pull/982
> > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> >
> > Specifically, I would like to understand if adding explicit mapping
> > <
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> >
> > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> > (and recommended) approach. The reason I need it is because the original
> > item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie
> I'm
> > not able to derive it from `input.getRowType()`)
> >
> > ```sql
> > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> 'foo.bar'
> > select _MAP['_id'], _MAP['foo.bar'] from elastic
> >
> > -- Query below preserves attribute information
> > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > \"foo.bar\" from elastic)
> > ```
> >
> > Many thanks,
> > Andrei.
> >
> >
> >
> >
> > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> >> Hi Andrei,
> >>
> >> Many data management systems have internal columns which the users may
> be
> >> able to query (see for example Postgres [1]). It doesn't sound
> unnatural to
> >> allow users access these fields. On the other hand such system columns
> do
> >> not appear in SELECT * queries and I suppose they are pruned out by
> >> projections early on if they are not used by the query. I guess you
> could
> >> take a similar approach in the ElasticSearch adapter.
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
> >>
> >>
> >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> <an...@sereda.cc>
> >> έγραψε:
> >>
> >>> Hello,
> >>>
> >>> Upon indexing, elastic allocates a unique _id
> >>> <
> >>>
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> >>>>
> >>> for each document (unless specified by the user). Currently when
> mapping
> >>> between elastic result and calcite return type we query only _source or
> >>> fields attributes. _id is not part those but exposed at higher level in
> >>> query response (see SearchHit
> >>> <
> >>>
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> >>>>
> >>> ).
> >>>
> >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> fields.get('foo')).
> >>>
> >>> Do you think we should make _MAP['_id'] available implicitly ?
> >>>
> >>> I have a couple of use-cases where one needs to know document ID.
> >>>
> >>> Please note this makes sense only for non-aggregate queries.
> >>>
> >>> Regards,
> >>>
> >>> Andrei.
> >>>
> >>
>
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Julian Hyde <jh...@apache.org>.

I added comments to the case. I did not look at the PR, sorry.

> On Dec 26, 2018, at 1:56 PM, Andrei Sereda <an...@sereda.cc> wrote:
> 
> Hello,
> 
> Can somebody pls take a look at the following PR:
> 
> PR: https://github.com/apache/calcite/pull/982
> JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> 
> Specifically, I would like to understand if adding explicit mapping
> <https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75>
> between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> (and recommended) approach. The reason I need it is because the original
> item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie I'm
> not able to derive it from `input.getRowType()`)
> 
> ```sql
> -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and 'foo.bar'
> select _MAP['_id'], _MAP['foo.bar'] from elastic
> 
> -- Query below preserves attribute information
> select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> \"foo.bar\" from elastic)
> ```
> 
> Many thanks,
> Andrei.
> 
> 
> 
> 
> On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
> 
>> Hi Andrei,
>> 
>> Many data management systems have internal columns which the users may be
>> able to query (see for example Postgres [1]). It doesn't sound unnatural to
>> allow users access these fields. On the other hand such system columns do
>> not appear in SELECT * queries and I suppose they are pruned out by
>> projections early on if they are not used by the query. I guess you could
>> take a similar approach in the ElasticSearch adapter.
>> 
>> Best,
>> Stamatis
>> 
>> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
>> 
>> 
>> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda <an...@sereda.cc>
>> έγραψε:
>> 
>>> Hello,
>>> 
>>> Upon indexing, elastic allocates a unique _id
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
>>>> 
>>> for each document (unless specified by the user). Currently when mapping
>>> between elastic result and calcite return type we query only _source or
>>> fields attributes. _id is not part those but exposed at higher level in
>>> query response (see SearchHit
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
>>>> 
>>> ).
>>> 
>>> Currently  _MAP['foo'] becomes _source.get('foo') (or fields.get('foo')).
>>> 
>>> Do you think we should make _MAP['_id'] available implicitly ?
>>> 
>>> I have a couple of use-cases where one needs to know document ID.
>>> 
>>> Please note this makes sense only for non-aggregate queries.
>>> 
>>> Regards,
>>> 
>>> Andrei.
>>> 
>>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Andrei Sereda <an...@sereda.cc>.

Hello,

Can somebody pls take a look at the following PR:

PR: https://github.com/apache/calcite/pull/982
JIRA: https://issues.apache.org/jira/browse/CALCITE-2755

Specifically, I would like to understand if adding explicit mapping
<https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75>
between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
(and recommended) approach. The reason I need it is because the original
item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie I'm
not able to derive it from `input.getRowType()`)

```sql
-- Here I get EXPR$0 and EXPR$1 but no information about 'id' and 'foo.bar'
select _MAP['_id'], _MAP['foo.bar'] from elastic

-- Query below preserves attribute information
select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
\"foo.bar\" from elastic)
```

Many thanks,
Andrei.




On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hi Andrei,
>
> Many data management systems have internal columns which the users may be
> able to query (see for example Postgres [1]). It doesn't sound unnatural to
> allow users access these fields. On the other hand such system columns do
> not appear in SELECT * queries and I suppose they are pruned out by
> projections early on if they are not used by the query. I guess you could
> take a similar approach in the ElasticSearch adapter.
>
> Best,
> Stamatis
>
> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
>
>
> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda <an...@sereda.cc>
> έγραψε:
>
> > Hello,
> >
> > Upon indexing, elastic allocates a unique _id
> > <
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > >
> > for each document (unless specified by the user). Currently when mapping
> > between elastic result and calcite return type we query only _source or
> > fields attributes. _id is not part those but exposed at higher level in
> > query response (see SearchHit
> > <
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > >
> > ).
> >
> > Currently  _MAP['foo'] becomes _source.get('foo') (or fields.get('foo')).
> >
> > Do you think we should make _MAP['_id'] available implicitly ?
> >
> > I have a couple of use-cases where one needs to know document ID.
> >
> > Please note this makes sense only for non-aggregate queries.
> >
> > Regards,
> >
> > Andrei.
> >
>

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hi Andrei,

Many data management systems have internal columns which the users may be
able to query (see for example Postgres [1]). It doesn't sound unnatural to
allow users access these fields. On the other hand such system columns do
not appear in SELECT * queries and I suppose they are pruned out by
projections early on if they are not used by the query. I guess you could
take a similar approach in the ElasticSearch adapter.

Best,
Stamatis

[1] https://www.postgresql.org/docs/current/ddl-system-columns.html


Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda <an...@sereda.cc>
έγραψε:

> Hello,
>
> Upon indexing, elastic allocates a unique _id
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> >
> for each document (unless specified by the user). Currently when mapping
> between elastic result and calcite return type we query only _source or
> fields attributes. _id is not part those but exposed at higher level in
> query response (see SearchHit
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> >
> ).
>
> Currently  _MAP['foo'] becomes _source.get('foo') (or fields.get('foo')).
>
> Do you think we should make _MAP['_id'] available implicitly ?
>
> I have a couple of use-cases where one needs to know document ID.
>
> Please note this makes sense only for non-aggregate queries.
>
> Regards,
>
> Andrei.
>