Posted to dev@gora.apache.org by Maria Podorvanova <po...@gmail.com> on 2021/02/20 08:00:57 UTC

Add datastore for Elasticsearch. Outreachy Week 11 Report

Hi,

Report #11
Week 11: February 14 - February 20
Activities:
- Added scaling_factor support [1]
- Removed unsupported Elasticsearch data types [2]
- Implemented Metadata Analyzer for Elasticsearch Store [3]
- Tried to fix range query by “_id” field [4]
- Wrote documentation for Apache Gora website [5]
- Polished and sent my CV for reviewing

Question:

   1. I tried to fix the issue where the Elasticsearch "_id" field does not
   support range queries. I tried treating "_id" as a number, but one of
   the test "_id" values is "http://foo.com/", which is not numeric, so that
   approach did not work. I committed my attempt [4] anyway to show you
   what I tried.
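
   To illustrate the problem, here is a minimal Java sketch. It is not the
   code from commit [4]; the class and method names are hypothetical. It
   shows that a key such as "http://foo.com/" cannot be parsed as a number,
   while plain lexicographic string comparison still gives a usable ordering.

```java
// Hypothetical helper, for illustration only (not part of Gora).
final class IdKeySketch {

    /** Returns true if the key can be parsed as a long. */
    static boolean isNumericKey(String key) {
        try {
            Long.parseLong(key);
            return true;
        } catch (NumberFormatException e) {
            // Keys such as URLs end up here.
            return false;
        }
    }

    /** Inclusive range check using lexicographic String ordering. */
    static boolean inRange(String key, String startKey, String endKey) {
        return key.compareTo(startKey) >= 0 && key.compareTo(endKey) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isNumericKey("42"));
        System.out.println(isNumericKey("http://foo.com/"));
        System.out.println(inRange("http://foo.com/", "http://bar.com/", "http://zoo.com/"));
    }
}
```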


[1]
https://github.com/apache/gora/commit/670a04c51f4a6d169df319ed5fd3d1d0abd81870
[2]
https://github.com/apache/gora/commit/55020d722f9424021fefe8d94b6bf3ece213226d
[3]
https://github.com/apache/gora/commit/c491a6447d197b0509473294ee844834b1623a63
[4]
https://github.com/apache/gora/commit/a870ca8a2075af7cbab75b9341a94de4966fbf7a
[5]
https://docs.google.com/document/d/1AF6MG3pqe6A5Z0KtLooEKQlipuyeObbYlAa4O7rFXqM/edit?usp=sharing

Regards,
Maria

Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

Posted by Maria Podorvanova <po...@gmail.com>.
Hi Kevin,

Yes, I will make a PR once I fix some issues.

Regards,
Maria

On Thu, 25 Feb 2021 at 15:49, Kevin Ratnasekera <dj...@gmail.com>
wrote:

> Hi Maria,
>
> Thank you for your hard work, Maria. Can you raise a PR once you are
> comfortable with the changes?
>
> Regards
> Kevin

Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

Posted by Kevin Ratnasekera <dj...@gmail.com>.
Hi Maria,

Thank you for your hard work, Maria. Can you raise a PR once you are
comfortable with the changes?

Regards
Kevin

On Thu, Feb 25, 2021 at 10:06 AM Maria Podorvanova <
podorvanova.maria@gmail.com> wrote:

> Hi John,
>
> Thanks for your comment. I am working on it.
>
> Regards,
> Maria

Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

Posted by Maria Podorvanova <po...@gmail.com>.
Hi John,

Thanks for your comment. I am working on it.

Regards,
Maria

On Wed, 24 Feb 2021 at 17:50, John Mora <jh...@gmail.com> wrote:

> Hi Maria.
>
> Thanks for the update.
>
> Unfortunately, looping through all possible values in the range is not a
> practical solution.
>
> You should use the range query feature for this:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
>
> I think you should manually add a special field to the Elasticsearch
> record that you can range query (you can add it to the mapping file as a
> 'mock' primary key field). It will basically be a copy of the '_id' field.
>
> Here, you can find a similar workaround in the Redis DataStore, where
> Sorted Sets were used as secondary indexes for range queries.
>
>
> https://github.com/apache/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L299
>
> Best,
> John

Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

Posted by John Mora <jh...@gmail.com>.
Hi Maria.

Thanks for the update.

Unfortunately, looping through all possible values in the range is not a
practical solution.

You should use the range query feature for this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

I think you should manually add a special field to the Elasticsearch record
that you can range query (you can add it to the mapping file as a 'mock'
primary key field). It will basically be a copy of the '_id' field.
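
As a rough sketch of that idea (an illustration under assumptions, not code
from Gora): the field name "gora_id" is hypothetical, and the JSON is built by
hand rather than with a client library so the example stays self-contained. It
shows a keyword mapping for the copy field and a range query body in the Query
DSL form described at the link above.

```java
// Hypothetical sketch: JSON bodies for a range-queryable copy of "_id".
final class IdCopyRangeQuerySketch {

    // Assumed field name; any name not starting with "_" would do.
    static final String ID_COPY_FIELD = "gora_id";

    /** Mapping body declaring the copy field as a keyword. */
    static String mappingBody() {
        return "{\"properties\":{\"" + ID_COPY_FIELD
                + "\":{\"type\":\"keyword\"}}}";
    }

    /** Inclusive range query on the copy field (gte / lte). */
    static String rangeQuery(String startKey, String endKey) {
        return "{\"query\":{\"range\":{\"" + ID_COPY_FIELD + "\":{"
                + "\"gte\":\"" + escape(startKey) + "\","
                + "\"lte\":\"" + escape(endKey) + "\"}}}}";
    }

    /** Minimal JSON string escaping for backslashes and double quotes. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(mappingBody());
        System.out.println(rangeQuery("http://bar.com/", "http://foo.com/"));
    }
}
```

Each indexed document would also need to carry its key in that field when it
is written. Ranges on a keyword field compare terms as strings, so non-numeric
keys such as "http://foo.com/" are handled.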

Here, you can find a similar workaround in the Redis DataStore, where Sorted
Sets were used as secondary indexes for range queries.

https://github.com/apache/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L299
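
The general pattern can be sketched in memory as follows. This is only an
analogy and not the RedisStore code: a java.util.TreeSet stands in for the
sorted set, and the class and method names are hypothetical. Records live in
an unordered map, and a separate sorted index of keys answers range queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory analogy of a sorted secondary index over keys.
final class SortedKeyIndexSketch {

    // Primary storage: no ordering guarantees, like a plain key lookup.
    private final Map<String, String> records = new HashMap<>();

    // Secondary index: keys kept in sorted (lexicographic) order.
    private final TreeSet<String> keyIndex = new TreeSet<>();

    /** Stores the record and registers its key in the sorted index. */
    void put(String key, String value) {
        records.put(key, value);
        keyIndex.add(key);
    }

    /**
     * Returns values whose keys fall in [startKey, endKey], in key order.
     * Throws IllegalArgumentException if startKey is greater than endKey.
     */
    List<String> rangeValues(String startKey, String endKey) {
        List<String> result = new ArrayList<>();
        for (String key : keyIndex.subSet(startKey, true, endKey, true)) {
            result.add(records.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        SortedKeyIndexSketch store = new SortedKeyIndexSketch();
        store.put("http://foo.com/", "foo");
        store.put("http://bar.com/", "bar");
        store.put("http://zoo.com/", "zoo");
        System.out.println(store.rangeValues("http://bar.com/", "http://foo.com/"));
    }
}
```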

Best,
John

On Sat, 20 Feb 2021 at 3:01, Maria Podorvanova (<
podorvanova.maria@gmail.com>) wrote:

> Hi,
>
> Report #11
> Week 11: February 14 - February 20
> Activities:
> - Added scaling_factor support [1]
> - Removed unsupported Elasticsearch data types [2]
> - Implemented Metadata Analyzer for Elasticsearch Store [3]
> - Tried to fix range query by “_id” field [4]
> - Wrote documentation for Apache Gora website [5]
> - Polished and sent my CV for reviewing
>
> Question:
>
>    1. I tried to fix the issue where the Elasticsearch "_id" field does not
>    support range queries. I tried treating "_id" as a number, but one of
>    the test "_id" values is "http://foo.com/", which is not numeric, so that
>    approach did not work. I committed my attempt [4] anyway to show you
>    what I tried.
>
>
> [1]
> https://github.com/apache/gora/commit/670a04c51f4a6d169df319ed5fd3d1d0abd81870
> [2]
> https://github.com/apache/gora/commit/55020d722f9424021fefe8d94b6bf3ece213226d
> [3]
> https://github.com/apache/gora/commit/c491a6447d197b0509473294ee844834b1623a63
> [4]
> https://github.com/apache/gora/commit/a870ca8a2075af7cbab75b9341a94de4966fbf7a
> [5]
> https://docs.google.com/document/d/1AF6MG3pqe6A5Z0KtLooEKQlipuyeObbYlAa4O7rFXqM/edit?usp=sharing
>
> Regards,
> Maria
>