Posted to dev@gora.apache.org by Maria Podorvanova <po...@gmail.com> on 2021/02/27 08:09:38 UTC

Add datastore for Elasticsearch. Outreachy Week 12 Report

Hi,

Report #12
Week 12: February 21 - February 27
Activities:
- Fixed execute method by adding a special "gora_id" field [1]
- Implemented deletion of specific record fields in the deleteByQuery
method [2]
- Implemented MapReduce test [3]
- Added Thread.sleep in order to synchronize Elasticsearch replicas [4]
- All tests in TestElasticsearchStore are passing now
- I also had informal chats with two people this week

Questions:

   1. The last commit [4] gives Elasticsearch some time to synchronize all
   its replicas. Without Thread.sleep, 10 tests (testQuery, testQueryStartKey,
   testDeleteByQuery, etc.) fail and return a different number of hits every
   time I run them. I did not find a better solution, so I committed it
   anyway. Do you have any suggestions?
   2. I did not get feedback on the Elasticsearch documentation for the
   Apache Gora website that I sent last week. Do I need to fix anything in it?
   3. One of the last goals of my internship is to add the new datastore to
   the GoraExplorer project. Could you tell me if there is any guide on how to
   do it?


[1]
https://github.com/apache/gora/commit/f100b317a6dd3c98875f92de776e9b1e476e5425
[2]
https://github.com/apache/gora/commit/91fb2f83f7b4b682898b1cffe73eb8bebeb8ed83
[3]
https://github.com/apache/gora/commit/28b2dee779fa428f51f54585dbfb88638f9bc1de
[4]
https://github.com/apache/gora/commit/d7955f74821fad063da3dd9f1988f59aadbf7cca

Regards,
Maria

Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

Posted by Maria Podorvanova <po...@gmail.com>.
Hi John,

Thank you for your response.

1) I tried executing a refresh call in the flush method, and it is
working now. Thank you very much!

3) I see. I will leave it out for now then.

I will send a PR by the end of today.

Regards,
Maria

On Tue, 2 Mar 2021 at 09:33, John Mora <jh...@gmail.com> wrote:

> Hi Maria.
>
>
> Thanks for your update.
>
> 1) I ran some experiments, and I think you have to execute a refresh call
> in the flush() method.
> "An elasticsearch refresh makes your documents available for search"
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
>
> Also, if you have problems with the order of the results, check out the
> preference parameter:
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-preference
>
> 3) Since the end of the internship is close and the Gora Explorer is an
> independent project (I am not sure if Alfonso has free time), I think we
> can skip that task, but it would be a nice post-Outreachy contribution if
> you want.
>
> Please send a PR with your code for review.
>
> Thanks,
> John

Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

Posted by John Mora <jh...@gmail.com>.
Hi Maria.


Thanks for your update.

1) I ran some experiments, and I think you have to execute a refresh call
in the flush() method.
"An elasticsearch refresh makes your documents available for search"

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
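
To illustrate why a refresh could fix the varying hit counts, here is a
minimal sketch of the behaviour. It is a toy in-memory model, not the
Elasticsearch client API and not the Gora datastore code: the class
NearRealTimeIndex and all of its methods are hypothetical names made up for
this example. The assumption it encodes is the one from the quote above:
written documents become searchable only after a refresh. Elasticsearch also
refreshes periodically in the background, which would explain why a sleep
seemed to help.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy in-memory model of near-real-time search (NOT the Elasticsearch API).
 * Written documents sit in a buffer and only become searchable after refresh().
 */
final class NearRealTimeIndex {
    private final List<String> buffered = new ArrayList<>();   // written, not yet searchable
    private final List<String> searchable = new ArrayList<>(); // visible to searches

    /** Accepts a document; it is not visible to searches yet. */
    void index(String doc) {
        buffered.add(doc);
    }

    /** Makes every buffered document visible to searches. */
    void refresh() {
        searchable.addAll(buffered);
        buffered.clear();
    }

    /** Hypothetical store-level flush: in this model it only triggers a refresh. */
    void flush() {
        refresh();
    }

    /** Returns the number of documents currently visible to searches. */
    int searchHitCount() {
        return searchable.size();
    }

    public static void main(String[] args) {
        NearRealTimeIndex index = new NearRealTimeIndex();
        index.index("employee-1");
        index.index("employee-2");
        System.out.println("hits before flush: " + index.searchHitCount()); // prints 0
        index.flush();
        System.out.println("hits after flush: " + index.searchHitCount());  // prints 2
    }
}
```

In the real datastore, flush() would presumably send a refresh request for
the index through whichever Elasticsearch client the store uses, so that a
test can query right after flushing without waiting.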

Also, if you have problems with the order of the results, check out the
preference parameter:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-preference

3) Since the end of the internship is close and the Gora Explorer is an
independent project (I am not sure if Alfonso has free time), I think we
can skip that task, but it would be a nice post-Outreachy contribution if
you want.

Please send a PR with your code for review.

Thanks,
John

On Mon, 1 Mar 2021 at 7:23, Maria Podorvanova (<
podorvanova.maria@gmail.com>) wrote:

> Hi Madhawa,
>
> Thank you for your response. I will do that.
>
> Regards,
> Maria

Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

Posted by Maria Podorvanova <po...@gmail.com>.
Hi Madhawa,

Thank you for your response. I will do that.

Regards,
Maria

On Mon, 1 Mar 2021 at 22:51, Madhawa Gunasekara <ma...@gmail.com> wrote:

> Hi Maria,
>
> 2) The documentation looks fine to me. Please refer to these documentation
> Jira tickets as well; let's stick to the same format.
> [1] https://issues.apache.org/jira/browse/GORA-625
> [2] https://issues.apache.org/jira/browse/GORA-338
>
> Please create a separate ticket for this documentation.
>
> Thanks,
> Madhawa

Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

Posted by Madhawa Gunasekara <ma...@gmail.com>.
Hi Maria,

2) The documentation looks fine to me. Please refer to these documentation
Jira tickets as well; let's stick to the same format.
[1] https://issues.apache.org/jira/browse/GORA-625
[2] https://issues.apache.org/jira/browse/GORA-338

Please create a separate ticket for this documentation.

Thanks,
Madhawa

