You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@predictionio.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/08 21:28:00 UTC
[jira] [Commented] (PIO-102) ESEngineInstances `getAll` results out
of order (Elasticsearch 5.x)
[ https://issues.apache.org/jira/browse/PIO-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079322#comment-16079322 ]
ASF GitHub Bot commented on PIO-102:
------------------------------------
GitHub user mars opened a pull request:
https://github.com/apache/incubator-predictionio/pull/406
[PIO-102] Fix ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
Fix for [PIO-102](https://issues.apache.org/jira/browse/PIO-102)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mars/incubator-predictionio fix-es-getall-order
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-predictionio/pull/406.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #406
----
commit 34fb0de8ae91f3bf9edb7a9823ea1784555845a8
Author: Mars Hall <ma...@heroku.com>
Date: 2017-07-08T01:22:14Z
Append Elasticsearch scroll results to maintain order
----
> ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
> -------------------------------------------------------------------
>
> Key: PIO-102
> URL: https://issues.apache.org/jira/browse/PIO-102
> Project: PredictionIO
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.11.0-incubating
> Reporter: Mars Hall
> Assignee: Mars Hall
>
> Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, that the most recent engine instance is no longer retrieved. So, I tracked down where those Elasticsearch queries originate.
> In the original Elasticsearch 1.x storage client, [the "scroll" pagination responses are collected by *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] them to one another.
> In the new Elasticsearch 5.x client, [the "scroll" responses are collected by *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152] them to one another.
> This out-of-order concatenation breaks [ESEngineInstances `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192] by erroneously replacing the head of the results with an older engine instance, when there are enough engine instances to overflow a single page of Elasticsearch hits.
> I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature.
> I'll be opening a pull request shortly with the super-simple fix.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)