Posted to dev@predictionio.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2017/06/30 19:11:00 UTC

[jira] [Commented] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs

    [ https://issues.apache.org/jira/browse/PIO-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070594#comment-16070594 ] 

Pat Ferrel commented on PIO-96:
-------------------------------

IMO this example is a non-issue. As the author of the UR, I can state that Elasticsearch performs the last step in the recommender algorithm: the sum of dot products of the query against the model data. It is therefore part of the required compute engine, not just a store. The user may not know this, but it is true nonetheless.
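To make the "sum of dot products" point concrete, here is a minimal Python sketch. It assumes a simplified model in which each item carries one set of correlated item IDs per indicator (e.g. "purchase", "view"), and the query carries the user's history per indicator. Treating both as binary vectors, each dot product reduces to the size of a set intersection. The names `score_items`, `Model`, and `Query` are hypothetical. Elasticsearch itself ranks with its own relevance scoring (TF-IDF/BM25-style weighting), so this only illustrates the shape of the computation, not the UR's actual implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: item ID -> {indicator name -> set of correlated item IDs}.
Model = Dict[str, Dict[str, Set[str]]]
# Hypothetical query: indicator name -> set of item IDs from the user's history.
Query = Dict[str, Set[str]]


def score_items(query: Query, model: Model) -> List[Tuple[str, int]]:
    """Rank items by the sum over indicators of binary dot products.

    For binary vectors, the dot product of the query's history and an
    item's indicator field equals the size of their set intersection.
    """
    scores = []
    for item_id, fields in model.items():
        total = sum(
            len(history & fields.get(indicator, set()))
            for indicator, history in query.items()
        )
        if total > 0:
            scores.append((item_id, total))
    # Highest score first; ties broken by item ID for deterministic output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


if __name__ == "__main__":
    model: Model = {
        "iphone": {"purchase": {"ipad", "case"}, "view": {"galaxy"}},
        "galaxy": {"purchase": {"case"}, "view": {"iphone", "pixel"}},
        "pixel": {"purchase": {"charger"}, "view": {"galaxy"}},
    }
    query: Query = {"purchase": {"ipad", "case"}, "view": {"galaxy"}}
    print(score_items(query, model))
    # → [('iphone', 3), ('galaxy', 1), ('pixel', 1)]
```

Because the final ranking depends on this query-time computation over the indexed model, swapping Elasticsearch for a plain key-value store would remove a working part of the algorithm, not just change where data lives.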

However, installation and expansion of PIO, engines, and component services is the single biggest problem raised on the mailing list. We really need to address it. 

I would suggest we tackle this real problem not by reducing options but by making the composition of services easier to install: provide them in containers, with a script per Template that composes them in flexible layouts. We (my group) have a Chef orchestrator that installs some things natively and creates containers for others. If we can agree on a process or meta-template for this, then Template designers and PIO itself can install via this framework.




> Storage corrupted by sharing databases between engines with different storage configs
> -------------------------------------------------------------------------------------
>
>                 Key: PIO-96
>                 URL: https://issues.apache.org/jira/browse/PIO-96
>             Project: PredictionIO
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>
> When getting started with PredictionIO, it's no problem to spin up an engine and see it work. Problems emerge when a developer tries running multiple engines with different storage configs on the same underlying database, such as:
> * a Classifier with *Postgres* meta, event, & model storage, and
> * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & model storage.
> The database will become corrupt because the meta tables are stored in different databases, but the dynamically created event & model tables may mistakenly share the same name, like {{pio_event_1}}.
> We are directing folks to avoid this problem with the Heroku buildpack by [isolating each engine's database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database] and [optionally running an eventserver per engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver]. It's still a problem with local development, though.
> It would be great if PredictionIO's management of database schemas inherently avoided such conflicts, for example by using random names or UUIDs for dynamically created tables, so that they never collide.
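
The collision described in the issue can be reproduced with a small Python simulation. It assumes, as the pio_event_1 example suggests, that event tables are named from a fixed prefix plus an app ID that each metadata store assigns sequentially on its own; a Python `set` stands in for the shared Postgres database. `MetadataStore`, `sequential_table_name`, `uuid_table_name`, and `create_table` are hypothetical helpers, not PredictionIO's API. The UUID variant follows the issue's suggestion.

```python
import uuid
from typing import Set


class MetadataStore:
    """Hypothetical metadata store that hands out app IDs sequentially."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next_app_id = 1

    def new_app_id(self) -> int:
        app_id = self.next_app_id
        self.next_app_id += 1
        return app_id


def sequential_table_name(app_id: int) -> str:
    # Naming scheme assumed from the pio_event_1 example in the issue.
    return f"pio_event_{app_id}"


def uuid_table_name() -> str:
    # uuid4().hex is 32 lowercase hex chars, so the 42-char name fits within
    # Postgres's default 63-byte identifier limit and needs no quoting.
    return f"pio_event_{uuid.uuid4().hex}"


def create_table(shared_db: Set[str], table: str) -> bool:
    """Create a table in the shared event database; False if it already existed."""
    if table in shared_db:
        return False
    shared_db.add(table)
    return True


if __name__ == "__main__":
    # Two engines keep metadata in separate stores (Postgres vs. Elasticsearch)
    # but write events to the same Postgres database.
    postgres_meta = MetadataStore("postgres")
    elasticsearch_meta = MetadataStore("elasticsearch")

    shared_db: Set[str] = set()
    first = sequential_table_name(postgres_meta.new_app_id())
    second = sequential_table_name(elasticsearch_meta.new_app_id())
    print(create_table(shared_db, first), create_table(shared_db, second))
    # → True False  (both engines were handed app ID 1 and share pio_event_1)

    shared_db = set()
    print(create_table(shared_db, uuid_table_name()),
          create_table(shared_db, uuid_table_name()))
    # → True True  (random names do not collide in practice)
```

Sequential IDs are only unique within the store that issues them, so two stores pointing at one database will eventually hand out the same ID. Random names remove that coordination requirement, at the cost of table names that are no longer human-readable.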



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)