Posted to dev@predictionio.apache.org by pferrel <gi...@git.apache.org> on 2016/08/06 21:23:19 UTC

[GitHub] incubator-predictionio pull request #269: Master

GitHub user pferrel opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/269

    Master

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pferrel/PredictionIO master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #269
    
----
commit 327a795d05970dba36594656387b20f7986436f3
Author: pferrel <pa...@occamsmachete.com>
Date:   2015-11-30T23:51:27Z

    cleanup/removal of CLI changes

commit 31789a6bd08fef30e394e1f278af1421c2e15a2c
Author: pferrel <pa...@occamsmachete.com>
Date:   2015-12-02T19:13:47Z

    changing min versions of some componenets

commit dbc07fb0cd82cf7da2e0dff213308fa94b37223f
Author: pferrel <pa...@occamsmachete.com>
Date:   2015-12-03T22:18:15Z

    more upgrade work

commit 7f663efea2bc2bff5f92821f7909770bebf5edf6
Author: pferrel <pa...@occamsmachete.com>
Date:   2015-12-03T22:18:20Z

    Merge branch 'v0.9.6' of https://github.com/pferrel/PredictionIO into v0.9.6

commit b7d40d93ea3cf2c8d3fc92c6ab77dc48997b1311
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T11:09:19Z

    wip-datasource-cleanup-old-events. Initial implementation.

commit 01aac3d49f3b77563b529f9e952da825085b7387
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T13:01:30Z

    wip-add-psql-to-installsh. added posibility to install es + pgsql.

commit 0541c1b77710370305b66f6cc0a116725389969c
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T19:54:06Z

    wip-datasource-cleanup-old-events. added usage example.

commit ea6bf2d02d570e9d4a5b64931d6f1b0933a3f522
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T19:58:15Z

    wip-datasource-cleanup-old-events. incorrect import optimization reverted.

commit c81e6c358a148f4f46e44f549a18153cc7559738
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T20:24:43Z

    wip-start-stop-all-pgsql. added check if pgsql is started.

commit e5b1f3232b94b6e89a8e36e85a959cece38a63e4
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T20:27:11Z

    wip-start-stop-all-pgsql. added abort when needed.

commit 2507994750b499167acc79935033634b24a2c504
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-04T20:29:22Z

    wip-start-stop-all-pgsql. fixed abort condition.

commit 1bdefbaca52094257ee6b55a0c8e99ac26c03161
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-04T22:44:51Z

    Merge pull request #3 from mkorolyov/wip-start-stop-all-pgsql
    
    wip-start-stop-all-pgsql. added check if pgsql is started.

commit 32ebf1ff8c7d0f3416448dade2804e2e25430ee4
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-05T04:31:01Z

    wip-start-stop-all-pgsql. fixed typo and hardcoded ES path.

commit 5a6e8247cc35fd1229152614596ebdf6616f85d8
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-05T08:51:52Z

    wip-start-stop-all-pgsql. start/stop postgresql server. requires root passwd for linux.

commit b401c9692b403e0ab8a21a325fbdeffef5f569da
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-05T20:26:06Z

    Merge pull request #2 from mkorolyov/wip-add-psql-to-installsh
    
    wip-add-psql-to-installsh. added posibility to install es + pgsql.

commit 520ce139f61e0466ab0b98932770fe15661930a7
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-06T17:54:28Z

    fix curling HBase due to Apache changing path

commit 0a444b7b11a2ac55b6499514d3e14dc15cf58811
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-06T20:02:13Z

    Merge pull request #4 from mkorolyov/wip-start-stop-all-pgsql
    
    start/stop postgresql server. requires root password for linux.

commit a61add438c0e1f6bc900f218610a7712097acb71
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-06T20:15:14Z

    Merge pull request #5 from EmergentOrder/v0.9.6
    
    fix curling HBase due to Apache changing path

commit bb9aba5a767cf5b0658ca306cdea6dc0e7be3787
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-08T11:44:11Z

    wip-datasource-cleanup-old-events. added compression of PEvents.

commit 422d6939268bfb90d4f46bcb7b5131ae1f1197d5
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-08T12:49:19Z

    wip-datasource-cleanup-old-events. added compression of LEvents.

commit 37d33b83c7e7eb4c4a4c3144c889541a1a978255
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-08T12:59:43Z

    wip-datasource-cleanup-old-events. added duplicates removing.

commit 7b251afa25a9a49ea27eb1a201f9bcb1a0be6ce4
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-11T14:41:17Z

    wip-datasource-cleanup-old-events. use readable duration.

commit 37dfbe05c0c264684e859960c0aaae5f00b6aaab
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-14T17:08:10Z

    Merge branch 'develop' of https://github.com/EmergentOrder/PredictionIO into v0.9.6

commit c5f1ce0beceb85019ae0a050d73f17557642e7b0
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-19T20:45:31Z

    Merge branch 'develop' of https://github.com/EmergentOrder/PredictionIO into v0.9.6

commit 2f7198390365cec2eafaf4dea33473e9076e5d45
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-19T20:46:51Z

    Merge branch 'v0.9.6' of https://github.com/actionml/PredictionIO-Enterprise into v0.9.6

commit 272dcaf3917a96b9736a2ba8c3e1e57712fba353
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-25T21:01:00Z

    wip-datasource-cleanup-old-events. added wipe methods to p/l event stores.

commit 889ac38953281d219fdfec9fffd12ca63be495d2
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-25T21:07:17Z

    Merge remote-tracking branch 'upstream/v0.9.6' into wip-datasource-cleanup-old-events
    
    # Conflicts:
    #	bin/install.sh

commit 8f11a6ab07ce1cc9c669827cb14e6aa05f4b3039
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-25T21:12:40Z

    wip-datasource-cleanup-old-events. cleaned code in HBPEvents.scala

commit cd47754dc9bef0e6d8878efc0503bd9cb9e0f439
Author: EmergentOrder <le...@gmail.com>
Date:   2016-01-27T19:18:51Z

    update install script to reflect that ES 1.7.4 and HBase 1.1.3 are available

commit cdfe475e3b93daf096ad765bba975bf8577519f2
Author: Maxim Korolyov <mk...@gojuno.com>
Date:   2016-01-28T20:30:35Z

    wip-datasource-cleanup-old-events. fixed compilation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793199
  
    --- Diff: data/src/main/scala/io/prediction/data/storage/Models.scala ---
    @@ -78,3 +78,6 @@ class ModelSerializer extends CustomSerializer[Model](
             Nil)
       }
     ))
    +
    +// Use where models are saved outside the usual methods in pio
    +case class NullModel()
    --- End diff --
    
    ready for review
    
    This avoids using a Java `null` for cases where the model is not stored in HDFS. By using this model, any template can choose to store the model in a DB or, as the Universal Recommender does, in Elasticsearch.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73806373
  
    --- Diff: build.sbt ---
    @@ -18,9 +18,9 @@ import UnidocKeys._
     
     name := "pio"
     
    -version in ThisBuild := "0.9.6"
    +version in ThisBuild := "0.10.0-snapshot"
     
    -organization in ThisBuild := "io.prediction"
    +organization in ThisBuild := "org.apache.predictionio"
     
    --- End diff --
    
    Is this right?



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by EmergentOrder <gi...@git.apache.org>.
Github user EmergentOrder commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73914604
  
    --- Diff: build.sbt ---
    @@ -193,3 +193,8 @@ concurrentRestrictions in Global := Seq(
       Tags.limitAll( 1 )
     )
     
    +parallelExecution := false
    +
    +parallelExecution in Global := false
    +
    +testOptions in Test += Tests.Argument("-oDF")
    --- End diff --
    
    Prevent errors from multiple Spark Contexts when running tests.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by EmergentOrder <gi...@git.apache.org>.
Github user EmergentOrder commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r74992720
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/src/main/scala/DataSource.scala ---
    @@ -1,28 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +
     package com.test
     
    -import io.prediction.controller.PDataSource
    -import io.prediction.controller.EmptyEvaluationInfo
    -import io.prediction.controller.EmptyActualResult
    -import io.prediction.controller.Params
    -import io.prediction.data.storage.Event
    -import io.prediction.data.storage.Storage
    +import org.apache.predictionio.core.SelfCleaningDataSource
    +import org.apache.predictionio.core.EventWindow
    +
    +import org.apache.predictionio.controller.PDataSource
    +import org.apache.predictionio.controller.EmptyEvaluationInfo
    +import org.apache.predictionio.controller.EmptyActualResult
    +import org.apache.predictionio.controller.Params
    +import org.apache.predictionio.data.storage.Event
    +import org.apache.predictionio.data.storage.Storage
     
     import org.apache.spark.SparkContext
     import org.apache.spark.SparkContext._
     import org.apache.spark.rdd.RDD
     
     import grizzled.slf4j.Logger
     
    -case class DataSourceParams(appId: Int) extends Params
    +case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int) extends Params
     
     class DataSource(val dsp: DataSourceParams)
       extends PDataSource[TrainingData,
    -      EmptyEvaluationInfo, Query, EmptyActualResult] {
    +      EmptyEvaluationInfo, Query, EmptyActualResult] with SelfCleaningDataSource {
     
    -  @transient lazy val logger = Logger[this.type]
    +  @transient override lazy val logger = Logger[this.type]
    +
    +  override def appName = dsp.appName
    +  override def eventWindow = dsp.eventWindow
     
       override
       def readTraining(sc: SparkContext): TrainingData = {
    +    val events = cleanPersistedPEvents(sc) 
    +
    --- End diff --
    
    The `val events` isn't used here, so I removed the assignment.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by EmergentOrder <gi...@git.apache.org>.
Github user EmergentOrder commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r74988022
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/src/main/scala/DataSource.scala ---
    @@ -1,28 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +
     package com.test
     
    -import io.prediction.controller.PDataSource
    -import io.prediction.controller.EmptyEvaluationInfo
    -import io.prediction.controller.EmptyActualResult
    -import io.prediction.controller.Params
    -import io.prediction.data.storage.Event
    -import io.prediction.data.storage.Storage
    +import org.apache.predictionio.core.SelfCleaningDataSource
    +import org.apache.predictionio.core.EventWindow
    +
    +import org.apache.predictionio.controller.PDataSource
    +import org.apache.predictionio.controller.EmptyEvaluationInfo
    +import org.apache.predictionio.controller.EmptyActualResult
    +import org.apache.predictionio.controller.Params
    +import org.apache.predictionio.data.storage.Event
    +import org.apache.predictionio.data.storage.Storage
     
     import org.apache.spark.SparkContext
     import org.apache.spark.SparkContext._
     import org.apache.spark.rdd.RDD
     
     import grizzled.slf4j.Logger
     
    -case class DataSourceParams(appId: Int) extends Params
    +case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int) extends Params
     
     class DataSource(val dsp: DataSourceParams)
       extends PDataSource[TrainingData,
    -      EmptyEvaluationInfo, Query, EmptyActualResult] {
    +      EmptyEvaluationInfo, Query, EmptyActualResult] with SelfCleaningDataSource {
     
    -  @transient lazy val logger = Logger[this.type]
    +  @transient override lazy val logger = Logger[this.type]
    +
    +  override def appName = dsp.appName
    +  override def eventWindow = dsp.eventWindow
     
       override
       def readTraining(sc: SparkContext): TrainingData = {
    +    val events = cleanPersistedPEvents(sc) 
    +
    --- End diff --
    
    No, it returns Unit now. Will check this.



[GitHub] incubator-predictionio issue #269: ActionML Master merge PR, discussion only

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/269
  
    I think this is ready for review. 
    
    The main changes here support trimming data in PEvents and LEvents, somewhat like the formerly experimental Cleanup App did. They also compact $set/$unset events and remove duplicates. All of this is optional, to support templates that do not want some of these features. The feature is a trait that must be added to a template to be used at all, so it should have no effect on existing templates unless they are altered. As far as I know, it should then work for all of them.
    
    Some minor libs were upgraded.
    
    Some docs were modified to add information.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793249
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/build.sbt ---
    @@ -7,6 +7,6 @@ name := "template-scala-parallel-similarproduct"
     organization := "io.prediction"
     
     libraryDependencies ++= Seq(
    -  "io.prediction"    %% "core"          % "0.8.6" % "provided",
    +  "io.prediction"    %% "core"          % "0.9.6" % "provided",
       "org.apache.spark" %% "spark-core"    % "1.2.0" % "provided",
    -  "org.apache.spark" %% "spark-mllib"   % "1.2.0" % "provided")
    \ No newline at end of file
    +  "org.apache.spark" %% "spark-mllib"   % "1.2.0" % "provided")
    --- End diff --
    
    ready for review



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793112
  
    --- Diff: build.sbt ---
    @@ -18,9 +18,9 @@ import UnidocKeys._
     
     name := "pio"
     
    -version in ThisBuild := "0.9.6"
    +version in ThisBuild := "0.10.0-snapshot"
     
    -organization in ThisBuild := "io.prediction"
    +organization in ThisBuild := "org.apache"
     
    --- End diff --
    
    Is this correct?



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793137
  
    --- Diff: core/build.sbt ---
    @@ -27,13 +27,14 @@ libraryDependencies ++= Seq(
       "io.spray"               %% "spray-routing"    % "1.3.3",
       "net.jodah"               % "typetools"        % "0.3.1",
       "org.apache.spark"       %% "spark-core"       % sparkVersion.value % "provided",
    +  "org.apache.spark"       %% "spark-sql"        % sparkVersion.value % "provided",
       "org.clapper"            %% "grizzled-slf4j"   % "1.0.2",
       "org.elasticsearch"       % "elasticsearch"    % elasticsearchVersion.value,
       "org.json4s"             %% "json4s-native"    % json4sVersion.value,
       "org.json4s"             %% "json4s-ext"       % json4sVersion.value,
    -  "org.scalaj"             %% "scalaj-http"      % "1.1.0",
    -  "org.scalatest"          %% "scalatest"        % "2.1.6" % "test",
    -  "org.slf4j"               % "slf4j-log4j12"    % "1.7.13",
    +  "org.scalaj"             %% "scalaj-http"      % "1.1.6",
    +  "org.scalatest"          %% "scalatest"        % "2.1.7" % "test",
    +  "org.slf4j"               % "slf4j-log4j12"    % "1.7.18",
       "org.specs2"             %% "specs2"           % "2.3.13" % "test")
     
     //testOptions := Seq(Tests.Filter(s => Seq("Dev").exists(s.contains(_))))
    --- End diff --
    
    ready for review
    
    Upgraded library versions and added spark-sql.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793029
  
    --- Diff: RELEASE.md ---
    @@ -65,7 +106,9 @@ March 4th, 2015 | [Release Notes](https://predictionio.atlassian.net/jira/secure
     
     ###v0.8.6
     
    -Feb 10th, 2015 | [Release Notes](https://predictionio.atlassian.net/jira/secure/ReleaseNote.jspa?projectId=10000&version=13300)
    +Feb 10th, 2015
    +
    +[Release Notes](https://predictionio.atlassian.net/jira/secure/ReleaseNote.jspa?projectId=10000&version=13300)
     
     - New engine template - [Product Ranking](/templates/productranking/quickstart/) for personalized product listing
     - [CloudFormation deployment](/system/deploy-cloudformation/) available
    --- End diff --
    
    Slightly reformatted, and added the AML history that is the subject of this PR.
    
    Ready for review



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel closed the pull request at:

    https://github.com/apache/incubator-predictionio/pull/269



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793218
  
    --- Diff: data/src/main/scala/io/prediction/data/storage/hbase/HBPEvents.scala ---
    @@ -109,4 +109,25 @@ class HBPEvents(client: HBClient, config: StorageClientConfig, namespace: String
     
       }
     
    +  def delete(
    +    eventIds: RDD[String], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit = {
    +
    +    checkTableExists(appId, channelId)
    +
    +    val tableName = HBEventsUtil.tableName(namespace, appId, channelId) 
    + 
    +    eventIds.foreachPartition{ iter =>
    +      val conf = HBaseConfiguration.create()
    +      conf.set(TableOutputFormat.OUTPUT_TABLE,
    +        tableName)
    +
    +      val table = new HTable(conf, tableName)
    +      iter.foreach { id =>
    +        val rowKey = HBEventsUtil.RowKey(id)
    +        val delete = new Delete(rowKey.b)
    +        table.delete(delete)
    +      }
    +      table.close 
    +    } 
    +  }
     }
    --- End diff --
    
    ready for review
    
    Supports `SelfCleaningDataSource`.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793210
  
    --- Diff: data/src/main/scala/io/prediction/data/storage/PEvents.scala ---
    @@ -179,4 +179,8 @@ trait PEvents extends Serializable {
         */
       @DeveloperApi
       def write(events: RDD[Event], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit
    +
    +  @DeveloperApi 
    +  def delete(eventIds: RDD[String], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit
    +
     }
    --- End diff --
    
    ready for review
    
    Fixes a problem in the example Cleanup App, where it has to do `rdd.collect` to actually delete events. Collect does not scale.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793051
  
    --- Diff: bin/pio-start-all ---
    @@ -57,6 +57,26 @@ else
       exit 1
     fi
     
    +#PGSQL
    +pgsqlStatus="$(ps auxwww | grep postgres | wc -l)"
    +if [[ "$pgsqlStatus" < 5 ]]; then
    +  # Detect OS
    +  OS=`uname`
    +  if [[ "$OS" = "Darwin" ]]; then
    +    pg_cmd=`which pg_ctl`
    +    if [[ "$pg_cmd" != "" ]]; then
    +      pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
    +    fi
    +  elif [[ "$OS" = "Linux" ]]; then
    +    sudo service postgresql start
    +  else
    +    echo -e "\033[1;31mYour OS $OS is not yet supported for automatic postgresql startup:(\033[0m"
    +    echo -e "\033[1;31mPlease do a manual startup!\033[0m"
    +    ${PIO_HOME}/bin/pio-stop-all
    +    exit 1
    +  fi
    +fi
    +
     # PredictionIO Event Server
     echo "Waiting 10 seconds for HBase to fully initialize..."
     sleep 10
    --- End diff --
    
    ready for review
    
    This allows the command to start the DBs and Elasticsearch, if they are installed.
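
A minimal sketch of the "is PostgreSQL already running?" check from the diff above, written as a standalone bash script. The helper name `needs_start` is hypothetical and not part of `bin/pio-start-all`; the threshold of 5 is simply copied from the diff. The one deliberate difference is the comparison operator: inside `[[ ]]`, `<` compares strings lexicographically, so a count such as 12 would sort before 5, whereas `-lt` compares integers.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in bin/pio-start-all): succeeds when the number of
# matching processes is below the threshold, i.e. the server should be started.
needs_start() {
  local count="$1" threshold="$2"
  # -lt is an integer comparison; [[ "12" < "5" ]] would be true as strings.
  [[ "$count" -lt "$threshold" ]]
}

# Count postgres processes. The [p] bracket keeps grep from matching itself;
# errors from ps are ignored so the count falls back to 0.
pg_count="$(ps auxwww 2>/dev/null | grep -c '[p]ostgres' || true)"

if needs_start "$pg_count" 5; then
  echo "postgres looks stopped ($pg_count matching processes)"
else
  echo "postgres looks running ($pg_count matching processes)"
fi
```

Because this version excludes the `grep` process itself from the count, the threshold may need adjusting for a given installation.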



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793046
  
    --- Diff: bin/install.sh ---
    @@ -389,16 +409,10 @@ case $source_setup in
         ${SED_CMD} "s|# PIO_STORAGE_SOURCES_LOCALFS|PIO_STORAGE_SOURCES_LOCALFS|" ${pio_dir}/conf/pio-env.sh
         ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE|" ${pio_dir}/conf/pio-env.sh
         ${SED_CMD} "s|# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=.*|PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$elasticsearch_dir|" ${pio_dir}/conf/pio-env.sh
    -
         echo -e "\033[1;32mElasticsearch setup done!\033[0m"
     
         # HBase
         echo -e "\033[1;36mStarting HBase setup in:\033[0m $hbase_dir"
    -    if [[ -e hbase-${HBASE_VERSION}-bin.tar.gz ]]; then
    -      if confirm "Delete existing hbase-$HBASE_VERSION-bin.tar.gz?"; then
    -        rm hbase-${HBASE_VERSION}-bin.tar.gz
    -      fi
    -    fi
         if [[ ! -e hbase-${HBASE_VERSION}-bin.tar.gz ]]; then
           echo "Downloading HBase..."
           curl -O http://archive.apache.org/dist/hbase/${HBASE_VERSION}/hbase-${HBASE_VERSION}-bin.tar.gz
    --- End diff --
    
    ready for review.
    
    This adds an option that installs DBs *and* Elasticsearch, and fixes a couple of bugs.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793183
  
    --- Diff: data/build.sbt ---
    @@ -44,9 +44,9 @@ libraryDependencies ++= Seq(
       "org.json4s"             %% "json4s-native"  % json4sVersion.value,
       "org.json4s"             %% "json4s-ext"     % json4sVersion.value,
       "org.postgresql"          % "postgresql"     % "9.4-1204-jdbc41",
    -  "org.scalatest"          %% "scalatest"      % "2.1.6" % "test",
    -  "org.scalikejdbc"        %% "scalikejdbc"    % "2.3.2",
    -  "org.slf4j"               % "slf4j-log4j12"  % "1.7.13",
    +  "org.scalatest"          %% "scalatest"      % "2.1.7" % "test",
    +  "org.scalikejdbc"        %% "scalikejdbc"    % "2.3.5",
    +  "org.slf4j"               % "slf4j-log4j12"  % "1.7.18",
       "org.spark-project.akka" %% "akka-actor"     % "2.3.4-spark",
       "org.specs2"             %% "specs2"         % "2.3.13" % "test")
     
    --- End diff --
    
    ready for review
    
    Upgraded libs to match other upgraded modules.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793267
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/src/main/scala/ALSAlgorithm.scala ---
    @@ -45,7 +45,7 @@ class ALSAlgorithm(val ap: ALSAlgorithmParams)
     
       @transient lazy val logger = Logger[this.type]
     
    -  def train(data: PreparedData): ALSModel = {
    +  def train(sc: SparkContext, data: PreparedData): ALSModel = {
         require(!data.viewEvents.take(1).isEmpty,
           s"viewEvents in PreparedData cannot be empty." +
           " Please check if DataSource generates TrainingData" +
    --- End diff --
    
    ready for review



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73982644
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/src/main/scala/DataSource.scala ---
    @@ -1,28 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +
     package com.test
     
    -import io.prediction.controller.PDataSource
    -import io.prediction.controller.EmptyEvaluationInfo
    -import io.prediction.controller.EmptyActualResult
    -import io.prediction.controller.Params
    -import io.prediction.data.storage.Event
    -import io.prediction.data.storage.Storage
    +import org.apache.predictionio.core.SelfCleaningDataSource
    +import org.apache.predictionio.core.EventWindow
    +
    +import org.apache.predictionio.controller.PDataSource
    +import org.apache.predictionio.controller.EmptyEvaluationInfo
    +import org.apache.predictionio.controller.EmptyActualResult
    +import org.apache.predictionio.controller.Params
    +import org.apache.predictionio.data.storage.Event
    +import org.apache.predictionio.data.storage.Storage
     
     import org.apache.spark.SparkContext
     import org.apache.spark.SparkContext._
     import org.apache.spark.rdd.RDD
     
     import grizzled.slf4j.Logger
     
    -case class DataSourceParams(appId: Int) extends Params
    +case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int) extends Params
     
     class DataSource(val dsp: DataSourceParams)
       extends PDataSource[TrainingData,
    -      EmptyEvaluationInfo, Query, EmptyActualResult] {
    +      EmptyEvaluationInfo, Query, EmptyActualResult] with SelfCleaningDataSource {
     
    -  @transient lazy val logger = Logger[this.type]
    +  @transient override lazy val logger = Logger[this.type]
    +
    +  override def appName = dsp.appName
    +  override def eventWindow = dsp.eventWindow
     
       override
       def readTraining(sc: SparkContext): TrainingData = {
    +    val events = cleanPersistedPEvents(sc) 
    +
    --- End diff --
    
    @EmergentOrder does this still return events? Is this not built by default? Can you double check this test?
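On the first question: in the trait as posted in this PR, `cleanPersistedPEvents` is declared with result type `Unit`, so a `val` bound to its result would hold `()` rather than events. A minimal sketch of that behaviour, where `UnitBinding` and `cleanPersisted` are hypothetical stand-ins and not PredictionIO code:

```scala
object UnitBinding {
  // Hypothetical stand-in with the same result type as cleanPersistedPEvents in the diff.
  def cleanPersisted(): Unit = ()

  // The inferred type of `events` is Unit; it carries no event data.
  val events = cleanPersisted()
}
```

If the diff is read this way, the call is useful only for its side effect on the event store, and the `events` name is misleading.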



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793019
  
    --- Diff: README.md ---
    @@ -76,21 +71,17 @@ Status](https://gemnasium.com/PredictionIO/PredictionIO.svg)](https://gemnasium.
     
     Keep track of development and community news.
     
    +*   Subscribe to the user mailing list <ma...@predictionio.incubator.apache.org>
    +    and the dev mailing list <ma...@predictionio.incubator.apache.org>
     *   Follow [@predictionio](https://twitter.com/predictionio) on Twitter.
    -*   Read and subscribe to [the
    -    Newsletter](https://prediction.io/#newsletter).
    -*   Join the [Community
    -    Forum](https://groups.google.com/forum/#!forum/predictionio-user).
     
     
     ## Contributing
     
    -Please read and sign the [Contributor Agreement](http://prediction.io/cla). If
    -you have any questions, you can post on the [Contributor
    -Forum](https://groups.google.com/forum/#!forum/predictionio-dev).
    +Read the [Contribute Code](http://predictionio.incubator.apache.org/community/contribute-code/) page.
     
     You can also list your projects on the [Community Project
    -page](http://docs.prediction.io/community/projects/).
    +page](http://predictionio.incubator.apache.org//community/projects/).
     
     
     ## License
    --- End diff --
    
    ready for review. Basically took the README.md from another PR



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793297
  
    --- Diff: tools/build.sbt ---
    @@ -23,19 +21,22 @@ libraryDependencies ++= Seq(
       "io.spray"               %% "spray-can"      % "1.3.3",
       "io.spray"               %% "spray-routing"  % "1.3.3",
       "me.lessis"              % "semverfi_2.10"  % "0.1.3",
    -  "org.apache.hadoop"       % "hadoop-common"  % "2.7.1",
    -  "org.apache.hadoop"       % "hadoop-hdfs"    % "2.7.1",
    +  "org.apache.hadoop"       % "hadoop-common"  % "2.6.2",
    +  "org.apache.hadoop"       % "hadoop-hdfs"    % "2.6.2",
       "org.apache.spark"       %% "spark-core"     % sparkVersion.value % "provided",
    --- End diff --
    
    ready for review
    
    We don't need to require hdfs 2.7 as far as I know



[GitHub] incubator-predictionio issue #269: ActionML Master merge PR, discussion only

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/269
  
    Do you mind if I ask the purpose of this PR? I am really interested in looking through some code related to PredictionIO, but I am not aware of ActionML. I would appreciate it if the PR description were filled in, and maybe a JIRA opened if this one is a patch.



[GitHub] incubator-predictionio issue #269: ActionML Master merge PR, discussion only

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/269
  
    Notice the PR is from AML Master to Apache develop. When I fix the license and package names they show up as a diff even though the licenses are changed in the develop branch. This, and the fact that none of our wrong package names are showing as diffs, makes me think this PR is really showing diffs with the master branch. The only way to sort this out is to let git do the merge to Apache develop and fix any conflicts then. Since this will be a lot of work, can someone review please, or I'll just start the merge on my local copy.



[GitHub] incubator-predictionio issue #269: Master

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/269
  
    Just to track progress while merging the ActionML fork.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793121
  
    --- Diff: build.sbt ---
    @@ -193,3 +193,8 @@ concurrentRestrictions in Global := Seq(
       Tags.limitAll( 1 )
     )
     
    +parallelExecution := false
    +
    +parallelExecution in Global := false
    +
    +testOptions in Test += Tests.Argument("-oDF")
    --- End diff --
    
    Not sure why this was done. @EmergentOrder ?



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793228
  
    --- Diff: data/src/main/scala/io/prediction/data/storage/jdbc/JDBCPEvents.scala ---
    @@ -155,6 +156,22 @@ class JDBCPEvents(client: String, config: StorageClientConfig, namespace: String
         val prop = new java.util.Properties
         prop.setProperty("user", config.properties("USERNAME"))
         prop.setProperty("password", config.properties("PASSWORD"))
    -    eventDF.write.mode(SaveMode.Append).jdbc(client, tableName, prop)
    +    eventDF.write.mode(SaveMode.Overwrite).jdbc(client, tableName, prop)
    +  }
    +  
    +  def delete(eventIds: RDD[String], appId: Int, channelId: Option[Int])(sc: SparkContext): Unit = {
    +
    +    eventIds.foreachPartition{ iter =>
    +
    +      iter.foreach { eventId =>
    +        DB localTx { implicit session =>
    +        val tableName = JDBCUtils.eventTableName(namespace, appId, channelId)
    +        sql"""
    +        delete from $tableName where id = $eventId
    +        """.update().apply()
    +        true
    +        }
    +      }
    +    }
       }
     }
    --- End diff --
    
    ready for review
    
    Supports SelfCleaningDatasource



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73806546
  
    --- Diff: core/src/main/scala/io/prediction/core/SelfCleaningDataSource.scala ---
    @@ -0,0 +1,313 @@
    +package org.apache.predictionio.core
    +
    +import grizzled.slf4j.Logger
    +import org.apache.predictionio.annotation.DeveloperApi
    +import org.apache.predictionio.data.storage.{DataMap, Event,Storage}
    +import org.apache.predictionio.data.store.{Common, LEventStore, PEventStore}
    +import org.apache.spark.SparkContext
    --- End diff --
    
    changed package names, this is a new file so diff didn't catch these



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793246
  
    --- Diff: docs/manual/data/versions.yml ---
    @@ -1,6 +1,6 @@
     pio: 0.9.6
     spark: 1.5.2
     spark_download_filename: spark-1.5.1-bin-hadoop2.6
    -elasticsearch_download_filename: elasticsearch-1.4.4
    -hbase_basename: hbase-1.0.0
    +elasticsearch_download_filename: elasticsearch-1.5.2
    +hbase_basename: hbase-1.1.2
     hbase_variant: bin
    --- End diff --
    
    ready for review
    
    base versions updated



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793235
  
    --- Diff: data/src/main/scala/io/prediction/data/store/PEventStore.scala ---
    @@ -24,6 +24,8 @@ import org.joda.time.DateTime
     import org.apache.spark.SparkContext
     import org.apache.spark.rdd.RDD
     
    +import scala.concurrent.ExecutionContext
    +
     /** This object provides a set of operation to access Event Store
       * with Spark's parallelization
       */
    --- End diff --
    
    ready for review
    
    Supports SelfCleaningDatasource



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793158
  
    --- Diff: core/src/main/scala/io/prediction/core/SelfCleaningDataSource.scala ---
    @@ -0,0 +1,313 @@
    +package io.prediction.core
    +
    +import grizzled.slf4j.Logger
    +import io.prediction.annotation.DeveloperApi
    +import io.prediction.data.storage.{DataMap, Event,Storage}
    +import io.prediction.data.store.{Common, LEventStore, PEventStore}
    +import org.apache.spark.SparkContext
    +import org.apache.spark.rdd.RDD
    +import org.joda.time.DateTime
    +
    +import scala.concurrent.ExecutionContext.Implicits.global
    +import scala.concurrent.{Await, Future}
    +import scala.concurrent.duration.Duration
    +
    +/** :: DeveloperApi ::
    +  * Base class of cleaned data source.
    +  *
    +  * A cleaned data source consists tools for cleaning events that happened earlier that
    +  * specified duration in seconds from train moment. Also it can remove duplicates and compress
    +  * properties(flat set/unset events to one)
    +  *
    +  */
    +@DeveloperApi
    +trait SelfCleaningDataSource {
    +
    +  implicit object DateTimeOrdering extends Ordering[DateTime] {
    +  def compare(d1: DateTime, d2: DateTime) = d2.compareTo(d1)
    +  }
    +
    +
    +  @transient lazy private val pEventsDb = Storage.getPEvents()
    +  @transient lazy private val lEventsDb = Storage.getLEvents()
    +
    +  /** :: DeveloperApi ::
    +    * Current App name which events will be cleaned.
    +    *
    +    * @return App name
    +    */
    +  @DeveloperApi
    +  def appName: String
    +
    +  /** :: DeveloperApi ::
    +    * Param list that used for cleanup.
    +    *
    +    * @return current event windows that will be used to clean up events.
    +    */
    +  @DeveloperApi
    +  def eventWindow: Option[EventWindow] = None
    +
    +  @transient lazy val logger = Logger[this.type]
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Returns RDD of events happend after duration in event window params.
    +    *
    +    * @return RDD[Event] most recent PEvents.
    +    */
    +  @DeveloperApi
    +  def getCleanedPEvents(pEvents: RDD[Event]): RDD[Event] = { 
    +
    +    eventWindow
    +      .flatMap(_.duration)
    +      .map { duration =>
    +        val fd = Duration(duration)
    +        pEvents.filter(e =>
    +          e.eventTime.isAfter(DateTime.now().minus(fd.toMillis))
    +        )
    +      }.getOrElse(pEvents)
    +  }
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Returns Iterator of events happend after duration in event window params.
    +    *
    +    * @return Iterator[Event] most recent LEvents.
    +    */
    +  @DeveloperApi
    +  def getCleanedLEvents(lEvents: Iterable[Event]): Iterable[Event] = { 
    +
    +    eventWindow
    +      .flatMap(_.duration)
    +      .map { duration =>
    +        val fd = Duration(duration)
    +        lEvents.filter(e =>
    +          e.eventTime.isAfter(DateTime.now().minus(fd.toMillis))
    +        )
    +      }.getOrElse(lEvents).toIterable
    +  }
    +
    +  def compressPProperties(sc: SparkContext, rdd: RDD[Event]): RDD[Event] = {
    +    rdd.filter(isSetEvent)
    +      .groupBy(_.entityType)
    +      .flatMap { pair =>
    +        val (_, ls) = pair
    +        ls.groupBy(_.entityId).map { anotherpair =>
    +          val (_, anotherls) = anotherpair 
    +          compress(anotherls)
    +        }
    +      } ++ rdd.filter(!isSetEvent(_))
    +  }
    +
    +  def compressLProperties(events: Iterable[Event]): Iterable[Event] = {
    +    events.filter(isSetEvent).toIterable
    +      .groupBy(_.entityType)
    +      .map { pair =>
    +        val (_, ls) = pair
    +        compress(ls)
    +      } ++ events.filter(!isSetEvent(_))
    +  }
    +
    +  def removePDuplicates(sc: SparkContext, rdd: RDD[Event]): RDD[Event] = { 
    +    val now = DateTime.now()
    +    rdd.map(x => 
    +      (recreateEvent(x, None, now), (x.eventId, x.eventTime)))
    +      .groupByKey
    +      .map{case (x, y) => recreateEvent(x, y.head._1, y.head._2)}
    +
    +  }
    +
    +  def recreateEvent(x: Event, eventId: Option[String], creationTime: DateTime): Event = {
    +    Event(eventId = eventId, event = x.event, entityType = x.entityType, 
    +          entityId = x.entityId, targetEntityType = x.targetEntityType, 
    +          targetEntityId = x.targetEntityId, properties = x.properties, 
    +          eventTime = creationTime, tags = x.tags, prId= x.prId, 
    +          creationTime = creationTime)  
    +  }
    +
    +  def removeLDuplicates(ls: Iterable[Event]): Iterable[Event] = {
    +    val now = DateTime.now()
    +    ls.toList.map(x => 
    +      (recreateEvent(x, None, now), (x.eventId, x.eventTime)))
    +      .groupBy(_._1).mapValues( _.map( _._2 ) ) 
    +      .map(x => recreateEvent(x._1, x._2.head._1, x._2.head._2))
    +
    +  }
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Filters most recent, compress properties and removes duplicates of PEvents
    +    *
    +    * @return RDD[Event] most recent PEvents
    +    */
    +  @DeveloperApi
    +  def cleanPersistedPEvents(sc: SparkContext): Unit ={
    +     eventWindow match {
    +      case Some(ew) => 
    +        val result = cleanPEvents(sc)
    +        val originalEvents = PEventStore.find(appName)(sc)
    +        val newEvents = result subtract originalEvents
    +        val eventsToRemove = (originalEvents subtract result).map { case e =>
    +          e.eventId.getOrElse("")
    +        }
    +
    +        wipePEvents(newEvents, eventsToRemove, sc)
    +       case None =>
    +    }
    +  }
    +  
    +   /**
    +    * Replace events in Event Store
    +    *
    +    */
    +
    +  def wipePEvents(
    +    newEvents: RDD[Event],
    +    eventsToRemove: RDD[String],
    +    sc: SparkContext
    +  ): Unit = {
    +    val (appId, channelId) = Common.appNameToId(appName, None)
    +  
    +    pEventsDb.write(newEvents, appId)(sc)
    +  
    +    removePEvents(eventsToRemove, appId, sc)
    +  }
    +
    +  def removeEvents(eventsToRemove: Set[String], appId: Int) {
    +    val listOfFuture: List[Future[Boolean]] = eventsToRemove.filter(x =>  x != "").toList.map { case eventId =>
    +        lEventsDb.futureDelete(eventId, appId)
    +    }
    +
    +    val futureOfList: Future[List[Boolean]] = Future.sequence(listOfFuture)
    +    Await.result(futureOfList, scala.concurrent.duration.Duration(60, "minutes"))
    +  }
    +
    +  def removePEvents(eventsToRemove: RDD[String], appId: Int, sc: SparkContext) { 
    +    pEventsDb.delete(eventsToRemove.filter(x =>  x != ""), appId, None)(sc)
    +  }
    +
    +
    +   /**
    +    * Replace events in Event Store
    +    *
    +    * @param events new events
    +    * @param appId delete all events of appId
    +    * @param channelId delete all events of channelId
    +    */
    +  def wipe(
    +    newEvents: Set[Event],
    +    eventsToRemove: Set[String]
    +  ): Unit = { 
    +    val (appId, channelId) = Common.appNameToId(appName, None)
    +   
    +    val listOfFutureNewEvents: List[Future[String]] = newEvents.toList.map { case event =>
    +        lEventsDb.futureInsert(recreateEvent(event, None, event.eventTime), appId)
    +    } 
    +
    +    val futureOfListNewEvents: Future[List[String]] = Future.sequence(listOfFutureNewEvents)
    +    Await.result(futureOfListNewEvents, scala.concurrent.duration.Duration(60, "minutes")) 
    +
    +    removeEvents(eventsToRemove, appId)
    +  }
    +
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Filters most recent, compress properties of PEvents
    +    */
    +  @DeveloperApi
    +  def cleanPEvents(sc: SparkContext): RDD[Event] = {
    +   val pEvents = PEventStore.find(appName)(sc).sortBy(_.eventTime)
    +
    +   val rdd = eventWindow match {
    +      case Some(ew) =>
    +        var updated =
    +          if (ew.compressProperties) compressPProperties(sc, pEvents) else pEvents
    +        
    +        val deduped = if (ew.removeDuplicates) removePDuplicates(sc,updated) else updated
    +        deduped
    +      case None =>
    +        pEvents
    +    }
    +  getCleanedPEvents(rdd)
    +  }
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Filters most recent, compress properties and removes duplicates of LEvents
    +    *
    +    * @return Iterator[Event] most recent LEvents
    +    */
    +  @DeveloperApi
    +  def cleanPersistedLEvents: Unit = {
    +    eventWindow match {
    +      case Some(ew) =>
    +
    +        val result = cleanLEvents().toSet
    +        val originalEvents = LEventStore.find(appName).toSet
    +        val newEvents = result -- originalEvents
    +        val eventsToRemove = (originalEvents -- result).map { case e =>
    +          e.eventId.getOrElse("")
    +        }
    +
    +        wipe(newEvents, eventsToRemove)
    +  
    +       case None =>
    +    }
    +  }
    +
    +  /** :: DeveloperApi ::
    +    *
    +    * Filters most recent, compress properties of LEvents
    +    */
    +  @DeveloperApi
    +  def cleanLEvents(): Iterable[Event] = {
    +    val lEvents = LEventStore.find(appName).toList.sortBy(_.eventTime)
    + 
    +    val events = eventWindow match {
    +      case Some(ew) =>
    +        var updated =
    +          if (ew.compressProperties) compressLProperties(lEvents) else lEvents
    +          val deduped = if (ew.removeDuplicates) removeLDuplicates(updated) else updated
    +        deduped
    +      case None =>
    +        lEvents
    +    }
    +    getCleanedLEvents(events)
    +  }
    +
    +
    +  private def isSetEvent(e: Event): Boolean = {
    +    e.event == "$set" || e.event == "$unset"
    +  }
    +
    +  private def compress(events: Iterable[Event]): Event = {
    +    events.find(_.event == "$set") match {
    +
    +      case Some(first) =>
    +        events.reduce { (e1, e2) =>
    +          val props = e2.event match {
    +            case "$set" =>
    +              e1.properties.fields ++ e2.properties.fields
    +            case "$unset" =>
    +              e1.properties.fields
    +                .filterKeys(f => !e2.properties.fields.contains(f))
    +          }
    +          e1.copy(properties = DataMap(props))
    +        }
    +
    +      case None =>
    +        events.reduce { (e1, e2) =>
    +          e1.copy(properties =
    +            DataMap(e1.properties.fields ++ e2.properties.fields)
    +          )
    +        }
    +    }
    +  }
    +}
    +
    +case class EventWindow(
    +  duration: Option[String] = None,
    +  removeDuplicates: Boolean = false,
    +  compressProperties: Boolean = false
    +)
    --- End diff --
    
    Add the notion of an EventWindow to the Datasource. This provides for deduping events, compressing $set/$unset events, and trimming older events at the desired watermark.
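The three behaviours described in this comment can be sketched without Spark or PredictionIO. This is a minimal illustration only, not the code in the PR: `Ev`, `Window`, and `EventWindowSketch` are hypothetical stand-ins, timestamps are plain epoch milliseconds, and keeping the most recent copy of a duplicate is a choice made for the sketch.

```scala
import scala.concurrent.duration.Duration

// Hypothetical, simplified stand-ins for the Event and EventWindow types in the diff.
final case class Ev(
  event: String,
  entityType: String,
  entityId: String,
  properties: Map[String, String],
  eventTimeMillis: Long
)

final case class Window(
  duration: Option[String] = None,
  removeDuplicates: Boolean = false,
  compressProperties: Boolean = false
)

object EventWindowSketch {
  private def isSetEvent(e: Ev): Boolean =
    e.event == "$set" || e.event == "$unset"

  // Replay one entity's $set/$unset events oldest first and emit a single $set.
  private def compress(events: Seq[Ev]): Ev = {
    val sorted = events.sortBy(_.eventTimeMillis)
    val props = sorted.foldLeft(Map.empty[String, String]) { (acc, e) =>
      if (e.event == "$set") acc ++ e.properties // later values win
      else acc -- e.properties.keys              // $unset drops the named keys
    }
    sorted.last.copy(event = "$set", properties = props)
  }

  def clean(events: Seq[Ev], w: Window, nowMillis: Long): Seq[Ev] = {
    // 1. Compress property events per (entityType, entityId).
    val compressed =
      if (w.compressProperties) {
        val (sets, others) = events.partition(isSetEvent)
        sets.groupBy(e => (e.entityType, e.entityId)).values.map(compress).toSeq ++ others
      } else events

    // 2. Treat events that differ only in timestamp as duplicates; keep the newest.
    val deduped =
      if (w.removeDuplicates)
        compressed.groupBy(_.copy(eventTimeMillis = 0L)).values
          .map(_.maxBy(_.eventTimeMillis)).toSeq
      else compressed

    // 3. Trim everything at or before the watermark (now minus the window length).
    w.duration.map(Duration(_)).fold(deduped) { d =>
      val cutoff = nowMillis - d.toMillis
      deduped.filter(_.eventTimeMillis > cutoff)
    }
  }
}
```

The order of the steps (compress, dedupe, then trim) follows `cleanPEvents` in the diff above. Trimming last means that property values set before the watermark can still survive inside a compressed `$set` event, provided a later `$set` or `$unset` for that entity falls inside the window.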



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793260
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/engine.json ---
    @@ -5,6 +5,10 @@
       "datasource": {
         "params" : {
           "appId": 1
    +      "eventWindow": {
    +        "duration": "5 minutes",
    +        "removeDuplicates":true,
    +        "compressProperties":true
         }
       },
       "algorithms": [
    --- End diff --
    
    ready for review
    
    adds the eventWindow for SelfCleaningDatasource tests.
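The `duration` value in this block is a plain string. In the trait posted in this PR it is passed to `scala.concurrent.duration.Duration(...)` and converted with `toMillis`, so `"5 minutes"` amounts to a 300000 ms window. A small sketch of that conversion, where `DurationParsing` and `cutoffMillis` are hypothetical names used only for illustration:

```scala
import scala.concurrent.duration.Duration

object DurationParsing {
  // Timestamp (epoch millis) before which events fall outside the window.
  def cutoffMillis(window: String, nowMillis: Long): Long =
    nowMillis - Duration(window).toMillis
}
```

With a window this short, a training run would keep only events from the last five minutes, which suits a test fixture more than a production setting.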



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793283
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/src/main/scala/DataSource.scala ---
    @@ -13,16 +16,21 @@ import org.apache.spark.rdd.RDD
     
     import grizzled.slf4j.Logger
     
    -case class DataSourceParams(appId: Int) extends Params
    +case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int) extends Params
     
     class DataSource(val dsp: DataSourceParams)
       extends PDataSource[TrainingData,
    -      EmptyEvaluationInfo, Query, EmptyActualResult] {
    +      EmptyEvaluationInfo, Query, EmptyActualResult] with SelfCleaningDataSource {
    +
    +  @transient override lazy val logger = Logger[this.type]
     
    -  @transient lazy val logger = Logger[this.type]
    +  override def appName = dsp.appName
    +  override def eventWindow = dsp.eventWindow
     
       override
       def readTraining(sc: SparkContext): TrainingData = {
    +    val events = cleanPersistedPEvents(sc) 
    +
         val eventsDb = Storage.getPEvents()
     
         // create a RDD of (entityID, User)
    --- End diff --
    
    ready for review
    
    adds vals needed by the SelfCleaningDatasource trait as an example and for tests.
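The pattern in this diff is a trait with one abstract member (`appName`) and one member with a default (`eventWindow`), both supplied from the data source params. A stripped-down sketch of that shape follows. `SelfCleaning`, `Params`, `MyDataSource`, and `cleaningEnabled` are hypothetical stand-ins and do not exist in PredictionIO; `EventWindow` is redeclared here only so the block is self-contained.

```scala
// Redeclared locally with the same fields as the case class in the PR.
final case class EventWindow(
  duration: Option[String] = None,
  removeDuplicates: Boolean = false,
  compressProperties: Boolean = false
)

// Hypothetical stand-in for the SelfCleaningDataSource trait.
trait SelfCleaning {
  def appName: String                         // abstract: every data source must supply it
  def eventWindow: Option[EventWindow] = None // default: no cleaning configured
  def cleaningEnabled: Boolean = eventWindow.isDefined
}

// Hypothetical params, mirroring the appName and eventWindow fields in the diff.
final case class Params(appName: String, eventWindow: Option[EventWindow])

class MyDataSource(val dsp: Params) extends SelfCleaning {
  override def appName: String = dsp.appName
  override def eventWindow: Option[EventWindow] = dsp.eventWindow
}
```

Because `eventWindow` defaults to `None`, an engine that omits the block from its params would get a data source with cleaning switched off.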



[GitHub] incubator-predictionio issue #269: ActionML Master merge PR, discussion only

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/269
  
    Inactive PR; everything is merged into the develop branch.



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73981349
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/build.sbt ---
    @@ -7,6 +7,6 @@ name := "template-scala-parallel-similarproduct"
     organization := "io.prediction"
     
     libraryDependencies ++= Seq(
    -  "io.prediction"    %% "core"          % "0.8.6" % "provided",
    +  "io.prediction"    %% "core"          % "0.9.6" % "provided",
    --- End diff --
    
    @EmergentOrder isn't this wrong? This was the old aml 0.9.6. Don't we need 0.10.0-SNAPSHOT until release?



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793171
  
    --- Diff: core/src/test/scala/io/prediction/core/SelfCleaningDataSourceTest.scala ---
    @@ -0,0 +1,78 @@
    +package io.prediction.core.test
    +
    +import io.prediction.core.SelfCleaningDataSource
    +import io.prediction.core.EventWindow
    +import io.prediction.workflow.SharedSparkContext
    +
    +import io.prediction.controller.PDataSource
    +import io.prediction.controller.EmptyEvaluationInfo
    +import io.prediction.controller.EmptyActualResult
    +import io.prediction.controller.Params
    +import io.prediction.data.storage.Event
    +import io.prediction.data.storage.Storage
    +import io.prediction.data.store._
    +
    +import org.apache.spark.SparkContext
    +import org.apache.spark.SparkContext._
    +
    +import org.apache.spark.rdd.RDD
    +import org.scalatest.Inspectors._
    +import org.scalatest.Matchers._
    +import org.scalatest.FunSuite
    +import org.scalatest.Inside
    +
    +case class DataSourceParams(appName: String, eventWindow: Option[EventWindow], appId: Int) extends Params
    +
    +class SelfCleaningPDataSource(anAppName: String) extends PDataSource[TrainingData,EmptyEvaluationInfo, Query, EmptyActualResult] with SelfCleaningDataSource {
    +
    +  val (appId, channelId) = io.prediction.data.store.Common.appNameToId(anAppName, None)
    +
    +
    +  val dsp = DataSourceParams(anAppName, Some(EventWindow(Some("1825 days"), true, true)), appId = appId)
    +
    +  override def appName = dsp.appName
    +  override def eventWindow = dsp.eventWindow
    +
    +  override def readTraining(sc: SparkContext): TrainingData = new TrainingData()
    +
    +  def events = Storage.getPEvents().find(appId = dsp.appId)_
    +
    +  def itemEvents = Storage.getPEvents().find(appId = dsp.appId, entityType = Some("item"), eventNames = Some(Seq("$set")))_  
    + 
    +  def eventsAgg = Storage.getPEvents().aggregateProperties(appId = dsp.appId, entityType = "item")_
    +
    +}
    +
    +class SelfCleaningDataSourceTest extends FunSuite with Inside with SharedSparkContext {
    +
    +  //To run manually, requires app "cleanedTest" and test.json data imported to it
    +  ignore("Test event cleanup") {
    +    val source = new SelfCleaningPDataSource("cleanedTest")
    +    val eventsBeforeCount = source.events(sc).count
    +    val itemEventsBeforeCount = source.itemEvents(sc).count
    +
    +    source.cleanPersistedPEvents(sc)
    +
    +    val eventsAfterCount = source.events(sc).count
    +    val eventsAfter = source.events(sc)
    +    val itemEventsAfterCount = source.itemEvents(sc).count   
    +    val distinctEventsAfterCount = eventsAfter.map(x => 
    +      CleanedDataSourceTest.stripIdAndCreationTimeFromEvents(x)).distinct.count
    +   
    +    distinctEventsAfterCount should equal (eventsAfterCount)
    +    eventsBeforeCount should be > (eventsAfterCount) 
    +    itemEventsBeforeCount should be > (itemEventsAfterCount)
    +  }
    +}
    +
    +object CleanedDataSourceTest{
    +  def stripIdAndCreationTimeFromEvents(x: Event): Event = {
    +   Event(event = x.event, entityType = x.entityType, entityId = x.entityId, targetEntityType = x.targetEntityType, targetEntityId = x.targetEntityId, properties = x.properties, eventTime = x.eventTime, tags = x.tags, prId= x.prId, creationTime = x.eventTime)
    +  }
    +}
    +
    +
    +
    +case class Query() extends Serializable
    +
    +class TrainingData() extends Serializable
    --- End diff --
    
    ready for review
    
    Tests for SelfCleaningDataSource



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by pferrel <gi...@git.apache.org>.
Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r73793055
  
    --- Diff: bin/pio-stop-all ---
    @@ -43,3 +43,18 @@ if [ -e ${PIDFILE} ]; then
       cat ${PIDFILE} | xargs kill
       rm ${PIDFILE}
     fi
    +
    +#PGSQL
    +OS=`uname`
    +if [[ "$OS" = "Darwin" ]]; then
    +  pg_cmd=`which pg_ctl`
    +  if [[ "$pg_cmd" != "" ]]; then
    +    pg_ctl -D /usr/local/var/postgres stop -s -m fast
    +  fi
    +elif [[ "$OS" = "Linux" ]]; then
    +  sudo service postgresql stop
    +else
    +  echo -e "\033[1;31mYour OS $OS is not yet supported for automatic postgresql shutdown :(\033[0m"
    +  echo -e "\033[1;31mPlease do a manual shutdown!\033[0m"
    +  exit 1
    +fi
    --- End diff --
    
    ready for review
    
    This allows the command to stop the databases (PostgreSQL in this hunk) as well as Elasticsearch, if ES was installed.
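
    The OS dispatch in the hunk above could also be written so that the stop command can be inspected before it is run. What follows is a minimal sketch, not the script in this PR: the function name pg_stop_command is hypothetical, and the Darwin data directory and the Linux service name are simply copied from the diff and may differ on other installations.

```shell
#!/usr/bin/env bash
# Sketch only: pg_stop_command is a hypothetical helper, not part of pio-stop-all.
# It prints the command that would stop PostgreSQL for the given OS name
# (defaulting to the output of uname) and returns 1 for unsupported systems.
pg_stop_command() {
  local os="${1:-$(uname)}"
  case "$os" in
    Darwin)
      # Data directory as used in the diff; adjust for other installs.
      echo "pg_ctl -D /usr/local/var/postgres stop -s -m fast"
      ;;
    Linux)
      echo "sudo service postgresql stop"
      ;;
    *)
      # Report on stderr so that stdout only ever carries a runnable command.
      echo "Your OS $os is not supported for automatic PostgreSQL shutdown" >&2
      return 1
      ;;
  esac
}
```

    A caller might then run `cmd=$(pg_stop_command) && $cmd`, and fall back to a manual shutdown when the function returns non-zero.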



[GitHub] incubator-predictionio pull request #269: ActionML Master merge PR, discussi...

Posted by EmergentOrder <gi...@git.apache.org>.
Github user EmergentOrder commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/269#discussion_r74987892
  
    --- Diff: examples/scala-parallel-similarproduct/filterbyyear/build.sbt ---
    @@ -7,6 +7,6 @@ name := "template-scala-parallel-similarproduct"
     organization := "io.prediction"
     
     libraryDependencies ++= Seq(
    -  "io.prediction"    %% "core"          % "0.8.6" % "provided",
    +  "io.prediction"    %% "core"          % "0.9.6" % "provided",
    --- End diff --
    
    Yes, although it will fail without a local build populating the lib in ivy.
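
    The failure mode mentioned here (a dependency that resolves only after a local build has populated Ivy) can be checked before building a template. This is a minimal sketch, assuming sbt's default publishLocal layout of organization/module/revision under ~/.ivy2/local. The function name ivy_local_has and the IVY_HOME override are hypothetical, and the module name and version in the usage note are illustrative, not values confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: ivy_local_has is a hypothetical helper.
# Assumes sbt's default publishLocal layout:
#   <ivy home>/local/<organization>/<module>/<revision>/
# IVY_HOME is an override invented for this sketch; it defaults to ~/.ivy2.
ivy_local_has() {
  local org="$1" module="$2" revision="$3"
  local ivy_home="${IVY_HOME:-$HOME/.ivy2}"
  # Succeeds only if a directory for this exact revision has been published locally.
  [ -d "$ivy_home/local/$org/$module/$revision" ]
}
```

    For example, `ivy_local_has io.prediction core_2.10 0.10.0-SNAPSHOT || echo "run a local build first"` would warn before sbt fails to resolve the dependency.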

