You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@predictionio.apache.org by marevol <gi...@git.apache.org> on 2017/04/18 03:11:08 UTC

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

GitHub user marevol opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/371

    [PIO-61] Add S3 Model Data Repository

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marevol/incubator-predictionio storage_s3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/371.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #371
    
----
commit 853d39939c737ddd1ada572db1d3454bb2d4e1aa
Author: Shinsuke Sugaya <sh...@yahoo.co.jp>
Date:   2017-04-17T09:43:25Z

    add S3 model data repository

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #371: [PIO-61] Add S3 Model Data Repository

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/371
  
    I would like to merge S3 support this week.
    For travis testing, it passed in https://travis-ci.org/jpioug/incubator-predictionio/builds/235569236 though apache-travis was failed...
    Please let me know if you have any comments/concerns.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-predictionio/pull/371


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by mars <gi...@git.apache.org>.
Github user mars commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r113494975
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    Hi @marevol,
    
    We're building a [predictionio-incubating branch to add authentication to the Elasticsearch REST client](https://github.com/apache/incubator-predictionio/pull/372) with Scala 2.11 & Spark 2.1.
    
    I believe it was during a runtime ES REST client call to `performRequest` suddenly an underlying method signature could not be located in `org.apache.http` package. @dszeto helped me to discover that the `shadeio.data` name was being incorrectly resolved. So, I tried [removing the shade rule](https://github.com/apache/incubator-predictionio/pull/372/files#diff-55cfeb297edd310e1efa8b6ac8bdbae6L39) and it started working. No more errors for that branch.
    
    Maybe the difference is that we're using Spark 2.1?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #371: [PIO-61] Add S3 Model Data Repository

Posted by dszeto <gi...@git.apache.org>.
Github user dszeto commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/371
  
    Sorry for the delay. I was on vacation.
    
    This looks great to me, especially the use of `localstack`. I think the Travis failed downloading a Python library and erred out. It shouldn't matter IMO. I'll merge this now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r113576948
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    Thank you for the info.
    My environment is Scala 2.10 and Spark 1.6.3.
    I'll check it on Spark 2.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r112782773
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    aws-java-sdk-s3 depends on them, but they conflict with different version files in spark-assembly jar.
    To avoid this conflict, the package replacement is needed in storage/s3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by dszeto <gi...@git.apache.org>.
Github user dszeto commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r112760158
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    Hey @marevol can you explain a little bit regarding these shade rules, please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #371: [PIO-61] Add S3 Model Data Repository

Posted by dszeto <gi...@git.apache.org>.
Github user dszeto commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/371
  
    @marevol Would you like to add some integration test for this? https://github.com/jubos/fake-s3 or https://github.com/atlassian/localstack might be useful for mocking S3 locally within Travis.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by dszeto <gi...@git.apache.org>.
Github user dszeto commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r113326594
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    I think @mars is seeing some issues with https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/build.sbt#L42.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r113695041
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    Checking on Scala 2.11/Spark 2.1/ES 5.1&5.3, I could not reproduce it yet...
    My test script file is [test_all.sh](https://github.com/jpioug/incubator-predictionio/blob/express-v1/test_all.sh) to build pio and run recommender template.
    Is it reproduced in Docker tests? I'd like to know steps to reproduce it...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on a diff in the pull request:

    https://github.com/apache/incubator-predictionio/pull/371#discussion_r113334667
  
    --- Diff: storage/s3/build.sbt ---
    @@ -0,0 +1,44 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import PIOBuild._
    +
    +name := "apache-predictionio-data-s3"
    +
    +libraryDependencies ++= Seq(
    +  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
    +  "com.google.guava"        % "guava"                     % "14.0.1"      % "provided",
    +  "com.amazonaws"           % "aws-java-sdk-s3"           % "1.11.118",
    +  "org.scalatest"           %% "scalatest"                % "2.1.7" % "test")
    +
    +parallelExecution in Test := false
    +
    +pomExtra := childrenPomExtra.value
    +
    +assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    +
    +assemblyShadeRules in assembly := Seq(
    +  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
    +  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
    +)
    --- End diff --
    
    @mars could you provide info?
    ES storage also conflicts with httpclient jar file of Spark.
    I think an exception occurs if removing it, so it will be needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #371: [PIO-61] Add S3 Model Data Repository

Posted by marevol <gi...@git.apache.org>.
Github user marevol commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/371
  
    Added LocalStack for S3 testing.
    LocalStack/moto seems to have a problem about ETag handling...
    I added DISABLE_CHUNKED_ENCODING option to avoid it in S3 testing.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---