Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2015/10/25 05:45:38 UTC

[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/9267

    [SPARK-6724] [MLlib] Support model save/load for FPGrowthModel

    Support model save/load for FPGrowthModel

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-6724

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9267
    
----
commit 81f667a4537b60071cb1888ca88aa4bd0734ad2d
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-10-25T04:44:11Z

    Support model save/load for FPGrowthModel

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585211
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    +      val freqType = freqItemsets.schema(1).dataType
    +      require(itemsetType.isInstanceOf[ArrayType],
    +        s"items should be ArrayType, but get $itemsetType")
    --- End diff --
    
    "get" --> "got"  (same in line below)



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-169139805
  
    @yanboliang No problem; thanks for your updates!  This LGTM
    
    Merging with master



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150897568
  
    **[Test build #44311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/consoleFull)** for PR 9267 at commit [`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (`



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585205
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    --- End diff --
    
    Add Since tag for object



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168896944
  
    **[Test build #48731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48731/consoleFull)** for PR 9267 at commit [`41b31eb`](https://github.com/apache/spark/commit/41b31ebbd8fe5743fc212c9db74c75c1b577bf67).



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r42961501
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    +      val freqType = freqItemsets.schema(1).dataType
    +      require(itemsetType.isInstanceOf[ArrayType],
    +        s"items should be ArrayType, but get $itemsetType")
    +      require(freqType.isInstanceOf[LongType], s"freq should be LongType, but get $freqType")
    +      val itemType = itemsetType.asInstanceOf[ArrayType].elementType
    +      val result = itemType match {
    +        case BooleanType => new FreqItemset(Array[Boolean](), 0L)
    +        case BinaryType => new FreqItemset(Array(Array[Byte]()), 0L)
    +        case StringType => new FreqItemset(Array[String](), 0L)
    +        case ByteType => new FreqItemset(Array[Byte](), 0L)
    +        case ShortType => new FreqItemset(Array[Short](), 0L)
    +        case IntegerType => new FreqItemset(Array[Int](), 0L)
    +        case LongType => new FreqItemset(Array[Long](), 0L)
    +        case FloatType => new FreqItemset(Array[Float](), 0L)
    +        case DoubleType => new FreqItemset(Array[Double](), 0L)
    +        case DateType => new FreqItemset(Array[java.sql.Date](), 0L)
    +        case DecimalType.SYSTEM_DEFAULT => new FreqItemset(Array[java.math.BigDecimal](), 0L)
    +        case TimestampType => new FreqItemset(Array[java.sql.Timestamp](), 0L)
    +        case _: ArrayType => new FreqItemset(Array[Seq[_]](), 0L)
    +        case _: MapType => new FreqItemset(Array[Map[_, _]](), 0L)
    +        case _: StructType => new FreqItemset(Array[Row](), 0L)
    +        case other =>
    +          throw new UnsupportedOperationException(s"Schema for type $other is not supported")
    +      }
    +      result
    +    }
    --- End diff --
    
    Maybe we can make this inference one of the Spark SQL functions, just like ```ScalaReflection.schemaFor```?



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585207
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    --- End diff --
    
    no need for private modifier here



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150891871
  
    **[Test build #44311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/consoleFull)** for PR 9267 at commit [`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150897588
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/
    Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168901991
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168147085
  
    @jkbradley I updated the PR, and it is no longer necessary to provide ```TypeTag```. However, I cannot figure out a way to eliminate ```inferItemType```, because there is no existing API that maps DataFrame data types to Scala types.
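
    For illustration, the hand-written mapping that ```inferItemType``` performs in the diff quoted in this thread can be pictured as a lookup table. The following is a minimal sketch in plain Java, not Spark code: the string keys stand in for Spark SQL data type objects, only a subset of the types from the diff is listed, and ```ItemTypeLookup``` and ```classFor``` are hypothetical names introduced here.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: maps the name of a schema
// element type to the JVM class that would hold items of that type.
final class ItemTypeLookup {

    // Keys are stand-ins (assumption) for Spark SQL DataType objects.
    private static final Map<String, Class<?>> ELEMENT_CLASSES = new LinkedHashMap<>();

    static {
        ELEMENT_CLASSES.put("BooleanType", Boolean.class);
        ELEMENT_CLASSES.put("StringType", String.class);
        ELEMENT_CLASSES.put("ByteType", Byte.class);
        ELEMENT_CLASSES.put("ShortType", Short.class);
        ELEMENT_CLASSES.put("IntegerType", Integer.class);
        ELEMENT_CLASSES.put("LongType", Long.class);
        ELEMENT_CLASSES.put("FloatType", Float.class);
        ELEMENT_CLASSES.put("DoubleType", Double.class);
        ELEMENT_CLASSES.put("DecimalType", BigDecimal.class);
    }

    // Returns the item class for a type name, or rejects unknown types,
    // mirroring the "case other" branch of the quoted diff.
    static Class<?> classFor(String elementType) {
        Class<?> cls = ELEMENT_CLASSES.get(elementType);
        if (cls == null) {
            throw new UnsupportedOperationException(
                "Schema for type " + elementType + " is not supported");
        }
        return cls;
    }

    public static void main(String[] args) {
        System.out.println(classFor("IntegerType").getName());
    }
}
```

    Every supported type needs its own entry in such a table, which is why the function is hard to eliminate without a general-purpose API.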



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150891642
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by FlytxtRnD <gi...@git.apache.org>.

Github user FlytxtRnD commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r42964572
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -20,17 +20,28 @@ package org.apache.spark.mllib.fpm
     import java.{util => ju}
     import java.lang.{Iterable => JavaIterable}
     
    +import org.apache.spark.mllib.util.Loader._
    --- End diff --
    
    Organize imports. This should go down.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48787482
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -20,17 +20,27 @@ package org.apache.spark.mllib.fpm
     import java.{util => ju}
     import java.lang.{Iterable => JavaIterable}
     
    +import org.json4s.DefaultFormats
    --- End diff --
    
    Organize imports (scala before 3rd-party libraries)



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585204
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    --- End diff --
    
    Add Since tags for this and other overridden methods (though I need to check to make sure this appears correctly in the docs)
    
    Override documentation for save() to state that it only works for Item datatypes supported by DataFrames.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-167912765
  
    I'll take a look at this now.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-163699840
  
    **[Test build #47509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47509/consoleFull)** for PR 9267 at commit [`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
       * `class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (`



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150891646
  
    Merged build started.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168146438
  
    **[Test build #48546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48546/consoleFull)** for PR 9267 at commit [`7381b31`](https://github.com/apache/spark/commit/7381b31661df242b04f0d4151ea923e5b3fe3120).



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48787491
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/fpm/JavaFPGrowthSuite.java ---
    @@ -69,4 +71,43 @@ public void runFPGrowth() {
           long freq = itemset.freq();
         }
       }
    +
    +  @Test
    +  public void runFPGrowthSaveLoad() {
    +
    +    @SuppressWarnings("unchecked")
    +    JavaRDD<List<String>> rdd = sc.parallelize(Arrays.asList(
    +      Arrays.asList("r z h k p".split(" ")),
    +      Arrays.asList("z y x w v u t s".split(" ")),
    +      Arrays.asList("s x o n r".split(" ")),
    +      Arrays.asList("x z y m t s q e".split(" ")),
    +      Arrays.asList("z".split(" ")),
    +      Arrays.asList("x z y r q t p".split(" "))), 2);
    +
    +    FPGrowthModel<String> model = new FPGrowth()
    +      .setMinSupport(0.5)
    +      .setNumPartitions(2)
    +      .run(rdd);
    +
    +    File tempDir = Utils.createTempDir(
    +      System.getProperty("java.io.tmpdir"), "JavaFPGrowthSuite");
    +    String outputPath = tempDir.getPath();
    +
    +    try {
    +      model.save(sc.sc(), outputPath);
    +    } finally {
    +      FPGrowthModel newModel = FPGrowthModel.load(sc.sc(), outputPath);
    --- End diff --
    
    Loading and testing the model should go under ```try```, not ```finally```.
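
    The suggested structure can be sketched as follows. This is a self-contained illustration under stated assumptions, not the test from the PR: plain file writes stand in for ```model.save``` and ```FPGrowthModel.load```, and ```SaveLoadRoundTrip```, ```save```, ```load```, ```deleteRecursively```, and ```roundTrip``` are hypothetical names. The point is only the placement: the round trip and its checks sit inside ```try```, and ```finally``` is reserved for cleanup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class SaveLoadRoundTrip {

    // Hypothetical stand-in for model.save(sc.sc(), outputPath).
    static void save(String model, Path dir) throws IOException {
        Files.write(dir.resolve("model.txt"), model.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical stand-in for FPGrowthModel.load(sc.sc(), outputPath).
    static String load(Path dir) throws IOException {
        byte[] bytes = Files.readAllBytes(dir.resolve("model.txt"));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Deletes the directory tree, visiting children before their parents.
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Saves, reloads, and checks the model; the temp dir is removed either way.
    static String roundTrip(String model) throws IOException {
        Path tempDir = Files.createTempDirectory("JavaFPGrowthSuite");
        try {
            save(model, tempDir);
            String reloaded = load(tempDir);
            // Checks belong here: a failing save skips them instead of
            // triggering a second, misleading failure during load.
            if (!model.equals(reloaded)) {
                throw new AssertionError("reloaded model differs from the saved one");
            }
            return reloaded;
        } finally {
            // Cleanup only: runs whether or not the steps above succeeded.
            deleteRecursively(tempDir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("x z y"));
    }
}
```

    With the load placed in ```finally```, as in the quoted diff, a failed save would still attempt to load from the output path, and the resulting error could mask the original one.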



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-163700014
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47509/
    Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168150715
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48546/
    Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168150628
  
    **[Test build #48546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48546/consoleFull)** for PR 9267 at commit [`7381b31`](https://github.com/apache/spark/commit/7381b31661df242b04f0d4151ea923e5b3fe3120).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-150897587
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48649364
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -42,8 +53,9 @@ import org.apache.spark.storage.StorageLevel
      */
     @Since("1.3.0")
     @Experimental
    -class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
    -    @Since("1.3.0") val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable {
    +class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (
    --- End diff --
    
    I updated the PR; it is no longer necessary to provide ```TypeTag```.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168901993
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48731/



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585210
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    --- End diff --
    
    Use the field name instead of an index to get the element in the schema (to be more robust).
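
The point about robustness can be illustrated without Spark. In Spark itself, a `StructType` can be indexed by field name (for example `schema("items")`) as well as by position. The plain-Java sketch below is only an analogy: `Field` and `typeByName` are hypothetical stand-ins, not Spark classes, and the type strings are made up for illustration. It shows that a positional lookup silently returns the wrong field when the column order changes, while a lookup by name does not.

```java
import java.util.Arrays;
import java.util.List;

final class SchemaLookupSketch {

  // Hypothetical, simplified stand-in for a schema field: a name plus a type label.
  static final class Field {
    final String name;
    final String dataType;

    Field(String name, String dataType) {
      this.name = name;
      this.dataType = dataType;
    }
  }

  // Look a field up by name; fail loudly if it is missing.
  static String typeByName(List<Field> schema, String name) {
    for (Field f : schema) {
      if (f.name.equals(name)) {
        return f.dataType;
      }
    }
    throw new IllegalArgumentException("Field \"" + name + "\" does not exist");
  }

  public static void main(String[] args) {
    Field items = new Field("items", "array<string>");
    Field freq = new Field("freq", "bigint");

    List<Field> asWritten = Arrays.asList(items, freq);
    List<Field> reordered = Arrays.asList(freq, items);

    // Positional access depends on the column order.
    System.out.println("index 0, as written: " + asWritten.get(0).dataType);
    System.out.println("index 0, reordered:  " + reordered.get(0).dataType);

    // Name-based access gives the same answer for both orders.
    System.out.println("by name, as written: " + typeByName(asWritten, "items"));
    System.out.println("by name, reordered:  " + typeByName(reordered, "items"));
  }
}
```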



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48649549
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    +      val freqType = freqItemsets.schema(1).dataType
    +      require(itemsetType.isInstanceOf[ArrayType],
    +        s"items should be ArrayType, but get $itemsetType")
    +      require(freqType.isInstanceOf[LongType], s"freq should be LongType, but get $freqType")
    +      val itemType = itemsetType.asInstanceOf[ArrayType].elementType
    +      val result = itemType match {
    +        case BooleanType => new FreqItemset(Array[Boolean](), 0L)
    +        case BinaryType => new FreqItemset(Array(Array[Byte]()), 0L)
    +        case StringType => new FreqItemset(Array[String](), 0L)
    +        case ByteType => new FreqItemset(Array[Byte](), 0L)
    +        case ShortType => new FreqItemset(Array[Short](), 0L)
    +        case IntegerType => new FreqItemset(Array[Int](), 0L)
    +        case LongType => new FreqItemset(Array[Long](), 0L)
    +        case FloatType => new FreqItemset(Array[Float](), 0L)
    +        case DoubleType => new FreqItemset(Array[Double](), 0L)
    +        case DateType => new FreqItemset(Array[java.sql.Date](), 0L)
    +        case DecimalType.SYSTEM_DEFAULT => new FreqItemset(Array[java.math.BigDecimal](), 0L)
    +        case TimestampType => new FreqItemset(Array[java.sql.Timestamp](), 0L)
    +        case _: ArrayType => new FreqItemset(Array[Seq[_]](), 0L)
    +        case _: MapType => new FreqItemset(Array[Map[_, _]](), 0L)
    +        case _: StructType => new FreqItemset(Array[Row](), 0L)
    +        case other =>
    +          throw new UnsupportedOperationException(s"Schema for type $other is not supported")
    +      }
    +      result
    +    }
    +
    +    def load[Item: ClassTag: TypeTag](
    +        sc: SparkContext,
    +        path: String,
    +        inferredItemset: FreqItemset[Item]): FPGrowthModel[Item] = {
    +      implicit val formats = DefaultFormats
    +      val sqlContext = new SQLContext(sc)
    +
    +      val (className, formatVersion, metadata) = loadMetadata(sc, path)
    +      assert(className == thisClassName)
    +      assert(formatVersion == thisFormatVersion)
    +
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val freqItemsetsRDD = freqItemsets.map { x =>
    +        val items = x.getAs[Seq[Item]](0).toArray
    --- End diff --
    
    Unfortunately it cannot. The compiler reports the following error:
    ```
    unbound wildcard type
    ```



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48787493
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/fpm/JavaFPGrowthSuite.java ---
    @@ -69,4 +71,43 @@ public void runFPGrowth() {
           long freq = itemset.freq();
         }
       }
    +
    +  @Test
    +  public void runFPGrowthSaveLoad() {
    +
    +    @SuppressWarnings("unchecked")
    +    JavaRDD<List<String>> rdd = sc.parallelize(Arrays.asList(
    +      Arrays.asList("r z h k p".split(" ")),
    +      Arrays.asList("z y x w v u t s".split(" ")),
    +      Arrays.asList("s x o n r".split(" ")),
    +      Arrays.asList("x z y m t s q e".split(" ")),
    +      Arrays.asList("z".split(" ")),
    +      Arrays.asList("x z y r q t p".split(" "))), 2);
    +
    +    FPGrowthModel<String> model = new FPGrowth()
    +      .setMinSupport(0.5)
    +      .setNumPartitions(2)
    +      .run(rdd);
    +
    +    File tempDir = Utils.createTempDir(
    +      System.getProperty("java.io.tmpdir"), "JavaFPGrowthSuite");
    +    String outputPath = tempDir.getPath();
    +
    +    try {
    +      model.save(sc.sc(), outputPath);
    +    } finally {
    +      FPGrowthModel newModel = FPGrowthModel.load(sc.sc(), outputPath);
    +      List<FPGrowth.FreqItemset<String>> freqItemsets = newModel.freqItemsets().toJavaRDD()
    +        .collect();
    +      assertEquals(18, freqItemsets.size());
    +
    +      for (FPGrowth.FreqItemset<String> itemset: freqItemsets) {
    +        // Test return types.
    +        List<String> items = itemset.javaItems();
    +        long freq = itemset.freq();
    +      }
    +
    +      Utils.deleteRecursively(tempDir);
    --- End diff --
    
    Only this line needs to be in the ```finally``` block.
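
A minimal sketch of the shape being asked for, using only the Java standard library rather than Spark: the save, the reload, and the checks all sit in the `try` block, and the `finally` block does nothing but clean up. Everything here is an assumption made for illustration. `saveLoadRoundTrip` writes and reads a text file in place of `model.save` and `FPGrowthModel.load`, and `deleteRecursively` is a hypothetical stand-in for `Utils.deleteRecursively`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class TempDirCleanupSketch {

  // Hypothetical stand-in for Utils.deleteRecursively.
  static void deleteRecursively(Path root) {
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Save, reload, and verify inside try; only the cleanup goes in finally.
  static Path saveLoadRoundTrip(String payload) {
    Path tempDir;
    try {
      tempDir = Files.createTempDirectory("JavaFPGrowthSuite");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try {
      Path file = tempDir.resolve("model.txt");
      // Stand-in for model.save(...).
      Files.write(file, payload.getBytes(StandardCharsets.UTF_8));
      // Stand-in for FPGrowthModel.load(...).
      String loaded = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      if (!payload.equals(loaded)) {
        throw new AssertionError("round trip changed the payload");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // Runs whether or not the body above failed.
      deleteRecursively(tempDir);
    }
    return tempDir;
  }

  public static void main(String[] args) {
    Path dir = saveLoadRoundTrip("z,x,y");
    System.out.println("temp dir still exists: " + Files.exists(dir));
  }
}
```

With this layout, a failure while saving surfaces as the original exception instead of being followed by a second failure from trying to load a model that was never written.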



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168841834
  
    @yanboliang This change should let you remove inferItemType.  (Tests pass at least.)
    
    https://github.com/jkbradley/spark/commit/4f5c5a3e852b19a9a0d9b776bb6866ff6eb6921b



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48787485
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -49,6 +60,119 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  /**
    +   * Save this model to the given path.
    +   * It only works for Item datatypes supported by DataFrames.
    +   *
    +   * This saves:
    +   *  - human-readable (JSON) model metadata to path/metadata/
    +   *  - Parquet formatted data to path/data/
    +   *
    +   * The model may be loaded using [[FPGrowthModel.load]].
    +   *
    +   * @param sc  Spark context used to save model data.
    +   * @param path  Path specifying the directory in which to save this model.
    +   *              If the directory already exists, this method throws an exception.
    +   */
    +  @Since("2.0.0")
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +@Since("2.0.0")
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  @Since("2.0.0")
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = SQLContext.getOrCreate(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      // Get the type of item class
    +      val sample = model.freqItemsets.take(1)(0).items(0)
    --- End diff --
    
    ```take(1)(0)``` -> ```first()```



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168901934
  
    **[Test build #48731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48731/consoleFull)** for PR 9267 at commit [`41b31eb`](https://github.com/apache/spark/commit/41b31ebbd8fe5743fc212c9db74c75c1b577bf67).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168150714
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585213
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    +      val freqType = freqItemsets.schema(1).dataType
    +      require(itemsetType.isInstanceOf[ArrayType],
    +        s"items should be ArrayType, but get $itemsetType")
    +      require(freqType.isInstanceOf[LongType], s"freq should be LongType, but get $freqType")
    +      val itemType = itemsetType.asInstanceOf[ArrayType].elementType
    +      val result = itemType match {
    +        case BooleanType => new FreqItemset(Array[Boolean](), 0L)
    +        case BinaryType => new FreqItemset(Array(Array[Byte]()), 0L)
    +        case StringType => new FreqItemset(Array[String](), 0L)
    +        case ByteType => new FreqItemset(Array[Byte](), 0L)
    +        case ShortType => new FreqItemset(Array[Short](), 0L)
    +        case IntegerType => new FreqItemset(Array[Int](), 0L)
    +        case LongType => new FreqItemset(Array[Long](), 0L)
    +        case FloatType => new FreqItemset(Array[Float](), 0L)
    +        case DoubleType => new FreqItemset(Array[Double](), 0L)
    +        case DateType => new FreqItemset(Array[java.sql.Date](), 0L)
    +        case DecimalType.SYSTEM_DEFAULT => new FreqItemset(Array[java.math.BigDecimal](), 0L)
    +        case TimestampType => new FreqItemset(Array[java.sql.Timestamp](), 0L)
    +        case _: ArrayType => new FreqItemset(Array[Seq[_]](), 0L)
    +        case _: MapType => new FreqItemset(Array[Map[_, _]](), 0L)
    +        case _: StructType => new FreqItemset(Array[Row](), 0L)
    +        case other =>
    +          throw new UnsupportedOperationException(s"Schema for type $other is not supported")
    +      }
    +      result
    +    }
    +
    +    def load[Item: ClassTag: TypeTag](
    +        sc: SparkContext,
    +        path: String,
    +        inferredItemset: FreqItemset[Item]): FPGrowthModel[Item] = {
    +      implicit val formats = DefaultFormats
    +      val sqlContext = new SQLContext(sc)
    +
    +      val (className, formatVersion, metadata) = loadMetadata(sc, path)
    +      assert(className == thisClassName)
    +      assert(formatVersion == thisFormatVersion)
    +
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val freqItemsetsRDD = freqItemsets.map { x =>
    --- End diff --
    
    Use ```freqItemsets.select("items", "freq")``` to ensure the column order here.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-167930451
  
    Your PR looks good, but I hope we can eliminate the need for Item to be restricted by TypeTag.
    
    Also, eliminating ```inferItemType``` would avoid having to update this code if Catalyst supports more types in the future.
    
    This should probably include a Java unit test because of how it uses types.
    
    Thanks!  I'll watch for updates.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9267



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-163700012
  
    Build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585214
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    +
    +      val metadata = compact(render(
    +        ("class" -> thisClassName) ~ ("version" -> thisFormatVersion)))
    +      sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
    +
    +      val itemType = ScalaReflection.schemaFor[Item].dataType
    +      val fields = Array(StructField("items", ArrayType(itemType)),
    +        StructField("freq", LongType))
    +      val schema = StructType(fields)
    +      val rowDataRDD = model.freqItemsets.map { x =>
    +        Row(x.items, x.freq)
    +      }
    +      sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path))
    +    }
    +
    +    def inferItemType(sc: SparkContext, path: String): FreqItemset[_] = {
    +      val sqlContext = new SQLContext(sc)
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val itemsetType = freqItemsets.schema(0).dataType
    +      val freqType = freqItemsets.schema(1).dataType
    +      require(itemsetType.isInstanceOf[ArrayType],
    +        s"items should be ArrayType, but get $itemsetType")
    +      require(freqType.isInstanceOf[LongType], s"freq should be LongType, but get $freqType")
    +      val itemType = itemsetType.asInstanceOf[ArrayType].elementType
    +      val result = itemType match {
    +        case BooleanType => new FreqItemset(Array[Boolean](), 0L)
    +        case BinaryType => new FreqItemset(Array(Array[Byte]()), 0L)
    +        case StringType => new FreqItemset(Array[String](), 0L)
    +        case ByteType => new FreqItemset(Array[Byte](), 0L)
    +        case ShortType => new FreqItemset(Array[Short](), 0L)
    +        case IntegerType => new FreqItemset(Array[Int](), 0L)
    +        case LongType => new FreqItemset(Array[Long](), 0L)
    +        case FloatType => new FreqItemset(Array[Float](), 0L)
    +        case DoubleType => new FreqItemset(Array[Double](), 0L)
    +        case DateType => new FreqItemset(Array[java.sql.Date](), 0L)
    +        case DecimalType.SYSTEM_DEFAULT => new FreqItemset(Array[java.math.BigDecimal](), 0L)
    +        case TimestampType => new FreqItemset(Array[java.sql.Timestamp](), 0L)
    +        case _: ArrayType => new FreqItemset(Array[Seq[_]](), 0L)
    +        case _: MapType => new FreqItemset(Array[Map[_, _]](), 0L)
    +        case _: StructType => new FreqItemset(Array[Row](), 0L)
    +        case other =>
    +          throw new UnsupportedOperationException(s"Schema for type $other is not supported")
    +      }
    +      result
    +    }
    +
    +    def load[Item: ClassTag: TypeTag](
    +        sc: SparkContext,
    +        path: String,
    +        inferredItemset: FreqItemset[Item]): FPGrowthModel[Item] = {
    +      implicit val formats = DefaultFormats
    +      val sqlContext = new SQLContext(sc)
    +
    +      val (className, formatVersion, metadata) = loadMetadata(sc, path)
    +      assert(className == thisClassName)
    +      assert(formatVersion == thisFormatVersion)
    +
    +      val freqItemsets = sqlContext.read.parquet(Loader.dataPath(path))
    +      val freqItemsetsRDD = freqItemsets.map { x =>
    +        val items = x.getAs[Seq[Item]](0).toArray
    --- End diff --
    
    Are you able to do ```getSeq[_]``` here?  I'm wondering if we can eliminate ```inferItemType```.
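
For readers unfamiliar with the accessor mentioned above, here is a minimal sketch of reading an array column through `Row.getSeq` without a statically known item type. It assumes Spark SQL is on the classpath; the object name `GetSeqSketch`, the helper `itemsOf`, and the use of `Any` as the element type are illustrative choices, not part of the pull request.

```scala
import org.apache.spark.sql.Row

object GetSeqSketch {
  // Reads the array column at position 0 without naming a concrete item type.
  // Using Any as the element type is an assumption made for this sketch.
  def itemsOf(row: Row): Seq[Any] = row.getSeq[Any](0)

  def main(args: Array[String]): Unit = {
    // A hand-built row shaped like the saved data: (items, freq).
    val row = Row(Seq("a", "b"), 2L)
    val items = itemsOf(row)
    val freq = row.getLong(1)
    println(s"items=${items.mkString(",")} freq=$freq")
  }
}
```

Whether this is enough to drop `inferItemType` depends on how the loaded items are turned back into an `Array[Item]`, which still needs a `ClassTag` from somewhere.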



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-163645456
  
    **[Test build #47509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47509/consoleFull)** for PR 9267 at commit [`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585203
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -42,8 +53,9 @@ import org.apache.spark.storage.StorageLevel
      */
     @Since("1.3.0")
     @Experimental
    -class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
    -    @Since("1.3.0") val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable {
    +class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (
    --- End diff --
    
    This will break API compatibility since old user code might provide a class for which a TypeTag is not available.
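
The compatibility concern can be illustrated with a small sketch that does not depend on Spark. `OldModel`, `NewModel`, and the `build*` helpers are hypothetical stand-ins for the class before and after the change; the sketch assumes Scala 2 with scala-reflect available.

```scala
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

// Hypothetical stand-ins for the model class before and after the change.
class OldModel[Item: ClassTag](val items: Seq[Item])
class NewModel[Item: ClassTag: TypeTag](val items: Seq[Item])

object TypeTagCompatSketch {
  // Existing user code written against the old signature:
  // only a ClassTag is in scope for T.
  def buildOld[T: ClassTag](items: Seq[T]): OldModel[T] = new OldModel(items)

  // The same helper does not compile against the new signature:
  //   def buildNew[T: ClassTag](items: Seq[T]): NewModel[T] = new NewModel(items)
  //   error: No TypeTag available for T
  // Callers would have to widen their own bounds to keep compiling.
  def buildNew[T: ClassTag: TypeTag](items: Seq[T]): NewModel[T] = new NewModel(items)
}
```

Because a context bound adds an implicit constructor parameter, the constructor's compiled signature changes as well, so code compiled against the old class would also need to be rebuilt.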



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9267#discussion_r48585208
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---
    @@ -53,6 +65,96 @@ class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
         val associationRules = new AssociationRules(confidence)
         associationRules.run(freqItemsets)
       }
    +
    +  override def save(sc: SparkContext, path: String): Unit = {
    +    FPGrowthModel.SaveLoadV1_0.save(this, path)
    +  }
    +
    +  override protected val formatVersion: String = "1.0"
    +}
    +
    +object FPGrowthModel extends Loader[FPGrowthModel[_]] {
    +
    +  override def load(sc: SparkContext, path: String): FPGrowthModel[_] = {
    +    val inferredItemset = FPGrowthModel.SaveLoadV1_0.inferItemType(sc, path)
    +    FPGrowthModel.SaveLoadV1_0.load(sc, path, inferredItemset)
    +  }
    +
    +  private[fpm] object SaveLoadV1_0 {
    +
    +    private val thisFormatVersion = "1.0"
    +
    +    private[fpm] val thisClassName = "org.apache.spark.mllib.fpm.FPGrowthModel"
    +
    +    def save[Item: ClassTag: TypeTag](model: FPGrowthModel[Item], path: String): Unit = {
    +      val sc = model.freqItemsets.sparkContext
    +      val sqlContext = new SQLContext(sc)
    --- End diff --
    
    Use SQLContext.getOrCreate (here and elsewhere)
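
A minimal sketch of the suggested call, assuming a Spark version that provides `SQLContext.getOrCreate` (1.5 or later) and a local master. The object name and application name are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GetOrCreateSketch {
  def main(args: Array[String]): Unit = {
    // Local, single-threaded context purely for illustration.
    val conf = new SparkConf().setAppName("getOrCreate-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      // Reuses an existing SQLContext for this SparkContext when there is one,
      // instead of constructing a fresh instance with `new SQLContext(sc)`.
      val first = SQLContext.getOrCreate(sc)
      val second = SQLContext.getOrCreate(sc)
      assert(first eq second)
    } finally {
      sc.stop()
    }
  }
}
```

In the diff above, this would replace each `new SQLContext(sc)` in `save`, `inferItemType`, and `load`.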



[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9267#issuecomment-168904235
  
    @jkbradley That's cool! It also works well in my environment. Thanks for your kind help!

