You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2017/10/18 07:00:45 UTC

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/19525

    [SPARK-22289] [ML] Add JSON support for Matrix parameters (LR with coefficients bound)

    ## What changes were proposed in this pull request?
    jira: https://issues.apache.org/jira/browse/SPARK-22289 
    
    add JSON encode/decode for Param[Matrix]. 
    
    The issue was reported by Nic Eggert during saving LR model with LowerBoundsOnCoefficients. 
    There're two ways to resolve this as I see:
    1. Support save/load on LogisticRegressionParams, and also adjust the save/load in LogisticRegression and LogisticRegressionModel.
    2. Directly support Matrix in Param.jsonEncode, similar to what we have done for Vector.
    
    After some discussion in jira, we prefer the fix to support Matrix as a valid Param type, for simplicity and convenience for other classes.
    
    ## How was this patch tested?
    
    new unit test to cover the LR case and JsonMatrixConverter


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark lrsave

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19525
    
----
commit 92f599d3fb7159cfc1694d2c65321dd2abdc4ccc
Author: Yuhao Yang <yu...@intel.com>
Date:   2017-10-18T06:49:01Z

    add json for Matrix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r150430257
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
    --- End diff --
    
    I didn't plan to make it public. Shall I remove @Since?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #84644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84644/testReport)** for PR 19525 at commit [`33e08ee`](https://github.com/apache/spark/commit/33e08ee22f2e37f47da246ce197f699351a3bc1b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r155411609
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    @hhbyyh You have got a point there, agree.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r155414954
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
           LogisticRegressionSuite.allParamSettings, checkModelData)
       }
     
    +  test("read/write with BoundsOnCoefficients") {
    +    def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = {
    +      assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients)
    +      assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients)
    --- End diff --
    
    Sure, you can update existing read/write test or your test, as long as to check model itself rather than parameters.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #84686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84686/testReport)** for PR 19525 at commit [`5799456`](https://github.com/apache/spark/commit/579945668e6f7e93d76102185bdf1d274e6c7773).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r151854676
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
    --- End diff --
    
    @yanboliang any suggestion?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Thanks for the review @yanboliang 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r150432943
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
           LogisticRegressionSuite.allParamSettings, checkModelData)
       }
     
    +  test("read/write with BoundsOnCoefficients") {
    +    def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = {
    +      assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients)
    +      assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients)
    --- End diff --
    
    The existing read/write test has `elasticNetParam = 0.1`.
    I can update the `checkModelData` function here to check model `coefficients` and `intercept` if it's OK with you.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r155412287
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -122,17 +124,33 @@ private[ml] object Param {
     
       /** Decodes a param value from JSON. */
       def jsonDecode[T](json: String): T = {
    -    parse(json) match {
    +    val jValue = parse(json)
    +    jValue match {
           case JString(x) =>
             x.asInstanceOf[T]
           case JObject(v) =>
             val keys = v.map(_._1)
    -        assert(keys.contains("type") && keys.contains("values"),
    -          s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.")
    -        JsonVectorConverter.fromJson(json).asInstanceOf[T]
    +        if (keys.contains("class")) {
    +          implicit val formats = DefaultFormats
    +          val className = (jValue \ "class").extract[String]
    +          className match {
    +            case JsonMatrixConverter.className =>
    +              val checkFields = Array("numRows", "numCols", "values", "isTransposed")
    --- End diff --
    
    Should we check _type_ as well?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #84644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84644/testReport)** for PR 19525 at commit [`33e08ee`](https://github.com/apache/spark/commit/33e08ee22f2e37f47da246ce197f699351a3bc1b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r145330685
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -122,17 +124,33 @@ private[ml] object Param {
     
       /** Decodes a param value from JSON. */
       def jsonDecode[T](json: String): T = {
    -    parse(json) match {
    +    val jValue = parse(json)
    +    jValue match {
           case JString(x) =>
             x.asInstanceOf[T]
           case JObject(v) =>
             val keys = v.map(_._1)
    -        assert(keys.contains("type") && keys.contains("values"),
    -          s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.")
    -        JsonVectorConverter.fromJson(json).asInstanceOf[T]
    +        if (keys.contains("class")) {
    +          implicit val formats = DefaultFormats
    +          val className = (jValue \ "class").extract[String]
    +          className match {
    +            case JsonMatrixConverter.className =>
    +              val checkFields = Array("numRows", "numCols", "values", "isTransposed")
    +              require(checkFields.forall(keys.contains), s"Expect a JSON serialized Matrix" +
    +                s" but cannot find fields ${checkFields.mkString(", ")} in $json.")
    +              JsonMatrixConverter.fromJson(json).asInstanceOf[T]
    +
    +            case s => throw new SparkException(s"unrecognized class $s in $json")
    +          }
    +        } else { // Vector does not have class info in json
    +          require(keys.contains("type") && keys.contains("values"), s"Expect a JSON serialized" +
    +            s" vector/matrix but cannot find fields 'type' and 'values' in $json.")
    --- End diff --
    
    /matrix should be removed here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #82875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82875/testReport)** for PR 19525 at commit [`92f599d`](https://github.com/apache/spark/commit/92f599d3fb7159cfc1694d2c65321dd2abdc4ccc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r150482756
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
    --- End diff --
    
    I think it can be made public, like `vector`, it is useful for developers.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #82875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82875/testReport)** for PR 19525 at commit [`92f599d`](https://github.com/apache/spark/commit/92f599d3fb7159cfc1694d2c65321dd2abdc4ccc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `            case s => throw new SparkException(s\"unrecognized class $s in $json\")`
      * `         else `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19525


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r150486465
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    I think `class: "matrix"` is OK. shorter identifier is better.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    **[Test build #84686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84686/testReport)** for PR 19525 at commit [`5799456`](https://github.com/apache/spark/commit/579945668e6f7e93d76102185bdf1d274e6c7773).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149530436
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    I'd suggest a more shorter string(or integer) to identify this is a matrix, it should be huge burden to store so long metadata string for a matrix with several elements. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149522834
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -827,6 +831,11 @@ class SparseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object SparseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(
    --- End diff --
    
    Ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r150432221
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    Sure. I can see your point. An example for the json,
    {"class":"org.apache.spark.ml.linalg.Matrix","type":0,"numRows":3,"numCols":2,"colPtrs":[0,1,3,4],"rowIndices":[1,0,1,0],"values":[1.21,2.3,9.8,9.0],"isTransposed":true}
    
    Sure we can use Int for `class` or different ranges of `type` to distinguish different classes. While storage efficiency is important, I also want to have a solution that's easy to extend for future and friendly for debugging or trouble shooting. Thus keeping the literal class info in the json is still attractive to me. Is it OK if we simply use `class:Matrix` for this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r145333064
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -122,17 +124,33 @@ private[ml] object Param {
     
       /** Decodes a param value from JSON. */
       def jsonDecode[T](json: String): T = {
    -    parse(json) match {
    +    val jValue = parse(json)
    +    jValue match {
           case JString(x) =>
             x.asInstanceOf[T]
           case JObject(v) =>
             val keys = v.map(_._1)
    -        assert(keys.contains("type") && keys.contains("values"),
    -          s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.")
    -        JsonVectorConverter.fromJson(json).asInstanceOf[T]
    +        if (keys.contains("class")) {
    +          implicit val formats = DefaultFormats
    +          val className = (jValue \ "class").extract[String]
    +          className match {
    +            case JsonMatrixConverter.className =>
    +              val checkFields = Array("numRows", "numCols", "values", "isTransposed")
    +              require(checkFields.forall(keys.contains), s"Expect a JSON serialized Matrix" +
    +                s" but cannot find fields ${checkFields.mkString(", ")} in $json.")
    +              JsonMatrixConverter.fromJson(json).asInstanceOf[T]
    +
    +            case s => throw new SparkException(s"unrecognized class $s in $json")
    +          }
    +        } else { // Vector does not have class info in json
    +          require(keys.contains("type") && keys.contains("values"), s"Expect a JSON serialized" +
    +            s" vector/matrix but cannot find fields 'type' and 'values' in $json.")
    --- End diff --
    
    /matrix should be removed here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84686/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r155414284
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
    --- End diff --
    
    I'm neutral on this issue. It's ok to let it private but should remove ```Since```. We can make it public later when there is clear requirement.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149533876
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
           LogisticRegressionSuite.allParamSettings, checkModelData)
       }
     
    +  test("read/write with BoundsOnCoefficients") {
    +    def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = {
    +      assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients)
    +      assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients)
    --- End diff --
    
    In ```checkModelData```, we should check model itself like ```coefficients``` and ```intercept```. We already have check equality for all params including ```lowerBoundsOnCoefficients``` inside of ```testEstimatorAndModelReadWrite```.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149532602
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    Or can we just use ```type``` to identify vector and matrix? For example, ```type``` less than 10 is reserved for vector and more than 10 is for matrix. What do you think of it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84644/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Will make a pass soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r155413347
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -122,17 +124,33 @@ private[ml] object Param {
     
       /** Decodes a param value from JSON. */
       def jsonDecode[T](json: String): T = {
    -    parse(json) match {
    +    val jValue = parse(json)
    +    jValue match {
           case JString(x) =>
             x.asInstanceOf[T]
           case JObject(v) =>
             val keys = v.map(_._1)
    -        assert(keys.contains("type") && keys.contains("values"),
    -          s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.")
    -        JsonVectorConverter.fromJson(json).asInstanceOf[T]
    +        if (keys.contains("class")) {
    +          implicit val formats = DefaultFormats
    +          val className = (jValue \ "class").extract[String]
    +          className match {
    +            case JsonMatrixConverter.className =>
    +              val checkFields = Array("numRows", "numCols", "values", "isTransposed")
    +              require(checkFields.forall(keys.contains), s"Expect a JSON serialized Matrix" +
    +                s" but cannot find fields ${checkFields.mkString(", ")} in $json.")
    +              JsonMatrixConverter.fromJson(json).asInstanceOf[T]
    +
    +            case s => throw new SparkException(s"unrecognized class $s in $json")
    +          }
    +        } else { // Vector does not have class info in json
    --- End diff --
    
    I'd suggest to add more comment here to clarify why vector doesn't have _class_ info in json, it should facilitate the code maintenance.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149534129
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
           LogisticRegressionSuite.allParamSettings, checkModelData)
       }
     
    +  test("read/write with BoundsOnCoefficients") {
    +    def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = {
    +      assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients)
    +      assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients)
    --- End diff --
    
    Or we can merge this test case with existing read/write test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r145331849
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.ml.linalg
    +
    +import org.json4s.DefaultFormats
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
    +
    +private[ml] object JsonMatrixConverter {
    +
    +  /** Unique class name for identifying JSON object encoded by this class. */
    +  val className = "org.apache.spark.ml.linalg.Matrix"
    --- End diff --
    
    I added this as an identifier, so during loading we know which JSON converter to invoke. Yet I didn't add it for JsonVectorConverter, to avoid breaking old models.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82875/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19525#discussion_r149522660
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {
     
    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
    --- End diff --
    
    If this is a public API, we should remove ```private[ml]```.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19525: [SPARK-22289] [ML] Add JSON support for Matrix parameter...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/19525
  
    Merged into master and branch-2.2. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org