You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tomasatdatabricks <gi...@git.apache.org> on 2018/01/05 18:32:08 UTC

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

GitHub user tomasatdatabricks opened a pull request:

    https://github.com/apache/spark/pull/20168

    SPARK-22730 Add ImageSchema support for non-integer image formats

    ## What changes were proposed in this pull request?
    Added functionality to handle all OpenCV modes to ImageSchema:
      1. updated toImage and toNDArray functions to handle non-uint8 based images.
      2. add information about individual OpenCv modes
    
    ## How was this patch tested?
    Added test for conversion between numpy arrays and images stored as all possible OpenCV modes. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tomasatdatabricks/spark tomas/ImageSchemaUpdate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20168
    
----
commit fe87dd112709ca2b101d78c97c4826536ec7da7d
Author: tomasatdatabricks <to...@...>
Date:   2017-12-29T22:56:28Z

    Added functionality for handling non-uint8-based images for ImageSchema

commit 70bae2f7e9d85a5f464f1bfc3a9426136259d5d1
Author: tomasatdatabricks <to...@...>
Date:   2018-01-03T19:14:06Z

    Added test for conversion between array and image struct for all ocv types.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86156/testReport)** for PR 20168 at commit [`896ccc2`](https://github.com/apache/spark/commit/896ccc21582f1610e38dc91a67eca90c8914e2e5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Btw, I think this isn't only to add non-integer image formats. So the PR title may be changed too. Like "Add ImageSchema support for all OpenCV image types"?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664806
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  /**
    +   * Return the supported OpenCvType with matching mode or raise error if there is no matching type.
    +   *
    +   * @param mode: mode of existing OpenCvType
    +   * @return OpenCvType that matches the given mode
    +   */
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    --- End diff --
    
    `getOcvTypeByMode` or `findOcvTypeByMode`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160303876
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    --- End diff --
    
    Yes, good catch. There should have been if else clause for name and return Undefined for mode == -1.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86059/testReport)** for PR 20168 at commit [`2401add`](https://github.com/apache/spark/commit/2401add863079abb067cacffa4ba1842efb0e517).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @tomasatdatabricks nice PR!  I've added a few comments.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86059/testReport)** for PR 20168 at commit [`2401add`](https://github.com/apache/spark/commit/2401add863079abb067cacffa4ba1842efb0e517).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160304870
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -201,8 +243,9 @@ def readImages(self, path, recursive=False, numPartitions=-1,
             .. versionadded:: 2.3.0
             """
     
    -        spark = SparkSession.builder.getOrCreate()
    -        image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema
    +        ctx = SparkContext.getOrCreate()
    --- End diff --
    
    Good catch. Looks like this was caused by auto-rebasing onto the latest changes. My original change was only to replace _active_context with getOrCreate() but it was clearly made obsolete in the meantime. I'll just remove it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664731
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    +        """
    +        Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +
    +        :param: str name: OpenCv type name; must be equal to name of one of the supported types.
    +        :return: OpenCvType with matching name.
    +
    +        """
    +
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    `getOcvTypeByMode` or `findOcvTypeByMode`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86148/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161334810
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,27 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        s = np.random.RandomState(seed=987)
    +        ary_src = s.rand(4, 10, 10)
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            npary0 = ary_src[..., 0:ocvType.nChannels].astype(ocvType.nptype)
    +            img = ImageSchema.toImage(npary0)
    +            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
    +            npary1 = ImageSchema.toNDArray(img)
    +            np.testing.assert_array_equal(npary0, npary1)
    --- End diff --
    
    good point, I'll add the test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161656541
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    --- End diff --
    
    same minor nitpick: "OpenCV" instead of "open cv", and in code below as well


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160271945
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    Because I am not seeing the method called `ocvTypeByMode` in `ImageSchema.scala`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161366123
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
    +
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    +    ocvTypes.find(x => x.mode == mode).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +  }
    +
    +  val undefinedImageType = OpenCvType(-1, "N/A", -1)
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  val ocvTypes = {
    --- End diff --
    
    Could we set the explicit type?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @MrBago @tomasatdatabricks the changes look good to me, I went through everything one more time, I'll sign off as soon as the python tests are fixed (it looks like there were some style issues in last commit) and all other dev comments are resolved, thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161904044
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala ---
    @@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext {
             val bytes20 = getData(row).slice(0, 20)
     
             val (expectedMode, expectedBytes) = firstBytes20(filename)
    --- End diff --
    
    Yes, good catch. The name is definitely misleading as it is now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161665337
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    --- End diff --
    
    `getOcvTypeByName` or `findOcvTypeByName`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85884/testReport)** for PR 20168 at commit [`763c8a6`](https://github.com/apache/spark/commit/763c8a613ddcb0c7960c800b4fb7496964d29b11).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160016719
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    --- End diff --
    
    Is it for copy? I usually do `list(self._ocvTypes)` tho.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86149/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @tomasatdatabricks, mind updating this? Lately I happened to take a look for this few times. I will try to review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91608/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86149/testReport)** for PR 20168 at commit [`9ec8cd3`](https://github.com/apache/spark/commit/9ec8cd3c6ac48d6be0de30dca384e982335dc2d0).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161366218
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,33 @@ def ocvTypes(self):
             """
    --- End diff --
    
    Seems we should fix the doc for `:return:`. Seems it's going to be a list now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160300141
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    Spark's approach has been to keep python and scala APIs as close as possible.
    
    We should either try and copy the scala api here (make an `OpenCvType` object with a `get` method and `__call__` method), or drop the `OpenCvType` object from scala side and use the same method names as we're adding to the python side.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160016683
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    +  }
    +
    +  object OpenCvType {
    +    def get(name: String): OpenCvType = {
    +      ocvTypes.find(x => x.name == name).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv type " + name))
    +    }
    +    def get(mode: Int): OpenCvType = {
    +      ocvTypes.find(x => x.mode == mode).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +    }
    +    val undefinedType = OpenCvType(-1, "N/A", -1)
    +  }
     
       /**
    -   * (Java-specific) OpenCV type mapping supported
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
        */
    -  val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava
    +  val ocvTypes = {
    +    val types =
    +      for (nc <- Array(1, 2, 3, 4);
    +           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
    +        yield (dt, nc)
    +    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
    +    OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
    +  }
    +
    +  val javaOcvTypes = ocvTypes.asJava
    --- End diff --
    
    Hm .. why did we remove the doc here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160264983
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    --- End diff --
    
    How about something like:
    
    ```
    s = np.random.RandomState(seed=987)
    ary_src  = s.rand(4, 10, 10)
    ```
      


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85884/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160291075
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    --- End diff --
    
    Spark uses scala "string interpolation" a lot of places, I think it's quite nice and very readable: `s"OpenCvType(mode = $mode, name = $name)"`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86059/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86207/testReport)** for PR 20168 at commit [`5a632f5`](https://github.com/apache/spark/commit/5a632f5f60afd2e8c225703532d17ed9e56e47f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161662177
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  /**
    +   * Return the supported OpenCvType with matching mode or raise error if there is no matching type.
    +   *
    +   * @param mode: mode of existing OpenCvType
    +   * @return OpenCvType that matches the given mode
    +   */
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    +    ocvTypes.find(x => x.mode == mode).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +  }
    +
    +  val undefinedImageType = OpenCvType(-1, "N/A", -1)
    +
    +  /**
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
    +   */
    +  val ocvTypes: IndexedSeq[OpenCvType] = {
    +    val types =
    +      for (nc <- Array(1, 2, 3, 4);
    --- End diff --
    
    `numChannel`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161661795
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161663005
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  /**
    +   * Return the supported OpenCvType with matching mode or raise error if there is no matching type.
    +   *
    +   * @param mode: mode of existing OpenCvType
    +   * @return OpenCvType that matches the given mode
    +   */
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    +    ocvTypes.find(x => x.mode == mode).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +  }
    +
    +  val undefinedImageType = OpenCvType(-1, "N/A", -1)
    +
    +  /**
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    --- End diff --
    
    Add a brief header for row/column.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161663481
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    --- End diff --
    
    Add a reference link for OpenCV data type? Like this one: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160306292
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            x = [[ary_src[i][j][0:ocvType.nChannels]
    +                  for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
    +            npary0 = np.array(x).astype(ocvType.nptype)
    --- End diff --
    
    good point, I'll change that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161366196
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
    +
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    +    ocvTypes.find(x => x.mode == mode).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +  }
    +
    +  val undefinedImageType = OpenCvType(-1, "N/A", -1)
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  val ocvTypes = {
    +    val types =
    +      for (nc <- Array(1, 2, 3, 4);
    +           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
    +        yield (dt, nc)
    +    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
    +    undefinedImageType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
    +  }
     
       /**
    -   * (Java-specific) OpenCV type mapping supported
    +   *  (Java Specific) list of OpenCv types
    --- End diff --
    
    Let's keep as is `(Java-specific)`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160259006
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -143,12 +174,12 @@ object ImageSchema {
     
           val height = img.getHeight
           val width = img.getWidth
    -      val (nChannels, mode) = if (isGray) {
    -        (1, ocvTypes("CV_8UC1"))
    +      val (nChannels, mode: Int) = if (isGray) {
    --- End diff --
    
    don't you need to add the other types here (eg floating point, etc)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    cc @jkbradley, @imatiach-msft, @MrBago and @thunterdb.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85742/testReport)** for PR 20168 at commit [`70bae2f`](https://github.com/apache/spark/commit/70bae2f7e9d85a5f464f1bfc3a9426136259d5d1).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161666778
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        s = np.random.RandomState(seed=987)
    +        ary_src = s.rand(4, 10, 10)
    --- End diff --
    
    s.rand(4, 10, 10) -> s.rand(10, 10, 4)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161903575
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    --- End diff --
    
    I prefer the ocvTypeByName unless there is a consensus otherwise. Find suggests slightly different usage patterns in my opinion and so does get, to some extent, + get would not be consistent with the rest of the python api. I don't feel particularly strong about this so if more people think it should be change I'd be happy to change it .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160502167
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    I think we could make either API work for both languages but it's a bit unnatural. There's a tradeoff between doing the most natural and appropriate thing in each language and having matching APIs, Spark has chosen to prefer making the APIs match so let's do our best to do that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Overall looks good to me. Just some minor comments regarding with code comments and naming.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160016767
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    Is it meant to be public? Seems doc is missing and this one doesn't look consistent with Scala side?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160305359
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            x = [[ary_src[i][j][0:ocvType.nChannels]
    +                  for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
    +            npary0 = np.array(x).astype(ocvType.nptype)
    +            img = ImageSchema.toImage(npary0)
    +            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
    +            npary1 = ImageSchema.toNDArray(img)
    +            np.testing.assert_array_equal(npary0, npary1)
    +
         def test_read_images(self):
             data_path = 'data/mllib/images/kittens'
    --- End diff --
    
    As per my comment above, this PR does not address reading images, only image schema <=> numpy arrays conversions. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161665832
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        s = np.random.RandomState(seed=987)
    +        ary_src = s.rand(4, 10, 10)
    --- End diff --
    
    ary_src -> array_src?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @tomasatdatabricks it looks like there are some conflicts that need to be resolved, otherwise looks good to me, can you please update the PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86207/testReport)** for PR 20168 at commit [`5a632f5`](https://github.com/apache/spark/commit/5a632f5f60afd2e8c225703532d17ed9e56e47f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664786
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +   *
    +   * @param name: name of existing OpenCvType
    +   * @return OpenCvType that matches the given name
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  def ocvTypeByName(name: String): OpenCvType = {
    --- End diff --
    
    `getOcvTypeByName` or `findOcvTypeByName`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86156/testReport)** for PR 20168 at commit [`896ccc2`](https://github.com/apache/spark/commit/896ccc21582f1610e38dc91a67eca90c8914e2e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160265175
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,7 +72,7 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
    --- End diff --
    
    Have you check every place to use `getOrCreate` ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160257066
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    --- End diff --
    
    confused, didn't you make this "N/A" instead of undefined - how does the test pass?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160293554
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    --- End diff --
    
    I think this test is failing:
    
    ```
    ======================================================================
    FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
        self.assertEqual("Undefined", ocvList[0].name)
    AssertionError: 'Undefined' != u'CV_N/AC-1'
    
    ----------------------------------------------------------------------
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664852
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    +        """
    +        Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +
    +        :param: str name: OpenCv type name; must be equal to name of one of the supported types.
    +        :return: OpenCvType with matching name.
    +
    +        """
    +
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    +        """
    +        Return the supported OpenCvType with matching mode or raise error if there is no matching type.
    --- End diff --
    
    `OpenCvType` -> `OcvType`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @MrBago @tomasatdatabricks I think the breaking changes are fine, the code was marked experimental and it is expected that the interfaces will change a lot initially based on early feedback.  The PR looks good to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160303995
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    +  }
    +
    +  object OpenCvType {
    +    def get(name: String): OpenCvType = {
    +      ocvTypes.find(x => x.name == name).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv type " + name))
    +    }
    +    def get(mode: Int): OpenCvType = {
    +      ocvTypes.find(x => x.mode == mode).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +    }
    +    val undefinedType = OpenCvType(-1, "N/A", -1)
    --- End diff --
    
    Yes, the name method should have special case for mode == -1. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86149/testReport)** for PR 20168 at commit [`9ec8cd3`](https://github.com/apache/spark/commit/9ec8cd3c6ac48d6be0de30dca384e982335dc2d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85878/testReport)** for PR 20168 at commit [`eee25ce`](https://github.com/apache/spark/commit/eee25ceffde2c1d6ca248eceb17a559e2f921cc6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160305509
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    +        if self._ocvTypesByMode is None:
    +            self._ocvTypesByMode = {x.mode: x for x in self.ocvTypes}
    +        return self._ocvTypesByMode[mode]
    --- End diff --
    
    good catch, it should be there.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86061/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161366253
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,33 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    --- End diff --
    
    Let's write a doc and doctest too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85878/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160260468
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            x = [[ary_src[i][j][0:ocvType.nChannels]
    +                  for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
    +            npary0 = np.array(x).astype(ocvType.nptype)
    +            img = ImageSchema.toImage(npary0)
    +            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
    +            npary1 = ImageSchema.toNDArray(img)
    +            np.testing.assert_array_equal(npary0, npary1)
    +
         def test_read_images(self):
             data_path = 'data/mllib/images/kittens'
    --- End diff --
    
    not necessary for this PR, but it would be nice to add images with the new types to verify they can be read in with the new changes


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160304128
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    --- End diff --
    
    That's a good point, will do.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #91608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91608/testReport)** for PR 20168 at commit [`5a632f5`](https://github.com/apache/spark/commit/5a632f5f60afd2e8c225703532d17ed9e56e47f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160303763
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -143,12 +174,12 @@ object ImageSchema {
     
           val height = img.getHeight
           val width = img.getWidth
    -      val (nChannels, mode) = if (isGray) {
    -        (1, ocvTypes("CV_8UC1"))
    +      val (nChannels, mode: Int) = if (isGray) {
    --- End diff --
    
    The decode function still only handles  unsigned bytes. The intention of this PR was only to add support for conversions between ImageSchema struct and numpy arrays and vice versa. Such use case occurs e.g. if you want to store images that have been preprocessed to be used by TF model (e.g. normalized). 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160255634
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -201,8 +243,9 @@ def readImages(self, path, recursive=False, numPartitions=-1,
             .. versionadded:: 2.3.0
             """
     
    -        spark = SparkSession.builder.getOrCreate()
    -        image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema
    +        ctx = SparkContext.getOrCreate()
    --- End diff --
    
    minor comment: the change before was only two lines whereas this is 3 lines - I think the previous change was better, what is the advantage of the new code?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160264829
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    Why it is not consistent with Scala side?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160305779
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,7 +72,7 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
    --- End diff --
    
    Yes, everywhere in image schema code at least. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161328577
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,27 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        s = np.random.RandomState(seed=987)
    +        ary_src = s.rand(4, 10, 10)
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            npary0 = ary_src[..., 0:ocvType.nChannels].astype(ocvType.nptype)
    +            img = ImageSchema.toImage(npary0)
    +            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
    +            npary1 = ImageSchema.toNDArray(img)
    +            np.testing.assert_array_equal(npary0, npary1)
    --- End diff --
    
    Can we also check `nparry1.dtype = ocvType.nptype`, numpy allows arrays of different types to be equal if their contents compare equal, eg `[0, 1] == [0.0, 1.0]`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86148/testReport)** for PR 20168 at commit [`68a5a94`](https://github.com/apache/spark/commit/68a5a94bdcfeec7268d5790bb8dcb1cadad8cc9b).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85880/testReport)** for PR 20168 at commit [`48eddf1`](https://github.com/apache/spark/commit/48eddf10c884f9eea368efa9001b9fefd42de9e9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664597
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    +        """
    +        Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +
    +        :param: str name: OpenCv type name; must be equal to name of one of the supported types.
    +        :return: OpenCvType with matching name.
    --- End diff --
    
    `OpenCvType` -> `OcvType`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161665015
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -128,11 +183,17 @@ def toNDArray(self, image):
             height = image.height
             width = image.width
             nChannels = image.nChannels
    +        ocvType = self.ocvTypeByMode(image.mode)
    +        if nChannels != ocvType.nChannels:
    +            raise ValueError(
    +                "Image has %d channels but OcvType '%s' expects %d channels." %
    --- End diff --
    
    `Image has %d channels but its OcvType ...`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85880/testReport)** for PR 20168 at commit [`48eddf1`](https://github.com/apache/spark/commit/48eddf10c884f9eea368efa9001b9fefd42de9e9).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160293894
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    +        for ocvType in ImageSchema.ocvTypes:
    +            if ocvType.name == 'Undefined':
    +                continue
    +            x = [[ary_src[i][j][0:ocvType.nChannels]
    +                  for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
    +            npary0 = np.array(x).astype(ocvType.nptype)
    --- End diff --
    
    If ary_src is an array, this becomes nparry0 = ary_src[..., :ocvType.nChannels].astype(ocvType.nptype).
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161366127
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
    +   */
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
    +    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
    +  }
    +
    +  def ocvTypeByName(name: String): OpenCvType = {
    +    ocvTypes.find(x => x.name == name).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv type " + name))
    +  }
    +
    +  def ocvTypeByMode(mode: Int): OpenCvType = {
    +    ocvTypes.find(x => x.mode == mode).getOrElse(
    +      throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +  }
    +
    +  val undefinedImageType = OpenCvType(-1, "N/A", -1)
     
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  val ocvTypes = {
    +    val types =
    +      for (nc <- Array(1, 2, 3, 4);
    +           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
    +        yield (dt, nc)
    +    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
    +    undefinedImageType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
    +  }
     
       /**
    -   * (Java-specific) OpenCV type mapping supported
    +   *  (Java Specific) list of OpenCv types
        */
    -  val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava
    --- End diff --
    
    Let's set the explicit type here .. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160316719
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -143,12 +174,12 @@ object ImageSchema {
     
           val height = img.getHeight
           val width = img.getWidth
    -      val (nChannels, mode) = if (isGray) {
    -        (1, ocvTypes("CV_8UC1"))
    +      val (nChannels, mode: Int) = if (isGray) {
    --- End diff --
    
    ah, sorry, I see, I was confused by the description, if we aren't supporting float images for reading that makes sense now


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160264533
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    +        if self._ocvTypesByMode is None:
    +            self._ocvTypesByMode = {x.mode: x for x in self.ocvTypes}
    +        return self._ocvTypesByMode[mode]
    --- End diff --
    
    Why not add `if mode not in self._ocvTypesByMode:` check here ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160267553
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -143,12 +174,12 @@ object ImageSchema {
     
           val height = img.getHeight
           val width = img.getWidth
    -      val (nChannels, mode) = if (isGray) {
    -        (1, ocvTypes("CV_8UC1"))
    +      val (nChannels, mode: Int) = if (isGray) {
    --- End diff --
    
    In the code bellow we always call `toBytes` per channel so everything will be cast to a 1 byte per channel image type. I think this is fine for standard image types (ie, jpg, png, and the like).
    
    Technically you can register image readers in java and use a BufferedImage of `TYPE_CUSTOM` for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for all...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161904121
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        s = np.random.RandomState(seed=987)
    +        ary_src = s.rand(4, 10, 10)
    --- End diff --
    
    Yes, that was the intention, good catch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86148/testReport)** for PR 20168 at commit [`68a5a94`](https://github.com/apache/spark/commit/68a5a94bdcfeec7268d5790bb8dcb1cadad8cc9b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86061/testReport)** for PR 20168 at commit [`d2a864e`](https://github.com/apache/spark/commit/d2a864e346c5b6606ddac497c5b81916b24d088e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161656419
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    +  /**
    +   * OpenCv type representation
    +   *
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    --- End diff --
    
    small nitpick: I think we should always spell it as "OpenCV" to be consistent in the comments (unless you have any good objections)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86207/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85884/testReport)** for PR 20168 at commit [`763c8a6`](https://github.com/apache/spark/commit/763c8a613ddcb0c7960c800b4fb7496964d29b11).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664566
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    +        """
    +        Return the supported OpenCvType with matching name or raise error if there is no matching type.
    --- End diff --
    
    `OpenCvType` -> `OcvType`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160305918
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    --- End diff --
    
    yes it is for copy. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160496086
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    --- End diff --
    
    It is not consistent because python can not overload methods based on type but I can rename the Scala side to match python. It does not make a big difference in this case.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86156/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160267831
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    +  }
    +
    +  object OpenCvType {
    +    def get(name: String): OpenCvType = {
    +      ocvTypes.find(x => x.name == name).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv type " + name))
    +    }
    +    def get(mode: Int): OpenCvType = {
    +      ocvTypes.find(x => x.mode == mode).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +    }
    +    val undefinedType = OpenCvType(-1, "N/A", -1)
    --- End diff --
    
    Can you add the type here, I think all public members need to have explicit type, even trivial members.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by tomasatdatabricks <gi...@git.apache.org>.
Github user tomasatdatabricks commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @MrBago Here is the description of the breaking changes. ImageSchema.ocvTypes and ImageSchema.javaOcvTypes changed types from Map[String,Int] to list of OpenCvType.
    
    ImageSchema.ocvTypes: Map[String, Int] -> IndexedSeq[ImageSchema.OcvType]. 
    ImageSchema.javaOcvTypes: java.util.Map[String,Int] -> java.util.List[ImageSchema.OcvType].


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160258674
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    +  }
    +
    +  object OpenCvType {
    +    def get(name: String): OpenCvType = {
    +      ocvTypes.find(x => x.name == name).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv type " + name))
    +    }
    +    def get(mode: Int): OpenCvType = {
    +      ocvTypes.find(x => x.mode == mode).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +    }
    +    val undefinedType = OpenCvType(-1, "N/A", -1)
    --- End diff --
    
    if I understand correctly the name for undefinedType will then be:
    def name: String = "CV_" + dataType + "C" + nChannels 
    which is "CV_-1C-1"?  But below in the tests you are checking for the name to be "Undefined"?  If this is correct, can it be fixed by overriding the name to check if datatype is N/A or something similar and in that case using a name of Undefined?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160267784
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
    @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
     @Since("2.3.0")
     object ImageSchema {
     
    -  val undefinedImageType = "Undefined"
    -
       /**
    -   * (Scala-specific) OpenCV type mapping supported
    +   * OpenCv type representation
    +   * @param mode ordinal for the type
    +   * @param dataType open cv data type
    +   * @param nChannels number of color channels
        */
    -  val ocvTypes: Map[String, Int] = Map(
    -    undefinedImageType -> -1,
    -    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
    -  )
    +  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
    +    def name: String = "CV_" + dataType + "C" + nChannels
    +    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
    +  }
    +
    +  object OpenCvType {
    +    def get(name: String): OpenCvType = {
    +      ocvTypes.find(x => x.name == name).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv type " + name))
    +    }
    +    def get(mode: Int): OpenCvType = {
    +      ocvTypes.find(x => x.mode == mode).getOrElse(
    +        throw new IllegalArgumentException("Unknown open cv mode " + mode))
    +    }
    +    val undefinedType = OpenCvType(-1, "N/A", -1)
    +  }
     
       /**
    -   * (Java-specific) OpenCV type mapping supported
    +   * A Mapping of Type to Numbers in OpenCV
    +   *
    +   *        C1 C2  C3  C4
    +   * CV_8U   0  8  16  24
    +   * CV_8S   1  9  17  25
    +   * CV_16U  2 10  18  26
    +   * CV_16S  3 11  19  27
    +   * CV_32S  4 12  20  28
    +   * CV_32F  5 13  21  29
    +   * CV_64F  6 14  22  30
        */
    -  val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava
    +  val ocvTypes = {
    +    val types =
    +      for (nc <- Array(1, 2, 3, 4);
    +           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
    +        yield (dt, nc)
    +    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
    +    OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
    +  }
    +
    +  val javaOcvTypes = ocvTypes.asJava
    --- End diff --
    
    explicit type.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85880/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160294482
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -71,9 +88,30 @@ def ocvTypes(self):
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +    def ocvTypeByName(self, name):
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(
    +                    self._ocvTypesByName.keys())))
    --- End diff --
    
    style: can we put this on the above line?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Let's fix the PR title to `[SPARK-22730][ML] ...` BTW.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85742/testReport)** for PR 20168 at commit [`70bae2f`](https://github.com/apache/spark/commit/70bae2f7e9d85a5f464f1bfc3a9426136259d5d1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664859
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -55,25 +72,66 @@ def imageSchema(self):
             """
     
             if self._imageSchema is None:
    -            ctx = SparkContext._active_spark_context
    +            ctx = SparkContext.getOrCreate()
                 jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema()
                 self._imageSchema = _parse_datatype_json_string(jschema.json())
             return self._imageSchema
     
         @property
         def ocvTypes(self):
             """
    -        Returns the OpenCV type mapping supported.
    +        Return the supported OpenCV types.
     
    -        :return: a dictionary containing the OpenCV type mapping supported.
    +        :return: a list containing the supported OpenCV types.
     
             .. versionadded:: 2.3.0
             """
     
             if self._ocvTypes is None:
    -            ctx = SparkContext._active_spark_context
    -            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
    -        return self._ocvTypes
    +            ctx = SparkContext.getOrCreate()
    +            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
    +            self._ocvTypes = [self._OcvType(name=x.name(),
    +                                            mode=x.mode(),
    +                                            nChannels=x.nChannels(),
    +                                            dataType=x.dataType(),
    +                                            nptype=self._ocvToNumpyMap[x.dataType()])
    +                              for x in ocvTypeList]
    +        return self._ocvTypes[:]
    +
    +
    +    def ocvTypeByName(self, name):
    +        """
    +        Return the supported OpenCvType with matching name or raise error if there is no matching type.
    +
    +        :param: str name: OpenCv type name; must be equal to name of one of the supported types.
    +        :return: OpenCvType with matching name.
    +
    +        """
    +
    +        if self._ocvTypesByName is None:
    +            self._ocvTypesByName = {x.name: x for x in self.ocvTypes}
    +        if name not in self._ocvTypesByName:
    +            raise ValueError(
    +                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
    +                (name, str(self._ocvTypesByName.keys())))
    +        return self._ocvTypesByName[name]
    +
    +    def ocvTypeByMode(self, mode):
    +        """
    +        Return the supported OpenCvType with matching mode or raise error if there is no matching type.
    +
    +        :param: int mode: OpenCv type mode; must be equal to mode of one of the supported types.
    +        :return: OpenCvType with matching mode.
    --- End diff --
    
    `OpenCvType` -> `OcvType`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: SPARK-22730 Add ImageSchema support for non-integer imag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85742/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #91608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91608/testReport)** for PR 20168 at commit [`5a632f5`](https://github.com/apache/spark/commit/5a632f5f60afd2e8c225703532d17ed9e56e47f7).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160255996
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1843,6 +1844,28 @@ def tearDown(self):
     
     class ImageReaderTest(SparkSessionTestCase):
     
    +    def test_ocv_types(self):
    +        ocvList = ImageSchema.ocvTypes
    +        self.assertEqual("Undefined", ocvList[0].name)
    +        self.assertEqual(-1, ocvList[0].mode)
    +        self.assertEqual("N/A", ocvList[0].dataType)
    +        for x in ocvList:
    +            self.assertEqual(x, ImageSchema.ocvTypeByName(x.name))
    +            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
    +
    +    def test_conversions(self):
    +        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
    --- End diff --
    
    can you pass a seed to the random generator (I believe the policy for spark is to always generate the same random numbers in tests in order to reduce flaky tests)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r160294154
  
    --- Diff: python/pyspark/ml/image.py ---
    @@ -150,29 +194,27 @@ def toImage(self, array, origin=""):
                     "array argument should be numpy.ndarray; however, it got [%s]." % type(array))
     
             if array.ndim != 3:
    -            raise ValueError("Invalid array shape")
    +            raise ValueError("Invalid array shape %s" % str(array.shape))
     
             height, width, nChannels = array.shape
    -        ocvTypes = ImageSchema.ocvTypes
    -        if nChannels == 1:
    -            mode = ocvTypes["CV_8UC1"]
    -        elif nChannels == 3:
    -            mode = ocvTypes["CV_8UC3"]
    -        elif nChannels == 4:
    -            mode = ocvTypes["CV_8UC4"]
    -        else:
    -            raise ValueError("Invalid number of channels")
    +        dtype = array.dtype
    +        if dtype not in self._numpyToOcvMap:
    +            raise ValueError(
    +                "Unsupported array data type '%s', currently only supported formats are %s" %
    +                (str(array.dtype), str(self._numpyToOcvMap.keys())))
    --- End diff --
    
    "%s" will call `__str__` automatically so you don't need to wrap in `str`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #85878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85878/testReport)** for PR 20168 at commit [`eee25ce`](https://github.com/apache/spark/commit/eee25ceffde2c1d6ca248eceb17a559e2f921cc6).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for all OpenCv...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    @tomasatdatabricks @MrBago @WeichenXu123 sorry, any updates on this PR?  It has been a while.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20168#discussion_r161664060
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala ---
    @@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext {
             val bytes20 = getData(row).slice(0, 20)
     
             val (expectedMode, expectedBytes) = firstBytes20(filename)
    --- End diff --
    
    Since you use `ocvTypeByName` below to look up for it, it should be named as `expectedType` or `expectedTypeName`, other than `expectedMode`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20168: [SPARK-22730][ML] Add ImageSchema support for non-intege...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20168
  
    **[Test build #86061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86061/testReport)** for PR 20168 at commit [`d2a864e`](https://github.com/apache/spark/commit/d2a864e346c5b6606ddac497c5b81916b24d088e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org