You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by aa8y <gi...@git.apache.org> on 2015/12/10 08:24:15 UTC

[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

GitHub user aa8y opened a pull request:

    https://github.com/apache/spark/pull/10247

    [SPARK-11962] Added getAsOpt functions to Row and tests for it.

    getAsOpt[T] functions have been added to Row and GeneicRowWithSchema to get the values present in a row object optionally. Corresponding tests have also been added. Filed a pull request again after fixing style issues.
    
    This is a new pull request created to replace #10028.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aa8y/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10247
    
----
commit 7e187055deea5feef2669b18a906b49dc4d24443
Author: Arun Allamsetty <ar...@gmail.com>
Date:   2015-12-10T06:02:31Z

    Added getAsOpt methods and tests for it.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-217117009
  
    I believe this PR should be closed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-217194679
  
    Won't do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164601273
  
    I still don't think we should swallow class cast exceptions.  It means the person doesn't know the schema of their data (and they probably want to know they are wrong).  Similarly I'd throw `UnsupportedOperationException` as we do elsewhere when the schema is not available.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-163522426
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164604835
  
    Isn't the whole point of `Option` to not throw exceptions? Actually it's to not return `null`s I guess. But throwing an exception would defeat the purpose of returning an `Option`. If a person really wants that information, (s)he can still use the existing `getAs` methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47579991
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -325,6 +341,14 @@ trait Row extends Serializable {
       def getAs[T](i: Int): T = get(i).asInstanceOf[T]
     
       /**
    +   * Returns the Optional value of a given fieldName.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned. None is
    +   * also returned if either the schema is not defined or else the given fieldName does not exist
    +   * in the schema.
    +   */
    +  def getAsOpt[T](fieldName: String): Option[T] = None
    --- End diff --
    
    I considered returning a `Try`. But there are a lot of cases, personally, where I don't care about the error since I know the kind of bad data I can expect. But that's just me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47579021
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -316,6 +317,21 @@ trait Row extends Serializable {
       }
     
       /**
    +   * Returns the Optional value at position i.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned.
    +   */
    +  def getAsOpt[T](i: Int): Option[T] = {
    --- End diff --
    
    `getOption` would be more inline with similar functions in the Scala collections API (i.e. `headOption` `reduceOption`).  We also typically avoid abbreviation unless its very common.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by jodersky <gi...@git.apache.org>.
Github user jodersky commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47417301
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -325,6 +341,14 @@ trait Row extends Serializable {
       def getAs[T](i: Int): T = get(i).asInstanceOf[T]
     
       /**
    +   * Returns the Optional value of a given fieldName.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned. None is
    +   * also returned if either the schema is not defined or else the given fieldName does not exist
    +   * in the schema.
    +   */
    +  def getAsOpt[T](fieldName: String): Option[T] = None.asInstanceOf[Option[T]]
    --- End diff --
    
    Nitpick: `asInstanceOf` is not required here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164600717
  
    @marmbrus I agree. I wasn't a fan of `getAsOpt` but couldn't think of a better name then. I've updated them all to `getOption`. So does this look better now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47580230
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -325,6 +341,14 @@ trait Row extends Serializable {
       def getAs[T](i: Int): T = get(i).asInstanceOf[T]
     
       /**
    +   * Returns the Optional value of a given fieldName.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned. None is
    +   * also returned if either the schema is not defined or else the given fieldName does not exist
    +   * in the schema.
    +   */
    +  def getAsOpt[T](fieldName: String): Option[T] = None
    --- End diff --
    
    Yeah, but this is data coming out of Spark SQL, so the type will always be correct.  Bad data would not make it this far.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164606013
  
    Don't catch the `ClassCastException` or throw `UnsupportedOperationException` when the schema is not present (which will happen if you use `fieldIndexes` anyway).  Just check out the structure of the other get by field name methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47579389
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -325,6 +341,14 @@ trait Row extends Serializable {
       def getAs[T](i: Int): T = get(i).asInstanceOf[T]
     
       /**
    +   * Returns the Optional value of a given fieldName.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned. None is
    +   * also returned if either the schema is not defined or else the given fieldName does not exist
    +   * in the schema.
    +   */
    +  def getAsOpt[T](fieldName: String): Option[T] = None
    --- End diff --
    
    Again, I'm not sure we want to hide errors from the user by returning `None`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47579246
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
    @@ -316,6 +317,21 @@ trait Row extends Serializable {
       }
     
       /**
    +   * Returns the Optional value at position i.
    +   * If the value is null or if it cannot be cast to the requested type, None is returned.
    +   */
    +  def getAsOpt[T](i: Int): Option[T] = {
    +    val value = {
    +      if (i < length) {
    +        if (isNullAt(i)) None
    +        else catching(classOf[ClassCastException]) opt getAs[T](i)
    --- End diff --
    
    We typically avoid infix notation and this style of `if`.
    
    https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
    
    Also, why do we want to hide `ClassCastException` from the user, this seems more confusing than helpful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164594278
  
    @jodersky: I incorporated your recommendations and also updated my branch with the current master. Can you or one of the admins please ask Jenkins to test this build. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164605446
  
    I think the point of `Option` is typesafe null handling, not hiding structural problems with your code (i.e. using the wrong type for a column).  If this fails then you are going to get `None` always, not just for bad data (because the type of a column will always be the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by jodersky <gi...@git.apache.org>.
Github user jodersky commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10247#discussion_r47417343
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -211,6 +211,18 @@ class GenericRowWithSchema(values: Array[Any], override val schema: StructType)
       protected def this() = this(null, null)
     
       override def fieldIndex(name: String): Int = schema.fieldIndex(name)
    +
    +  override def getAsOpt[T](fieldName: String): Option[T] = {
    +    val index = schema match {
    +      case null => -1
    +      case _ =>
    +        val fieldNames = schema.fieldNames
    +        if (fieldNames.contains(fieldName)) fieldNames.indexOf(fieldName) else -1
    +    }
    +    val value = if (index < 0) None else getAsOpt[T](index)
    +
    +    value.asInstanceOf[Option[T]]
    --- End diff --
    
    Nitpick: `asInstanceOf` is not required here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y closed the pull request at:

    https://github.com/apache/spark/pull/10247


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by jodersky <gi...@git.apache.org>.
Github user jodersky commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164596063
  
    Great! Unfortunately I can't help with the tests though.
    Check out this [page](https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers) to get a list of maintainers and their components. In this case I think @rxin and @marmbrus should be the ones to check your PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11962] Added getAsOpt functions to Row ...

Posted by aa8y <gi...@git.apache.org>.
Github user aa8y commented on the pull request:

    https://github.com/apache/spark/pull/10247#issuecomment-164605606
  
    What would you propose as a solution instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org