You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zuowang <gi...@git.apache.org> on 2016/03/04 03:04:02 UTC

[GitHub] spark pull request: Avoid call defaultSize of ObjectType

GitHub user zuowang opened a pull request:

    https://github.com/apache/spark/pull/11508

    Avoid call defaultSize of ObjectType

    ## What changes were proposed in this pull request?
    
    Avoid call defaultSize of ObjectType.
    
    ## How was this patch tested?
    
    Manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuowang/spark fix-exception-in-ObjectType

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11508
    
----
commit 95603ac2ce2ecce6c992db72ecdcdcad79a203e8
Author: Zuo Wang <zu...@intel.com>
Date:   2016-03-03T06:08:59Z

    Avoid call defaultSize of ObjectType
    
    Avoid call defaultSize of ObjectType.
    
    Manual test.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-216619359
  
    @zuowang We already have a default size (4k) for ObjectType, do you mind close this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-196560959
  
     - Please propagate more information about the problem from JIRA into the description of the PR.  This is going to become the commit message.
     - Every bug fix needs a regression test.  Manual testing is not acceptable.  Try and take the problem reported by the user and distill it to the simplest possible code that reproduces the problem.  Add this to `DatasetSuite`.
     - 0 is probably not a great estimate for the size of an object.
     - Instead of special casing the handling in one place, it would probably be better to implment this function for ObjectType.  Probably pick the same value we use for binary or string types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Avoid call defaultSize of ObjectType

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-192059151
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Avoid call defaultSize of ObjectType

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-192070631
  
    JIRA? Motivation for the change? This isn't a good description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-202136461
  
    would it be possible to have a variation of ObjectType that can take in info like defaultSize which it takes from the real type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by zuowang <gi...@git.apache.org>.
Github user zuowang closed the pull request at:

    https://github.com/apache/spark/pull/11508


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by koertkuipers <gi...@git.apache.org>.
Github user koertkuipers commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-202111907
  
    it might seem easiest to put a defaultSize on ObjectType, but i think that is masking the real problem, which is that the optimizer replaces the real types with ObjectType in EliminateSerialization to avoid serialization, and as a result the actual type and defaultSize info is lost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11508#discussion_r55007233
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -320,8 +321,14 @@ abstract class UnaryNode extends LogicalPlan {
       override def statistics: Statistics = {
         // There should be some overhead in Row object, the size should not be zero when there is
         // no columns, this help to prevent divide-by-zero error.
    -    val childRowSize = child.output.map(_.dataType.defaultSize).sum + 8
    -    val outputRowSize = output.map(_.dataType.defaultSize).sum + 8
    +    val childRowSize = child.output.map(n => n.dataType match {
    --- End diff --
    
    Nit, but this can be simpler in Scala
    
    ```
    val childRowSize = child.output.map(_.dataType).map {
      case _: ObjectType => 0L
      case dt => dt.defaultSize
    }.sum + 8
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by zuowang <gi...@git.apache.org>.
Github user zuowang commented on the pull request:

    https://github.com/apache/spark/pull/11508#issuecomment-192072778
  
    It's for JIRA https://issues.apache.org/jira/browse/SPARK-13531


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13531] [SQL] Avoid call defaultSize of ...

Posted by zuowang <gi...@git.apache.org>.
Github user zuowang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11508#discussion_r55034146
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -320,8 +321,14 @@ abstract class UnaryNode extends LogicalPlan {
       override def statistics: Statistics = {
         // There should be some overhead in Row object, the size should not be zero when there is
         // no columns, this help to prevent divide-by-zero error.
    -    val childRowSize = child.output.map(_.dataType.defaultSize).sum + 8
    -    val outputRowSize = output.map(_.dataType.defaultSize).sum + 8
    +    val childRowSize = child.output.map(n => n.dataType match {
    --- End diff --
    
    Done. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org