Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/04/08 12:48:07 UTC

[GitHub] spark pull request: [SQL] Faster Scala row conversion

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/5419

    [SQL] Faster Scala row conversion

    This is a follow-up of #5279 and #5398. `ScalaReflection.convertRowToScala` is on a critical path, but was implemented in a rather inefficient way.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark faster-row-conversion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5419
    
----
commit d698d033c03bdeaa891181dfff53b8729311d654
Author: Cheng Lian <li...@databricks.com>
Date:   2015-04-08T10:44:12Z

    Faster Scala row conversion

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5419#issuecomment-90900218
  
    [Test build #29856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29856/consoleFull) for PR 5419 at commit [`d698d03`](https://github.com/apache/spark/commit/d698d033c03bdeaa891181dfff53b8729311d654).
    * This patch **passes all tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
    * This patch does not change any dependencies.




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by marmbrus <gi...@git.apache.org>.
GitHub user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5419#discussion_r28029884
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -150,7 +150,7 @@ abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanTy
         }.toSeq
       }
     
    -  def schema: StructType = StructType.fromAttributes(output)
    +  lazy val schema: StructType = StructType.fromAttributes(output)
    --- End diff --
    
    LGTM.  Good catch!
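
For readers unfamiliar with the distinction in the diff above, here is a minimal sketch of why switching `def` to `lazy val` can matter. `Field`, `Schema`, `SchemaDemo`, and the `builds` counter are hypothetical stand-ins introduced only for illustration; they are not Spark's `StructType` or `QueryPlan`. The sketch assumes the plan's output does not change after construction, which is what makes caching safe.

```scala
// Hypothetical stand-ins for illustration only; not Spark's real classes.
final case class Field(name: String)
final case class Schema(fields: Seq[Field])

object SchemaDemo {
  // Counts how many times a Schema is constructed.
  var builds = 0

  private def build(output: Seq[String]): Schema = {
    builds += 1
    Schema(output.map(name => Field(name)))
  }

  class PlanWithDef(output: Seq[String]) {
    // A def re-evaluates its body on every access.
    def schema: Schema = build(output)
  }

  class PlanWithLazyVal(output: Seq[String]) {
    // A lazy val evaluates its body once, on first access, then caches the result.
    lazy val schema: Schema = build(output)
  }
}
```

Accessing `schema` three times on a `PlanWithDef` increments `builds` three times, while the same accesses on a `PlanWithLazyVal` increment it once. When `schema` is read once per row, that difference is paid per row.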




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by liancheng <gi...@git.apache.org>.
GitHub user liancheng closed the pull request at:

    https://github.com/apache/spark/pull/5419




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5419#issuecomment-90900232
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29856/




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by marmbrus <gi...@git.apache.org>.
GitHub user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5419#discussion_r28029875
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -95,10 +95,14 @@ trait ScalaReflection {
       }
     
       def convertRowToScala(r: Row, schema: StructType): Row = {
    -    // TODO: This is very slow!!!
    -    new GenericRowWithSchema(
    -      r.toSeq.zip(schema.fields.map(_.dataType))
    -        .map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
    +    val fields = schema.fields
    +    val values = new Array[Any](r.length)
    +    var i = 0
    +    while (i < values.length) {
    +      values(i) = convertToScala(r(i), fields(i).dataType)
    +      i += 1
    +    }
    +    new GenericRowWithSchema(values, schema)
    --- End diff --
    
    Do you mind reverting this to avoid conflicting with #5279?
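
As a rough, self-contained illustration of the two shapes in the diff above, the sketch below contrasts a `zip`/`map`/`toArray` pipeline with a preallocated array filled by a `while` loop. `RowConversionSketch`, `convert`, and the string-typed `types` array are hypothetical stand-ins; the real `convertToScala`, `Row`, and `StructType` are not reproduced here. The sketch shows only that both forms produce the same values, with the loop avoiding the intermediate tuple sequence and the per-element tuple allocations. It makes no claim about measured speedups.

```scala
object RowConversionSketch {
  // Hypothetical per-field conversion, standing in for convertToScala.
  def convert(value: Any, dataType: String): Any = (value, dataType) match {
    case (s: String, "int") => s.toInt
    case (other, _)         => other
  }

  // Shape of the removed code: builds a sequence of tuples, maps over it,
  // then copies the result into an array.
  def viaZipMap(row: Seq[Any], types: Array[String]): Array[Any] =
    row.zip(types).map { case (v, dt) => convert(v, dt) }.toArray[Any]

  // Shape of the added code: one preallocated array, filled by index.
  def viaWhileLoop(row: IndexedSeq[Any], types: Array[String]): Array[Any] = {
    val values = new Array[Any](row.length)
    var i = 0
    while (i < values.length) {
      values(i) = convert(row(i), types(i))
      i += 1
    }
    values
  }
}
```

Note that the loop form indexes `types(i)` directly, so it assumes the row and the type array have the same length, whereas `zip` silently truncates to the shorter of the two.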




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5419#issuecomment-90879308
  
    [Test build #29856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29856/consoleFull) for PR 5419 at commit [`d698d03`](https://github.com/apache/spark/commit/d698d033c03bdeaa891181dfff53b8729311d654).




[GitHub] spark pull request: [SQL] Faster Scala row conversion

Posted by liancheng <gi...@git.apache.org>.
GitHub user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/5419#issuecomment-91146773
  
    Actually, the `convertRowToScala` part is already handled in #5279, and the `lazy val schema` part is handled in #5398, so I'm closing this.

