Posted to reviews@spark.apache.org by mateiz <gi...@git.apache.org> on 2014/08/17 04:11:15 UTC

[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

GitHub user mateiz opened a pull request:

    https://github.com/apache/spark/pull/1990

    [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

    BroadcastHashJoin has a broadcastFuture variable that tries to collect
    the broadcasted table in a separate thread, but this doesn't help
    because it's a lazy val that only gets initialized when you attempt to
    build the RDD. Thus queries that broadcast multiple tables would collect
    and broadcast them sequentially. I changed this to a val to let it start
    collecting right when the operator is created.
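
The difference between the two declarations can be sketched in isolation. The snippet below is only an illustration, not the code from this patch: `LazyJoin`, `EagerJoin`, and `BroadcastDemo` are hypothetical stand-ins, and the "collection" is simulated by counting down a latch. It assumes the default global execution context.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-ins for a broadcast join operator; not Spark's actual classes.
class LazyJoin(table: String, started: CountDownLatch) {
  // lazy val: the Future is not even created until broadcastFuture is first referenced.
  lazy val broadcastFuture: Future[String] = Future {
    started.countDown() // simulated start of the collect
    s"collected $table"
  }
}

class EagerJoin(table: String, started: CountDownLatch) {
  // val: the Future is submitted to the thread pool while the constructor runs.
  val broadcastFuture: Future[String] = Future {
    started.countDown() // simulated start of the collect
    s"collected $table"
  }
}

object BroadcastDemo {
  // True only if both collections start without anyone touching broadcastFuture.
  def lazyStartsOnConstruction(): Boolean = {
    val latch = new CountDownLatch(2)
    val ops = Seq(new LazyJoin("a", latch), new LazyJoin("b", latch))
    // Nothing was submitted, so this wait times out and returns false.
    latch.await(200, TimeUnit.MILLISECONDS)
  }

  def eagerStartsOnConstruction(): Boolean = {
    val latch = new CountDownLatch(2)
    val ops = Seq(new EagerJoin("a", latch), new EagerJoin("b", latch))
    // Both Futures were submitted during construction, so the latch reaches zero.
    latch.await(5, TimeUnit.SECONDS)
  }
}
```

With the lazy declaration, each collection would begin only when something first reads `broadcastFuture`, so several operators built up front end up collecting one after another. With the plain `val`, all of them are already running by the time the first result is needed.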

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mateiz/spark spark-3084

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1990
    
----
commit f468766e2051f323ed81ecc53c27bed7becdc9b1
Author: Matei Zaharia <ma...@databricks.com>
Date:   2014-08-17T02:09:34Z

    [SPARK-3084] Collect broadcasted tables in parallel in joins
    
    BroadcastHashJoin has a broadcastFuture variable that tries to collect
    the broadcasted table in a separate thread, but this doesn't help
    because it's a lazy val that only gets initialized when you attempt to
    build the RDD. Thus queries that broadcast multiple tables would collect
    and broadcast them sequentially. I changed this to a val to let it start
    collecting right when the operator is created.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by mateiz <gi...@git.apache.org>.
GitHub user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1990#issuecomment-52413314
  
    test this please




[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by marmbrus <gi...@git.apache.org>.
GitHub user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1990#issuecomment-52522510
  
    All of the unit test failures look like refused connections to the Thrift server (a known flaky test suite). I'm going to go ahead and merge this into master and 1.1. Thanks Matei!




[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1990#issuecomment-52412546
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18686/consoleFull) for PR 1990 at commit [`f468766`](https://github.com/apache/spark/commit/f468766e2051f323ed81ecc53c27bed7becdc9b1).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CompressedSerializer(FramedSerializer):`





[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1990#issuecomment-52411601
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18686/consoleFull) for PR 1990 at commit [`f468766`](https://github.com/apache/spark/commit/f468766e2051f323ed81ecc53c27bed7becdc9b1).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1990




[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

Posted by mateiz <gi...@git.apache.org>.
GitHub user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1990#issuecomment-52414552
  
    Jenkins, retest this please

