Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/04/25 22:40:37 UTC

[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/12667

    [SPARK-14467][SQL] Experiments: Async I/O in FileScanRDD

    ## What changes were proposed in this pull request?
    
    Builds on https://github.com/apache/spark/pull/12243 to help benchmark the improvement from interleaving CPU and I/O in FileScanRDD.
    
    ## How was this patch tested?
    
    Existing Tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark filescan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12667
    
----
commit cc6d98a17f6fa4249951802f981c2224d354e651
Author: Nong Li <no...@databricks.com>
Date:   2016-04-05T20:36:34Z

    [SPARK-14467][SQL] Interleave CPU and IO better in FileScanRDD.
    
    This patch updates FileScanRDD to start reading from the next file while the current file
    is being processed. The goal is to have better interleaving of CPU and I/O. It does this
    by launching a future that asynchronously starts preparing the next file to be read.
    The expectation is that the async task is I/O-intensive, while processing the current file
    (which includes all the computation for the query plan) is CPU-intensive. For some file
    formats, preparing the next file just means opening it and doing the initial setup. For
    file formats like Parquet, it means doing all the I/O for all the columns.
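
The prefetching idea described in this commit message can be sketched independently of Spark. The following is a minimal illustration in Python, not the code in this patch: `scan_files`, `open_file`, and `process` are hypothetical names, and a single-worker thread pool stands in for the future that the patch launches.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_files(paths, open_file, process):
    """Yield processed rows, preparing file N+1 while file N is being processed.

    open_file(path) is assumed to do the I/O-heavy preparation and return an
    iterable of rows; process(row) is assumed to be the CPU-heavy work.
    """
    if not paths:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start preparing the first file in the background.
        pending = pool.submit(open_file, paths[0])
        for i in range(len(paths)):
            # Block until the file we are about to process is ready.
            current = pending.result()
            if i + 1 < len(paths):
                # Kick off the next file before doing CPU work on this one.
                pending = pool.submit(open_file, paths[i + 1])
            for row in current:
                yield process(row)


# Usage with in-memory stand-ins for files (hypothetical data).
fake_files = {"a": [1, 2], "b": [3]}
rows = list(scan_files(["a", "b"], lambda p: fake_files[p], lambda r: r * 10))
# rows == [10, 20, 30]
```

Only one file is prepared ahead at a time in this sketch, so at most two files are held open at once. How much work the background step does depends on what `open_file` returns, which mirrors the commit message's distinction between formats that only need opening and formats such as Parquet where all the column I/O can happen up front.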

commit bc11dd580a751b2e39694223ecbf1fa2b4a7bdc0
Author: Nong Li <no...@databricks.com>
Date:   2016-04-07T23:26:50Z

    Simplify and fix tests.

commit 0655e5ef7e7c9d216ddef06500fbfd683941056f
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T19:01:38Z

    Resolve conflicts

commit 8aebf9427e6630046ae297b38f964d0809c3d348
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T19:07:39Z

    restructure

commit 8799cc873900cf9e4c37012e7a6d607eeabfbdd5
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T20:08:47Z

    add nextIterator

commit f3a21672e94cdb67e2cb69d60af327cff0b2cf54
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T20:36:20Z

    cleanup

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-214545001
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56922/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-215536188
  
    **[Test build #57265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57265/consoleFull)** for PR 12667 at commit [`796d5eb`](https://github.com/apache/spark/commit/796d5ebf5d0b24a9e5f49bac4d0661ef833f1f78).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by sameeragarwal <gi...@git.apache.org>.
GitHub user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-215512499
  
    test this please




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-215513526
  
    **[Test build #57265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57265/consoleFull)** for PR 12667 at commit [`796d5eb`](https://github.com/apache/spark/commit/796d5ebf5d0b24a9e5f49bac4d0661ef833f1f78).




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-214544995
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by sameeragarwal <gi...@git.apache.org>.
GitHub user sameeragarwal reopened a pull request:

    https://github.com/apache/spark/pull/12667

    [SPARK-14467][SQL] Experiments: Async I/O in FileScanRDD

    ## What changes were proposed in this pull request?
    
    Builds on https://github.com/apache/spark/pull/12243 to help benchmark the improvement from interleaving CPU and I/O in FileScanRDD.
    
    ## How was this patch tested?
    
    Existing Tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark filescan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12667
    
----
commit cc6d98a17f6fa4249951802f981c2224d354e651
Author: Nong Li <no...@databricks.com>
Date:   2016-04-05T20:36:34Z

    [SPARK-14467][SQL] Interleave CPU and IO better in FileScanRDD.
    
    This patch updates FileScanRDD to start reading from the next file while the current file
    is being processed. The goal is to have better interleaving of CPU and I/O. It does this
    by launching a future that asynchronously starts preparing the next file to be read.
    The expectation is that the async task is I/O-intensive, while processing the current file
    (which includes all the computation for the query plan) is CPU-intensive. For some file
    formats, preparing the next file just means opening it and doing the initial setup. For
    file formats like Parquet, it means doing all the I/O for all the columns.

commit bc11dd580a751b2e39694223ecbf1fa2b4a7bdc0
Author: Nong Li <no...@databricks.com>
Date:   2016-04-07T23:26:50Z

    Simplify and fix tests.

commit 0655e5ef7e7c9d216ddef06500fbfd683941056f
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T19:01:38Z

    Resolve conflicts

commit 8aebf9427e6630046ae297b38f964d0809c3d348
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T19:07:39Z

    restructure

commit 8799cc873900cf9e4c37012e7a6d607eeabfbdd5
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T20:08:47Z

    add nextIterator

commit f3a21672e94cdb67e2cb69d60af327cff0b2cf54
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-25T20:36:20Z

    cleanup

commit ec7d65db3657b8b7ffff77e4acc23842ee1cca29
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-27T22:44:32Z

    Merge branch 'master' of github.com:apache/spark into filescan

commit 796d5ebf5d0b24a9e5f49bac4d0661ef833f1f78
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-27T23:12:44Z

    fix conf

----




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-215536529
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by sameeragarwal <gi...@git.apache.org>.
GitHub user sameeragarwal closed the pull request at:

    https://github.com/apache/spark/pull/12667




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-215536531
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57265/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-214544438
  
    **[Test build #56922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56922/consoleFull)** for PR 12667 at commit [`f3a2167`](https://github.com/apache/spark/commit/f3a21672e94cdb67e2cb69d60af327cff0b2cf54).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12667#issuecomment-214514867
  
    **[Test build #56922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56922/consoleFull)** for PR 12667 at commit [`f3a2167`](https://github.com/apache/spark/commit/f3a21672e94cdb67e2cb69d60af327cff0b2cf54).




[GitHub] spark pull request: [SPARK-14467][SQL] Experiments: Async I/O in F...

Posted by sameeragarwal <gi...@git.apache.org>.
GitHub user sameeragarwal closed the pull request at:

    https://github.com/apache/spark/pull/12667

