Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2016/07/07 14:12:30 UTC

[GitHub] spark pull request #14091: [SPARK-16412][SQL] Generate Java code that gets a...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/14091

    [SPARK-16412][SQL] Generate Java code that gets an array in each column of CachedBatch when DataFrame.cache() is called

    ## What changes were proposed in this pull request?
    
    Waiting for #11956 to be merged.
    
    This PR generates Java code that directly reads the array stored in each column of a CachedBatch when DataFrame.cache() is called. The code is produced during whole-stage code generation.
    
    When DataFrame.cache() is called, data is stored in column-oriented storage (the columnar cache) inside a CachedBatch. This PR avoids converting from column-oriented storage to row-oriented storage, and it handles array types stored in a column.
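    
    To make the avoided conversion concrete, here is a minimal Java sketch under stated assumptions. It uses a hypothetical layout for an array<int> column, where all elements are flattened into one `int[]` and an `offsets` array delimits each row. This is not Spark's actual CachedBatch encoding, and the names `ToyIntArrayColumn` and `ArrayColumnSums` are illustrative only. `sumViaRows` materializes one array object per row before reading it. `sumViaColumn` walks the backing arrays directly.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    // Hypothetical columnar layout for an array<int> column: elements of all rows
    // are flattened into values[], and offsets[r]..offsets[r + 1] delimits row r.
    // Illustrative only; this is NOT Spark's CachedBatch encoding.
    final class ToyIntArrayColumn {
        final int[] offsets; // length = numRows + 1
        final int[] values;  // flattened elements of every row's array
    
        ToyIntArrayColumn(int[] offsets, int[] values) {
            this.offsets = offsets;
            this.values = values;
        }
    
        int numRows() {
            return offsets.length - 1;
        }
    
        // Row-oriented path: copy each row's array into a freshly allocated object.
        List<int[]> toRows() {
            List<int[]> rows = new ArrayList<>(numRows());
            for (int r = 0; r < numRows(); r++) {
                int[] copy = new int[offsets[r + 1] - offsets[r]];
                System.arraycopy(values, offsets[r], copy, 0, copy.length);
                rows.add(copy);
            }
            return rows;
        }
    }
    
    final class ArrayColumnSums {
        // Sums every element after materializing one array object per row.
        static long sumViaRows(ToyIntArrayColumn col) {
            long sum = 0;
            for (int[] row : col.toRows()) {
                for (int v : row) {
                    sum += v;
                }
            }
            return sum;
        }
    
        // Sums every element by reading the column's backing arrays directly,
        // with no per-row conversion or allocation.
        static long sumViaColumn(ToyIntArrayColumn col) {
            long sum = 0;
            for (int r = 0; r < col.numRows(); r++) {
                for (int i = col.offsets[r]; i < col.offsets[r + 1]; i++) {
                    sum += col.values[i];
                }
            }
            return sum;
        }
    }
    ```
    
    Both methods compute the same result. In this sketch, the columnar version only skips the per-row copies.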
    
    This PR generates code for both row-oriented storage and column-oriented storage only if:
     - InMemoryColumnarTableScan exists in the plan sub-tree. Which path runs is decided at runtime by checking whether the given iterator is a ColumnarIterator.
     - No sort or join exists in the plan sub-tree.
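    
    The runtime decision in the first condition could look roughly like the following sketch. `ColumnarSource`, `ToyBatchIterator`, and `DualPathScan` are hypothetical names standing in for the PR's check against `ColumnarIterator`. They are not Spark classes, and the sort/join condition is not modeled. Both a row-oriented loop and a column-oriented loop are present, and an `instanceof` test on the input selects one of them.
    
    ```java
    import java.util.Iterator;
    import java.util.NoSuchElementException;
    
    // Hypothetical capability interface: an input that can hand out whole columns.
    // It plays the role of the PR's runtime ColumnarIterator check; NOT a Spark class.
    interface ColumnarSource {
        long[] column(int ordinal);
    }
    
    // Hypothetical single-batch input that supports both access styles.
    final class ToyBatchIterator implements Iterator<long[]>, ColumnarSource {
        private final long[][] columns; // columns[c][r]
        private int nextRow = 0;
    
        ToyBatchIterator(long[][] columns) {
            this.columns = columns;
        }
    
        @Override
        public long[] column(int ordinal) {
            return columns[ordinal];
        }
    
        @Override
        public boolean hasNext() {
            return columns.length > 0 && nextRow < columns[0].length;
        }
    
        // Row-oriented access: assembles one row from every column.
        @Override
        public long[] next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            long[] row = new long[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][nextRow];
            }
            nextRow++;
            return row;
        }
    }
    
    final class DualPathScan {
        // The path is chosen at runtime from the concrete type of the input.
        static boolean usesColumnarPath(Iterator<long[]> input) {
            return input instanceof ColumnarSource;
        }
    
        // Sums column 0, reading the column array directly when possible.
        static long sumColumn0(Iterator<long[]> input) {
            long sum = 0;
            if (usesColumnarPath(input)) {
                for (long v : ((ColumnarSource) input).column(0)) {
                    sum += v;
                }
            } else {
                while (input.hasNext()) {
                    sum += input.next()[0];
                }
            }
            return sum;
        }
    }
    ```
    
    In this sketch, a plain row iterator (for example one from `Arrays.asList(...).iterator()`) fails the `instanceof` test and falls back to the row-oriented loop.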
    
    This PR generates Java code for the columnar cache only if every column accessed by the operations has a primitive or array type.
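    
    The type restriction can be sketched as a guard over the columns an operation actually reads. `ToyType` and `ColumnarCodegenGuard` are hypothetical simplifications, not Spark's `DataType` hierarchy. Only accessed column types are passed in, so a cached column that is never read does not affect the decision.
    
    ```java
    import java.util.List;
    
    // Hypothetical, simplified type tags; NOT Spark's DataType hierarchy.
    enum ToyType {
        BOOLEAN, INT, LONG, DOUBLE,  // primitive types
        INT_ARRAY, DOUBLE_ARRAY,     // array types
        STRING, MAP                  // neither primitive nor array
    }
    
    final class ColumnarCodegenGuard {
        // True for the types this sketch assumes the columnar path can handle.
        static boolean isPrimitiveOrArray(ToyType t) {
            switch (t) {
                case BOOLEAN:
                case INT:
                case LONG:
                case DOUBLE:
                case INT_ARRAY:
                case DOUBLE_ARRAY:
                    return true;
                default:
                    return false;
            }
        }
    
        // Columnar code is generated only if EVERY accessed column qualifies;
        // a single unsupported column means no columnar code is generated.
        static boolean supportsColumnarCodegen(List<ToyType> accessedColumnTypes) {
            for (ToyType t : accessedColumnTypes) {
                if (!isPrimitiveOrArray(t)) {
                    return false;
                }
            }
            return true;
        }
    }
    ```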
    
    I will add benchmark suites [here](https://github.com/kiszk/spark/blob/SPARK-14098/sql/core/src/test/scala/org/apache/spark/sql/DataFrameCacheBenchmark.scala).
    
    
    
    ## How was this patch tested?
    
    Added new tests into `DataFrameCacheSuite.scala`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-16412

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14091
    
----
commit 09af5a5851786b918f45c6f997b1c357745fe883
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2016-07-07T10:36:14Z

    support codegen for an array in CachedBatch

commit 8e218e38d5acb6c04db221fcd3cd6d2483926552
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2016-07-07T10:36:34Z

    update test suites

commit 54df41c8691f02dd9eac3eef3d816a130b87a5c9
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2016-07-07T13:18:58Z

    remove debug print

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    **[Test build #61913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61913/consoleFull)** for PR 14091 at commit [`54df41c`](https://github.com/apache/spark/commit/54df41c8691f02dd9eac3eef3d816a130b87a5c9).




[GitHub] spark issue #14091: [SPARK-16412][SQL][WIP] Generate Java code that gets an ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    Hi @kiszk, I'm just wondering whether this is still WIP (just out of curiosity).




[GitHub] spark issue #14091: [SPARK-16412][SQL][WIP] Generate Java code that gets an ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    **[Test build #63350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63350/consoleFull)** for PR 14091 at commit [`61a4754`](https://github.com/apache/spark/commit/61a4754b898755e293a741fa74518d6e76c5c538).




[GitHub] spark issue #14091: [SPARK-16412][SQL][WIP] Generate Java code that gets an ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    **[Test build #61913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61913/consoleFull)** for PR 14091 at commit [`54df41c`](https://github.com/apache/spark/commit/54df41c8691f02dd9eac3eef3d816a130b87a5c9).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #14091: [SPARK-16412][SQL][WIP] Generate Java code that gets an ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    **[Test build #63350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63350/consoleFull)** for PR 14091 at commit [`61a4754`](https://github.com/apache/spark/commit/61a4754b898755e293a741fa74518d6e76c5c538).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14091: [SPARK-16412][SQL] Generate Java code that gets an array...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61913/




[GitHub] spark issue #14091: [SPARK-16412][SQL][WIP] Generate Java code that gets an ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14091
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63350/




[GitHub] spark pull request #14091: [SPARK-16412][SQL][WIP] Generate Java code that g...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14091

