Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/05/17 08:58:00 UTC

[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/18014

    [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeArrayData for array

    ## What changes were proposed in this pull request?
    
    This PR enhances `ColumnVector` to keep `UnsafeArrayData` for arrays, so that `ColumnVector` can be used for the table cache (e.g. CACHE TABLE, DataFrame.cache).
    
    The current `ColumnVector` accepts only a primitive-type Java array as input for an array, which is good for keeping data from Parquet.
    
    This PR changed or added the following APIs:
    
    `ColumnVector ColumnVector.allocate(int capacity, DataType type, MemoryMode mode, boolean useUnsafeArrayData)`
    * When the last argument is `true`, the `ColumnVector` can keep `UnsafeArrayData`; when it is `false`, it cannot.
    
    `int ColumnVector.putArray(int rowId, ArrayData array)`
    * When this `ColumnVector` was generated with `useUnsafeArrayData=true`, this method stores `UnsafeArrayData` into the `ColumnVector`. Otherwise, it throws an exception.
    
    `ArrayData ColumnVector.getArray(int rowId)`
    * When this `ColumnVector` was generated with `useUnsafeArrayData=true`, this method returns `UnsafeArrayData`.
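    The proposed contract can be sketched with a toy stand-in class (an illustration only: the class below and its use of `int[]` in place of `ArrayData` are assumptions for a self-contained example, not Spark's actual `ColumnVector`):
    
    ```java
    // Toy stand-in (not Spark's ColumnVector) illustrating the proposed contract:
    // putArray(rowId, array) is only legal when useUnsafeArrayData was true.
    import java.util.HashMap;
    import java.util.Map;
    
    class ToyColumnVector {
        private final boolean useUnsafeArrayData;
        private final Map<Integer, int[]> arrays = new HashMap<>();
    
        // Mirrors `ColumnVector.allocate(capacity, type, mode, useUnsafeArrayData)`.
        public static ToyColumnVector allocate(int capacity, boolean useUnsafeArrayData) {
            return new ToyColumnVector(useUnsafeArrayData);
        }
    
        private ToyColumnVector(boolean useUnsafeArrayData) {
            this.useUnsafeArrayData = useUnsafeArrayData;
        }
    
        // Mirrors `int putArray(int rowId, ArrayData array)`: stores the array,
        // or throws when the vector was not allocated for UnsafeArrayData.
        public int putArray(int rowId, int[] array) {
            if (!useUnsafeArrayData) {
                throw new UnsupportedOperationException("allocated with useUnsafeArrayData=false");
            }
            arrays.put(rowId, array);
            return array.length;
        }
    
        // Mirrors `ArrayData getArray(int rowId)`.
        public int[] getArray(int rowId) {
            return arrays.get(rowId);
        }
    }
    ```
    
    With this contract, a vector allocated with `true` stores and returns the array, while one allocated with `false` rejects `putArray`.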
    
    ## How was this patch tested?
    
    Updated the existing test suites


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark OnHeapColumnVector

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18014
    
----
commit b939a0819ac9feb4ff779f50a19ceb101cada999
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-04-24T16:56:48Z

    Keep UnsafeArrayData for Array in ColumnVector

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77024/
    Test PASSed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Jenkins, test this please




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan Could you please let us know your thoughts?
    Is it better to use the binary type, or to add simple logic for `UnsafeArrayData` and the other types in `ColumnVector`?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Yes. Let me implement a new `putArray(int rowId, Array array)` that uses `ColumnVector.Array` and stores primitive-type elements into a primitive array (e.g. `intData`).




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77023/testReport)** for PR 18014 at commit [`b939a08`](https://github.com/apache/spark/commit/b939a0819ac9feb4ff779f50a19ceb101cada999).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77024/testReport)** for PR 18014 at commit [`3f726ba`](https://github.com/apache/spark/commit/3f726ba8ff6d37c5b9b25590b5d76a1376352aa3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77024/testReport)** for PR 18014 at commit [`3f726ba`](https://github.com/apache/spark/commit/3f726ba8ff6d37c5b9b25590b5d76a1376352aa3).




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r120933598
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    ping @cloud-fan 




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk closed the pull request at:

    https://github.com/apache/spark/pull/18014




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r120311436
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    @cloud-fan I agree that `ColumnVector.Array.getArray` returns the same instance as `ColumnVector.resultArray`. This is good for a one-dimensional array.
    In my understanding, we have to get a `ColumnVector` from this instance to support nested arrays. However, the current `ColumnVector.Array` does not have a getter for `ColumnVector`, so it is hard to implement nested arrays with the current `ColumnVector` implementation. Am I correct? Or do you have an idea for enhancing `ColumnVector`?
    
    I would appreciate it if we could have a F2F chat at the Moscone Center on Tuesday.




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r119186654
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    I don't think so; `ColumnVector.Array.getArray` actually returns the same instance.




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118660658
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    Regarding `UnsafeArrayData`, I understood what you are talking about. As you said, the leaf elements are not stored together, as shown below.
    ```
    dimension |[0][0 ... n-1]       |[1][0 ... n-1]       |[2][0 ... n-1]       |
    memory    |[null bits][elements]|[null bits][elements]|[null bits][elements]|
    ```
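    The separation can be sketched with simplified offset arithmetic (an illustration under assumed sizes only, not Spark code: 4-byte int elements and an 8-byte null-bits region per inner array):
    
    ```java
    // Why leaf elements of a nested array are not contiguous in this layout:
    // each inner array carries its own [null bits][elements] segment.
    class NestedLayout {
        // Byte offset of element j of inner array i, for n-element inner
        // arrays of 4-byte ints with an 8-byte null-bits header per array.
        public static int elementOffset(int i, int j, int n) {
            int headerBytes = 8;                  // simplified null-bits region
            int innerBytes = headerBytes + 4 * n; // one [null bits][elements] segment
            return i * innerBytes + headerBytes + 4 * j;
        }
    }
    ```
    
    For `n = 4`, inner array 0's elements end at byte 24 while inner array 1's elements start at byte 32, so leaf elements of adjacent rows are separated by the next null-bits header.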
    
    On the other hand, for `ColumnVector`, IIUC, `ColumnarBatchSuite.String APIs` uses a [single-dimension array](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala#L646-L650).
    For a nested array, the following access is required. In this case, `array0` and `array1` are different instances, so the leaf elements are not stored together.
    ```
    val arrayRow0 : ColumnVector.Array = columnVector.getArray(0)
    val array0 : ColumnVector.Array = arrayRow0.getArray(0)
    val element00 : int = array0.getInt(0)
    val array1 : ColumnVector.Array = arrayRow0.getArray(1)
    val element01 : int = array1.getInt(1)
    ```
    
    What do you think?





[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118622572
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    You are right; the current implementation will require multiple bulk copies for a nested array.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77023/
    Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    I think there is a gap between the columnar format and the unsafe row format. The current `ColumnVector` format looks reasonable for the array type, as it puts the leaf elements together, which is better for compression.
    
    While changing it to binary makes it faster when we need to read an array from `ColumnVector` and convert it to `ArrayData`, it is hardly an efficient way to store arrays in columnar format. Shall we consider updating the `UnsafeArrayData` format?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77385/testReport)** for PR 18014 at commit [`9954d6b`](https://github.com/apache/spark/commit/9954d6b20d3ed5743100d342b8b8102f60f8f9f2).




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77017/testReport)** for PR 18014 at commit [`b939a08`](https://github.com/apache/spark/commit/b939a0819ac9feb4ff779f50a19ceb101cada999).




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan Here is an implementation based on Option 2, covering only simple data types (e.g. boolean, int, double, and so on). It uses a bulk copy for the array body in `putArray()`, and an [element-wise copy for the null bits if `containsNull()` is true](https://github.com/apache/spark/pull/18014/files#diff-7738d15a778268fd8b5574e2c655a660R683) in `putArray()`.
    Do you have any comments or feedback?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan Thank you for your comments. Let me confirm your ideas.
    1. Do you want to keep the array contents of `UnsafeArrayData` in [a primitive data array (e.g. intData[])](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java#L43)?
    2. How do you want to update `UnsafeArrayData`?
    
    We can map the current `UnsafeArrayData` into `ColumnVector`. The following is the format of `UnsafeArrayData`.
    ```
     [numElements][null bits][values or offset&length][variable length portion]
    ```
    * [numElements]: store it into `arrayLengths[]`
    * [null bits]: ***Needs conversion from the bit-vector representation to the byte representation***
    * [values]: store as each data type
    * [offset&length][variable length portion]: store as `ByteType`
    
    The issue is the conversion of the `null bits`. However, if we use a byte representation in `UnsafeArrayData`, it may waste more memory space. To avoid this, we could update `ColumnVector` to support a bit-vector representation for the nullability of each element.
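    As a rough illustration of that layout, the hypothetical helper below computes the header size before the value region, assuming the word-aligned format of recent `UnsafeArrayData` (8 bytes for `numElements`, plus the null-bit region rounded up to whole 8-byte words); this rounding is exactly why a byte-per-element null representation in `ColumnVector` needs a conversion step:

    ```java
    public class UnsafeArrayLayout {
        // [numElements][null bits][values or offset&length][variable length portion]
        public static long headerSizeInBytes(int numElements) {
            long nullBitsWords = (numElements + 63) / 64;  // one bit per element, 8-byte aligned
            return 8L /* numElements field */ + nullBitsWords * 8L;
        }

        public static void main(String[] args) {
            System.out.println(headerSizeInBytes(2));    // 16: 8 + one null-bit word
            System.out.println(headerSizeInBytes(100));  // 24: 8 + two null-bit words
        }
    }
    ```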
    
    
    On the other hand, my current approach stores the whole `UnsafeArrayData` as binary into [`byte[] data`](https://github.com/apache/spark/pull/18014/files#diff-f1e0f2d99a6cdc0113487f8358861fb3R56). The advantage of this approach is that there is no conversion cost at put/get.
    
    What do you think?





[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan What do you think?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77023/testReport)** for PR 18014 at commit [`b939a08`](https://github.com/apache/spark/commit/b939a0819ac9feb4ff779f50a19ceb101cada999).




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    I took a look at `ColumnVector.getArray`; it seems it already has no cost? Writing still needs some copying, though.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77385/
    Test PASSed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    I may be missing something, but can we just treat the array type as a binary type and put it in `ColumnVector`?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    I think option 2 is better. `ColumnVector.getArray()` should be as fast as possible. The caller side may just get an element from this array, and then the final projection doesn't need to copy the array.
    
    But if we do need to copy the array in the final projection, we should also speed it up with a bulk copy.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77384/testReport)** for PR 18014 at commit [`bf6ab20`](https://github.com/apache/spark/commit/bf6ab2060550671c59e013a3c56accdd698df9ee).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77017/
    Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan When I think about the use case of `ColumnVector.getArray` (i.e. in code generated by whole-stage code generation), I think it is better to return `UnsafeArrayData` instead of `ColumnVector.Array` in the current generated code.
    
    The following is an example program and the current generated code. In the generated code, we will replace `inputadapter_row.getArray(0)` at line 35 with `columnVector.getArray(rowIdx)`. So, let us assume `columnVector.getArray(rowIdx)` is used and focus on `inputadapter_value`.
    
    At projection time, if the type of `inputadapter_value` is `UnsafeArrayData`, line 72 just performs a fast memory copy of a contiguous region. On the other hand, if the type of `inputadapter_value` is `ColumnVector.Array`, lines 79-86 perform a slower element-wise copy.
    
    I think there are three options:
    1. Add a method `UnsafeArrayData ColumnVector.getArray()` so that `UnsafeArrayData` can be used everywhere in the generated code.
    2. Add a conditional branch for `ColumnVector.Array` in the generated code and prepare a specialized copy routine for `ColumnVector.Array`. In this case, we can use a bulk copy for the array body, but an element-wise copy for the null bits.
    3. In addition to 2, support a bit-wise null-bits representation in `ColumnVector.Array`. In this case, we can use two bulk copies: one for the null bits, and the other for the array body. The downside of this approach is that it introduces an additional conditional branch in `ColumnVector.isNullAt()` (i.e. checking whether the byte-wise or bit-wise null representation is used).
    
    What do you think? Or are there any other ideas?
    
    
    ```scala
    val df = sparkContext.parallelize(Seq(Array(0, 1), Array(1, 2)), 1).toDF("a").cache
    df.count
    df.filter("a[0] > 0").show
    ```    
    
    
    ```java
    /* 031 */   protected void processNext() throws java.io.IOException {
    /* 032 */     while (inputadapter_input.hasNext() && !stopEarly()) {
    /* 033 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 034 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
    /* 035 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
    /* 037 */       if (!(!(inputadapter_isNull))) continue;
    /* 039 */       boolean filter_isNull2 = true;
    /* 040 */       boolean filter_value2 = false;
    /* 042 */       boolean filter_isNull3 = true;
    /* 043 */       int filter_value3 = -1;
    /* 045 */       filter_isNull3 = false;
    /* 047 */       final int filter_index = (int) 0;
    /* 048 */       if (filter_index >= inputadapter_value.numElements() || filter_index < 0 || inputadapter_value.isNullAt(filter_index)) {
    /* 049 */         filter_isNull3 = true;
    /* 050 */       } else {
    /* 051 */         filter_value3 = inputadapter_value.getInt(filter_index);
    /* 052 */       }
    /* 053 */       if (!filter_isNull3) {
    /* 054 */         filter_isNull2 = false;
    /* 055 */         filter_value2 = filter_value3 > 0;
    /* 057 */       }
    /* 058 */       if (filter_isNull2 || !filter_value2) continue;
    /* 060 */       filter_numOutputRows.add(1);
    /* 062 */       filter_holder.reset();
    /* 066 */       final int filter_tmpCursor = filter_holder.cursor;
    /* 068 */       if (inputadapter_value instanceof UnsafeArrayData) {
    /* 069 */         final int filter_sizeInBytes = ((UnsafeArrayData) inputadapter_value).getSizeInBytes();
    /* 071 */         filter_holder.grow(filter_sizeInBytes);
    /* 072 */         ((UnsafeArrayData) inputadapter_value).writeToMemory(filter_holder.buffer, filter_holder.cursor);
    /* 073 */         filter_holder.cursor += filter_sizeInBytes;
    /* 075 */       } else {
    /* 076 */         final int filter_numElements = inputadapter_value.numElements();
    /* 077 */         filter_arrayWriter.initialize(filter_holder, filter_numElements, 4);
    /* 079 */         for (int filter_index1 = 0; filter_index1 < filter_numElements; filter_index1++) {
    /* 080 */           if (inputadapter_value.isNullAt(filter_index1)) {
    /* 081 */             filter_arrayWriter.setNullInt(filter_index1);
    /* 082 */           } else {
    /* 083 */             final int filter_element = inputadapter_value.getInt(filter_index1);
    /* 084 */             filter_arrayWriter.write(filter_index1, filter_element);
    /* 085 */           }
    /* 086 */         }
    /* 087 */       }
    /* 089 */       filter_rowWriter.setOffsetAndSize(0, filter_tmpCursor, filter_holder.cursor - filter_tmpCursor);
    /* 090 */       filter_result.setTotalSize(filter_holder.totalSize());
    /* 091 */       append(filter_result);
    /* 092 */       if (shouldStop()) return;
    /* 093 */     }
    /* 094 */   }
    ```




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @hvanhovell @sameeragarwal would it be possible to look at this?
    cc: @cloud-fan 




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118621670
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    if we want to support nested arrays, do we need to do multiple bulk copies?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Jenkins, test this please




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r119182758
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    ping @cloud-fan 




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77385/testReport)** for PR 18014 at commit [`9954d6b`](https://github.com/apache/spark/commit/9954d6b20d3ed5743100d342b8b8102f60f8f9f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118624684
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    No, I think it's the opposite. `UnsafeArrayData` blindly puts elements together; if the element is also an array type, each element contains its own null bits, so the leaf elements are not contiguous. For `ColumnVector`, you can take a look at `ColumnarBatchSuite`'s `String APIs` test: the leaf elements are stored together.
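    A minimal sketch of that "leaf elements together" layout (hypothetical names, not Spark's actual classes): inner arrays are flattened into one contiguous leaf buffer plus per-array offset/length, so each leaf run can be written with a bulk copy instead of element-wise writes:

    ```java
    public class NestedLeafLayout {
        public static int[] leaf;      // all leaf ints, stored contiguously
        public static int[] offsets;   // start of each inner array within `leaf`
        public static int[] lengths;   // length of each inner array

        public static void store(int[][] nested) {
            int total = 0;
            for (int[] a : nested) total += a.length;
            leaf = new int[total];
            offsets = new int[nested.length];
            lengths = new int[nested.length];
            int pos = 0;
            for (int i = 0; i < nested.length; i++) {
                offsets[i] = pos;
                lengths[i] = nested[i].length;
                // One bulk copy per inner array; the leaf region stays contiguous.
                System.arraycopy(nested[i], 0, leaf, pos, nested[i].length);
                pos += nested[i].length;
            }
        }
    }
    ```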




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r119808220
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    Let me think about this over the weekend, since I am busy preparing slides for Spark Summit.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    This is an older one; let me close it. I will then submit another PR very soon that does the same thing in a different way.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77017/testReport)** for PR 18014 at commit [`b939a08`](https://github.com/apache/spark/commit/b939a0819ac9feb4ff779f50a19ceb101cada999).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118621775
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    I think we may need to update `UnsafeArrayData` to put the leaf elements together, like `ColumnVector` does; then we can use just one bulk copy to write a nested array.
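
A rough illustration of the suggestion above (this is an editorial sketch, not Spark code: `System.arraycopy` stands in for `Platform.copyMemory`, and all names are hypothetical). If the leaf elements of a nested array were laid out contiguously, writing them into a column could be a single bulk copy instead of one copy per inner array:

```java
import java.util.Arrays;

// Sketch only: contrasts one bulk copy over a contiguous leaf-element
// region with the per-row copies needed when each inner array is a
// separate allocation.
public class BulkCopySketch {
    // Single bulk copy: all leaf elements live in one contiguous array.
    static int[] copyContiguous(int[] leafElements) {
        int[] dest = new int[leafElements.length];
        System.arraycopy(leafElements, 0, dest, 0, leafElements.length); // one copy
        return dest;
    }

    // Per-row copies: each inner array needs its own copy call.
    static int[] copyPerRow(int[][] rows) {
        int total = 0;
        for (int[] row : rows) total += row.length;
        int[] dest = new int[total];
        int offset = 0;
        for (int[] row : rows) {
            System.arraycopy(row, 0, dest, offset, row.length); // one copy per row
            offset += row.length;
        }
        return dest;
    }

    public static void main(String[] args) {
        int[] contiguous = {1, 2, 3, 4};       // leaf elements stored contiguously
        int[][] perRow = {{1, 2}, {3, 4}};     // same data, one allocation per row
        System.out.println(Arrays.equals(copyContiguous(contiguous), copyPerRow(perRow)));
    }
}
```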




[GitHub] spark pull request #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep U...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18014#discussion_r118622945
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -386,6 +425,35 @@ public void putArray(int rowId, int offset, int length) {
       }
     
       @Override
    +  public void putArray(int rowId, Object src, int srcOffset, int dstOffset, int numElements) {
    +    DataType et = type;
    +    reserve(dstOffset + numElements);
    +    if (et == DataTypes.BooleanType || et == DataTypes.ByteType) {
    +      Platform.copyMemory(
    +        src, srcOffset, byteData, Platform.BYTE_ARRAY_OFFSET + dstOffset, numElements);
    +    } else if (et == DataTypes.ShortType) {
    +      Platform.copyMemory(
    +        src, srcOffset, shortData, Platform.SHORT_ARRAY_OFFSET + dstOffset * 2, numElements * 2);
    +    } else if (et == DataTypes.IntegerType || et == DataTypes.DateType ||
    +      DecimalType.is32BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, intData, Platform.INT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (type instanceof LongType || type instanceof TimestampType ||
    +      DecimalType.is64BitDecimalType(type)) {
    +      Platform.copyMemory(
    +        src, srcOffset, longData, Platform.LONG_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else if (et == DataTypes.FloatType) {
    +      Platform.copyMemory(
    +        src, srcOffset, floatData, Platform.FLOAT_ARRAY_OFFSET + dstOffset * 4, numElements * 4);
    +    } else if (et == DataTypes.DoubleType) {
    +      Platform.copyMemory(
    +        src, srcOffset, doubleData, Platform.DOUBLE_ARRAY_OFFSET + dstOffset * 8, numElements * 8);
    +    } else {
    +      throw new RuntimeException("Unhandled " + type);
    --- End diff --
    
    IIUC, `UnsafeArrayData` puts all of the elements together in a contiguous memory region even when it is a nested array.
    On the other hand, the current `ColumnVector` stores the elements of each dimension in a separate child column (`ColumnVector.childColumns[0]`), so multiple bulk copies are required.
    For example, given the array `Array(Array(1), Array(2))`, `Array(1)` and `Array(2)` are stored into different positions of `intData` in `ColumnVector.childColumns[0]`.
    Am I correct?
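
To make the layout described above concrete, here is an editorial sketch (field names like `intData` and the offset/length arrays are illustrative stand-ins, not Spark's exact fields) of how `Array(Array(1), Array(2))` could be represented when each dimension lives in its own child column:

```java
// Sketch only: reconstruct Array(Array(1), Array(2)) from a per-dimension
// layout where the parent holds offsets/lengths into a child column.
public class NestedLayoutSketch {
    public static void main(String[] args) {
        // Outer array column: row 0 is an array of 2 inner arrays.
        int[] outerOffsets = {0};
        int[] outerLengths = {2};
        // Inner array column: offsets/lengths into the leaf data.
        int[] innerOffsets = {0, 1};  // Array(1) starts at 0, Array(2) at 1
        int[] innerLengths = {1, 1};
        // Leaf column holding the int elements of both inner arrays.
        int[] intData = {1, 2};

        // Reading back row 0 walks two levels of indirection.
        StringBuilder sb = new StringBuilder();
        for (int i = outerOffsets[0]; i < outerOffsets[0] + outerLengths[0]; i++) {
            for (int j = innerOffsets[i]; j < innerOffsets[i] + innerLengths[i]; j++) {
                sb.append(intData[j]).append(' ');
            }
        }
        System.out.println(sb.toString().trim());
    }
}
```

Because the inner arrays occupy distinct ranges of the child's `intData`, filling them from a source whose rows are separate allocations requires one copy per inner array rather than one bulk copy.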




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77384/
    Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    **[Test build #77384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77384/testReport)** for PR 18014 at commit [`bf6ab20`](https://github.com/apache/spark/commit/bf6ab2060550671c59e013a3c56accdd698df9ee).




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    I thought that idea was for Apache Arrow.
    We could use the binary type for `UnsafeArrayData`. However, it adds some complexity to the use of [`ColumnVector.Array`](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L1015-L1017).
    
    Would it be better to use the existing code?




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    What is the latest status of this PR?


---
