You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gczsjdy <gi...@git.apache.org> on 2017/07/14 03:15:05 UTC

[GitHub] spark pull request #18632: Reset BufferHolder while initialize an UnsafeRowW...

GitHub user gczsjdy opened a pull request:

    https://github.com/apache/spark/pull/18632

    Reset BufferHolder while initialize an UnsafeRowWriter

    ## What changes were proposed in this pull request?
    
    `UnsafeRowWriter`'s construtor should contain `BufferHolder.reset` to make the writer out of the box.
    
    While writing `UnsafeRow` using `UnsafeRowWriter`, developers should manually call `BufferHolder.reset` to make the `BufferHolder`'s `cursor`(which indicates where to write variable length portion) right in order to write variable length fields like UTF8String, byte[], etc.
    
    If a developer doesn't reuse the `BufferHolder` so maybe he never noticed `reset` and the comments in code, the `UnsafeRow` won't be correct if he also writes variable length UTF8String. This API design doesn't make sense. We should reset the `BufferHolder` to make `UnsafeRowWriter` out of the box, but not let user read the code and manually call it.
     
    ## How was this patch tested?
    
    `makeKeyRow(long k1, String k2)` in `RowBasedKeyValueBatchSuite` already tested it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gczsjdy/spark unsaferow_buf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18632
    
----
commit a0985404363f2975bf673e37306d0bd1c700a4d0
Author: GuoChenzhao <ch...@intel.com>
Date:   2017-07-14T02:17:23Z

    Reset BufferHolder while initialize an UnsafeRowWriter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    OK to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by gczsjdy <gi...@git.apache.org>.
Github user gczsjdy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18632#discussion_r127908258
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
         this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
         this.fixedSize = nullBitsSize + 8 * numFields;
         this.startingOffset = holder.cursor;
    +    holder.reset();
    --- End diff --
    
    @cloud-fan @viirya For your worries, maybe we can move the `holder.reset()` to `BufferHolder`'s constructor. Then the holder will be reset only once, and also it'ok to continue writing from a buffer's current cursor. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18632#discussion_r127866899
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
         this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
         this.fixedSize = nullBitsSize + 8 * numFields;
         this.startingOffset = holder.cursor;
    +    holder.reset();
    --- End diff --
    
    I not very sure about this, but what if this writer is for inner struct? Then the buffer holder is shared between many writers and we should only reset once.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    Mostly `UnsafeRowWriter` instances and `BufferHolder`  instance are reused during the query execution and we need to reset the buffer holder for every input row, so I don't think there is a place  inside `UnsafeRowWriter` can call `holder.reset` automatically.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by gczsjdy <gi...@git.apache.org>.
Github user gczsjdy closed the pull request at:

    https://github.com/apache/spark/pull/18632


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18632#discussion_r127956559
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
         this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
         this.fixedSize = nullBitsSize + 8 * numFields;
         this.startingOffset = holder.cursor;
    +    holder.reset();
    --- End diff --
    
    > move the holder.reset() to BufferHolder's constructor
    
    Resetting itself in its constructor doesn't make any sense


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by gczsjdy <gi...@git.apache.org>.
Github user gczsjdy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18632#discussion_r127907518
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
         this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
         this.fixedSize = nullBitsSize + 8 * numFields;
         this.startingOffset = holder.cursor;
    +    holder.reset();
    --- End diff --
    
    What do you mean by 'writer is for inner struct'? @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18632#discussion_r127904688
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
         this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
         this.fixedSize = nullBitsSize + 8 * numFields;
         this.startingOffset = holder.cursor;
    +    holder.reset();
    --- End diff --
    
    I think we don't guarantee all the calls to this `UnsafeRowWriter` constructor are at the timing of new incoming record. It could be possible that we pass a `BufferHolder` into this constructor but the holder is already written with some data and we want to continue writing from current cursor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: Reset BufferHolder while initialize an UnsafeRowWriter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    **[Test build #79702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79702/testReport)** for PR 18632 at commit [`a098540`](https://github.com/apache/spark/commit/a0985404363f2975bf673e37306d0bd1c700a4d0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79702/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by gczsjdy <gi...@git.apache.org>.
Github user gczsjdy commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    @cloud-fan @viirya @gatorsmile Could you please help me review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    **[Test build #79702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79702/testReport)** for PR 18632 at commit [`a098540`](https://github.com/apache/spark/commit/a0985404363f2975bf673e37306d0bd1c700a4d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by gczsjdy <gi...@git.apache.org>.
Github user gczsjdy commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    @cloud-fan You are right, thanks. I will close this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18632
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org