You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "zeroshade (via GitHub)" <gi...@apache.org> on 2023/05/19 17:01:05 UTC

[GitHub] [arrow] zeroshade opened a new pull request, #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

zeroshade opened a new pull request, #35690:
URL: https://github.com/apache/arrow/pull/35690

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   If a column to be written to parquet is a list array with only nulls in it, then the child will be empty and potentially have nil buffers. This case isn't checked in the `writeDenseArrow` function, so we blindly attempt to grab the bytes from the buffers which might be nil. 
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   ### What changes are included in this PR?
   Adding a check of `leafArr.Len()` at the top of `writeDenseArrow`. If the leaf array's length is > 0 we can be sure that the data buffer is non-nil. So we can just bail if the leaf array is empty, nothing to write.
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   ### Are these changes tested?
   Unit test is added.
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #35690:
URL: https://github.com/apache/arrow/pull/35690#discussion_r1199387889


##########
go/parquet/pqarrow/encode_arrow.go:
##########
@@ -246,6 +246,10 @@ type binary64arr interface {
 }
 
 func writeDenseArrow(ctx *arrowWriteContext, cw file.ColumnChunkWriter, leafArr arrow.Array, defLevels, repLevels []int16, maybeParentNulls bool) (err error) {
+	if leafArr.Len() == 0 {

Review Comment:
   Turns out you're right and it ended up causing a different test to fail. So it looks like i just needed to add the checks for whether or not the buffer is nil *sigh*. I've updated this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade merged pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #35690:
URL: https://github.com/apache/arrow/pull/35690


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35690:
URL: https://github.com/apache/arrow/pull/35690#issuecomment-1554933139

   * Closes: #35684


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #35690:
URL: https://github.com/apache/arrow/pull/35690#issuecomment-1554933123

   CC @yevgenypats please give this a try and see that it fixes your issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on a diff in pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on code in PR #35690:
URL: https://github.com/apache/arrow/pull/35690#discussion_r1199347189


##########
go/parquet/pqarrow/encode_arrow.go:
##########
@@ -246,6 +246,10 @@ type binary64arr interface {
 }
 
 func writeDenseArrow(ctx *arrowWriteContext, cw file.ColumnChunkWriter, leafArr arrow.Array, defLevels, repLevels []int16, maybeParentNulls bool) (err error) {
+	if leafArr.Len() == 0 {

Review Comment:
   Can't comment on it directly since it's not in the diff, but it's not immediately obvious to me why we'd handle this case differently: https://github.com/apache/arrow/blob/c8363fa209fed4dfa30a0eee70283d62fc6f1627/go/parquet/pqarrow/encode_arrow.go#L270-L275
   
   Also, it seems like that branch may never be taken now? Unless it's only there to cover an edge-case where `leafArr` was reassigned to an empty storage array after the initial `Len() == 0` check: https://github.com/apache/arrow/blob/c8363fa209fed4dfa30a0eee70283d62fc6f1627/go/parquet/pqarrow/encode_arrow.go#L253-L257



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35690:
URL: https://github.com/apache/arrow/pull/35690#issuecomment-1569045402

   Benchmark runs are scheduled for baseline = 83eae1c7ec1c5a8fc5e675110baf02439823ac3b and contender = 584dc7bcde7afa0c67cedf737bac6b4b7a221088. 584dc7bcde7afa0c67cedf737bac6b4b7a221088 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9614df0765e74dd78b4a9e8b1dac6313...c11a305bfaa04256a0cfd81cd2d4cd22/)
   [Finished :arrow_down:0.89% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4d52841f1e0a4345bb06511bae5a660e...97a973a48b5f4c1aa5acb7f587d3dbf1/)
   [Finished :arrow_down:2.61% :arrow_up:5.56%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/21e5b6af6d934cf8a51927f09a824b33...3a26bc43b5d74370b5c2b5de14c57684/)
   [Finished :arrow_down:1.03% :arrow_up:1.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ca7de0385695466083c644c2d26bc3e5...678e29ba1a694843b8656c26bfa133c1/)
   Buildkite builds:
   [Finished] [`584dc7bc` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2913)
   [Finished] [`584dc7bc` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2949)
   [Finished] [`584dc7bc` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2914)
   [Finished] [`584dc7bc` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2939)
   [Finished] [`83eae1c7` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2912)
   [Finished] [`83eae1c7` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2948)
   [Finished] [`83eae1c7` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2913)
   [Finished] [`83eae1c7` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2938)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] yevgenypats commented on pull request #35690: GH-35684: [Go][Parquet] Fix nil dereference with nil list array

Posted by "yevgenypats (via GitHub)" <gi...@apache.org>.
yevgenypats commented on PR #35690:
URL: https://github.com/apache/arrow/pull/35690#issuecomment-1555657910

   Thanks looks good to me! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org