You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/24 03:29:10 UTC

[GitHub] [arrow] emkornfield opened a new pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

emkornfield opened a new pull request #8516:
URL: https://github.com/apache/arrow/pull/8516


   The arrow code path doesn't call this method of not-null columns with no null parents, so this wasn't caught in arrow tests.
   
   The only time we can pass through counts is if definition level happens to be zero.
   
   Also fix some potential UBSan issues with adding offset to values.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8516:
URL: https://github.com/apache/arrow/pull/8516#issuecomment-724741673


   Rebased.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8516:
URL: https://github.com/apache/arrow/pull/8516


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #8516:
URL: https://github.com/apache/arrow/pull/8516#discussion_r520337463



##########
File path: cpp/src/parquet/column_writer.cc
##########
@@ -1190,7 +1192,7 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<
                                   int64_t* out_spaced_values_to_write,
                                   int64_t* null_count) {
     if (bits_buffer_ == nullptr) {
-      if (!level_info_.HasNullableValues()) {
+      if (level_info_.def_level == 0) {

Review comment:
       I added comments and a check.  I document the method to make it clearer that it always updates the output parameters as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8516:
URL: https://github.com/apache/arrow/pull/8516#issuecomment-715666118


   https://issues.apache.org/jira/browse/PARQUET-1935


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8516:
URL: https://github.com/apache/arrow/pull/8516#discussion_r519886116



##########
File path: cpp/src/parquet/column_writer.cc
##########
@@ -1190,7 +1192,7 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<
                                   int64_t* out_spaced_values_to_write,
                                   int64_t* null_count) {
     if (bits_buffer_ == nullptr) {
-      if (!level_info_.HasNullableValues()) {
+      if (level_info_.def_level == 0) {

Review comment:
       Interesting. Why is a special case needed for this? Would you like to add a comment?

##########
File path: cpp/src/parquet/column_writer.cc
##########
@@ -1190,7 +1192,7 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<
                                   int64_t* out_spaced_values_to_write,
                                   int64_t* null_count) {
     if (bits_buffer_ == nullptr) {
-      if (!level_info_.HasNullableValues()) {
+      if (level_info_.def_level == 0) {

Review comment:
       Do you also want to check `def_levels` for null/non-null-ness? (debug mode check perhaps?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org