Posted to commits@arrow.apache.org by ap...@apache.org on 2021/07/21 11:12:07 UTC

[arrow] branch master updated: ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 998f472  ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error
998f472 is described below

commit 998f4723976c3caa48bedbd4b432b531b842b89b
Author: Hideaki Hayashi <hi...@gmail.com>
AuthorDate: Wed Jul 21 13:10:23 2021 +0200

    ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error
    
    Judging from the comment "avoid spending time validating UTF8 data" on the line that sets cast_options.allow_invalid_utf8 to false, it seems this was intended to be true rather than false: allowing invalid UTF8 is what lets the cast skip validation.
    
    Also, this resolved the error I was getting through the arrow R package, which seems to be ARROW-12007.
    
    Closes #10759 from hideaki/cancel_unnecessary_utf8_check
    
    Authored-by: Hideaki Hayashi <hi...@gmail.com>
    Signed-off-by: Antoine Pitrou <an...@python.org>
---
 cpp/src/parquet/arrow/reader_internal.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpp/src/parquet/arrow/reader_internal.cc b/cpp/src/parquet/arrow/reader_internal.cc
index 0ffa3e8..f136870 100644
--- a/cpp/src/parquet/arrow/reader_internal.cc
+++ b/cpp/src/parquet/arrow/reader_internal.cc
@@ -429,7 +429,7 @@ Status TransferBinary(RecordReader* reader, MemoryPool* pool,
   }
   ::arrow::compute::ExecContext ctx(pool);
   ::arrow::compute::CastOptions cast_options;
-  cast_options.allow_invalid_utf8 = false;  // avoid spending time validating UTF8 data
+  cast_options.allow_invalid_utf8 = true;  // avoid spending time validating UTF8 data
 
   auto binary_reader = dynamic_cast<BinaryRecordReader*>(reader);
   DCHECK(binary_reader);
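To make the flag's direction concrete, here is a toy C++ model of a binary-to-string cast with an optional UTF-8 check. It is a sketch under stated assumptions, not Arrow code: ToyCastOptions, IsValidUtf8 and ToyCastBinaryToString are hypothetical names, and only the meaning of allow_invalid_utf8 (true means "do not validate") is taken from Arrow's CastOptions. With the pre-patch value (false), a payload containing a non-UTF-8 byte is rejected, which corresponds to the "Invalid UTF8 payload" error; with the patched value (true), the bytes pass through unchanged and no validation pass is run.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the allow_invalid_utf8 field of
// ::arrow::compute::CastOptions. This is NOT the Arrow API.
struct ToyCastOptions {
  bool allow_invalid_utf8 = false;  // true => skip UTF-8 validation (the patched value)
};

// Minimal UTF-8 validator: checks lead/continuation byte structure and
// rejects truncated sequences, overlong encodings, surrogates and
// code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  static const std::uint32_t kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const auto lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead < 0x80) {
      ++i;  // plain ASCII byte
      continue;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte (e.g. 0xFF)
    }
    if (i + len > n) return false;  // sequence truncated at end of input
    for (std::size_t k = 1; k < len; ++k) {
      const auto cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;  // expected a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < kMinCodePoint[len]) return false;          // overlong encoding
    if (cp > 0x10FFFF) return false;                    // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // UTF-16 surrogate
    i += len;
  }
  return true;
}

// Hypothetical binary -> string "cast". Returns std::nullopt where a
// validating cast would fail with an invalid-UTF8 error; otherwise the
// bytes are reinterpreted unchanged.
std::optional<std::string> ToyCastBinaryToString(const std::string& bytes,
                                                 const ToyCastOptions& opts) {
  if (!opts.allow_invalid_utf8 && !IsValidUtf8(bytes)) {
    return std::nullopt;
  }
  return bytes;  // when validation is skipped, no per-byte check is performed
}
```

For example, casting the bytes "abc" followed by 0xFF with default ToyCastOptions returns std::nullopt, while the same call with allow_invalid_utf8 set to true returns the input unchanged. The trade-off the original comment points at is that validation costs a full pass over every value, which the patched setting avoids.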