Posted to commits@arrow.apache.org by ap...@apache.org on 2021/07/21 11:12:07 UTC
[arrow] branch master updated: ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error
This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 998f472 ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error
998f472 is described below
commit 998f4723976c3caa48bedbd4b432b531b842b89b
Author: Hideaki Hayashi <hi...@gmail.com>
AuthorDate: Wed Jul 21 13:10:23 2021 +0200
ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error
The line that sets cast_options.allow_invalid_utf8 to false carries the comment "avoid spending time validating UTF8 data", which suggests the value was intended to be true rather than false.
Also, this resolved the error I was getting through the arrow R package, which seems to be ARROW-12007.
Closes #10759 from hideaki/cancel_unnecessary_utf8_check
Authored-by: Hideaki Hayashi <hi...@gmail.com>
Signed-off-by: Antoine Pitrou <an...@python.org>
---
cpp/src/parquet/arrow/reader_internal.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/cpp/src/parquet/arrow/reader_internal.cc b/cpp/src/parquet/arrow/reader_internal.cc
index 0ffa3e8..f136870 100644
--- a/cpp/src/parquet/arrow/reader_internal.cc
+++ b/cpp/src/parquet/arrow/reader_internal.cc
@@ -429,7 +429,7 @@ Status TransferBinary(RecordReader* reader, MemoryPool* pool,
}
::arrow::compute::ExecContext ctx(pool);
::arrow::compute::CastOptions cast_options;
- cast_options.allow_invalid_utf8 = false; // avoid spending time validating UTF8 data
+ cast_options.allow_invalid_utf8 = true; // avoid spending time validating UTF8 data
auto binary_reader = dynamic_cast<BinaryRecordReader*>(reader);
DCHECK(binary_reader);
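
To illustrate why the flag's value matters, here is a minimal, self-contained sketch of the behavior the commit message describes. It does not use the Arrow API: SketchCastOptions, IsValidUtf8, and SketchCastBinaryToUtf8 are hypothetical stand-ins, and the assumption is that a binary-to-string cast validates the payload only when allow_invalid_utf8 is false.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the cast options used in the diff; only the one
// flag discussed in the commit is modeled.
struct SketchCastOptions {
  bool allow_invalid_utf8 = false;
};

// Simplified UTF-8 well-formedness check: rejects bad lead or continuation
// bytes, truncated sequences, overlong encodings, surrogates, and code points
// above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  static const std::uint32_t kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (b0 < 0x80) {
      len = 1;
      cp = b0;
    } else if ((b0 & 0xE0) == 0xC0) {
      len = 2;
      cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3;
      cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4;
      cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + len > n) return false;  // sequence runs past the end
    for (std::size_t k = 1; k < len; ++k) {
      const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < kMinCodePoint[len]) return false;       // overlong encoding
    if (cp > 0x10FFFF) return false;                 // outside Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
    i += len;
  }
  return true;
}

// Models a binary-to-string cast. Returns false (an error) only when
// validation is enabled and the payload is not well-formed UTF-8.
bool SketchCastBinaryToUtf8(const std::string& payload,
                            const SketchCastOptions& options) {
  if (options.allow_invalid_utf8) {
    return true;  // validation skipped, which is what the comment asks for
  }
  return IsValidUtf8(payload);
}
```

Under this model, the payload "\xC3\x28" is rejected when allow_invalid_utf8 is false and accepted unchecked when it is true, which matches the commit's reasoning that the old value caused both the extra validation work and the "Invalid UTF8 payload" error.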