Posted to dev@parquet.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2021/05/21 01:58:00 UTC
[jira] [Created] (PARQUET-2052) Integer overflow when writing huge binary using dictionary encoding
Chao Sun created PARQUET-2052:
---------------------------------
Summary: Integer overflow when writing huge binary using dictionary encoding
Key: PARQUET-2052
URL: https://issues.apache.org/jira/browse/PARQUET-2052
Project: Parquet
Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun
To check whether it should fall back to plain encoding, {{DictionaryValuesWriter}} currently uses two variables, {{dictionaryByteSize}} and {{maxDictionaryByteSize}}, both of which are of type {{int}}. This causes a problem when one first writes a relatively small binary that stays within the threshold and then writes a huge string: the addition makes {{dictionaryByteSize}} overflow and become negative, so the comparison against {{maxDictionaryByteSize}} no longer triggers the fallback.
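A minimal Java sketch of the arithmetic described in the issue. The classes IntTracker and LongTracker, the 1 MiB threshold, and the 4-byte length prefix counted per dictionary entry are assumptions for illustration only; this is not the actual parquet-mr code, and widening the counter to long is one plausible fix, not necessarily the one the eventual patch adopts.

```java
// Hypothetical sketch: models only the size bookkeeping described in PARQUET-2052,
// not the real DictionaryValuesWriter.
public class DictionaryOverflowDemo {

    // Buggy variant: the running dictionary size is an int, as described in the issue.
    static final class IntTracker {
        private final int maxDictionaryByteSize;
        private int dictionaryByteSize;

        IntTracker(int maxDictionaryByteSize) {
            this.maxDictionaryByteSize = maxDictionaryByteSize;
        }

        void add(int valueLength) {
            // Assumed layout: 4-byte length prefix plus the value bytes.
            // Compound assignment on an int wraps silently on overflow.
            dictionaryByteSize += 4 + valueLength;
        }

        boolean shouldFallBack() {
            return dictionaryByteSize > maxDictionaryByteSize;
        }

        int size() {
            return dictionaryByteSize;
        }
    }

    // Candidate fix (assumption): accumulate in a long so the sum cannot wrap.
    static final class LongTracker {
        private final long maxDictionaryByteSize;
        private long dictionaryByteSize;

        LongTracker(long maxDictionaryByteSize) {
            this.maxDictionaryByteSize = maxDictionaryByteSize;
        }

        void add(int valueLength) {
            // 4L forces the addition into long arithmetic before accumulating.
            dictionaryByteSize += 4L + valueLength;
        }

        boolean shouldFallBack() {
            return dictionaryByteSize > maxDictionaryByteSize;
        }

        long size() {
            return dictionaryByteSize;
        }
    }

    public static void main(String[] args) {
        int max = 1024 * 1024; // hypothetical 1 MiB threshold
        IntTracker buggy = new IntTracker(max);
        LongTracker fixed = new LongTracker(max);

        // First a small value well within the threshold...
        buggy.add(100);
        fixed.add(100);

        // ...then a value whose length is close to Integer.MAX_VALUE.
        int hugeLength = Integer.MAX_VALUE - 50;
        buggy.add(hugeLength);
        fixed.add(hugeLength);

        System.out.println("int size = " + buggy.size() + ", fall back = " + buggy.shouldFallBack());
        System.out.println("long size = " + fixed.size() + ", fall back = " + fixed.shouldFallBack());
    }
}
```

Running main prints "int size = -2147483591, fall back = false" followed by "long size = 2147483705, fall back = true": the int counter wraps to a negative value and the writer keeps dictionary-encoding, while the long counter correctly exceeds the threshold.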
--
This message was sent by Atlassian Jira
(v8.3.4#803005)