Posted to dev@parquet.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2021/05/21 01:58:00 UTC

[jira] [Created] (PARQUET-2052) Integer overflow when writing huge binary using dictionary encoding

Chao Sun created PARQUET-2052:
---------------------------------

             Summary: Integer overflow when writing huge binary using dictionary encoding
                 Key: PARQUET-2052
                 URL: https://issues.apache.org/jira/browse/PARQUET-2052
             Project: Parquet
          Issue Type: Bug
            Reporter: Chao Sun
            Assignee: Chao Sun


To check whether it should fall back to plain encoding, {{DictionaryValuesWriter}} currently uses two variables, {{dictionaryByteSize}} and {{maxDictionaryByteSize}}, both of which are of type int. This causes a problem when one first writes a relatively small binary value that stays within the threshold and then writes a huge string: {{dictionaryByteSize}} overflows and becomes negative, so the comparison against {{maxDictionaryByteSize}} no longer behaves as intended.
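
The overflow described above can be reproduced in isolation. The following is a minimal sketch, not the actual {{DictionaryValuesWriter}} code: the class {{DictionarySizeOverflowSketch}}, its helper methods, the 1 MiB threshold, and the value lengths are all hypothetical, and the fallback check is assumed to be a simple "size greater than max" comparison. It contrasts an int accumulator with a long one, which is one possible way to avoid the wraparound.

```java
public class DictionarySizeOverflowSketch {

    // Hypothetical stand-in for an int byte counter: Java int addition
    // wraps around silently when the result exceeds Integer.MAX_VALUE.
    static int addAsInt(int dictionaryByteSize, int valueLength) {
        return dictionaryByteSize + valueLength;
    }

    // Same accumulation with a long counter: the int length is widened
    // before the addition, so two int-sized lengths cannot overflow it.
    static long addAsLong(long dictionaryByteSize, int valueLength) {
        return dictionaryByteSize + valueLength;
    }

    // Assumed shape of the fallback check: fall back once the accumulated
    // dictionary size exceeds the configured maximum.
    static boolean shouldFallBack(long dictionaryByteSize, long maxDictionaryByteSize) {
        return dictionaryByteSize > maxDictionaryByteSize;
    }

    public static void main(String[] args) {
        int maxDictionaryByteSize = 1024 * 1024;   // assumed threshold (1 MiB)
        int smallLength = 100;                     // small value, within threshold
        int hugeLength = Integer.MAX_VALUE - 50;   // huge value, close to the int limit

        int intSize = addAsInt(addAsInt(0, smallLength), hugeLength);
        long longSize = addAsLong(addAsLong(0L, smallLength), hugeLength);

        System.out.println("int counter:  " + intSize
            + " -> fall back? " + shouldFallBack(intSize, maxDictionaryByteSize));
        System.out.println("long counter: " + longSize
            + " -> fall back? " + shouldFallBack(longSize, maxDictionaryByteSize));
    }
}
```

With the int counter the sum wraps to a negative number, so the assumed check reports that no fallback is needed even though the dictionary is far over the threshold. With the long counter the same check triggers as expected. {{Math.addExact}} would be another option if failing fast with an {{ArithmeticException}} is preferred over widening the type.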



--
This message was sent by Atlassian Jira
(v8.3.4#803005)