Posted to dev@hive.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2019/02/26 21:36:00 UTC

[jira] [Created] (HIVE-21328) Call To Hadoop Text getBytes() Without Call to getLength()

BELUGA BEHR created HIVE-21328:
----------------------------------

             Summary: Call To Hadoop Text getBytes() Without Call to getLength()
                 Key: HIVE-21328
                 URL: https://issues.apache.org/jira/browse/HIVE-21328
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 4.0.0, 3.2.0
            Reporter: BELUGA BEHR


I'm not sure if there is actually a bug, but this looks highly suspect:

{code:java}
  public Object set(final Object o, final Text text) {
    return new BytesWritable(text == null ? null : text.getBytes());
  }
{code}

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106

A Text object has two components: the internal byte array and the number of valid bytes in that array. The two are independent. For example, a quick "reset" on the Text object simply sets the internal length counter to zero and leaves the contents of the array in place. Because this code passes the result of getBytes() along without consulting getLength(), it is potentially looking at obsolete data beyond the valid length that it shouldn't be seeing.
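
To make the concern concrete, here is a minimal, self-contained sketch. It does not use Hadoop: MiniText is a hypothetical stand-in that only mimics the buffer-plus-length layout described above, and TextLengthSketch, unsafeBytes, and safeBytes are illustrative names, not Hive code. It assumes the buffer is reused and never shrunk when a shorter value is set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TextLengthSketch {

    /** Hypothetical stand-in for a Text-like object: a reusable buffer plus a valid-length counter. */
    static final class MiniText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length]; // grow only when needed, never shrink
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length; // bytes past this index keep their old values
        }

        byte[] getBytes() { return bytes; }

        int getLength() { return length; }
    }

    /** Mirrors the suspect pattern: hands out the whole backing array. */
    static byte[] unsafeBytes(MiniText text) {
        return text == null ? null : text.getBytes();
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] safeBytes(MiniText text) {
        return text == null ? null : Arrays.copyOf(text.getBytes(), text.getLength());
    }

    public static void main(String[] args) {
        MiniText text = new MiniText();
        text.set("hello world");
        text.set("hi"); // reuses the 11-byte buffer; only the first 2 bytes are valid

        System.out.println(new String(unsafeBytes(text), StandardCharsets.UTF_8)); // prints "hillo world"
        System.out.println(new String(safeBytes(text), StandardCharsets.UTF_8));   // prints "hi"
    }
}
```

If the same thing can happen with a real Text instance here, a fix along the lines of Arrays.copyOf(text.getBytes(), text.getLength()) would presumably avoid it, at the cost of one extra copy. I have not verified whether the Text objects reaching this inspector are ever reused in this way.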



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)