Posted to dev@parquet.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/09/28 03:44:00 UTC

[jira] [Created] (PARQUET-1117) ParquetRecordWriter does not provide interface like getRowCount(),getRawDataSize() like org.apache.orc.Writer

liyunzhang_intel created PARQUET-1117:
-----------------------------------------

             Summary: ParquetRecordWriter does not provide interface like getRowCount(),getRawDataSize() like org.apache.orc.Writer  
                 Key: PARQUET-1117
                 URL: https://issues.apache.org/jira/browse/PARQUET-1117
             Project: Parquet
          Issue Type: Bug
            Reporter: liyunzhang_intel


Hive with ORC can update statistics such as rowCount and rawDataSize after loading data into a table. Hive with Parquet cannot; it needs an analyze command such as "analyze table xxx compute statistics noscan" to update these two statistics. The reason is that the ParquetRecordWriter used in Hive does not provide methods like getRowCount() and getRawDataSize(), whereas org.apache.orc.Writer provides these [two interfaces|https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Writer.java#L68]. Does anyone know how to get rowCount and rawDataSize from ParquetRecordWriter?
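
To make the request concrete, here is a minimal sketch of the kind of bookkeeping being asked for. It is not the Parquet or Hive API: RecordSink and CountingSink are hypothetical names invented for illustration, and the size estimator is an assumption supplied by the caller (the issue does not say how rawDataSize should be computed). The idea is simply a wrapper that counts rows and accumulates an estimated size as records pass through to the underlying writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a record writer; not a Parquet or Hive type. */
interface RecordSink<T> {
    void write(T record);
}

/** Hypothetical decorator that tracks row count and an estimated raw size. */
final class CountingSink<T> implements RecordSink<T> {
    private final RecordSink<T> delegate;
    // Caller-supplied estimate of one record's raw size (an assumption).
    private final ToLongFunction<T> sizeEstimator;
    private long rowCount;
    private long rawDataSize;

    CountingSink(RecordSink<T> delegate, ToLongFunction<T> sizeEstimator) {
        this.delegate = delegate;
        this.sizeEstimator = sizeEstimator;
    }

    @Override
    public void write(T record) {
        // Forward first, so a failed write is not counted.
        delegate.write(record);
        rowCount++;
        rawDataSize += sizeEstimator.applyAsLong(record);
    }

    long getRowCount() {
        return rowCount;
    }

    long getRawDataSize() {
        return rawDataSize;
    }
}
```

With something like this in place, the caller could read both values after the last write and report them as table statistics without a separate analyze pass. Whether ParquetRecordWriter can expose equivalent counters directly is the open question of this issue.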




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)