Posted to dev@parquet.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/09/28 03:44:00 UTC
[jira] [Created] (PARQUET-1117) ParquetRecordWriter does not provide interface like getRowCount(),getRawDataSize() like org.apache.orc.Writer
liyunzhang_intel created PARQUET-1117:
-----------------------------------------
Summary: ParquetRecordWriter does not provide interface like getRowCount(),getRawDataSize() like org.apache.orc.Writer
Key: PARQUET-1117
URL: https://issues.apache.org/jira/browse/PARQUET-1117
Project: Parquet
Issue Type: Bug
Reporter: liyunzhang_intel
Hive with ORC can update statistics such as rowCount and rawDataSize after loading data into a table. Hive with Parquet cannot; it needs an analyze command such as "analyze table xxx compute statistics noscan" to update these two statistics. The reason is that the ParquetRecordWriter used in Hive does not provide methods like getRowCount() and getRawDataSize(), whereas org.apache.orc.Writer provides these [two interfaces|https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Writer.java#L68]. Does anyone know how to get rowCount and rawDataSize from ParquetRecordWriter?
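One possible workaround, assuming the Hive-side code can see each record before it is handed to the Parquet writer, is to count as records pass through a wrapper. The sketch below is a hypothetical decorator, not part of Parquet or Hive: RecordSink, StatsTrackingWriter and the size-estimator parameter are illustrative names, and what "raw data size" means (for example, an in-memory, uncompressed size per record) is left to the caller to define.

```java
import java.util.Objects;
import java.util.function.ToLongFunction;

// Hypothetical minimal sink standing in for the writer that actually
// persists records (e.g. a Parquet-backed writer). Not a Parquet or Hive API.
// Real writers usually declare checked exceptions; omitted here for brevity.
interface RecordSink<T> {
    void write(T record);
}

// Hypothetical decorator that tracks the two statistics the issue asks for
// while delegating the actual writing to another sink.
class StatsTrackingWriter<T> implements RecordSink<T> {
    private final RecordSink<T> delegate;
    // Caller-defined estimate of a record's "raw" (uncompressed) size.
    private final ToLongFunction<T> sizeEstimator;
    private long rowCount = 0;
    private long rawDataSize = 0;

    StatsTrackingWriter(RecordSink<T> delegate, ToLongFunction<T> sizeEstimator) {
        this.delegate = Objects.requireNonNull(delegate);
        this.sizeEstimator = Objects.requireNonNull(sizeEstimator);
    }

    @Override
    public void write(T record) {
        // Delegate first, so a record whose write fails is not counted.
        delegate.write(record);
        rowCount++;
        rawDataSize += sizeEstimator.applyAsLong(record);
    }

    long getRowCount() {
        return rowCount;
    }

    long getRawDataSize() {
        return rawDataSize;
    }
}
```

For instance, wrapping a sink with `new StatsTrackingWriter<>(sink, s -> s.length())` and writing "abc" and "de" would report 2 rows and a raw size of 5 under that (string-length) estimator. Counting in a wrapper avoids re-reading the file after it is written, at the cost of having to define a per-record size estimate.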
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)