Posted to dev@parquet.apache.org by Ravi Tatapudi <ra...@in.ibm.com> on 2016/06/15 14:50:56 UTC

Parquet API writes a lot of log messages, impacting performance benchmarks

Hello,

As part of performance testing of "Parquet-write", I see the following 
issue:

While writing a 50 GB Parquet file, the Parquet API writes around 140 MB of 
log messages (sample shown below):

=======================
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.InternalParquetRecordWriter: 
Flushing mem columnStore to file. allocated memory: 116,400
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 1,312B for [L_ORDERKEY] INT32: 319 values, 1,276B raw, 1,276B 
comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 1,312B for [L_PARTKEY] INT32: 319 values, 1,276B raw, 1,276B comp, 
1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 1,312B for [L_SUPPKEY] INT32: 319 values, 1,276B raw, 1,276B comp, 
1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 1,312B for [L_LINENUMBER] INT32: 319 values, 1,276B raw, 1,276B 
comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 318B for [L_QUANTITY] FLOAT: 319 values, 282B raw, 282B comp, 1 
pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 80 entries, 320B 
raw, 80B comp}
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 318B for [L_EXTENDEDPRICE] FLOAT: 319 values, 282B raw, 282B comp, 
1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 80 entries, 320B 
raw, 80B comp}
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 318B for [L_DISCOUNT] FLOAT: 319 values, 282B raw, 282B comp, 1 
pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 80 entries, 320B 
raw, 80B comp}
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 318B for [L_TAX] FLOAT: 319 values, 282B raw, 282B comp, 1 pages, 
encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 80 entries, 320B raw, 80B 
comp}
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 2,267B for [L_RETURNFLAG] BINARY: 319 values, 2,233B raw, 2,233B 
comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 2,267B for [L_LINESTATUS] BINARY: 319 values, 2,233B raw, 2,233B 
comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
Jun 7, 2016 1:09:51 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
written 9,010B for [L_SHIPDATE] BINARY: 319 values, 8,932B raw, 8,932B 
comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
........................................
........................................
=======================

Is there any way to suppress these messages? Could you please let me 
know.
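
To make the request concrete, here is a minimal sketch of the kind of 
control I am hoping for. It rests on assumptions I have not confirmed: 
that these messages go through java.util.logging (the timestamp format 
suggests it), and that the logger is named "parquet" (guessed from the 
package prefix in the log lines). The class QuietParquetLogs and its 
silence() method are hypothetical names of my own, not part of Parquet.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietParquetLogs {
    // Kept in a static field: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost after GC.
    private static Logger held;

    // Raise the threshold of the named logger so INFO records are dropped.
    // Loggers below it in the dotted namespace inherit this level unless
    // they have a level of their own.
    static Logger silence(String loggerName) {
        held = Logger.getLogger(loggerName);
        held.setLevel(Level.WARNING);
        return held;
    }

    public static void main(String[] args) {
        // ASSUMPTION: "parquet" is a guessed logger name, not confirmed.
        silence("parquet");

        Logger child = Logger.getLogger("parquet.hadoop.ColumnChunkPageWriteStore");
        child.info("written 1,312B for [L_ORDERKEY]"); // below WARNING, dropped
        child.warning("warnings still get through");   // at WARNING, passed on
    }
}
```

If Parquet sets the level or installs a handler on its logger itself, a 
call like this may have to run after Parquet's logging class has been 
loaded, or it may not be enough on its own.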

Thanks,
 Ravi