Posted to dev@parquet.apache.org by "Daniel Weeks (JIRA)" <ji...@apache.org> on 2015/07/25 01:56:05 UTC
[jira] [Updated] (PARQUET-99) Large rows cause unnecessary OOM exceptions
[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Weeks updated PARQUET-99:
--------------------------------
Affects Version/s: 1.7.0, 1.8.0, 1.8.1
> Large rows cause unnecessary OOM exceptions
> -------------------------------------------
>
> Key: PARQUET-99
> URL: https://issues.apache.org/jira/browse/PARQUET-99
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.8.1
> Reporter: Tongjie Chen
> Assignee: Daniel Weeks
>
> If a column contains many lengthy string values, the writer runs into an OOM error during writing.
> 2014-09-22 19:16:11,626 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2271)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
> at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> at parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:144)
> at parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:308)
> at parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:233)
> at parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:108)
> at parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:110)
> at parquet.column.impl.ColumnWriterImpl.writePage(ColumnWriterImpl.java:147)
> at parquet.column.impl.ColumnWriterImpl.flush(ColumnWriterImpl.java:236)
> at parquet.column.impl.ColumnWriteStoreImpl.flush(ColumnWriteStoreImpl.java:113)
> at parquet.hadoop.InternalParquetRecordWriter.flushStore(InternalParquetRecordWriter.java:151)
> at parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:130)
> at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:122)
> at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
> at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
> at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77)
> at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:688)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
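
The top frames of the trace (Arrays.copyOf called from ByteArrayOutputStream.grow) indicate that the heap ran out while an in-memory buffer was being enlarged. The following is a minimal sketch of that growth behavior in plain Java, not Parquet or Hadoop code. The names GrowthSketch, ObservableStream, and peakTransientBytes are hypothetical and exist only for illustration. It assumes the standard JDK behavior in which the backing array is copied into a larger one on overflow, so the old and new arrays are briefly live at the same time.

```java
import java.io.ByteArrayOutputStream;

public class GrowthSketch {

    // Hypothetical helper: exposes the length of the protected backing array.
    static class ObservableStream extends ByteArrayOutputStream {
        ObservableStream(int initialCapacity) {
            super(initialCapacity);
        }

        int capacity() {
            return buf.length;
        }
    }

    // Writes `chunks` chunks of `chunkSize` bytes and returns the largest
    // transient footprint (old array + new array) observed across growth steps.
    static long peakTransientBytes(int initialCapacity, int chunkSize, int chunks) {
        ObservableStream out = new ObservableStream(initialCapacity);
        byte[] chunk = new byte[chunkSize];
        long peak = out.capacity();
        for (int i = 0; i < chunks; i++) {
            int before = out.capacity();
            out.write(chunk, 0, chunk.length);
            int after = out.capacity();
            if (after != before) {
                // During the copy, both the old and the new array are reachable.
                peak = Math.max(peak, (long) before + after);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        long payload = 64L * 1024;
        long peak = peakTransientBytes(32, 1024, 64);
        System.out.println("payload bytes: " + payload);
        System.out.println("peak transient bytes during growth: " + peak);
    }
}
```

In this sketch the transient footprint exceeds the payload itself, which is consistent with a large page of long string values failing inside grow() even when the data alone might fit in the heap.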
--
This message was sent by Atlassian JIRA (v6.3.4#6332)