Posted to dev@parquet.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/10/02 19:53:00 UTC
[jira] [Comment Edited] (PARQUET-1918) Avoid Copy of Bytes in Protobuf BinaryWriter
[ https://issues.apache.org/jira/browse/PARQUET-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206414#comment-17206414 ]
David Mollitor edited comment on PARQUET-1918 at 10/2/20, 7:52 PM:
-------------------------------------------------------------------
Unit tests fail with the exception below.
Trying to address this with THRIFT-5288.
{code:java}
java.lang.Exception: java.nio.ReadOnlyBufferException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.nio.ReadOnlyBufferException
at java.nio.ByteBuffer.array(ByteBuffer.java:996)
at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.writeBinary(TCompactProtocol.java:375)
at org.apache.parquet.format.InterningProtocol.writeBinary(InterningProtocol.java:135)
at org.apache.parquet.format.ColumnIndex$ColumnIndexStandardScheme.write(ColumnIndex.java:945)
at org.apache.parquet.format.ColumnIndex$ColumnIndexStandardScheme.write(ColumnIndex.java:820)
at org.apache.parquet.format.ColumnIndex.write(ColumnIndex.java:728)
at org.apache.parquet.format.Util.write(Util.java:372)
at org.apache.parquet.format.Util.writeColumnIndex(Util.java:69)
at org.apache.parquet.hadoop.ParquetFileWriter.serializeColumnIndexes(ParquetFileWriter.java:1087)
at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1050)
{code}
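The trace shows {{ByteBuffer.array()}} being called from {{TCompactProtocol.writeBinary}}, which fails when the buffer it receives is read-only. The following is a minimal sketch of that JDK behavior using only {{java.nio}}. The class and method names ({{ReadOnlyBufferDemo}}, {{arrayThrows}}, {{copyRemaining}}) are hypothetical and are not taken from Parquet or Thrift. It also shows one generic way to read bytes out of such a buffer without touching its backing array, which is not necessarily how THRIFT-5288 resolves it.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: illustrates JDK ByteBuffer behavior only.
public class ReadOnlyBufferDemo {

    /** Returns true if array() on the given buffer throws ReadOnlyBufferException. */
    public static boolean arrayThrows(ByteBuffer buffer) {
        try {
            buffer.array();
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }

    /**
     * Copies the remaining bytes without calling array().
     * Works on a duplicate so the caller's position and limit are untouched.
     */
    public static byte[] copyRemaining(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "parquet".getBytes(StandardCharsets.UTF_8);
        ByteBuffer writable = ByteBuffer.wrap(data);
        ByteBuffer readOnly = writable.asReadOnlyBuffer();

        // A read-only buffer hides its backing array, so hasArray() is false
        // and array() throws, as in the stack trace above.
        System.out.println("writable.hasArray(): " + writable.hasArray());
        System.out.println("readOnly.hasArray(): " + readOnly.hasArray());
        System.out.println("readOnly.array() throws: " + arrayThrows(readOnly));

        // Relative get() on a duplicate still works for read-only buffers.
        String copied = new String(copyRemaining(readOnly), StandardCharsets.UTF_8);
        System.out.println("copied: " + copied);
    }
}
```

Code that needs the backing array can check {{hasArray()}} first and fall back to a copy when it returns false.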
was (Author: belugabehr):
Unit tests fail with:
{code:java}
java.lang.Exception: java.nio.ReadOnlyBufferException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.nio.ReadOnlyBufferException
at java.nio.ByteBuffer.array(ByteBuffer.java:996)
at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.writeBinary(TCompactProtocol.java:375)
at org.apache.parquet.format.InterningProtocol.writeBinary(InterningProtocol.java:135)
at org.apache.parquet.format.ColumnIndex$ColumnIndexStandardScheme.write(ColumnIndex.java:945)
at org.apache.parquet.format.ColumnIndex$ColumnIndexStandardScheme.write(ColumnIndex.java:820)
at org.apache.parquet.format.ColumnIndex.write(ColumnIndex.java:728)
at org.apache.parquet.format.Util.write(Util.java:372)
at org.apache.parquet.format.Util.writeColumnIndex(Util.java:69)
at org.apache.parquet.hadoop.ParquetFileWriter.serializeColumnIndexes(ParquetFileWriter.java:1087)
at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1050)
{code}
> Avoid Copy of Bytes in Protobuf BinaryWriter
> --------------------------------------------
>
> Key: PARQUET-1918
> URL: https://issues.apache.org/jira/browse/PARQUET-1918
> Project: Parquet
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
>
> {code:java|title=ProtoWriteSupport.java}
> class BinaryWriter extends FieldWriter {
>   @Override
>   final void writeRawValue(Object value) {
>     ByteString byteString = (ByteString) value;
>     Binary binary = Binary.fromConstantByteArray(byteString.toByteArray());
>     recordConsumer.addBinary(binary);
>   }
> }
> {code}
> {{toByteArray()}} creates a copy of the buffer. Parquet and Protobuf already support passing a ByteBuffer instead, which avoids the copy.
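The difference between copying bytes and handing out a read-only view can be sketched with the JDK alone. This is an illustration under stated assumptions, not the actual patch: {{CopyVersusView}}, {{copyOf}}, and {{viewOf}} are hypothetical names, and a plain byte array stands in for the ByteString's storage. In the real writer, the corresponding calls would presumably be {{ByteString.asReadOnlyByteBuffer()}} and {{Binary.fromConstantByteBuffer(ByteBuffer)}}.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: contrasts a defensive copy with a shared read-only view.
public class CopyVersusView {

    /** Defensive copy: allocates a new array and copies every byte. */
    public static byte[] copyOf(byte[] source) {
        return source.clone();
    }

    /** Read-only view: wraps the same storage, no bytes are copied. */
    public static ByteBuffer viewOf(byte[] source) {
        return ByteBuffer.wrap(source).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};

        byte[] copy = copyOf(payload);
        ByteBuffer view = viewOf(payload);

        // Mutating the source is done here only to reveal which result shares storage.
        payload[0] = 42;

        System.out.println("copy[0] = " + copy[0]);      // still 1: independent storage
        System.out.println("view[0] = " + view.get(0));  // 42: same storage as payload
        System.out.println("view is read-only: " + view.isReadOnly());
    }
}
```

The view is only safe when the underlying bytes do not change afterwards, which holds for an immutable ByteString. A read-only buffer does not expose its backing array, so downstream code must not call {{array()}} on it.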