Posted to dev@arrow.apache.org by Eric Wohlstadter <wo...@gmail.com> on 2018/04/27 19:19:25 UTC

[Java] org.apache.arrow.vector.ipc.ArrowWriter.recordBlocks

Hi all,
In the context of ArrowStreamWriter:

- It looks like the field ArrowWriter.recordBlocks is populated (e.g. in
ArrowWriter.writeRecordBatch) and consumes memory

- But the List<ArrowBlock> is never used (it is used in ArrowFileWriter but
not ArrowStreamWriter)

Would it be safe for me to extend ArrowStreamWriter and override
writeRecordBatch with an implementation that does not populate the
recordBlocks?
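A minimal, self-contained sketch of the proposed override follows. It assumes a simplified model of the writer hierarchy: SketchWriter, SketchStreamWriter, NonTrackingStreamWriter and Block are hypothetical stand-ins, not the real org.apache.arrow.vector.ipc classes, whose constructors and method visibility may differ between Arrow versions. The base class records one Block per batch, as recordBlocks does. The subclass writes the same bytes but retains nothing.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the pattern discussed in this thread.
// These are NOT the real Arrow classes.
final class Block {
  final long offset;
  final long length;

  Block(long offset, long length) {
    this.offset = offset;
    this.length = length;
  }
}

abstract class SketchWriter {
  protected final OutputStream out;
  private long position = 0;
  // Mirrors ArrowWriter.recordBlocks: one entry per batch, held until the writer is discarded.
  protected final List<Block> recordBlocks = new ArrayList<>();

  SketchWriter(OutputStream out) {
    this.out = out;
  }

  // Writes the already-serialized batch and returns where it landed in the output.
  protected Block serialize(byte[] batch) throws IOException {
    out.write(batch);
    Block block = new Block(position, batch.length);
    position += batch.length;
    return block;
  }

  // Default behaviour: write the batch and remember its Block.
  protected void writeRecordBatch(byte[] batch) throws IOException {
    recordBlocks.add(serialize(batch));
  }
}

// Stream writer that inherits the tracking behaviour.
class SketchStreamWriter extends SketchWriter {
  SketchStreamWriter(OutputStream out) {
    super(out);
  }
}

// The proposed override: same bytes on the wire, but no per-batch Block is kept.
class NonTrackingStreamWriter extends SketchStreamWriter {
  NonTrackingStreamWriter(OutputStream out) {
    super(out);
  }

  @Override
  protected void writeRecordBatch(byte[] batch) throws IOException {
    serialize(batch); // the returned Block is dropped instead of being stored
  }
}
```

The stream format has no footer index, so dropping the block list should not change the bytes written; in this model only the writer's retained memory differs.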

This is for HIVE-19305. If anyone has time to take a look and provide
feedback, that would be much appreciated.

Thanks for your help,

--Eric

Re: [Java] org.apache.arrow.vector.ipc.ArrowWriter.recordBlocks

Posted by Emilio Lahr-Vivaz <el...@ccri.com>.
From my time working on the Arrow writers, I think that would be fine.
You could do the same thing with the dictionary blocks as well.

As an implementation idea, it might be cleaner to add some callback
hooks, e.g. onRecordBlockWritten(), and then implement them in
ArrowFileWriter instead of having the base ArrowWriter track the blocks.
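The callback idea can be sketched as follows, again using hypothetical classes rather than the real Arrow API. HookedWriter, HookedStreamWriter, HookedFileWriter and BlockRef are illustrative names, and onDictionaryBlockWritten is an assumed companion hook for the dictionary blocks. The base writer only serializes and then calls no-op hooks. Only the file writer overrides them to collect the blocks its footer needs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the hook-based design; not the real Arrow classes.
final class BlockRef {
  final long offset;
  final long length;

  BlockRef(long offset, long length) {
    this.offset = offset;
    this.length = length;
  }
}

abstract class HookedWriter {
  private final OutputStream out;
  private long position = 0;

  HookedWriter(OutputStream out) {
    this.out = out;
  }

  // Serialization happens here; subclasses only observe the result via hooks.
  public final void writeRecordBatch(byte[] batch) throws IOException {
    onRecordBlockWritten(write(batch));
  }

  public final void writeDictionaryBatch(byte[] batch) throws IOException {
    onDictionaryBlockWritten(write(batch));
  }

  private BlockRef write(byte[] bytes) throws IOException {
    out.write(bytes);
    BlockRef block = new BlockRef(position, bytes.length);
    position += bytes.length;
    return block;
  }

  // Default hooks do nothing, so a stream writer keeps no per-batch state.
  protected void onRecordBlockWritten(BlockRef block) {}

  protected void onDictionaryBlockWritten(BlockRef block) {}
}

class HookedStreamWriter extends HookedWriter {
  HookedStreamWriter(OutputStream out) {
    super(out);
  }
}

// Only the file writer tracks blocks, because only its footer needs them.
class HookedFileWriter extends HookedWriter {
  private final List<BlockRef> recordBlocks = new ArrayList<>();
  private final List<BlockRef> dictionaryBlocks = new ArrayList<>();

  HookedFileWriter(OutputStream out) {
    super(out);
  }

  @Override
  protected void onRecordBlockWritten(BlockRef block) {
    recordBlocks.add(block);
  }

  @Override
  protected void onDictionaryBlockWritten(BlockRef block) {
    dictionaryBlocks.add(block);
  }

  List<BlockRef> recordBlocks() {
    return Collections.unmodifiableList(recordBlocks);
  }

  List<BlockRef> dictionaryBlocks() {
    return Collections.unmodifiableList(dictionaryBlocks);
  }
}
```

Making the write methods final keeps serialization in one place. Subclasses can observe each block but cannot skip writing it, and the stream writer needs no override at all.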

Thanks,

Emilio
