Posted to user@avro.apache.org by Nishanth S <ni...@gmail.com> on 2017/10/23 20:26:28 UTC

Flush Avro Files based on size

Hello,
Is there a property that can be set on DataFileWriter that would flush
the file to disk once it reaches a particular size? Is this something in
the pipeline?


Thanks,
Nishanth

Re: Flush Avro Files based on size

Posted by Nishanth S <ni...@gmail.com>.
Hello all,

How can I flush data from a DataFileWriter based on size or number of
records? Does similar functionality exist today? If not, does it make
sense to write one? This relates to a batch execution where you close
the files before the JVM exits, which can lead to a lot of small files
on the destination file system. If the file system is something like
HDFS, this can cause the small-files problem. I could of course merge
these files at intervals, but I want to see if there is a better
solution. My plan is to change the batch into something like a listener
that watches an input directory and keeps the DataFileWriter running,
flushing at a preconfigured interval based on time, size, or number of
records.
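
To make the plan concrete, here is a rough sketch of the threshold
check such a listener could use. It is only an illustration: the class
RollPolicy and all of its names are hypothetical and not part of Avro,
and the byte count is whatever estimate the caller supplies rather than
the real on-disk size.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an Avro class): tracks bytes, records and
 * elapsed time since the last flush and reports when any limit is hit.
 */
final class RollPolicy {
    private final long maxBytes;
    private final long maxRecords;
    private final long maxAgeMillis;
    private final LongSupplier clock; // injectable so the policy can be tested

    private long bytes;
    private long records;
    private long windowStart;

    RollPolicy(long maxBytes, long maxRecords, long maxAgeMillis, LongSupplier clock) {
        this.maxBytes = maxBytes;
        this.maxRecords = maxRecords;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Call after each append; approxBytes is the caller's size estimate. */
    boolean recordAppended(long approxBytes) {
        bytes += approxBytes;
        records++;
        return shouldRoll();
    }

    /** True once the size, record-count or age threshold has been reached. */
    boolean shouldRoll() {
        return bytes >= maxBytes
                || records >= maxRecords
                || clock.getAsLong() - windowStart >= maxAgeMillis;
    }

    /** Call after flushing or rolling the file to start a new window. */
    void reset() {
        bytes = 0;
        records = 0;
        windowStart = clock.getAsLong();
    }
}
```

In the listener loop, the idea would be to call recordAppended() after
each DataFileWriter.append(), and when it returns true either call
flush() on the writer or close it and open a new file, then call
reset(). Passing System::currentTimeMillis as the clock would give
wall-clock intervals.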

Thanks,
Nishanth

On Mon, Oct 23, 2017 at 2:26 PM, Nishanth S <ni...@gmail.com> wrote:

> Hello,
> Is there a property that can be set on DataFileWriter that would flush
> the file to disk once it reaches a particular size? Is this something in
> the pipeline?
>
>
> Thanks,
> Nishanth
>