Posted to user@avro.apache.org by Nishanth S <ni...@gmail.com> on 2017/10/23 20:26:28 UTC
Flush Avro Files based on size
Hello,
Is there a property that can be set on DataFileWriter that would flush
the file to disk once it reaches a particular size? Is this something in
the pipeline?
Thanks,
Nishanth
Re: Flush Avro Files based on size
Posted by Nishanth S <ni...@gmail.com>.
Hello All,
How can I flush data from a DataFileWriter based on size or number of
records? Does similar functionality exist today? If not, does it make
sense to write it? This relates to a batch execution where the files are
closed before the JVM exits, which can leave a lot of small files on the
destination file system. If the file system is something like HDFS, this
leads to the small files problem. I could of course merge these files at
intervals, but I want to see if there is a better solution. My plan is to
change the batch into something like a listener that watches an input
directory and keeps the DataFileWriter running, doing a flush at a
preconfigured interval based on time, size, or number of records.
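
A minimal sketch of the time/size/record-count trigger described above.
FlushPolicy is a hypothetical helper, not an Avro class, and the byte count
is the caller's own estimate rather than anything Avro reports. It only
decides when to act; the caller still has to flush or roll the writer.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Avro): decides when a long-running writer should flush or roll. */
final class FlushPolicy {
    private final long maxRecords;
    private final long maxBytes;
    private final long maxAgeNanos;

    private long records;          // records appended since the last reset
    private long bytes;            // caller-estimated bytes since the last reset
    private long windowStartNanos; // start of the current window

    FlushPolicy(long maxRecords, long maxBytes, long maxAgeMillis, long nowNanos) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.maxAgeNanos = TimeUnit.MILLISECONDS.toNanos(maxAgeMillis);
        this.windowStartNanos = nowNanos;
    }

    /** Call after each append; approxBytes is the caller's estimate of the record size. */
    void recordAppended(long approxBytes) {
        records++;
        bytes += approxBytes;
    }

    /** True once any threshold is reached; never true while nothing is pending. */
    boolean shouldFlush(long nowNanos) {
        if (records == 0) {
            return false;
        }
        return records >= maxRecords
            || bytes >= maxBytes
            || (nowNanos - windowStartNanos) >= maxAgeNanos;
    }

    /** Call after the writer has been flushed or the file has been rolled. */
    void reset(long nowNanos) {
        records = 0;
        bytes = 0;
        windowStartNanos = nowNanos;
    }
}
```

In the listener loop this would sit next to the writer: after each
writer.append(record), call recordAppended(...), and when
shouldFlush(System.nanoTime()) returns true, either call
DataFileWriter.flush() or close the file and open a new one, then reset(...).
Flushing alone does not reduce the number of files; it is keeping one file
open until a threshold is reached that avoids many small files.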
Thanks,
Nishanth
On Mon, Oct 23, 2017 at 2:26 PM, Nishanth S <ni...@gmail.com> wrote:
> Hello,
> Is there a property that can be set to datafilewriter that would flush
> the file to disk as it reaches a particular size. Is this something in
> the pipeline.
>
>
> Thanks,
> Nishanth
>