Posted to user@drill.apache.org by Eric Fukuda <e....@gmail.com> on 2016/07/05 22:41:38 UTC
Number of records per batch
Hi,
Does anyone know if there is a way to increase or specify the number of
records per batch manually?
Thanks,
Eric
Re: Number of records per batch
Posted by Eric Fukuda <e....@gmail.com>.
Since I'm reading a JSON file, I will try changing
JSONRecordReader.DEFAULT_ROWS_PER_BATCH. Thanks for the advice!
Eric
On Wed, Jul 6, 2016 at 12:42 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:
> It depends on the data you are querying. For .json you could change the
> value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set to 4096 by
> default. This only affects the size of the batches produced by the reader,
> though; other operators may still alter the batch size.
Re: Number of records per batch
Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
It depends on the data you are querying. For .json you could change the
value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set to 4096 by
default. This only affects the size of the batches produced by the reader,
though; other operators may still alter the batch size.
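To illustrate the distinction between the reader's batch size and what later operators emit, here is a minimal sketch. It is not Drill's code: the class BatchingSketch, its constant ROWS_PER_BATCH, and both methods are hypothetical stand-ins. It only shows a reader-side cap on rows per batch and a downstream filter that changes the sizes again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BatchingSketch {
    // Hypothetical stand-in for a reader-level constant such as
    // JSONRecordReader.DEFAULT_ROWS_PER_BATCH; not taken from Drill's source.
    public static final int ROWS_PER_BATCH = 4096;

    // Reader step: split the input into batches of at most rowsPerBatch rows.
    public static <T> List<List<T>> readBatches(List<T> rows, int rowsPerBatch) {
        if (rowsPerBatch <= 0) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerBatch) {
            int end = Math.min(start + rowsPerBatch, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    // Downstream step: a filter keeps the batch boundaries but drops rows,
    // so the sizes it emits differ from the sizes the reader produced.
    public static <T> List<List<T>> filterBatches(List<List<T>> batches, Predicate<T> keep) {
        List<List<T>> out = new ArrayList<>();
        for (List<T> batch : batches) {
            List<T> kept = new ArrayList<>();
            for (T row : batch) {
                if (keep.test(row)) {
                    kept.add(row);
                }
            }
            out.add(kept);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add(i);
        }
        List<List<Integer>> fromReader = readBatches(rows, ROWS_PER_BATCH);
        List<List<Integer>> afterFilter = filterBatches(fromReader, r -> r % 2 == 0);
        for (int i = 0; i < fromReader.size(); i++) {
            System.out.println("batch " + i + ": reader=" + fromReader.get(i).size()
                    + " afterFilter=" + afterFilter.get(i).size());
        }
    }
}
```

With 10,000 input rows the reader step yields batches of 4096, 4096 and 1808 rows, and the even-number filter reduces them to 2048, 2048 and 904. Raising the reader constant therefore does not guarantee larger batches further down the plan.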
On Tue, Jul 5, 2016 at 7:30 PM, Eric Fukuda <e....@gmail.com> wrote:
> Thanks, Abdel. Looking at the code, it looks like the maximum number of
> records in a batch is 64k. I suspect the reason I'm getting only 4k is that
> the buffer in the batch reached its capacity. Is there a way to
> relieve this capacity restriction? It doesn't have to be a configuration
> option. I don't mind changing and compiling the code.
--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
Re: Number of records per batch
Posted by Eric Fukuda <e....@gmail.com>.
Thanks, Abdel. Looking at the code, it looks like the maximum number of
records in a batch is 64k. I suspect the reason I'm getting only 4k is that
the buffer in the batch reached its capacity. Is there a way to
relieve this capacity restriction? It doesn't have to be a configuration
option. I don't mind changing and compiling the code.
On Tue, Jul 5, 2016 at 8:55 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:
> Unfortunately, I don't think there is a way to do it.
Re: Number of records per batch
Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Unfortunately, I don't think there is a way to do it.
On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda <e....@gmail.com> wrote:
> I'm trying to see how performance differs with different batch sizes. My
> table has 13 integer fields and 1 string field, and has 8M records.
> Following the code with a debugger, there seem to be 4096 records in a
> batch. Can this be 8192 or larger?
Re: Number of records per batch
Posted by Eric Fukuda <e....@gmail.com>.
I'm trying to see how performance differs with different batch sizes. My
table has 13 integer fields and 1 string field, and has 8M records.
Following the code with a debugger, there seem to be 4096 records in a
batch. Can this be 8192 or larger?
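For a rough sense of scale, the sketch below counts how many batches the table would need at a few batch sizes. It assumes "8M" means exactly 8,000,000 records and that every batch except the last is full; the class BatchCountSketch and its method are hypothetical and only do ceiling division.

```java
public class BatchCountSketch {
    // Number of batches needed to hold totalRows when each batch holds
    // at most rowsPerBatch rows (ceiling division).
    public static long batchCount(long totalRows, int rowsPerBatch) {
        if (totalRows < 0 || rowsPerBatch <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerBatch > 0");
        }
        return (totalRows + rowsPerBatch - 1) / rowsPerBatch;
    }

    public static void main(String[] args) {
        long totalRows = 8_000_000L; // assumption: "8M" taken as exactly 8,000,000
        int[] sizes = {4096, 8192, 65536};
        for (int size : sizes) {
            System.out.println(size + " rows/batch -> " + batchCount(totalRows, size) + " batches");
        }
    }
}
```

Under these assumptions the table takes 1954 batches at 4096 rows per batch, 977 at 8192, and 123 at 65536.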
On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:
> Hey Eric,
>
> Can you give more information about what you are trying to achieve?
>
> Thanks
Re: Number of records per batch
Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Hey Eric,
Can you give more information about what you are trying to achieve?
Thanks
On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda <e....@gmail.com> wrote:
> Hi,
>
> Does anyone know if there is a way to increase or specify the number of
> records per batch manually?
>
> Thanks,
> Eric
>