Posted to user@drill.apache.org by Eric Fukuda <e....@gmail.com> on 2016/07/05 22:41:38 UTC

Number of records per batch

Hi,

Does anyone know if there is a way to increase or specify the number of
records per batch manually?

Thanks,
Eric

Re: Number of records per batch

Posted by Eric Fukuda <e....@gmail.com>.
Since I'm reading a JSON file, I will try changing
JSONRecordReader.DEFAULT_ROWS_PER_BATCH. Thanks for the advice!

Eric

On Wed, Jul 6, 2016 at 12:42 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> It depends on the data you are querying. For .json you could change the
> value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set by default
> to 4096, but this will only affect the size of the batches produced by the
> reader; other operators may still alter the batch size.

Re: Number of records per batch

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
It depends on the data you are querying. For .json you could change the
value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set by default
to 4096, but this will only affect the size of the batches produced by the
reader; other operators may still alter the batch size.
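
A minimal sketch of the point above, assuming a simplified reader that cuts its input into fixed-size batches. SketchReader, batchSizes and filterEveryOther are hypothetical names for illustration only and are not Drill's API; only the constant name and its 4096 default come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader; this is NOT Drill's JSONRecordReader.
final class SketchReader {
    // Plays the role of the reader's rows-per-batch constant (4096 per this thread).
    static final int DEFAULT_ROWS_PER_BATCH = 4096;

    // Splits totalRows into batches of at most rowsPerBatch rows each.
    static List<Integer> batchSizes(long totalRows, int rowsPerBatch) {
        if (rowsPerBatch <= 0) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        long remaining = totalRows;
        while (remaining > 0) {
            int n = (int) Math.min(remaining, rowsPerBatch);
            sizes.add(n);
            remaining -= n;
        }
        return sizes;
    }

    // A downstream operator (here, a filter keeping every other row)
    // emits a different row count than the reader handed it.
    static int filterEveryOther(int incomingRows) {
        return (incomingRows + 1) / 2;
    }

    public static void main(String[] args) {
        List<Integer> sizes = batchSizes(10_000, DEFAULT_ROWS_PER_BATCH);
        System.out.println(sizes);                          // [4096, 4096, 1808]
        System.out.println(filterEveryOther(sizes.get(0))); // 2048
    }
}
```

Raising the constant changes only what batchSizes produces; the filter step still decides how many rows leave it, which is why a larger reader batch does not guarantee larger batches downstream.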

On Tue, Jul 5, 2016 at 7:30 PM, Eric Fukuda <e....@gmail.com> wrote:

> Thanks Abdel. Looking at the code, it looks like the maximum number of
> records in a batch is 64k. I suspect I'm getting only 4k because the batch
> reached the capacity of its buffer. Is there a way to relax this capacity
> restriction? It doesn't have to be a configuration option. I don't mind
> changing and compiling the code.



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: Number of records per batch

Posted by Eric Fukuda <e....@gmail.com>.
Thanks Abdel. Looking at the code, it looks like the maximum number of
records in a batch is 64k. I suspect I'm getting only 4k because the batch
reached the capacity of its buffer. Is there a way to relax this capacity
restriction? It doesn't have to be a configuration option. I don't mind
changing and compiling the code.
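
The hypothesis above can be sketched as follows: a batch closes at whichever limit comes first, a row cap or a full buffer. This is an illustration under loud assumptions, not Drill's actual logic. BatchFill, the row-oriented buffer, the buffer sizes and the 12-byte string are all made up, and another reply in this thread attributes the 4096 figure to a reader constant rather than to buffer capacity.

```java
// Hypothetical sketch: a batch ends at a row cap or when its buffer is full.
// The row-oriented layout is a simplification chosen for the arithmetic.
final class BatchFill {
    // The "64k" ceiling mentioned above.
    static final int MAX_ROWS = 65_536;

    // Rows that fit in one batch before either limit is reached.
    static int rowsInBatch(int bufferBytes, int bytesPerRow) {
        if (bufferBytes < 0 || bytesPerRow <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int byCapacity = bufferBytes / bytesPerRow;
        return Math.min(MAX_ROWS, byCapacity);
    }

    public static void main(String[] args) {
        // 13 four-byte integers plus an ASSUMED 12-byte string = 64 bytes per row.
        int bytesPerRow = 13 * Integer.BYTES + 12;

        // With an assumed 256 KiB buffer, capacity is the binding limit.
        System.out.println(rowsInBatch(256 * 1024, bytesPerRow));      // 4096

        // With an assumed 8 MiB buffer, the row cap is the binding limit.
        System.out.println(rowsInBatch(8 * 1024 * 1024, bytesPerRow)); // 65536
    }
}
```

Under this model a larger buffer would raise the row count until the cap takes over, so checking which limit is binding comes before changing either one.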

On Tue, Jul 5, 2016 at 8:55 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Unfortunately I don't think there is a way to do it.

Re: Number of records per batch

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Unfortunately I don't think there is a way to do it.

On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda <e....@gmail.com> wrote:

> I'm trying to see how performance differs with different batch sizes. My
> table has 13 integer fields and 1 string field, and has 8M records.
> Following the code with a debugger, there seem to be 4096 records in a
> batch. Can this be 8192 or larger?




Re: Number of records per batch

Posted by Eric Fukuda <e....@gmail.com>.
I'm trying to see how performance differs with different batch sizes. My
table has 13 integer fields and 1 string field, and has 8M records.
Following the code with a debugger, there seem to be 4096 records in a
batch. Can this be 8192 or larger?
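
For a sense of scale, here is the batch-count arithmetic for the table described above, assuming "8M" means exactly 8,000,000 records and that every batch is full except possibly the last. BatchCount is a hypothetical helper, not part of Drill.

```java
// Hypothetical helper: how many batches a scan produces at a given batch size.
final class BatchCount {
    // Ceiling division: the last batch may be only partly filled.
    static long batches(long totalRecords, int recordsPerBatch) {
        if (totalRecords < 0 || recordsPerBatch <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return (totalRecords + recordsPerBatch - 1) / recordsPerBatch;
    }

    public static void main(String[] args) {
        long total = 8_000_000L; // ASSUMED exact value of "8M"
        System.out.println(batches(total, 4096));   // 1954
        System.out.println(batches(total, 8192));   // 977
        System.out.println(batches(total, 65_536)); // 123
    }
}
```

Doubling the batch size halves the number of batches, so any per-batch overhead is what a batch-size experiment would mainly be measuring.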

On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Hey Eric,
>
> Can you give more information about what you are trying to achieve?
>
> Thanks

Re: Number of records per batch

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Hey Eric,

Can you give more information about what you are trying to achieve?

Thanks

On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda <e....@gmail.com> wrote:

> Hi,
>
> Does anyone know if there is a way to increase or specify the number of
> records per batch manually?
>
> Thanks,
> Eric
>


