Posted to user@kudu.apache.org by Abhi Basu <90...@gmail.com> on 2016/05/17 20:45:21 UTC

CDH 5.5 - Kudu error not enough space remaining in buffer for op

What is the limit of columns in Kudu?

I am using the 1000 gen dataset, specifically the chr22 table, which has
500,000 rows x 1101 columns. The table has been built in Impala/HDFS, and I am
trying to create a new Kudu table as a select from it. I get the following
error:

Error while applying Kudu session.: Incomplete: not enough space remaining
in buffer for op (required 46.7K, 6.96M already used

When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I see the
following. What configuration needs to be tweaked?


Memory usage by subsystem:

| Id                                      | Parent                                  | Limit  | Current consumption | Peak consumption |
| root                                    | none                                    | 50.12G | 4.97M               | 6.08M            |
| block_cache-sharded_lru_cache           | root                                    | none   | 937.9K              | 937.9K           |
| code_cache-sharded_lru_cache            | root                                    | none   | 1B                  | 1B               |
| server                                  | root                                    | none   | 2.3K                | 201.4K           |
| tablet-00000000000000000000000000000000 | server                                  | none   | 530B                | 200.1K           |
| MemRowSet-6                             | tablet-00000000000000000000000000000000 | none   | 265B                | 265B             |
| txn_tracker                             | tablet-00000000000000000000000000000000 | 64.00M | 0B                  | 28.5K            |
| DeltaMemStores                          | tablet-00000000000000000000000000000000 | none   | 265B                | 87.8K            |
| log_block_manager                       | server                                  | none   | 1.8K                | 2.7K             |

Thanks,
-- 
Abhi Basu

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, May 18, 2016 at 3:42 PM, Abhi Basu <90...@gmail.com> wrote:

> Todd:
>
> Thanks for the update. So Kudu is not designed to be a common storage
> system for long-term and streaming data/random access? Just curious.
>

I'd say it is, but right now we are focusing on the more common use cases
that one might have in a relational columnar database. Having 1000 ~30-byte
columns is a relatively rare type of table in my experience, so we haven't
focused our testing and tuning on that use case.

-Todd





-- 
Todd Lipcon
Software Engineer, Cloudera

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Abhi Basu <90...@gmail.com>.
Todd:

Thanks for the update. So Kudu is not designed to be a common storage
system for long-term and streaming data/random access? Just curious.

On Wed, May 18, 2016 at 3:38 PM, Todd Lipcon <to...@cloudera.com> wrote:




-- 
Abhi Basu

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Todd Lipcon <to...@cloudera.com>.
Hm, so each of the strings is about 27 bytes, so each row is roughly 27KB.
That means a batch size of 500 is still >13MB. I'd start with something very
low, like 10, and work your way up. That said, this is definitely not in the
"standard" set of use cases for which Kudu has been designed.

I'd also recommend using compression and/or dictionary encoding for the table
if you have many repeated values. Unfortunately, it's not currently possible
to do this when creating a table using Impala.

-Todd
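
For illustration, a minimal impala-shell sketch of the suggestion above. The
batch size of 10 and the per-row arithmetic follow this message; the source
table name and the omitted Kudu-specific table properties are assumptions, not
details given in the thread.

    -- Back-of-envelope: ~1101 string columns x ~27 bytes is roughly 27-30 KB
    -- per row, so 10 rows per batch is about 300 KB, well under the 7 MiB
    -- client op buffer.
    SET BATCH_SIZE=10;

    -- Re-run the CTAS with the smaller batch size (source table name assumed;
    -- any Kudu-specific table properties/clauses are omitted here):
    CREATE TABLE kudu_db.chr22_kudu
    AS SELECT * FROM default.chr22;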

On Wed, May 18, 2016 at 10:51 AM, Abhi Basu <90...@gmail.com> wrote:




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Abhi Basu <90...@gmail.com>.
Query: describe kudu_db.chr22_kudu
+-------------+--------+---------+
| name        | type   | comment |
+-------------+--------+---------+
| pos         | int    |         |
| id          | string |         |
| chrom       | string |         |
| ref         | string |         |
| alt         | string |         |
| qual        | string |         |
| filter      | string |         |
| info        | string |         |
| format_type | string |         |
| hg00096     | string |         |
| hg00097     | string |         |
| hg00099     | string |         |
| hg00100     | string |         |
| hg00101     | string |         |
| hg00102     | string |         |
| hg00103     | string |         |
| hg00104     | string |         |

..........

all the way to column na20828 (string).

Each of the hg and na columns has values like:
| hg00096                    |
+----------------------------+
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |



On Wed, May 18, 2016 at 10:47 AM, Todd Lipcon <to...@cloudera.com> wrote:




-- 
Abhi Basu

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Todd Lipcon <to...@cloudera.com>.
What are the types of your 1000 columns? Maybe an even smaller batch size
is necessary.

-Todd

On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <90...@gmail.com> wrote:




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Abhi Basu <90...@gmail.com>.
I have tried with batch_size=500 and still get the same error. For your
reference, I have attached info that may help diagnose the problem.

Error: Error while applying Kudu session.: Incomplete: not enough space
remaining in buffer for op (required 46.7K, 7.00M already used


Config settings:

Kudu Tablet Server Block Cache Capacity   1 GB
Kudu Tablet Server Hard Memory Limit  16 GB


On Wed, May 18, 2016 at 8:26 AM, William Berkeley <wd...@cloudera.com>
wrote:



-- 
Abhi Basu

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by William Berkeley <wd...@cloudera.com>.
Both options are more or less the same idea: the point is that you need fewer
rows going in per batch so you don't go over the per-batch buffer limit. Follow
what Todd said, as he explained it more clearly and suggested a better way.

-Will

On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <90...@gmail.com> wrote:


Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Abhi Basu <90...@gmail.com>.
Thanks for the updates. I will give both options a try and report back.

If you are interested in testing with such datasets, I can help.

Thanks,

Abhi

On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon <to...@cloudera.com> wrote:




-- 
Abhi Basu

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Abhi,

Will is right that the error is client-side, and it is probably happening
because your rows are so wide. Impala typically batches 1000 rows at a time
when inserting into Kudu, so if each of your rows is 7-8KB, that will overflow
the max buffer size that Will mentioned. This seems quite probable if your
data is 1000 columns of doubles or int64s (which are 8 bytes each).

I don't think his suggested workaround will help, but you can try running
'set batch_size=500' before running the create table or insert query.

In terms of max supported columns, most of the workloads we are focusing on
are more like typical data-warehouse tables, on the order of a couple
hundred columns. Crossing into the 1000+ range enters "uncharted territory"
where it's much more likely you'll hit problems like this and quite
possibly others as well. I'll be interested to hear about your experiences,
though you should probably be prepared for some rough edges.

-Todd
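
As a concrete sketch of that suggestion in impala-shell: the table names are
reused from elsewhere in the thread, and the INSERT form assumes the Kudu table
already exists; both are illustrative rather than details from this message.

    -- Lower Impala's per-insert row batching before the statement:
    SET BATCH_SIZE=500;

    -- Then run the CTAS, or, if the Kudu table already exists, an insert-select:
    INSERT INTO kudu_db.chr22_kudu
    SELECT * FROM default.chr22;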

On Tue, May 17, 2016 at 8:32 PM, William Berkeley <wd...@cloudera.com>
wrote:



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

Posted by William Berkeley <wd...@cloudera.com>.
Hi Abhi.

I believe that error is actually coming from the client, not the server.
See e.g.
https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
(NB: that link is to the master branch, not the exact release you are using).

If you look around there, you'll see that the max is set by something
called max_buffer_size_, which appears to be hardcoded to 7 * 1024 * 1024
bytes = 7MiB (and this is consistent with 6.96 + 0.0467 > 7).

I think the simple workaround would be to do the CTAS as a CTAS + INSERT ...
SELECT. Pick a condition that bipartitions the table, so you don't get
errors from trying to double-insert rows.

-Will
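
A rough sketch of that workaround, assuming 'pos' can serve as the
bipartitioning column and using an arbitrary, illustrative split value (neither
is specified in the thread); any Kudu-specific clauses on the CTAS are omitted.

    -- First half of the rows via CTAS:
    CREATE TABLE kudu_db.chr22_kudu
    AS SELECT * FROM default.chr22 WHERE pos < 25000000;

    -- Remaining rows via INSERT ... SELECT; the disjoint predicates ensure
    -- no row is inserted twice.
    INSERT INTO kudu_db.chr22_kudu
    SELECT * FROM default.chr22 WHERE pos >= 25000000;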

On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <90...@gmail.com> wrote:
