Posted to user@cassandra.apache.org by Neophytos Demetriou <ne...@gmail.com> on 2022/01/27 22:11:27 UTC

possible cell buffer size issue

Hi,

I'm new to the list but not new to Cassandra. I'm writing an app on top of
C* and I have come across an issue (huge cell buffer size after applying a
mutation) that I haven't been able to resolve yet. I would appreciate any
suggestions/help to resolve this. Here are the details:

1. I have a column family defined as follows:

TableMetadata.Builder metadata =
    TableMetadata
        .builder(KEYSPACE1, CF_STANDARD1)
        .addPartitionKeyColumn("key", Int32Type.instance)
        .addRegularColumn(
            "a",
            MapType.getInstance(
                AsciiType.instance,
                SetType.getInstance(UTF8Type.instance, false),
                false))
        .addRegularColumn("b", UTF8Type.instance);

2. And here's a test that I wrote, which passes on the cassandra-4.0 branch:

Row.Builder builder = BTreeRow.unsortedBuilder();
builder.newRow(Clustering.EMPTY);

ColumnMetadata def = metadata.getColumn(new ColumnIdentifier("b", true));

Cell<?> cell = BufferCell.live(
    def, System.currentTimeMillis(), UTF8Type.instance.decompose("/b1"));
builder.addCell(cell);

PartitionUpdate update =
    PartitionUpdate.singleRowUpdate(metadata, dk, builder.build());
new Mutation(update).apply();

Row row = Util.getOnlyRow(Util.cmd(cfs, dk).withLimit(1).build());
assertEquals(3, row.getCell(def).buffer().array().length);

3. However, in my app, when I call getOnlyRow after applying the mutation,
the string value of b has length 3 but buffer().array().length is 1048576.

4. Restarting the app (which starts the Cassandra daemon) fixes the issue,
i.e. getOnlyRow returns the correct buffer size.

5. I'm importing cassandra-all 4.0.1 and the app uses jdk-11.

If you need further info, please do not hesitate to ask.

- Neophytos

PS. I'm experimenting with C* internals for the first time so it's very
likely I'm doing something wrong.

Re: possible cell buffer size issue

Posted by Neophytos Demetriou <ne...@gmail.com>.
Thank you Bowen, I ended up using the type of the cell to get the string
for now.
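
For readers without the Cassandra classes at hand, here is a minimal sketch of the same idea using only the JDK: decode the bytes between position and limit rather than the whole backing array. The class and method names are hypothetical, and the 1 MiB array with the value at offset 209 is an assumption chosen to resemble the numbers reported in this thread. In Cassandra itself the column's type would do the decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DecodeValue {
    // Decodes only the readable window (position..limit) as UTF-8.
    // duplicate() leaves the caller's position and limit untouched.
    static String decodeUtf8(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Assumption: a large shared array with a 3-byte value somewhere inside it.
        byte[] backing = new byte[1 << 20];
        byte[] text = "/b1".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(text, 0, backing, 209, text.length);

        // wrap(array, offset, length) sets position=209 and limit=212.
        ByteBuffer value = ByteBuffer.wrap(backing, 209, text.length);
        System.out.println(decodeUtf8(value));
    }
}
```

The string comes out the same however large the backing array is, because the array is never consulted directly.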

On Fri, Jan 28, 2022 at 5:01 AM Bowen Song <bo...@bso.ng> wrote:

> Just FYI, you may want to put the return value of cell.buffer() in a
> variable instead of calling it twice, because there's no guarantee that you
> will get the same (cached) ByteBuffer object on the second call. Also, you
> may want to do a rewind() first, just in case...

Re: possible cell buffer size issue

Posted by Bowen Song <bo...@bso.ng>.
Just FYI, you may want to put the return value of cell.buffer() in a
variable instead of calling it twice, because there's no guarantee that
you will get the same (cached) ByteBuffer object on the second call.
Also, you may want to do a rewind() first, just in case...
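
A minimal sketch of that advice with plain JDK buffers: take the buffer once, then copy from a duplicate so the caller's position never moves. The class and method names are hypothetical. Note that this sketch uses duplicate() in place of rewind(), since rewind() resets the position to 0, which is not necessarily where the value starts when the buffer is a window over a larger array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CopyRemaining {
    // Copies the readable bytes (position..limit) into a fresh array.
    // The duplicate shares content but has its own position and limit,
    // so the source buffer can be read again afterwards.
    static byte[] copyRemaining(ByteBuffer source) {
        ByteBuffer view = source.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Hold the buffer in one variable instead of fetching it twice.
        ByteBuffer value = ByteBuffer.wrap("/b1".getBytes(StandardCharsets.UTF_8));
        byte[] first = copyRemaining(value);
        byte[] second = copyRemaining(value);
        System.out.println(first.length + " " + second.length + " " + value.remaining());
    }
}
```

Calling get(arr) directly on the source buffer would advance its position, so a second read of the same object would see zero remaining bytes.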

On 28/01/2022 09:22, Neophytos Demetriou wrote:
> I've solved the issue with the following for the time being:
>
> byte[] arr = new byte[cell.buffer().remaining()];
> cell.buffer().get(arr);
>
> I shouldn't have been calling array() in the first place it seems.
>
> - Neophytos

Re: possible cell buffer size issue

Posted by Neophytos Demetriou <ne...@gmail.com>.
I've solved the issue with the following for the time being:

byte[] arr = new byte[cell.buffer().remaining()];
cell.buffer().get(arr);

I shouldn't have been calling array() in the first place it seems.

- Neophytos

On Fri, Jan 28, 2022 at 2:06 AM Neophytos Demetriou <ne...@gmail.com>
wrote:

> Hi, thanks for the prompt reply.
>
> I've tried this. Here's what I'm writing:
> bytes: 3 capacity: 3 limit: 3 offset: 0
>
> Here's what I'm reading:
> cell buffer size: 1048576 capacity: 1048576 limit: 212 arrayOffset: 0
>
> It still does not seem right. I would have expected Cassandra to allocate
> a buffer the size of the text field. Unless I'm missing something,
> org.apache.cassandra.db.marshal.AbstractType#read already does this. It
> calls org.apache.cassandra.utils.ByteBufferUtil#read that allocates a byte
> array the size of the given length. I'm still checking but it could be that
> the readUnsignedVInt call in AbstractType#read reads the wrong thing under
> the given circumstances (very likely an issue on my end). I would welcome
> any ideas on how to debug this.
>
> - Neophytos

Re: possible cell buffer size issue

Posted by Neophytos Demetriou <ne...@gmail.com>.
Hi, thanks for the prompt reply.

I've tried this. Here's what I'm writing:
bytes: 3 capacity: 3 limit: 3 offset: 0

Here's what I'm reading:
cell buffer size: 1048576 capacity: 1048576 limit: 212 arrayOffset: 0

It still does not seem right. I would have expected Cassandra to allocate a
buffer the size of the text field. Unless I'm missing something,
org.apache.cassandra.db.marshal.AbstractType#read already does this. It
calls org.apache.cassandra.utils.ByteBufferUtil#read that allocates a byte
array the size of the given length. I'm still checking but it could be that
the readUnsignedVInt call in AbstractType#read reads the wrong thing under
the given circumstances (very likely an issue on my end). I would welcome
any ideas on how to debug this.

- Neophytos

On Thu, Jan 27, 2022 at 5:43 PM Bowen Song <bo...@bso.ng> wrote:

> I'm not a Java developer, but to the best of my knowledge, the
> ByteBuffer.array() method returns the whole backing byte array, not just the
> part of it that's meaningful (i.e. has ever been written to). You may want
> to check the difference between bb.capacity() and bb.limit(), and also
> check bb.arrayOffset(), because the first element is not always at the
> beginning of the byte array.

Re: possible cell buffer size issue

Posted by Bowen Song <bo...@bso.ng>.
I'm not a Java developer, but to the best of my knowledge, the
ByteBuffer.array() method returns the whole backing byte array, not just
the part of it that's meaningful (i.e. has ever been written to). You
may want to check the difference between bb.capacity() and bb.limit(),
and also check bb.arrayOffset(), because the first element is not
always at the beginning of the byte array.
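
To illustrate the point with JDK classes only, here is a minimal sketch that builds a 3-byte window over a 1 MiB array and prints the values suggested above. The class and method names are hypothetical, and the offset 209 is an assumption; how Cassandra actually allocates and shares its read buffers is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BackingArrayDemo {
    // Returns a 3-byte window over a 1 MiB array (offset 209 is an assumption).
    static ByteBuffer windowOverLargeArray() {
        byte[] backing = new byte[1 << 20];
        byte[] text = "/b1".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(text, 0, backing, 209, text.length);
        return ByteBuffer.wrap(backing, 209, text.length);
    }

    public static void main(String[] args) {
        ByteBuffer bb = windowOverLargeArray();
        // array() exposes the entire backing array, not just the window.
        System.out.println("array().length: " + bb.array().length);
        System.out.println("capacity: " + bb.capacity());
        System.out.println("position: " + bb.position());
        System.out.println("limit: " + bb.limit());
        System.out.println("arrayOffset: " + bb.arrayOffset());
        // remaining() = limit - position, the size of the value itself.
        System.out.println("remaining: " + bb.remaining());
    }
}
```

Under these assumptions, array().length and capacity() both report the size of the large array, while remaining() reports the three bytes that make up the value.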
