Posted to user@cassandra.apache.org by shalom sagges <sh...@gmail.com> on 2018/04/04 12:25:19 UTC

Text or....

Hi All,

A certain application is writing ~55,000 characters for a single row. Most
of these characters go into a single column with the "text" data type.

This looks insanely large for one row.
Would you suggest changing the data type from "text" to blob, or is there
any other option that might fit this scenario?
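
For reference, the two options look roughly like this with the Python
driver (a sketch; keyspace, table, and column names are made up, not the
real schema):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    # Current shape: the ~55,000-character payload lives in a text column.
    session.execute("""
        CREATE TABLE IF NOT EXISTS documents_text (
            doc_id uuid PRIMARY KEY,
            body   text   -- the large payload
        )
    """)

    # Alternative shape: the same payload stored as a blob,
    # e.g. bytes compressed by the application before the write.
    session.execute("""
        CREATE TABLE IF NOT EXISTS documents_blob (
            doc_id uuid PRIMARY KEY,
            body   blob   -- compressed payload
        )
    """)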

Thanks!

Re: Text or....

Posted by Nicolas Guyomar <ni...@gmail.com>.
Hi Shalom,

You might want to compress on the application side before inserting into
Cassandra, using the algorithm of your choice, based on the compression
ratio and speed you find acceptable for your use case.
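
Something along these lines, for example, with the Python driver and zlib
(zlib is just one candidate algorithm; names here are made up):

    import uuid
    import zlib

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    text = load_document()  # hypothetical: returns the ~55,000-character string
    compressed = zlib.compress(text.encode('utf-8'), level=6)

    # Write the compressed bytes to a blob column instead of raw text.
    session.execute(
        "INSERT INTO documents_blob (doc_id, body) VALUES (%s, %s)",
        (uuid.uuid4(), compressed),
    )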


Re: Text or....

Posted by shalom sagges <sh...@gmail.com>.
Thanks DuyHai!

I'm using the default table compression. Is there anything else I should
look into?
Regarding table compression, I understand that for write-heavy tables it's
best to keep the default and not compress further. Have I understood
correctly?
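
For reference, the current settings can be inspected and changed with CQL;
a sketch assuming Cassandra 3.x (where the default is LZ4 with 64 KB
chunks) and made-up names:

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    # The effective compression options are visible in the system schema.
    row = session.execute(
        "SELECT compression FROM system_schema.tables "
        "WHERE keyspace_name = 'my_keyspace' AND table_name = 'documents_blob'"
    ).one()
    print(row.compression)  # e.g. {'chunk_length_in_kb': '64', 'class': '...LZ4Compressor'}

    # Keeping the default LZ4 class but tuning the chunk size is the usual knob:
    session.execute(
        "ALTER TABLE documents_blob WITH compression = "
        "{'class': 'LZ4Compressor', 'chunk_length_in_kb': 16}"
    )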


Re: Text or....

Posted by DuyHai Doan <do...@gmail.com>.
Compress it and store it as a blob, unless you ever need to index it; but I
guess even with SASI, indexing such a huge text block is not a good idea.
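
The read path then just reverses it in the application; a sketch with the
Python driver and zlib, reusing the made-up names from the write example
above:

    import zlib

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    def read_document(doc_id):
        """Fetch one compressed document and decompress it client-side."""
        row = session.execute(
            "SELECT body FROM documents_blob WHERE doc_id = %s", (doc_id,)
        ).one()
        # The blob arrives as bytes: decompress and decode in the application.
        return zlib.decompress(row.body).decode('utf-8')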


Re: Text or....

Posted by Jon Haddad <jo...@jonhaddad.com>.
Depending on the compression ratio, I think it would generate less garbage
on the Cassandra side if you compressed it client-side. Something to test
out.
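
The client-side half of that test is cheap to sketch: compress a
representative payload and look at the ratio and CPU cost (the server-side
garbage you'd watch separately, e.g. with nodetool gcstats while replaying
the workload):

    import time
    import zlib

    payload = load_sample_document().encode('utf-8')  # hypothetical ~55 KB sample

    start = time.perf_counter()
    compressed = zlib.compress(payload, level=6)
    elapsed = time.perf_counter() - start

    print(f"raw: {len(payload)} B  compressed: {len(compressed)} B")
    print(f"ratio: {len(payload) / len(compressed):.1f}x  cost: {elapsed * 1000:.2f} ms")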


Re: Text or....

Posted by Jeff Jirsa <jj...@gmail.com>.
Compressing server-side and validating checksums is hugely important in the
more frequently used versions of Cassandra, so since you probably want to
run compression on the server anyway, I'm not sure why you'd compress it
twice.
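
For context: the compressed-block checksums come with the table's
compression settings, and since 3.0 the read-time verification rate is the
top-level crc_check_chance option. A sketch keeping both on (3.x syntax,
made-up table name):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    # Leave server-side compression on so block checksums exist,
    # and verify them on every read (1.0 = always check).
    session.execute(
        "ALTER TABLE documents_blob WITH compression = "
        "{'class': 'LZ4Compressor', 'chunk_length_in_kb': 64} "
        "AND crc_check_chance = 1.0"
    )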

-- 
Jeff Jirsa


Re: Text or....

Posted by DuyHai Doan <do...@gmail.com>.
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU, because there is no decompression server-side
3) a lot of Cassandra heap, because the compressed blob should be
relatively small compared to the raw size (text data compresses very well)


Re: Text or....

Posted by "Jeronimo de A. Barros" <je...@gmail.com>.
Hi,

We use a pseudo file-system table where the chunks are blobs of 64 KB, and
we have never had any performance issues.

The primary-key structure is ((file-uuid), chunk-id).
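
In sketch form, that pattern looks roughly like this (Python driver; the
names and the 64 KB constant mirror the description above, not an actual
schema):

    import uuid

    from cassandra.cluster import Cluster

    CHUNK_SIZE = 64 * 1024  # 64 KB per chunk

    session = Cluster(['127.0.0.1']).connect('my_keyspace')

    session.execute("""
        CREATE TABLE IF NOT EXISTS file_chunks (
            file_uuid uuid,
            chunk_id  int,
            data      blob,
            PRIMARY KEY ((file_uuid), chunk_id)
        )
    """)

    def store_file(payload: bytes) -> uuid.UUID:
        """Write one file as a sequence of 64 KB chunk rows."""
        file_uuid = uuid.uuid4()
        insert = session.prepare(
            "INSERT INTO file_chunks (file_uuid, chunk_id, data) VALUES (?, ?, ?)"
        )
        for chunk_id, offset in enumerate(range(0, len(payload), CHUNK_SIZE)):
            session.execute(insert, (file_uuid, chunk_id, payload[offset:offset + CHUNK_SIZE]))
        return file_uuid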

Jero
