Posted to user@cassandra.apache.org by Dop Sun <su...@dopsun.com> on 2010/04/28 14:37:45 UTC

What's the best maximum size for a single column?

Hi,

Yesterday, I saw a lot of discussion about how to store a file (a big one). It looks like the suggestion is to store it in multiple rows (not even in multiple columns within a single row).

My question is:

Is there a recommended maximum column size that can help make the decision on the segment size? Is this related to memory size or other factors?
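
For illustration, here is a minimal Java sketch of that multiple-rows approach. The store() method is a hypothetical stand-in for whatever client call is used (Hector or raw Thrift), and the 1 MB segment size is an arbitrary example; picking that number well is exactly what this question is about.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class FileSegmenter {
    private static final int SEGMENT_SIZE = 1024 * 1024; // arbitrary 1 MB example

    public static void storeFile(String fileName) throws IOException {
        FileInputStream in = new FileInputStream(fileName);
        try {
            byte[] buf = new byte[SEGMENT_SIZE];
            int seq = 0;
            int n;
            while ((n = in.read(buf)) > 0) {
                // One row per segment; the key encodes the file name and a
                // sequence number so segments can be reassembled in order.
                store(fileName + ":" + seq++, Arrays.copyOf(buf, n));
            }
        } finally {
            in.close();
        }
    }

    // Hypothetical placeholder for a Hector or raw Thrift insert call.
    private static void store(String rowKey, byte[] segment) {
        // client.insert(...) would go here
    }
}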

 

Thanks,

Regards,

Dop


Inserting files to Cassandra timeouts

Posted by Spacejatsi <sp...@gmail.com>.
Hi all,

I'm trying to run a scenario that adds files from a specific folder to Cassandra. Right now I have 64 files (about 15-20 MB per file), around 1 GB of data overall.
I'm able to insert around 40 files, but after that Cassandra goes into some kind of GC loop and I finally get a timeout on the client.
It is not going to OOM, but it just jams.

Here are the last entries in the log file:
 INFO [GC inspection] 2010-04-28 10:07:55,297 GCInspector.java (line 110) GC for ParNew: 232 ms, 25731128 reclaimed leaving 553241120 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:09:02,331 GCInspector.java (line 110) GC for ParNew: 2844 ms, 238909856 reclaimed leaving 1435582832 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:09:49,421 GCInspector.java (line 110) GC for ParNew: 30666 ms, 11185824 reclaimed leaving 1679795336 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:11:18,090 GCInspector.java (line 110) GC for ParNew: 895 ms, 17921680 reclaimed leaving 1589308456 used; max is 4108386304



I think I must have something wrong in my configuration or in how I use Cassandra, because people here are inserting ten times more data and it works.

The column family I'm using:
<ColumnFamily CompareWith="BytesType" Name="Standard1"/>
Basically I'm inserting with the folder name as the row key, the file name as the column name, and the file content as the value.
I tried with Hector (mainly) and directly with Thrift (insert and batch_mutate).

In my case, the data does not need to be readable immediately after insert, but I don't know if that helps in any way.
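
For reference, a rough sketch of that scheme against the raw Thrift API follows. The 0.6-era insert signature (keyspace, key, column path, value, timestamp, consistency level) and the Keyspace1 keyspace name are assumptions recalled from the default 0.6 setup, not a verified listing.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

// Sketch only: row key = folder name, column name = file name, value =
// entire file content. Connection setup and exception handling omitted;
// the generated Thrift signatures are assumptions.
public class WholeFileInsert {
    public static void insertFile(Cassandra.Client client, String folderName,
                                  String fileName, byte[] content)
            throws Exception {
        ColumnPath path = new ColumnPath("Standard1"); // CF from the config below
        path.setColumn(fileName.getBytes("UTF-8"));
        client.insert("Keyspace1", folderName, path, content,
                      System.currentTimeMillis(), ConsistencyLevel.ONE);
    }
}

Note that each such call materializes the whole 15-20 MB value on the heap in a single allocation, which is worth keeping in mind next to the memtable sizing discussion later in the thread.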


My environment:
Mac and/or Linux (tested on both)
Java 1.6.0_17
Cassandra 0.6.1



  <RpcTimeoutInMillis>60000</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>32</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>0.1</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>32</ConcurrentWrites>
  <CommitLogSync>batch</CommitLogSync>
  <!-- CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS -->
  <CommitLogSyncBatchWindowInMS>1.0</CommitLogSyncBatchWindowInMS>
  <GCGraceSeconds>500</GCGraceSeconds>

JVM_OPTS=" \
        -server \
        -Xms3G \
        -Xmx3G \
        -XX:PermSize=512m \
        -XX:MaxPermSize=800m \
        -XX:MaxNewSize=256m \
        -XX:NewSize=128m \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -XX:+DisableExplicitGC \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"



RE: What's the best maximum size for a single column?

Posted by Mark Jones <MJ...@imagehawk.com>.
The max size would probably be best determined by looking at the size of your MemTable:

  <!--
   ~ Flush memtable after this much data has been inserted, including
   ~ overwritten data.  There is one memtable per column family, and
   ~ this threshold is based solely on the amount of data stored, not
   ~ actual heap memory usage (there is some overhead in indexing the
   ~ columns).
  -->
  <MemtableThroughputInMB>64</MemtableThroughputInMB>

Read repair is on a per-column basis; every column gets a timestamp, plus the overhead of a name. So, balance those three out and you have a pretty good idea of what to do.
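
To make that concrete with the numbers already in this thread (a rough illustration, not a measured figure): with MemtableThroughputInMB at 64 and 15-20 MB files stored as single column values,

  per-column payload:   ~20 MB of file content
  per-column overhead:  the column name bytes plus an 8-byte (i64) timestamp
  columns per flush:    64 MB / 20 MB, so about 3

roughly every third insert fills the memtable and forces a flush, and each insert also allocates the full value on the heap at once. Smaller segments spread both the memtable pressure and the allocations out.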

From: Dop Sun [mailto:sunht@dopsun.com]
Sent: Thursday, April 29, 2010 7:38 AM
To: user@cassandra.apache.org
Subject: RE: What's the best maximum size for a single column?

Is there any practical number I can refer to?

Like, what's the largest size used in a single column in your application?

From: uncle mantis [mailto:unclemantis@gmail.com]
Sent: Thursday, April 29, 2010 1:57 AM
To: user@cassandra.apache.org
Subject: Re: What's the best maximum size for a single column?

There is no column size limitation. As for performance due to the size of a column, at the speeds Cassandra is running at, I don't believe it would make a bit of difference whether it was 1 byte or a million bytes.

Can anyone here prove me right or wrong?

Regards,

Michael
On Wed, Apr 28, 2010 at 7:37 AM, Dop Sun <su...@dopsun.com> wrote:
Hi,

Yesterday, I saw a lot of discussion about how to store a file (a big one). It looks like the suggestion is to store it in multiple rows (not even in multiple columns within a single row).

My question is:
Is there a recommended maximum column size that can help make the decision on the segment size? Is this related to memory size or other factors?

Thanks,
Regards,
Dop


RE: What's the best maximum size for a single column?

Posted by Dop Sun <su...@dopsun.com>.
Is there any practical number I can refer to?

Like, what's the largest size used in a single column in your application?

 

From: uncle mantis [mailto:unclemantis@gmail.com] 
Sent: Thursday, April 29, 2010 1:57 AM
To: user@cassandra.apache.org
Subject: Re: What's the best maximum size for a single column?

 

There is no column size limitation. As for performance due to the size of a column, at the speeds Cassandra is running at, I don't believe it would make a bit of difference whether it was 1 byte or a million bytes.

 

Can anyone here prove me right or wrong?

Regards,

Michael



On Wed, Apr 28, 2010 at 7:37 AM, Dop Sun <su...@dopsun.com> wrote:

Hi,

 

Yesterday, I saw a lot of discussion about how to store a file (a big one). It looks like the suggestion is to store it in multiple rows (not even in multiple columns within a single row).

My question is:

Is there a recommended maximum column size that can help make the decision on the segment size? Is this related to memory size or other factors?

 

Thanks,

Regards,

Dop

 


Re: What's the best maximum size for a single column?

Posted by uncle mantis <un...@gmail.com>.
There is no column size limitation. As for performance due to the size of a
column, at the speeds Cassandra is running at, I don't believe it would make
a bit of difference whether it was 1 byte or a million bytes.

Can anyone here prove me right or wrong?

Regards,

Michael


On Wed, Apr 28, 2010 at 7:37 AM, Dop Sun <su...@dopsun.com> wrote:

>  Hi,
>
>
>
> Yesterday, I saw a lot of discussion about how to store a file (a big one).
> It looks like the suggestion is to store it in multiple rows (not even in
> multiple columns within a single row).
>
>
>
> My question is:
>
> Is there a recommended maximum column size that can help make the decision
> on the segment size? Is this related to memory size or other factors?
>
>
>
> Thanks,
>
> Regards,
>
> Dop
>