Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/04/28 16:47:49 UTC

Re: compaction slow while sstable>25GB,limitation of the sstable size?

Compaction time is proportional to the size of the sstable, yes.  Not
sure how it could be otherwise.  And it does generate a lot of
garbage.  So unless you are seeing concurrent mode failures in the GC and
corresponding large pause times, your heap should be fine, as long as
the rows you are compacting aren't too large.

2010/4/28 casablinca126.com <ca...@126.com>:
> Hi,
>        The compaction process becomes very slow once the newly generated sstable file grows beyond 25GB;
> in the meantime, the garbage collector runs frequently.
>        First, is there a limit on the sstable size? If not, is a 2GB heap not
> enough to process such a large file?
>        I'm using cassandra-0.6.1, and the JVM heap size is 2GB (the maximum on a 32-bit system).
>
>        Thanks in advance!
> Best Regards,
> Cao Jiguang
>
> --------------
> casablinca126.com
> 2010-04-28
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
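
One way to check for the GC failures and long pauses described above is to enable GC logging. The fragment below is a minimal sketch, assuming a HotSpot JVM of that era and assuming that JVM options are passed through a JVM_OPTS variable in bin/cassandra.in.sh; the log path is a placeholder.

```shell
# Assumption: this line is appended where JVM_OPTS is defined (e.g. bin/cassandra.in.sh).
# /tmp/cassandra-gc.log is a placeholder path; choose a location that suits your install.
JVM_OPTS="${JVM_OPTS:-} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/cassandra-gc.log"
```

With the CMS collector, the log entries to look for contain the text "concurrent mode failure", usually next to a long full-collection pause.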

Re: Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?

Posted by Schubert Zhang <zs...@gmail.com>.
I once modified the code to set INDEX_INTERVAL = 512 to decrease
memory usage, and it seems to be working fine.
Is that right?

2010/4/30 casablinca126.com <ca...@126.com>

> hi,
>        It seems that changing INDEX_INTERVAL will conflict with
> AntiEntropyService, right?
>        I will reconstruct my sstables.
>        Thank you, Jonathan!
> cheers,
>        Cao Jiguang
>
>
> ------------------
> casablinca126.com
> 2010-04-30
>
> -------------------------------------------------------------
> From: Jonathan Ellis
> Sent: 2010-04-29 20:54:03
> To: user@cassandra.apache.org
> Cc:
> Subject: Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?
>
> 2010/4/29 casablinca126.com <ca...@126.com>:
> > Hi,
> >        Now I am starting to understand what is really happening. INDEX_INTERVAL (in
> > IndexSummary.java) was set to 4, so at least 1/4
> > of the indices are in the heap. For a node with 20M columns, most of the
> > heap is occupied by indices, which of course leads to poor performance
> > when processing large files.
> >        Is it possible to modify INDEX_INTERVAL without rebuilding the
> > sstables? I modified the code and restarted every node,
> > but "NotFoundException()" is reported when reading the columns.
>
> No.  INDEX_INTERVAL is not intended to be configurable.  (It is
> all-caps, the convention for constants...)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?

Posted by "casablinca126.com" <ca...@126.com>.
hi,
	It seems that changing INDEX_INTERVAL will conflict with AntiEntropyService, right?
	I will reconstruct my sstables.
	Thank you, Jonathan!
cheers,
	Cao Jiguang
 	

------------------				 
casablinca126.com
2010-04-30

-------------------------------------------------------------
From: Jonathan Ellis
Sent: 2010-04-29 20:54:03
To: user@cassandra.apache.org
Cc:
Subject: Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?

2010/4/29 casablinca126.com <ca...@126.com>:
> Hi,
>        Now I am starting to understand what is really happening. INDEX_INTERVAL (in IndexSummary.java) was set to 4, so at least 1/4
> of the indices are in the heap. For a node with 20M columns, most of the heap is occupied by indices, which of course leads to poor performance
> when processing large files.
>        Is it possible to modify INDEX_INTERVAL without rebuilding the sstables? I modified the code and restarted every node,
> but "NotFoundException()" is reported when reading the columns.

No.  INDEX_INTERVAL is not intended to be configurable.  (It is
all-caps, the convention for constants...)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?

Posted by Jonathan Ellis <jb...@gmail.com>.
2010/4/29 casablinca126.com <ca...@126.com>:
> Hi,
>        Now I am starting to understand what is really happening. INDEX_INTERVAL (in IndexSummary.java) was set to 4, so at least 1/4
> of the indices are in the heap. For a node with 20M columns, most of the heap is occupied by indices, which of course leads to poor performance
> when processing large files.
>        Is it possible to modify INDEX_INTERVAL without rebuilding the sstables? I modified the code and restarted every node,
> but "NotFoundException()" is reported when reading the columns.

No.  INDEX_INTERVAL is not intended to be configurable.  (It is
all-caps, the convention for constants...)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
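
The heap cost discussed above can be estimated with simple arithmetic. The following is a minimal sketch, not Cassandra code: the class IndexSummaryEstimate, its methods, and the 100-byte per-entry figure are hypothetical. It assumes that one index entry out of every INDEX_INTERVAL is held in memory, and it treats the 20M figure from the thread as the number of index entries.

```java
// Hypothetical helper for illustration only; it is not part of Cassandra.
public class IndexSummaryEstimate {

    /** Number of index entries kept in memory when every interval-th entry is sampled. */
    public static long sampledEntries(long totalEntries, int interval) {
        if (totalEntries < 0 || interval <= 0) {
            throw new IllegalArgumentException("totalEntries must be >= 0 and interval > 0");
        }
        // Ceiling division, assuming the first entry is always sampled.
        return (totalEntries + interval - 1) / interval;
    }

    /** Rough heap estimate; bytesPerEntry is an assumed average cost per sampled entry. */
    public static long estimatedHeapBytes(long totalEntries, int interval, long bytesPerEntry) {
        return sampledEntries(totalEntries, interval) * bytesPerEntry;
    }

    public static void main(String[] args) {
        long entries = 20000000L;   // the 20M figure from the thread, treated as index entries
        long bytesPerEntry = 100L;  // assumption for illustration, not a measured value
        // 4 and 512 are the two interval values mentioned in the thread.
        for (int interval : new int[] {4, 512}) {
            System.out.printf("interval=%d sampled=%d approxHeapMB=%d%n",
                    interval,
                    sampledEntries(entries, interval),
                    estimatedHeapBytes(entries, interval, bytesPerEntry) / (1024 * 1024));
        }
    }
}
```

Under these assumptions, the number of sampled entries, and so the heap they occupy, shrinks in proportion to the interval, which is why a small interval such as 4 is costly on a 2GB heap.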

Re: Re: compaction slow while sstable>25GB,limitation of the sstable size?

Posted by "casablinca126.com" <ca...@126.com>.
Hi,
	Now I am starting to understand what is really happening. INDEX_INTERVAL (in IndexSummary.java) was set to 4, so at least 1/4
of the indices are in the heap. For a node with 20M columns, most of the heap is occupied by indices, which of course leads to poor performance
when processing large files.
	Is it possible to modify INDEX_INTERVAL without rebuilding the sstables? I modified the code and restarted every node,
but "NotFoundException()" is reported when reading the columns.
	Thanks!

Best regards,
	Cao Jiguang

------------------				 
casablinca126.com
2010-04-29

-------------------------------------------------------------
From: Jonathan Ellis
Sent: 2010-04-28 22:48:44
To: user@cassandra.apache.org
Cc:
Subject: Re: compaction slow while sstable>25GB,limitation of the sstable size?

Compaction time is proportional to the size of the sstable, yes.  Not
sure how it could be otherwise.  And it does generate a lot of
garbage.  So unless you are seeing concurrent mode failures in the GC and
corresponding large pause times, your heap should be fine, as long as
the rows you are compacting aren't too large.

2010/4/28 casablinca126.com <ca...@126.com>:
> Hi,
>        The compaction process becomes very slow once the newly generated sstable file grows beyond 25GB;
> in the meantime, the garbage collector runs frequently.
>        First, is there a limit on the sstable size? If not, is a 2GB heap not
> enough to process such a large file?
>        I'm using cassandra-0.6.1, and the JVM heap size is 2GB (the maximum on a 32-bit system).
>
>        Thanks in advance!
> Best Regards,
> Cao Jiguang
>
> --------------
> casablinca126.com
> 2010-04-28
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com