Posted to user@cassandra.apache.org by Weijun Li <we...@gmail.com> on 2010/04/03 00:57:11 UTC

Heap sudden jump during import

I'm running a test that writes 30 million columns (700 bytes each) to Cassandra:
the process ran smoothly for about 20 million, then the heap usage suddenly jumped
from 2GB to 3GB, which is the JVM's upper limit. From this point Cassandra
freezes for a long time (terrible latency, no response to nodetool, so I have to
stop the import client) before it comes back to normal. It's a single-node
cluster with a JVM maximum heap size of 3GB. So what could cause this spike?
What kind of tool can I use to find out which objects are filling the additional
1GB of heap? I did a heap dump but couldn't get jhat to browse the dumped file.

Thanks,

-Weijun
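For reference, the "which objects are filling the heap" question can usually be
answered with standard JDK tooling before reaching for a dump browser. A minimal
sketch, assuming a Sun/Oracle JDK with jmap available; <cassandra-pid> is a
placeholder for the server's process id:

    # Cheap first look: a live-object class histogram often names the culprit
    jmap -histo:live <cassandra-pid> | head -40

    # Full binary heap dump for offline analysis in a heap-dump browser
    jmap -dump:format=b,file=cassandra-heap.hprof <cassandra-pid>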

Re: Heap sudden jump during import

Posted by Tatu Saloranta <ts...@gmail.com>.
On Wed, Apr 7, 2010 at 1:51 PM, Eric Evans <ee...@rackspace.com> wrote:
> On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote:
>> On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight <be...@gmail.com> wrote:
>> > When import, all data in json file will load in memory. So that, you can not
>> > import large data.
>> > You need to export large sstable file to many small json files, and run
>> > import.
>>
>> Why would you ever read the whole file in memory? JSON is very easily
>> streamable. Or does the whole data set need to be validated or
>> something (I assume not, if file splitting could be used). Perhaps it
>> is just an implementation flaw in importer tool.
>
> It's been awhile, but if I'm not mistaken, this is because we're writing
> SSTables and the records must be written in decorated-key sorted order.

Ok. It might make sense to solve this then, for example by using
external sorting?

(This reminds me that I should clean up and release some basic on-disk
merge sort code; oddly enough, it doesn't seem to be included in any
existing commons library. We used it for exactly this purpose --
pre-sorting data for systems that required it or benefited heavily from it.)

-+ Tatu +-
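To make the external-sorting idea concrete, here is a minimal on-disk merge sort
sketch in Java: split the input into chunks that fit in memory, sort and spill
each chunk to a temporary file, then k-way merge the runs with a priority queue.
This is illustrative only -- the class and helpers are invented for this example
and are not part of Cassandra; a real importer would compare decorated keys
rather than raw lines.

    import java.io.*;
    import java.util.*;

    public class ExternalLineSort {

        /** Sort the lines of 'input' into 'output', holding at most 'chunkSize' lines in memory. */
        public static void sort(File input, File output, int chunkSize) throws IOException {
            List<File> runs = new ArrayList<File>();
            BufferedReader in = new BufferedReader(new FileReader(input));
            try {
                List<String> chunk = new ArrayList<String>(chunkSize);
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() >= chunkSize) {
                        runs.add(writeSortedRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) runs.add(writeSortedRun(chunk));
            } finally {
                in.close();
            }
            mergeRuns(runs, output);
            for (File run : runs) run.delete();
        }

        private static File writeSortedRun(List<String> chunk) throws IOException {
            Collections.sort(chunk);                        // sort one bounded chunk in memory
            File run = File.createTempFile("sort-run-", ".txt");
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(run)));
            try {
                for (String s : chunk) out.println(s);
            } finally {
                out.close();
            }
            return run;
        }

        // Holds the current head line of one sorted run during the k-way merge.
        private static final class Head implements Comparable<Head> {
            String line;
            final BufferedReader reader;
            Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
            public int compareTo(Head other) { return line.compareTo(other.line); }
        }

        private static void mergeRuns(List<File> runs, File output) throws IOException {
            PriorityQueue<Head> heads = new PriorityQueue<Head>();
            for (File run : runs) {
                BufferedReader r = new BufferedReader(new FileReader(run));
                String first = r.readLine();
                if (first != null) heads.add(new Head(first, r)); else r.close();
            }
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(output)));
            try {
                while (!heads.isEmpty()) {
                    Head h = heads.poll();
                    out.println(h.line);                    // emit the globally smallest line
                    String next = h.reader.readLine();
                    if (next != null) { h.line = next; heads.add(h); } else h.reader.close();
                }
            } finally {
                out.close();
            }
        }
    }

Fed into an SSTable writer, something like this would let records be appended in
key order while only one chunk is ever held in memory.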

Re: Heap sudden jump during import

Posted by Eric Evans <ee...@rackspace.com>.
On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote:
> On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight <be...@gmail.com> wrote:
> > When import, all data in json file will load in memory. So that, you can not
> > import large data.
> > You need to export large sstable file to many small json files, and run
> > import.
> 
> Why would you ever read the whole file in memory? JSON is very easily
> streamable. Or does the whole data set need to be validated or
> something (I assume not, if file splitting could be used). Perhaps it
> is just an implementation flaw in importer tool. 

It's been awhile, but if I'm not mistaken, this is because we're writing
SSTables and the records must be written in decorated-key sorted order.

-- 
Eric Evans
eevans@rackspace.com


Re: Heap sudden jump during import

Posted by Tatu Saloranta <ts...@gmail.com>.
On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight <be...@gmail.com> wrote:
> When import, all data in json file will load in memory. So that, you can not
> import large data.
> You need to export large sstable file to many small json files, and run
> import.

Why would you ever read the whole file into memory? JSON is very easily
streamable. Or does the whole data set need to be validated or
something? (I assume not, if file splitting could be used.) Perhaps it
is just an implementation flaw in the importer tool.

-+ Tatu +-

Re: Heap sudden jump during import

Posted by JKnight JKnight <be...@gmail.com>.
During import, all the data in the JSON file is loaded into memory, so you
cannot import a large data set in one go.
You need to export the large SSTable into many small JSON files and run the
import on each of them.
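As a rough illustration of that split-and-reimport workflow (hedged: the paths,
keyspace and column family names are made-up examples, and the exact flags of the
bundled tools vary between releases, so check their usage output):

    # export one SSTable to JSON (optionally restrict to specific keys with -k)
    bin/sstable2json /var/lib/cassandra/data/Keyspace1/Standard1-1-Data.db > part1.json

    # after splitting the JSON into smaller files, import each piece back
    bin/json2sstable -K Keyspace1 -c Standard1 part1.json \
        /var/lib/cassandra/data/Keyspace1/Standard1-new-Data.db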


On Mon, Apr 5, 2010 at 5:26 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Usually sudden heap jumps involve compacting large rows.
>
> 0.6 (since beta3) includes a warning log when it finishes compacts a
> row over 500MB by default, in the hopes that this will give you enough
> time to fix things before whatever is making large rows makes one too
> large to fit in memory.
>
> On Fri, Apr 2, 2010 at 4:57 PM, Weijun Li <we...@gmail.com> wrote:
> > I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> > the process ran smoothly for about 20mil then the heap usage suddenly jumped
> > from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> > will freeze for long time (terrible latency, no response to nodetool that I
> > have to stop the import client ) before it comes back to normal . It's a
> > single node cluster with JVM maximum heap size of 3GB. So what could cause
> > this spike? What kind of tool can I use to find out what are the objects
> > that are filling the additional 1GB heap? I did a heap dump but could get
> > jhat to work to browse the dumped file.
> >
> > Thanks,
> >
> > -Weijun
> >
>



-- 
Best regards,
JKnight

Re: Heap sudden jump during import

Posted by Jonathan Ellis <jb...@gmail.com>.
Usually sudden heap jumps involve compacting large rows.

0.6 (since beta3) logs a warning when it finishes compacting a row over
500MB (by default), in the hope that this gives you enough time to fix
things before whatever is creating the large rows makes one too large to
fit in memory.

On Fri, Apr 2, 2010 at 4:57 PM, Weijun Li <we...@gmail.com> wrote:
> I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> the process ran smoothly for about 20mil then the heap usage suddenly jumped
> from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> will freeze for long time (terrible latency, no response to nodetool that I
> have to stop the import client ) before it comes back to normal . It's a
> single node cluster with JVM maximum heap size of 3GB. So what could cause
> this spike? What kind of tool can I use to find out what are the objects
> that are filling the additional 1GB heap? I did a heap dump but could get
> jhat to work to browse the dumped file.
>
> Thanks,
>
> -Weijun
>
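For reference, the threshold behind that warning is configurable in
storage-conf.xml. A hedged sketch -- the element name and default below are
recalled from memory and should be checked against the storage-conf.xml shipped
with your 0.6 release:

    <!-- Assumed setting: log a warning whenever compaction finishes a row
         larger than this many megabytes (default roughly 500MB). -->
    <RowWarningThresholdInMB>512</RowWarningThresholdInMB>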

Re: Heap sudden jump during import

Posted by Weijun Li <we...@gmail.com>.
Eclipse Memory Analyser rocks! Thanks a lot!!

-Weijun

On Sat, Apr 3, 2010 at 2:25 AM, Benoit Perroud <be...@noisette.ch> wrote:

> Have a look at either Eclipse Memory Analyser (they have a standalone
> version of the memory analyser) or YourKit Java Profiler (commercial,
> but with evaluation license). I successfully load and browse heap
> bigger than the available memory on the system.
>
> Regards,
>
> Benoit
>
> 2010/4/3 Weijun Li <we...@gmail.com>:
> > Thank you Benoit. I did a search but couldn't find any that you mentioned.
> > Both jhat and netbean load entire map file int memory. Do you know the name
> > of the tools that requires less memory to view map file?
> > Thanks,
> > -Weijun
> >
> > On Sat, Apr 3, 2010 at 12:55 AM, Benoit Perroud <be...@noisette.ch> wrote:
> >>
> >> It exists other tools than jhat to browse a heap dump, which stream
> >> the heap dump instead of loading it full in memory like jhat do.
> >>
> >> Kind regards,
> >>
> >> Benoit.
> >>
> >> 2010/4/3 Weijun Li <we...@gmail.com>:
> >> > I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> >> > the process ran smoothly for about 20mil then the heap usage suddenly jumped
> >> > from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> >> > will freeze for long time (terrible latency, no response to nodetool that I
> >> > have to stop the import client ) before it comes back to normal . It's a
> >> > single node cluster with JVM maximum heap size of 3GB. So what could cause
> >> > this spike? What kind of tool can I use to find out what are the objects
> >> > that are filling the additional 1GB heap? I did a heap dump but could get
> >> > jhat to work to browse the dumped file.
> >> >
> >> > Thanks,
> >> >
> >> > -Weijun
> >> >
> >
> >
>

Re: Heap sudden jump during import

Posted by Benoit Perroud <be...@noisette.ch>.
Have a look at either Eclipse Memory Analyser (there is a standalone
version of the memory analyser) or YourKit Java Profiler (commercial,
but with an evaluation license). I have successfully loaded and browsed
heap dumps bigger than the available memory on the system.

Regards,

Benoit

2010/4/3 Weijun Li <we...@gmail.com>:
> Thank you Benoit. I did a search but couldn't find any that you mentioned.
> Both jhat and netbean load entire map file int memory. Do you know the name
> of the tools that requires less memory to view map file?
> Thanks,
> -Weijun
>
> On Sat, Apr 3, 2010 at 12:55 AM, Benoit Perroud <be...@noisette.ch> wrote:
>>
>> It exists other tools than jhat to browse a heap dump, which stream
>> the heap dump instead of loading it full in memory like jhat do.
>>
>> Kind regards,
>>
>> Benoit.
>>
>> 2010/4/3 Weijun Li <we...@gmail.com>:
>> > I'm running a test to write 30 million columns (700bytes each) to
>> > Cassandra:
>> > the process ran smoothly for about 20mil then the heap usage suddenly
>> > jumped
>> > from 2GB to 3GB which is the up limit of JVM, --from this point
>> > Cassandra
>> > will freeze for long time (terrible latency, no response to nodetool
>> > that I
>> > have to stop the import client ) before it comes back to normal . It's a
>> > single node cluster with JVM maximum heap size of 3GB. So what could
>> > cause
>> > this spike? What kind of tool can I use to find out what are the objects
>> > that are filling the additional 1GB heap? I did a heap dump but could
>> > get
>> > jhat to work to browse the dumped file.
>> >
>> > Thanks,
>> >
>> > -Weijun
>> >
>
>
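If a dump is too big to open comfortably in the GUI, the standalone Memory
Analyzer can also pre-parse it headlessly; a hedged example (the script name and
report id are quoted from memory, so verify against the MAT build you download):

    # Parse the dump and generate the "leak suspects" report without opening the GUI
    ./ParseHeapDump.sh cassandra-heap.hprof org.eclipse.mat.api:suspects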

Re: Heap sudden jump during import

Posted by Weijun Li <we...@gmail.com>.
Thank you Benoit. I did a search but couldn't find any of the tools you mentioned.
Both jhat and NetBeans load the entire dump file into memory. Do you know the
names of the tools that require less memory to view a dump file?

Thanks,
-Weijun

On Sat, Apr 3, 2010 at 12:55 AM, Benoit Perroud <be...@noisette.ch> wrote:

> It exists other tools than jhat to browse a heap dump, which stream
> the heap dump instead of loading it full in memory like jhat do.
>
> Kind regards,
>
> Benoit.
>
> 2010/4/3 Weijun Li <we...@gmail.com>:
> > I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> > the process ran smoothly for about 20mil then the heap usage suddenly jumped
> > from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> > will freeze for long time (terrible latency, no response to nodetool that I
> > have to stop the import client ) before it comes back to normal . It's a
> > single node cluster with JVM maximum heap size of 3GB. So what could cause
> > this spike? What kind of tool can I use to find out what are the objects
> > that are filling the additional 1GB heap? I did a heap dump but could get
> > jhat to work to browse the dumped file.
> >
> > Thanks,
> >
> > -Weijun
> >
>

Re: Heap sudden jump during import

Posted by Benoit Perroud <be...@noisette.ch>.
There are tools other than jhat for browsing a heap dump; they stream
the dump instead of loading it fully into memory the way jhat does.

Kind regards,

Benoit.

2010/4/3 Weijun Li <we...@gmail.com>:
> I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> the process ran smoothly for about 20mil then the heap usage suddenly jumped
> from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> will freeze for long time (terrible latency, no response to nodetool that I
> have to stop the import client ) before it comes back to normal . It's a
> single node cluster with JVM maximum heap size of 3GB. So what could cause
> this spike? What kind of tool can I use to find out what are the objects
> that are filling the additional 1GB heap? I did a heap dump but could get
> jhat to work to browse the dumped file.
>
> Thanks,
>
> -Weijun
>