Posted to kato-dev@incubator.apache.org by Lukasz <fl...@intercel.com.pl> on 2010/01/08 19:11:23 UTC

Processing huge heap dump.

Hello,

At work I have faced a problem where I have to process a 60GB heap dump.
It probably wouldn't be anything scary if I had proper hardware to 
process such a file.
I've noticed that most of the tools for dump processing require:
a) an amount of memory at least equal to the dump size
b) an amount of free disk space at least equal to the dump size (to create indexes)
Unfortunately, I don't have access to a machine where both requirements 
are met at once.

The processing I would like to perform is backtracking the references to 
instances of one class (the one causing the out of memory). Assuming that hard 
disk reads will be my performance bottleneck, I should be able to 
backtrace a few levels during the night.
I only have a rough outline of the algorithm in my head, but it seems that 
something like the "visitor pattern" would be enough for me.
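
Roughly, I imagine something like this (just a sketch; the visitor interface 
and collector below are made up, not part of the Kato API):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical visitor for a sequential scan of heap records; for every
    // object it records which other objects point at the instances being
    // backtracked. Names are illustrative only.
    interface HeapRecordVisitor {
        void visitInstance(long objectId, long classId, long[] referenceIds);
    }

    class BackReferenceCollector implements HeapRecordVisitor {
        private final Set<Long> targets;   // ids of the instances being backtracked
        private final Map<Long, List<Long>> inbound = new HashMap<Long, List<Long>>();

        BackReferenceCollector(Set<Long> targets) {
            this.targets = targets;
        }

        public void visitInstance(long objectId, long classId, long[] referenceIds) {
            for (long ref : referenceIds) {
                if (targets.contains(ref)) {          // this object references a target
                    List<Long> referrers = inbound.get(ref);
                    if (referrers == null) {
                        referrers = new ArrayList<Long>();
                        inbound.put(ref, referrers);
                    }
                    referrers.add(objectId);
                }
            }
        }

        Map<Long, List<Long>> inboundReferences() {
            return inbound;
        }
    }

Running one such pass per level would match the "few levels during the night" plan.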

Since I had read about Kato some time ago, I wanted to give it a try.

For dev purposes I have prepared a ~370MB heap dump with around 10 000 000 
simple objects added to a collection (which probably multiplies the number of 
objects on the heap).
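
A dump like this can be generated with something along these lines (a sketch 
only; the Data class and the jmap step are illustrative assumptions):

    import java.util.ArrayList;
    import java.util.List;

    // Minimal generator for a development heap dump: fill a collection with
    // many small objects, then take an hprof snapshot of the running JVM,
    // e.g. with: jmap -dump:format=b,file=dev.hprof <pid>
    public class DumpGenerator {
        static class Data {
            final int value;
            Data(int value) { this.value = value; }
        }

        public static void main(String[] args) throws Exception {
            List<Data> retained = new ArrayList<Data>();
            for (int i = 0; i < 10000000; i++) {
                retained.add(new Data(i));
            }
            System.out.println("Created " + retained.size() + " objects, take the dump now");
            Thread.sleep(Long.MAX_VALUE);   // keep the heap alive while jmap runs
        }
    }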

1) Default approach:
Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);

I waited a few minutes, but it didn't finish processing, so it looks 
like there will be no chance to process the 60GB dump this way.

2) HProfFile
A quick look at the HProfView class gave me some idea of how I can visit all 
objects (records) on the heap.

I wrote a simple app which only iterates through all records (a sketch of the 
loop follows the metrics below), but it also turned out to be quite slow and 
memory consuming. Here are some metrics:
---------------------------------
MemoryPool: PS Old Gen
Hello World!
heapDump: 
org.apache.kato.hprof.datalayer.HProfFile$HeapDumpHProfRecord@f0eed6
HeapSubRecord: 100000 (946ms, 4199kB)
HeapSubRecord: 200000 (2064ms, 7955kB)
HeapSubRecord: 300000 (3123ms, 11759kB)
HeapSubRecord: 400000 (3933ms, 14811kB)
HeapSubRecord: 500000 (3908ms, 17927kB)
HeapSubRecord: 600000 (7269ms, 21039kB)
HeapSubRecord: 700000 (7736ms, 24139kB)
HeapSubRecord: 800000 (7866ms, 27147kB)
HeapSubRecord: 900000 (7753ms, 30263kB)
HeapSubRecord: 1000000 (7684ms, 33299kB)
HeapSubRecord: 1100000 (13515ms, 36487kB)
HeapSubRecord: 1200000 (15525ms, 39623kB)
HeapSubRecord: 1300000 (15405ms, 42723kB)
HeapSubRecord: 1400000 (15567ms, 39115kB)
HeapSubRecord: 1500000 (15459ms, 42203kB)
HeapSubRecord: 1600000 (15692ms, 43838kB)
HeapSubRecord: 1700000 (15424ms, 45926kB)
HeapSubRecord: 1800000 (15327ms, 49026kB)
HeapSubRecord: 1900000 (15416ms, 48505kB)
HeapSubRecord: 2000000 (15352ms, 51629kB)
-------------------------------
This means that iterating over the first 100 000 records took 946ms and 
consumed 4199kB of OldGen.
Iterating over the next 100 000 records took 2064ms and consumed 7955kB of 
OldGen, and so on; 100 000 records is the interval for printing the stats.
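
The loop itself was roughly of this shape (a sketch only; the record source is 
shown as a plain Iterator because the exact Kato accessors aren't the point here):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.util.Iterator;

    // Sketch of the measurement loop behind the numbers above: every 100 000
    // records it prints the elapsed time for that batch and the current
    // "Old Gen" usage (PS Old Gen in the run above).
    public class RecordStats {
        static void iterate(Iterator<?> records) {
            MemoryPoolMXBean oldGen = null;
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getName().contains("Old Gen")) {
                    oldGen = pool;                       // assumes such a pool exists
                }
            }
            long count = 0;
            long batchStart = System.currentTimeMillis();
            while (records.hasNext()) {
                records.next();                          // visit the record, no other work
                if (++count % 100000 == 0) {
                    long now = System.currentTimeMillis();
                    long usedKb = oldGen.getUsage().getUsed() / 1024;
                    System.out.println("HeapSubRecord: " + count + " ("
                            + (now - batchStart) + "ms, " + usedKb + "kB)");
                    batchStart = now;
                }
            }
        }
    }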

One CPU core was saturated. It also looks like the required amount of 
memory will be equal to the dump size.
I could start 4 threads to make better use of the CPU, but since it 
looks like an HProfFile instance is not thread safe, I would have to create 
4 instances of HProfFile, which means the required amount of memory 
would be something like 4 x dumpSize.

That's all I have done so far. I haven't tracked down what in HProfFile 
consumes the CPU and memory; my blind guess is that 
CachedRandomAccesDataProvider is involved.

Regards
Lukasz


Re: Processing huge heap dump.

Posted by Stuart Monteith <st...@stoo.me.uk>.
Hi,
     I've changed the code base to do as you have described. Having 
longs as parameters, of course, is necessary with large dumps. I 
increased the default values, but I found that the defaults you 
suggested won't run on a relatively small machine.
In the absence of a clever heuristic, I've added a couple of properties 
to adjust the array sizes.

org.apache.kato.hprof.maparraysize
     This is the BitMaskMappingArray size for the whole file. The default 
is 10000.

org.apache.kato.hprof.heap.maparraysize
     This is the heap record's BitMaskMappingArray size. The default 
is 100000.
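
Assuming they are picked up as ordinary Java system properties, they can be set 
on the command line, for example (the analyzer class and dump file name below 
are placeholders):

    java -Dorg.apache.kato.hprof.maparraysize=1000000 \
         -Dorg.apache.kato.hprof.heap.maparraysize=50000000 \
         -Xmx4g MyDumpAnalyzer huge-dump.hprof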


Thanks,
     Stuart

Lukasz wrote:
> Hi Steve,
>
> Actually I've made two changes.
>
> First was increasing the size of array in classes BitMaskMappingArray 
> and ArrayBitMaskMappingStrategy, which is probably the same what you 
> have made.
> I had similar idea about using heuristic/prediction of how many 
> records can be in a dump.
> Other (or additional) solution could be increasing cacheSize instead 
> of bucketSize when cache will be exhausted.
>
> I have filling that main purpose of those classes is providing mapping 
> between "index of record" and "absolute position of the record in the 
> file".
> And if my filling are correct, the mapping comprise of:
> 1) finding absolute location in the file
> 2) skipping a number of records (called "seek") before we set reader's 
> position just on the beginning of required record.
> And the second step is the problem (I believe), because the bigger 
> bucket we have, the more records we have to skip (methods called: 
> "moveRelativeElement" are responsible).
> I think this, and the fact that bucketSize is doubled when cache is 
> exhausted, are the reasons why performance drops over the time.
>
> To fix it I changed following parts of code in 
> org.apache.kato.hprof.datalayer.HProfFile:
>        public HeapDumpHProfRecord(short tag, IDataProvider dataProvider)
>                throws IOException {
>            ...
>            subRecordArray = new BitMaskMappingArray(50000000, 25,
>                    new HProfDumpRecordArrayProvider(dataProvider));
>        }
> and
>    private void buildHeader() throws IOException {
>        ...
>        array = new BitMaskMappingArray(1000000, 25, new 
> HPROFRecordProvider(
>                subset));
>
>    }
>
>
> Second problem was incorrect size of HeapDumpHProfRecord which was 
> read from the file (recordSize overflow problem).
> I didn't find general solution for this problem. I just assumed that 
> HeapDumpHProfRecord will end with the end of the file, which is true 
> in my case.
>
> To achieve this I made following changes (also in HProfFile):
>                case 0x0c:
>                    long l = recordProvider.dataRemaining();
>                    IHeapDumpHProfRecord rec = new 
> HeapDumpHProfRecord(tag,
>                            new SubsetDataProvider(recordProvider, l));
>                    recordProvider.moveBy(l);
>                    return rec;
>
> I also had to change signature of "moveBy" method to accept long 
> instead of int.
>
> I'll try to send you part of my source code together with .svn 
> metadata, so you should be able to find what exactly I've changed.
>
> Regards
> Lukasz
>
>
>
> Steve Poole wrote:
>> I've run some performance tests on various configs for HProfFile 
>> caching.
>> See here
>> http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html 
>>
>>
>> Our current setting for cache size and bucket size is 1000,25  - 
>> Bucket size
>> does effect the speed of analysis but its cache size that really does 
>> it!
>> We need to configure the cache size based on some heuristic about the 
>> size
>> of the dump file.
>>
>> On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <fl...@intercel.com.pl> wrote:
>>
>>> Hi Steve, Stuart,
>>>
>>> In last two days I was trying to move further.
>>>
>>> I tried to prepare some workaround for incorrect length.
>>> First idea was to utilize the fact that integer overflow probably took
>>> place, so I tried to search for 0x2c just at overflow points 
>>> (probably quite
>>> brittle approach). But it turned out, that in my heap dumps (both 
>>> real one
>>> and prepared for development purposes) there is no 0x2c byte 
>>> after/at the
>>> end of the dump.
>>> I've noticed however that for both dumps the heap record ends at the 
>>> end of
>>> the file. Probably it is not general rule, but for my purposes it 
>>> should be
>>> enough, so I changed kato sources slightly to utilize this and 
>>> decided to
>>> move further.
>>>
>>> So far I have some code which extracts static data from dump (strings,
>>> classes) and searches for instances of given class.
>>> I have run it against real dump. It took around 40 min (to get 
>>> static data)
>>> and 50 min (to find instances), which is not bad I think.
>>> I didn't made careful reasoning about performance but CPU (not I/O) 
>>> looked
>>> to be a bottleneck. Performance also looked to be quite stable 
>>> across whole
>>> dump.
>>>
>>> 60GB dump was extracted for core file using jmap (I don't remember 
>>> exactly
>>> from which version of java this jmap came from, but for sure it was 
>>> Java
>>> 1.6).
>>> Following JVM version was dumped to core file:
>>> java version "1.6.0_10"
>>> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
>>> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
>>>
>>> I was also suspecting that heap dump segments were introduced to handle
>>> +4GB heap sizes, but it looks like sometimes it can not be a case.
>>>
>>> Anyway, it is nice that I was able to go through whole dump without any
>>> error.
>>>
>>> Regards
>>> Lukasz
>>>
>>>
>>> Steve Poole wrote:
>>>
>>>>  Hi Lukasz,
>>>>
>>>>  You say your  hprof format is 1.0.1   so , according to the 
>>>> information I
>>>> have about hprof format,  that means your won't have any heap dump
>>>> segments
>>>> in your file.  Apparently they are in the  1.0.2 format which is a 
>>>> Java 6
>>>> addition and is only used when the heap is too big for the old 
>>>> format to
>>>> cope with.
>>>>
>>>> What JVM and OS are you using to produce this dump?
>>>>
>>>> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> You are quite right, 4 bytes for the record length. If we had the 
>>>>> file
>>>>> format wrong for 64 bit
>>>>> hprof dumps we'd not be able to read any dump, so it must be an 
>>>>> overflow
>>>>> of
>>>>> the record length.
>>>>> In that case, the only possible solution I can think of would be 
>>>>> for us
>>>>> to
>>>>> change to code to search
>>>>> for the HEAP DUMP END record (tagged 0x2c) to detect the end of 
>>>>> the HEAP
>>>>> DUMP.
>>>>>
>>>>> Of course, there was some puzzling over the difference between 
>>>>> HEAP DUMP
>>>>> SEGMENT (0x1c) and HEAP DUMP
>>>>> (0x0c) before.
>>>>>
>>>>> I'd expect we'd get:
>>>>>
>>>>>   HEAP DUMP
>>>>>       heap dump records....
>>>>>   HEAP DUMP END
>>>>>
>>>>> or
>>>>>
>>>>>   HEAP DUMP SEGMENT
>>>>>       heap dump records
>>>>>   HEAP DUMP SEGMENT
>>>>>       heap dump records
>>>>>      :      :
>>>>>   HEAP DUMP END
>>>>>
>>>>>
>>>>> If we ignore the record size for the HEAP DUMP and calculate it
>>>>> ourselves,
>>>>> we might have more luck.
>>>>>
>>>>> Regards,
>>>>>   Stuart
>>>>>
>>>>>
>>>>>
>>>>> Lukasz wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Since performance looks not bad after changing array size, today 
>>>>>> I have
>>>>>> tried to process my 60GB heap dump. Unfortunately, very fast I've 
>>>>>> ended
>>>>>> with
>>>>>> exception:
>>>>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>>>>> -1293366370 is less than 0
>>>>>> at
>>>>>>
>>>>>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29) 
>>>>>>
>>>>>>
>>>>>> at
>>>>>>
>>>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487) 
>>>>>>
>>>>>>
>>>>>> at
>>>>>>
>>>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437) 
>>>>>>
>>>>>>
>>>>>> at
>>>>>>
>>>>>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56) 
>>>>>>
>>>>>> at
>>>>>> org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784) 
>>>>>>
>>>>>> ... (my code) ...
>>>>>>
>>>>>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>>>>>
>>>>>> Following is the part of the dump which I believes is responsible 
>>>>>> for
>>>>>> that
>>>>>> (B2 E8 CB 9E):
>>>>>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ 
>>>>>> B7 AE
>>>>>> 20
>>>>>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>>>>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ 
>>>>>> EA F8
>>>>>> 0C
>>>>>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>>>>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ 
>>>>>> BE 21
>>>>>> 3D
>>>>>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>>>>
>>>>>> What is interesting hprof file format reserves only 4 bytes for 
>>>>>> length
>>>>>> of
>>>>>> a record, which allows to express record whose size is at most 4GB
>>>>>> (assuming
>>>>>> unsigned 4 bytes).
>>>>>>
>>>>>> My dump's version is: JAVA PROFILE 1.0.1
>>>>>>
>>>>>> Regards
>>>>>> Lukasz
>>>>>>
>>>>>>
>>>>>> Lukasz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi Stuart, Steve,
>>>>>>>
>>>>>>> I've taken deeper look into code. I still didn't trace carefully 
>>>>>>> index
>>>>>>> calculation in classes BitMaskMappingArray and
>>>>>>> ArrayBitMaskMappingStrategy,
>>>>>>> but I managed to improve performance by increasing arrays size 
>>>>>>> in those
>>>>>>> classes (which is set in HProfFile class).
>>>>>>>
>>>>>>> If I understand code correctly, when capacity of 
>>>>>>> BitMaskMappingArray
>>>>>>> will
>>>>>>> be exhausted bucketSize is doubled, which in turn causes that more
>>>>>>> reads
>>>>>>> (even cached) is required to set position of
>>>>>>> IDataProvider/IArrayEntryProvider.
>>>>>>>
>>>>>>> Following are loading time results for default array size (1000) 
>>>>>>> and
>>>>>>> increased (1000000). Test ran against generated dump file (5000000
>>>>>>> instances
>>>>>>> of Data).
>>>>>>> Default (1000):
>>>>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably 
>>>>>>> buckedSize was
>>>>>>> doubled)
>>>>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>>>>> ... (I didn't wait until end)
>>>>>>>
>>>>>>> Increased(1000000):
>>>>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>>>>> ...
>>>>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>>>>> (end)
>>>>>>>
>>>>>>> i.e. my 60GB dump file contains more than 1 100 000 000 of 
>>>>>>> objects (if
>>>>>>> I
>>>>>>> remember correctly).
>>>>>>>
>>>>>>> Regards
>>>>>>> Lukasz
>>>>>>>
>>>>>>>
>>>>>>> Stuart Monteith wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The hprof dump reader spends a lot of time reading the whole 
>>>>>>>> file, for
>>>>>>>> various reason.
>>>>>>>> The indices it has in memory are constructed through an initial 
>>>>>>>> read,
>>>>>>>> and this is also
>>>>>>>> the source of the memory usage. In addition, there is some 
>>>>>>>> correlation
>>>>>>>> to be done which
>>>>>>>> also takes up time, and induces yet more reading.
>>>>>>>>
>>>>>>>> I'm sure some work could be done to improve the performance 
>>>>>>>> further,
>>>>>>>> but
>>>>>>>> we'll have to
>>>>>>>> look at the tradeoff between diskspace and memory usage. The hprof
>>>>>>>> file
>>>>>>>> format itself
>>>>>>>> is what it is, however, and we have no influence over that. The 
>>>>>>>> CJVMTI
>>>>>>>> agent is has lots of
>>>>>>>> room for improvement, but I suspect its potential for 
>>>>>>>> improvement is
>>>>>>>> unlikely to be much better
>>>>>>>> than existing hprof implementations. The built-in JVM hprof dumper
>>>>>>>> will
>>>>>>>> probably be a hard act
>>>>>>>> to follow.
>>>>>>>>
>>>>>>>> The HProf implementation is not thread-safe. Realistically, I 
>>>>>>>> think it
>>>>>>>> is something that ought to
>>>>>>>> be considered once things are more mature. There will be 
>>>>>>>> algorithms
>>>>>>>> that
>>>>>>>> can deal with the JVM
>>>>>>>> structure sensible.
>>>>>>>>
>>>>>>>> And thanks Lukasz, it's great to have your input.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Stuart
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> -- 
>>>>> Stuart Monteith
>>>>> http://blog.stoo.me.uk/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>

-- 
Stuart Monteith
http://blog.stoo.me.uk/


Re: Processing huge heap dump.

Posted by Lukasz <fl...@intercel.com.pl>.
Hi Steve,

Actually I've made two changes.

The first was increasing the size of the array in the classes BitMaskMappingArray 
and ArrayBitMaskMappingStrategy, which is probably the same change you 
have made.
I had a similar idea about using a heuristic/prediction of how many records 
there can be in a dump.
Another (or additional) solution could be increasing cacheSize instead of 
bucketSize when the cache is exhausted.

I have a feeling that the main purpose of those classes is to provide a mapping 
between "index of record" and "absolute position of the record in the 
file".
And if my feeling is correct, the mapping consists of:
1) finding an absolute location in the file
2) skipping a number of records (a "seek") before the reader's 
position is set right at the beginning of the required record.
The second step is the problem (I believe), because the bigger the 
bucket, the more records we have to skip (the methods called 
"moveRelativeElement" are responsible).
I think this, and the fact that bucketSize is doubled when the cache is 
exhausted, are the reasons why performance drops over time.
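
To make the cost concrete, a simplified model of such a bucketed mapping (not 
the actual BitMaskMappingArray code; RecordReader is a made-up stand-in for the 
data provider) would be:

    class BucketedIndexModel {
        // RecordReader is a hypothetical stand-in for the data provider, not a Kato type.
        interface RecordReader {
            void seek(long offset) throws java.io.IOException;
            void skipOneRecord() throws java.io.IOException;  // read a record header, jump over its body
            long position();
        }

        // Only every bucketSize-th record offset is cached, so looking up record
        // 'index' costs (index % bucketSize) sequential skips inside the bucket.
        static long offsetOf(int index, int bucketSize, long[] cachedBucketOffsets,
                             RecordReader reader) throws java.io.IOException {
            reader.seek(cachedBucketOffsets[index / bucketSize]); // step 1: cached absolute location
            for (int i = 0; i < index % bucketSize; i++) {        // step 2: skip records inside the bucket
                reader.skipOneRecord();                           // this is the part that grows with bucketSize
            }
            return reader.position();
        }
    }

Doubling bucketSize doubles the average number of skips per lookup, which would 
match the step changes visible in the timings.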

To fix it I changed following parts of code in 
org.apache.kato.hprof.datalayer.HProfFile:
        public HeapDumpHProfRecord(short tag, IDataProvider dataProvider)
                throws IOException {
            ...
            subRecordArray = new BitMaskMappingArray(50000000, 25,
                    new HProfDumpRecordArrayProvider(dataProvider));
        }
and
    private void buildHeader() throws IOException {
        ...
        array = new BitMaskMappingArray(1000000, 25,
                new HPROFRecordProvider(subset));

    }


The second problem was the incorrect size of the HeapDumpHProfRecord that was 
read from the file (the recordSize overflow problem).
I didn't find a general solution for this problem.
I just assumed that the HeapDumpHProfRecord ends at the end of the 
file, which is true in my case.

To achieve this I made the following changes (also in HProfFile):
                case 0x0c:
                    long l = recordProvider.dataRemaining();
                    IHeapDumpHProfRecord rec = new HeapDumpHProfRecord(tag,
                            new SubsetDataProvider(recordProvider, l));
                    recordProvider.moveBy(l);
                    return rec;

I also had to change the signature of the "moveBy" method to accept long 
instead of int.

I'll try to send you part of my source code together with the .svn metadata, 
so you should be able to see exactly what I've changed.

Regards
Lukasz



Steve Poole wrote:
> I've run some performance tests on various configs for HProfFile caching.
> See here
> http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html
>
> Our current setting for cache size and bucket size is 1000,25  - Bucket size
> does effect the speed of analysis but its cache size that really does it!
> We need to configure the cache size based on some heuristic about the size
> of the dump file.
>
> On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <fl...@intercel.com.pl> wrote:
>
>   
>> Hi Steve, Stuart,
>>
>> In last two days I was trying to move further.
>>
>> I tried to prepare some workaround for incorrect length.
>> First idea was to utilize the fact that integer overflow probably took
>> place, so I tried to search for 0x2c just at overflow points (probably quite
>> brittle approach). But it turned out, that in my heap dumps (both real one
>> and prepared for development purposes) there is no 0x2c byte after/at the
>> end of the dump.
>> I've noticed however that for both dumps the heap record ends at the end of
>> the file. Probably it is not general rule, but for my purposes it should be
>> enough, so I changed kato sources slightly to utilize this and decided to
>> move further.
>>
>> So far I have some code which extracts static data from dump (strings,
>> classes) and searches for instances of given class.
>> I have run it against real dump. It took around 40 min (to get static data)
>> and 50 min (to find instances), which is not bad I think.
>> I didn't made careful reasoning about performance but CPU (not I/O) looked
>> to be a bottleneck. Performance also looked to be quite stable across whole
>> dump.
>>
>> 60GB dump was extracted for core file using jmap (I don't remember exactly
>> from which version of java this jmap came from, but for sure it was Java
>> 1.6).
>> Following JVM version was dumped to core file:
>> java version "1.6.0_10"
>> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
>> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
>>
>> I was also suspecting that heap dump segments were introduced to handle
>> +4GB heap sizes, but it looks like sometimes it can not be a case.
>>
>> Anyway, it is nice that I was able to go through whole dump without any
>> error.
>>
>> Regards
>> Lukasz
>>
>>
>> Steve Poole wrote:
>>
>>     
>>>  Hi Lukasz,
>>>
>>>  You say your  hprof format is 1.0.1   so , according to the information I
>>> have about hprof format,  that means your won't have any heap dump
>>> segments
>>> in your file.  Apparently they are in the  1.0.2 format which is a Java 6
>>> addition and is only used when the heap is too big for the old format to
>>> cope with.
>>>
>>> What JVM and OS are you using to produce this dump?
>>>
>>> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk>
>>> wrote:
>>>
>>>
>>>
>>>       
>>>> Hi,
>>>>
>>>>
>>>> You are quite right, 4 bytes for the record length. If we had the file
>>>> format wrong for 64 bit
>>>> hprof dumps we'd not be able to read any dump, so it must be an overflow
>>>> of
>>>> the record length.
>>>> In that case, the only possible solution I can think of would be for us
>>>> to
>>>> change to code to search
>>>> for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP
>>>> DUMP.
>>>>
>>>> Of course, there was some puzzling over the difference between HEAP DUMP
>>>> SEGMENT (0x1c) and HEAP DUMP
>>>> (0x0c) before.
>>>>
>>>> I'd expect we'd get:
>>>>
>>>>   HEAP DUMP
>>>>       heap dump records....
>>>>   HEAP DUMP END
>>>>
>>>> or
>>>>
>>>>   HEAP DUMP SEGMENT
>>>>       heap dump records
>>>>   HEAP DUMP SEGMENT
>>>>       heap dump records
>>>>      :      :
>>>>   HEAP DUMP END
>>>>
>>>>
>>>> If we ignore the record size for the HEAP DUMP and calculate it
>>>> ourselves,
>>>> we might have more luck.
>>>>
>>>> Regards,
>>>>   Stuart
>>>>
>>>>
>>>>
>>>> Lukasz wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> Hello,
>>>>>
>>>>> Since performance looks not bad after changing array size, today I have
>>>>> tried to process my 60GB heap dump. Unfortunately, very fast I've ended
>>>>> with
>>>>> exception:
>>>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>>>> -1293366370 is less than 0
>>>>> at
>>>>>
>>>>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>>>>
>>>>> at
>>>>>
>>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>>>>
>>>>> at
>>>>>
>>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>>>>
>>>>> at
>>>>>
>>>>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>>>>> at
>>>>> org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>>>>> ... (my code) ...
>>>>>
>>>>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>>>>
>>>>> Following is the part of the dump which I believes is responsible for
>>>>> that
>>>>> (B2 E8 CB 9E):
>>>>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE
>>>>> 20
>>>>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>>>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8
>>>>> 0C
>>>>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>>>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21
>>>>> 3D
>>>>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>>>
>>>>> What is interesting hprof file format reserves only 4 bytes for length
>>>>> of
>>>>> a record, which allows to express record whose size is at most 4GB
>>>>> (assuming
>>>>> unsigned 4 bytes).
>>>>>
>>>>> My dump's version is: JAVA PROFILE 1.0.1
>>>>>
>>>>> Regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>> Lukasz wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Hi Stuart, Steve,
>>>>>>
>>>>>> I've taken deeper look into code. I still didn't trace carefully index
>>>>>> calculation in classes BitMaskMappingArray and
>>>>>> ArrayBitMaskMappingStrategy,
>>>>>> but I managed to improve performance by increasing arrays size in those
>>>>>> classes (which is set in HProfFile class).
>>>>>>
>>>>>> If I understand code correctly, when capacity of BitMaskMappingArray
>>>>>> will
>>>>>> be exhausted bucketSize is doubled, which in turn causes that more
>>>>>> reads
>>>>>> (even cached) is required to set position of
>>>>>> IDataProvider/IArrayEntryProvider.
>>>>>>
>>>>>> Following are loading time results for default array size (1000) and
>>>>>> increased (1000000). Test ran against generated dump file (5000000
>>>>>> instances
>>>>>> of Data).
>>>>>> Default (1000):
>>>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was
>>>>>> doubled)
>>>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>>>> ... (I didn't wait until end)
>>>>>>
>>>>>> Increased(1000000):
>>>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>>>> ...
>>>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>>>> (end)
>>>>>>
>>>>>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if
>>>>>> I
>>>>>> remember correctly).
>>>>>>
>>>>>> Regards
>>>>>> Lukasz
>>>>>>
>>>>>>
>>>>>> Stuart Monteith wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> The hprof dump reader spends a lot of time reading the whole file, for
>>>>>>> various reason.
>>>>>>> The indices it has in memory are constructed through an initial read,
>>>>>>> and this is also
>>>>>>> the source of the memory usage. In addition, there is some correlation
>>>>>>> to be done which
>>>>>>> also takes up time, and induces yet more reading.
>>>>>>>
>>>>>>> I'm sure some work could be done to improve the performance further,
>>>>>>> but
>>>>>>> we'll have to
>>>>>>> look at the tradeoff between diskspace and memory usage. The hprof
>>>>>>> file
>>>>>>> format itself
>>>>>>> is what it is, however, and we have no influence over that. The CJVMTI
>>>>>>> agent is has lots of
>>>>>>> room for improvement, but I suspect its potential for improvement is
>>>>>>> unlikely to be much better
>>>>>>> than existing hprof implementations. The built-in JVM hprof dumper
>>>>>>> will
>>>>>>> probably be a hard act
>>>>>>> to follow.
>>>>>>>
>>>>>>> The HProf implementation is not thread-safe. Realistically, I think it
>>>>>>> is something that ought to
>>>>>>> be considered once things are more mature. There will be algorithms
>>>>>>> that
>>>>>>> can deal with the JVM
>>>>>>> structure sensible.
>>>>>>>
>>>>>>> And thanks Lukasz, it's great to have your input.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Stuart
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>             
>>>>> --
>>>>>           
>>>> Stuart Monteith
>>>> http://blog.stoo.me.uk/
>>>>
>>>>
>>>>
>>>>
>>>>         
>>>
>>>
>>>       
>>     
>
>
>   


Re: Processing huge heap dump.

Posted by Steve Poole <sp...@googlemail.com>.
I've run some performance tests on various configs for HProfFile caching.
See here
http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html

Our current settings for cache size and bucket size are 1000 and 25. Bucket size
does affect the speed of analysis, but it's the cache size that really does it!
We need to configure the cache size based on some heuristic about the size
of the dump file.
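
One possible (untested) heuristic would be to derive the cache size from the
file length, assuming some average record size, roughly:

    // Possible heuristic (untested): size the cache from the dump file length,
    // assuming an average record size, and clamp it to sane bounds.
    static int cacheSizeFor(java.io.File dump) {
        long assumedAvgRecordBytes = 50;                 // assumption, would need measuring
        long estimatedRecords = dump.length() / assumedAvgRecordBytes;
        long cacheSize = estimatedRecords / 25;          // keep buckets around the current size of 25
        return (int) Math.max(1000, Math.min(cacheSize, 50000000));
    }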

On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <fl...@intercel.com.pl> wrote:

> Hi Steve, Stuart,
>
> In last two days I was trying to move further.
>
> I tried to prepare some workaround for incorrect length.
> First idea was to utilize the fact that integer overflow probably took
> place, so I tried to search for 0x2c just at overflow points (probably quite
> brittle approach). But it turned out, that in my heap dumps (both real one
> and prepared for development purposes) there is no 0x2c byte after/at the
> end of the dump.
> I've noticed however that for both dumps the heap record ends at the end of
> the file. Probably it is not general rule, but for my purposes it should be
> enough, so I changed kato sources slightly to utilize this and decided to
> move further.
>
> So far I have some code which extracts static data from dump (strings,
> classes) and searches for instances of given class.
> I have run it against real dump. It took around 40 min (to get static data)
> and 50 min (to find instances), which is not bad I think.
> I didn't made careful reasoning about performance but CPU (not I/O) looked
> to be a bottleneck. Performance also looked to be quite stable across whole
> dump.
>
> 60GB dump was extracted for core file using jmap (I don't remember exactly
> from which version of java this jmap came from, but for sure it was Java
> 1.6).
> Following JVM version was dumped to core file:
> java version "1.6.0_10"
> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
>
> I was also suspecting that heap dump segments were introduced to handle
> +4GB heap sizes, but it looks like sometimes it can not be a case.
>
> Anyway, it is nice that I was able to go through whole dump without any
> error.
>
> Regards
> Lukasz
>
>
> Steve Poole wrote:
>
>>  Hi Lukasz,
>>
>>  You say your  hprof format is 1.0.1   so , according to the information I
>> have about hprof format,  that means your won't have any heap dump
>> segments
>> in your file.  Apparently they are in the  1.0.2 format which is a Java 6
>> addition and is only used when the heap is too big for the old format to
>> cope with.
>>
>> What JVM and OS are you using to produce this dump?
>>
>> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk>
>> wrote:
>>
>>
>>
>>> Hi,
>>>
>>>
>>> You are quite right, 4 bytes for the record length. If we had the file
>>> format wrong for 64 bit
>>> hprof dumps we'd not be able to read any dump, so it must be an overflow
>>> of
>>> the record length.
>>> In that case, the only possible solution I can think of would be for us
>>> to
>>> change to code to search
>>> for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP
>>> DUMP.
>>>
>>> Of course, there was some puzzling over the difference between HEAP DUMP
>>> SEGMENT (0x1c) and HEAP DUMP
>>> (0x0c) before.
>>>
>>> I'd expect we'd get:
>>>
>>>   HEAP DUMP
>>>       heap dump records....
>>>   HEAP DUMP END
>>>
>>> or
>>>
>>>   HEAP DUMP SEGMENT
>>>       heap dump records
>>>   HEAP DUMP SEGMENT
>>>       heap dump records
>>>      :      :
>>>   HEAP DUMP END
>>>
>>>
>>> If we ignore the record size for the HEAP DUMP and calculate it
>>> ourselves,
>>> we might have more luck.
>>>
>>> Regards,
>>>   Stuart
>>>
>>>
>>>
>>> Lukasz wrote:
>>>
>>>
>>>
>>>> Hello,
>>>>
>>>> Since performance looks not bad after changing array size, today I have
>>>> tried to process my 60GB heap dump. Unfortunately, very fast I've ended
>>>> with
>>>> exception:
>>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>>> -1293366370 is less than 0
>>>> at
>>>>
>>>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>>>> at
>>>> org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>>>> ... (my code) ...
>>>>
>>>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>>>
>>>> Following is the part of the dump which I believes is responsible for
>>>> that
>>>> (B2 E8 CB 9E):
>>>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE
>>>> 20
>>>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8
>>>> 0C
>>>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21
>>>> 3D
>>>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>>
>>>> What is interesting hprof file format reserves only 4 bytes for length
>>>> of
>>>> a record, which allows to express record whose size is at most 4GB
>>>> (assuming
>>>> unsigned 4 bytes).
>>>>
>>>> My dump's version is: JAVA PROFILE 1.0.1
>>>>
>>>> Regards
>>>> Lukasz
>>>>
>>>>
>>>> Lukasz wrote:
>>>>
>>>>
>>>>
>>>>> Hi Stuart, Steve,
>>>>>
>>>>> I've taken deeper look into code. I still didn't trace carefully index
>>>>> calculation in classes BitMaskMappingArray and
>>>>> ArrayBitMaskMappingStrategy,
>>>>> but I managed to improve performance by increasing arrays size in those
>>>>> classes (which is set in HProfFile class).
>>>>>
>>>>> If I understand code correctly, when capacity of BitMaskMappingArray
>>>>> will
>>>>> be exhausted bucketSize is doubled, which in turn causes that more
>>>>> reads
>>>>> (even cached) is required to set position of
>>>>> IDataProvider/IArrayEntryProvider.
>>>>>
>>>>> Following are loading time results for default array size (1000) and
>>>>> increased (1000000). Test ran against generated dump file (5000000
>>>>> instances
>>>>> of Data).
>>>>> Default (1000):
>>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was
>>>>> doubled)
>>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>>> ... (I didn't wait until end)
>>>>>
>>>>> Increased(1000000):
>>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>>> ...
>>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>>> (end)
>>>>>
>>>>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if
>>>>> I
>>>>> remember correctly).
>>>>>
>>>>> Regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>> Stuart Monteith wrote:
>>>>>
>>>>>
>>>>>
>>>>>> The hprof dump reader spends a lot of time reading the whole file, for
>>>>>> various reason.
>>>>>> The indices it has in memory are constructed through an initial read,
>>>>>> and this is also
>>>>>> the source of the memory usage. In addition, there is some correlation
>>>>>> to be done which
>>>>>> also takes up time, and induces yet more reading.
>>>>>>
>>>>>> I'm sure some work could be done to improve the performance further,
>>>>>> but
>>>>>> we'll have to
>>>>>> look at the tradeoff between diskspace and memory usage. The hprof
>>>>>> file
>>>>>> format itself
>>>>>> is what it is, however, and we have no influence over that. The CJVMTI
>>>>>> agent is has lots of
>>>>>> room for improvement, but I suspect its potential for improvement is
>>>>>> unlikely to be much better
>>>>>> than existing hprof implementations. The built-in JVM hprof dumper
>>>>>> will
>>>>>> probably be a hard act
>>>>>> to follow.
>>>>>>
>>>>>> The HProf implementation is not thread-safe. Realistically, I think it
>>>>>> is something that ought to
>>>>>> be considered once things are more mature. There will be algorithms
>>>>>> that
>>>>>> can deal with the JVM
>>>>>> structure sensible.
>>>>>>
>>>>>> And thanks Lukasz, it's great to have your input.
>>>>>>
>>>>>> Regards,
>>>>>> Stuart
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> --
>>> Stuart Monteith
>>> http://blog.stoo.me.uk/
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


-- 
Steve

Re: Processing huge heap dump.

Posted by Steve Poole <sp...@googlemail.com>.
Hi Lukasz - good news that you can get through the whole dump now :-)  I'd
like to get a list of the changes you made to the reader, if that's
possible.



On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <fl...@intercel.com.pl> wrote:

> Hi Steve, Stuart,
>
> In last two days I was trying to move further.
>
> I tried to prepare some workaround for incorrect length.
> First idea was to utilize the fact that integer overflow probably took
> place, so I tried to search for 0x2c just at overflow points (probably quite
> brittle approach). But it turned out, that in my heap dumps (both real one
> and prepared for development purposes) there is no 0x2c byte after/at the
> end of the dump.
> I've noticed however that for both dumps the heap record ends at the end of
> the file. Probably it is not general rule, but for my purposes it should be
> enough, so I changed kato sources slightly to utilize this and decided to
> move further.
>
> So far I have some code which extracts static data from dump (strings,
> classes) and searches for instances of given class.
> I have run it against real dump. It took around 40 min (to get static data)
> and 50 min (to find instances), which is not bad I think.
> I didn't made careful reasoning about performance but CPU (not I/O) looked
> to be a bottleneck. Performance also looked to be quite stable across whole
> dump.
>
> 60GB dump was extracted for core file using jmap (I don't remember exactly
> from which version of java this jmap came from, but for sure it was Java
> 1.6).
> Following JVM version was dumped to core file:
> java version "1.6.0_10"
> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
>
> I was also suspecting that heap dump segments were introduced to handle
> +4GB heap sizes, but it looks like sometimes it can not be a case.
>
> Anyway, it is nice that I was able to go through whole dump without any
> error.
>
> Regards
> Lukasz
>
>
> Steve Poole wrote:
>
>>  Hi Lukasz,
>>
>>  You say your  hprof format is 1.0.1   so , according to the information I
>> have about hprof format,  that means your won't have any heap dump
>> segments
>> in your file.  Apparently they are in the  1.0.2 format which is a Java 6
>> addition and is only used when the heap is too big for the old format to
>> cope with.
>>
>> What JVM and OS are you using to produce this dump?
>>
>> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk>
>> wrote:
>>
>>
>>
>>> Hi,
>>>
>>>
>>> You are quite right, 4 bytes for the record length. If we had the file
>>> format wrong for 64 bit
>>> hprof dumps we'd not be able to read any dump, so it must be an overflow
>>> of
>>> the record length.
>>> In that case, the only possible solution I can think of would be for us
>>> to
>>> change to code to search
>>> for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP
>>> DUMP.
>>>
>>> Of course, there was some puzzling over the difference between HEAP DUMP
>>> SEGMENT (0x1c) and HEAP DUMP
>>> (0x0c) before.
>>>
>>> I'd expect we'd get:
>>>
>>>   HEAP DUMP
>>>       heap dump records....
>>>   HEAP DUMP END
>>>
>>> or
>>>
>>>   HEAP DUMP SEGMENT
>>>       heap dump records
>>>   HEAP DUMP SEGMENT
>>>       heap dump records
>>>      :      :
>>>   HEAP DUMP END
>>>
>>>
>>> If we ignore the record size for the HEAP DUMP and calculate it
>>> ourselves,
>>> we might have more luck.
>>>
>>> Regards,
>>>   Stuart
>>>
>>>
>>>
>>> Lukasz wrote:
>>>
>>>
>>>
>>>> Hello,
>>>>
>>>> Since performance looks not bad after changing array size, today I have
>>>> tried to process my 60GB heap dump. Unfortunately, very fast I've ended
>>>> with
>>>> exception:
>>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>>> -1293366370 is less than 0
>>>> at
>>>>
>>>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>>>
>>>> at
>>>>
>>>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>>>> at
>>>> org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>>>> ... (my code) ...
>>>>
>>>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>>>
>>>> Following is the part of the dump which I believes is responsible for
>>>> that
>>>> (B2 E8 CB 9E):
>>>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE
>>>> 20
>>>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8
>>>> 0C
>>>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21
>>>> 3D
>>>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>>
>>>> What is interesting hprof file format reserves only 4 bytes for length
>>>> of
>>>> a record, which allows to express record whose size is at most 4GB
>>>> (assuming
>>>> unsigned 4 bytes).
>>>>
>>>> My dump's version is: JAVA PROFILE 1.0.1
>>>>
>>>> Regards
>>>> Lukasz
>>>>
>>>>
>>>> Lukasz wrote:
>>>>
>>>>
>>>>
>>>>> Hi Stuart, Steve,
>>>>>
>>>>> I've taken deeper look into code. I still didn't trace carefully index
>>>>> calculation in classes BitMaskMappingArray and
>>>>> ArrayBitMaskMappingStrategy,
>>>>> but I managed to improve performance by increasing arrays size in those
>>>>> classes (which is set in HProfFile class).
>>>>>
>>>>> If I understand code correctly, when capacity of BitMaskMappingArray
>>>>> will
>>>>> be exhausted bucketSize is doubled, which in turn causes that more
>>>>> reads
>>>>> (even cached) is required to set position of
>>>>> IDataProvider/IArrayEntryProvider.
>>>>>
>>>>> Following are loading time results for default array size (1000) and
>>>>> increased (1000000). Test ran against generated dump file (5000000
>>>>> instances
>>>>> of Data).
>>>>> Default (1000):
>>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was
>>>>> doubled)
>>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>>> ... (I didn't wait until end)
>>>>>
>>>>> Increased(1000000):
>>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>>> ...
>>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>>> (end)
>>>>>
>>>>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if
>>>>> I
>>>>> remember correctly).
>>>>>
>>>>> Regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>> Stuart Monteith wrote:
>>>>>
>>>>>
>>>>>
>>>>>> The hprof dump reader spends a lot of time reading the whole file, for
>>>>>> various reason.
>>>>>> The indices it has in memory are constructed through an initial read,
>>>>>> and this is also
>>>>>> the source of the memory usage. In addition, there is some correlation
>>>>>> to be done which
>>>>>> also takes up time, and induces yet more reading.
>>>>>>
>>>>>> I'm sure some work could be done to improve the performance further,
>>>>>> but
>>>>>> we'll have to
>>>>>> look at the tradeoff between diskspace and memory usage. The hprof
>>>>>> file
>>>>>> format itself
>>>>>> is what it is, however, and we have no influence over that. The CJVMTI
>>>>>> agent is has lots of
>>>>>> room for improvement, but I suspect its potential for improvement is
>>>>>> unlikely to be much better
>>>>>> than existing hprof implementations. The built-in JVM hprof dumper
>>>>>> will
>>>>>> probably be a hard act
>>>>>> to follow.
>>>>>>
>>>>>> The HProf implementation is not thread-safe. Realistically, I think it
>>>>>> is something that ought to
>>>>>> be considered once things are more mature. There will be algorithms
>>>>>> that
>>>>>> can deal with the JVM
>>>>>> structure sensible.
>>>>>>
>>>>>> And thanks Lukasz, it's great to have your input.
>>>>>>
>>>>>> Regards,
>>>>>> Stuart
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> --
>>> Stuart Monteith
>>> http://blog.stoo.me.uk/
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


-- 
Steve

Re: Processing huge heap dump.

Posted by Lukasz <fl...@intercel.com.pl>.
Hi Steve, Stuart,

In last two days I was trying to move further.

I tried to prepare a workaround for the incorrect length.
The first idea was to use the fact that an integer overflow probably took 
place, so I tried to search for 0x2c just at the overflow points (probably 
quite a brittle approach). But it turned out that in my heap dumps (both 
the real one and the one prepared for development purposes) there is no 0x2c 
byte after/at the end of the dump.
I've noticed, however, that for both dumps the heap record ends at the end 
of the file. It is probably not a general rule, but for my purposes it 
should be enough, so I changed the Kato sources slightly to make use of this 
and decided to move further.

So far I have some code which extracts static data from the dump (strings, 
classes) and searches for instances of a given class.
I have run it against the real dump. It took around 40 min (to get the static 
data) and 50 min (to find the instances), which is not bad I think.
I haven't done careful reasoning about the performance, but CPU (not I/O) 
looked to be the bottleneck. Performance also looked to be quite stable 
across the whole dump.
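
The instance search itself is essentially a single pass over the heap 
sub-records, comparing each instance's class id with the id of the class I'm 
looking for. A stripped-down sketch (InstanceRecord is a made-up stand-in, not 
the Kato sub-record type):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    class InstanceSearch {
        // InstanceRecord stands in for whatever sub-record type exposes the
        // object id and class id; it is not a Kato type.
        interface InstanceRecord {
            long objectId();
            long classId();
        }

        static List<Long> findInstances(Iterator<InstanceRecord> heapRecords, long wantedClassId) {
            List<Long> matches = new ArrayList<Long>();
            while (heapRecords.hasNext()) {
                InstanceRecord record = heapRecords.next();
                if (record.classId() == wantedClassId) {   // class id resolved earlier from the static data
                    matches.add(record.objectId());
                }
            }
            return matches;
        }
    }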

The 60GB dump was extracted from a core file using jmap (I don't remember 
exactly which version of Java this jmap came from, but it was certainly 
Java 1.6).
The following JVM version was dumped to the core file:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

I was also suspecting that heap dump segments were introduced to handle 
4GB+ heap sizes, but it looks like that is not always the case.

Anyway, it is nice that I was able to go through the whole dump without any 
error.

Regards
Lukasz

Steve Poole wrote:
>  Hi Lukasz,
>
>  You say your  hprof format is 1.0.1   so , according to the information I
> have about hprof format,  that means your won't have any heap dump segments
> in your file.  Apparently they are in the  1.0.2 format which is a Java 6
> addition and is only used when the heap is too big for the old format to
> cope with.
>
> What JVM and OS are you using to produce this dump?
>
> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk> wrote:
>
>   
>> Hi,
>>
>>
>> You are quite right, 4 bytes for the record length. If we had the file
>> format wrong for 64 bit
>> hprof dumps we'd not be able to read any dump, so it must be an overflow of
>> the record length.
>> In that case, the only possible solution I can think of would be for us to
>> change to code to search
>> for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP
>> DUMP.
>>
>> Of course, there was some puzzling over the difference between HEAP DUMP
>> SEGMENT (0x1c) and HEAP DUMP
>> (0x0c) before.
>>
>> I'd expect we'd get:
>>
>>    HEAP DUMP
>>        heap dump records....
>>    HEAP DUMP END
>>
>> or
>>
>>    HEAP DUMP SEGMENT
>>        heap dump records
>>    HEAP DUMP SEGMENT
>>        heap dump records
>>       :      :
>>    HEAP DUMP END
>>
>>
>> If we ignore the record size for the HEAP DUMP and calculate it ourselves,
>> we might have more luck.
>>
>> Regards,
>>    Stuart
>>
>>
>>
>> Lukasz wrote:
>>
>>     
>>> Hello,
>>>
>>> Since performance looks not bad after changing array size, today I have
>>> tried to process my 60GB heap dump. Unfortunately, very fast I've ended with
>>> exception:
>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>> -1293366370 is less than 0
>>> at
>>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>>
>>> at
>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>>
>>> at
>>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>>
>>> at
>>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>>> at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>>> ... (my code) ...
>>>
>>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>>
>>> Following is the part of the dump which I believes is responsible for that
>>> (B2 E8 CB 9E):
>>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20
>>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C
>>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D
>>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>
>>> What is interesting hprof file format reserves only 4 bytes for length of
>>> a record, which allows to express record whose size is at most 4GB (assuming
>>> unsigned 4 bytes).
>>>
>>> My dump's version is: JAVA PROFILE 1.0.1
>>>
>>> Regards
>>> Lukasz
>>>
>>>
>>> Lukasz wrote:
>>>
>>>       
>>>> Hi Stuart, Steve,
>>>>
>>>> I've taken deeper look into code. I still didn't trace carefully index
>>>> calculation in classes BitMaskMappingArray and ArrayBitMaskMappingStrategy,
>>>> but I managed to improve performance by increasing arrays size in those
>>>> classes (which is set in HProfFile class).
>>>>
>>>> If I understand code correctly, when capacity of BitMaskMappingArray will
>>>> be exhausted bucketSize is doubled, which in turn causes that more reads
>>>> (even cached) is required to set position of
>>>> IDataProvider/IArrayEntryProvider.
>>>>
>>>> Following are loading time results for default array size (1000) and
>>>> increased (1000000). Test ran against generated dump file (5000000 instances
>>>> of Data).
>>>> Default (1000):
>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was
>>>> doubled)
>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>> ... (I didn't wait until end)
>>>>
>>>> Increased(1000000):
>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>> ...
>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>> (end)
>>>>
>>>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if I
>>>> remember correctly).
>>>>
>>>> Regards
>>>> Lukasz
>>>>
>>>>
>>>> Stuart Monteith wrote:
>>>>
>>>>         
>>>>> The hprof dump reader spends a lot of time reading the whole file, for
>>>>> various reason.
>>>>> The indices it has in memory are constructed through an initial read,
>>>>> and this is also
>>>>> the source of the memory usage. In addition, there is some correlation
>>>>> to be done which
>>>>> also takes up time, and induces yet more reading.
>>>>>
>>>>> I'm sure some work could be done to improve the performance further, but
>>>>> we'll have to
>>>>> look at the tradeoff between diskspace and memory usage. The hprof file
>>>>> format itself
>>>>> is what it is, however, and we have no influence over that. The CJVMTI
>>>>> agent is has lots of
>>>>> room for improvement, but I suspect its potential for improvement is
>>>>> unlikely to be much better
>>>>> than existing hprof implementations. The built-in JVM hprof dumper will
>>>>> probably be a hard act
>>>>> to follow.
>>>>>
>>>>> The HProf implementation is not thread-safe. Realistically, I think it
>>>>> is something that ought to
>>>>> be considered once things are more mature. There will be algorithms that
>>>>> can deal with the JVM
>>>>> structure sensible.
>>>>>
>>>>> And thanks Lukasz, it's great to have your input.
>>>>>
>>>>> Regards,
>>>>> Stuart
>>>>>
>>>>>           
>>>>         
>> --
>> Stuart Monteith
>> http://blog.stoo.me.uk/
>>
>>
>>     
>
>
>   


Re: Processing huge heap dump.

Posted by Steve Poole <sp...@googlemail.com>.
 Hi Lukasz,

 You say your hprof format is 1.0.1, so, according to the information I
have about the hprof format, that means you won't have any heap dump segments
in your file.  Apparently they appear in the 1.0.2 format, which is a Java 6
addition and is only used when the heap is too big for the old format to
cope with.
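
If it helps, you can double-check what a given dump claims to be by reading
the header directly. This is only a quick sketch, assuming the commonly
documented HPROF header layout (a NUL-terminated version string, then a
4-byte identifier size and an 8-byte timestamp):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class HprofVersion {
    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
        StringBuilder version = new StringBuilder();
        int b;
        while ((b = in.read()) > 0) {    // version string runs up to the NUL byte
            version.append((char) b);
        }
        int idSize = in.readInt();       // size of object IDs: 4 or 8 bytes
        System.out.println(version + ", identifier size " + idSize);
        in.close();
    }
}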

What JVM and OS are you using to produce this dump?

On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <st...@stoo.me.uk> wrote:

> Hi,
>
>
> You are quite right, 4 bytes for the record length. If we had the file
> format wrong for 64 bit
> hprof dumps we'd not be able to read any dump, so it must be an overflow of
> the record length.
> In that case, the only possible solution I can think of would be for us to
> change to code to search
> for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP
> DUMP.
>
> Of course, there was some puzzling over the difference between HEAP DUMP
> SEGMENT (0x1c) and HEAP DUMP
> (0x0c) before.
>
> I'd expect we'd get:
>
>    HEAP DUMP
>        heap dump records....
>    HEAP DUMP END
>
> or
>
>    HEAP DUMP SEGMENT
>        heap dump records
>    HEAP DUMP SEGMENT
>        heap dump records
>       :      :
>    HEAP DUMP END
>
>
> If we ignore the record size for the HEAP DUMP and calculate it ourselves,
> we might have more luck.
>
> Regards,
>    Stuart
>
>
>
> Lukasz wrote:
>
>> Hello,
>>
>> Since performance looks not bad after changing array size, today I have
>> tried to process my 60GB heap dump. Unfortunately, very fast I've ended with
>> exception:
>> Exception in thread "main" java.lang.IllegalArgumentException: range
>> -1293366370 is less than 0
>> at
>> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>
>> at
>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>
>> at
>> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>
>> at
>> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>> at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>> ... (my code) ...
>>
>> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>>
>> Following is the part of the dump which I believes is responsible for that
>> (B2 E8 CB 9E):
>> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20
>> 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C
>> 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D
>> 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>
>> What is interesting hprof file format reserves only 4 bytes for length of
>> a record, which allows to express record whose size is at most 4GB (assuming
>> unsigned 4 bytes).
>>
>> My dump's version is: JAVA PROFILE 1.0.1
>>
>> Regards
>> Lukasz
>>
>>
>> Lukasz wrote:
>>
>>> Hi Stuart, Steve,
>>>
>>> I've taken deeper look into code. I still didn't trace carefully index
>>> calculation in classes BitMaskMappingArray and ArrayBitMaskMappingStrategy,
>>> but I managed to improve performance by increasing arrays size in those
>>> classes (which is set in HProfFile class).
>>>
>>> If I understand code correctly, when capacity of BitMaskMappingArray will
>>> be exhausted bucketSize is doubled, which in turn causes that more reads
>>> (even cached) is required to set position of
>>> IDataProvider/IArrayEntryProvider.
>>>
>>> Following are loading time results for default array size (1000) and
>>> increased (1000000). Test ran against generated dump file (5000000 instances
>>> of Data).
>>> Default (1000):
>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was
>>> doubled)
>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>> ... (I didn't wait until end)
>>>
>>> Increased(1000000):
>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>> ...
>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>> (end)
>>>
>>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if I
>>> remember correctly).
>>>
>>> Regards
>>> Lukasz
>>>
>>>
>>> Stuart Monteith wrote:
>>>
>>>> The hprof dump reader spends a lot of time reading the whole file, for
>>>> various reason.
>>>> The indices it has in memory are constructed through an initial read,
>>>> and this is also
>>>> the source of the memory usage. In addition, there is some correlation
>>>> to be done which
>>>> also takes up time, and induces yet more reading.
>>>>
>>>> I'm sure some work could be done to improve the performance further, but
>>>> we'll have to
>>>> look at the tradeoff between diskspace and memory usage. The hprof file
>>>> format itself
>>>> is what it is, however, and we have no influence over that. The CJVMTI
>>>> agent is has lots of
>>>> room for improvement, but I suspect its potential for improvement is
>>>> unlikely to be much better
>>>> than existing hprof implementations. The built-in JVM hprof dumper will
>>>> probably be a hard act
>>>> to follow.
>>>>
>>>> The HProf implementation is not thread-safe. Realistically, I think it
>>>> is something that ought to
>>>> be considered once things are more mature. There will be algorithms that
>>>> can deal with the JVM
>>>> structure sensible.
>>>>
>>>> And thanks Lukasz, it's great to have your input.
>>>>
>>>> Regards,
>>>> Stuart
>>>>
>>>
>>>
>>
> --
> Stuart Monteith
> http://blog.stoo.me.uk/
>
>


-- 
Steve

Re: Processing huge heap dump.

Posted by Stuart Monteith <st...@stoo.me.uk>.
Hi,


You are quite right, 4 bytes for the record length. If we had the file 
format wrong for 64-bit
hprof dumps we'd not be able to read any dump, so it must be an overflow 
of the record length.
In that case, the only possible solution I can think of would be for us 
to change the code to search
for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP 
DUMP.

Of course, there was some puzzling over the difference between HEAP DUMP 
SEGMENT (0x1c) and HEAP DUMP
(0x0c) before.

I'd expect we'd get:

     HEAP DUMP
         heap dump records....
     HEAP DUMP END

or

     HEAP DUMP SEGMENT
         heap dump records
     HEAP DUMP SEGMENT
         heap dump records
        :      :
     HEAP DUMP END


If we ignore the record size for the HEAP DUMP and calculate it 
ourselves, we might have more luck.
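
Something roughly along these lines - only a sketch, not working Kato code,
and skipSubRecord() is a hypothetical helper that would have to know the
size of every heap dump sub-record type:

import java.io.DataInput;
import java.io.IOException;

abstract class HeapDumpEndScanner {
    static final int HEAP_DUMP_END = 0x2C;

    // Walks sub-records from the start of the HEAP DUMP body until the
    // HEAP DUMP END tag turns up, and returns the real body length.
    long findHeapDumpLength(DataInput in) throws IOException {
        long length = 0;
        while (true) {
            int tag = in.readUnsignedByte();
            if (tag == HEAP_DUMP_END) {
                return length;                    // everything before 0x2C was heap dump
            }
            length += 1 + skipSubRecord(in, tag); // 1 byte for the sub-record tag itself
        }
    }

    // Hypothetical: skips one sub-record and returns how many bytes it occupied.
    abstract long skipSubRecord(DataInput in, int tag) throws IOException;
}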

Regards,
     Stuart


Lukasz wrote:
> Hello,
>
> Since performance looks not bad after changing array size, today I 
> have tried to process my 60GB heap dump. Unfortunately, very fast I've 
> ended with exception:
> Exception in thread "main" java.lang.IllegalArgumentException: range 
> -1293366370 is less than 0
> at 
> org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29) 
>
> at 
> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487) 
>
> at 
> org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437) 
>
> at 
> org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56) 
>
> at 
> org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
> ... (my code) ...
>
> It looks like range of HeapDumpHProfRecord had value: -1293366370.
>
> Following is the part of the dump which I believes is responsible for 
> that (B2 E8 CB 9E):
> 0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 
> 20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
> 0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 
> 0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
> 0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 
> 3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>
> What is interesting hprof file format reserves only 4 bytes for length 
> of a record, which allows to express record whose size is at most 4GB 
> (assuming unsigned 4 bytes).
>
> My dump's version is: JAVA PROFILE 1.0.1
>
> Regards
> Lukasz
>
>
> Lukasz wrote:
>> Hi Stuart, Steve,
>>
>> I've taken deeper look into code. I still didn't trace carefully 
>> index calculation in classes BitMaskMappingArray and 
>> ArrayBitMaskMappingStrategy, but I managed to improve performance by 
>> increasing arrays size in those classes (which is set in HProfFile 
>> class).
>>
>> If I understand code correctly, when capacity of BitMaskMappingArray 
>> will be exhausted bucketSize is doubled, which in turn causes that 
>> more reads (even cached) is required to set position of 
>> IDataProvider/IArrayEntryProvider.
>>
>> Following are loading time results for default array size (1000) and 
>> increased (1000000). Test ran against generated dump file (5000000 
>> instances of Data).
>> Default (1000):
>> HeapSubRecord: 100000 (866ms, 4215kB)
>> HeapSubRecord: 200000 (1716ms, 7879kB)
>> HeapSubRecord: 300000 (2833ms, 11263kB)
>> HeapSubRecord: 400000 (3889ms, 14283kB)
>> HeapSubRecord: 500000 (3893ms, 17319kB)
>> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was 
>> doubled)
>> HeapSubRecord: 700000 (7721ms, 23531kB)
>> HeapSubRecord: 800000 (7729ms, 26567kB)
>> HeapSubRecord: 900000 (7731ms, 29671kB)
>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>> ... (I didn't wait until end)
>>
>> Increased(1000000):
>> HeapSubRecord: 100000 (622ms, 17809kB)
>> HeapSubRecord: 200000 (309ms, 20345kB)
>> HeapSubRecord: 300000 (283ms, 23861kB)
>> HeapSubRecord: 400000 (274ms, 27921kB)
>> HeapSubRecord: 500000 (269ms, 29957kB)
>> HeapSubRecord: 600000 (264ms, 31993kB)
>> HeapSubRecord: 700000 (272ms, 36097kB)
>> HeapSubRecord: 800000 (288ms, 37739kB)
>> HeapSubRecord: 900000 (263ms, 39835kB)
>> HeapSubRecord: 1000000 (259ms, 41931kB)
>> HeapSubRecord: 1100000 (300ms, 44773kB)
>> HeapSubRecord: 1200000 (283ms, 46901kB)
>> HeapSubRecord: 1300000 (291ms, 49029kB)
>> HeapSubRecord: 1400000 (328ms, 53801kB)
>> HeapSubRecord: 1500000 (259ms, 53801kB)
>> HeapSubRecord: 1600000 (272ms, 58125kB)
>> HeapSubRecord: 1700000 (264ms, 60293kB)
>> HeapSubRecord: 1800000 (264ms, 62473kB)
>> HeapSubRecord: 1900000 (361ms, 61373kB)
>> HeapSubRecord: 2000000 (274ms, 63105kB)
>> ...
>> HeapSubRecord: 9000000 (284ms, 231969kB)
>> HeapSubRecord: 9100000 (272ms, 233597kB)
>> HeapSubRecord: 9200000 (281ms, 236357kB)
>> HeapSubRecord: 9300000 (274ms, 240469kB)
>> HeapSubRecord: 9400000 (279ms, 244541kB)
>> HeapSubRecord: 9500000 (269ms, 246549kB)
>> HeapSubRecord: 9600000 (279ms, 250565kB)
>> HeapSubRecord: 9700000 (265ms, 252573kB)
>> HeapSubRecord: 9800000 (279ms, 256629kB)
>> HeapSubRecord: 9900000 (265ms, 258669kB)
>> HeapSubRecord: 10000000 (463ms, 263997kB)
>> (end)
>>
>> i.e. my 60GB dump file contains more than 1 100 000 000 of objects 
>> (if I remember correctly).
>>
>> Regards
>> Lukasz
>>
>>
>> Stuart Monteith wrote:
>>> The hprof dump reader spends a lot of time reading the whole file, 
>>> for various reason.
>>> The indices it has in memory are constructed through an initial 
>>> read, and this is also
>>> the source of the memory usage. In addition, there is some 
>>> correlation to be done which
>>> also takes up time, and induces yet more reading.
>>>
>>> I'm sure some work could be done to improve the performance further, 
>>> but we'll have to
>>> look at the tradeoff between diskspace and memory usage. The hprof 
>>> file format itself
>>> is what it is, however, and we have no influence over that. The 
>>> CJVMTI agent is has lots of
>>> room for improvement, but I suspect its potential for improvement is 
>>> unlikely to be much better
>>> than existing hprof implementations. The built-in JVM hprof dumper 
>>> will probably be a hard act
>>> to follow.
>>>
>>> The HProf implementation is not thread-safe. Realistically, I think 
>>> it is something that ought to
>>> be considered once things are more mature. There will be algorithms 
>>> that can deal with the JVM
>>> structure sensible.
>>>
>>> And thanks Lukasz, it's great to have your input.
>>>
>>> Regards,
>>> Stuart
>>
>

-- 
Stuart Monteith
http://blog.stoo.me.uk/


Re: Processing huge heap dump.

Posted by Lukasz <fl...@intercel.com.pl>.
Hello,

Since performance didn't look bad after changing the array size, today I 
tried to process my 60GB heap dump. Unfortunately, it very quickly ended 
with an exception:
Exception in thread "main" java.lang.IllegalArgumentException: range 
-1293366370 is less than 0
at 
org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
at 
org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
at 
org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
at 
org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
... (my code) ...

It looks like the range of the HeapDumpHProfRecord had the value -1293366370.

Following is the part of the dump which I believe is responsible for 
that (B2 E8 CB 9E):
0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 
20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 
0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 
3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8

What is interesting is that the hprof file format reserves only 4 bytes 
for the length of a record, which allows it to express a record whose 
size is at most 4GB (assuming unsigned 4 bytes).
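
Just to illustrate the overflow (assuming the range above really is the raw
4-byte length field read into a signed Java int):

public class UnsignedLength {
    public static void main(String[] args) {
        int raw = -1293366370;              // the "range" from the exception above
        long unsigned = raw & 0xFFFFFFFFL;  // 3001600926 bytes, roughly 2.8GB
        System.out.println(raw + " read as unsigned = " + unsigned);
    }
}

So the record seems to claim almost 3GB, which no longer fits in a signed int.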

My dump's version is: JAVA PROFILE 1.0.1

Regards
Lukasz


Lukasz wrote:
> Hi Stuart, Steve,
>
> I've taken deeper look into code. I still didn't trace carefully index 
> calculation in classes BitMaskMappingArray and 
> ArrayBitMaskMappingStrategy, but I managed to improve performance by 
> increasing arrays size in those classes (which is set in HProfFile 
> class).
>
> If I understand code correctly, when capacity of BitMaskMappingArray 
> will be exhausted bucketSize is doubled, which in turn causes that 
> more reads (even cached) is required to set position of 
> IDataProvider/IArrayEntryProvider.
>
> Following are loading time results for default array size (1000) and 
> increased (1000000). Test ran against generated dump file (5000000 
> instances of Data).
> Default (1000):
> HeapSubRecord: 100000 (866ms, 4215kB)
> HeapSubRecord: 200000 (1716ms, 7879kB)
> HeapSubRecord: 300000 (2833ms, 11263kB)
> HeapSubRecord: 400000 (3889ms, 14283kB)
> HeapSubRecord: 500000 (3893ms, 17319kB)
> HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was 
> doubled)
> HeapSubRecord: 700000 (7721ms, 23531kB)
> HeapSubRecord: 800000 (7729ms, 26567kB)
> HeapSubRecord: 900000 (7731ms, 29671kB)
> HeapSubRecord: 1000000 (7704ms, 32731kB)
> ... (I didn't wait until end)
>
> Increased(1000000):
> HeapSubRecord: 100000 (622ms, 17809kB)
> HeapSubRecord: 200000 (309ms, 20345kB)
> HeapSubRecord: 300000 (283ms, 23861kB)
> HeapSubRecord: 400000 (274ms, 27921kB)
> HeapSubRecord: 500000 (269ms, 29957kB)
> HeapSubRecord: 600000 (264ms, 31993kB)
> HeapSubRecord: 700000 (272ms, 36097kB)
> HeapSubRecord: 800000 (288ms, 37739kB)
> HeapSubRecord: 900000 (263ms, 39835kB)
> HeapSubRecord: 1000000 (259ms, 41931kB)
> HeapSubRecord: 1100000 (300ms, 44773kB)
> HeapSubRecord: 1200000 (283ms, 46901kB)
> HeapSubRecord: 1300000 (291ms, 49029kB)
> HeapSubRecord: 1400000 (328ms, 53801kB)
> HeapSubRecord: 1500000 (259ms, 53801kB)
> HeapSubRecord: 1600000 (272ms, 58125kB)
> HeapSubRecord: 1700000 (264ms, 60293kB)
> HeapSubRecord: 1800000 (264ms, 62473kB)
> HeapSubRecord: 1900000 (361ms, 61373kB)
> HeapSubRecord: 2000000 (274ms, 63105kB)
> ...
> HeapSubRecord: 9000000 (284ms, 231969kB)
> HeapSubRecord: 9100000 (272ms, 233597kB)
> HeapSubRecord: 9200000 (281ms, 236357kB)
> HeapSubRecord: 9300000 (274ms, 240469kB)
> HeapSubRecord: 9400000 (279ms, 244541kB)
> HeapSubRecord: 9500000 (269ms, 246549kB)
> HeapSubRecord: 9600000 (279ms, 250565kB)
> HeapSubRecord: 9700000 (265ms, 252573kB)
> HeapSubRecord: 9800000 (279ms, 256629kB)
> HeapSubRecord: 9900000 (265ms, 258669kB)
> HeapSubRecord: 10000000 (463ms, 263997kB)
> (end)
>
> i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if 
> I remember correctly).
>
> Regards
> Lukasz
>
>
> Stuart Monteith wrote:
>> The hprof dump reader spends a lot of time reading the whole file, 
>> for various reason.
>> The indices it has in memory are constructed through an initial read, 
>> and this is also
>> the source of the memory usage. In addition, there is some 
>> correlation to be done which
>> also takes up time, and induces yet more reading.
>>
>> I'm sure some work could be done to improve the performance further, 
>> but we'll have to
>> look at the tradeoff between diskspace and memory usage. The hprof 
>> file format itself
>> is what it is, however, and we have no influence over that. The 
>> CJVMTI agent is has lots of
>> room for improvement, but I suspect its potential for improvement is 
>> unlikely to be much better
>> than existing hprof implementations. The built-in JVM hprof dumper 
>> will probably be a hard act
>> to follow.
>>
>> The HProf implementation is not thread-safe. Realistically, I think 
>> it is something that ought to
>> be considered once things are more mature. There will be algorithms 
>> that can deal with the JVM
>> structure sensible.
>>
>> And thanks Lukasz, it's great to have your input.
>>
>> Regards,
>> Stuart
>


Re: Processing huge heap dump.

Posted by Lukasz <fl...@intercel.com.pl>.
Hi Stuart, Steve,

I've taken a deeper look into the code. I still haven't carefully traced the 
index calculation in the BitMaskMappingArray and ArrayBitMaskMappingStrategy 
classes, but I managed to improve performance by increasing the array size 
in those classes (which is set in the HProfFile class).

If I understand the code correctly, when the capacity of BitMaskMappingArray 
is exhausted, bucketSize is doubled, which in turn means that more reads 
(even cached ones) are required to set the position of the 
IDataProvider/IArrayEntryProvider.
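
To make the effect concrete, here is a simplified model of such an index - 
purely illustrative, not the actual Kato BitMaskMappingArray - which keeps 
the file offset of every bucketSize-th record and doubles bucketSize 
whenever its fixed-size array fills up:

class SparseOffsetIndex {
    private final long[] offsets;   // file offset of every bucketSize-th record
    private int bucketSize = 1;     // records covered by one slot; doubles when full
    private int used = 0;           // slots currently filled
    private int records = 0;        // records appended so far

    SparseOffsetIndex(int slots) {
        offsets = new long[slots];
    }

    // Called once per record while reading the file sequentially.
    void append(long offset) {
        if (records % bucketSize == 0) {
            if (used == offsets.length) {
                // Index full: keep every second entry and double the bucket size.
                for (int i = 0; i < used / 2; i++) {
                    offsets[i] = offsets[i * 2];
                }
                used = used / 2;
                bucketSize = bucketSize * 2;
            }
            if (records % bucketSize == 0) {   // still on a boundary after doubling?
                offsets[used++] = offset;
            }
        }
        records++;
    }

    // Nearest indexed offset at or below the requested record...
    long nearestOffset(int recordNumber) {
        return offsets[recordNumber / bucketSize];
    }

    // ...and how many records must then be re-read sequentially from there.
    int recordsToSkip(int recordNumber) {
        return recordNumber % bucketSize;
    }
}

With only 1000 slots the bucket size doubles very early, so every lookup has 
to seek to the nearest indexed offset and then re-read more and more records 
sequentially; with 1000000 slots the doubling is postponed and lookups stay 
cheap, which matches the numbers below.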

Following are the loading times for the default array size (1000) and an 
increased size (1000000). The test ran against a generated dump file 
(5000000 instances of Data).
Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here bucketSize was probably 
doubled)
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until end)

Increased(1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB)
(end)

For comparison, my 60GB dump file contains more than 1 100 000 000 objects 
(if I remember correctly).

Regards
Lukasz


Stuart Monteith wrote:
> The hprof dump reader spends a lot of time reading the whole file, for 
> various reason.
> The indices it has in memory are constructed through an initial read, 
> and this is also
> the source of the memory usage. In addition, there is some correlation 
> to be done which
> also takes up time, and induces yet more reading.
>
> I'm sure some work could be done to improve the performance further, 
> but we'll have to
> look at the tradeoff between diskspace and memory usage. The hprof 
> file format itself
> is what it is, however, and we have no influence over that. The CJVMTI 
> agent is has lots of
> room for improvement, but I suspect its potential for improvement is 
> unlikely to be much better
> than existing hprof implementations. The built-in JVM hprof dumper 
> will probably be a hard act
> to follow.
>
> The HProf implementation is not thread-safe. Realistically, I think it 
> is something that ought to
> be considered once things are more mature. There will be algorithms 
> that can deal with the JVM
> structure sensible.
>
> And thanks Lukasz, it's great to have your input.
>
> Regards,
>     Stuart


Re: Processing huge heap dump.

Posted by Stuart Monteith <st...@stoo.me.uk>.
The hprof dump reader spends a lot of time reading the whole file, for 
various reasons.
The indices it has in memory are constructed through an initial read, 
and this is also
the source of the memory usage. In addition, there is some correlation 
to be done which
also takes up time, and induces yet more reading.

I'm sure some work could be done to improve the performance further, but 
we'll have to
look at the tradeoff between disk space and memory usage. The hprof file 
format itself
is what it is, however, and we have no influence over that. The CJVMTI 
agent has lots of
room for improvement, but I suspect its potential for improvement is 
unlikely to be much better
than existing hprof implementations. The built-in JVM hprof dumper will 
probably be a hard act
to follow.

The HProf implementation is not thread-safe. Realistically, I think it 
is something that ought to
be considered once things are more mature. There will be algorithms that 
can deal with the JVM
structure sensibly.

And thanks Lukasz, it's great to have your input.

Regards,
     Stuart


Steve Poole wrote:
> Hi Lukaz - thanks for posting.
>
> On Fri, Jan 8, 2010 at 7:11 PM, Lukasz<fl...@intercel.com.pl>  wrote:
>
>    
>> Hello,
>>
>> In my work I have faced problem where I have to process 60GB heap dump.
>> Probably it wasn't be nothing scary if I would have proper hardware to
>> process such file.
>> I've noticed that most of the tools for dump processing requires:
>> a) amount of memory at least equal to dump size
>> b) amount of free disk space at least equal to dump size (to create
>> indexes)
>> Unfortunately, I haven't access to machine which where both requirements
>> are meet at once.
>>
>>
>>      
> Yes I agree -  for a) above I'd say that its common to need 1.5 times the
> size of the original heap.
>
>
>    
>> Processing I would like to perform is backtracking of references to
>> instances of one class (which causes out of memory). Assuming that hard disk
>> read will be my performance bottleneck, I should be able to backtrace few
>> levels during the night.
>> I have only raw overview of algorithm in my head, but it seems that
>> something like "visitor pattern" would be enough for me.
>>
>> Since I have read about kato some time ago, I wanted to give it a try.
>>
>> For dev purposes I have prepared ~370MB heap dump with around 10 000 000 of
>> simple object added to collection (which probably multiply amount of object
>> on heap).
>>
>> Can you share the code you used to generate the data in the dumo?
>>      
>
>    
>> 1) Default approach:
>> Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);
>>
>> I was waiting few minutes, but it didn't finish processing, so it looks
>> that there will be no chance to process 60GB dump.
>>      
>
> I suspect that the main reason why this is taking so long is that the HPROF
> reader has to read all the dump first since it doesn't know what questions
> you need answering,   That's generally true of any dump reader
> unfortunately.
>
>    
>>      
> 2) HProfFile
>    
>> Fast look at HProfView class, give me some idea how I can visit all objects
>> (records) on a heap.
>>
>> I wrote simple app which only iterate through all records, but it also
>> turned out to be quite slow and memory consuming. Following is some metrics:
>> ---------------------------------
>> MemoryPool: PS Old Gen
>> Hello World!
>> heapDump:
>> org.apache.kato.hprof.datalayer.HProfFile$HeapDumpHProfRecord@f0eed6
>> HeapSubRecord: 100000 (946ms, 4199kB)
>> HeapSubRecord: 200000 (2064ms, 7955kB)
>> HeapSubRecord: 300000 (3123ms, 11759kB)
>> HeapSubRecord: 400000 (3933ms, 14811kB)
>> HeapSubRecord: 500000 (3908ms, 17927kB)
>> HeapSubRecord: 600000 (7269ms, 21039kB)
>> HeapSubRecord: 700000 (7736ms, 24139kB)
>> HeapSubRecord: 800000 (7866ms, 27147kB)
>> HeapSubRecord: 900000 (7753ms, 30263kB)
>> HeapSubRecord: 1000000 (7684ms, 33299kB)
>> HeapSubRecord: 1100000 (13515ms, 36487kB)
>> HeapSubRecord: 1200000 (15525ms, 39623kB)
>> HeapSubRecord: 1300000 (15405ms, 42723kB)
>> HeapSubRecord: 1400000 (15567ms, 39115kB)
>> HeapSubRecord: 1500000 (15459ms, 42203kB)
>> HeapSubRecord: 1600000 (15692ms, 43838kB)
>> HeapSubRecord: 1700000 (15424ms, 45926kB)
>> HeapSubRecord: 1800000 (15327ms, 49026kB)
>> HeapSubRecord: 1900000 (15416ms, 48505kB)
>> HeapSubRecord: 2000000 (15352ms, 51629kB)
>> -------------------------------
>> It means that iterating over first 100 000 of records took 946ms and 4199kB
>> of OldGen was consumed.
>> Iterating over next 100 000 of records took 2064ms and 7955kB of OldGen was
>> consumed.
>> And so on, 100 000 of records is the interval for printing stats.
>>
>> One core of cpu was saturated. It also looks like required amount of memory
>> will be equal to dump size.
>> I could start 4 threads to make better utilization of CPU, but since it
>> looks like HProfFile instance is not thread safe I would have to create 4
>> instances of HProfFile, which means that required amount of memory will be
>> like 4 x dumpSize.
>>
>> That's all I made so far. I didn't track what in HProfFile consumes CPU and
>> memory, my blind guess is that CachedRandomAccesDataProvider is involved.
>>
>> Thanks for this Luksz -   you are probably the first person to use this
>>      
> code other than the developers and its great to get some feedback.  Can you
> share the  code you used to create the dump and to visit the HPROF records?
> Stuart has made some performance adjustments to the hprof code and we'll see
> if we can do better.
>
> On the spec list we're discussing the basics of a "snapshot" dump concept
> where only what you need gets dumped.    I wonder if the same idea could be
> applied to opening a dump.   It would be great to know when reading a dump
> that certain information is not required -  that should improve
> performance.
>
>
>
> Regards
>    
>> Lukasz
>>
>>
>>      
>
>    

-- 
Stuart Monteith
http://blog.stoo.me.uk/


Re: Processing huge heap dump.

Posted by Lukasz <fl...@intercel.com.pl>.
Hi Steve,

Thanks for answer.

Following is the code I used.
I created a dump by running GenHeap and invoking dumpHeap(..) from jconsole.
I wrote that my dump was around ~370MB, but today when I regenerated it, 
it came out at ~440MB (maybe last time I took the dump before all 
instances were created).

Yesterday I tried to take a look into the parser code, but I don't yet 
have a good picture of what (and why) is going on there.

Regards,
Lukasz



--------------------------- Heap generator 
------------------------------------------
import java.util.concurrent.ConcurrentLinkedQueue;

public class GenHeap {

    // Keeps all Data instances reachable so that they stay live in the dump.
    private ConcurrentLinkedQueue<Data> queue;

    public GenHeap(int size) {
        queue = new ConcurrentLinkedQueue<Data>();
        for (int i = 0; i < size; ++i) {
            queue.add(new Data());
        }
    }

    // Static root prevents the queue from being collected while we sleep.
    private static GenHeap root;

    public static void main(String[] args) throws Exception {
        root = new GenHeap(Integer.parseInt(args[0]));
        System.out.println("Ready for taking a dump.");
        // Keep the process alive long enough to trigger the dump from jconsole.
        Thread.sleep(1000000);
        System.out.println("root: " + root);
    }
}

// Goes in its own file (Data.java); myid just gives every instance some state.
public class Data {
    private static int counter = 0;
    public int myid = ++counter;
}
---------------------------------------------------------------------------------------------------
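
If jconsole is not convenient, the same kind of dump can, as far as I know, 
also be produced with jmap on a Sun JDK:

    jmap -dump:format=b,file=/tmp/kato.hprof <pid of the GenHeap process>
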
------------------------ Analyzer 
----------------------------------------------------------
package mykato;

import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import javax.tools.diagnostics.FactoryRegistry;
import javax.tools.diagnostics.image.Image;
import org.apache.kato.hprof.datalayer.HProfFactory;
import org.apache.kato.hprof.datalayer.HProfFile;
import org.apache.kato.hprof.datalayer.IHProfRecord;
import org.apache.kato.hprof.datalayer.IHeapDumpHProfRecord;

public class App {

    public static void main(String[] args) throws Exception {
        System.out.println("Hello World!");
        String pathToDump = "/tmp/kato.hprof";
        File dump = new File(pathToDump);

//        defaultApproach(dump);
        hprofReaderApproach(dump);

        System.out.println("TaDa!");
    }

    // Approach 1: the full Kato API (did not finish in my tests).
    private static void defaultApproach(File dump) throws Exception {
        Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);
    }

    // Approach 2: iterate over the raw HPROF records.
    private static void hprofReaderApproach(File dump) throws Exception {
        HProfFile hprofFile = HProfFactory.createReader(dump);
//        HProfFile hprofFile = new HProfFile(new RandomAccesDataProvider(dump));
        hprofFile.open();

        int r = 0;
        IHProfRecord record;
        while ((record = hprofFile.getRecord(r)) != null) {
            if (record instanceof IHeapDumpHProfRecord) {
                processHeapDump((IHeapDumpHProfRecord) record);
            }
            ++r;
        }
    }

    // Walks every heap dump sub-record and prints timing and old gen usage
    // every 100000 records.
    private static void processHeapDump(IHeapDumpHProfRecord heapDump) {
        System.out.println("heapDump: " + heapDump);

        int r = 0;
        long lastTimestamp = System.currentTimeMillis();
        IHProfRecord record;
        while ((record = heapDump.getSubRecord(r)) != null) {
            if (++r % 100000 == 0) {
                long timestamp = System.currentTimeMillis();
                System.out.println("HeapSubRecord: " + r + " ("
                        + (timestamp - lastTimestamp) + "ms, "
                        + getMemoryUsed() + "kB)");
                lastTimestamp = timestamp;
            }
        }
    }

    // Old generation pool, used to track how much memory the reader retains.
    private static MemoryPoolMXBean oldGenPool;
    static {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().toLowerCase().contains("old")) {
                oldGenPool = pool;
                break;
            }
        }
        System.out.println("MemoryPool: " + oldGenPool.getName());
    }

    private static long getMemoryUsed() {
        return oldGenPool.getUsage().getUsed() / 1024;
    }
}
--------------------------------------------------------------------------------------------------------------




Steve Poole wrote:
>
>> Thanks for this Luksz -   you are probably the first person to use this
>>     
> code other than the developers and its great to get some feedback.  Can you
> share the  code you used to create the dump and to visit the HPROF records?
> Stuart has made some performance adjustments to the hprof code and we'll see
> if we can do better.
>
> On the spec list we're discussing the basics of a "snapshot" dump concept
> where only what you need gets dumped.    I wonder if the same idea could be
> applied to opening a dump.   It would be great to know when reading a dump
> that certain information is not required -  that should improve
> performance.
>
>   

Re: Processing huge heap dump.

Posted by Steve Poole <sp...@googlemail.com>.
Hi Lukasz - thanks for posting.

On Fri, Jan 8, 2010 at 7:11 PM, Lukasz <fl...@intercel.com.pl> wrote:

> Hello,
>
> In my work I have faced problem where I have to process 60GB heap dump.
> Probably it wasn't be nothing scary if I would have proper hardware to
> process such file.
> I've noticed that most of the tools for dump processing requires:
> a) amount of memory at least equal to dump size
> b) amount of free disk space at least equal to dump size (to create
> indexes)
> Unfortunately, I haven't access to machine which where both requirements
> are meet at once.
>
>
Yes, I agree - for a) above I'd say that it's common to need 1.5 times the
size of the original heap.


> Processing I would like to perform is backtracking of references to
> instances of one class (which causes out of memory). Assuming that hard disk
> read will be my performance bottleneck, I should be able to backtrace few
> levels during the night.
> I have only raw overview of algorithm in my head, but it seems that
> something like "visitor pattern" would be enough for me.
>
> Since I have read about kato some time ago, I wanted to give it a try.
>
> For dev purposes I have prepared ~370MB heap dump with around 10 000 000 of
> simple object added to collection (which probably multiply amount of object
> on heap).
>
> Can you share the code you used to generate the data in the dump?


> 1) Default approach:
> Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);
>
> I was waiting few minutes, but it didn't finish processing, so it looks
> that there will be no chance to process 60GB dump.


I suspect that the main reason this is taking so long is that the HPROF
reader has to read the whole dump first, since it doesn't know what questions
you need answering. That's generally true of any dump reader,
unfortunately.

>

2) HProfFile
> Fast look at HProfView class, give me some idea how I can visit all objects
> (records) on a heap.
>
> I wrote simple app which only iterate through all records, but it also
> turned out to be quite slow and memory consuming. Following is some metrics:
> ---------------------------------
> MemoryPool: PS Old Gen
> Hello World!
> heapDump:
> org.apache.kato.hprof.datalayer.HProfFile$HeapDumpHProfRecord@f0eed6
> HeapSubRecord: 100000 (946ms, 4199kB)
> HeapSubRecord: 200000 (2064ms, 7955kB)
> HeapSubRecord: 300000 (3123ms, 11759kB)
> HeapSubRecord: 400000 (3933ms, 14811kB)
> HeapSubRecord: 500000 (3908ms, 17927kB)
> HeapSubRecord: 600000 (7269ms, 21039kB)
> HeapSubRecord: 700000 (7736ms, 24139kB)
> HeapSubRecord: 800000 (7866ms, 27147kB)
> HeapSubRecord: 900000 (7753ms, 30263kB)
> HeapSubRecord: 1000000 (7684ms, 33299kB)
> HeapSubRecord: 1100000 (13515ms, 36487kB)
> HeapSubRecord: 1200000 (15525ms, 39623kB)
> HeapSubRecord: 1300000 (15405ms, 42723kB)
> HeapSubRecord: 1400000 (15567ms, 39115kB)
> HeapSubRecord: 1500000 (15459ms, 42203kB)
> HeapSubRecord: 1600000 (15692ms, 43838kB)
> HeapSubRecord: 1700000 (15424ms, 45926kB)
> HeapSubRecord: 1800000 (15327ms, 49026kB)
> HeapSubRecord: 1900000 (15416ms, 48505kB)
> HeapSubRecord: 2000000 (15352ms, 51629kB)
> -------------------------------
> It means that iterating over first 100 000 of records took 946ms and 4199kB
> of OldGen was consumed.
> Iterating over next 100 000 of records took 2064ms and 7955kB of OldGen was
> consumed.
> And so on, 100 000 of records is the interval for printing stats.
>
> One core of cpu was saturated. It also looks like required amount of memory
> will be equal to dump size.
> I could start 4 threads to make better utilization of CPU, but since it
> looks like HProfFile instance is not thread safe I would have to create 4
> instances of HProfFile, which means that required amount of memory will be
> like 4 x dumpSize.
>
> That's all I made so far. I didn't track what in HProfFile consumes CPU and
> memory, my blind guess is that CachedRandomAccesDataProvider is involved.
>
> Thanks for this Lukasz - you are probably the first person to use this
code other than the developers and it's great to get some feedback.  Can you
share the code you used to create the dump and to visit the HPROF records?
Stuart has made some performance adjustments to the hprof code and we'll see
if we can do better.

On the spec list we're discussing the basics of a "snapshot" dump concept
where only what you need gets dumped.    I wonder if the same idea could be
applied to opening a dump.   It would be great to know when reading a dump
that certain information is not required -  that should improve
performance.



Regards
> Lukasz
>
>


-- 
Steve