Posted to users@wicket.apache.org by richard emberson <ri...@gmail.com> on 2011/07/09 19:53:43 UTC

Page De-Serialization and memory

This is a question for Wicket masters and those application builders
whose applications match the criteria specified below.

[In this case, a Wicket master is someone with a knowledge
of how Wicket is being used in a wide spectrum of applications
so that they have a feel for what use-cases exist in the real world.]

Wicket is used in a wide range of applications with a variety of
usage patterns. What I am interested in are those applications where
an appreciable number of the pages in memory are pages that had
previously been serialized and stored to disk and then reanimated;
that is, pages that were not found in an in-memory cache and had to be
read from disk and de-serialized back into an in-memory page. In short,
applications with an appreciable number of reanimated pages.

Firstly, do such applications exist? By this I mean real-world
applications where a significant number of pages in-memory
are reanimated pages.

For such applications, what percentage of all pages at any
given time are reanimated pages?
Is it, say, a couple of percent? Two or three, in which case it's not
very significant.
Or is it, say, 50%? Meaning that half of all pages currently in
memory had been serialized to disk, flushed from any in-memory cache
and then, as needed, de-serialized back into a Page.

Thanks

Richard
-- 
Quis custodiet ipsos custodes

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


Re: Page De-Serialization and memory

Posted by Martin Grigorov <mg...@apache.org>.
On Mon, Jul 11, 2011 at 5:12 PM, richard emberson
<ri...@gmail.com> wrote:
> When you say 10000 times, you set NOS_TIMES to 10000?
I mean NOS_TRIALS.
> (NOS_TIMES should have been called ARRAY_SIZE).
>
> Richard
>
>> On Sun, Jul 10, 2011 at 11:37 AM, Martin Grigorov<mg...@apache.org>
>>  wrote:
>>>
>>> Hi,
>>>
>>> About the use cases: my experience is that most of the time the
>>> in-memory pages are used (for each listener callback execution, for
>>> Ajax requests, ...).
>>> A previous version of a page, or a previous page, is needed when the
>>> user clicks the browser back button. Even in this case the in-memory
>>> cache is hit most of the time. Only when the user goes several pages
>>> back and the page is no longer in memory is the disk store used.
>>>
>>> So far so good, but...! Even the in-memory store contains serialized
>>> versions of the Page, named SerializedPage. This is a struct which
>>> contains
>>> {
>>>  sessionId: String,
>>>  pageId: int,
>>>  data: byte[]
>>> }
>>> so the Page is serialized back and forth when stored in *any*
>>> IPageStore/IDataStore.
>>>
>>> This is the current state in Wicket 1.5.
>>>
>>> Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
>>> improved to work with Page instances, but we decided to postpone this
>>> optimization to 1.5.0+.
>>>
>>> About new String("someLiteral"): I don't remember seeing this code
>>> lately, either in libraries or in applications. This constructor
>>> should be used only when the developer explicitly wants the string
>>> not to be interned and stored in the PermGen space, i.e. it will be
>>> stored in the heap space.
>>> Your benchmark tests exactly this: the heap space.
>>> I'll try the app with MemoryMXBean to see whether the non-heap usage
>>> changes after deserialization.
>>> I'm not very familiar with Java Serialization, but indeed it seems the
>>> Strings are deserialized into the heap. But even in this case they go
>>> into the Eden space, i.e. they are reclaimed soon after.
>>>
>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>> <ri...@gmail.com>  wrote:
>>>>
>>>> If you run the little Java program I included, you will see that
>>>> there is an impact: de-serialized objects take more memory.
>>>>
>>>> Richard
>>>>
>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>
>>>>> string literals are interned by the jvm so they should have a minimal
>>>>> memory impact.
>>>>>
>>>>> -igor
>>>>>
>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>> <ri...@gmail.com>    wrote:
>>>>>>
>>>>>> Martin,
>>>>>>
>>>>>> The reason I was interested is that it struck me a couple of
>>>>>> days ago that when each Page (a tree of Components) is created,
>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>> as instance variables in the tree are shared
>>>>>> between all copies of the Page, but that when such a Page is
>>>>>> serialized to disk and then de-serialized, each String becomes its own
>>>>>> copy unique to that particular Page. This means that if an
>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>> there could be a bunch of memory being used for all the String
>>>>>> copies.
>>>>>>
>>>>>> In the attached simple Java file (yes, I still write Java when I must)
>>>>>> there are three different ways of creating an array of
>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>    new Label(some_string)
>>>>>>
>>>>>> The first is to share the same String over all instances of the Label:
>>>>>>    new Label(the_string)
>>>>>> The second is to make a copy of the String when creating each
>>>>>> Label:
>>>>>>    new Label(new String(the_string))
>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>> bytes and then generate the Labels in the array by de-serializing
>>>>>> the byte array for each Label.
>>>>>>
>>>>>> Needless to say, the first uses the least memory, since the label
>>>>>> string is shared by all Labels, while the second and third approaches
>>>>>> use more memory. Also, if during the de-serialization process the
>>>>>> de-serialized String is replaced with the original instance of the
>>>>>> String, then the third approach uses only as much memory as the
>>>>>> first approach.
>>>>>>
>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>> significant number of Pages in-memory are actually reanimated Pages,
>>>>>> then there could be a memory saving by
>>>>>> making de-serialization smarter about possible shared objects.
>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>> usage patterns, it might be worth looking into.
>>>>>>
>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>> application might fit the use-case.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>
>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>
>>>>>>> But I am no master ;)
>>>>>>>
>>>>>>> **
>>>>>>> Martin



-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
When you say 10000 times, you set NOS_TIMES to 10000?
(NOS_TIMES should have been called ARRAY_SIZE).

Richard


-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Martin Grigorov <mg...@apache.org>.
Running the third method (the 'problematic' one) 10000 times shows no
changes in the PermGen space in the VisualVM graphs.
The value is stable at 7.9 MB.

MemoryMXBean shows that non-heap space increases more than heap space,
but I didn't find any resource explaining what is included in this
non-heap statistic.
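
For readers who want to repeat this kind of measurement, here is a minimal sketch of reading the heap and non-heap figures through the standard java.lang.management API. The class name MemoryReport and the helper usedBytes are made up for this illustration and are not part of the benchmark discussed in the thread. Listing the individual memory pools is one way to see what the non-heap total is made of; which pools appear depends on the JVM and garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class MemoryReport {

    /** Returns the bytes currently used in the heap (true) or non-heap (false) area. */
    public static long usedBytes(boolean heap) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage usage = heap ? memory.getHeapMemoryUsage() : memory.getNonHeapMemoryUsage();
        return usage.getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used:     " + usedBytes(true));
        System.out.println("non-heap used: " + usedBytes(false));

        // The non-heap total is the sum of several pools; print each one by name.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                System.out.println("  " + pool.getName() + ": " + usage.getUsed());
            }
        }
    }
}
```

Calling usedBytes before and after a benchmark run gives the two deltas compared above; the per-pool listing shows which non-heap pool accounts for the growth.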

The proof that PermGen is quite stable can be seen with:  -verbose:gc
-XX:+PrintGCDetails

It produces something like:
[Full GC (System) [PSYoungGen: 0K->0K(76480K)] [PSOldGen:
1372K->1372K(174784K)] 1372K->1372K(251264K) [PSPermGen:
6746K->6746K(16384K)], 0.0198550 secs] [Times: user=0.01 sys=0.00,
real=0.02 secs]

Comparing several such outputs shows that PermGen is stable (not
increasing, not decreasing).

Almost all of the memory allocation happens in the YoungGen and rarely
in the OldGen. This is normal because Label objects are created and
then discarded.
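
The benchmark attachment itself is not reproduced in this thread, so the following is only a sketch of what the three approaches Richard describes might look like. The classes Label and InterningLabel and all method names are hypothetical. Instead of measuring memory, the sketch counts how many distinct String instances the Labels reference, which is the property the memory difference comes from. InterningLabel illustrates the idea of replacing the de-serialized String with the original instance; using String.intern() for that is an assumption of this sketch and only matches the original instance when that instance is itself interned, as string literals are.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class LabelSharing {

    /** Hypothetical stand-in for the benchmark's Label (not Wicket's Label). */
    public static class Label implements Serializable {
        private static final long serialVersionUID = 1L;
        String text;
        public Label(String text) { this.text = text; }
    }

    /** Variant that swaps the de-serialized String for the interned instance. */
    public static class InterningLabel extends Label {
        private static final long serialVersionUID = 1L;
        public InterningLabel(String text) { super(text); }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // Label's fields were already restored before this is called
            text = text.intern();   // for a literal this returns the shared instance
        }
    }

    static byte[] serialize(Label label) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(label);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Label deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Label) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Approach 1: every Label references the same String. */
    public static Label[] shared(String s, int n) {
        Label[] labels = new Label[n];
        for (int i = 0; i < n; i++) labels[i] = new Label(s);
        return labels;
    }

    /** Approach 2: every Label gets its own copy of the String. */
    public static Label[] copied(String s, int n) {
        Label[] labels = new Label[n];
        for (int i = 0; i < n; i++) labels[i] = new Label(new String(s));
        return labels;
    }

    /** Approach 3: one Label is serialized once and de-serialized n times. */
    public static Label[] reanimated(Label prototype, int n) {
        byte[] data = serialize(prototype);
        Label[] labels = new Label[n];
        for (int i = 0; i < n; i++) labels[i] = deserialize(data);
        return labels;
    }

    /** Counts String instances by identity, not by equals(). */
    public static int distinctStrings(Label[] labels) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (Label label : labels) seen.add(label.text);
        return seen.size();
    }

    public static void main(String[] args) {
        String s = "some label text";
        int n = 1000;
        System.out.println("shared:              " + distinctStrings(shared(s, n)));
        System.out.println("copied:              " + distinctStrings(copied(s, n)));
        System.out.println("reanimated:          " + distinctStrings(reanimated(new Label(s), n)));
        System.out.println("reanimated + intern: " + distinctStrings(reanimated(new InterningLabel(s), n)));
    }
}
```

The shared and interning variants end up with a single String instance, while the copied and plainly reanimated variants hold one String per Label. Whether those extra copies matter in practice depends on how long the reanimated objects stay reachable, which is the point made above about short-lived Labels.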




-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.

On 07/20/2011 10:03 AM, Martin Grigorov wrote:
> Hi Richard,
>
> With the serialization optimizations you optimize only the second and
> third level stores, i.e. the runtime memory is still almost the same.
> You'll gain only if you have a bigger second level cache, which is used
> when the user uses the browser back button. And I think this is not so
> often.
Just a thought, and maybe a little off topic:
What if, when a Page is first generated (when it is first
loaded given its class, prior to use), the Page is serialized and
the bytes are put into a cache Map from class name to bytes?
Then, on subsequent requests for the page (same session or
different session), the bytes are found in the cache and simply
de-serialized.
Would it work?
Would it be better (choose some criteria)?
Thanks
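
To make the idea concrete, here is a rough sketch of such a cache in plain Java. It is not Wicket API: PagePrototypeCache, its get method and the HomePage class are all hypothetical, and the sketch ignores constructor parameters and per-session state that a real page may depend on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class PagePrototypeCache {

    /** Serialized form of one freshly constructed page per class name. */
    private final Map<String, byte[]> bytesByClassName = new ConcurrentHashMap<>();

    /** Returns a new instance; the factory runs only the first time a class is requested. */
    public <T extends Serializable> T get(Class<T> type, Supplier<T> factory) {
        byte[] data = bytesByClassName.computeIfAbsent(
                type.getName(), name -> serialize(factory.get()));
        return type.cast(deserialize(data));
    }

    private static byte[] serialize(Serializable page) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(page);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    private static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical page class used only for this demonstration. */
    public static class HomePage implements Serializable {
        private static final long serialVersionUID = 1L;
        public static int constructed = 0; // static, so not part of the serialized form
        public String title = "Home";
        public HomePage() { constructed++; }
    }

    public static void main(String[] args) {
        PagePrototypeCache cache = new PagePrototypeCache();
        HomePage first = cache.get(HomePage.class, HomePage::new);
        HomePage second = cache.get(HomePage.class, HomePage::new);
        System.out.println("distinct instances: " + (first != second));
        System.out.println("constructor calls:  " + HomePage.constructed);
    }
}
```

Java serialization does not run the constructor of a Serializable class on de-serialization, so the second request skips construction entirely. Note that every instance produced this way is a reanimated one, so each carries its own copies of Strings such as title, which is exactly the memory effect this thread started with.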

>
> About Scala vs. Java conciseness: I guess you read this thread -
> http://groups.google.com/group/scala-user/browse_thread/thread/ea4d4dda2352a523#
> Here and in the previous thread on this topic the functional guys
> suggest solutions which I think are not that easy to read and, as
> proven, the speed is far from that of the imperative solution. Odersky
> explains it well in his response.
Ha. Yeah, I have been following that discussion.
I tend to write OO-Scala and not FP-Scala.
Partly because that is the way my mind works but also
because if FP was so great, Lisp would have ended the discussion
(or maybe Haskell would have) and all enterprise applications would
be written in Lisp - but, of course, if you search the IBM/Oracle/SAP
sites you don't find any Lisp enterprise applications for sale
(ok, having said this, someone will find one, I admit defeat, etc.).
Also, it's a lot easier to understand, debug and log OO vs FP code
(but, again, that is just my enterprise application development
background speaking).

>
> About the questions - the simple answer is that a Component can have
> just one parent, so it is not possible to reuse it either in the same
> page or in a different page. The same is true of its collection of
> children. This is the current state.
Well, I guess the immutable Component would have to have a mutable
parent reference.

Thanks
Richard
-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Martin Grigorov <mg...@apache.org>.
Hi Richard,

With the serialization optimizations you optimize only the second and
third level stores, i.e. the runtime memory is still almost the same.
You'll gain only if you have a bigger second level cache, which is used
when the user uses the browser back button. And I think this is not so
often.
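
To make the split concrete, here is a minimal Java sketch. It is not
Wicket code: TwoLevelStoreSketch and all of its methods are made-up
names, and plain Java Serialization stands in for whatever serializer
is plugged in. A smaller serialized format only shrinks the byte[]
values in the second map; the live objects in the first map cost the
same as before.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Wicket's IPageStore/IDataStore implementation.
class TwoLevelStoreSketch {
    // First level: live instances; a serializer change does not affect these.
    private final Map<Integer, Serializable> live = new HashMap<>();
    // Second level: serialized bytes; only these shrink with a better format.
    private final Map<Integer, byte[]> serialized = new HashMap<>();

    void store(int pageId, Serializable page) throws IOException {
        live.put(pageId, page);
        serialized.put(pageId, toBytes(page));
    }

    // Simulates the page falling out of the first level.
    void evictLive(int pageId) {
        live.remove(pageId);
    }

    // Returns the live instance if present, else de-serializes a fresh copy.
    Object get(int pageId) throws IOException, ClassNotFoundException {
        Serializable page = live.get(pageId);
        if (page != null) {
            return page;
        }
        byte[] data = serialized.get(pageId);
        return data == null ? null : fromBytes(data);
    }

    int serializedSize(int pageId) {
        return serialized.get(pageId).length;
    }

    static byte[] toBytes(Serializable o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TwoLevelStoreSketch store = new TwoLevelStoreSketch();
        String page = "pretend this is a page";
        store.store(1, page);
        System.out.println(store.get(1) == page);      // live hit: same instance
        store.evictLive(1);
        System.out.println(store.get(1).equals(page)); // rebuilt from the bytes
        System.out.println(store.serializedSize(1) + " bytes in the second level");
    }
}
```

The only place a more compact serializer shows up here is in
serializedSize(); nothing in the first map changes.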

About Scala vs. Java conciseness: I guess you read this thread -
http://groups.google.com/group/scala-user/browse_thread/thread/ea4d4dda2352a523#
Here and in the previous thread on this topic the functional guys
suggest solutions which I think are not that easy to read and, as
proven, the speed is far from that of the imperative solution. Odersky
explains it well in his response.

About the questions - the simple answer is that a Component can have
just one parent, so it is not possible to reuse it either in the same
page or in a different page. The same is true of its collection of
children. This is the current state.
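
A rough illustration of what "just one parent" means for sharing. This
is a hypothetical Java sketch, not Wicket's Component/MarkupContainer
code: the Node class and the choice to throw when a child already has a
parent are assumptions made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single-parent component tree (not Wicket code).
class SingleParentSketch {
    static class Node {
        final String id;
        private Node parent;                        // at most one parent
        private final List<Node> children = new ArrayList<>();

        Node(String id) {
            this.id = id;
        }

        Node getParent() {
            return parent;
        }

        // Adds a child; in this sketch a child that already has a parent is rejected.
        void add(Node child) {
            if (child.parent != null) {
                throw new IllegalStateException(
                    child.id + " already belongs to " + child.parent.id);
            }
            child.parent = this;
            children.add(child);
        }
    }

    public static void main(String[] args) {
        Node pageA = new Node("pageA");
        Node pageB = new Node("pageB");
        Node label = new Node("label");
        pageA.add(label);
        try {
            pageB.add(label);                       // same instance, second tree
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the parent reference lives in the child, one instance cannot
sit in two trees at once, whether or not the rest of its state is
immutable.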

On Wed, Jul 20, 2011 at 7:44 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> On Wed, Jul 20, 2011 at 9:00 AM, richard emberson
> <ri...@gmail.com> wrote:
>> I have many examples of such Java bloat. Consider the getKey method
>> in the org/apache/wicket/util/value/ValueMap.java class:
>>
>> Java version:
>>
>>  public String getKey(final String key)
>>  {
>>    for (Object keyValue : keySet())
>>    {
>>      if (keyValue instanceof String)
>>      {
>>        String keyString = (String)keyValue;
>>        if (key.equalsIgnoreCase(keyString))
>>        {
>>          return keyString;
>>        }
>>      }
>>    }
>>    return null;
>>  }
>>
>> Scala version:
>>
>>  def getKey(key: String): Option[String] =
>>    keySet find { s => key.equalsIgnoreCase(s) }
>
> that is a bad example. that method was there since the times valuemaps
> supported non-string keys, thats what all the noise was about. your
> code doesnt support non string keys, and i just cleaned up ours so
> it doesnt have to worry about it either. thanks for pointing it out :)
>
> here it is in its concise form :
> public String getKey(String key) {
>        for (String other : keySet()) if (other.equalsIgnoreCase(key)) return other;
>        return null;
> }
>
> it all depends on formatting
>>
>> The Scala version reads like a sentence: "For the keys find
>> the key which equals, ignoring case, the key parameter."
>> The Java code is just so sad in comparison.
>
> not in my concise version, though, is it? however, the concise version
> is harder for some people to read, so we use very generous formatting
> rules when it comes to spacing and curly braces.
>
>> I did have 2 questions buried in my previous email.
>> Both having to do with serialization of an object when
>> it appears as 2nd (3rd, etc.) time during the serialization
>> process.
>
> serialization handles multiple references to the same instance. so if
> you have the same instance showing up more than once in the
> serialization graph it is only written out once. this is how circular
> references are handled as well.
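
The behaviour described above can be checked with plain Java
Serialization. A minimal sketch follows; SharedReferenceSketch, Payload
and the helper methods are names invented for this example. After a
round trip, the two list slots still point at one object, and a
two-object cycle comes back as a cycle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustration only; the class and method names are made up for this sketch.
class SharedReferenceSketch {
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Payload partner;                 // lets us build a circular reference

        Payload(String name) {
            this.name = name;
        }
    }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The same instance appears twice in the graph.
        Payload shared = new Payload("label");
        List<Payload> twice = new ArrayList<>();
        twice.add(shared);
        twice.add(shared);

        @SuppressWarnings("unchecked")
        List<Payload> copy = (List<Payload>) fromBytes(toBytes(twice));
        System.out.println(copy.get(0) == copy.get(1));   // still one instance

        // A circular reference survives the round trip as well.
        Payload a = new Payload("a");
        Payload b = new Payload("b");
        a.partner = b;
        b.partner = a;
        Payload a2 = (Payload) fromBytes(toBytes(a));
        System.out.println(a2.partner.partner == a2);
    }
}
```

Note that the stream tracks object identity, not equality: two distinct
but equal objects are each written out in full.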
>
>> So, first, is it possible, likely, allowed, excluded, etc. that
>> the same Component can appear more than once in the same
>> Page tree? Would it make sense or even be possible for the
>> same Form object to appear more than once in the same Page tree?
>> Not two copies of a Form, but the single instance in two places
>> on a Page?
>> If it should never happen, is there code in Wicket that ensures
>> that it does not happen?
>
> it is not allowed, see page#componentRendered()
>
>> Secondly, for a Component that is immutable in a given Page,
>> could it appear, be reused, in the "same" Page in different
>> Sessions (different clients)? Other areas of such Pages would
>> be different, hold different data, but could the immutable part
>> be same object? As an example, a read-only Label object, could
>> it be used in the same place in the same Page "type" but in
>> different Sessions? Is there any mechanism in Wicket currently
>> that could identify such possible reuse?
>
> sharing component instances between pages is a bad idea, sharing them
> between sessions is even worse. code is constantly refactored, what is
> immutable now will most likely not be immutable later. i would hate
> coding wicket if every time i made a change to someone else's
> component i would have to check if i just made something immutable
> mutable and possibly cause a security leak.
>
> -igor
>
>
>> After memory comes performance and that's a much harder nut to
>> crack. To track down bugs in the Scala port I had to put
>> detailed logging into both the Java and Scala versions.
>> What was most surprising was the amount of code that
>> had to be executed, multiple times, just to render the
>> simplest Page in a unit test - tens of pages of logging
>> output. I do not understand all that is truly happening
>> within Wicket to render a Page yet, but it's on my todo list.
>> And, maybe, there is no issue.
>>
>> Richard
>> Thanks.
>>
>>
>> On 07/20/2011 03:04 AM, Martin Grigorov wrote:
>>>
>>> Hi Richard,
>>>
>>> 1. Scala traits are something useful which I hope to have someday in Java
>>> too.
>>> They can help make some code reusable when it is not possible to
>>> have a common base class. In the end a trait is a partial base class...
>>>
>>> 2. I'm not sure what problem you are after with this optimization in
>>> the serialized version of the object (its bytes).
>>> Your quest will not improve the runtime memory consumption because the
>>> trait's properties are mixed with the class instance properties. You
>>> may have problems with PermGen though because Scala produces classes
>>> for every "with Foo" (and for every Function/closure).
>>> You are trying to improve the size (and speed?) of the produced bytes
>>> after serialization. This will reduce the size of two of the page
>>> caches: the second (application scope) and the third
>>> (disk). The first level (http session) contains page instances (not
>>> serialized). Check https://cwiki.apache.org/confluence/x/qIaoAQ for
>>> more information.
>>>
>>> RAM and especially HDD are cheap today, so I think the benefit of your
>>> optimization will not be big. As a proof I can say that there are no
>>> complaints in the mailing lists that Wicket produces too big files for
>>> the third level cache. The general complaint is that the http session
>>> footprint is bigger than in action-based web frameworks but I think this
>>> is because using a custom o.a.w.Session is so comfortable that people
>>> start putting a lot of state there. The next reason is the first-level
>>> cache but even this is easy to "solve" - just implement a custom
>>> IPageManager or override the default one to not use the http session as
>>> first level cache.
>>>
>>> Recently we reworked a bit the code related to page serialization and
>>> now it is possible to use any library specialized in object
>>> serialization (see https://github.com/eishay/jvm-serializers/wiki).
>>> The schema based ones (like Apache Avro, Thrift, Protobuf, ...) will
>>> be harder to use but not impossible.
>>> The schemaless ones (Java Serialization, Kryo, XStream, ...) are
>>> easier to use with Wicket. You may check the Kryo based serializer at:
>>> https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
>>> It is faster than Java Serialization and produces fewer bytes.
>>>
>>> On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
>>> <ri...@gmail.com>  wrote:
>>>>
>>>> Martin,
>>>>
>>>> The reason I was interested in Wicket memory usage was that
>>>> the potential use of Scala traits, rather than the two possible
>>>> Java approaches, might be compelling when it comes to memory usage.
>>>>
>>>> First, the two Java approaches: proxy/wrapper object or bundle everything
>>>> into the base class.
>>>>
>>>> The proxy/wrapper approach lets one have a single implementation
>>>> that can be shared by multiple classes. The downside is that the
>>>> proxy/wrapper object requires an additional reference in the
>>>> class using it and hence additional memory usage.
>>>>
>>>> The bundle everything into the base class approach violates
>>>> OOP 101 dictum about having small objects focused on their
>>>> own particular behavior thus avoiding bloat.
>>>> (Not executable Java/Scala code below.)
>>>>
>>>> interface Parent {
>>>>  getParent
>>>>  setParent
>>>> }
>>>> // Potentially shared implementation
>>>> class ParentProxy implements Parent {
>>>>  parent
>>>>  getParent = parent
>>>>  setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> // Issue: Has additional instance variable: parentProxy
>>>> class CompWithProxy with Parent {
>>>>  parentProxy = new ParentProxy
>>>>  getParent = parentProxy.getParent
>>>>  setParent(parent) = parentProxy.setParent(parent)
>>>> }
>>>>
>>>> // Issue: Does not share implementation
>>>> class CompAllInOne with Parent {
>>>>  parent
>>>>  getParent = parent
>>>>  setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> Wicket has taken the "bundle everything into the base class" approach
>>>> in order to lessen memory usage - certainly a reasonable Java approach
>>>> to the problem.
>>>>
>>>> With Scala one can do the following:
>>>>
>>>> // Shared implementation
>>>> trait ParentTrait {
>>>>  parent
>>>>  getParent = parent
>>>>  setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> // Uses implementation
>>>> class Comp with ParentTrait
>>>>
>>>> The implementation, ParentTrait, can be used by any
>>>> number of classes.
>>>> In addition, one can add to a base class any number of
>>>> such implementation traits sharing multiple implementations
>>>> across multiple classes.
>>>>
>>>> So, can using such an approach result in smaller (less in-memory)
>>>> objects in Scala than in Java?
>>>>
>>>> The ParentTrait does not really save very much. I assume
>>>> that it's only the Page class and sub-classes that do not have
>>>> parent components in Wicket, so the savings per Page component
>>>> tree are very small indeed. But, there are other behaviors that
>>>> can be converted to traits, for example, Models.
>>>> Many of the instance variables in the Java Models which
>>>> take memory can be converted to method return values, which only
>>>> add to the size of the class, not to every instance of the class.
>>>> Also, with Model traits that use Component self-types, one can
>>>> do away with IComponentAssignedModel wrapping and such.
>>>>
>>>> So, how to demonstrate such memory differences. I created
>>>> stripped down versions of the Component and Label classes in
>>>> both Java and Scala (only ids and Models) .
>>>> Created different Model usage scenarios
>>>> with Model object in Java and Traits in Scala, and, finally,
>>>> serialized (Java Serialization) the result comparing the size
>>>> of the resulting array of bytes. There are two runs, one with
>>>> all Strings being the empty string and the next where all
>>>> strings are 10-character strings:
>>>>
>>>> The Java versions (empty string):
>>>> Label.Empty               99
>>>> Label.ReadOnly           196
>>>> Label.ReadWrite          159
>>>> Label.Resource           333
>>>> Label.Property           223
>>>> Label.ComponentProperty  351
>>>> Label.CompoundProperty   208
>>>>
>>>> The Scala versions (empty string):
>>>> Label.Empty              79
>>>> Label.ReadOnly           131
>>>> Label.ReadWrite          150
>>>> Label.Resource           164
>>>> Label.Property           207
>>>> Label.ComponentProperty  134
>>>> Label.CompoundProperty   184
>>>>
>>>>
>>>> The Java versions (10-character strings):
>>>> Label.Empty              109
>>>> Label.ReadOnly           214
>>>> Label.ReadWrite          177
>>>> Label.Resource           359
>>>> Label.Property           241
>>>> Label.ComponentProperty  369
>>>> Label.CompoundProperty   218
>>>>
>>>>
>>>> The Scala versions (10-character strings):
>>>> Label.Empty               89
>>>> Label.ReadOnly           149
>>>> Label.ReadWrite          168
>>>> Label.Resource           190
>>>> Label.Property           225
>>>> Label.ComponentProperty  152
>>>> Label.CompoundProperty   194
>>>>
>>>> [Note that the Java Label.Empty result is misleading since in Wicket
>>>> there is no memory overhead when a Component has no Model.]
>>>>
>>>> While this does indicate that using Model traits with Scala
>>>> will result in less memory usage than the comparable Java
>>>> approach, Java Serialization adds a whole lot of extra stuff
>>>> to the array of bytes that masks the true change in
>>>> in-memory usage. With Java Serialization, the class descriptor
>>>> for each instance serialized is also added to the byte array and,
>>>> it is this, that takes up most of the array of bytes.
>>>>
>>>> Thinking about it, I realized that Java Serialization is rather
>>>> a blunt tool when it comes to the requirement of (Scala) Wicket
>>>> Page serialization. Java Serialization creates a byte array
>>>> that is rather self-contained/self-descriptive in its content.
>>>> This is not required for (Scala) Wicket which has very
>>>> specific requirements and use-cases.
>>>>
>>>> But first, before I describe what I did, here are the results.
>>>> The byte array size data for the serializer I created just to
>>>> show that one can do a lot better than Java Serialization:
>>>>
>>>> The Scala versions (empty string):
>>>> Label.Empty                6
>>>> Label.ReadOnly             8
>>>> Label.ReadWrite            8
>>>> Label.Resource            10
>>>> Label.Property            13
>>>> Label.ComponentProperty    8
>>>> Label.CompoundProperty    11
>>>>
>>>> The Scala versions (10-character strings):
>>>> Label.Empty                8
>>>> Label.ReadOnly            12
>>>> Label.ReadWrite           12
>>>> Label.Resource            16
>>>> Label.Property            17
>>>> Label.ComponentProperty   12
>>>> Label.CompoundProperty    13
>>>>
>>>> Yes, better by more than a factor of 10. I assume factors
>>>> of 10 are compelling.
>>>>
>>>> So, back to the requirements. I spent a couple of days creating
>>>> the serializer (currently 3.8Kloc) that focused on what I thought
>>>> would be needed by (Scala) Wicket.
>>>> The same application using (Scala) Wicket is running on either a
>>>> single machine or a group of machines.
>>>> The serialized Page system can have:
>>>>
>>>>  In-memory repository
>>>>    (single-machine, testing);
>>>>  In-memory cache with local disk backstore
>>>>    (single-machine, production, re-start) and
>>>>  In-memory cache with database backstore used by a number of machines
>>>>    (multi-machine, production, fail-over, session-migration, re-start)
>>>>
>>>>  Strings and associated id are cached/backstored where it is the id
>>>>    that is used in the serialized array.
>>>>  Classes and associated id are cached/backstored where it is the id
>>>>    that is used in the serialized array.
>>>>  Optimizations allow, for example, the Long value 1L to be serialized
>>>>    as 1 byte or (un-optimized) as 9 bytes.
>>>>  When using a backstore, a header is prepended to each byte array
>>>>    that includes the serializer magic number (2 bytes), serializer
>>>>    protocol version (2 bytes?) and application information (version,
>>>>    etc.) (2 bytes?).
>>>>
>>>> In addition, there are two cases where one might be serializing
>>>> the same object more than once.
>>>>
>>>> The first case is dealt with by most serializers, an object
>>>> appears more than once in the tree of objects being serialized.
>>>> Java Serialization deals with this. One must keep track of
>>>> the identity of all objects being serialized. Then, if an object
>>>> appears for serialization for a second (third, etc.) time, some
>>>> sort of reference object and tag is serialized rather than the
>>>> object. De-serialization is ....  obvious.
>>>> I do not know, but I assume that this does not arise in Wicket; the
>>>> same Component appearing more than once in the same Page tree of
>>>> components. If it does happen, please let me know. If it should
>>>> not happen but could, is there some visitor well-formedness traversal
>>>> that checks for duplicate object appearances in a given tree?
>>>>
>>>> The second case is one that probably does (or could) occur with
>>>> Wicket and I've never heard of a serializer dealing with, namely,
>>>> the same object appears in more than one Page tree - knowledge
>>>> of what is being serialized is shared across serializations.
>>>> For this to work, the
>>>> Component (which could be a tree of Components) has to be
>>>> immutable like a Label with a read-only value or read-only Model
>>>> (and the Model object is never changed), etc. Here, there can be
>>>> a saving if the shared object is serialized in its own backstore
>>>> and only its identifier appears in the byte arrays of each Page.
>>>> If there was an Immutable interface which could tag immutable
>>>> objects, it would be much easier for the serializer to identify
>>>> them (well, not just easier, but, rather, plain old possible
>>>> versus impossible) - just a last minute thought.
>>>>
>>>> I've not created a Java version of my serializer. But, since the
>>>> Scala version does not use much Scala magic, a port to Java
>>>> would not be too hard. I also have some 500 unit tests.
>>>>
>>>> Well, enough for now.
>>>>
>>>> Richard
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 07/10/2011 02:37 AM, Martin Grigorov wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> About the use cases: my experience is that most of the time the
>>>>> in-memory pages are used (for each listener callback execution, for ajax
>>>>> requests,...).
>>>>> A previous version of a page, or a previous page, is needed when the user
>>>>> clicks the browser back button. Even in this case most of the time the
>>>>> in-memory cache is hit. Only when the user goes several pages back and
>>>>> this page is not in-memory is the disk store used.
>>>>>
>>>>> So far so good, but...! Even in-memory store contains serialized
>>>>> versions of the Page, named SerializedPage. This is a struct which
>>>>> contains
>>>>> {
>>>>>   sessionId: String,
>>>>>   pageId: int,
>>>>>   data: byte[]
>>>>> }
>>>>> so the Page is serialized back and forth when stored in *any*
>>>>> IPageStore/IDataStore.
>>>>>
>>>>> This is the current state in Wicket 1.5.
>>>>>
>>>>> Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
>>>>> improved to work with Page instances but we decided to postpone this
>>>>> optimization for 1.5.0+.
>>>>>
>>>>> About new String("someLiteral"): I don't remember seeing this code
>>>>> lately, either in libraries or in applications. This constructor
>>>>> should be used only when the developer explicitly wants this string to
>>>>> not be interned and stored in the PermGen space, i.e. it will be
>>>>> stored in the heap space.
>>>>> Your benchmark test tests exactly this - the heap space.
>>>>> I'll try the app with MemoryMXBean to see whether the non-heap changes
>>>>> after deserialization.
>>>>> I'm not very into Java Serialization but indeed it seems the Strings
>>>>> are deserialized in the heap. But even in this case they go in the
>>>>> Eden space, i.e. they are reclaimed soon after.
>>>>>
>>>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>>>> <ri...@gmail.com>    wrote:
>>>>>>
>>>>>> If you run the little Java program I included, you will see that
>>>>>> there is an impact - de-serialized objects take more memory.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>>>
>>>>>>> string literals are interned by the jvm so they should have a minimal
>>>>>>> memory impact.
>>>>>>>
>>>>>>> -igor
>>>>>>>
>>>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>>>> <ri...@gmail.com>      wrote:
>>>>>>>>
>>>>>>>> Martin,
>>>>>>>>
>>>>>>>> The reason I was interested was because it struck me a couple of
>>>>>>>> days ago that when each Page, a tree of Components, is created,
>>>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>>>> as instance variables in the tree are shared
>>>>>>>> between all copies of the Page, but that when such a Page is
>>>>>>>> serialized to disk and then de-serialized, each String becomes its
>>>>>>>> own copy unique to that particular Page. This means that if an
>>>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>>>> there could be a bunch of memory being used for all the String
>>>>>>>> copies.
>>>>>>>>
>>>>>>>> In the attached simple Java file (yes, I still write Java when I
>>>>>>>> must) there are three different ways of creating an array of
>>>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>>>    new Label(some_string)
>>>>>>>>
>>>>>>>> The first is to share the same String over all instances of the Label:
>>>>>>>>    new Label(the_string)
>>>>>>>> The second is to make a copy of the String when creating each
>>>>>>>> Label:
>>>>>>>>    new Label(new String(the_string))
>>>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>>>> bytes and then generate the Labels in the array by de-serializing
>>>>>>>> the byte array for each Label.
>>>>>>>>
>>>>>>>> Needless to say, the first uses the least memory; the label string
>>>>>>>> is shared by all Labels while the second and third approach
>>>>>>>> uses more memory. Also, if during the de-serialization process, the
>>>>>>>> de-serialized String is replaced with the original instance of the
>>>>>>>> String, then the third approach uses only as much memory as the
>>>>>>>> first approach.
>>>>>>>>
>>>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>>>> significant number of Pages in-memory are actually reanimated Pages,
>>>>>>>> then there could be a memory saving by
>>>>>>>> making de-serialization smarter about possible shared objects.
>>>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>>>> usage patterns, it might be worth looking into.
>>>>>>>>
>>>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>>>> application might fit the use-case.
>>>>>>>>
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>>>
>>>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>>>
>>>>>>>>> But I am no master ;)
>>>>>>>>>
>>>>>>>>> **
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> 2011/7/9 richard emberson<ri...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>> This is a question for Wicket masters and those application
>>>>>>>>>> builders whose applications match the criteria specified below.
>>>>>>>>>>
>>>>>>>>>> [In this case, a Wicket master is someone with a knowledge
>>>>>>>>>> of how Wicket is being used in a wide spectrum of applications
>>>>>>>>>> so that they have a feel for what use-cases exist in the real
>>>>>>>>>> world.]
>>>>>>>>>>
>>>>>>>>>> Wicket is used in a wide range of applications with a variety of
>>>>>>>>>> usage patterns. What I am interested in are those applications
>>>>>>>>>> where an appreciable number of the pages in memory are pages that
>>>>>>>>>> had previously been serialized and stored to disk and then
>>>>>>>>>> reanimated, i.e. not found in an in-memory cache and so read from
>>>>>>>>>> disk and de-serialized back into an in-memory page; which is to say,
>>>>>>>>>> applications with an appreciable number of reanimated pages.
>>>>>>>>>>
>>>>>>>>>> Firstly, do such applications exist? These are real-world
>>>>>>>>>> applications where a significant number of pages in-memory
>>>>>>>>>> are reanimated pages.
>>>>>>>>>>
>>>>>>>>>> For such applications, what percentage of all pages at any
>>>>>>>>>> given time are reanimated pages?
>>>>>>>>>> Is it, say, a couple of percent? Two or three, in which case it's
>>>>>>>>>> not very significant.
>>>>>>>>>> Or, is it, say, 50%? Meaning that half of all pages currently in
>>>>>>>>>> memory had been serialized to disk, flushed from any in-memory
>>>>>>>>>> cache and then, as needed, de-serialized back into a Page.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Richard



-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
>
> lol, so scala has a built in isOneOf, of course it wins there...this
> is of course a non-example. im not sure why some of our code is so
> bloated, its been there for years. i cleaned this one up too, here is
> the "concise" version:
>
> private boolean isOneOf(final char ch, final char[] charray) {
>     for (char c : charray) if (c==ch) return true;
>     return false;
> }
> what does the scala code for exists() look like? :)

Good re-write.
The Scala exists code pretty much looks like a generic version
of the isOneOf code.
The FP folks would point out that the difference is that there
are a bunch of such canned methods on all collection objects
and they are designed to be chained together.
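
For comparison, a sketch of what a generic exists and the chaining
style look like in Java. This assumes Java 8 or later (lambdas and
streams), which was not available when this thread was written;
ExistsSketch and its method names are made up for the example and are
not Wicket code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Illustration only; assumes Java 8+ and uses names invented for this sketch.
class ExistsSketch {
    // Generic form of isOneOf: true if any element satisfies the test.
    static <T> boolean exists(Iterable<T> items, Predicate<? super T> test) {
        for (T item : items) {
            if (test.test(item)) {
                return true;
            }
        }
        return false;
    }

    // isOneOf via the canned anyMatch; char[] has no stream of its own,
    // so the array is walked by index.
    static boolean isOneOf(char ch, char[] charray) {
        return IntStream.range(0, charray.length).anyMatch(i -> charray[i] == ch);
    }

    // Chained style, comparable to the Scala getKey that returns an Option.
    static Optional<String> getKey(Collection<String> keys, String key) {
        return keys.stream().filter(key::equalsIgnoreCase).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(exists(Arrays.asList(1, 2, 3), n -> n > 2));
        System.out.println(isOneOf(';', new char[] {',', ';'}));
        System.out.println(getKey(Arrays.asList("Alpha", "Beta"), "beta"));
    }
}
```

The generic exists is the same loop as the concise isOneOf with the
comparison passed in; the stream versions show the canned, chainable
form.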

-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Igor Vaynberg <ig...@gmail.com>.
On Wed, Jul 20, 2011 at 10:19 AM, richard emberson
<ri...@gmail.com> wrote:
> Thanks Igor.
>
>> it is not allowed, see page#componentRendered()
> Thanks.
>
>> sharing component instances between pages
> I am going to have to think about all of this.
> Maybe making mutable and immutable versions of things
> or, maybe, an Immutable trait (interface) that signals
> intent (but, of course, would not enforce it).
>
>> that is a bad example
>
> Maybe here's a better example (actually, a rather extreme example):
>
> org/apache/wicket/util/upload/ParameterParser.java
>
>  private def isOneOf(ch: Char, charray: Array[Char]): Boolean =
>    charray exists { _ == ch }
>
>  private boolean isOneOf(final char ch, final char[] charray)
>  {
>    boolean result = false;
>    for (char character : charray)
>    {
>      if (ch == character)
>      {
>        result = true;
>        break;
>      }
>
>    }
>    return result;
>  }

lol, so scala has a built in isOneOf, of course it wins there...this
is of course a non-example. im not sure why some of our code is so
bloated, its been there for years. i cleaned this one up too, here is
the "concise" version:

private boolean isOneOf(final char ch, final char[] charray) {
   for (char c : charray) if (c==ch) return true;
   return false;
}

what does the scala code for exists() look like? :)

-igor


>
> I am not trying to (re-)start any wars here.
> I do not think it's all due to formatting.
> Currently, for 1.5-RC5.1 loc:
> Java Wicket:  154556
> Scala Wicket: 118617
> and it's not really possible to use some of the more-terse
> aspects of Scala because that would require a rather larger
> porting/re-writing effort.
>
>
> Richard
>
> On 07/20/2011 09:44 AM, Igor Vaynberg wrote:
>>
>> On Wed, Jul 20, 2011 at 9:00 AM, richard emberson
>> <ri...@gmail.com>  wrote:
>>>
>>> I have many examples of such Java bloat. Consider the getKey method
>>> in the org/apache/wicket/util/value/ValueMap.java class:
>>>
>>> Java version:
>>>
>>>  public String getKey(final String key)
>>>  {
>>>    for (Object keyValue : keySet())
>>>    {
>>>      if (keyValue instanceof String)
>>>      {
>>>        String keyString = (String)keyValue;
>>>        if (key.equalsIgnoreCase(keyString))
>>>        {
>>>          return keyString;
>>>        }
>>>      }
>>>    }
>>>    return null;
>>>  }
>>>
>>> Scala version:
>>>
>>>  def getKey(key: String): Option[String] =
>>>    keySet find { s =>  key.equalsIgnoreCase(s) }
>>
>> that is a bad example. that method was there since the times valuemaps
>> supported non-string keys, thats what all the noise was about. your
>> code doesnt support non string keys, and i just cleaned up ours so
>> it doesnt have to worry about it either. thanks for pointing it out :)
>>
>> here it is in its concise form :
>> public String getKey(String key) {
>>        for (String other : keySet()) if (other.equalsIgnoreCase(key)) return other;
>>        return null;
>> }
>>
>> it all depends on formatting
>>>
>>> The Scala version reads like a sentence: "For the keys find
>>> the key which equals, ignoring case, the key parameter."
>>> The Java code is just so sad in comparison.
>>
>> not in my concise version, though, is it? however, the concise version
>> is harder for some people to read, so we use very generous formatting
>> rules when it comes to spacing and curly braces.
>>
>>> I did have 2 questions buried in my previous email.
>>> Both having to do with serialization of an object when
>>> it appears as 2nd (3rd, etc.) time during the serialization
>>> process.
>>
>> serialization handles multiple references to the same instance. so if
>> you have the same instance showing up more than once in the
>> serialization graph it is only written out once. this is how circular
>> references are handled as well.
>>
>>> So, first, is it possible, likely, allowed, excluded, etc. that
>>> the same Component can appear more than once in the same
>>> Page tree? Would it make sense or even be possible for the
>>> same Form object to appear more than once in the same Page tree?
>>> Not two copies of a Form, but the single instance in two places
>>> on a Page?
>>> If it should never happen, is there code in Wicket that ensures
>>> that it does not happen?
>>
>> it is not allowed, see page#componentRendered()
>>
>>> Secondly, for a Component that is immutable in a given Page,
>>> could it appear, be reused, in the "same" Page in different
>>> Sessions (different clients)? Other areas of such Pages would
>>> be different, hold different data, but could the immutable part
>>> be same object? As an example, a read-only Label object, could
>>> it be used in the same place in the same Page "type" but in
>>> different Sessions? Is there any mechanism in Wicket currently
>>> that could identify such possible reuse?
>>
>> sharing component instances between pages is a bad idea, sharing them
>> between sessions is even worse. code is constantly refactored, what is
>> immutable now will most likely not be immutable later. i would hate
>> coding wicket if every time i made a change to someone else's
>> component i would have to check if i just made something immutable
>> mutable and possibly cause a security leak.
>>
>> -igor
>>
>>
>>> After memory comes performance and that's a much harder nut to
>>> crack. To track down bugs in the Scala port I had to put
>>> detailed logging into both the Java and Scala versions.
>>> What was most surprising was the amount of code that
>>> had to be executed, multiple times, just to render the
>>> simplest Page in a unit test - tens of pages of logging
>>> output. I do not understand all that is truly happening
>>> within Wicket to render a Page yet, but it's on my todo list.
>>> And, maybe, there is no issue.
>>>
>>> Richard
>>> Thanks.
>>>
>>>
>>> On 07/20/2011 03:04 AM, Martin Grigorov wrote:
>>>>
>>>> Hi Richard,
>>>>
>>>> 1. Scala traits are something useful which I hope to have someday in
>>>> Java
>>>> too.
>>>> They can help make some code reusable when it is not possible to
>>>> have a common base class. In the end a trait is a partial base class...
>>>>
>>>> 2. I'm not sure what problem you are after with this optimization in
>>>> the serialized version of the object (its bytes).
>>>> Your quest will not improve the runtime memory consumption because the
>>>> trait's properties are mixed with the class instance properties. You
>>>> may have problems with PermGen though because Scala produces classes
>>>> for every "with Foo" (and for every Function/closure).
>>>> You are trying to improve the size (and speed?) of the produced bytes
>>>> after serialization. While this will reduce the size of two of the
>>>> page caches, the second (application scope) and the third (disk), the
>>>> first level (http session) contains page instances (not
>>>> serialized). Check https://cwiki.apache.org/confluence/x/qIaoAQ for
>>>> more information.
>>>>
>>>> RAM and especially HDD are cheap today, so I think the benefit of your
>>>> optimization will not be big. As proof I can say that there are no
>>>> complaints in the mailing lists that Wicket produces too big files for
>>>> the third level cache. The general complaint is that http session
>>>> footprint is bigger than action-based web frameworks but I think this
>>>> is because using custom o.a.w.Session is so comfortable that people
>>>> start putting a lot of state there. The next reason is first-level
>>>> cache but even this is easy to "solve" - just implement custom
>>>> IPageManager or override the default one to not use http session as
>>>> first level cache.
>>>>
>>>> Recently we reworked a bit the code related to page serialization and
>>>> now it is possible to use any library specialized in object
>>>> serialization (see https://github.com/eishay/jvm-serializers/wiki).
>>>> The schema based ones (like Apache Avro, Thrift, Protobuf, ...) will
>>>> be harder to use but not impossible.
>>>> The schemaless ones (Java Serialization, Kryo, XStream, ...) are
>>>> easier to use with Wicket. You may check Kryo based serializer at
>>>>
>>>>
>>>> https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
>>>> . It is faster than Java Serialization and produces fewer bytes.
>>>>
>>>> On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
>>>> <ri...@gmail.com>    wrote:
>>>>>
>>>>> Martin,
>>>>>
>>>>> The reason I was interested in Wicket memory usage was that
>>>>> the potential use of Scala traits, rather than the two possible
>>>>> Java approaches, might be compelling when it comes to memory usage.
>>>>>
>>>>> First, the two Java approaches: proxy/wrapper object or bundle
>>>>> everything
>>>>> into the base class.
>>>>>
>>>>> The proxy/wrapper approach lets one have a single implementation
>>>>> that can be shared by multiple classes. The down side is that the
>>>>> proxy/wrapper object requires an additional reference in the
>>>>> class using it and hence additional memory usage.
>>>>>
>>>>> The bundle everything into the base class approach violates
>>>>> OOP 101 dictum about having small objects focused on their
>>>>> own particular behavior thus avoiding bloat.
>>>>> (Not executable Java/Scala code below.)
>>>>>
>>>>> interface Parent {
>>>>>  getParent
>>>>>  setParent
>>>>> }
>>>>> // Potentially shared implementation
>>>>> class ParentProxy implements Parent {
>>>>>  parent
>>>>>  getParent = parent
>>>>>  setParent(parent) = this.parent = parent
>>>>> }
>>>>>
>>>>> // Issue: Has additional instance variable: parentProxy
>>>>> class CompWithProxy with Parent {
>>>>>  parentProxy = new ParentProxy
>>>>>  getParent = parentProxy.getParent
>>>>>  setParent(parent) = parentProxy.setParent(parent)
>>>>> }
>>>>>
>>>>> // Issue: Does not share implementation
>>>>> class CompAllInOne with Parent {
>>>>>  parent
>>>>>  getParent = parent
>>>>>  setParent(parent) = this.parent = parent
>>>>> }
>>>>>
>>>>> Wicket has taken the "bundle everything into base class" approach in order
>>>>> to lessen memory usage - a certainly reasonable Java approach
>>>>> to the problem.
>>>>>
>>>>> With Scala one can do the following:
>>>>>
>>>>> // Shared implementation
>>>>> trait ParentTrait {
>>>>>  parent
>>>>>  getParent = parent
>>>>>  setParent(parent) = this.parent = parent
>>>>> }
>>>>>
>>>>> // Uses implementation
>>>>> class Comp with ParentTrait
>>>>>
>>>>> The implementation, ParentTrait, can be used by any
>>>>> number of classes.
>>>>> In addition, one can add to a base class any number of
>>>>> such implementation traits sharing multiple implementations
>>>>> across multiple classes.
>>>>>
>>>>> So, can using such an approach result in smaller (less in-memory)
>>>>> objects in Scala than in Java?
>>>>>
>>>>> The ParentTrait does not really save very much. I assume
>>>>> that it's only the Page class and sub-classes that do not have
>>>>> parent components in Wicket, so the savings per Page component
>>>>> tree is very small indeed. But, there are other behaviors that
>>>>> can be converted to traits, for example, Models.
>>>>> Many of the instance variables in the Java Models which
>>>>> take memory can be converted to method return values, which only
>>>>> add to the size of the class, not to every instance of the class.
>>>>> Also, with Model traits that use Component self-types, one can
>>>>> do away with IComponentAssignedModel wrapping and such.
>>>>>
>>>>> So, how to demonstrate such memory differences? I created
>>>>> stripped down versions of the Component and Label classes in
>>>>> both Java and Scala (only ids and Models).
>>>>> Created different Model usage scenarios
>>>>> with Model objects in Java and Traits in Scala, and, finally,
>>>>> serialized (Java Serialization) the result comparing the size
>>>>> of the resulting array of bytes. There are two runs, one with
>>>>> all Strings being the empty string and the next where all
>>>>> strings are 10-character strings:
>>>>>
>>>>> The Java versions (empty string):
>>>>> Label.Empty               99
>>>>> Label.ReadOnly           196
>>>>> Label.ReadWrite          159
>>>>> Label.Resource           333
>>>>> Label.Property           223
>>>>> Label.ComponentProperty  351
>>>>> Label.CompoundProperty   208
>>>>>
>>>>> The Scala versions (empty string):
>>>>> Label.Empty               79
>>>>> Label.ReadOnly           131
>>>>> Label.ReadWrite          150
>>>>> Label.Resource           164
>>>>> Label.Property           207
>>>>> Label.ComponentProperty  134
>>>>> Label.CompoundProperty   184
>>>>>
>>>>>
>>>>> The Java versions (10-character strings):
>>>>> Label.Empty              109
>>>>> Label.ReadOnly           214
>>>>> Label.ReadWrite          177
>>>>> Label.Resource           359
>>>>> Label.Property           241
>>>>> Label.ComponentProperty  369
>>>>> Label.CompoundProperty   218
>>>>>
>>>>>
>>>>> The Scala versions (10-character strings):
>>>>> Label.Empty               89
>>>>> Label.ReadOnly           149
>>>>> Label.ReadWrite          168
>>>>> Label.Resource           190
>>>>> Label.Property           225
>>>>> Label.ComponentProperty  152
>>>>> Label.CompoundProperty   194
>>>>>
>>>>> [Note that the Java Label.Empty result is misleading since in Wicket
>>>>> there is no memory overhead when a Component has no Model.]
>>>>>
>>>>> While this does indicate that using Model traits with Scala
>>>>> will result in less memory usage than the comparable Java
>>>>> approach, Java Serialization adds a whole lot of extra stuff
>>>>> to the array of bytes that masks the true change in
>>>>> in-memory usage. With Java Serialization, the class descriptor
>>>>> for each instance serialized is also added to the byte array and,
>>>>> it is this, that takes up most of the array of bytes.
>>>>>
>>>>> Thinking about it, I realized that Java Serialization is rather
>>>>> a blunt tool when it comes to the requirement of (Scala) Wicket
>>>>> Page serialization. Java Serialization creates a byte array
>>>>> that is rather self-contained/self-descriptive in its content.
>>>>> This is not required for (Scala) Wicket which has very
>>>>> specific requirements and use-cases.
>>>>>
>>>>> But first, before I describe what I did, here are the results.
>>>>> The byte array size data for the serializer I created just to
>>>>> show that one can do a lot better than Java Serialization:
>>>>>
>>>>> The Scala versions (empty string):
>>>>> Label.Empty                6
>>>>> Label.ReadOnly             8
>>>>> Label.ReadWrite            8
>>>>> Label.Resource            10
>>>>> Label.Property            13
>>>>> Label.ComponentProperty    8
>>>>> Label.CompoundProperty    11
>>>>>
>>>>> The Scala versions (10-character strings):
>>>>> Label.Empty                8
>>>>> Label.ReadOnly            12
>>>>> Label.ReadWrite           12
>>>>> Label.Resource            16
>>>>> Label.Property            17
>>>>> Label.ComponentProperty   12
>>>>> Label.CompoundProperty    13
>>>>>
>>>>> Yes, better by more than a factor of 10. I assume factors
>>>>> of 10 are compelling.
>>>>>
>>>>> So, back to the requirements. I spent a couple of days creating
>>>>> the serializer (currently 3.8Kloc) that focused on what I thought
>>>>> would be needed by (Scala) Wicket.
>>>>> The same application using (Scala) Wicket is running on either a
>>>>> single machine or a group of machines.
>>>>> The serialized Page system can have:
>>>>>
>>>>>  In-memory repository
>>>>>    (single-machine, testing);
>>>>>  In-memory cache with local disk backstore
>>>>>    (single-machine, production, re-start) and
>>>>>  In-memory cache with database backstore used by a number of machines
>>>>>    (multi-machine, production, fail-over, session-migration, re-start)
>>>>>
>>>>>  Strings and associated id are cached/backstored where it is the id
>>>>>    that is used in the serialized array.
>>>>>  Classes and associated id are cached/backstored where it is the id
>>>>>    that is used in the serialized array.
>>>>>  Optimizations allow, for example, the Long value 1L to be serialized
>>>>>    as 1 byte or (un-optimized) as 9 bytes.
>>>>>  When using a backstore, a header is prepended to each byte array
>>>>>    that includes the serializer magic number (2 bytes), serializer
>>>>>    protocol version (2 bytes?) and application information
>>>>>    (version, etc.) (2 bytes?).
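
The 1-byte versus 9-byte case for Long values could look something like the sketch below. It is an assumed encoding made up for illustration (the class CompactLong and the 0x7F marker are not taken from the serializer described here): values from 0 to 126 go out as a single byte, everything else as a marker byte followed by the raw 8 bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical compact encoding for long values; an illustration only.
class CompactLong {
    // Marker byte announcing that a full 8-byte value follows.
    static final byte FULL = 0x7F;

    static byte[] encode(long value) {
        if (value >= 0 && value < FULL) {
            // Optimized case: the value itself fits in the single byte.
            return new byte[] { (byte) value };
        }
        // Un-optimized case: marker (1 byte) + raw long (8 bytes) = 9 bytes.
        return ByteBuffer.allocate(9).put(FULL).putLong(value).array();
    }

    static long decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        byte first = in.get();
        return first == FULL ? in.getLong() : first;
    }

    public static void main(String[] args) {
        System.out.println(encode(1L).length);        // 1
        System.out.println(encode(1L << 40).length);  // 9
        System.out.println(decode(encode(-42L)));     // -42
    }
}
```

The trade-off in a scheme like this is that one byte value is lost to the marker, so 127 itself already takes the long form.
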
>>>>>
>>>>> In addition, there are two cases where one might be serializing
>>>>> the same object more than once.
>>>>>
>>>>> The first case is dealt with by most serializers, an object
>>>>> appears more than once in the tree of objects being serialized.
>>>>> Java Serialization deals with this. One must keep track of
>>>>> the identity of all objects being serialized. Then, if an object
>>>>> appears for serialization for a second (third, etc.) time, some
>>>>> sort of reference object and tag is serialized rather than the
>>>>> object. De-serialization is ....  obvious.
>>>>> I do not know, but I assume that this does not arise in Wicket; the
>>>>> same Component appearing more than once in the same Page tree of
>>>>> components. If it does happen, please let me know. If it should
>>>>> not happen but could, is there some visitor well-formedness traversal
>>>>> that checks for duplicate object appearances in a given tree?
>>>>>
>>>>> The second case is one that probably does (or could) occur with
>>>>> Wicket and that I've never heard of a serializer dealing with, namely,
>>>>> the same object appears in more than one Page tree - knowledge
>>>>> of what is being serialized is shared across serializations.
>>>>> For this to work, the
>>>>> Component (which could be a tree of Components) has to be
>>>>> immutable like a Label with a read-only value or read-only Model
>>>>> (and the Model object is never changed), etc. Here, there can be
>>>>> a saving if the shared object is serialized in its own backstore
>>>>> and only its identifier appears in the byte arrays of each Page.
>>>>> If there was an Immutable interface which could tag immutable
>>>>> objects, it would be much easier for the serializer to identify
>>>>> them (well, not just easier, but, rather, plain old possible
>>>>> versus impossible) - just a last minute thought.
>>>>>
>>>>> I've not created a Java version of my serializer. But, since the
>>>>> Scala version does not use much Scala magic, a Java version
>>>>> would not be too hard to port to. I also have some 500 unit tests.
>>>>>
>>>>> Well, enough for now.
>>>>>
>>>>> Richard
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 07/10/2011 02:37 AM, Martin Grigorov wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> About the use cases: my experience is that most of the time the
>>>>>> in-memory pages are used (for each listener callback execution, for ajax
>>>>>> requests,...).
>>>>>> A previous version of a page, or a previous page, is needed when the user
>>>>>> clicks the browser back button. Even in this case most of the time the
>>>>>> in-memory cache is hit. Only when the user goes several pages back and
>>>>>> this page is not in-memory then the disk store is used.
>>>>>>
>>>>>> So far so good, but...! Even in-memory store contains serialized
>>>>>> versions of the Page, named SerializedPage. This is a struct which
>>>>>> contains
>>>>>> {
>>>>>>   sessionId: String,
>>>>>>   pageId: int,
>>>>>>   data: byte[]
>>>>>> }
>>>>>> so the Page is serialized back and forth when stored in *any*
>>>>>> IPageStore/IDataStore.
>>>>>>
>>>>>> This is the current state in Wicket 1.5.
>>>>>>
>>>>>> Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
>>>>>> improved to work with Page instances but we decided to postpone this
>>>>>> optimization for 1.5.0+.
>>>>>>
>>>>>> About new String("someLiteral"): I don't remember seeing this code
>>>>>> lately, either in libraries or in applications. This constructor
>>>>>> should be used only when the developer explicitly wants this string to
>>>>>> not be interned and stored in the PermGen space, i.e. it will be
>>>>>> stored in the heap space.
>>>>>> Your benchmark test tests exactly this - the heap space.
>>>>>> I'll try the app with MemoryMXBean to see whether the non-heap changes
>>>>>> after deserialization.
>>>>>> I'm not very into Java Serialization but indeed it seems the Strings
>>>>>> are deserialized in the heap. But even in this case they go in the
>>>>>> Eden space, i.e. they are reclaimed soon after.
>>>>>>
>>>>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>>>>> <ri...@gmail.com>      wrote:
>>>>>>>
>>>>>>> If you run the little Java program I included, you will see that
>>>>>>> there is an impact - de-serialized objects take more memory.
>>>>>>>
>>>>>>> Richard
>>>>>>>
>>>>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>>>>
>>>>>>>> string literals are interned by the jvm so they should have a
>>>>>>>> minimal
>>>>>>>> memory impact.
>>>>>>>>
>>>>>>>> -igor
>>>>>>>>
>>>>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>>>>> <ri...@gmail.com>        wrote:
>>>>>>>>>
>>>>>>>>> Martin,
>>>>>>>>>
>>>>>>>>> The reason I was interested was because it struck me a couple of
>>>>>>>>> days ago that when each Page (a tree of Components) is created,
>>>>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>>>>> as instance variables in the tree are shared
>>>>>>>>> between all copies of the Page but that when such a Page is
>>>>>>>>> serialized to disk and then de-serialized, each String becomes its
>>>>>>>>> own
>>>>>>>>> copy unique to that particular Page. This means that if an
>>>>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>>>>> there could be a bunch of memory being used for all the String
>>>>>>>>> copies.
>>>>>>>>>
>>>>>>>>> In the attached simple Java file (yes, I still write Java when I
>>>>>>>>> must)
>>>>>>>>> there are three different ways of creating an array of
>>>>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>>>>    new Label(some_string)
>>>>>>>>>
>>>>>>>>> The first is to share the same String over all instances of the
>>>>>>>>> Label.
>>>>>>>>>    new Label(the_string)
>>>>>>>>> The second is to make a copy of the String when creating each
>>>>>>>>> Label;
>>>>>>>>>    new Label(new String(the_string))
>>>>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>>>>> bytes and then generate the Labels in the array by de-serializing
>>>>>>>>> the byte array for each Label.
>>>>>>>>>
>>>>>>>>> Needless to say, the first uses the least memory; the label string
>>>>>>>>> is shared by all Labels while the second and third approach
>>>>>>>>> uses more memory. Also, if during the de-serialization process, the
>>>>>>>>> de-serialized String is replaced with the original instance of the
>>>>>>>>> String, then the third approach uses only as much memory as the
>>>>>>>>> first approach.
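
The replacement step described above can be sketched in Java roughly as follows. This is only an illustration under assumptions: PlainLabel, InternedLabel and ReanimateDemo are made-up names (not the attached test program and not Wicket classes), and String.intern() stands in for whatever canonicalizing cache one would really want to use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReanimateDemo {
    static byte[] toBytes(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two labels reanimated from the same bytes: do they share one String instance?
    static boolean plainCopiesShareString() {
        byte[] data = toBytes(new PlainLabel("some label"));
        return ((PlainLabel) fromBytes(data)).text == ((PlainLabel) fromBytes(data)).text;
    }

    static boolean internedCopiesShareString() {
        byte[] data = toBytes(new InternedLabel("some label"));
        return ((InternedLabel) fromBytes(data)).text == ((InternedLabel) fromBytes(data)).text;
    }

    public static void main(String[] args) {
        System.out.println(plainCopiesShareString());    // false
        System.out.println(internedCopiesShareString()); // true
    }
}

// No hook: every de-serialization yields a fresh String copy.
class PlainLabel implements Serializable {
    private static final long serialVersionUID = 1L;
    String text;
    PlainLabel(String text) { this.text = text; }
}

// Same shape, but the String is swapped for the pooled instance as it is read back.
class InternedLabel implements Serializable {
    private static final long serialVersionUID = 1L;
    String text;
    InternedLabel(String text) { this.text = text; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (text != null) {
            text = text.intern();
        }
    }
}
```

Whether interning is the right canonicalization is a separate question; an application-level cache keyed by the String value would serve the same purpose.
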
>>>>>>>>>
>>>>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>>>>> significant number of Pages in-memory are actually reanimated
>>>>>>>>> Pages,
>>>>>>>>> then there could be a memory saving by
>>>>>>>>> making de-serialization smarter about possible shared objects.
>>>>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>>>>> usage patterns, it might be worth looking into.
>>>>>>>>>
>>>>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>>>>> applications might fit the use-case.
>>>>>>>>>
>>>>>>>>> Richard
>>>>>>>>>
>>>>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>>>>
>>>>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>>>>
>>>>>>>>>> But I am no master ;)
>>>>>>>>>>
>>>>>>>>>> **
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> 2011/7/9 richard emberson<ri...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>> This is a question for Wicket masters and those application
>>>>>>>>>>> builders
>>>>>>>>>>> whose applications match the criteria as specified below.
>>>>>>>>>>>
>>>>>>>>>>> [In this case, a Wicket master is someone with a knowledge
>>>>>>>>>>> of how Wicket is being used in a wide spectrum of applications
>>>>>>>>>>> so that they have a feel for what use-cases exist in the real
>>>>>>>>>>> world.]
>>>>>>>>>>>
>>>>>>>>>>> Wicket is used in a wide range of applications with a variety of
>>>>>>>>>>> usage patterns. What I am interested in are those applications
>>>>>>>>>>> where
>>>>>>>>>>> an appreciable number of the pages in memory are pages that had
>>>>>>>>>>> previously been serialized and stored to disk and then
>>>>>>>>>>> reanimated,
>>>>>>>>>>> not found in an in-memory cache and had to be read from disk and
>>>>>>>>>>> de-serialized back into an in-memory page; which is to say,
>>>>>>>>>>> applications with an appreciable number of reanimated pages.
>>>>>>>>>>>
>>>>>>>>>>> Firstly, do such applications exist? These are real-world
>>>>>>>>>>> applications where a significant number of pages in-memory
>>>>>>>>>>> are reanimated pages.
>>>>>>>>>>>
>>>>>>>>>>> For such applications, what percentage of all pages at any
>>>>>>>>>>> given time are reanimated pages?
>>>>>>>>>>> Is it, say, a couple of percent? Two or three, in which case it's
>>>>>>>>>>> not
>>>>>>>>>>> very significant.
>>>>>>>>>>> Or, is it, say, 50%? Meaning that half of all pages currently in
>>>>>>>>>>> memory had been serialized to disk, flushed from any in-memory
>>>>>>>>>>> cache
>>>>>>>>>>> and then, as needed, de-serialized back into a Page.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Richard


Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
Thanks Igor.

 > it is not allowed, see page#componentRendered()
Thanks.

 > sharing component instances between pages
I am going to have to think about all of this.
Maybe making mutable and immutable versions of things
or, maybe, an Immutable trait (interface) that signals
intent (but, of course, would not enforce it).
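
A minimal Java sketch of the marker idea, purely illustrative: the Immutable interface, the label classes and isMarkedImmutable are made-up names, and nothing like this exists in Wicket as far as I know. It also shows the weakness: the marker compiles just as well on a class that is not immutable at all.

```java
class ImmutableMarkerDemo {
    // Hypothetical marker interface: documents intent, enforces nothing.
    interface Immutable {
    }

    // Declares itself immutable and actually is: a single final field.
    static final class ReadOnlyLabel implements Immutable {
        private final String text;
        ReadOnlyLabel(String text) { this.text = text; }
        String getText() { return text; }
    }

    // Compiles fine although it is mutable: the marker is only a promise.
    static final class BrokenPromise implements Immutable {
        String text;
    }

    // An ordinary mutable component without the marker.
    static final class EditableLabel {
        String text;
    }

    // A serializer could use a check like this to pick sharing candidates.
    static boolean isMarkedImmutable(Object o) {
        return o instanceof Immutable;
    }

    public static void main(String[] args) {
        System.out.println(isMarkedImmutable(new ReadOnlyLabel("hello"))); // true
        System.out.println(isMarkedImmutable(new EditableLabel()));        // false
        System.out.println(isMarkedImmutable(new BrokenPromise()));        // true
    }
}
```
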

 > that is a bad example

Maybe here's a better example (actually, a rather extreme example):

org/apache/wicket/util/upload/ParameterParser.java

   private def isOneOf(ch: Char, charray: Array[Char]): Boolean =
     charray exists { _ == ch }

   private boolean isOneOf(final char ch, final char[] charray)
   {
     boolean result = false;
     for (char character : charray)
     {
       if (ch == character)
       {
         result = true;
         break;
       }

     }
     return result;
   }

I am not trying to (re-)start any wars here.
I do not think it's all due to formatting.
Currently, for 1.5-RC5.1 loc:
Java Wicket:  154556
Scala Wicket: 118617
and it's not really possible to use some of the more-terse
aspects of Scala because that would require a rather larger
porting/re-writing effort.


Richard

On 07/20/2011 09:44 AM, Igor Vaynberg wrote:
> On Wed, Jul 20, 2011 at 9:00 AM, richard emberson
> <ri...@gmail.com>  wrote:
>> I have many examples of such Java bloat. Consider the getKey method
>> in the org/apache/wicket/util/value/ValueMap.java class:
>>
>> Java version:
>>
>>   public String getKey(final String key)
>>   {
>>     for (Object keyValue : keySet())
>>     {
>>       if (keyValue instanceof String)
>>       {
>>         String keyString = (String)keyValue;
>>         if (key.equalsIgnoreCase(keyString))
>>         {
>>           return keyString;
>>         }
>>       }
>>     }
>>     return null;
>>   }
>>
>> Scala version:
>>
>>   def getKey(key: String): Option[String] =
>>     keySet find { s =>  key.equalsIgnoreCase(s) }
>
> that is a bad example. that method was there since the times valuemaps
> supported non-string keys, that's what all the noise was about. your
> code doesn't support non-string keys, and i just cleaned up ours so
> it doesn't have to worry about it either. thanks for pointing it out :)
>
> here it is in its concise form :
> public String getKey(String key) {
> 	for (String other : keySet()) if (other.equalsIgnoreCase(key)) return other;
> 	return null;
> }
>
> it all depends on formatting
>>
>> The Scala version reads like a sentence: "For the keys find
>> the key which equals, ignoring case, the key parameter."
>> The Java code is just so sad in comparison.
>
> not in my concise version, though, is it? however, the concise version
> is harder for some people to read, so we use very generous formatting
> rules when it comes to spacing and curly braces.
>
>> I did have 2 questions buried in my previous email.
>> Both have to do with serialization of an object when
>> it appears a 2nd (3rd, etc.) time during the serialization
>> process.
>
> serialization handles multiple references to the same instance. so if
> you have the same instance showing up more than once in the
> serialization graph it is only written out once. this is how circular
> references are handled as well.
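
This behaviour of plain Java Serialization is easy to check with a small sketch. The classes below (SharedRefDemo, Node, Pair) are made up for the illustration and have nothing to do with Wicket's Component tree; the point is only that object identity inside one serialized graph, including a cycle, survives the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedRefDemo {
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next; // may point back to an earlier node to form a cycle
        Node(String name) { this.name = name; }
    }

    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final Node first;
        final Node second;
        Pair(Node first, Node second) { this.first = first; this.second = second; }
    }

    // Serialize the graph to a byte array and read it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The same instance referenced twice comes back as one instance, not two copies.
    static boolean sharedInstancePreserved() {
        Node shared = new Node("shared");
        Pair copy = roundTrip(new Pair(shared, shared));
        return copy.first == copy.second;
    }

    // a -> b -> a comes back as a cycle of the same shape.
    static boolean cyclePreserved() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        Node copy = roundTrip(a);
        return copy.next.next == copy;
    }

    public static void main(String[] args) {
        System.out.println(sharedInstancePreserved()); // true
        System.out.println(cyclePreserved());          // true
    }
}
```

Note that this only holds within a single stream; two separate writeObject streams know nothing about each other.
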
>
>> So, first, is it possible, likely, allowed, excluded, etc. that
>> the same Component can appear more than once in the same
>> Page tree? Would it make sense or even be possible for the
>> same Form object to appear more than once in the same Page tree?
>> Not two copies of a Form, but the single instance in two places
>> on a Page?
>> If it should never happen, is there code in Wicket that ensures
>> that it does not happen?
>
> it is not allowed, see page#componentRendered()
>
>> Secondly, for a Component that is immutable in a given Page,
>> could it appear, be reused, in the "same" Page in different
>> Sessions (different clients)? Other areas of such Pages would
>> be different, hold different data, but could the immutable part
>> be same object? As an example, a read-only Label object, could
>> it be used in the same place in the same Page "type" but in
>> different Sessions? Is there any mechanism in Wicket currently
>> that could identify such possible reuse?
>
> sharing component instances between pages is a bad idea, sharing them
> between sessions is even worse. code is constantly refactored, what is
> immutable now will most likely not be immutable later. i would hate
> coding wicket if every time i made a change to someone else's
> component i would have to check if i just made something immutable
> mutable and possibly cause a security leak.
>
> -igor
>
>
>> After memory comes performance and that's a much harder nut to
>> crack. To track down bugs in the Scala port I had to put
>> detailed logging into both the Java and Scala versions.
>> What was most surprising was the amount of code that
>> had to be executed, multiple times, just to render the
>> simplest Page in a unit test - tens of pages of logging
>> output. I do not understand all that is truly happening
>> within Wicket to render a Page yet, but it's on my todo list.
>> And, maybe, there is no issue.
>>
>> Richard
>> Thanks.
>>
>>
>> On 07/20/2011 03:04 AM, Martin Grigorov wrote:
>>>
>>> Hi Richard,
>>>
>>> 1. Scala traits are something useful which I hope to have someday in Java
>>> too.
>>> They can help make some code reusable when it is not possible to
>>> have a common base class. In the end a trait is a partial base class...
>>>
>>> 2. I'm not sure what problem you are after with this optimization in
>>> the serialized version of the object (its bytes).
>>> Your quest will not improve the runtime memory consumption because the
>>> trait's properties are mixed with the class instance properties. You
>>> may have problems with PermGen though because Scala produces classes
>>> for every "with Foo" (and for every Function/closure).
>>> You are trying to improve the size (and speed?) of the produced bytes
>>> after serialization. While this will reduce the size of two of the
>>> page caches, the second (application scope) and the third (disk), the
>>> first level (http session) contains page instances (not
>>> serialized). Check https://cwiki.apache.org/confluence/x/qIaoAQ for
>>> more information.
>>>
>>> RAM and especially HDD are cheap today, so I think the benefit of your
>>> optimization will not be big. As proof I can say that there are no
>>> complaints in the mailing lists that Wicket produces too big files for
>>> the third level cache. The general complaint is that http session
>>> footprint is bigger than action-based web frameworks but I think this
>>> is because using custom o.a.w.Session is so comfortable that people
>>> start putting a lot of state there. The next reason is first-level
>>> cache but even this is easy to "solve" - just implement custom
>>> IPageManager or override the default one to not use http session as
>>> first level cache.
>>>
>>> Recently we reworked a bit the code related to page serialization and
>>> now it is possible to use any library specialized in object
>>> serialization (see https://github.com/eishay/jvm-serializers/wiki).
>>> The schema based ones (like Apache Avro, Thrift, Protobuf, ...) will
>>> be harder to use but not impossible.
>>> The schemaless ones (Java Serialization, Kryo, XStream, ...) are
>>> easier to use with Wicket. You may check Kryo based serializer at
>>>
>>> https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
>>> . It is faster than Java Serialization and produces fewer bytes.
>>>
>>> On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
>>> <ri...@gmail.com>    wrote:
>>>>
>>>> Martin,
>>>>
>>>> The reason I was interested in Wicket memory usage was that
>>>> the potential use of Scala traits, rather than the two possible
>>>> Java approaches, might be compelling when it comes to memory usage.
>>>>
>>>> First, the two Java approaches: proxy/wrapper object or bundle everything
>>>> into the base class.
>>>>
>>>> The proxy/wrapper approach lets one have a single implementation
>>>> that can be shared by multiple classes. The down side is that the
>>>> proxy/wrapper object requires an additional reference in the
>>>> class using it and hence additional memory usage.
>>>>
>>>> The bundle everything into the base class approach violates
>>>> OOP 101 dictum about having small objects focused on their
>>>> own particular behavior thus avoiding bloat.
>>>> (Not executable Java/Scala code below.)
>>>>
>>>> interface Parent {
>>>>   getParent
>>>>   setParent
>>>> }
>>>> // Potentially shared implementation
>>>> class ParentProxy implements Parent {
>>>>   parent
>>>>   getParent = parent
>>>>   setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> // Issue: Has additional instance variable: parentProxy
>>>> class CompWithProxy with Parent {
>>>>   parentProxy = new ParentProxy
>>>>   getParent = parentProxy.getParent
>>>>   setParent(parent) = parentProxy.setParent(parent)
>>>> }
>>>>
>>>> // Issue: Does not share implementation
>>>> class CompAllInOne with Parent {
>>>>   parent
>>>>   getParent = parent
>>>>   setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> Wicket has taken the "bundle everything into the base class" approach
>>>> in order to lessen memory usage - certainly a reasonable Java approach
>>>> to the problem.
>>>>
>>>> With Scala one can do the following:
>>>>
>>>> // Shared implementation
>>>> trait ParentTrait {
>>>>   parent
>>>>   getParent = parent
>>>>   setParent(parent) = this.parent = parent
>>>> }
>>>>
>>>> // Uses implementation
>>>> class Comp with ParentTrait
>>>>
>>>> The implementation, ParentTrait, can be used by any
>>>> number of classes.
>>>> In addition, one can add to a base class any number of
>>>> such implementation traits sharing multiple implementations
>>>> across multiple classes.
>>>>
>>>> So, can using such an approach result in smaller (less in-memory)
>>>> objects in Scala than in Java?
>>>>
>>>> The ParentTrait does not really save very much. I assume
>>>> that it's only the Page class and sub-classes that do not have
>>>> parent components in Wicket, so the saving per Page component
>>>> tree is very small indeed. But there are other behaviors that
>>>> can be converted to traits, for example, Models.
>>>> Many of the instance variables in the Java Models, which
>>>> take memory, can be converted to method return values, which only
>>>> add to the size of the class, not to every instance of the class.
>>>> Also, with Model traits that use Component self-types, one can
>>>> do away with IComponentAssignedModel wrapping and such.
>>>>
>>>> So, how to demonstrate such memory differences. I created
>>>> stripped down versions of the Component and Label classes in
>>>> both Java and Scala (only ids and Models) .
>>>> Created different Model usage scenarios
>>>> with Model object in Java and Traits in Scala, and, finally,
>>>> serialized (Java Serialization) the result comparing the size
>>>> of the resulting array of bytes. There are two runs, one with
>>>> all Strings being the empty string and the next where all
>>>> strings are 10-character strings:
>>>>
>>>> The Java versions (empty string):
>>>> Label.Empty               99
>>>> Label.ReadOnly           196
>>>> Label.ReadWrite          159
>>>> Label.Resource           333
>>>> Label.Property           223
>>>> Label.ComponentProperty  351
>>>> Label.CompoundProperty   208
>>>>
>>>> The Scala versions (empty string):
>>>> Label.Empty              79
>>>> Label.ReadOnly           131
>>>> Label.ReadWrite          150
>>>> Label.Resource           164
>>>> Label.Property           207
>>>> Label.ComponentProperty  134
>>>> Label.CompoundProperty   184
>>>>
>>>>
>>>> The Java versions (10-character strings):
>>>> Label.Empty              109
>>>> Label.ReadOnly           214
>>>> Label.ReadWrite          177
>>>> Label.Resource           359
>>>> Label.Property           241
>>>> Label.ComponentProperty  369
>>>> Label.CompoundProperty   218
>>>>
>>>>
>>>> The Scala versions (10-character strings):
>>>> Label.Empty               89
>>>> Label.ReadOnly           149
>>>> Label.ReadWrite          168
>>>> Label.Resource           190
>>>> Label.Property           225
>>>> Label.ComponentProperty  152
>>>> Label.CompoundProperty   194
>>>>
>>>> [Note that the Java Label.Empty result is misleading since in Wicket
>>>> there is no memory overhead when a Component has no Model.]
>>>>
>>>> While this does indicate that using Model traits with Scala
>>>> will result in less memory usage than the comparable Java
>>>> approach, Java Serialization adds a whole lot of extra stuff
>>>> to the array of bytes that masks the true change in
>>>> in-memory usage. With Java Serialization, the class descriptor
>>>> for each instance serialized is also added to the byte array, and
>>>> it is this that takes up most of the array of bytes.
>>>>
>>>> Thinking about it, I realized that Java Serialization is rather
>>>> a blunt tool when it comes to the requirement of (Scala) Wicket
>>>> Page serialization. Java Serialization creates a byte array
>>>> that is rather self-contained/self-descriptive in its content.
>>>> This is not required for (Scala) Wicket which has very
>>>> specific requirements and use-cases.
>>>>
>>>> But first, before I describe what I did, here are the results.
>>>> The byte array size data for the serializer I created just to
>>>> show that one can do a lot better than Java Serialization:
>>>>
>>>> The Scala versions (empty string):
>>>> Label.Empty                6
>>>> Label.ReadOnly             8
>>>> Label.ReadWrite            8
>>>> Label.Resource            10
>>>> Label.Property            13
>>>> Label.ComponentProperty    8
>>>> Label.CompoundProperty    11
>>>>
>>>> The Scala versions (10-character strings):
>>>> Label.Empty                8
>>>> Label.ReadOnly            12
>>>> Label.ReadWrite           12
>>>> Label.Resource            16
>>>> Label.Property            17
>>>> Label.ComponentProperty   12
>>>> Label.CompoundProperty    13
>>>>
>>>> Yes, better by more than a factor of 10. I assume factors
>>>> of 10 are compelling.
>>>>
>>>> So, back to the requirements. I spent a couple of days creating
>>>> the serializer (currently 3.8Kloc) that focused on what I thought
>>>> would be needed by (Scala) Wicket.
>>>> The same application using (Scala) Wicket is running on either a
>>>> single machine or a group of machines.
>>>> The serialized Page system can have:
>>>>
>>>>   In-memory repository
>>>>     (single-machine, testing);
>>>>   In-memory cache with local disk backstore
>>>>     (single-machine, production, re-start) and
>>>>   In-memory cache with database backstore used by a number of machines
>>>>     (multi-machine, production, fail-over, session-migration, re-start)
>>>>
>>>>   Strings and associated id are cached/backstored where it is the id
>>>>     that is used in the serialized array.
>>>>   Classes and associated id are cached/backstored where it is the id
>>>>     that is used in the serialized array.
>>>>   Optimizations allow, for example, the Long value 1L to be serialized
>>>>     as 1 byte or (un-optimized) as 9 bytes.
>>>>   When using a backstore, a header is prepended to each byte array
>>>>     that includes the serializer magic number (2 bytes), serializer
>>>>     protocol version (2 bytes?) and application information
>>>>     (version, etc.) (2 bytes?).
>>>>
>>>> In addition, there are two cases where one might be serializing
>>>> the same object more than once.
>>>>
>>>> The first case is dealt with by most serializers, an object
>>>> appears more than once in the tree of objects being serialized.
>>>> Java Serialization deals with this. One must keep track of
>>>> the identity of all objects being serialized. Then, if an object
>>>> appears for serialization for a second (third, etc.) time, some
>>>> sort of reference object and tag is serialized rather than the
>>>> object. De-serialization is ....  obvious.
>>>> I do not know, but I assume that this does not arise in Wicket, that
>>>> is, the same Component appearing more than once in the same Page tree
>>>> of components. If it does happen, please let me know. If it should
>>>> not happen but could, is there some visitor well-formedness traversal
>>>> that checks for duplicate object appearances in a given tree?
>>>>
>>>> The second case is one that probably does (or could) occur with
>>>> Wicket and I've never heard of a serializer dealing with, namely,
>>>> the same object appears in more than one Page tree - knowledge
>>>> of what is being serialized is shared across serializations.
>>>> For this to work, the
>>>> Component (which could be a tree of Components) has to be
>>>> immutable like a Label with a read-only value or read-only Model
>>>> (and the Model object is never changed), etc. Here, there can be
>>>> a saving if the shared object is serialized in its own backstore
>>>> and only its identifier appears in the byte arrays of each Page.
>>>> If there was an Immutable interface which could tag immutable
>>>> objects, it would be much easier for the serializer to identify
>>>> them (well, not just easier, but, rather, plain old possible
>>>> versus impossible) - just a last minute thought.
>>>>
>>>> I've not created a Java version of my serializer. But, since the
>>>> Scala version does not use much Scala magic, porting it to Java
>>>> would not be too hard. I also have some 500 unit tests.
>>>>
>>>> Well, enough for now.
>>>>
>>>> Richard
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 07/10/2011 02:37 AM, Martin Grigorov wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> About the use cases: my experience is that most of the time the
>>>>> in-memory pages are used (for each listener callback execution, for
>>>>> ajax requests, ...).
>>>>> Previous version of a page, or previous page is needed when the user
>>>>> clicks browser back button. Even in this case most of the time the
>>>>> in-memory cache is hit. Only when the user goes several pages back and
>>>>> this page is not in-memory then the disk store is used.
>>>>>
>>>>> So far so good, but...! Even in-memory store contains serialized
>>>>> versions of the Page, named SerializedPage. This is a struct which
>>>>> contains
>>>>> {
>>>>>    sessionId: String,
>>>>>    pageId: int,
>>>>>    data: byte[]
>>>>> }
>>>>> so the Page is serialized back and forth when stored in *any*
>>>>> IPageStore/IDataStore.
>>>>>
>>>>> This is the current state in Wicket 1.5.
>>>>>
>>>>> Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
>>>>> improved to work with Page instances, but we decided to postpone this
>>>>> optimization to 1.5.0+.
>>>>>
>>>>> About new String("someLiteral"): I don't remember seeing this code
>>>>> lately, either in libraries or in applications. This constructor
>>>>> should be used only when the developer explicitly wants this string
>>>>> not to be interned and stored in the PermGen space, i.e. it will be
>>>>> stored in the heap space.
>>>>> Your benchmark test tests exactly this - the heap space.
>>>>> I'll try the app with MemoryMXBean to see whether the non-heap changes
>>>>> after deserialization.
>>>>> I'm not very into Java Serialization but indeed it seems the Strings
>>>>> are deserialized in the heap. But even in this case they go in the
>>>>> Eden space, i.e. they are reclaimed soon after.
>>>>>
>>>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>>>> <ri...@gmail.com>      wrote:
>>>>>>
>>>>>> If you run the little Java program I included, you will see that
>>>>>> there is an impact - de-serialized objects take more memory.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>>>
>>>>>>> string literals are interned by the jvm so they should have a minimal
>>>>>>> memory impact.
>>>>>>>
>>>>>>> -igor
>>>>>>>
>>>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>>>> <ri...@gmail.com>        wrote:
>>>>>>>>
>>>>>>>> Martin,
>>>>>>>>
>>>>>>>> The reason I was interested was because it struck me a couple of
>>>>>>>> days ago that while each Page, tree of Components, is created
>>>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>>>> as instance variables in the tree are shared
>>>>>>>> between all copies of the Page but that when such a Page is
>>>>>>>> serialized to disk and then de-serialized, each String becomes its
>>>>>>>> own
>>>>>>>> copy unique to that particular Page. This means that if an
>>>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>>>> there could be a bunch of memory being used for all the String
>>>>>>>> copies.
>>>>>>>>
>>>>>>>> In the attached simple Java file (yes, I still write Java when I
>>>>>>>> must)
>>>>>>>> there are three different ways of creating an array of
>>>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>>>     new Label(some_string)
>>>>>>>>
>>>>>>>> The first is to share the same String over all instance of the Label.
>>>>>>>>     new Label(the_string)
>>>>>>>> The second is to make a copy of the String when creating each
>>>>>>>> Label;
>>>>>>>>     new Label(new String(the_string))
>>>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>>>> bytes and then generate the Labels in the array by de-serialized
>>>>>>>> the byte array for each Label.
>>>>>>>>
>>>>>>>> Needless to say, the first uses the least memory; the label string
>>>>>>>> is shared by all Labels while the second and third approach
>>>>>>>> uses more memory. Also, if during the de-serialization process, the
>>>>>>>> de-serialized String is replaced with the original instance of the
>>>>>>>> String, then the third approach uses only as much memory as the
>>>>>>>> first approach.
>>>>>>>>
>>>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>>>> significant number of Pages in-memory are actually reanimated Pages,
>>>>>>>> then there could be a memory saving by
>>>>>>>> making de-serialization smarter about possible shared objects.
>>>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>>>> usage patterns, it might be worth looking into.
>>>>>>>>
>>>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>>>> application might fit the use-case.
>>>>>>>>
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>>>
>>>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>>>
>>>>>>>>> But I am no master ;)
>>>>>>>>>
>>>>>>>>> **
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> 2011/7/9 richard emberson<ri...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>> This is a question for Wicket masters and those application
>>>>>>>>>> builders
>>>>>>>>>> whose application match the criteria as specified below.
>>>>>>>>>>
>>>>>>>>>> [In this case, a Wicket master is someone with a knowledge
>>>>>>>>>> of how Wicket is being used in a wide spectrum of applications
>>>>>>>>>> so that they have a feel for what use-cases exist in the real
>>>>>>>>>> world.]
>>>>>>>>>>
>>>>>>>>>> Wicket is used in a wide range of applications with a variety of
>>>>>>>>>> usage patterns. What I am interested in are those applications
>>>>>>>>>> where
>>>>>>>>>> an appreciable number of the pages in memory are pages that had
>>>>>>>>>> previously been serialized and stored to disk and then reanimated,
>>>>>>>>>> not found in an in-memory cache and had to be read from disk and
>>>>>>>>>> de-serialized back into an in-memory page; which is to say,
>>>>>>>>>> applications with an appreciable number of reanimated pages.
>>>>>>>>>>
>>>>>>>>>> Firstly, do such applications exists? These are real-world
>>>>>>>>>> applications where a significant number of pages in-memory
>>>>>>>>>> are reanimated pages.
>>>>>>>>>>
>>>>>>>>>> For such applications, what percentage of all pages at any
>>>>>>>>>> given time are reanimated pages?
>>>>>>>>>> Is it, say, a couple of percent? Two or three in which case its not
>>>>>>>>>> very significant.
>>>>>>>>>> Or, is it, say, 50%? Meaning that half of all pages currently in
>>>>>>>>>> memory had been serialized to disk, flushed from any in-memory
>>>>>>>>>> cache
>>>>>>>>>> and then, as needed, de-serialized back into a Page.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Richard

-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Igor Vaynberg <ig...@gmail.com>.
On Wed, Jul 20, 2011 at 9:00 AM, richard emberson
<ri...@gmail.com> wrote:
> I have many examples of such Java bloat. Consider the getKey method
> in the org/apache/wicket/util/value/ValueMap.java class:
>
> Java version:
>
>  public String getKey(final String key)
>  {
>    for (Object keyValue : keySet())
>    {
>      if (keyValue instanceof String)
>      {
>        String keyString = (String)keyValue;
>        if (key.equalsIgnoreCase(keyString))
>        {
>          return keyString;
>        }
>      }
>    }
>    return null;
>  }
>
> Scala version:
>
>  def getKey(key: String): Option[String] =
>    keySet find { s => key.equalsIgnoreCase(s) }

that is a bad example. that method has been there since the times valuemaps
supported non-string keys, that's what all the noise was about. your
code doesn't support non-string keys, and i just cleaned ours up so
it doesn't have to worry about them either. thanks for pointing it out :)

here it is in its concise form :
public String getKey(String key) {
	for (String other : keySet()) if (other.equalsIgnoreCase(key)) return other;
	return null;
}

it all depends on formatting
>
> The Scala version reads like a sentence: "For the keys find
> the key which equals, ignoring case, the key parameter."
> The Java code is just so sad in comparison.

not in my concise version, though, is it? however, the concise version
is harder for some people to read, so we use very generous formatting
rules when it comes to spacing and curly braces.
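
For anyone who wants to try the concise form outside of ValueMap, here is a
minimal self-contained sketch. It assumes a plain Map with String keys; the
class name GetKeyDemo and the static helper signature are made up for
illustration and are not Wicket code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GetKeyDemo {
    // Case-insensitive key lookup over a map with String keys.
    // Returns the key as stored in the map, or null when nothing matches.
    public static String getKey(Map<String, ?> map, String key) {
        for (String other : map.keySet()) {
            if (other.equalsIgnoreCase(key)) {
                return other;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("UserName", "igor");
        System.out.println(getKey(map, "username")); // prints UserName
        System.out.println(getKey(map, "missing"));  // prints null
    }
}
```

The loop is the same as the one-liner above, only spread over more lines.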

> I did have 2 questions buried in my previous email.
> Both having to do with serialization of an object when
> it appears as 2nd (3rd, etc.) time during the serialization
> process.

serialization handles multiple references to the same instance. so if
you have the same instance showing up more than once in the
serialization graph it is only written out once. this is how circular
references are handled as well.
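
A small sketch of that behaviour with plain Java Serialization. The Node
class and the helper names are hypothetical; both checks rely only on
ObjectOutputStream/ObjectInputStream tracking instances within a single
stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedRefDemo {
    // Hypothetical serializable node with one outgoing reference.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node other;
        Node(String name) { this.name = name; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same instance referenced twice in one graph.
        Node shared = new Node("shared");
        Node[] copy = (Node[]) deserialize(serialize(new Node[] { shared, shared }));
        System.out.println(copy[0] == copy[1]); // prints true

        // Circular reference: a -> b -> a.
        Node a = new Node("a");
        Node b = new Node("b");
        a.other = b;
        b.other = a;
        Node a2 = (Node) deserialize(serialize(a));
        System.out.println(a2.other.other == a2); // prints true
    }
}
```

Note that this only holds within one stream; two separate writeObject
round trips of the same instance give two separate copies.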

> So, first, is it possible, likely, allowed, excluded, etc. that
> the same Component can appear more than once in the same
> Page tree? Would it make sense or even be possible for the
> same Form object to appear more than once in the same Page tree?
> Not two copies of a Form, but the single instance in two places
> on a Page?
> If it should never happen, is there code in Wicket that ensures
> that it does not happen?

it is not allowed, see page#componentRendered()
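
For a tree outside of Wicket, a traversal of the kind asked about could be
sketched roughly as below. This is not how Wicket's own check works; Node
and hasDuplicate are hypothetical names, and the only real API used is an
identity-based set built from java.util.IdentityHashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateCheckDemo {
    // Hypothetical tree node standing in for a component.
    static class Node {
        final String id;
        final List<Node> children = new ArrayList<>();
        Node(String id) { this.id = id; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Returns true if the same instance is reachable more than once.
    // Identity (==) is used, not equals(), so two equal copies do not count.
    static boolean hasDuplicate(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return visit(root, seen);
    }

    private static boolean visit(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return true; // this instance was already visited
        }
        for (Node child : node.children) {
            if (visit(child, seen)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node label = new Node("label");
        Node ok = new Node("page").add(new Node("form")).add(label);
        System.out.println(hasDuplicate(ok));  // prints false

        Node bad = new Node("page").add(label).add(label);
        System.out.println(hasDuplicate(bad)); // prints true
    }
}
```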

> Secondly, for a Component that is immutable in a given Page,
> could it appear, be reused, in the "same" Page in different
> Sessions (different clients)? Other areas of such Pages would
> be different, hold different data, but could the immutable part
> be same object? As an example, a read-only Label object, could
> it be used in the same place in the same Page "type" but in
> different Sessions? Is there any mechanism in Wicket currently
> that could identify such possible reuse?

sharing component instances between pages is a bad idea, sharing them
between sessions is even worse. code is constantly refactored, what is
immutable now will most likely not be immutable later. i would hate
coding wicket if every time i made a change to someone else's
component i would have to check if i just made something immutable
mutable and possibly cause a security leak.

-igor


> After memory comes performance and that's a much harder nut to
> crack. To track down bugs in the Scala port I had to put
> detailed logging into both the Java and Scala versions.
> What was most surprising was the amount of code that
> had to be executed, multiple times, just to render the
> simplest Page in a unit test - tens of pages of logging
> output. I do not yet understand all that is truly happening
> within Wicket to render a Page, but it's on my todo list.
> And, maybe, there is no issue.
>
> Richard
> Thanks.
>
>
> On 07/20/2011 03:04 AM, Martin Grigorov wrote:
>>
>> Hi Richard,
>>
>> 1. Scala traits are something useful which I hope to have someday in Java
>> too.
>> They can help make some code reusable when it is not possible to
>> have a common base class. In the end a trait is a partial base class...
>>
>> 2. I'm not sure what problem you are after with this optimization in
>> the serialized version of the object (its bytes).
>> Your quest will not improve the runtime memory consumption because the
>> trait's properties are mixed with the class instance properties. You
>> may have problems with PermGen though because Scala produces classes
>> for every "with Foo" (and for every Function/closure).
>> You are trying to improve the size (and speed?) of the produced bytes
>> after serialization. This will reduce the size of only two of the page
>> caches: the second (application scope) and the third (disk). The first
>> level (http session) contains page instances (not serialized). Check
>> https://cwiki.apache.org/confluence/x/qIaoAQ for more information.
>>
>> RAM and especially HDD are cheap today, so I think the benefit of your
>> optimization will not be big. As proof, I can say that there are no
>> complaints on the mailing lists that Wicket produces files that are too
>> big for the third-level cache. The general complaint is that the http
>> session footprint is bigger than in action-based web frameworks, but I
>> think this is because using a custom o.a.w.Session is so comfortable that
>> people start putting a lot of state there. The next reason is the
>> first-level cache, but even this is easy to "solve" - just implement a
>> custom IPageManager or override the default one so that it does not use
>> the http session as first-level cache.
>>
>> Recently we reworked the code related to page serialization a bit, and
>> now it is possible to use any library specialized in object
>> serialization (see https://github.com/eishay/jvm-serializers/wiki).
>> The schema-based ones (like Apache Avro, Thrift, Protobuf, ...) will
>> be harder to use but not impossible.
>> The schemaless ones (Java Serialization, Kryo, XStream, ...) are
>> easier to use with Wicket. You may check the Kryo-based serializer at
>>
>> https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
>>
>> It is faster than Java Serialization and produces fewer bytes.
>>
>> On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
>> <ri...@gmail.com>  wrote:
>>>
>>> Martin,
>>>
>>> The reason I was interested in Wicket memory usage is that the
>>> potential use of Scala traits, rather than the two possible
>>> Java approaches, might be compelling when it comes to memory usage.
>>>
>>> First, the two Java approaches: use a proxy/wrapper object or bundle
>>> everything into the base class.
>>>
>>> The proxy/wrapper approach lets one have a single implementation
>>> that can be shared by multiple classes. The downside is that the
>>> proxy/wrapper object requires an additional reference in the
>>> class using it and hence additional memory usage.
>>>
>>> The bundle everything into the base class approach violates
>>> OOP 101 dictum about having small objects focused on their
>>> own particular behavior thus avoiding bloat.
>>> (Not executable Java/Scala code below.)
>>>
>>> interface Parent {
>>>  getParent
>>>  setParent
>>> }
>>> // Potentially shared implementation
>>> class ParentProxy implements Parent {
>>>  parent
>>>  getParent = parent
>>>  setParent(parent) = this.parent = parent
>>> }
>>>
>>> // Issue: Has additional instance variable: parentProxy
>>> class CompWithProxy with Parent {
>>>  parentProxy = new ParentProxy
>>>  getParent = parentProxy.getParent
>>>  setParent(parent) = parentProxy.setParent(parent)
>>> }
>>>
>>> // Issue: Does not share implementation
>>> class CompAllInOne with Parent {
>>>  parent
>>>  getParent = parent
>>>  setParent(parent) = this.parent = parent
>>> }
>>>
>>> Wicket has taken the "bundle everything into the base class" approach
>>> in order to lessen memory usage - certainly a reasonable Java approach
>>> to the problem.
>>>
>>> With Scala one can do the following:
>>>
>>> // Shared implementation
>>> trait ParentTrait {
>>>  parent
>>>  getParent = parent
>>>  setParent(parent) = this.parent = parent
>>> }
>>>
>>> // Uses implementation
>>> class Comp with ParentTrait
>>>
>>> The implementation, ParentTrait, can be used by any
>>> number of classes.
>>> In addition, one can add to a base class any number of
>>> such implementation traits sharing multiple implementations
>>> across multiple classes.
>>>
>>> So, can using such an approach result in smaller (less in-memory)
>>> objects in Scala than in Java?
>>>
>>> The ParentTrait does not really save very much. I assume
>>> that it's only the Page class and sub-classes that do not have
>>> parent components in Wicket, so the saving per Page component
>>> tree is very small indeed. But there are other behaviors that
>>> can be converted to traits, for example, Models.
>>> Many of the instance variables in the Java Models, which
>>> take memory, can be converted to method return values, which only
>>> add to the size of the class, not to every instance of the class.
>>> Also, with Model traits that use Component self-types, one can
>>> do away with IComponentAssignedModel wrapping and such.
>>>
>>> So, how to demonstrate such memory differences. I created
>>> stripped down versions of the Component and Label classes in
>>> both Java and Scala (only ids and Models) .
>>> Created different Model usage scenarios
>>> with Model object in Java and Traits in Scala, and, finally,
>>> serialized (Java Serialization) the result comparing the size
>>> of the resulting array of bytes. There are two runs, one with
>>> all Strings being the empty string and the next where all
>>> strings are 10-character strings:
>>>
>>> The Java versions (empty string):
>>> Label.Empty               99
>>> Label.ReadOnly           196
>>> Label.ReadWrite          159
>>> Label.Resource           333
>>> Label.Property           223
>>> Label.ComponentProperty  351
>>> Label.CompoundProperty   208
>>>
>>> The Scala versions (empty string):
>>> Label.Empty              79
>>> Label.ReadOnly           131
>>> Label.ReadWrite          150
>>> Label.Resource           164
>>> Label.Property           207
>>> Label.ComponentProperty  134
>>> Label.CompoundProperty   184
>>>
>>>
>>> The Java versions (10-character strings):
>>> Label.Empty              109
>>> Label.ReadOnly           214
>>> Label.ReadWrite          177
>>> Label.Resource           359
>>> Label.Property           241
>>> Label.ComponentProperty  369
>>> Label.CompoundProperty   218
>>>
>>>
>>> The Scala versions (10-character strings):
>>> Label.Empty               89
>>> Label.ReadOnly           149
>>> Label.ReadWrite          168
>>> Label.Resource           190
>>> Label.Property           225
>>> Label.ComponentProperty  152
>>> Label.CompoundProperty   194
>>>
>>> [Note that the Java Label.Empty result is misleading since in Wicket
>>> there is no memory overhead when a Component has no Model.]
>>>
>>> While this does indicate that using Model traits with Scala
>>> will result in less memory usage than the comparable Java
>>> approach, Java Serialization adds a whole lot of extra stuff
>>> to the array of bytes that masks the true change in
>>> in-memory usage. With Java Serialization, the class descriptor
>>> for each instance serialized is also added to the byte array, and
>>> it is this that takes up most of the array of bytes.
>>>
>>> Thinking about it, I realized that Java Serialization is rather
>>> a blunt tool when it comes to the requirement of (Scala) Wicket
>>> Page serialization. Java Serialization creates a byte array
>>> that is rather self-contained/self-descriptive in its content.
>>> This is not required for (Scala) Wicket which has very
>>> specific requirements and use-cases.
>>>
>>> But first, before I describe what I did, here are the results.
>>> The byte array size data for the serializer I created just to
>>> show that one can do a lot better than Java Serialization:
>>>
>>> The Scala versions (empty string):
>>> Label.Empty                6
>>> Label.ReadOnly             8
>>> Label.ReadWrite            8
>>> Label.Resource            10
>>> Label.Property            13
>>> Label.ComponentProperty    8
>>> Label.CompoundProperty    11
>>>
>>> The Scala versions (10-character strings):
>>> Label.Empty                8
>>> Label.ReadOnly            12
>>> Label.ReadWrite           12
>>> Label.Resource            16
>>> Label.Property            17
>>> Label.ComponentProperty   12
>>> Label.CompoundProperty    13
>>>
>>> Yes, better by more than a factor of 10. I assume factors
>>> of 10 are compelling.
>>>
>>> So, back to the requirements. I spent a couple of days creating
>>> the serializer (currently 3.8Kloc) that focused on what I thought
>>> would be needed by (Scala) Wicket.
>>> The same application using (Scala) Wicket is running on either a
>>> single machine or a group of machines.
>>> The serialized Page system can have:
>>>
>>>  In-memory repository
>>>    (single-machine, testing);
>>>  In-memory cache with local disk backstore
>>>    (single-machine, production, re-start) and
>>>  In-memory cache with database backstore used by a number of machines
>>>    (multi-machine, production, fail-over, session-migration, re-start)
>>>
>>>  Strings and associated id are cached/backstored where it is the id
>>>    that is used in the serialized array.
>>>  Classes and associated id are cached/backstored where it is the id
>>>    that is used in the serialized array.
>>>  Optimizations allow, for example, the Long value 1L to be serialized
>>>    as 1 byte or (un-optimized) as 9 bytes.
>>>  When using a backstore, a header is prepended to each byte array
>>>    that includes the serializer magic number (2 bytes), serializer
>>>    protocol version (2 bytes?) and application information (version,
>>>    etc.) (2 bytes?).
>>>
>>> In addition, there are two cases where one might be serializing
>>> the same object more than once.
>>>
>>> The first case is dealt with by most serializers, an object
>>> appears more than once in the tree of objects being serialized.
>>> Java Serialization deals with this. One must keep track of
>>> the identity of all objects being serialized. Then, if an object
>>> appears for serialization for a second (third, etc.) time, some
>>> sort of reference object and tag is serialized rather than the
>>> object. De-serialization is ....  obvious.
>>> I do not know, but I assume that this does not arise in Wicket; the
>>> same Component appearing more than once in the same Page tree of
>>> components. If it does happen, please let me know. If it should
>>> not happen but could, is there some visitor well-formedness traversal
>>> that checks for duplicate object appearances in a given tree?
>>>
>>> The second case is one that probably does (or could) occur with
>>> Wicket and I've never heard of a serializer dealing with, namely,
>>> the same object appears in more than one Page tree - knowledge
>>> of what is being serialized is shared across serializations.
>>> For this to work, the
>>> Component (which could be a tree of Components) has to be
>>> immutable like a Label with a read-only value or read-only Model
>>> (and the Model object is never changed), etc. Here, there can be
>>> a saving if the shared object is serialized in its own backstore
>>> and only its identifier appears in the byte arrays of each Page.
>>> If there was an Immutable interface which could tag immutable
>>> objects, it would be much easier for the serializer to identify
>>> them (well, not just easier, but, rather, plain old possible
>>> versus impossible) - just a last minute thought.
>>>
>>> I've not created a Java version of my serializer. But, since the
>>> Scala version does not use much Scala magic, porting it to Java
>>> would not be too hard. I also have some 500 unit tests.
>>>
>>> Well, enough for now.
>>>
>>> Richard
>>>
>>>
>>>
>>>
>>>
>>> On 07/10/2011 02:37 AM, Martin Grigorov wrote:
>>>>
>>>> Hi,
>>>>
>>>> About the use cases: my experience is that most of the time the user uses
>>>> the in-memory pages (for each listener callback execution, for ajax
>>>> requests,...).
>>>> Previous version of a page, or previous page is needed when the user
>>>> clicks browser back button. Even in this case most of the time the
>>>> in-memory cache is hit. Only when the user goes several pages back and
>>>> this page is not in-memory then the disk store is used.
>>>>
>>>> So far so good, but...! Even in-memory store contains serialized
>>>> versions of the Page, named SerializedPage. This is a struct which
>>>> contains
>>>> {
>>>>   sessionId: String,
>>>>   pageId: int,
>>>>   data: byte[]
>>>> }
>>>> so the Page is serialized back and forth when stored in *any*
>>>> IPageStore/IDataStore.
>>>>
>>>> This is the current state in Wicket 1.5.
>>>>
>>>> Pedro and I noticed that IPageStore impl (DefaultPageStore) can be
>>>> improved to work with Page instances but we decided to postpone this
>>>> optimization for 1.5.0+.
>>>>
>>>> About new String("someLiteral"): I don't remember seeing this code
>>>> lately, either in libraries or in applications. This constructor
>>>> should be used only when the developer explicitly wants this string to
>>>> not be interned and stored in the PermGen space, i.e. it will be
>>>> stored in the heap space.
>>>> Your benchmark test tests exactly this - the heap space.
>>>> I'll try the app with MemoryMXBean to see whether the non-heap changes
>>>> after deserialization.
>>>> I'm not very into Java Serialization but indeed it seems the Strings
>>>> are deserialized in the heap. But even in this case they go in the
>>>> Eden space, i.e. they are reclaimed soon after.
>>>>
>>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>>> <ri...@gmail.com>    wrote:
>>>>>
>>>>> If you run the little Java program I included, you will see that
>>>>> there is an impact - de-serialized objects take more memory.
>>>>>
>>>>> Richard
>>>>>
>>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>>
>>>>>> string literals are interned by the jvm so they should have a minimal
>>>>>> memory impact.
>>>>>>
>>>>>> -igor
>>>>>>
>>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>>> <ri...@gmail.com>      wrote:
>>>>>>>
>>>>>>> Martin,
>>>>>>>
>>>>>>> The reason I was interested is that it struck me a couple of
>>>>>>> days ago that when each Page, a tree of Components, is created,
>>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>>> as instance variables in the tree are shared
>>>>>>> between all copies of the Page, but that when such a Page is
>>>>>>> serialized to disk and then de-serialized, each String becomes its own
>>>>>>> copy unique to that particular Page. This means that if an
>>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>>> there could be a bunch of memory being used for all the String
>>>>>>> copies.
>>>>>>>
>>>>>>> In the attached simple Java file (yes, I still write Java when I must)
>>>>>>> there are three different ways of creating an array of
>>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>>    new Label(some_string)
>>>>>>>
>>>>>>> The first is to share the same String over all instances of the Label.
>>>>>>>    new Label(the_string)
>>>>>>> The second is to make a copy of the String when creating each
>>>>>>> Label;
>>>>>>>    new Label(new String(the_string))
>>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>>> bytes and then generate the Labels in the array by de-serializing
>>>>>>> the byte array for each Label.
>>>>>>>
>>>>>>> Needless to say, the first uses the least memory; the label string
>>>>>>> is shared by all Labels while the second and third approach
>>>>>>> uses more memory. Also, if during the de-serialization process, the
>>>>>>> de-serialized String is replaced with the original instance of the
>>>>>>> String, then the third approach uses only as much memory as the
>>>>>>> first approach.
>>>>>>>
>>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>>> significant number of Pages in-memory are actually reanimated Pages,
>>>>>>> then there could be a memory saving by
>>>>>>> making de-serialization smarter about possible shared objects.
>>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>>> usage patterns, it might be worth looking into.
>>>>>>>
>>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>>> application might fit the use-case.
>>>>>>>
>>>>>>> Richard
>>>>>>>
>>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>>
>>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>>
>>>>>>>> But I am no master ;)
>>>>>>>>
>>>>>>>> **
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> 2011/7/9 richard emberson<ri...@gmail.com>:
>>>>>>>>>
>>>>>>>>> This is a question for Wicket masters and those application builders
>>>>>>>>> whose applications match the criteria as specified below.
>>>>>>>>>
>>>>>>>>> [In this case, a Wicket master is someone with a knowledge
>>>>>>>>> of how Wicket is being used in a wide spectrum of applications
>>>>>>>>> so that they have a feel for what use-cases exist in the real
>>>>>>>>> world.]
>>>>>>>>>
>>>>>>>>> Wicket is used in a wide range of applications with a variety of
>>>>>>>>> usage patterns. What I am interested in are those applications
>>>>>>>>> where
>>>>>>>>> an appreciable number of the pages in memory are pages that had
>>>>>>>>> previously been serialized and stored to disk and then reanimated,
>>>>>>>>> not found in an in-memory cache and had to be read from disk and
>>>>>>>>> de-serialized back into an in-memory page; which is to say,
>>>>>>>>> applications with an appreciable number of reanimated pages.
>>>>>>>>>
>>>>>>>>> Firstly, do such applications exist? These are real-world
>>>>>>>>> applications where a significant number of pages in-memory
>>>>>>>>> are reanimated pages.
>>>>>>>>>
>>>>>>>>> For such applications, what percentage of all pages at any
>>>>>>>>> given time are reanimated pages?
>>>>>>>>> Is it, say, a couple of percent? Two or three, in which case it's not
>>>>>>>>> very significant.
>>>>>>>>> Or, is it, say, 50%? Meaning that half of all pages currently in
>>>>>>>>> memory had been serialized to disk, flushed from any in-memory
>>>>>>>>> cache
>>>>>>>>> and then, as needed, de-serialized back into a Page.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Richard
>>>>>>>>> --
>>>>>>>>> Quis custodiet ipsos custodes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Quis custodiet ipsos custodes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Quis custodiet ipsos custodes
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Quis custodiet ipsos custodes
>>>
>>>
>>>
>>
>>
>>
>
> --
> Quis custodiet ipsos custodes
>
>
>



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
Martin,

I understand that some on the Wicket mailing list do not
believe that memory usage should be a big concern while others
are very concerned about it. One simply has to look at the
data storage code in the Component class and its complexity
to see a reflection of that concern.

For me, memory is memory, and if one can save memory and support,
say, 15 thousand clients per server rather than 10 thousand,
then, as a library builder, that is something worth doing.
But, again, some will say to just buy more RAM ... but no matter
how much RAM one buys, the framework that uses less memory
per client will still use less memory per client.

Maybe the Java community will back-port some of the capabilities
found in Scala into Java.
[Yea, most such 'advanced' Scala features pre-date either Scala or
Java but, in Scala, they are a part of the language's "feel".]
IMO, why bother about Java. But, again, having
written so much Scala code now, going back to Java is, well,
just painful; so much template/boiler-plate code is required by
Java.
I have many examples of such Java bloat. Consider the getKey method
in the org/apache/wicket/util/value/ValueMap.java class:

Java version:

   public String getKey(final String key)
   {
     for (Object keyValue : keySet())
     {
       if (keyValue instanceof String)
       {
         String keyString = (String)keyValue;
         if (key.equalsIgnoreCase(keyString))
         {
           return keyString;
         }
       }
     }
     return null;
   }

Scala version:

   def getKey(key: String): Option[String] =
     keySet find { s => key.equalsIgnoreCase(s) }


The Scala version reads like a sentence: "For the keys find
the key which equals, ignoring case, the key parameter."
The Java code is just so sad in comparison.

At any rate, I am looking into Component memory usage and how,
in particular, Scala traits can help.
[Certainly, Java 8, 9, maybe 10 might add traits with a new key word,
but why wait, why bother.]
I am more than willing to pay a memory price on a per-class
basis rather than on a per-instance basis; so, make the PermGen
bigger - really, not a problem. With thousands of clients, each
with multiple component trees, traits are a clear win.

While trying to estimate the per-component memory saving from Scala
trait usage, I looked into Wicket's Page serialization. I believed that
the new page management code would allow one to plug in a different
serializer, hence I wrote what I think is a far faster and more compact
serializer which is targeted at Scala Wicket - but it has not been tested
(other than by low-level unit tests) yet, so, who knows.
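
As a small illustration of what a pluggable de-serializer could do about
the String copies discussed earlier in this thread, here is a minimal Java
sketch. It assumes plain Java Serialization. TextLabel,
InterningObjectInputStream and Reanimate are names made up for this
example; they are not part of Wicket or of the serializer described above.
The only JDK hooks used are ObjectInputStream.enableResolveObject and
resolveObject, which let a subclass substitute objects as they are read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a component holding a String (not Wicket's Label).
final class TextLabel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;
    TextLabel(String text) { this.text = text; }
}

// Replaces every de-serialized String with its interned instance, so that
// equal Strings read from different byte arrays end up as one object.
final class InterningObjectInputStream extends ObjectInputStream {
    InterningObjectInputStream(InputStream in) throws IOException {
        super(in);
        enableResolveObject(true); // turn on the resolveObject callback
    }

    @Override
    protected Object resolveObject(Object obj) {
        return (obj instanceof String) ? ((String) obj).intern() : obj;
    }
}

final class Reanimate {
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // shareStrings = true uses the interning stream, false the plain JDK one.
    static Object fromBytes(byte[] data, boolean shareStrings) {
        try (ObjectInputStream in = shareStrings
                ? new InterningObjectInputStream(new ByteArrayInputStream(data))
                : new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Interning is only one possible sharing policy and has costs of its own; a
bounded application-level cache of Strings would be an alternative. The
point is just that the substitution can be done at read time without
touching the serialized classes.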



I did have 2 questions buried in my previous email.
Both have to do with serialization of an object when
it appears a 2nd (3rd, etc.) time during the serialization
process.

So, first, is it possible, likely, allowed, excluded, etc. that
the same Component can appear more than once in the same
Page tree? Would it make sense or even be possible for the
same Form object to appear more than once in the same Page tree?
Not two copies of a Form, but the single instance in two places
on a Page?
If it should never happen, is there code in Wicket that ensures
that it does not happen?
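
To make the first question concrete, the kind of check being asked about
could look like the following Java sketch. Node and DuplicateCheck are
hypothetical names for illustration only; Node is a bare tree node, not
Wicket's Component, and nothing here describes what Wicket actually does.
The traversal tracks objects by identity (not equals), which is the same
bookkeeping a serializer needs in order to emit a back-reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical tree node standing in for a component.
final class Node {
    final String id;
    final List<Node> children = new ArrayList<>();
    Node(String id) { this.id = id; }
    Node add(Node child) { children.add(child); return this; }
}

final class DuplicateCheck {
    // Returns the first node that is reached a second time (same instance),
    // or null if every node in the tree is reached exactly once.
    static Node findDuplicate(Node root) {
        Set<Node> seen =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return visit(root, seen);
    }

    private static Node visit(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return node; // already visited: the same instance appears twice
        }
        for (Node child : node.children) {
            Node duplicate = visit(child, seen);
            if (duplicate != null) {
                return duplicate;
            }
        }
        return null;
    }
}
```

Because a node is recorded before its children are visited, the traversal
also terminates if a node is accidentally made its own descendant.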

Secondly, for a Component that is immutable in a given Page,
could it appear, be reused, in the "same" Page in different
Sessions (different clients)? Other areas of such Pages would
be different, hold different data, but could the immutable part
be the same object? As an example, a read-only Label object, could
it be used in the same place in the same Page "type" but in
different Sessions? Is there any mechanism in Wicket currently
that could identify such possible reuse?
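
A hypothetical sketch, in Java, of the mechanism the second question has
in mind. Immutable, ReadOnlyLabel, SharedStore and PageWriter are all
invented for this illustration and exist neither in Wicket nor (in this
form) in my serializer. The idea is that an object tagged as immutable is
registered once in a store shared by all Page serializations, and each
Page's byte array carries only the object's id.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker: instances never change after construction.
interface Immutable {}

// Hypothetical read-only label (not Wicket's Label).
final class ReadOnlyLabel implements Immutable {
    final String text;
    ReadOnlyLabel(String text) { this.text = text; }
}

// Registry shared across Page serializations; one id per object instance.
final class SharedStore {
    private final Map<Immutable, Integer> ids = new IdentityHashMap<>();
    private final List<Immutable> byId = new ArrayList<>();

    synchronized int idFor(Immutable obj) {
        Integer id = ids.get(obj);
        if (id == null) {
            id = byId.size();   // next free id
            byId.add(obj);
            ids.put(obj, id);
        }
        return id;
    }

    synchronized Immutable lookup(int id) {
        return byId.get(id);
    }
}

final class PageWriter {
    // Writes only the 4-byte id of the shared object into the Page's bytes.
    static byte[] writeSharedRef(SharedStore store, Immutable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(store.idFor(obj));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

On de-serialization the id is looked up in the same store, so Pages in
different Sessions end up pointing at the one shared instance. Without a
marker such as Immutable the serializer has no way of knowing that this
sharing is safe.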



After memory comes performance, and that's a much harder nut to
crack. To track down bugs in the Scala port I had to put
detailed logging into both the Java and Scala versions.
What was most surprising was the amount of code that
had to be executed, multiple times, just to render the
simplest Page in a unit test - tens of pages of logging
output. I do not yet understand all that is truly happening
within Wicket to render a Page, but it's on my todo list.
And, maybe, there is no issue.

Richard
Thanks.


On 07/20/2011 03:04 AM, Martin Grigorov wrote:
> Hi Richard,
>
> 1. Scala traits are something useful which I hope to have someday in Java too.
> They can help make some code reusable when it is not possible to
> have a common base class. In the end a trait is a partial base class...
>
> 2. I'm not sure what problem you are after with this optimization in
> the serialized version of the object (its bytes).
> Your quest will not improve the runtime memory consumption because the
> trait's properties are mixed with the class instance properties. You
> may have problems with PermGen though because Scala produces classes
> for every "with Foo" (and for every Function/closure).
> You are trying to improve the size (and speed?) of the produced bytes
> after serialization. This will reduce the size of the page
> caches, but only for two of them: the second (application scope) and
> the third (disk). The first level (http session) contains page
> instances (not serialized). Check https://cwiki.apache.org/confluence/x/qIaoAQ for
> more information.
>
> RAM and especially HDD are cheap today, so I think the benefit of your
> optimization will not be big. As evidence, I can say that there are no
> complaints on the mailing lists that Wicket produces files that are too
> big for the third level cache. The general complaint is that the http
> session footprint is bigger than with action-based web frameworks, but I
> think this is because using a custom o.a.w.Session is so comfortable that people
> start putting a lot of state there. The next reason is first-level
> cache but even this is easy to "solve" - just implement custom
> IPageManager or override the default one to not use http session as
> first level cache.
>
> Recently we reworked a bit the code related to page serialization and
> now it is possible to use any library specialized in object
> serialization (see https://github.com/eishay/jvm-serializers/wiki).
> The schema based ones (like Apache Avro, Thrift, Protobuf, ...) will
> be harder to use but not impossible.
> The schemaless ones (Java Serialization, Kryo, XStream, ...) are
> easier to use with Wicket. You may check Kryo based serializer at
> https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
> . It is faster than Java Serialization and produces fewer bytes.
>
> On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
> <ri...@gmail.com>  wrote:
>> Martin,
>>
>> The reason I was interested in Wicket memory usage was that
>> the potential use of Scala traits, rather than the two possible
>> Java approaches, might be compelling when it comes to memory usage.
>>
>> First, the two Java approaches: proxy/wrapper object or bundle everything
>> into the base class.
>>
>> The proxy/wrapper approach lets one have a single implementation
>> that can be shared by multiple classes. The downside is that the
>> proxy/wrapper object requires an additional reference in the
>> class using it and hence additional memory usage.
>>
>> The bundle everything into the base class approach violates the
>> OOP 101 dictum about having small objects focused on their
>> own particular behavior thus avoiding bloat.
>> (Not executable Java/Scala code below.)
>>
>> interface Parent {
>>   getParent
>>   setParent
>> }
>> // Potentially shared implementation
>> class ParentProxy implements Parent {
>>   parent
>>   getParent = parent
>>   setParent(parent) = this.parent = parent
>> }
>>
>> // Issue: Has additional instance variable: parentProxy
>> class CompWithProxy with Parent {
>>   parentProxy = new ParentProxy
>>   getParent = parentProxy.getParent
>>   setParent(parent) = parentProxy.setParent(parent)
>> }
>>
>> // Issue: Does not share implementation
>> class CompAllInOne with Parent {
>>   parent
>>   getParent = parent
>>   setParent(parent) = this.parent = parent
>> }
>>
>> Wicket has taken the "bundle everything into the base class" approach in
>> order to lessen memory usage - certainly a reasonable Java approach
>> to the problem.
>>
>> With Scala one can do the following:
>>
>> // Shared implementation
>> trait ParentTrait {
>>   parent
>>   getParent = parent
>>   setParent(parent) = this.parent = parent
>> }
>>
>> // Uses implementation
>> class Comp with ParentTrait
>>
>> The implementation, ParentTrait, can be used by any
>> number of classes.
>> In addition, one can add to a base class any number of
>> such implementation traits sharing multiple implementations
>> across multiple classes.
>>
>> So, can using such an approach result in smaller (less in-memory)
>> objects in Scala than in Java?
>>
>> The ParentTrait does not really save very much. I assume
>> that it's only the Page class and sub-classes that do not have
>> parent components in Wicket, so the savings per Page component
>> tree is very small indeed. But, there are other behaviors that
>> can be converted to traits, for example, Models.
>> Many of the instance variables in the Java Models which
>> take memory can be converted to method return values, which only
>> add to the size of the class, not to every instance of the class.
>> Also, with Model traits that use Component self-types, one can
>> do away with IComponentAssignedModel wrapping and such.
>>
>> So, how to demonstrate such memory differences? I created
>> stripped down versions of the Component and Label classes in
>> both Java and Scala (only ids and Models),
>> created different Model usage scenarios
>> with Model objects in Java and Traits in Scala, and, finally,
>> serialized (Java Serialization) the result comparing the size
>> of the resulting array of bytes. There are two runs, one with
>> all Strings being the empty string and the next where all
>> strings are 10-character strings:
>>
>> The Java versions (empty string):
>> Label.Empty               99
>> Label.ReadOnly           196
>> Label.ReadWrite          159
>> Label.Resource           333
>> Label.Property           223
>> Label.ComponentProperty  351
>> Label.CompoundProperty   208
>>
>> The Scala versions (empty string):
>> Label.Empty              79
>> Label.ReadOnly           131
>> Label.ReadWrite          150
>> Label.Resource           164
>> Label.Property           207
>> Label.ComponentProperty  134
>> Label.CompoundProperty   184
>>
>>
>> The Java versions (10-character strings):
>> Label.Empty              109
>> Label.ReadOnly           214
>> Label.ReadWrite          177
>> Label.Resource           359
>> Label.Property           241
>> Label.ComponentProperty  369
>> Label.CompoundProperty   218
>>
>>
>> The Scala versions (10-character strings):
>> Label.Empty               89
>> Label.ReadOnly           149
>> Label.ReadWrite          168
>> Label.Resource           190
>> Label.Property           225
>> Label.ComponentProperty  152
>> Label.CompoundProperty   194
>>
>> [Note that the Java Label.Empty result is misleading since in Wicket
>> there is no memory overhead when a Component has no Model.]
>>
>> While this does indicate that using Model traits with Scala
>> will result in less memory usage than the comparable Java
>> approach, Java Serialization adds a whole lot of extra stuff
>> to the array of bytes that masks the true change in
>> in-memory usage. With Java Serialization, the class descriptor
>> for each instance serialized is also added to the byte array and,
>> it is this, that takes up most of the array of bytes.
>>
>> Thinking about it, I realized that Java Serialization is rather
>> a blunt tool when it comes to the requirement of (Scala) Wicket
>> Page serialization. Java Serialization creates a byte array
>> that is rather self-contained/self-descriptive in its content.
>> This is not required for (Scala) Wicket which has very
>> specific requirements and use-cases.
>>
>> But first, before I describe what I did, here are the results.
>> The byte array size data for the serializer I created just to
>> show that one can do a lot better than Java Serialization:
>>
>> The Scala versions (empty string):
>> Label.Empty                6
>> Label.ReadOnly             8
>> Label.ReadWrite            8
>> Label.Resource            10
>> Label.Property            13
>> Label.ComponentProperty    8
>> Label.CompoundProperty    11
>>
>> The Scala versions (10-character strings):
>> Label.Empty                8
>> Label.ReadOnly            12
>> Label.ReadWrite           12
>> Label.Resource            16
>> Label.Property            17
>> Label.ComponentProperty   12
>> Label.CompoundProperty    13
>>
>> Yes, better by more than a factor of 10. I assume factors
>> of 10 are compelling.
>>
>> So, back to the requirements. I spent a couple of days creating
>> the serializer (currently 3.8Kloc) that focused on what I thought
>> would be needed by (Scala) Wicket.
>> The same application using (Scala) Wicket is running on either a
>> single machine or a group of machines.
>> The serialized Page system can have:
>>
>>   In-memory repository
>>     (single-machine, testing);
>>   In-memory cache with local disk backstore
>>     (single-machine, production, re-start) and
>>   In-memory cache with database backstore used by a number of machines
>>     (multi-machine, production, fail-over, session-migration, re-start)
>>
>>   Strings and associated id are cached/backstored where it is the id
>>     that is used in the serialized array.
>>   Classes and associated id are cached/backstored where it is the id
>>     that is used in the serialized array.
>>   Optimizations allow, for example, the Long value 1L to be serialized
>>     as 1 byte or (un-optimized) as 9 bytes.
>>   When using a backstore, a header is prepended to each byte array
>>     that includes the serializer magic number (2 bytes), serializer
>>     protocol version (2 bytes?) and application information (version, etc.)
>>     (2 bytes?).
>>
>> In addition, there are two cases where one might be serializing
>> the same object more than once.
>>
>> The first case is dealt with by most serializers, an object
>> appears more than once in the tree of objects being serialized.
>> Java Serialization deals with this. One must keep track of
>> the identity of all objects being serialized. Then, if an object
>> appears for serialization for a second (third, etc.) time, some
>> sort of reference object and tag is serialized rather than the
>> object. De-serialization is ....  obvious.
>> I do not know, but I assume that this does not arise in Wicket; the
>> same Component appearing more than once in the same Page tree of
>> components. If it does happen, please let me know. If it should
>> not happen but could, is there some visitor well-formedness traversal
>> that checks for duplicate object appearances in a given tree?
>>
>> The second case is one that probably does (or could) occur with
>> Wicket and I've never heard of a serializer dealing with, namely,
>> the same object appears in more than one Page tree - knowledge
>> of what is being serialized is shared across serializations.
>> For this to work, the
>> Component (which could be a tree of Components) has to be
>> immutable like a Label with a read-only value or read-only Model
>> (and the Model object is never changed), etc. Here, there can be
>> a saving if the shared object is serialized in its own backstore
>> and only its identifier appears in the byte arrays of each Page.
>> If there was an Immutable interface which could tag immutable
>> objects, it would be much easier for the serializer to identify
>> them (well, not just easier, but, rather, plain old possible
>> versus impossible) - just a last minute thought.
>>
>> I've not created a Java version of my serializer. But, since the
>> Scala version does not use much Scala magic, porting it to Java
>> would not be too hard. I also have some 500 unit tests.
>>
>> Well, enough for now.
>>
>> Richard
>>
>>
>>
>>
>>
>> On 07/10/2011 02:37 AM, Martin Grigorov wrote:
>>>
>>> Hi,
>>>
>>> About the use cases: my experience is that most of the time the user uses
>>> the in-memory pages (for each listener callback execution, for ajax
>>> requests,...).
>>> Previous version of a page, or previous page is needed when the user
>>> clicks browser back button. Even in this case most of the time the
>>> in-memory cache is hit. Only when the user goes several pages back and
>>> this page is not in-memory then the disk store is used.
>>>
>>> So far so good, but...! Even in-memory store contains serialized
>>> versions of the Page, named SerializedPage. This is a struct which
>>> contains
>>> {
>>>    sessionId: String,
>>>    pageId: int,
>>>    data: byte[]
>>> }
>>> so the Page is serialized back and forth when stored in *any*
>>> IPageStore/IDataStore.
>>>
>>> This is the current state in Wicket 1.5.
>>>
>>> Pedro and I noticed that IPageStore impl (DefaultPageStore) can be
>>> improved to work with Page instances but we decided to postpone this
>>> optimization for 1.5.0+.
>>>
>>> About new String("someLiteral"): I don't remember seeing this code
>>> lately, either in libraries or in applications. This constructor
>>> should be used only when the developer explicitly wants this string to
>>> not be interned and stored in the PermGen space, i.e. it will be
>>> stored in the heap space.
>>> Your benchmark test tests exactly this - the heap space.
>>> I'll try the app with MemoryMXBean to see whether the non-heap changes
>>> after deserialization.
>>> I'm not very into Java Serialization but indeed it seems the Strings
>>> are deserialized in the heap. But even in this case they go in the
>>> Eden space, i.e. they are reclaimed soon after.
>>>
>>> On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
>>> <ri...@gmail.com>    wrote:
>>>>
>>>> If you run the little Java program I included, you will see that
>>>> there is an impact - de-serialized objects take more memory.
>>>>
>>>> Richard
>>>>
>>>> On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
>>>>>
>>>>> string literals are interned by the jvm so they should have a minimal
>>>>> memory impact.
>>>>>
>>>>> -igor
>>>>>
>>>>> On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
>>>>> <ri...@gmail.com>      wrote:
>>>>>>
>>>>>> Martin,
>>>>>>
>>>>>> The reason I was interested is that it struck me a couple of
>>>>>> days ago that when each Page, a tree of Components, is created,
>>>>>> many (almost all?) of the non-end-user-generated Strings stored
>>>>>> as instance variables in the tree are shared
>>>>>> between all copies of the Page, but that when such a Page is
>>>>>> serialized to disk and then de-serialized, each String becomes its own
>>>>>> copy unique to that particular Page. This means that if an
>>>>>> appreciable number of Pages in-memory are reanimated Pages, then
>>>>>> there could be a bunch of memory being used for all the String
>>>>>> copies.
>>>>>>
>>>>>> In the attached simple Java file (yes, I still write Java when I must)
>>>>>> there are three different ways of creating an array of
>>>>>> Label objects (not Wicket Label) where each Label takes a String:
>>>>>>     new Label(some_string)
>>>>>>
>>>>>> The first is to share the same String over all instances of the Label.
>>>>>>     new Label(the_string)
>>>>>> The second is to make a copy of the String when creating each
>>>>>> Label;
>>>>>>     new Label(new String(the_string))
>>>>>> The third is to create a single Label, serialize it to an array of
>>>>>> bytes and then generate the Labels in the array by de-serializing
>>>>>> the byte array for each Label.
>>>>>>
>>>>>> Needless to say, the first uses the least memory; the label string
>>>>>> is shared by all Labels while the second and third approach
>>>>>> uses more memory. Also, if during the de-serialization process, the
>>>>>> de-serialized String is replaced with the original instance of the
>>>>>> String, then the third approach uses only as much memory as the
>>>>>> first approach.
>>>>>>
>>>>>> No rocket science here, but it does seem to imply that if a
>>>>>> significant number of Pages in-memory are actually reanimated Pages,
>>>>>> then there could be a memory saving by
>>>>>> making de-serialization smarter about possible shared objects.
>>>>>> Even if it is only, say, a 5% saving for only certain Wicket
>>>>>> usage patterns, it might be worth looking into.
>>>>>>
>>>>>> Hence, my question to the masters of Wicket and developers whose
>>>>>> application might fit the use-case.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>> On 07/09/2011 11:03 AM, Martin Makundi wrote:
>>>>>>>
>>>>>>> Difficult to say ... we have disabled page versioning and we dump
>>>>>>> sessions onto disk every 5 minutes to minimize memory hassles.
>>>>>>>
>>>>>>> But I am no master ;)
>>>>>>>
>>>>>>> **
>>>>>>> Martin
>>>>>>>
>>>>>>> 2011/7/9 richard emberson<ri...@gmail.com>:
>>>>>>>>
>>>>>>>> This is a question for Wicket masters and those application builders
>>>>>>>> whose applications match the criteria as specified below.
>>>>>>>>
>>>>>>>> [In this case, a Wicket master is someone with a knowledge
>>>>>>>> of how Wicket is being used in a wide spectrum of applications
>>>>>>>> so that they have a feel for what use-cases exist in the real world.]
>>>>>>>>
>>>>>>>> Wicket is used in a wide range of applications with a variety of
>>>>>>>> usage patterns. What I am interested in are those applications where
>>>>>>>> an appreciable number of the pages in memory are pages that had
>>>>>>>> previously been serialized and stored to disk and then reanimated,
>>>>>>>> not found in an in-memory cache and had to be read from disk and
>>>>>>>> de-serialized back into an in-memory page; which is to say,
>>>>>>>> applications with an appreciable number of reanimated pages.
>>>>>>>>
>>>>>>>> Firstly, do such applications exists? These are real-world
>>>>>>>> applications where a significant number of pages in-memory
>>>>>>>> are reanimated pages.
>>>>>>>>
>>>>>>>> For such applications, what percentage of all pages at any
>>>>>>>> given time are reanimated pages?
>>>>>>>> Is it, say, a couple of percent? Two or three in which case its not
>>>>>>>> very significant.
>>>>>>>> Or, is it, say, 50%? Meaning that half of all pages currently in
>>>>>>>> memory had been serialized to disk, flushed from any in-memory cache
>>>>>>>> and then, as needed, de-serialized back into a Page.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Richard
>>>>>>>> --
>>>>>>>> Quis custodiet ipsos custodes
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Quis custodiet ipsos custodes
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>>
>>>>>
>>>>
>>>> --
>>>> Quis custodiet ipsos custodes
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Quis custodiet ipsos custodes
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>> For additional commands, e-mail: users-help@wicket.apache.org
>>
>>
>
>
>

-- 
Quis custodiet ipsos custodes


Re: Page De-Serialization and memory

Posted by Martin Grigorov <mg...@apache.org>.
Hi Richard,

1. Scala traits are something useful that I hope to have someday in Java too.
They can help make code reusable when it is not possible to have a
common base class. In the end a trait is a partial base class...

2. I'm not sure what problem you are trying to solve with this
optimization of the serialized version of the object (its bytes).
It will not improve the runtime memory consumption because the
trait's properties are mixed in with the class instance properties. You
may have problems with PermGen though, because Scala produces classes
for every "with Foo" (and for every Function/closure).
You are trying to improve the size (and speed?) of the bytes produced
by serialization. This will reduce the size of two of the page
caches: the second level (application scope) and the third level
(disk). The first level (http session) contains page instances (not
serialized). Check https://cwiki.apache.org/confluence/x/qIaoAQ for
more information.

RAM and especially HDD are cheap today, so I think the benefit of your
optimization will not be big. As evidence, there are no complaints on
the mailing lists that Wicket produces files that are too big for
the third level cache. The general complaint is that the http session
footprint is bigger than with action-based web frameworks, but I think
this is because using a custom o.a.w.Session is so comfortable that
people start putting a lot of state there. The next reason is the
first-level cache, but even this is easy to "solve" - just implement a
custom IPageManager or override the default one to not use the http
session as first level cache.

Recently we reworked the code related to page serialization a bit and
now it is possible to use any library specialized in object
serialization (see https://github.com/eishay/jvm-serializers/wiki).
The schema-based ones (like Apache Avro, Thrift, Protobuf, ...) will
be harder to use but not impossible.
The schemaless ones (Java Serialization, Kryo, XStream, ...) are
easier to use with Wicket. You may check the Kryo-based serializer at
https://github.com/wicketstuff/core/tree/master/jdk-1.6-parent/serializer-kryo
It is faster than Java Serialization and produces fewer bytes.
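
As a rough illustration of what "pluggable" means here, the sketch
below defines a hypothetical PageSerializer interface and two
implementations: plain Java Serialization, and a wrapper that
compresses whatever another serializer produces. All names in it
(PageSerializer, JavaPageSerializer, GzipPageSerializer) are made up
for this example and are not Wicket's API; a library-backed serializer
would slot in behind the same two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical interface for this sketch; not Wicket's actual API.
interface PageSerializer {
    byte[] serialize(Object page);
    Object deserialize(byte[] data);
}

// Plain Java Serialization behind the interface.
final class JavaPageSerializer implements PageSerializer {
    public byte[] serialize(Object page) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(page);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Wraps any other serializer and GZIP-compresses its output.
final class GzipPageSerializer implements PageSerializer {
    private final PageSerializer delegate;

    GzipPageSerializer(PageSerializer delegate) {
        this.delegate = delegate;
    }

    public byte[] serialize(Object page) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(delegate.serialize(page));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public Object deserialize(byte[] data) {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return delegate.deserialize(plain.toByteArray());
    }
}
```

Whether compression pays off depends on the page: for very small
objects the GZIP framing can cost more than it saves.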

On Wed, Jul 20, 2011 at 2:43 AM, richard emberson
<ri...@gmail.com> wrote:



-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
Martin,

The reason I was interested in Wicket memory usage is that
using Scala traits, rather than either of the two possible
Java approaches, might be compelling when it comes to memory usage.

First, the two Java approaches: a proxy/wrapper object, or bundling
everything into the base class.

The proxy/wrapper approach lets one have a single implementation
that can be shared by multiple classes. The downside is that the
proxy/wrapper object requires an additional reference in the
class using it and hence additional memory usage.

The bundle-everything-into-the-base-class approach violates the
OOP 101 dictum about having small objects focused on their
own particular behavior, thus avoiding bloat.
(Not executable Java/Scala code below.)

interface Parent {
   getParent
   setParent
}
// Potentially shared implementation
class ParentProxy implements Parent {
   parent
   getParent = parent
   setParent(parent) = this.parent = parent
}

// Issue: Has additional instance variable: parentProxy
class CompWithProxy with Parent {
   parentProxy = new ParentProxy
   getParent = parentProxy.getParent
   setParent(parent) = parentProxy.setParent(parent)
}

// Issue: Does not share implementation
class CompAllInOne with Parent {
   parent
   getParent = parent
   setParent(parent) = this.parent = parent
}
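
For anyone who wants to compile the two Java approaches, here is a
minimal runnable rendering of the sketch above. The class names follow
the pseudocode and are illustrative only, not Wicket classes; Object
stands in for whatever the real parent type would be.

```java
// Illustrative names taken from the pseudocode; not Wicket classes.
interface Parent {
    Object getParent();
    void setParent(Object parent);
}

// Potentially shared implementation, reused through delegation.
class ParentProxy implements Parent {
    private Object parent;

    public Object getParent() { return parent; }
    public void setParent(Object parent) { this.parent = parent; }
}

// Proxy approach: one implementation, but every component carries an
// extra reference field and an extra ParentProxy object.
class CompWithProxy implements Parent {
    private final ParentProxy parentProxy = new ParentProxy();

    public Object getParent() { return parentProxy.getParent(); }
    public void setParent(Object parent) { parentProxy.setParent(parent); }
}

// All-in-one approach: no extra object, but the field and accessors
// have to be written again in every class that needs them.
class CompAllInOne implements Parent {
    private Object parent;

    public Object getParent() { return parent; }
    public void setParent(Object parent) { this.parent = parent; }
}
```

Both classes behave the same from the outside; the difference is only
in how many objects and fields each instance costs.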

Wicket has taken the "bundle everything into base class" approach in
order to lessen memory usage - a certainly reasonable Java approach
to the problem.

With Scala one can do the following:

// Shared implementation
trait ParentTrait {
   parent
   getParent = parent
   setParent(parent) = this.parent = parent
}

// Uses implementation
class Comp with ParentTrait

The implementation, ParentTrait, can be used by any
number of classes.
In addition, one can add to a base class any number of
such implementation traits sharing multiple implementations
across multiple classes.

So, can such an approach result in smaller in-memory
objects in Scala than in Java?

The ParentTrait does not really save very much. I assume
that it's only the Page class and its sub-classes that do not have
parent components in Wicket, so the savings per Page component
tree are very small indeed. But there are other behaviors that
can be converted to traits, for example, Models.
Many of the instance variables in the Java Models, which
take memory, can be converted to method return values, which only
add to the size of the class, not to every instance of the class.
Also, with Model traits that use Component self-types, one can
do away with IComponentAssignedModel wrapping and such.

So, how does one demonstrate such memory differences? I created
stripped-down versions of the Component and Label classes in
both Java and Scala (only ids and Models).
I then created different Model usage scenarios,
with Model objects in Java and traits in Scala, and, finally,
serialized (Java Serialization) the results, comparing the sizes
of the resulting arrays of bytes (the numbers below are byte counts).
There are two runs, one with
all Strings being the empty string and the next where all
strings are 10-character strings:

The Java versions (empty string):
Label.Empty               99
Label.ReadOnly           196
Label.ReadWrite          159
Label.Resource           333
Label.Property           223
Label.ComponentProperty  351
Label.CompoundProperty   208

The Scala versions (empty string):
Label.Empty              79
Label.ReadOnly           131
Label.ReadWrite          150
Label.Resource           164
Label.Property           207
Label.ComponentProperty  134
Label.CompoundProperty   184


The Java versions (10-character strings):
Label.Empty              109
Label.ReadOnly           214
Label.ReadWrite          177
Label.Resource           359
Label.Property           241
Label.ComponentProperty  369
Label.CompoundProperty   218


The Scala versions (10-character strings):
Label.Empty               89
Label.ReadOnly           149
Label.ReadWrite          168
Label.Resource           190
Label.Property           225
Label.ComponentProperty  152
Label.CompoundProperty   194

[Note that the Java Label.Empty result is misleading since in Wicket
there is no memory overhead when a Component has no Model.]

While this does indicate that using Model traits with Scala
will result in less memory usage than the comparable Java
approach, Java Serialization adds a whole lot of extra stuff
to the array of bytes that masks the true change in
in-memory usage. With Java Serialization, the class descriptor
of every class involved is also added to the byte array, and
it is this that takes up most of the array of bytes.
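
To see how much of the byte array is class descriptor rather than
data, one can measure it directly. The sketch below uses a hypothetical
TinyLabel class (an assumption for illustration, holding only an id)
and compares the bytes written for a first instance with the extra
bytes written for a second instance in the same stream. Within one
stream Java Serialization writes a class descriptor once and refers
back to it by handle afterwards, so the overhead is largest when
objects are serialized one at a time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stripped-down component: just an id.
class TinyLabel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;

    TinyLabel(String id) { this.id = id; }
}

class SerializedSize {
    // Bytes produced by Java Serialization when all the given objects
    // are written to a single stream.
    static int of(Object... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int one = of(new TinyLabel("a"));
        int two = of(new TinyLabel("a"), new TinyLabel("b"));
        // The first instance pays for the stream header and the class
        // descriptor; the second only adds a handle plus its field data.
        System.out.println("first instance:  " + one + " bytes");
        System.out.println("second instance: " + (two - one) + " bytes");
    }
}
```

The difference between the two printed numbers is roughly the
descriptor overhead for this one class.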

Thinking about it, I realized that Java Serialization is rather
a blunt tool when it comes to the requirements of (Scala) Wicket
Page serialization. Java Serialization creates a byte array
that is rather self-contained/self-descriptive in its content.
This is not required for (Scala) Wicket, which has very
specific requirements and use-cases.

But first, before I describe what I did, here are the results.
The byte array size data for the serializer I created just to
show that one can do a lot better than Java Serialization:

The Scala versions (empty string):
Label.Empty                6
Label.ReadOnly             8
Label.ReadWrite            8
Label.Resource            10
Label.Property            13
Label.ComponentProperty    8
Label.CompoundProperty    11

The Scala versions (10-character strings):
Label.Empty                8
Label.ReadOnly            12
Label.ReadWrite           12
Label.Resource            16
Label.Property            17
Label.ComponentProperty   12
Label.CompoundProperty    13

Yes, better by more than a factor of 10. I assume factors
of 10 are compelling.

So, back to the requirements. I spent a couple of days creating
the serializer (currently 3.8 KLOC), focusing on what I thought
would be needed by (Scala) Wicket.
The same application using (Scala) Wicket may be running on either a
single machine or a group of machines.
The serialized Page system can have:

   In-memory repository
     (single-machine, testing);
   In-memory cache with local disk backstore
     (single-machine, production, re-start) and
   In-memory cache with database backstore used by a number of machines
     (multi-machine, production, fail-over, session-migration, re-start)

   Strings and associated id are cached/backstored where it is the id
     that is used in the serialized array.
   Classes and associated id are cached/backstored where it is the id
     that is used in the serialized array.
   Optimizations allow, for example, the Long value 1L to be serialized
     as 1 byte or (un-optimized) as 9 bytes.
   When using a backstore, a header is prepended to each byte array
     that includes the serializer magic number (2 bytes), serializer
     protocol version (2 bytes?) and application information (version, etc.)
     (2 bytes?).
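
A rough Java sketch of two of the items in the list above: strings
replaced by ids, and small Long values written in a single byte. The
names StringTable and CompactLong and the 7-bits-per-byte encoding are
assumptions for illustration; the serializer described here may well
encode things differently.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical string cache: each distinct String gets a small id, and
// only the id would be written into a page's byte array.
class StringTable {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    int idOf(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = strings.size();
            ids.put(s, id);
            strings.add(s);
        }
        return id;
    }

    // Every lookup of the same id returns the same String instance, so
    // de-serialized pages end up sharing it.
    String stringOf(int id) {
        return strings.get(id);
    }
}

// Variable-length encoding: 7 payload bits per byte, high bit set when
// more bytes follow. Values 0..127 take one byte.
class CompactLong {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }
}
```

With this particular encoding negative values take ten bytes, so a
real implementation would probably treat them separately.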

In addition, there are two cases where one might be serializing
the same object more than once.

The first case is dealt with by most serializers: an object
appears more than once in the tree of objects being serialized.
Java Serialization deals with this. One must keep track of
the identity of all objects being serialized. Then, if an object
appears for serialization for a second (third, etc.) time, some
sort of reference object and tag is serialized rather than the
object. De-serialization is ... obvious.
I do not know, but I assume that this does not arise in Wicket, that
is, the same Component appearing more than once in the same Page tree
of components. If it does happen, please let me know. If it should
not happen but could, is there some visitor well-formedness traversal
that checks for duplicate object appearances in a given tree?
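
The first case can be observed with plain Java Serialization. The
sketch below (the class name SharedReferenceDemo is made up for this
example) writes an array that holds the same String twice, reads it
back, and checks object identity. It also shows the flip side: two
separate de-serializations each produce their own copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class SharedReferenceDemo {
    // Serializes a value with Java Serialization and reads it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String shared = "shared";
        Object[] tree = { shared, shared };

        // Within one stream the second occurrence is written as a
        // back-reference, so identity survives the round trip.
        Object[] copy = (Object[]) roundTrip(tree);
        System.out.println(copy[0] == copy[1]);   // prints true

        // A second, independent round trip yields a separate String.
        Object[] other = (Object[]) roundTrip(tree);
        System.out.println(copy[0] == other[0]);  // prints false
    }
}
```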

The second case is one that probably does (or could) occur with
Wicket and that I've never heard of a serializer dealing with, namely,
the same object appearing in more than one Page tree, so that knowledge
of what is being serialized is shared across serializations.
For this to work, the
Component (which could be a tree of Components) has to be
immutable, like a Label with a read-only value or read-only Model
(where the Model object is never changed), etc. Here, there can be
a saving if the shared object is serialized in its own backstore
and only its identifier appears in the byte arrays of each Page.
If there were an Immutable interface which could tag immutable
objects, it would be much easier for the serializer to identify
them (well, not just easier, but, rather, plain old possible
versus impossible) - just a last-minute thought.
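
One way to prototype the second case on top of Java Serialization is
to swap tagged objects for small tokens while writing and swap them
back while reading. The sketch below does that with the
replaceObject/resolveObject hooks of the standard object streams.
Immutable, SharedRef, SharedStore, SharingOutput, SharingInput,
FixedText and SharedPages are all hypothetical names for this example,
and the store is a plain in-memory list standing in for a backstore.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker for objects that never change after construction.
interface Immutable extends Serializable {}

// Small token written into a page's bytes in place of a shared object.
final class SharedRef implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    SharedRef(int id) { this.id = id; }
}

// Hypothetical shared store (in memory here; a backstore in a real system).
final class SharedStore {
    private final Map<Object, Integer> ids = new IdentityHashMap<>();
    private final List<Object> objects = new ArrayList<>();

    synchronized int idOf(Object o) {
        Integer id = ids.get(o);
        if (id == null) {
            id = objects.size();
            ids.put(o, id);
            objects.add(o);
        }
        return id;
    }

    synchronized Object get(int id) { return objects.get(id); }
}

final class SharingOutput extends ObjectOutputStream {
    private final SharedStore store;
    SharingOutput(OutputStream out, SharedStore store) throws IOException {
        super(out);
        this.store = store;
        enableReplaceObject(true); // have the stream call replaceObject
    }
    @Override protected Object replaceObject(Object obj) {
        return obj instanceof Immutable ? new SharedRef(store.idOf(obj)) : obj;
    }
}

final class SharingInput extends ObjectInputStream {
    private final SharedStore store;
    SharingInput(InputStream in, SharedStore store) throws IOException {
        super(in);
        this.store = store;
        enableResolveObject(true); // have the stream call resolveObject
    }
    @Override protected Object resolveObject(Object obj) {
        return obj instanceof SharedRef ? store.get(((SharedRef) obj).id) : obj;
    }
}

// Example of an immutable, shareable piece of a page.
final class FixedText implements Immutable {
    private static final long serialVersionUID = 1L;
    final String text;
    FixedText(String text) { this.text = text; }
}

final class SharedPages {
    static byte[] write(Object page, SharedStore store) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (SharingOutput out = new SharingOutput(bytes, store)) {
            out.writeObject(page);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data, SharedStore store) {
        try (SharingInput in = new SharingInput(new ByteArrayInputStream(data), store)) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two pages written and read through the same SharedStore get back the
very same FixedText instance. The ids here are only valid for the
lifetime of that store; a multi-machine setup would need stable ids
and a way to expire entries.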

I have not created a Java version of my serializer. But, since the
Scala version does not use much Scala magic, porting it to Java
would not be too hard. I also have some 500 unit tests.

Well, enough for now.

Richard





On 07/10/2011 02:37 AM, Martin Grigorov wrote:
> Hi,
>
> About the use cases: my experience is that most of the time the
> in-memory pages are used (for each listener callback execution, for
> ajax requests,...).
> A previous version of a page, or a previous page, is needed when the
> user clicks the browser back button. Even in this case most of the
> time the in-memory cache is hit. Only when the user goes several pages
> back and that page is not in memory is the disk store used.
>
> So far so good, but...! Even the in-memory store contains serialized
> versions of the Page, named SerializedPage. This is a struct which
> contains
> {
>    sessionId: String,
>    pageId: int,
>    data: byte[]
> }
> so the Page is serialized back and forth when stored in *any*
> IPageStore/IDataStore.
>
> This is the current state in Wicket 1.5.
>
> Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
> improved to work with Page instances, but we decided to postpone this
> optimization to 1.5.0+.
>
> About new String("someLiteral"): I don't remember seeing this
> code lately, either in libraries or in applications. This constructor
> should be used only when the developer explicitly wants this string to
> not be interned and stored in the PermGen space, i.e. it will be
> stored in the heap space.
> Your benchmark test tests exactly this - the heap space.
> I'll try the app with MemoryMXBean to see whether the non-heap usage
> changes after deserialization.
> I'm not very familiar with Java Serialization, but indeed it seems the
> Strings are deserialized in the heap. But even in this case they go in the
> Eden space, i.e. they are reclaimed soon after.

-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Martin Grigorov <mg...@apache.org>.
Hi,

About the use cases: in my experience, most of the time the
in-memory pages are used (for each listener callback execution, for ajax
requests,...).
A previous version of a page, or a previous page, is needed when the user
clicks the browser back button. Even in this case, most of the time the
in-memory cache is hit. Only when the user goes several pages back and
this page is not in memory is the disk store used.

So far so good, but...! Even the in-memory store contains serialized
versions of the Page, named SerializedPage. This is a struct which
contains
{
  sessionId: String,
  pageId: int,
  data: byte[]
}
so the Page is serialized back and forth when stored in *any*
IPageStore/IDataStore.
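
As a rough illustration of what such a struct implies, here is a small
Java sketch with the same three fields. It is not Wicket's actual
SerializedPage class; the name SerializedPageSketch and the of/restore
helpers are made up for this example. It shows that restoring always
yields a fresh copy of the stored object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical mirror of the struct described above; not the Wicket class.
final class SerializedPageSketch {
    final String sessionId;
    final int pageId;
    final byte[] data;

    private SerializedPageSketch(String sessionId, int pageId, byte[] data) {
        this.sessionId = sessionId;
        this.pageId = pageId;
        this.data = data;
    }

    // Serializes the page-like object into the data field.
    static SerializedPageSketch of(String sessionId, int pageId, Serializable page) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(page);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SerializedPageSketch(sessionId, pageId, bytes.toByteArray());
    }

    // De-serializes a new copy of the stored object on every call.
    Object restore() {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> page = new ArrayList<String>(Arrays.asList("header", "body"));
        SerializedPageSketch stored = of("session-1", 7, page);
        Object restored = stored.restore();
        System.out.println(restored.equals(page)); // true
        System.out.println(restored == page);      // false
    }
}
```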

This is the current state in Wicket 1.5.

Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
improved to work with Page instances, but we decided to postpone this
optimization to 1.5.0+.

About new String("someLiteral"): I don't remember seeing this
code lately, either in libraries or in applications. This constructor
should be used only when the developer explicitly wants this string to
not be interned and stored in the PermGen space, i.e. it will be
stored in the heap space.
Your benchmark test tests exactly this - the heap space.
I'll try the app with MemoryMXBean to see whether the non-heap usage
changes after deserialization.
I'm not very familiar with Java Serialization, but indeed it seems the
Strings are deserialized in the heap. But even in this case they go in the
Eden space, i.e. they are reclaimed soon after.
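
For reference, reading heap and non-heap usage through MemoryMXBean
could look roughly like the sketch below. MemoryProbe and the toy
workload are made up for illustration, and the numbers depend on the
JVM, its version and GC timing, so no particular output is implied.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class MemoryProbe {
    // Returns { used heap bytes, used non-heap bytes } as reported by the JVM.
    static long[] usedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return new long[] { heap.getUsed(), nonHeap.getUsed() };
    }

    public static void main(String[] args) {
        long[] before = usedBytes();

        // Toy workload: many String copies that stay reachable.
        List<String> copies = new ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            copies.add(new String("someLiteral"));
        }

        long[] after = usedBytes();
        System.out.println("heap delta:     " + (after[0] - before[0]));
        System.out.println("non-heap delta: " + (after[1] - before[1]));
        System.out.println("copies kept:    " + copies.size());
    }
}
```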

On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
<ri...@gmail.com> wrote:
> If you run the little Java program I included, you will see that
> there is an impact: de-serialized objects take more memory.
>
> Richard



-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
If you run the little Java program I included, you will see that
there is an impact: de-serialized objects take more memory.

Richard

On 07/09/2011 05:23 PM, Igor Vaynberg wrote:
> string literals are interned by the jvm so they should have a minimal
> memory impact.
>
> -igor

-- 
Quis custodiet ipsos custodes



Re: Page De-Serialization and memory

Posted by Igor Vaynberg <ig...@gmail.com>.
string literals are interned by the jvm so they should have a minimal
memory impact.
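
A small sketch of what interning does and does not cover (InternDemo is
a made-up name): equal literals resolve to one instance, while a String
built at run time is a separate object until intern() is called on it.

```java
final class InternDemo {
    // Reference comparison on purpose: this is about identity, not equals().
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "wicket:id";
        String sameLiteral = "wicket:id";
        String copy = new String(literal);

        System.out.println(sameInstance(literal, sameLiteral));   // true
        System.out.println(sameInstance(literal, copy));          // false
        System.out.println(sameInstance(literal, copy.intern())); // true
    }
}
```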

-igor

On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
<ri...@gmail.com> wrote:
> Martin,
>
> The reason I was interested is that it struck me a couple of
> days ago that when each Page (a tree of Components) is created,
> many (almost all?) of the non-end-user-generated Strings stored
> as instance variables in the tree are shared
> between all copies of the Page, but that when such a Page is
> serialized to disk and then de-serialized, each String becomes its own
> copy unique to that particular Page. This means that if an
> appreciable number of Pages in memory are reanimated Pages, then
> there could be a lot of memory being used for all the String
> copies.
>
> In the attached simple Java file (yes, I still write Java when I must)
> there are three different ways of creating an array of
> Label objects (not Wicket Label) where each Label takes a String:
>    new Label(some_string)
>
> The first is to share the same String over all instances of the Label:
>    new Label(the_string)
> The second is to make a copy of the String when creating each
> Label:
>    new Label(new String(the_string))
> The third is to create a single Label, serialize it to an array of
> bytes and then generate the Labels in the array by de-serializing
> the byte array for each Label.
>
> Needless to say, the first uses the least memory, since the label string
> is shared by all Labels, while the second and third approaches
> use more memory. Also, if during the de-serialization process the
> de-serialized String is replaced with the original instance of the
> String, then the third approach uses only as much memory as the
> first approach.
>
> No rocket science here, but it does seem to imply that if a
> significant number of Pages in memory are actually reanimated Pages,
> then there could be a memory saving by
> making de-serialization smarter about possible shared objects.
> Even if it is only, say, a 5% saving for only certain Wicket
> usage patterns, it might be worth looking into.
>
> Hence, my question to the masters of Wicket and developers whose
> applications might fit the use-case.
>
> Richard



Re: Page De-Serialization and memory

Posted by richard emberson <ri...@gmail.com>.
Martin,

The reason I was interested is that it struck me a couple of
days ago that when each Page (a tree of Components) is created,
many (almost all?) of the non-end-user-generated Strings stored
as instance variables in the tree are shared
between all copies of the Page, but that when such a Page is
serialized to disk and then de-serialized, each String becomes its own
copy unique to that particular Page. This means that if an
appreciable number of Pages in memory are reanimated Pages, then
there could be a lot of memory being used for all the String
copies.

In the attached simple Java file (yes, I still write Java when I must)
there are three different ways of creating an array of
Label objects (not Wicket Label) where each Label takes a String:
     new Label(some_string)

The first is to share the same String over all instances of the Label:
     new Label(the_string)
The second is to make a copy of the String when creating each
Label:
     new Label(new String(the_string))
The third is to create a single Label, serialize it to an array of
bytes and then generate the Labels in the array by de-serializing
the byte array for each Label.
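
The attachment is not reproduced in this archive. The three approaches
can be sketched roughly as follows; this is a reconstruction under
assumptions, not the original file. LabelSharingSketch, its Label class
and the distinctStrings helper are illustrative names, and the sketch
counts String instances by identity rather than measuring bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class LabelSharingSketch {
    // Hypothetical stand-in for the Label in the attached file; not Wicket's Label.
    static final class Label implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Label(String text) { this.text = text; }
    }

    // 1. Every Label points at the same String instance.
    static Label[] shared(String s, int n) {
        Label[] labels = new Label[n];
        for (int i = 0; i < n; i++) labels[i] = new Label(s);
        return labels;
    }

    // 2. Every Label gets its own String object.
    static Label[] copied(String s, int n) {
        Label[] labels = new Label[n];
        for (int i = 0; i < n; i++) labels[i] = new Label(new String(s));
        return labels;
    }

    // 3. One Label is serialized once, then de-serialized n times.
    static Label[] reanimated(String s, int n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Label(s));
            }
            byte[] data = bytes.toByteArray();
            Label[] labels = new Label[n];
            for (int i = 0; i < n; i++) {
                try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                    labels[i] = (Label) in.readObject();
                }
            }
            return labels;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts String instances by identity, not by equals().
    static int distinctStrings(Label[] labels) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (Label label : labels) seen.add(label.text);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctStrings(shared("Save", 1000)));     // 1
        System.out.println(distinctStrings(copied("Save", 1000)));     // 1000
        System.out.println(distinctStrings(reanimated("Save", 1000))); // 1000
    }
}
```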

Needless to say, the first uses the least memory, since the label string
is shared by all Labels, while the second and third approaches
use more memory. Also, if during the de-serialization process the
de-serialized String is replaced with the original instance of the
String, then the third approach uses only as much memory as the
first approach.
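
One way to get that replacement with standard Java serialization is a
private readObject hook that canonicalizes the field after it is read.
The sketch below uses String.intern() as the canonical pool, which is an
assumption made for brevity (a dedicated application-level pool would
work the same way); CanonicalizingLabelSketch and its members are
illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CanonicalizingLabelSketch {
    // Hypothetical Label; not Wicket's Label.
    static final class Label implements Serializable {
        private static final long serialVersionUID = 1L;
        private String text; // not final, so readObject can reassign it

        Label(String text) { this.text = text; }

        String text() { return text; }

        // Invoked by Java serialization while this object is being read.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (text != null) {
                text = text.intern(); // swap the fresh copy for the pooled instance
            }
        }
    }

    static Label roundTrip(Label label) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(label);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Label) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Save"; // a literal, so already in the string pool
        Label first = roundTrip(new Label(original));
        Label second = roundTrip(new Label(original));
        System.out.println(first.text() == original);      // true
        System.out.println(first.text() == second.text()); // true
    }
}
```

Interning every de-serialized String is only reasonable for values that
are known to repeat; end-user-generated text would just fill the pool.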

No rocket science here, but it does seem to imply that if a
significant number of Pages in memory are actually reanimated Pages,
then there could be a memory saving by
making de-serialization smarter about possible shared objects.
Even if it is only, say, a 5% saving for only certain Wicket
usage patterns, it might be worth looking into.

Hence, my question to the masters of Wicket and developers whose
applications might fit the use-case.

Richard

On 07/09/2011 11:03 AM, Martin Makundi wrote:
> Difficult to say ... we have disabled page versioning and we dump
> sessions onto disk every 5 minutes to minimize memory hassles.
>
> But I am no master ;)
>
> **
> Martin

-- 
Quis custodiet ipsos custodes

Re: Page De-Serialization and memory

Posted by Martin Makundi <ma...@koodaripalvelut.com>.
Difficult to say ... we have disabled page versioning and we dump
sessions onto disk every 5 minutes to minimize memory hassles.

But I am no master ;)

**
Martin

2011/7/9 richard emberson <ri...@gmail.com>:
> This is a question for Wicket masters and those application builders
> whose applications match the criteria specified below.
>
> [In this case, a Wicket master is someone with a knowledge
> of how Wicket is being used in a wide spectrum of applications
> so that they have a feel for what use-cases exist in the real world.]
>
> Wicket is used in a wide range of applications with a variety of
> usage patterns. What I am interested in are those applications where
> an appreciable number of the pages in memory are pages that had
> previously been serialized and stored to disk and then reanimated,
> not found in an in-memory cache and had to be read from disk and
> de-serialized back into an in-memory page; which is to say,
> applications with an appreciable number of reanimated pages.
>
> Firstly, do such applications exist? These are real-world
> applications where a significant number of pages in memory
> are reanimated pages.
>
> For such applications, what percentage of all pages at any
> given time are reanimated pages?
> Is it, say, a couple of percent? Two or three, in which case it's not
> very significant.
> Or, is it, say, 50%? Meaning that half of all pages currently in
> memory had been serialized to disk, flushed from any in-memory cache
> and then, as needed, de-serialized back into a Page.
>
> Thanks
>
> Richard
> --
> Quis custodiet ipsos custodes
