Posted to dev@cayenne.apache.org by Nikita Timofeev <nt...@objectstyle.com> on 2017/07/05 14:19:16 UTC

Re: Cayenne object storage / memory usage

Hi all,

I've run some additional benchmarks for the field-based classes inspired
by John's experiment, and the results were so promising that I've moved on
to the implementation.

So here is a pull request for you to review [1].
Here [2] you can see what the new generated classes will look like.

I don't see any downsides to this solution: both
memory usage and speed are improved.
All tests pass, and the only minor incompatibility
is that the HOLLOW state no longer resets an object's values [3]
(this can be implemented as well, I'm just
not sure it is really needed).

P.S. Here are some raw numbers from my benchmarks.
I'm giving absolute numbers, but only their ratio really matters.
Results for the old version are on the left, for the new version on the right.

Memory usage:
==============
1. 10,000 small objects
(int, Date and String ~ 20 chars)
>>> 6 MB vs 2.5 MB <<<

2. 10,000 objects with big values
(int, Date and String ~ 1K chars)
For the same classes (same number of fields) the difference is just
a constant, so this case only gives an idea of what to expect
with different data sizes.
>>> 24.5 MB vs 21 MB <<<
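
The "constant difference" can be checked with a bit of arithmetic on the
two results above. This is only a rough sketch: it assumes the unit is
megabytes of 10^6 bytes, and the class and method names are made up for
illustration.

```java
// Hypothetical helper, not part of Cayenne: per-object saving implied
// by the old and new totals reported above.
class MemoryDelta {

    // Assumes 1 MB = 1,000,000 bytes; with 2^20 bytes the result is ~5% larger.
    static double perObjectBytes(double oldMb, double newMb, int objects) {
        return (oldMb - newMb) * 1_000_000 / objects;
    }

    public static void main(String[] args) {
        // Small values: 6 vs 2.5 for 10,000 objects
        System.out.println(perObjectBytes(6.0, 2.5, 10_000));   // prints 350.0
        // Big values: 24.5 vs 21 for 10,000 objects
        System.out.println(perObjectBytes(24.5, 21.0, 10_000)); // prints 350.0
    }
}
```

Both cases give the same saving of roughly 350 bytes per object, which is
what you would expect if the saving comes from the storage structure and
not from the stored values.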

Performance:
==============
(numbers are in millions of ops per sec, measured with a JMH benchmark)
1. Getter:
>>> 107 vs 177 <<<

2. Setter:
Not so impressive, as the Cayenne stack spends most of the
time here processing the graph diff, but the new methods are still faster.
>>> 12.5 vs 14.5 <<<

3. readPropertyDirectly:
>>> 152 vs 248 <<<

4. writePropertyDirectly:
This is a map.put() vs switch(String) battle,
and the map is definitely losing it :)
>>> 126 vs 582 <<<
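
To make the map.put() vs switch(String) comparison concrete, here is a
minimal sketch of the two storage styles. It is not the code from the
pull request: the class names and the two properties are hypothetical,
and the real generated classes (see [2]) extend Cayenne base classes and
do more than this.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Old style (illustrative): every property lives in a per-object HashMap.
class MapBackedArtist {
    private final Map<String, Object> values = new HashMap<>();

    Object readPropertyDirectly(String name) {
        return values.get(name);
    }

    void writePropertyDirectly(String name, Object value) {
        values.put(name, value);
    }
}

// New style (illustrative): one plain field per property,
// with direct access dispatched through switch(String).
class FieldBackedArtist {
    private String artistName;
    private Date dateOfBirth;

    Object readPropertyDirectly(String name) {
        switch (name) {
            case "artistName":
                return artistName;
            case "dateOfBirth":
                return dateOfBirth;
            default:
                // Assumption: unknown names simply yield null in this sketch.
                return null;
        }
    }

    void writePropertyDirectly(String name, Object value) {
        switch (name) {
            case "artistName":
                artistName = (String) value;
                break;
            case "dateOfBirth":
                dateOfBirth = (Date) value;
                break;
            default:
                // Assumption: this sketch rejects unknown names.
                throw new IllegalArgumentException("Unknown property: " + name);
        }
    }
}

class StorageSketch {
    public static void main(String[] args) {
        MapBackedArtist oldStyle = new MapBackedArtist();
        oldStyle.writePropertyDirectly("artistName", "Dali");

        FieldBackedArtist newStyle = new FieldBackedArtist();
        newStyle.writePropertyDirectly("artistName", "Dali");

        System.out.println(oldStyle.readPropertyDirectly("artistName"));
        System.out.println(newStyle.readPropertyDirectly("artistName"));
    }
}
```

The field-based version needs no per-object map, entry objects or bucket
array, which is consistent with both the memory and the speed numbers
above.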

[1] https://github.com/apache/cayenne/pull/235
[2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
[3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144

On Wed, Jun 21, 2017 at 10:20 PM, John Huss <jo...@gmail.com> wrote:
> I was surprised by the difference in memory too, but this is a small diff
> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
> methods) so there isn't anything else going on.  My unverified assumption
> about HashMap is that it doubles in size each time it resizes, so entities
> with more fields could cause more waste. For example an entity with 65
> fields would have 63 empty array slots (ignoring fill factor).  So the
> exact savings may vary.
>
> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <ro...@roxanemy.com>
> wrote:
>
>> I’m also a little surprised at the 1/2-ing… what were the values being
>> stored? I suppose in theory, many values are relatively “small”,
>> memory-wise, so having the overhead of also storing the key could ~double
>> the memory use, but if you’re storing large values, I wouldn’t expect the
>> utilization to drop as dramatically. What were your data values (type and
>> length distribution for strings)?
>>
>> Thanks!
>>
>> Robert
>>
>> > On Jun 10, 2017, at 6:49 AM, Michael Gentry <bl...@gmail.com> wrote:
>> >
>> > Hi John,
>> >
>> > I'm a little surprised that map-based storage is over 2x worse in memory
>> > consumption.  I'm wondering if there is more going on here than storage of
>> > the property values.  Would it be simple enough to adapt your test case to
>> > compare a list of POJOs vs a list of maps and see what the memory footprint
>> > and difference is that way?
>> >
>> > I personally was thinking the big improvement for using fields directly is
>> > the speed improvement.  I didn't think the memory consumption difference
>> > would be that dramatic.
>> >
>> > Thanks,
>> >
>> > mrg
>> >
>> >
>> > On Fri, Jun 9, 2017 at 10:55 AM, John Huss <jo...@gmail.com> wrote:
>> >
>> >> I did some experimenting recently to see if changes to the way data is
>> >> stored in Cayenne objects could reduce the amount of memory they consume.
>> >>
>> >> I chose to use separate fields for each property instead of a HashMap
>> >> (which is what CayenneDataObject uses).  The results were very affirming.
>> >> For my test of loading 10,000 objects from every table in my database I got
>> >> it to use about *half the memory* of the default class (from 921 MB
>> >> down to 431 MB).
>> >>
>> >> I know there has been some discussion already about addressing this topic
>> >> for the next major release, so I thought I'd throw in some observations /
>> >> questions here.
>> >>
>> >> For my implementation I subclassed CayenneDataObject because in previous
>> >> experience I found implementing a replacement to be much more difficult and
>> >> subject to more bugs due to the less frequently used code path that
>> >> PersistentObject and its descriptors take you down.  My apps rely on
>> >> things that are sort of specific to CayenneDataObject like Validating.
>> >>
>> >> So one question is how we should be addressing the need that people may
>> >> have to create their own data classes. Right now I believe the recommended
>> >> path is to subclass PersistentObject, but I'm not convinced that that is a
>> >> viable solution without wholesale copying most of CayenneDataObject into
>> >> your subclass.  I'd rather see a fuller base class (in addition to keeping
>> >> PersistentObject around) that includes all of CayenneDataObject except the
>> >> property storage (HashMap).
>> >>
>> >> For my implementation I had to modify CayenneDataObject, but only slightly
>> >> to avoid creating the HashMap which I wasn't using. However, because the
>> >> class isn't really intended for customization this map is referenced in
>> >> multiple methods that can't easily be overridden to change the way things
>> >> are stored.
>> >>
>> >> Another approach might be to ask why anyone should need to customize the
>> >> way data is stored in the objects if we can just use the best solution
>> >> possible in the first place?  I can't imagine a more efficient
>> >> representation than fields.  However, fields present difficulties for the
>> >> use case where you aren't generating unique classes for your model but just
>> >> rely on the generic class.  In theory this could be addressed via runtime
>> >> code generation or something else, but that would be quite a change.
>> >>
>> >> So I'm looking forward to discussing this and toward the future.
>> >>
>> >> John
>> >>
>>
>>



-- 
Best regards,
Nikita Timofeev

Re: Cayenne object storage / memory usage

Posted by John Huss <jo...@gmail.com>.

I'm very glad to see this moving forward! Very exciting! Thanks for your
work on this.

On Thu, Jul 6, 2017 at 8:32 AM Robert Zeigler <ro...@roxanemy.com>
wrote:

> Kudos on the improvements, and to the original developers (Andrus, et al)
> for a fantastic design. These days, I’ve been doing a lot more Python
> coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I
> still miss Cayenne’s simplicity/ease of use (SQLAlchemy uses a transaction
> model more akin to Hibernate, though not as egregious).
>
> Best,
>
> Robert

Re: Cayenne object storage / memory usage

Posted by Robert Zeigler <ro...@roxanemy.com>.

Kudos on the improvements, and to the original developers (Andrus et al.) for a fantastic design. These days, I’ve been doing a lot more Python coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I still miss Cayenne’s simplicity/ease of use (SQLAlchemy uses a transaction model more akin to Hibernate, though not as egregious).

Best,

Robert

> On Jul 6, 2017, at 7:27 AM, Andrus Adamchik <an...@objectstyle.org> wrote:
> 
> The fact that we can switch to field-based DataObjects with minimal effort and without sacrificing a single thing in the Cayenne design is a *very* big deal! Thanks John for bringing the possibility to everyone's attention, and Nikita - for the working code and benchmarks. 
> 
> I am going to try this out on a real app some time next week. Very exciting! :)
> 
> Andrus