Posted to dev@cayenne.apache.org by "Andrus Adamchik (JIRA)" <de...@cayenne.apache.org> on 2008/03/05 13:43:16 UTC

[jira] Created: (CAY-999) Scaling paginated list

Scaling paginated list
----------------------

                 Key: CAY-999
                 URL: https://issues.apache.org/cayenne/browse/CAY-999
             Project: Cayenne
          Issue Type: Improvement
          Components: Cayenne Core Library
    Affects Versions: 3.0
            Reporter: Andrus Adamchik
            Assignee: Andrus Adamchik


An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.

1. Simplify ID representation:

Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.

a. DataRow - 120 bytes, 
b. HashMap - 104 bytes,
c. Object[] - 32 bytes,
d. java.lang.Integer - 16 bytes
[a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]
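
Purely as an illustration, the single-column PK special case could look something like this (a minimal sketch; class and method names are hypothetical, not the actual IncrementalFaultList API):

import java.util.List;
import java.util.Map;

// Hypothetical helper: store the bare PK value when an entity has a
// single-column PK, and fall back to the full PK map only for compound keys.
class UnresolvedSlots {

    static void add(List<Object> elements, Map<String, Object> idSnapshot) {
        if (idSnapshot.size() == 1) {
            // a single int PK ends up as one java.lang.Integer (~16 bytes)
            elements.add(idSnapshot.values().iterator().next());
        } else {
            // compound PK: keep the column -> value map
            elements.add(idSnapshot);
        }
    }
}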

2. Swap out LRU pages

For very large lists, it would make sense to un-fault already-resolved pages as more pages get resolved, so that the in-memory size doesn't grow beyond a certain fixed amount, no matter how many pages are resolved. These parameters will have to be configurable per query, as some users would prefer to keep the entire list resolved...
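
Again just a sketch to make the idea concrete (hypothetical names; an access-ordered LinkedHashMap is only one possible LRU mechanism):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-query cache of resolved pages with LRU eviction. When the
// configured limit is exceeded, the least recently used page is "un-faulted",
// i.e. its objects are replaced by their lightweight ID representation again.
class ResolvedPageCache extends LinkedHashMap<Integer, Object[]> {

    private final int maxResolvedPages;

    ResolvedPageCache(int maxResolvedPages) {
        super(16, 0.75f, true); // access order == LRU order
        this.maxResolvedPages = maxResolvedPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Object[]> eldest) {
        if (size() > maxResolvedPages) {
            unfaultPage(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }

    private void unfaultPage(int pageIndex, Object[] resolvedObjects) {
        // here the list would write the IDs of these objects back into its slots
    }
}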

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: Improving memory use [Was: [jira] Created: (CAY-999) Scaling paginated list]

Posted by Andrus Adamchik <an...@objectstyle.org>.
The decode map will not hold the DataRows, only the "legend" to decode
them, so it is a flyweight (as in the "flyweight pattern"). E.g.

Artist Decode Map:

    "ARTIST_ID" -> 0
    "ARTIST_NAME" -> 1
    "DATE_OF_BIRTH" -> 2

Artist DataRow:

     [1, 'Dali', '19...']
     decodeMap // pointer to decodeMap
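
In rough Java terms the same idea could be sketched like this (hypothetical class names, not the actual DataRow API):

import java.util.HashMap;
import java.util.Map;

// One shared "decode map" per entity: column name -> position in the value array.
class DecodeMap {
    private final Map<String, Integer> positions = new HashMap<String, Integer>();

    DecodeMap(String... columns) {
        for (int i = 0; i < columns.length; i++) {
            positions.put(columns[i], i);
        }
    }

    int positionOf(String column) {
        return positions.get(column);
    }
}

// Flyweight row: per-row data is just an Object[], the keys live in the shared map.
class FlyweightRow {
    private final DecodeMap decodeMap; // shared, one instance per entity
    private final Object[] values;     // per-row data only

    FlyweightRow(DecodeMap decodeMap, Object... values) {
        this.decodeMap = decodeMap;
        this.values = values;
    }

    Object get(String column) {
        return values[decodeMap.positionOf(column)];
    }
}

// e.g. for the Artist example above:
//   DecodeMap artistMap = new DecodeMap("ARTIST_ID", "ARTIST_NAME", "DATE_OF_BIRTH");
//   FlyweightRow dali = new FlyweightRow(artistMap, 1, "Dali", "19...");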

Andrus


On Mar 7, 2008, at 12:43 PM, Aristedes Maniatis wrote:

>
> On 06/03/2008, at 12:44 AM, Andrus Adamchik wrote:
>
>> This got me thinking about DataRow memory/creation efficiency  
>> throughout the framework. We are wasting lots of space on repeating  
>> information. Essentially a DataRow for each entity has a well  
>> defined set of keys, so ideally we can normalize the storage of  
>> DataRows internally, saving an Object[] of values with a reference  
>> to a shared "decode map", one per entity. Such a shared map would  
>> have DbAttribute names for the keys and array positions for the  
>> values. What we'll lose is the ability to serialize DataRows (e.g.  
>> for remote notifications), but maybe we can work around it somehow.
>
> How does this interact with the DataDomain snapshot cache? You've  
> explained that this cache is Map<ObjectId, DataRow> but it has an  
> LRU expiry policy. What happens with a DataRow which is expired from  
> the DataDomain but still exists in the 'decode map'? Is it possible  
> to merge the two concepts (snapshot cache and decode map) as long as  
> there was a more sophisticated expiry policy?
>
> The big benefit to reducing memory usage is that users will be able  
> to create larger caches and improve performance.
>
>
> Ari
>
>
> -------------------------->
> ish
> http://www.ish.com.au
> Level 1, 30 Wilson Street Newtown 2042 Australia
> phone +61 2 9550 5001   fax +61 2 9550 4001
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>
>
>


Re: Improving memory use [Was: [jira] Created: (CAY-999) Scaling paginated list]

Posted by Aristedes Maniatis <ar...@ish.com.au>.
On 06/03/2008, at 12:44 AM, Andrus Adamchik wrote:

> This got me thinking about DataRow memory/creation efficiency  
> throughout the framework. We are wasting lots of space on repeating  
> information. Essentially a DataRow for each entity has a well  
> defined set of keys, so ideally we can normalize the storage of  
> DataRows internally, saving an Object[] of values with a reference  
> to a shared "decode map", one per entity. Such a shared map would  
> have DbAttribute names for the keys and array positions for the  
> values. What we'll lose is the ability to serialize DataRows (e.g.  
> for remote notifications), but maybe we can work around it somehow.

How does this interact with the DataDomain snapshot cache? You've  
explained that this cache is Map<ObjectId, DataRow> but it has an LRU  
expiry policy. What happens with a DataRow which is expired from the  
DataDomain but still exists in the 'decode map'? Is it possible to  
merge the two concepts (snapshot cache and decode map) as long as  
there was a more sophisticated expiry policy?

The big benefit to reducing memory usage is that users will be able to  
create larger caches and improve performance.


Ari


-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A



Improving memory use [Was: [jira] Created: (CAY-999) Scaling paginated list]

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Mar 5, 2008, at 2:43 PM, Andrus Adamchik (JIRA) wrote:

> a. DataRow - 120 bytes,
> b. HashMap - 104 bytes,
> c. Object[] - 32 bytes,
> d java.lang.Integer - 16 bytes


This got me thinking about DataRow memory/creation efficiency  
throughout the framework. We are wasting lots of space on repeating  
information. Essentially a DataRow for each entity has a well defined  
set of keys, so ideally we can normalize the storage of DataRows  
internally, saving an Object[] of values with a reference to a shared  
"decode map", one per entity. Such a shared map would have DbAttribute  
names for the keys and array positions for the values. What we'll lose  
is the ability to serialize DataRows (e.g. for remote notifications),  
but maybe we can work around it somehow.

Just thinking out loud ...

Andrus


[jira] Closed: (CAY-999) Scaling paginated list

Posted by "Andrus Adamchik (JIRA)" <de...@cayenne.apache.org>.
     [ https://issues.apache.org/cayenne/browse/CAY-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrus Adamchik closed CAY-999.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 3.0

Optimized for single-column PK entities. Compound PKs still use the old algorithm and create a DataRow. We may implement a global change optimizing the DataRow representation per this message: http://objectstyle.org/cayenne/lists/cayenne-devel/2008/03/0029.html

> Scaling paginated list
> ----------------------
>
>                 Key: CAY-999
>                 URL: https://issues.apache.org/cayenne/browse/CAY-999
>             Project: Cayenne
>          Issue Type: Improvement
>          Components: Cayenne Core Library
>    Affects Versions: 3.0
>            Reporter: Andrus Adamchik
>            Assignee: Andrus Adamchik
>             Fix For: 3.0
>
>
> An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.
> 1. Simplify ID representation:
> Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.
> a. DataRow - 120 bytes, 
> b. HashMap - 104 bytes,
> c. Object[] - 32 bytes,
> d. java.lang.Integer - 16 bytes
> [a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CAY-999) Scaling paginated list

Posted by "Andrus Adamchik (JIRA)" <de...@cayenne.apache.org>.
     [ https://issues.apache.org/cayenne/browse/CAY-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrus Adamchik updated CAY-999:
--------------------------------

    Description: 
An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.

1. Simplify ID representation:

Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.

a. DataRow - 120 bytes, 
b. HashMap - 104 bytes,
c. Object[] - 32 bytes,
d. java.lang.Integer - 16 bytes
[a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]


  was:
An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.

1. Simplify ID representation:

Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.

a. DataRow - 120 bytes, 
b. HashMap - 104 bytes,
c. Object[] - 32 bytes,
d. java.lang.Integer - 16 bytes
[a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]

2. Swap out LRU pages

For very large lists, it would make sense to un-fault already-resolved pages as more pages get resolved, so that the in-memory size doesn't grow beyond a certain fixed amount, no matter how many pages are resolved. These parameters will have to be configurable per query, as some users would prefer to keep the entire list resolved...


Splitting the second optimization into a separate Jira issue.

> Scaling paginated list
> ----------------------
>
>                 Key: CAY-999
>                 URL: https://issues.apache.org/cayenne/browse/CAY-999
>             Project: Cayenne
>          Issue Type: Improvement
>          Components: Cayenne Core Library
>    Affects Versions: 3.0
>            Reporter: Andrus Adamchik
>            Assignee: Andrus Adamchik
>
> An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.
> 1. Simplify ID representation:
> Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.
> a. DataRow - 120 bytes, 
> b. HashMap - 104 bytes,
> c. Object[] - 32 bytes,
> d. java.lang.Integer - 16 bytes
> [a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CAY-999) Scaling paginated list

Posted by "Ari Maniatis (JIRA)" <de...@cayenne.apache.org>.
    [ https://issues.apache.org/cayenne/browse/CAY-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764#action_12764 ] 

Ari Maniatis commented on CAY-999:
----------------------------------

1. Very useful.
2. Sounds like it could get complicated fast. The algorithms would be similar to virtual memory paging within an operating system (the Wikipedia page has good links: http://en.wikipedia.org/wiki/Page_replacement_algorithm). The danger is that performance regressions would be easy to introduce under certain workloads. Maybe OSCache has already solved some of these problems.

> Scaling paginated list
> ----------------------
>
>                 Key: CAY-999
>                 URL: https://issues.apache.org/cayenne/browse/CAY-999
>             Project: Cayenne
>          Issue Type: Improvement
>          Components: Cayenne Core Library
>    Affects Versions: 3.0
>            Reporter: Andrus Adamchik
>            Assignee: Andrus Adamchik
>
> An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.
> 1. Simplify ID representation:
> Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.
> a. DataRow - 120 bytes, 
> b. HashMap - 104 bytes,
> c. Object[] - 32 bytes,
> d. java.lang.Integer - 16 bytes
> [a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]
> 2. Swap out LRU pages
> For very large lists, it would make sense to un-fault already-resolved pages as more pages get resolved, so that the in-memory size doesn't grow beyond a certain fixed amount, no matter how many pages are resolved. These parameters will have to be configurable per query, as some users would prefer to keep the entire list resolved...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CAY-999) Scaling paginated list

Posted by "Andrus Adamchik (JIRA)" <de...@cayenne.apache.org>.
    [ https://issues.apache.org/cayenne/browse/CAY-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765#action_12765 ] 

Andrus Adamchik commented on CAY-999:
-------------------------------------

Yeah, #2 can get hairy pretty quickly, although it may be important if a user needs to iterate through huge lists of objects... Currently the only memory-sensitive way to do that is via ResultIterator; this may be a more user-friendly alternative. Whatever we do, we must ensure that this is *optional* and not the default behavior.

> Scaling paginated list
> ----------------------
>
>                 Key: CAY-999
>                 URL: https://issues.apache.org/cayenne/browse/CAY-999
>             Project: Cayenne
>          Issue Type: Improvement
>          Components: Cayenne Core Library
>    Affects Versions: 3.0
>            Reporter: Andrus Adamchik
>            Assignee: Andrus Adamchik
>
> An idea for scaling IncrementalFaultList to store massive numbers of objects, like hundreds of thousands. This pertains to the server-side IncrementalFaultList. The problems to solve are the speed of the initial list initialization and overall memory use.
> 1. Simplify ID representation:
> Even unresolved lists can take a significant amount of memory... Each unresolved object slot currently stores a DataRow with N entries, where N is the number of PK columns for the entity, i.e. more often than not a single entry. Here is a memory use calculation for various representations of an unresolved entry, based on a DbEntity with a single int PK.
> a. DataRow - 120 bytes, 
> b. HashMap - 104 bytes,
> c. Object[] - 32 bytes,
> d. java.lang.Integer - 16 bytes
> [a primitive int is even better, but it complicates the implementation, as we'd need a parallel int[] (long[], double[], etc.), so all in all we may get no gain]
> 2. Swap out LRU pages
> For very large lists, it would make sense to un-fault already-resolved pages as more pages get resolved, so that the in-memory size doesn't grow beyond a certain fixed amount, no matter how many pages are resolved. These parameters will have to be configurable per query, as some users would prefer to keep the entire list resolved...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.