Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2012/09/19 15:35:07 UTC

higher layer library makes things faster?

So there is this interesting case where a higher layer library makes things faster.  This is counter-intuitive, as an abstraction usually makes things slower in exchange for an increase in productivity.  It would be cool if more and more libraries supported something to help with this scenario, I think.

The concept: once in a while you end up needing a new query into a NoSQL data model that already exists, and you do something like this:

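// read the whole user row just to get at its role mapping ids
UserRow user = fetchUser(rowKey);
List<RoleMappingRow> mappings = fetchRoleMappings(user.getRoleMappingIds());
// collect the group row keys referenced by every mapping
List<GroupIdRowKeys> rowKeys = new ArrayList<GroupIdRowKeys>();
for (RoleMappingRow m : mappings) {
   rowKeys.addAll(m.getGroupIds());
}
// finally read the group rows themselves
List<GroupRow> groups = fetchGroupRows(rowKeys);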

It turns out that if you index the data, the same query is much faster in a lot of cases.  Instead of the reads above, you can scan just 2 index rows to find the keys for the groups and then read the groups in.  Basically, you do one scan of the RoleMapping index, where each mapping has a FK to UserRow, and you now have a list of primary keys for your RoleMapping rows.  Next, you do a scan of the GroupRow index, feeding in those RoleMapping primary keys, which gives back the GroupRow primary keys that you need to look up.  In this way, you skip not only a lot of coding but also fetching all the UserRow data back, and can simply scan the indexes.
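
To make the two index scans concrete, here is a rough sketch in plain Java.  It is not playOrm code and it does not talk to Cassandra; the two sorted maps are just an in-memory stand-in for two index rows, and every name in it (IndexScanSketch, roleMappingsByUser, groupsByRoleMapping, findGroupKeys) is made up for illustration.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for two index rows; not a real playOrm or Cassandra API.
class IndexScanSketch {
    // Index 1: UserRow key (the FK) -> primary keys of the RoleMapping rows pointing at that user.
    static final NavigableMap<String, Set<String>> roleMappingsByUser = new TreeMap<>();
    // Index 2: RoleMapping primary key -> primary keys of the GroupRow rows it refers to.
    static final NavigableMap<String, Set<String>> groupsByRoleMapping = new TreeMap<>();

    // Record that the row with primary key pk carries the given indexed value.
    static void index(NavigableMap<String, Set<String>> idx, String value, String pk) {
        idx.computeIfAbsent(value, k -> new TreeSet<>()).add(pk);
    }

    // Two index lookups yield the GroupRow keys without reading UserRow or RoleMapping data.
    static Set<String> findGroupKeys(String userKey) {
        Set<String> groupKeys = new TreeSet<>();
        Set<String> mappingKeys =
                roleMappingsByUser.getOrDefault(userKey, Collections.<String>emptySet());
        for (String mappingKey : mappingKeys) {
            groupKeys.addAll(
                    groupsByRoleMapping.getOrDefault(mappingKey, Collections.<String>emptySet()));
        }
        return groupKeys;
    }
}
```

The keys that come back from findGroupKeys would then be handed to something like fetchGroupRows from the snippet further up, so the only full rows read are the GroupRow rows themselves.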

That said, every time you update a row, you need to remove the old value from the index and add the new one to the index.  Inserts only need to add.  Removes only need to remove.
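
The bookkeeping looks roughly like the sketch below.  Again this is plain Java with in-memory maps standing in for the data rows and the index row, not playOrm's actual implementation, and the class and method names (IndexMaintenanceSketch, insert, update, remove) are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of keeping one index in step with the rows it covers.
class IndexMaintenanceSketch {
    // row key -> current value of the indexed column
    final Map<String, String> rows = new TreeMap<>();
    // indexed value -> keys of the rows that currently hold that value
    final Map<String, Set<String>> index = new TreeMap<>();

    // Insert only adds to the index (assumes rowKey does not exist yet).
    void insert(String rowKey, String value) {
        rows.put(rowKey, value);
        addToIndex(value, rowKey);
    }

    // Update removes the old index entry, then adds the new one.
    void update(String rowKey, String newValue) {
        String oldValue = rows.put(rowKey, newValue);
        if (oldValue != null) {
            removeFromIndex(oldValue, rowKey);
        }
        addToIndex(newValue, rowKey);
    }

    // Remove only removes from the index.
    void remove(String rowKey) {
        String oldValue = rows.remove(rowKey);
        if (oldValue != null) {
            removeFromIndex(oldValue, rowKey);
        }
    }

    private void addToIndex(String value, String rowKey) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    private void removeFromIndex(String value, String rowKey) {
        Set<String> keys = index.get(value);
        if (keys != null) {
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                index.remove(value); // drop empty entries so scans stay short
            }
        }
    }
}
```

Note that update has to know the old value in order to remove it, which in a real store means either reading it first or having the library track it for you.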

Anyways, I have found this to be quite an interesting paradigm shift, as right now doing the latter manually requires a lot more code (but then, that is why I think more and more libraries like playOrm will exist in the future, as they make the above very simple to do yourself).

Later,
Dean

Re: higher layer library makes things faster?

Posted by "Hiller, Dean" <De...@nrel.gov>.

I guess you could look at that as a form of caching...didn't think of it at
the time....I usually think of it as caching in RAM, but this I guess is
caching on disk (though hopefully if the row cache is used for the 3 index
tables playOrm uses, it should be blazingly fast).

Dean

On 9/19/12 10:59 AM, "jeffpk@gmail.com" <je...@gmail.com> wrote:

>Actually it's not uncommon at all.  Any caching implemented on a higher
>level will generally improve speed at a cost in memory.
>
>Beware common wisdom, it's seldom very wise

Re: higher layer library makes things faster?

Posted by je...@gmail.com.

Actually it's not uncommon at all.  Any caching implemented on a higher level will generally improve speed at a cost in memory.

Beware common wisdom, it's seldom very wise