Posted to user@cassandra.apache.org by Carlos Sanchez <ca...@riskmetrics.com> on 2010/04/23 21:50:57 UTC

Trove maps

Jonathan,

Have you thought of using Trove collections instead of regular Java collections (HashMap / HashSet) in Cassandra? Trove maps are faster and require less memory.

Carlos

This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.

RE: Trove maps

Posted by Mark Jones <MJ...@imagehawk.com>.
Eliminating GC hell would probably do a lot to help Cassandra maintain speed vs periods of superfast/superslow performance.  I look forward to hearing how this experiment goes.

From: Eric Hauser [mailto:ewhauser@gmail.com]
Sent: Friday, April 23, 2010 3:37 PM
To: user@cassandra.apache.org
Subject: Re: Trove maps

According to their license page, it is LGPL.

On Fri, Apr 23, 2010 at 4:25 PM, Avinash Lakshman <av...@gmail.com> wrote:
I think the GPL license of Trove prevents us from using it in Cassandra. But yes, for all its maps it uses open addressing, which is much more memory efficient than the separate chaining that is employed in the JDK.

Avinash
On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez <ca...@riskmetrics.com> wrote:
I will try to modify the code... What I like about Trove is that even for regular (non-primitive) maps no Entry objects are created, so there are far fewer references to be GCed.

On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:

> From what I have seen Trove is only a win when you are doing Maps of
> primitives, which is mostly not what we use in Cassandra.  (The one
> exception I can think of is a map of int -> columnfamilies in
> CommitLogHeader.  You're welcome to experiment and see if using Trove
> there or elsewhere makes a measurable difference with stress.py.)
>
> -Jonathan
>
> On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
> <ca...@riskmetrics.com> wrote:
>> Jonathan,
>>
>> Have you thought of using Trove collections instead of regular java collections (HashMap / HashSet) in Cassandra? Trove maps are faster and require less memory
>>
>> Carlos

Re: Trove maps

Posted by Eric Hauser <ew...@gmail.com>.
According to their license page, it is LGPL.


On Fri, Apr 23, 2010 at 4:25 PM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> I think the GPL license of Trove prevents us from using it in Cassandra. But
> yes, for all its maps it uses open addressing, which is much more memory
> efficient than the separate chaining that is employed in the JDK.
>
> Avinash
>
> On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez <
> carlos.sanchez@riskmetrics.com> wrote:
>
>> I will try to modify the code... What I like about Trove is that even for
>> regular (non-primitive) maps no Entry objects are created, so there are far
>> fewer references to be GCed.
>>
>> On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:
>>
>> > From what I have seen Trove is only a win when you are doing Maps of
>> > primitives, which is mostly not what we use in Cassandra.  (The one
>> > exception I can think of is a map of int -> columnfamilies in
>> > CommitLogHeader.  You're welcome to experiment and see if using Trove
>> > there or elsewhere makes a measurable difference with stress.py.)
>> >
>> > -Jonathan
>> >
>> > On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
>> > <ca...@riskmetrics.com> wrote:
>> >> Jonathan,
>> >>
>> >> Have you thought of using Trove collections instead of regular java
>> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and
>> require less memory
>> >>
>> >> Carlos
>
>

Re: Trove maps

Posted by Avinash Lakshman <av...@gmail.com>.
I think the GPL license of Trove prevents us from using it in Cassandra. But
yes, for all its maps it uses open addressing, which is much more memory
efficient than the separate chaining that is employed in the JDK.

Avinash
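
A minimal sketch of the open-addressing idea mentioned above, for illustration only. It is not Trove's implementation (Trove's probing scheme and layout may well differ), and the class name IntOpenMap is hypothetical. Keys and values sit in parallel arrays, so no per-mapping entry object is allocated and int keys are never boxed. A chained map such as java.util.HashMap allocates one entry node per mapping. Removal is left out to keep the sketch short.

```java
/** Hypothetical sketch: int -> V map, open addressing with linear probing. */
final class IntOpenMap<V> {
    private int[] keys;        // primitive keys, no Integer boxing
    private Object[] values;   // values stored directly, no per-entry node object
    private boolean[] used;    // marks occupied slots
    private int size;

    IntOpenMap() {
        keys = new int[16];    // capacity stays a power of two
        values = new Object[16];
        used = new boolean[16];
    }

    int size() { return size; }

    /** Returns the slot holding key, or the first free slot on its probe path. */
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;                 // cheap bit mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;                   // linear probe to the next slot
        }
        return i;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = slot(key);
        return used[i] ? (V) values[i] : null;
    }

    @SuppressWarnings("unchecked")
    V put(int key, V value) {
        if ((size + 1) * 2 > keys.length) grow(); // keep load factor <= 0.5
        int i = slot(key);
        V old = used[i] ? (V) values[i] : null;
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        values[i] = value;
        return old;
    }

    /** Doubles the arrays and re-inserts every occupied slot. */
    private void grow() {
        int[] oldKeys = keys; Object[] oldValues = values; boolean[] oldUsed = used;
        int cap = oldKeys.length * 2;
        keys = new int[cap]; values = new Object[cap]; used = new boolean[cap];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slot(oldKeys[j]);
                used[i] = true; keys[i] = oldKeys[j]; values[i] = oldValues[j];
            }
        }
    }

    public static void main(String[] args) {
        IntOpenMap<String> m = new IntOpenMap<>();
        for (int id = 0; id < 100; id++) m.put(id, "cf" + id);
        System.out.println(m.size() + " " + m.get(42));
    }
}
```

Whether a layout like this beats HashMap in a given workload is something only a measurement can show.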

On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez <
carlos.sanchez@riskmetrics.com> wrote:

> I will try to modify the code... What I like about Trove is that even for
> regular (non-primitive) maps no Entry objects are created, so there are far
> fewer references to be GCed.
>
> On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:
>
> > From what I have seen Trove is only a win when you are doing Maps of
> > primitives, which is mostly not what we use in Cassandra.  (The one
> > exception I can think of is a map of int -> columnfamilies in
> > CommitLogHeader.  You're welcome to experiment and see if using Trove
> > there or elsewhere makes a measurable difference with stress.py.)
> >
> > -Jonathan
> >
> > On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
> > <ca...@riskmetrics.com> wrote:
> >> Jonathan,
> >>
> >> Have you thought of using Trove collections instead of regular java
> collections (HashMap / HashSet) in Cassandra? Trove maps are faster and
> require less memory
> >>
> >> Carlos
>

Re: Trove maps

Posted by Prashant Malik <pm...@gmail.com>.
;) ya, it was painful

On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> Hahaha, Jeff - I remember scampering to remove those references to the
> Trove maps, I think around 2 years ago.
>
> Avinash
>
>
> On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
>
>> Hey,
>>
>> History repeating itself a bit, here: one delay in getting Cassandra into
>> the open source world was removing its use of the Trove collections library,
>> as the license (LGPL) is not compatible with the Apache 2.0 license.
>>
>> Later,
>> Jeff
>>
>> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>>
>>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>>> <ca...@riskmetrics.com> wrote:
>>> > There are forEach methods in  that would allow you to travel the
>>> keys/values/entries w/o creating the extra object (entries)
>>>
>>> Ok. So if change was made, it'd make sense to ensure those were used
>>> for traversal. Thanks!
>>>
>>> -+ Tatu +-
>>>
>>
>>
>

Re: Trove maps

Posted by Avinash Lakshman <av...@gmail.com>.
Well, it wasn't used for any critical operations, so there was no way to
tell what impact it did or did not have.

Avinash

On Tue, May 4, 2010 at 7:49 PM, Cagatay Kavukcuoglu <cagatay@kavukcuoglu.org> wrote:

> Did removing Trove collections have a noticeable effect on performance
> or memory use at the time?
>
> On Tuesday, May 4, 2010, Avinash Lakshman <av...@gmail.com>
> wrote:
> > Hahaha, Jeff - I remember scampering to remove those references to the
> Trove maps, I think around 2 years ago.
> > Avinash
> >
> > On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher <ha...@cloudera.com>
> wrote:
> > Hey,
> >
> > History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library,
> as the license (LGPL) is not compatible with the Apache 2.0 license.
> >
> > Later,
> > Jeff
> >
> > On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com>
> wrote:
> >
> > On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
> > <ca...@riskmetrics.com> wrote:
> >> There are forEach methods in  that would allow you to travel the
> keys/values/entries w/o creating the extra object (entries)
> >
> > Ok. So if change was made, it'd make sense to ensure those were used
> > for traversal. Thanks!
> >
> > -+ Tatu +-
> >
> >
> >
> >
>
> --
> CK.
>

Re: Trove maps

Posted by Cagatay Kavukcuoglu <ca...@kavukcuoglu.org>.
Did removing Trove collections have a noticeable effect on performance
or memory use at the time?

On Tuesday, May 4, 2010, Avinash Lakshman <av...@gmail.com> wrote:
> Hahaha, Jeff - I remember scampering to remove those references to the Trove maps, I think around 2 years ago.
> Avinash
>
> On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into the open source world was removing its use of the Trove collections library, as the license (LGPL) is not compatible with the Apache 2.0 license.
>
> Later,
> Jeff
>
> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>
> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
> <ca...@riskmetrics.com> wrote:
>> There are forEach methods in  that would allow you to travel the keys/values/entries w/o creating the extra object (entries)
>
> Ok. So if change was made, it'd make sense to ensure those were used
> for traversal. Thanks!
>
> -+ Tatu +-
>
>
>
>

-- 
CK.

Re: Trove maps

Posted by Avinash Lakshman <av...@gmail.com>.
Hahaha, Jeff - I remember scampering to remove those references to the Trove
maps, I think around 2 years ago.

Avinash

On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher <ha...@cloudera.com> wrote:

> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library,
> as the license (LGPL) is not compatible with the Apache 2.0 license.
>
> Later,
> Jeff
>
> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>
>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>> <ca...@riskmetrics.com> wrote:
>> > There are forEach methods in  that would allow you to travel the
>> keys/values/entries w/o creating the extra object (entries)
>>
>> Ok. So if change was made, it'd make sense to ensure those were used
>> for traversal. Thanks!
>>
>> -+ Tatu +-
>>
>
>

Re: Trove maps

Posted by Paul Brown <pa...@gmail.com>.
We went through this with Ode w.r.t. Hibernate.  Note that Ode still ships with Hibernate support there, just not with Hibernate libraries in the distribution or with a strong dependence on Hibernate.

So, if you made Trove maps optional and provided an adapter, you'd be OK.  You just can't bundle the Trove maps or have the project require it.

-- Paul
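
A rough sketch of the "optional plus adapter" arrangement described above, under stated assumptions: the adapter class name is made up, no such adapter exists in Cassandra, and MapProvider is a hypothetical helper. The core code depends only on java.util interfaces, looks for the optional adapter by name at runtime, and falls back to HashMap when it is absent. Whether this satisfies any particular licensing policy is not something the sketch can show.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: picks a map implementation without a hard dependency. */
final class MapProvider {
    // Assumed name of an adapter that would be shipped separately, not bundled.
    static final String OPTIONAL_ADAPTER = "example.optional.TroveMapSupplier";

    /**
     * Tries to instantiate the named adapter reflectively. If the class is
     * missing or unusable, falls back to a supplier of java.util.HashMap.
     */
    @SuppressWarnings("unchecked")
    static <K, V> Supplier<Map<K, V>> load(String adapterClassName) {
        try {
            Class<?> c = Class.forName(adapterClassName);
            return (Supplier<Map<K, V>>) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            return HashMap::new;   // optional library not present: use the JDK map
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> m = MapProvider.<Integer, String>load(OPTIONAL_ADAPTER).get();
        m.put(1, "Standard1");
        System.out.println(m.getClass().getName() + " " + m);
    }
}
```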

On May 4, 2010, at 10:24 AM, Tatu Saloranta wrote:

> Oh boy... that stupid, stupid bickering about the true nature of the LGPL.
> Both the Apache Foundation and the FSF appeared like little kids arguing over
> whose dad is stronger (this was a few years back, when it was discussed
> whether LGPL components could be used in Apache License projects).
> It almost made me explicitly bar use of Apache licenses for my own projects. ;-p
> 
> (No, there is absolutely no reason to avoid LGPL components in ASL-licensed
> code, absolutely none -- UNLESS the code is (c) by FSF, in which case maybe
> there is a problem.)
> 
> But of course Apache can impose their own, however misguided silly
> rules on projects under their umbrella. :-)
> 
> -+ Tatu +-
> 
> On Tue, May 4, 2010 at 6:16 AM, Boris Shulman <sh...@gmail.com> wrote:
>> LGPL is listed as one of the forbidden licenses for Apache projects
>> (see Excluded Licenses in http://www.apache.org/legal/3party.html)...
>> 
>> On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
>>> Hey,
>>> 
>>> History repeating itself a bit, here: one delay in getting Cassandra into
>>> the open source world was removing its use of the Trove collections library,
>>> as the license (LGPL) is not compatible with the Apache 2.0 license.
>>> 
>>> Later,
>>> Jeff
>>> 
>>> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com>
>>> wrote:
>>>> 
>>>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>>>> <ca...@riskmetrics.com> wrote:
>>>>> There are forEach methods in  that would allow you to travel the
>>>>> keys/values/entries w/o creating the extra object (entries)
>>>> 
>>>> Ok. So if change was made, it'd make sense to ensure those were used
>>>> for traversal. Thanks!
>>>> 
>>>> -+ Tatu +-
>>> 
>>> 
>> 


Re: Trove maps

Posted by Joe Stump <jo...@joestump.net>.
On May 4, 2010, at 6:24 PM, Tatu Saloranta wrote:

> But of course Apache can impose their own, however misguided silly
> rules on projects under their umbrella. :-)

I smell an -ac'esque patch to Cassandra brewing. ;)

--Joe


Re: Trove maps

Posted by Tatu Saloranta <ts...@gmail.com>.
Oh boy... that stupid, stupid bickering about the true nature of the LGPL.
Both the Apache Foundation and the FSF appeared like little kids arguing over
whose dad is stronger (this was a few years back, when it was discussed
whether LGPL components could be used in Apache License projects).
It almost made me explicitly bar use of Apache licenses for my own projects. ;-p

(No, there is absolutely no reason to avoid LGPL components in ASL-licensed
code, absolutely none -- UNLESS the code is (c) by FSF, in which case maybe
there is a problem.)

But of course Apache can impose their own, however misguided silly
rules on projects under their umbrella. :-)

-+ Tatu +-

On Tue, May 4, 2010 at 6:16 AM, Boris Shulman <sh...@gmail.com> wrote:
> LGPL is listed as one of the forbidden licenses for Apache projects
> (see Excluded Licenses in http://www.apache.org/legal/3party.html)...
>
> On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
>> Hey,
>>
>> History repeating itself a bit, here: one delay in getting Cassandra into
>> the open source world was removing its use of the Trove collections library,
>> as the license (LGPL) is not compatible with the Apache 2.0 license.
>>
>> Later,
>> Jeff
>>
>> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com>
>> wrote:
>>>
>>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>>> <ca...@riskmetrics.com> wrote:
>>> > There are forEach methods in  that would allow you to travel the
>>> > keys/values/entries w/o creating the extra object (entries)
>>>
>>> Ok. So if change was made, it'd make sense to ensure those were used
>>> for traversal. Thanks!
>>>
>>> -+ Tatu +-
>>
>>
>

Re: Trove maps

Posted by Boris Shulman <sh...@gmail.com>.
LGPL is listed as one of the forbidden licenses for Apache projects
(see Excluded Licenses in http://www.apache.org/legal/3party.html)...

On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library,
> as the license (LGPL) is not compatible with the Apache 2.0 license.
>
> Later,
> Jeff
>
> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com>
> wrote:
>>
>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>> <ca...@riskmetrics.com> wrote:
>> > There are forEach methods in  that would allow you to travel the
>> > keys/values/entries w/o creating the extra object (entries)
>>
>> Ok. So if change was made, it'd make sense to ensure those were used
>> for traversal. Thanks!
>>
>> -+ Tatu +-
>
>

Re: Trove maps

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Hey,

History repeating itself a bit, here: one delay in getting Cassandra into
the open source world was removing its use of the Trove collections library,
as the license (LGPL) is not compatible with the Apache 2.0 license.

Later,
Jeff

On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta <ts...@gmail.com> wrote:

> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
> <ca...@riskmetrics.com> wrote:
> > There are forEach methods in  that would allow you to travel the
> keys/values/entries w/o creating the extra object (entries)
>
> Ok. So if change was made, it'd make sense to ensure those were used
> for traversal. Thanks!
>
> -+ Tatu +-
>

Re: Trove maps

Posted by Tatu Saloranta <ts...@gmail.com>.
On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
<ca...@riskmetrics.com> wrote:
> There are forEach methods in  that would allow you to travel the keys/values/entries w/o creating the extra object (entries)

Ok. So if change was made, it'd make sense to ensure those were used
for traversal. Thanks!

-+ Tatu +-

RE: Trove maps

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
There are forEach methods in  that would allow you to travel the keys/values/entries w/o creating the extra object (entries)
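
To make the traversal point concrete, here is a minimal sketch of callback-style (internal) iteration over a map backed by parallel arrays. It does not reproduce Trove's classes: PairTable and EntryProcedure are hypothetical stand-ins, and the table has a fixed capacity for brevity. The map hands each key and value straight to the callback, so no Map.Entry object is created per visited mapping.

```java
/** Hypothetical callback type: return false to stop the traversal early. */
interface EntryProcedure<K, V> {
    boolean execute(K key, V value);
}

/** Hypothetical fixed-capacity open-addressing table with internal iteration. */
final class PairTable<K, V> {
    private final Object[] keys;    // a null key marks an empty slot
    private final Object[] values;
    private int size;

    PairTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    int size() { return size; }

    void put(K key, V value) {
        if (key == null) throw new NullPointerException("null marks an empty slot");
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        int probes = 0;
        while (keys[i] != null && !keys[i].equals(key)) {
            if (++probes == keys.length) throw new IllegalStateException("table full");
            i = (i + 1) % keys.length;          // linear probe
        }
        if (keys[i] == null) { keys[i] = key; size++; }
        values[i] = value;
    }

    /** Visits every mapping; returns false if the procedure stopped early. */
    @SuppressWarnings("unchecked")
    boolean forEachEntry(EntryProcedure<? super K, ? super V> procedure) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] != null && !procedure.execute((K) keys[i], (V) values[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PairTable<String, Integer> t = new PairTable<>(8);
        t.put("a", 1);
        t.put("b", 2);
        int[] sum = {0};
        t.forEachEntry((k, v) -> { sum[0] += v; return true; });
        System.out.println(sum[0]);
    }
}
```

By contrast, iterating an entry view of such a table would have to build an entry object for each step, which is the short-lived garbage asked about below.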
________________________________________
From: Tatu Saloranta [tsaloranta@gmail.com]
Sent: Friday, April 23, 2010 11:58 PM
To: user@cassandra.apache.org
Subject: Re: Trove maps

On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez
<ca...@riskmetrics.com> wrote:
> I will try to modify the code... What I like about Trove is that even for regular (non-primitive) maps no Entry objects are created, so there are far fewer references to be GCed.

This could help, but how is iteration then handled? Are Map.Entry
instances created (and discarded) during iteration? (That could be a
net loss in some cases -- or maybe not, since it is short-lived garbage vs.
long-lived objects held as part of a long-lived Map.)
Just curious,

-+ Tatu +-

Re: Trove maps

Posted by Tatu Saloranta <ts...@gmail.com>.
On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez
<ca...@riskmetrics.com> wrote:
> I will try to modify the code... What I like about Trove is that even for regular (non-primitive) maps no Entry objects are created, so there are far fewer references to be GCed.

This could help, but how is iteration then handled? Are Map.Entry
instances created (and discarded) during iteration? (That could be a
net loss in some cases -- or maybe not, since it is short-lived garbage vs.
long-lived objects held as part of a long-lived Map.)
Just curious,

-+ Tatu +-

Re: Trove maps

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
I will try to modify the code... What I like about Trove is that even for regular (non-primitive) maps no Entry objects are created, so there are far fewer references to be GCed.

On Apr 23, 2010, at 2:55 PM, Jonathan Ellis wrote:

> From what I have seen Trove is only a win when you are doing Maps of
> primitives, which is mostly not what we use in Cassandra.  (The one
> exception I can think of is a map of int -> columnfamilies in
> CommitLogHeader.  You're welcome to experiment and see if using Trove
> there or elsewhere makes a measurable difference with stress.py.)
>
> -Jonathan
>
> On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
> <ca...@riskmetrics.com> wrote:
>> Jonathan,
>>
>> Have you thought of using Trove collections instead of regular java collections (HashMap / HashSet) in Cassandra? Trove maps are faster and require less memory
>>
>> Carlos

Re: Trove maps

Posted by Jonathan Ellis <jb...@gmail.com>.
From what I have seen Trove is only a win when you are doing Maps of
primitives, which is mostly not what we use in Cassandra.  (The one
exception I can think of is a map of int -> columnfamilies in
CommitLogHeader.  You're welcome to experiment and see if using Trove
there or elsewhere makes a measurable difference with stress.py.)

-Jonathan

On Fri, Apr 23, 2010 at 2:50 PM, Carlos Sanchez
<ca...@riskmetrics.com> wrote:
> Jonathan,
>
> Have you thought of using Trove collections instead of regular java collections (HashMap / HashSet) in Cassandra? Trove maps are faster and require less memory
>
> Carlos