Posted to user@cassandra.apache.org by Ian Danforth <id...@numenta.com> on 2011/09/22 20:28:27 UTC

Storing (python) objects

All,

 I find myself considering storing serialized python dicts in Cassandra. I'd
like to store fairly complex, nested dicts, and it's just easier to do this
rather than work out a lot of super columns / columns etc.

 Do others find themselves storing serialized data structures in Cassandra
or is this generally a sign of doing something wrong?

 Thanks in advance!

Ian

Re: Storing (python) objects

Posted by Koert Kuipers <ko...@tresata.com>.
I would advise against using a language-specific storage format. You might
regret it later if you want to add an application to your system that is
written in anything other than Python. I mean, Python is great, but it is
not necessarily the right tool for every job.

Look at thrift/protobuf/avro/bson/json.
I would use a serialization format with an IDL.

On Fri, Sep 23, 2011 at 5:07 AM, David Allsopp <dn...@gmail.com> wrote:

> We have done exactly as you describe (nested dicts etc) - works fine as
> long as you are happy to read the whole lump of data, i.e. don't need to
> read at a finer granularity. This approach can also save a lot of storage
> space as you don't have the overhead of many small columns.
>
> Some folks also write JSON, which would be a bit more language-independent
> of course.

Re: Storing (python) objects

Posted by David Allsopp <dn...@gmail.com>.
We have done exactly as you describe (nested dicts etc) - works fine as long
as you are happy to read the whole lump of data, i.e. don't need to read at
a finer granularity. This approach can also save a lot of storage space as
you don't have the overhead of many small columns.

Some folks also write JSON, which would be a bit more language-independent
of course.
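
A minimal sketch of the whole-blob trade-off described above, assuming JSON as the format. The `store` dict and the `save`/`load` helpers are hypothetical stand-ins for a single Cassandra column value and the client calls that would write and read it; no real client API is shown.

```python
import json

# Hypothetical stand-in for one column value per row key; a real client
# call would replace these dict operations.
store = {}


def save(key, obj):
    # Serialize the whole nested dict into one opaque byte string.
    store[key] = json.dumps(obj, sort_keys=True).encode("utf-8")


def load(key):
    # The whole blob has to be fetched and parsed, even for one field.
    return json.loads(store[key].decode("utf-8"))


profile = {"name": "ian", "prefs": {"theme": "dark", "tags": ["a", "b"]}}
save("user:1", profile)

# Changing one nested field is a read-modify-write of the entire blob.
doc = load("user:1")
doc["prefs"]["theme"] = "light"
save("user:1", doc)
```

If individual fields need to be read or updated on their own, separate columns are probably the better fit.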


Re: Storing (python) objects

Posted by Tristan Seligmann <mi...@mithrandi.net>.
On Fri, Sep 23, 2011 at 1:09 AM, Alexis Lê-Quôc <al...@datadoghq.com> wrote:
> For data accessed through a single path, I use the same trick: pickle, bz2
> and insert.

Note that unpickling a pickle in Python a) can execute arbitrary code, and
b) relies on your code being the same (or close enough) to what it was
when the pickle was created, so it is generally a very bad choice for
persistent data serialization.
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar
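
To illustrate point a), here is a small and deliberately harmless sketch. The `Innocuous` class is hypothetical; its `__reduce__` tells pickle which callable to invoke at load time. Here that callable is `sorted`, but whoever writes the stored bytes chooses it.

```python
import json
import pickle


class Innocuous:
    """Hypothetical class whose pickle names a callable to run on load."""

    def __reduce__(self):
        # The pickle records "call sorted([3, 1, 2])" instead of object state.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocuous())

# Loading does not rebuild an Innocuous instance; it calls the recorded
# function and returns whatever that function returns.
loaded = pickle.loads(payload)

# By contrast, json.loads only ever produces plain data types.
plain = json.loads('{"values": [3, 1, 2]}')
```

This is why pickled blobs should only be loaded when every writer of that data is trusted.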

Re: Storing (python) objects

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Sep 23, 2011 at 1:41 PM, Ian Danforth <id...@numenta.com> wrote:

> Good feedback from all. Thanks!
>
> Ian
>
I am working on something similar:
https://github.com/edwardcapriolo/Cassandra-AnyType. One of the features I
want to get to is being able to serialize any Comparable object to JSON
using Google Gson. Doing this will allow storage of any Java object as JSON,
and the fields should sort by the same rules as compareTo. (Still a work in
progress.)

Re: Storing (python) objects

Posted by Ian Danforth <id...@numenta.com>.
Good feedback from all. Thanks!

Ian

On Fri, Sep 23, 2011 at 7:48 AM, Tristan Seligmann <mi...@mithrandi.net> wrote:

> On Fri, Sep 23, 2011 at 1:09 AM, Alexis Lê-Quôc <al...@datadoghq.com> wrote:
> > For data accessed through a single path, I use the same trick: pickle, bz2
> > and insert.
>
> Note that unpickling a pickle in Python involves a) arbitrary code
> execution, and b) relies on your code being the same (or close enough)
> to what it was when the pickle was created, so it is generally a very
> bad choice for persistent data serialization.
> --
> mithrandi, i Ainil en-Balandor, a faer Ambar
>

Re: Storing (python) objects

Posted by Alexis Lê-Quôc <al...@datadoghq.com>.
On Thu, Sep 22, 2011 at 3:50 PM, Aaron Turner <sy...@gmail.com> wrote:

> On Thu, Sep 22, 2011 at 11:28 AM, Ian Danforth <id...@numenta.com> wrote:
> > All,
> >  I find myself considering storing serialized python dicts in Cassandra. I'd
> > like to store fairly complex, nested dicts, and it's just easier to do this
> > rather than work out a lot of super columns / columns etc.
> >  Do others find themselves storing serialized data structures in Cassandra
> > or is this generally a sign of doing something wrong?
> >  Thanks in advance!
> > Ian
>
> I just convert my ruby objects to json strings and store it that way.
> Works just fine and there's no advantage to use SuperColumns since
> Cassandra has to read all the supercolumns anyways, so storing as json
> requires less overhead.
>

For data accessed through a single path, I use the same trick: pickle, bz2
and insert.

-- 
Alexis Lê-Quôc | Datadog, Inc. | @alq
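
A rough sketch of the "pickle, bz2 and insert" steps, using only the standard library. The `pack` and `unpack` helper names and the sample record are made up for illustration, and the insert itself is left as a comment because it depends on the client library in use.

```python
import bz2
import pickle


def pack(obj):
    # Serialize with pickle, then compress the resulting bytes with bz2.
    return bz2.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def unpack(blob):
    # Only appropriate for blobs written by your own trusted code.
    return pickle.loads(bz2.decompress(blob))


record = {"host": "web-1", "metrics": {"cpu": [0.5] * 1000}}
blob = pack(record)

# `blob` is the byte string that would be inserted as a single column value;
# reading it back means fetching the column and calling unpack().
restored = unpack(blob)
```

Swapping `pickle.dumps`/`pickle.loads` for `json.dumps(...).encode()` and `json.loads` keeps the same shape while producing a format other languages can read.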

Re: Storing (python) objects

Posted by Aaron Turner <sy...@gmail.com>.
On Thu, Sep 22, 2011 at 11:28 AM, Ian Danforth <id...@numenta.com> wrote:
> All,
>  I find myself considering storing serialized python dicts in Cassandra. I'd
> like to store fairly complex, nested dicts, and it's just easier to do this
> rather than work out a lot of super columns / columns etc.
>  Do others find themselves storing serialized data structures in Cassandra
> or is this generally a sign of doing something wrong?
>  Thanks in advance!
> Ian

I just convert my Ruby objects to JSON strings and store them that way.
Works just fine, and there's no advantage to using SuperColumns since
Cassandra has to read all the subcolumns of a super column anyway, so
storing as JSON requires less overhead.


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"