Posted to user@cassandra.apache.org by "Hartzman, Leslie" <le...@medtronic.com> on 2013/09/11 20:26:59 UTC

Complex JSON objects

Hi,

What is the recommended way to handle a complex JSON structure, short of storing the whole document as the value of a single column? What options are there for storing dynamic data like this?

e.g.,

{
  "readings": [
    {
      "value": 20,
      "rate_of_change": 0.05,
      "timestamp": 1378686742465
    },
    {
      "value": 22,
      "rate_of_change": 0.05,
      "timestamp": 1378686742466
    },
    {
      "value": 21,
      "rate_of_change": 0.05,
      "timestamp": 1378686742467
    }
  ],
  "events": [
    {
      "type": "direction_change",
      "version": 0.1,
      "timestamp": 1378686742465,
      "data": {
        "units": "miles",
        "direction": "NW",
        "offset": 23
      }
    },
    {
      "type": "altitude_change",
      "version": 0.1,
      "timestamp": 1378686742465,
      "data": {
        "rate": 0.2,
        "duration": 18923
      }
    }
  ]
}



[CONFIDENTIALITY AND PRIVACY NOTICE]

Information transmitted by this email is proprietary to Medtronic and is intended for use only by the individual or entity to which it is addressed, and may contain information that is private, privileged, confidential or exempt from disclosure under applicable law. If you are not the intended recipient or it appears that this mail has been forwarded to you without proper authority, you are notified that any use or dissemination of this information in any manner is strictly prohibited. In such cases, please delete this mail from your records.
 
To view this notice in other languages you can either select the following link or manually copy and paste the link into the address bar of a web browser: http://emaildisclaimer.medtronic.com

RE: Complex JSON objects

Posted by "Hartzman, Leslie" <le...@medtronic.com>.
Thanks to you, Paulo, and Edward. You’ve given me something to think about. I’ll just have to figure out the most reasonable approach for my needs.

Les

From: Laing, Michael [mailto:michael.laing@nytimes.com]
Sent: Wednesday, September 11, 2013 2:39 PM
To: user@cassandra.apache.org
Subject: Re: Complex JSON objects



Re: Complex JSON objects

Posted by "Laing, Michael" <mi...@nytimes.com>.
A way to do this would be to express the JSON structure as (path, value)
tuples and then use a map<json, json> to store them.

For example, your JSON above can be expressed as shown below where the path
is a list of keys/indices and the value is a scalar.

You could also concatenate the path elements and use them as a column key
instead. The advantage there is that you can do range queries against such
structures, and they will efficiently yield subtrees. E.g. a query for
"path > 'readings.1.' and path < 'readings.1.\uffff'" will yield the
appropriate rows.
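
The range query above can be sketched in plain Python. This is only an illustration, not Cassandra client code: a sorted list stands in for columns kept in sorted order, the dotted key format and the subtree() helper are assumptions made for the example, and bisect does the work of the range scan.

```python
import bisect

# Hypothetical column keys: path elements joined with '.'.
# A sorted list stands in for columns that the store keeps in sorted order.
columns = sorted(
    [
        ("events.0.type", "direction_change"),
        ("readings.0.value", 20),
        ("readings.1.rate_of_change", 0.05),
        ("readings.1.timestamp", 1378686742466),
        ("readings.1.value", 22),
        ("readings.2.value", 21),
    ],
    key=lambda kv: kv[0],
)
keys = [k for k, _ in columns]


def subtree(prefix):
    # Emulates: path > prefix AND path < prefix + U+FFFF
    lo = bisect.bisect_right(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return columns[lo:hi]


reading_1 = subtree("readings.1.")
for key, value in reading_1:
    print(key, value)
```

One thing to watch with string keys: indices compare as text, so "readings.10." sorts before "readings.2.". Zero-padding the indices is one way to keep numeric order if that matters.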

ml

([u'events', 0, u'timestamp'], 1378686742465)
([u'events', 0, u'version'], 0.1)
([u'events', 0, u'type'], u'direction_change')
([u'events', 0, u'data', u'units'], u'miles')
([u'events', 0, u'data', u'direction'], u'NW')
([u'events', 0, u'data', u'offset'], 23)
([u'events', 1, u'timestamp'], 1378686742465)
([u'events', 1, u'version'], 0.1)
([u'events', 1, u'type'], u'altitude_change')
([u'events', 1, u'data', u'duration'], 18923)
([u'events', 1, u'data', u'rate'], 0.2)
([u'readings', 0, u'timestamp'], 1378686742465)
([u'readings', 0, u'value'], 20)
([u'readings', 0, u'rate_of_change'], 0.05)
([u'readings', 1, u'timestamp'], 1378686742466)
([u'readings', 1, u'value'], 22)
([u'readings', 1, u'rate_of_change'], 0.05)
([u'readings', 2, u'timestamp'], 1378686742467)
([u'readings', 2, u'value'], 21)
([u'readings', 2, u'rate_of_change'], 0.05)
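
Tuples like the ones above can be produced with a short recursive walk over the parsed document. The sketch below is one possible way to do it; the flatten() name is made up for the example, and the last step assumes that "map<json, json>" means a text map whose keys and values are JSON-encoded strings.

```python
import json


def flatten(node, path=()):
    """Yield (path, value) pairs for every scalar in a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, path + (index,))
    else:
        # Scalars (numbers, strings, booleans, null) end the path.
        yield list(path), node


doc = json.loads("""
{"readings": [{"value": 20, "rate_of_change": 0.05,
               "timestamp": 1378686742465}],
 "events": [{"type": "direction_change", "version": 0.1,
             "timestamp": 1378686742465,
             "data": {"units": "miles", "direction": "NW", "offset": 23}}]}
""")

pairs = list(flatten(doc))

# Assumed encoding for a text-to-text map: JSON-encode both path and value.
encoded = {json.dumps(path): json.dumps(value) for path, value in pairs}

for path, value in pairs:
    print(path, value)
```

Empty objects and empty arrays yield no pairs in this sketch, so they would need separate handling if they have to survive a round trip.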



Re: Complex JSON objects

Posted by Paulo Motta <pa...@gmail.com>.
To store a complex JSON object in a C* skinny row, you can serialize each
field independently as a JSON string and store each field as a C* column
within the same row (the row represents the JSON object).

Using the example you mentioned, you could store it in Cassandra as:

ColumnFamily["objectKey"]["readings"] = "[{reading1}, {reading2}, {reading3}]"
ColumnFamily["objectKey"]["events"] = "[{event1}, {event2}, {event3}]"

However, that isn't an optimal way to store such data in Cassandra, since
you would need to deserialize all the readings even if you were
interested in only a particular reading or time period.
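
As a rough illustration of that drawback, here is the per-field layout modeled with a plain Python dict standing in for the row. The dict and the variable names are assumptions for the example, not a Cassandra API.

```python
import json

doc = {
    "readings": [
        {"value": 20, "rate_of_change": 0.05, "timestamp": 1378686742465},
        {"value": 22, "rate_of_change": 0.05, "timestamp": 1378686742466},
    ],
    "events": [
        {"type": "direction_change", "version": 0.1,
         "timestamp": 1378686742465},
    ],
}

# One column per top-level field; each column value is a JSON string.
row = {field: json.dumps(value) for field, value in doc.items()}

# Finding a single reading means decoding the whole "readings" column first.
all_readings = json.loads(row["readings"])
wanted = [r for r in all_readings if r["timestamp"] == 1378686742466]
print(wanted)
```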

A better way to store time series data is to store one measurement/event
per column, so you're able to retrieve data for a particular time period
more easily (since columns are stored in sorted order). One way to do that
for your data would be to store them in 2 column families, as in:

Reading["objectKey"]["timestamp3"] = "{reading3}"
Reading["objectKey"]["timestamp2"] = "{reading2}"
Reading["objectKey"]["timestamp1"] = "{reading1}"

Event["objectKey"]["timestamp3"] = "{event3}"
Event["objectKey"]["timestamp2"] = "{event2}"
Event["objectKey"]["timestamp1"] = "{event1}"


This way you can reconstruct the original JSON object "objectKey" by
fetching the columns from Reading["objectKey"] and Event["objectKey"], and
you can also efficiently query all readings between timestamp2 and
timestamp3 that occurred inside the JSON object, if necessary.
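
The one-column-per-measurement layout can be mimicked in Python to show why the time-range query becomes cheap. This is a sketch under assumptions: the WideRow class is hypothetical, and a sorted list of column names stands in for the sorted on-disk column order.

```python
import bisect
import json


class WideRow:
    """Stand-in for one row whose column names are kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names (timestamps)
        self._values = {}  # column name -> JSON string

    def put(self, timestamp, item):
        # Writing to an existing column name overwrites the old value.
        if timestamp not in self._values:
            bisect.insort(self._names, timestamp)
        self._values[timestamp] = json.dumps(item)

    def between(self, start, end):
        """Return decoded items with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [json.loads(self._values[t]) for t in self._names[lo:hi]]


readings = WideRow()
for reading in [
    {"value": 20, "rate_of_change": 0.05, "timestamp": 1378686742465},
    {"value": 22, "rate_of_change": 0.05, "timestamp": 1378686742466},
    {"value": 21, "rate_of_change": 0.05, "timestamp": 1378686742467},
]:
    readings.put(reading["timestamp"], reading)

# Only the columns inside the requested time range are decoded.
recent = readings.between(1378686742466, 1378686742467)
print([r["value"] for r in recent])
```

Note that both events in the original example carry the same timestamp (1378686742465), so a timestamp-only column name would let the second event overwrite the first. Adding a tie-breaker such as the event type to the column name is one way around that; this is a suggestion made here, not something stated in the thread.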


In this post you can find more information on how to store time series data
in C* in an efficient way:
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra




-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com

Re: Complex JSON objects

Posted by Edward Capriolo <ed...@gmail.com>.
I was playing a while back with the concept of storing JSON into cassandra
columns in a sortable way.

Warning: This is kinda just a cool idea, I never productionized it.
https://github.com/edwardcapriolo/Cassandra-AnyType


