Posted to user@hbase.apache.org by Pablo Molnar <pa...@gmail.com> on 2011/01/31 20:30:32 UTC

Persist JSON into HBase

Hi everyone,

At my company we are experimenting with HBase, and I'd like to know the best
way to persist a semi-structured, complex (3 levels deep) entity represented as
JSON to HBase.
I've already written a Java client that persists rows in a table,
and now my goal is to persist this JSON.

I've looked through the API and found a MapWritable class that could be
useful, because it is very easy for me to convert the JSON entity into a Map
and then persist it.

I would really appreciate an example of how I could implement this.
If this is possible, my next question is: what if I want to create
an index on a key (of the Map)?

Thanks in advance,
Pablo Molnar

Re: Persist JSON into HBase

Posted by Lars George <la...@gmail.com>.
Sorry for the late bump...

It works quite nicely to store JSON as strings in HBase, i.e. use, for
example, JSONObject to produce a string like { "name" : "lars" }
and then call Bytes.toBytes(jsonString) on it. Since Hive now has an HBase
handler, you can use Hive and its built-in JSON support to query cells like so:

select get_json_object(hbase_table.value, '$.name') from hbase_table
where key = <some-key>;

and it returns "lars".

Lars


RE: Persist JSON into HBase

Posted by Sandy Pratt <pr...@adobe.com>.
My use of HBase is essentially what Stack describes: I serialize little log entry objects with (mostly) protobuf and store them in a single cell in HBase.  I did this at first because it was easy, and made a note to go back and break out the fields into their own columns, and in fact into multiple column families in some cases.  When I went back and did this, I found that my 'exploded' schema was actually slower to scan than the 'blob' schema was, and filters didn't seem to help all that much.  This was in the 0.20 days, IIRC.  So this is to say, +1 on storing blobs in HBase.

I don't know if this would work for you, but what's worked well for me is to write side files for Hive to read as I ingest entries into HBase.  I like HBase for durability, random access, sorting, and scanning, and I'll continue to use it to store the golden copy for the foreseeable future, but I've found that Hive against text files is at least a couple of times faster than MR against an HBase source for my map reduce needs.  If you find that what you need from the Hive schema changes over time, you can simply nuke the files and recreate them with a map reduce against the golden copy in HBase.

Sandy

Re: Persist JSON into HBase

Posted by Pablo Molnar <pa...@gmail.com>.
Thanks Dave!
It really makes sense. It all depends on how you want to process the data
later.

I guess I'm going to persist the Map instead of a String, to leverage the
filters without having to parse the JSON.

Pablo

RE: Persist JSON into HBase

Posted by "Buttler, David" <bu...@llnl.gov>.
How you serialize your objects to hbase depends on how you want to use your objects later.  Assuming that you have a good serialization to json already, and all you want to do is put and get the items, then just convert your json string to a byte array and put it in a column qualifier (e.g. Table name: 'Person'; Column Family name: 'a'; Column Qualifier name: 'json').

However, if you want to scan your table and only pull out rows with certain attribute conditions (say, person names starting with 'A'), you may want to push the filter to the server and not have to pull out every single json object to your client, deserialize it, and then check the condition.  In that case, you may want to have each of your object's fields be a Column Qualifier (maybe in addition to the json, maybe as an alternative serialization).

Does that make sense?
Dave
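
A minimal sketch of the "one Column Qualifier per field" idea, assuming the JSON has already been parsed into nested Maps. QualifierFlattener and the dotted-path naming are hypothetical, not part of HBase; the sketch uses only the JDK and stops at producing the qualifier/value pairs that would then go into a Put.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an HBase class): flattens a nested Map, as parsed
// from JSON, into one entry per leaf field. Each key is a dotted path that
// could serve as a column qualifier; each value is the field as UTF-8 bytes.
final class QualifierFlattener {
    private QualifierFlattener() {}

    static Map<String, byte[]> flatten(Map<String, Object> entity) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        collect("", entity, cells);
        return cells;
    }

    private static void collect(String prefix, Map<String, Object> node,
                                Map<String, byte[]> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Nested object: recurse and extend the dotted path.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                collect(path, child, out);
            } else {
                // Leaf: stored as its string form. Lists and nulls are simply
                // stringified here; a real schema would need a rule for them.
                out.put(path, String.valueOf(value).getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

With a person entity holding "name" and a nested "address" map containing "city", this yields the qualifiers "name" and "address.city". A server-side filter can then compare the bytes of a single qualifier instead of the client deserializing every JSON blob.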



Re: Persist JSON into HBase

Posted by Pablo Molnar <pa...@gmail.com>.
Thanks for the feedback, Stack!
So you suggest just serializing the JSON, represented either as a String or as a Map?


Something like this:

(supposing item is a String or a Map)

Put row = new Put(Bytes.toBytes(item.id));
row.add(Bytes.toBytes("json"), Bytes.toBytes("1"), Bytes.toBytes(item));
table.put(row);

What should I use as the qualifier in this case?
Is the JSON persisted efficiently this way?
Doesn't HBase offer a Map serializer? Should I use
myObjectOutputStream.writeObject(map)?

Thanks again,
Pablo
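
For reference, a rough sketch of what the ObjectOutputStream.writeObject(map) route would look like, assuming a HashMap whose values are all Serializable. MapJavaSerialization is a hypothetical name and only JDK classes are used. The resulting bytes are in Java's own serialization format (the stream starts with the magic bytes 0xAC 0xED), so only Java code can read them back, which is worth weighing against storing the JSON text itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical helper (not an HBase class): turns a HashMap into a byte[]
// using standard Java serialization, and reads it back.
final class MapJavaSerialization {
    private MapJavaSerialization() {}

    static byte[] toBytes(HashMap<String, Object> map) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(map);
        } catch (IOException e) {
            // Also thrown when a value in the map is not Serializable.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, Object> fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<String, Object>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The byte[] returned by toBytes could be passed as the cell value of a Put, and fromBytes applied to the value read back from a Get.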



Re: Persist JSON into HBase

Posted by Stack <st...@duboce.net>.
Don't use MapWritable.

In the layer above HBase, inside whatever is hosting the HBase client,
serialize the JSON to bytes and then write that to an HBase cell.  In
the same layer, when reading, do the deserialization.

HBase only does byte arrays.

St.Ack
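
A minimal sketch of such a layer, assuming the JSON is already available as a String. JsonCellCodec is a hypothetical name, not an HBase class, and it uses only the JDK. HBase's own Bytes.toBytes(String) also encodes strings as UTF-8, so in a real client that call could stand in for encode here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an HBase class): lives in the client layer and
// converts between JSON text and the byte[] stored in an HBase cell.
final class JsonCellCodec {
    private JsonCellCodec() {}

    // Serialize: JSON text to the bytes written as the cell value.
    static byte[] encode(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Deserialize: cell value read from HBase back to JSON text.
    static String decode(byte[] cellValue) {
        return new String(cellValue, StandardCharsets.UTF_8);
    }
}
```

The write path would hand encode(json) to a Put as the cell value; the read path would call decode on the value from the Result and then parse the JSON with whatever library the application already uses.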

