Posted to user@hbase.apache.org by Lars George <la...@gmail.com> on 2011/02/03 09:47:39 UTC

Re: Persist JSON into HBase

Sorry for the late bump...

It is quite nice to store JSON as strings in HBase: use, for example,
JSONObject to convert an object to something like { "name" : "lars" }
and then store the result with Bytes.toBytes(jsonString). Since Hive
now has an HBase storage handler, you can use Hive and its built-in
JSON support to query cells like so:

select get_json_object(hbase_table.value, '$.name') from hbase_table
where key = <some-key>;

and it returns "lars".
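
For reference, here is a minimal sketch of the write path, assuming
the org.json JSONObject class, the 0.90-era HBase client API, and a
table named hbase_table whose cf:value column is mapped in Hive via
"hbase.columns.mapping" = ":key,cf:value" (the table, family, and
qualifier names here are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.json.JSONObject;

public class JsonPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "hbase_table");

    // Build the JSON document and serialize it to a string.
    String jsonString = new JSONObject().put("name", "lars").toString();

    // Store the whole document as the value of a single cell.
    Put put = new Put(Bytes.toBytes("some-key"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),
        Bytes.toBytes(jsonString));
    table.put(put);
    table.close();
  }
}

With that mapping, hbase_table.value in the query above resolves to
the cf:value cell, and key resolves to the row key.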

Lars

On Mon, Jan 31, 2011 at 10:15 PM, Sandy Pratt <pr...@adobe.com> wrote:
> My use of HBase is essentially what Stack describes: I serialize little log entry objects with (mostly) protobuf and store them in a single cell in HBase.  I did this at first because it was easy, and made a note to go back and break the fields out into their own columns, and in some cases into multiple column families.  When I went back and did this, I found that my 'exploded' schema was actually slower to scan than the 'blob' schema was, and filters didn't seem to help all that much.  This was in the 0.20 days, IIRC.  So all this is to say: +1 on storing blobs in HBase.
>
> I don't know if this would work for you, but what's worked well for me is to write side files for Hive to read as I ingest entries into HBase.  I like HBase for durability, random access, sorting, and scanning, and I'll continue to use it to store the golden copy for the foreseeable future, but I've found that Hive against text files is at least a couple of times faster than MR against an HBase source for my MapReduce needs.  If you find that what you need from the Hive schema changes over time, you can simply nuke the files and recreate them with a MapReduce job against the golden copy in HBase.
>
> Sandy
>
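
A minimal sketch of the single-cell 'blob' write Sandy describes,
assuming a hypothetical protobuf-generated LogEntry class (from a
.proto you would define yourself) and the same 0.90-era client API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "logs");  // table name is illustrative

    // LogEntry is a hypothetical protobuf-generated message; build one
    // entry and serialize the whole object to a single byte array.
    LogEntry entry = LogEntry.newBuilder()
        .setTimestamp(System.currentTimeMillis())
        .setMessage("example log line")
        .build();

    // One cell holds the whole serialized blob, rather than exploding
    // each field into its own column.
    Put put = new Put(Bytes.toBytes("some-key"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("entry"),
        entry.toByteArray());
    table.put(put);
    table.close();
  }
}

Reading it back on the scan side is LogEntry.parseFrom(result.getValue(...));
the point is that HBase only ever sees opaque bytes.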