Posted to user@hbase.apache.org by Shashidhar Rao <ra...@gmail.com> on 2015/01/04 11:01:52 UTC

Storing Json format in Hbase

Hi,

Can someone tell me whether the solution I am proposing is feasible?

1. Large XML data is delivered by an external system.
2. Convert it into JSON format.
3. Store it in HBase, even though there will be hardly any updates, only
retrieval. I looked at Hive but decided against it because retrieval would
be slow.
4. We need to use a Hadoop NoSQL store, as the other components all use the
Hadoop ecosystem.

Second question: can XML data be stored directly in HBase without any
transformation?

Any suggestions on storing XML data in a NoSQL store? (Open source only, no
commercial NoSQL.)

Thanks in advance

Shashi

Re: Storing Json format in Hbase

Posted by Shashidhar Rao <ra...@gmail.com>.
Ayache,

In fact, my use case fits eXist-db: it is open source, needs no commercial
licence, and after all it stores XML documents and queries them through
XQuery and XPath. Thanks for the suggestion.
One last question: what do you think of eXist-db? Can it scale well to
50-100 terabytes or more of XML data in the future? I could not find much
on their web site.
Who is using eXist-db in production? Any idea?

Thanks
Shashi

Re: Storing Json format in Hbase

Posted by Shashidhar Rao <ra...@gmail.com>.
Thanks a lot Ayache for the links

Re: Storing Json format in Hbase

Posted by Ayache Khettar <ay...@googlemail.com>.
Hi

HBase doesn't support querying XML with XPath. For that you will have to
consider an XML database such as eXist (
http://exist-db.org/exist/apps/homepage/index.html) or MarkLogic (requires a
commercial licence). If you still want to use HBase, then consider storing
metadata alongside the XML payload under the same row ID. You will have to
think about your queries first, before deciding which metadata you want to
store. In one of the projects I was involved in, we stored the metadata in
Apache Solr using the HBase indexer (see the Cloudera product suite:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_use_hbase_indexer_service.html),
which updates in near real time. So the XML payload ends up in HBase and the
metadata goes into Apache Solr, and you query against Apache Solr rather
than HBase.

There are various ways to achieve what you want; it all comes down to the
choice of technology and the architecture drivers.
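
To make the "query the index, then fetch the payload" flow concrete, here is
a minimal sketch in Python. It is only an illustration, not the Cloudera
setup: two in-memory dictionaries stand in for the HBase table and the Solr
index, and the names `ingest`, `search`, `payload_store` and
`metadata_index` are hypothetical. In the real setup the indexer fills the
index from HBase writes; here both are written in one call for brevity.

```python
from collections import defaultdict

# Stand-in for the HBase table: row key -> raw XML payload.
payload_store = {}
# Stand-in for the search index: (field, value) -> set of row keys.
metadata_index = defaultdict(set)


def ingest(row_key, xml_text, metadata):
    """Store the payload under its row key and index its metadata fields."""
    payload_store[row_key] = xml_text
    for field, value in metadata.items():
        metadata_index[(field, value)].add(row_key)


def search(field, value):
    """Look up row keys in the index first, then fetch payloads by key."""
    row_keys = metadata_index.get((field, value), ())
    return {key: payload_store[key] for key in sorted(row_keys)}


# Hypothetical example document and metadata.
ingest("doc-1", "<order><customer>ACME</customer></order>", {"customer": "ACME"})
matches = search("customer", "ACME")
```

The point of the split is that only the fields you plan to query need to be
indexed; the payload itself is fetched by row key and never scanned.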

all the best

Ayache

Re: Storing Json format in Hbase

Posted by Shashidhar Rao <ra...@gmail.com>.
Ayache and Chandrashekhar,

You are correct; I too am reluctant to go for the JSON transformation.
Storing XML in HBase without transforming it to JSON would be a lot easier
at the storing stage.

But my concern is querying this XML data from HBase. Queries include
aggregations, counts and joins, just to name a few. Can you please shed some
light on how to query XML data in HBase? Is it possible to use XQuery or
XPath?

JSON transformation was considered because of MongoDB, as it supports JSON
natively and seems to be good at analytics. Analytics would come at a later
stage.

Can you please share some insights into querying XML from HBase? Any links
or examples would be helpful; I am unable to find any.

Thanks in advance

Shashi

Re: Storing Json format in Hbase

Posted by Ayache Khettar <ay...@googlemail.com>.
Hi

You can store XML in HBase without any issue. It all depends on what you do
with the XML. To query the XML back, you will have to store its metadata
with it under the same row ID. This way you can query back the XML. I would
go for the JSON transformation only if the downstream flow needs the payload
in JSON format.
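
As a rough illustration of the same-row-ID idea (a sketch, not code from a
real project): the Python below builds one row whose cells hold the
untouched XML plus a few fields copied out of it. No HBase client is used; a
plain dict keyed by "family:qualifier" stands in for the row, and the
`payload` and `meta` family names, the `build_row` helper and the sample
document are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_row(row_key, xml_text, metadata_paths):
    """Return (row_key, cells); cells maps 'family:qualifier' to bytes.

    The raw XML goes into one cell. Each entry of metadata_paths
    (qualifier -> ElementTree path) is copied into the 'meta' family
    when the path matches, so it can be read without parsing the payload.
    """
    root = ET.fromstring(xml_text)
    cells = {"payload:xml": xml_text.encode("utf-8")}
    for qualifier, path in metadata_paths.items():
        value = root.findtext(path)
        if value is not None:
            cells["meta:" + qualifier] = value.strip().encode("utf-8")
    return row_key, cells


# Hypothetical example document.
doc = "<order><customer>ACME</customer><total>19.90</total></order>"
row_key, cells = build_row(
    "order-42", doc, {"customer": "customer", "total": "total"}
)
```

Which fields go into the metadata cells depends on the queries you need to
answer, so that list is worth settling before loading any data.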

Ayache

Re: Storing Json format in Hbase

Posted by Chandrashekhar Kotekar <sh...@gmail.com>.
You can convert the XML to JSON using a MapReduce program and then store the
JSON in HBase, but you need to decide what your row key should be.

Another point to take into account is whether you want to search inside the
JSON. If you do, then HBase won't be the best option for you. You could
probably switch to MongoDB or some other document store.
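
For illustration only, the per-record logic such a job might run could look
like the Python sketch below. It uses just the standard library; the mapping
convention (attributes prefixed with "@", repeated tags collected into
lists, mixed text under "#text") and the choice of a root attribute as the
row key are assumptions for the example, not something HBase prescribes.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an element into JSON-friendly Python values."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        converted = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # leaf element: just its text
        node["#text"] = text
    return node


def xml_to_row(xml_text, key_attribute):
    """Return (row_key, json_string) for one XML document."""
    root = ET.fromstring(xml_text)
    row_key = root.attrib[key_attribute]  # assumed to be present and unique
    return row_key, json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


# Hypothetical example document.
row_key, value = xml_to_row(
    '<order id="42"><item>pen</item><item>ink</item></order>', "id"
)
```

Taking the key from the document keeps it stable across reloads, but whether
it spreads rows evenly across regions depends on your data.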

Hope it helps...

Regards,
Chandrashekhar