Posted to user@flume.apache.org by Flavio Pompermaier <po...@okkam.it> on 2013/07/24 10:51:32 UTC

HBase source

Hi to all,
I'd like to read data from HBase and move it to Solr.
Is there an HBase source in Flume or something to read from it?

Best,
Flavio

Re: HBase source

Posted by Roshan Naik <ro...@hortonworks.com>.
Your task appears to be more of a periodic batch movement than
continuous streaming. Flume is meant for the latter use case.
-roshan



Re: HBase source

Posted by Flavio Pompermaier <po...@okkam.it>.
In my use case I have a Solr index that proxies access to data stored in
HBase (I ask Solr for the row keys of documents matching some query).
What I'd like to do is rebuild this Solr index: read the JSON or XML
stored in each record, map its fields to my Solr document, and commit.
I know that this is not the main goal of Flume, but I think it could also
be used for this kind of task.
I looked at the tools you suggested, but they seem to be very small
projects and they do not provide interesting features like those in
morphlines (correct me if I'm wrong!).
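
To make the rebuild step concrete, here is a minimal sketch of the "read JSON,
map fields, commit in batches" loop, in plain Python with no HBase or Solr
client involved. The field names in FIELD_MAP, the `send` callback, and the
batch size are all assumptions for illustration; a real job would read rows
from an HBase scan and post each batch to Solr.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical mapping from JSON field names to Solr field names.
FIELD_MAP = {"title": "title_s", "body": "text_t"}


def to_solr_doc(rowkey: str, payload: str) -> Dict[str, str]:
    """Build one Solr-style document from a row key and its JSON payload."""
    record = json.loads(payload)
    doc = {"id": rowkey}  # the row key doubles as the document id
    for src, dst in FIELD_MAP.items():
        if src in record:
            doc[dst] = str(record[src])
    return doc


def batches(rows: Iterable[Tuple[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Group mapped documents into lists of at most `size` items."""
    batch: List[Dict[str, str]] = []
    for rowkey, payload in rows:
        batch.append(to_solr_doc(rowkey, payload))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def reindex(rows: Iterable[Tuple[str, str]],
            send: Callable[[List[Dict[str, str]]], None],
            size: int = 2) -> int:
    """Map every row and hand each batch to `send`; return the document count."""
    total = 0
    for batch in batches(rows, size):
        send(batch)  # assumption: `send` adds the batch to Solr and commits
        total += len(batch)
    return total
```

Passing `list.append` of an empty list as `send` is enough to inspect the
batches without a running Solr instance.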

Best,
Flavio



Re: HBase source

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Flume is an event collection tool, meaning that Flume polls a source or catches events. HBase is a database, and usually stores data in a schema (column families, CF). You could write a custom source that scans your tables, but I really see no sense in such a task, and a full table scan in HBase is really expensive.
What do you mean by reindexing? HBase has primary and secondary indexes (http://hbase.apache.org/book/secondary.indexes.html), which can be processed with filters. To integrate HBase with Solr, you can use one of the tools I mentioned in my previous post or ask on the Solr mailing lists.
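
For what it's worth, the "custom source that scans your tables" idea could be sketched roughly as below. This is a toy stand-in in plain Python, not Flume or HBase code: the ScanSource class, its poll() method, and the in-memory dict playing the table are all hypothetical. It only shows the shape of the approach: page through rows in row-key order, remember the last key seen, and return one bounded batch per poll instead of the whole table at once.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class ScanSource:
    """Toy stand-in for a custom source that pages through a table by row key."""

    def __init__(self, table: Dict[str, bytes], batch_size: int = 2):
        self._table = table
        self._keys = sorted(table)  # HBase keeps rows sorted by row key
        self._batch_size = batch_size
        self._last_key: Optional[str] = None  # resume point between polls

    def poll(self) -> List[Tuple[str, bytes]]:
        """Return the next batch of (rowkey, body) events, or [] when done."""
        if self._last_key is None:
            start = 0
        else:
            # Resume strictly after the last row key already emitted.
            start = bisect_right(self._keys, self._last_key)
        keys = self._keys[start:start + self._batch_size]
        if keys:
            self._last_key = keys[-1]
        return [(k, self._table[k]) for k in keys]
```

Each poll touches at most batch_size rows, but draining the source still reads every row once, which is the full-scan cost mentioned above.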

- Alex




Re: HBase source

Posted by Flavio Pompermaier <po...@okkam.it>.
I was thinking of reindexing my data stored in HBase, and Flume + SolrSink
seemed perfect for this purpose (although I could obviously write a
MapReduce job).
Don't you think this could be a common scenario in which Flume could be
useful?


Re: HBase source

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Hi,

No. And from my perspective it doesn't make sense. I think you are looking for tools like https://github.com/Photobucket/Solbase or http://code.google.com/p/hbase-solr-dataimport/.

- Alex


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF