Posted to dev@fluo.apache.org by Alan Camillo <al...@blueshift.com.br> on 2017/12/19 18:25:11 UTC

Fluo application question

We have just started a project whose objective is to consolidate personal
information using business rules. It is a kind of ranking of the best
information about a person.

Today they reprocess every batch they receive, comparing the new data
with all of the historical data. They are using Spark for this operation.
I'd like to propose something like this:
https://www.dropbox.com/s/glqhh7zzxd7g433/architecture.png?dl=0

Two questions:
 - Is it possible to create an observer that synchronizes with HBase?
 - Am I making good use of Fluo? If not, why?

Thank you all!

-----Original Message-----
From: Keith Turner [mailto:keith@deenlo.com]
Sent: Tuesday, December 19, 2017 2:21 PM
To: fluo-dev <de...@fluo.apache.org>
Subject: Re: About user group

On Tue, Dec 19, 2017 at 8:18 AM, Alan Camillo <al...@blueshift.com.br> wrote:
> Hello Fluo group!
>
> My name is Alan. I'm a big data architect and owner of a company
> called BlueShift Brasil, and I'm looking forward to using Apache Fluo.
> I'd like to know whether there is a user group, because I was not able
> to find one. Does one exist?

We currently do not have a user list.  Feel free to ask any questions you
have here on the dev list.

>
> I have many questions to ask and I wouldn't like to post them here.
> If I can help with something in the project, please count on me.

If you are interested in contributing, the following may be a good issue to
start with.

https://github.com/apache/fluo-docker/issues/9

>
> Thanks!
> Alan Camillo
> *BlueShift *I IT Director
> Cel.: +55 11 98283-6358
> Tel.: +55 11 4605-5082

Re: Fluo application question

Posted by Keith Turner <ke...@deenlo.com>.
I have not tried the following, but if I were going to read data from
Kafka into Fluo I would start with the following code.


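import org.apache.fluo.api.client.FluoClient;
import org.apache.fluo.api.client.LoaderExecutor;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

Consumer<String, String> consumer = //a Kafka consumer
FluoClient client = //a Fluo client

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000);
    try (LoaderExecutor le = client.newLoaderExecutor()) {
        for (ConsumerRecord<String, String> record : records) {
            //queue a loader; each loader runs in its own Fluo transaction
            le.execute((tx, ctx) -> {
                // execute a Fluo transaction using record
            });
        }
    } //when this try block exits all Fluo transactions are committed

    //let Kafka know the data was successfully processed.
    consumer.commitSync();
}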
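
One consequence of calling commitSync() only after the loader executor
has closed: if the process fails between the two steps, Kafka delivers the
uncommitted batch again after a restart. The sketch below illustrates
that with plain Java only. FakeConsumer, process and demo are
hypothetical stand-ins invented for this illustration. They are not part
of the Kafka or Fluo APIs, and the in-memory list stands in for the Fluo
table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: FakeConsumer is a hypothetical in-memory stand-in, not the Kafka API.
class CommitAfterProcessing {

    /** Hypothetical consumer over an in-memory log with a committed offset. */
    static class FakeConsumer {
        private final List<String> log;
        private int position = 0;   // next offset to hand out
        private int committed = 0;  // offset a restarted consumer resumes from

        FakeConsumer(List<String> log) {
            this.log = log;
        }

        List<String> poll(int max) {
            int end = Math.min(log.size(), position + max);
            List<String> batch = new ArrayList<>(log.subList(position, end));
            position = end;
            return batch;
        }

        void commitSync() {
            committed = position;
        }

        /** Simulates a restart after a failure: reading resumes at the committed offset. */
        void restart() {
            position = committed;
        }
    }

    /** Stores each record; throws on the record equal to failOn to simulate a crash mid-batch. */
    static void process(List<String> batch, List<String> store, String failOn) {
        for (String record : batch) {
            if (record.equals(failOn)) {
                throw new IllegalStateException("simulated crash on " + record);
            }
            store.add(record);
        }
    }

    /** Returns every record written to the store, in order, including replays. */
    static List<String> demo() {
        FakeConsumer consumer = new FakeConsumer(Arrays.asList("a", "b", "c", "d"));
        List<String> store = new ArrayList<>();

        // First batch succeeds, so its offsets are committed.
        process(consumer.poll(2), store, null);
        consumer.commitSync();

        // Second batch fails on "d" before commitSync() is reached.
        try {
            process(consumer.poll(2), store, "d");
            consumer.commitSync();
        } catch (IllegalStateException e) {
            consumer.restart();
        }

        // After the restart the uncommitted batch is delivered again.
        process(consumer.poll(2), store, null);
        consumer.commitSync();
        return store;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this run the record "c" is stored twice, so the work done for each
record should probably be written so that repeating it is harmless.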

On Wed, Dec 20, 2017 at 10:46 AM, Alan Camillo <al...@blueshift.com.br> wrote:
> Thank you Keith for the answers and material you sent me.
> Just one more question about this solution:
>   - What's the best way to consume data from Kafka into Fluo? Do I need to
> implement something like the webindex project: Kafka (Common Crawl) ->
> Spark -> Fluo? Or is it possible to ingest data directly from a Fluo
> application?
>
> Thank you again!
> Alan Camillo

RE: Fluo application question

Posted by Alan Camillo <al...@blueshift.com.br>.
Thank you Keith for the answers and material you sent me.
Just one more question about this solution:
  - What's the best way to consume data from Kafka into Fluo? Do I need to
implement something like the webindex project: Kafka (Common Crawl) ->
Spark -> Fluo? Or is it possible to ingest data directly from a Fluo
application?

Thank you again!
Alan Camillo


Re: Fluo application question

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Dec 19, 2017 at 1:25 PM, Alan Camillo <al...@blueshift.com.br> wrote:
> We have just started a project whose objective is to consolidate personal
> information using business rules. It is a kind of ranking of the best
> information about a person.
>
> Today they reprocess every batch they receive, comparing the new data
> with all of the historical data. They are using Spark for this operation.
> I'd like to propose something like this:
> https://www.dropbox.com/s/glqhh7zzxd7g433/architecture.png?dl=0
>
> Two questions:
>  - Is it possible to create an observer that synchronizes with HBase?

You could use an export queue to make updates to an HBase instance.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/export-queue/

Also the slides below discuss the export queue (slide 27) and the
concept of invert on export (slide 33).  Invert on export would likely
be useful for a key-value store like HBase.

https://www.slideshare.net/AccumuloSummit/accumulo-summit-2016-tips-for-writing-fluo-applications

Fluo Recipes does not currently have an exporter for HBase.  It would
be useful to add one to Fluo Recipes like the following for Accumulo.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/accumulo-export-queue/

>  - Am I making good use of Fluo? If not, why?

It sounds like it may be a good fit.  However, the exporter for HBase
would need to be implemented.  The Accumulo exporter is written in
such a way that multiple transactions can share a single writer for
efficiency.  Not sure if this pattern should be followed for HBase.
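
To make the shared-writer idea concrete, here is a minimal sketch in
plain Java. It is not the Accumulo exporter's implementation and it uses
no HBase or Fluo Recipes classes: Sink, SharedWriter and demo are
hypothetical names made up for this illustration. It only shows how
several transactions can hand their updates to one writer that sends them
out in larger batches. It leaves out something a real exporter would
likely need, which is making each caller wait until its own updates have
actually been written.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Sink and SharedWriter are hypothetical names, not Fluo Recipes or HBase classes.
class SharedWriterSketch {

    /** Hypothetical destination standing in for an external table. */
    interface Sink {
        void write(List<String> batch);
    }

    /** A single writer shared by many exporters; buffers updates and writes them in batches. */
    static class SharedWriter {
        private final Sink sink;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        SharedWriter(Sink sink, int batchSize) {
            this.sink = sink;
            this.batchSize = batchSize;
        }

        /** Called by each exporting transaction; synchronized so callers can share one instance. */
        synchronized void add(String update) {
            buffer.add(update);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Writes whatever is buffered as one batch; does nothing if the buffer is empty. */
        synchronized void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.write(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Returns the size of each batch that reached the sink. */
    static List<Integer> demo() {
        List<Integer> batchSizes = new ArrayList<>();
        SharedWriter writer = new SharedWriter(batch -> batchSizes.add(batch.size()), 4);

        // Three simulated transactions, each exporting two updates.
        for (String tx : new String[] {"tx1", "tx2", "tx3"}) {
            writer.add(tx + ":update1");
            writer.add(tx + ":update2");
        }
        writer.flush();
        return batchSizes;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here six updates from three transactions reach the sink in two writes
instead of three. Whether that saving matters for HBase would depend on
how its client already buffers writes.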

>
> Thank you all!