Posted to dev@samoa.apache.org by Raman Jhajj <bo...@gmail.com> on 2015/04/06 15:53:58 UTC

Deploying SAMOA on the system which uses Kafka to send stream of data

Hi Everyone,

I am quite confused about how to use SAMOA. I went through the complete
documentation and I am able to run SAMOA locally as well as on Storm. But as
explained in the documentation, we submit the data and the classifier details
from the command line and get the evaluation statistics as output.

But what I am trying to do is a bit different. I want to train the
classifier model with the training data which I have.
Then, I have a Kafka server set up which provides a stream of data. I want
to take this data into SAMOA, predict the class of each input instance using
Bagging (with VerticalHoeffdingTree), and store the instance and its
predicted class in a database. I am not sure how I can achieve this, and
I am not able to find any documentation on it. Please let me know how
I can do this. If there is any example code for this, it would be
really helpful.


-
Kind Regards,
*Raman*

Re: Deploying SAMOA on the system which uses Kafka to send stream of data

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Hi Raman,

If I understand correctly, you want to read from Kafka and then store your
output in some database.
Right now we don't have an example that does exactly this, although I think
it's a pretty common scenario.
There are a few pieces missing, I think, and it would be great if you
wanted to contribute them.

First, while we use Kafka in our Samza bindings, I don't think we have an
out-of-the-box Kafka stream.
So the first thing would be to wrap the logic of an existing Kafka consumer
in an InstanceStream.
See com.yahoo.labs.samoa.stream.FileStream as an example.
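
As a rough illustration of the wrapping idea, here is a minimal sketch in plain Java. It does not use the real SAMOA or Kafka classes: RecordSource is a hypothetical stand-in for a Kafka consumer's poll loop, SimpleInstanceStream is a simplified stand-in for a SAMOA-style stream (the actual InstanceStream interface has more methods), and the comma-separated record format is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class KafkaStreamSketch {

    /** Hypothetical stand-in for a Kafka consumer: each poll returns a (possibly empty) batch. */
    interface RecordSource {
        List<String> poll();
    }

    /** Simplified stand-in for a SAMOA-style stream; NOT the real InstanceStream interface. */
    interface SimpleInstanceStream {
        boolean hasMoreInstances();
        double[] nextInstance();
    }

    /** Buffers polled records and hands them out one instance at a time. */
    static class PollingInstanceStream implements SimpleInstanceStream {
        private final RecordSource source;
        private final int maxEmptyPolls; // give up after this many consecutive empty polls
        private final Deque<String> buffer = new ArrayDeque<>();

        PollingInstanceStream(RecordSource source, int maxEmptyPolls) {
            this.source = source;
            this.maxEmptyPolls = maxEmptyPolls;
        }

        @Override
        public boolean hasMoreInstances() {
            int emptyPolls = 0;
            while (buffer.isEmpty() && emptyPolls < maxEmptyPolls) {
                List<String> batch = source.poll();
                if (batch.isEmpty()) {
                    emptyPolls++;
                } else {
                    buffer.addAll(batch);
                }
            }
            return !buffer.isEmpty();
        }

        @Override
        public double[] nextInstance() {
            if (!hasMoreInstances()) {
                throw new NoSuchElementException("no more records");
            }
            // Assumed record format: comma-separated numeric values.
            String[] fields = buffer.poll().split(",");
            double[] values = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                values[i] = Double.parseDouble(fields[i].trim());
            }
            return values;
        }
    }

    public static void main(String[] args) {
        // Fake source replaying three batches, one of them empty.
        Iterator<List<String>> batches = Arrays.asList(
                Arrays.asList("1.0,2.0,0", "3.5,4.5,1"),
                Collections.<String>emptyList(),
                Arrays.asList("5.0,6.0,0")).iterator();
        RecordSource source =
                () -> batches.hasNext() ? batches.next() : Collections.<String>emptyList();

        PollingInstanceStream stream = new PollingInstanceStream(source, 2);
        while (stream.hasMoreInstances()) {
            System.out.println(Arrays.toString(stream.nextInstance()));
        }
    }
}
```

In a real implementation the poll() call would delegate to an actual Kafka consumer, and nextInstance() would build a SAMOA instance against the stream's header rather than a bare double array.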

Second, the current output is sent directly to EvaluatorProcessor, which
simply computes statistics.
If you want to write out to a database, whatever database you want to use,
you will need to add a sink/writer for it.
We could simply add an option to the PrequentialEvaluation that takes a
"sink" Processor, to which all the classified events are forwarded. Then
this sink would simply write them to whichever data store it is configured
to.
By default, the sink would be null and the PrequentialEvaluation would work
as it does currently.
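
To make the optional-sink idea concrete, here is a small self-contained sketch. All names in it (ResultSink, InMemorySink, Evaluator) are hypothetical and are not existing SAMOA classes. The in-memory list stands in for whichever data store you pick, and the evaluator is deliberately much simpler than EvaluatorProcessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SinkSketch {

    /** Hypothetical sink: receives every classified instance. */
    interface ResultSink {
        void write(double[] features, int predictedClass);
    }

    /** Stand-in for a database writer; keeps the rows in memory. */
    static class InMemorySink implements ResultSink {
        final List<String> rows = new ArrayList<>();

        @Override
        public void write(double[] features, int predictedClass) {
            rows.add(Arrays.toString(features) + " -> " + predictedClass);
        }
    }

    /** Simplified evaluator: tracks accuracy and forwards results only if a sink is set. */
    static class Evaluator {
        private final ResultSink sink; // null means "statistics only", as today
        private long seen;
        private long correct;

        Evaluator(ResultSink sink) {
            this.sink = sink;
        }

        void onResult(double[] features, int trueClass, int predictedClass) {
            seen++;
            if (trueClass == predictedClass) {
                correct++;
            }
            if (sink != null) {
                sink.write(features, predictedClass);
            }
        }

        double accuracy() {
            return seen == 0 ? 0.0 : (double) correct / seen;
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        Evaluator withSink = new Evaluator(sink);
        withSink.onResult(new double[] {1.0, 2.0}, 0, 0);
        withSink.onResult(new double[] {3.5, 4.5}, 1, 0);
        System.out.println("accuracy = " + withSink.accuracy());
        System.out.println("rows = " + sink.rows);

        // Default behaviour: no sink configured, only statistics are computed.
        Evaluator statsOnly = new Evaluator(null);
        statsOnly.onResult(new double[] {5.0, 6.0}, 1, 1);
        System.out.println("accuracy = " + statsOnly.accuracy());
    }
}
```

Keeping the sink behind a one-method interface means the database choice stays out of the evaluation logic, so a JDBC or key-value writer can be swapped in without touching the evaluator.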

It is not too complicated, and it would be a nice addition.
Which database do you want to write to?

Cheers,

--
Gianmarco
