Posted to user@flume.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2014/07/28 13:13:25 UTC

How to connect two flows with specific behavior.

I want to create a topology for Flume. What I want to get is:

Data --> Source1 --> Channel1 --> MySink1 --> Source2 --> Channel2/Channel3

Channel2 --> SinkHDFS
Channel3 --> MySinkHBase

I'd need to code MySink1 to apply a special transformation to my data; its
output would be the input for Source2.
Finally, the data should be stored in HDFS with Flume's standard sink and in
HBase, for which I would have to write a new serializer or something like
that.

I can't see how to make the connection between MySink1 and Source2.
Should Source2 be of Avro type? I think that if I want to connect several
flows inside Flume they have to be Avro. Since I want a specific behavior,
should I create a new implementation which extends AbstractRpcAvro or
something like that? Am I right?

Re: How to connect two flows with specific behavior.

Posted by Jonathan Natkins <na...@streamsets.com>.
Gotcha. In that case, what I think you'd want to do is have the client
sources send to an AvroSink that is directed to forward to an AvroSource in
your data center, and attach an interceptor to the AvroSource on your end.
Your interceptor should be able to unwrap the Avro event and transform it
as you need to for the HDFS/HBase sinks. Does that sound reasonable to you?
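
For reference, a rough sketch of that two-tier layout as a Flume properties
file. Everything specific here is an assumption for illustration: the agent
names (client, collector), the hostname, ports, HDFS path, HBase table and
column family, and the class com.example.flume.TransformInterceptor, which
stands in for whatever custom interceptor gets written. The netcat source is
only a placeholder for the real client-side source.

```properties
# --- Agent on the client machine: forwards raw events, no transformation ---
client.sources = src1
client.channels = ch1
client.sinks = avroOut

# Placeholder source; replace with the real client-side source
client.sources.src1.type = netcat
client.sources.src1.bind = localhost
client.sources.src1.port = 44444
client.sources.src1.channels = ch1

client.channels.ch1.type = memory

# Avro sink pointing at the Avro source in the data center (hypothetical host/port)
client.sinks.avroOut.type = avro
client.sinks.avroOut.hostname = collector.example.com
client.sinks.avroOut.port = 4545
client.sinks.avroOut.channel = ch1

# --- Agent in the data center: transforms, then fans out to HDFS and HBase ---
collector.sources = avroIn
collector.channels = ch2 ch3
collector.sinks = toHdfs toHbase

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = ch2 ch3
# Replicating selector: every event is copied to both channels
collector.sources.avroIn.selector.type = replicating
# Hypothetical custom interceptor, referenced through its Builder class
collector.sources.avroIn.interceptors = transform
collector.sources.avroIn.interceptors.transform.type = com.example.flume.TransformInterceptor$Builder

collector.channels.ch2.type = memory
collector.channels.ch3.type = memory

collector.sinks.toHdfs.type = hdfs
collector.sinks.toHdfs.hdfs.path = /flume/events
collector.sinks.toHdfs.channel = ch2

# A custom serializer class could be set via the sink's "serializer" property
collector.sinks.toHbase.type = hbase
collector.sinks.toHbase.table = events
collector.sinks.toHbase.columnFamily = d
collector.sinks.toHbase.channel = ch3
```

Because interceptors run on the source before events reach the channel
selector, both ch2 and ch3 would receive the already-transformed events in
this layout.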



Re: How to connect two flows with specific behavior.

Posted by Guillermo Ortiz <ko...@gmail.com>.
Yes, the reason is that the sources are on the client computers and the sinks
are installed on my systems. It doesn't seem polite to overload the client
systems with those transformations.

How about the connection between Sink1 and Source2? Should it be of Avro
type, or is that not necessary? Anyway, I'm going to think about doing the
transformations in the source, although I think it's not possible.



Re: How to connect two flows with specific behavior.

Posted by Jonathan Natkins <na...@streamsets.com>.
Hi Guillermo,

It might actually be easier to do the special transformation in a custom
interceptor that's attached to Source1. It depends a little bit on what
your transformation actually is, but generally, I'd say that it's going to
be *much* easier to implement a custom interceptor than it is to implement
a custom sink. This would also give the benefit of not requiring you to
forward to a second source, so you'd end up with a simpler pipeline in the
end. Is there some reason that you need to perform this transformation in a
custom sink?

Thanks,
Natty
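
As an illustration of the single-agent variant, a minimal sketch of how a
custom interceptor might be attached to Source1 in the agent's properties
file. The agent and component names, the port, and the class
com.example.flume.TransformInterceptor are hypothetical; the netcat source
and logger sink are placeholders for the real source and for the HDFS/HBase
sinks.

```properties
agent.sources = source1
agent.channels = channel1
agent.sinks = out

# Placeholder source; replace with the real Source1
agent.sources.source1.type = netcat
agent.sources.source1.bind = localhost
agent.sources.source1.port = 44444
agent.sources.source1.channels = channel1

# Hypothetical custom interceptor, referenced through its Builder class;
# it transforms each event before the event is written to the channel
agent.sources.source1.interceptors = transform
agent.sources.source1.interceptors.transform.type = com.example.flume.TransformInterceptor$Builder

agent.channels.channel1.type = memory

# Placeholder sink that just logs events; the real sinks would go here
agent.sinks.out.type = logger
agent.sinks.out.channel = channel1
```

With this wiring there is no MySink1 or Source2 at all: the transformation
happens inside the source, so only one agent and one hop are needed.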

