Posted to user@flume.apache.org by Pavel Zalunin <wr...@gmail.com> on 2014/10/27 16:44:09 UTC

Flume HTTP sink for binary data

Hi,

We need to send binary data to our HTTP endpoint. I looked at the built-in
sinks (https://flume.apache.org/FlumeUserGuide.html#flume-sinks) and can't
find an HTTP sink. Are there open-sourced implementations of such a thing?

Pavel.

Re: Flume HTTP sink for binary data

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Thanks for the input, Hari.

Jean-François - we are looking to do something either custom or as part of
Flume pretty soon (tomorrow), so if there is anything you want to share that
we should work off of, please stick it in JIRA and we'll work on that
instead of something custom.  Sorry for the last-minute email/push. :(

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



RE: Flume HTTP sink for binary data

Posted by j....@accenture.com.
Hello Hari, Otis,

Yes, I could contribute on this: I’m currently developing it. I’ve followed the pattern described below, reusing an existing sink as a model.

I’m working on generic options that appear to me as mandatory for wider usage than mine alone:

-          Load balancing

-          Authentication

-          Caching of the HTTP connection instead of creating one per event.

-          Adding HTTP headers from the config.

I will follow up on this.

Regards
________________________________________________
Jean-François Guilmard
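The last option above - pulling HTTP headers out of the sink configuration - can be sketched as plain Java. The "header." property prefix here is a hypothetical convention for illustration, not from any released sink:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderConfig {
    // Collect properties of the form "header.<Name> = <value>" into HTTP
    // headers, ignoring everything else. The "header." prefix is a made-up
    // convention; a real sink would read these from the Flume Context.
    static Map<String, String> parseHeaders(Map<String, String> props) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith("header.")) {
                headers.put(e.getKey().substring("header.".length()), e.getValue());
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("endpoint", "http://example.com/ingest");
        props.put("header.Content-Type", "application/octet-stream");
        props.put("header.X-Api-Key", "secret");
        System.out.println(parseHeaders(props));
    }
}
```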


________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________

www.accenture.com<http://www.accenture.com>



Re: Flume HTTP sink for binary data

Posted by Hari Shreedharan <hs...@cloudera.com>.
You could build one using Apache HttpClient - just look at any other sink (like hbase sink or rolling file sink) and follow the same pattern.


Thanks,
Hari
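For the binary-data case specifically, the core of what Hari describes - POSTing each event body as raw bytes - can be sketched with just the JDK, no HttpClient dependency. The /ingest endpoint and payload below are made up for illustration; the local server only stands in for the real endpoint:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicInteger;

public class BinaryHttpPost {
    // POST raw bytes to a URL as application/octet-stream; return status code.
    static int post(String url, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Tiny throwaway endpoint standing in for the real HTTP service.
        AtomicInteger received = new AtomicInteger(-1);
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ingest", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                received.set(in.readAllBytes().length);
            }
            exchange.sendResponseHeaders(200, -1); // no response body
            exchange.close();
        });
        server.start();
        try {
            byte[] payload = {0x00, 0x01, 0x02, (byte) 0xFF};
            int status = post("http://localhost:"
                    + server.getAddress().getPort() + "/ingest", payload);
            System.out.println(status + " " + received.get()); // prints "200 4"
        } finally {
            server.stop(0);
        }
    }
}
```

A real sink would wrap `post` in the usual process() loop (take a channel transaction, send each event body, commit or roll back), reusing the connection per Jean-François's point about caching.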


Re: HDFS IO error

Posted by Mike Zupan <mi...@manage.com>.
I ran into this with 1.4 and haven't been able to fix it, but when a datanode crashes, Flume seems to want to keep writing to it. I always had to restart Flume for it to work again.

-- 
Mike Zupan


On Thursday, October 30, 2014 at 11:53 AM, Ed Judge wrote:

> I am running into the following problem.
> 
> 30 Oct 2014 18:43:26,375 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
> at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
> ... 6 more
> 30 Oct 2014 18:43:27,717 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
> 30 Oct 2014 18:43:46,971 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
> 
> 
> The following is my configuration.  The source is just a script running a curl command and downloading files from S3.
> 
> 
> # Name the components on this agent
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
> 
> # Configure the source: STACK_S3
> a1.sources.r1.type = exec
> a1.sources.r1.command = ./conf/FlumeAgent.1.sh
> a1.sources.r1.channels = c1
> 
> # Use a channel which buffers events in memory
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000000
> a1.channels.c1.transactionCapacity = 100
> 
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm 
> a1.sinks.k1.hdfs.filePrefix = dm-1-20 
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 0
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> a1.sinks.k1.hdfs.minBlockReplicas = 1
> a1.sinks.k1.hdfs.batchSize = 10
> 
> 
> I had the HDFS batch size at the default (100) but this issue was still happening.  Does anyone know what parameters I should change to make this error go away?
> No data is lost but I end up with a 0 byte file.
> 
> Thanks,
> Ed
> 
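Worth noting for anyone hitting "Callable timed out after 10000 ms": that number matches the HDFS sink's callTimeout property, whose default is 10000 ms, so one knob to try is raising it (the value below is an arbitrary example, not a recommendation from the thread):

```
a1.sinks.k1.hdfs.callTimeout = 30000
```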


Re: HDFS IO error

Posted by Ed Judge <ej...@gmail.com>.
How do I check for bad blocks?  I am now seeing this quite regularly.  I have a unique Hadoop setup in that I have 1 local datanode.  In addition, I am running the Flume instance within a Docker container.
I have looked at the Hadoop logs and don’t see anything but INFO messages.  What could be taking more than 10 seconds?

Thanks,
Ed
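On the "how do I check for bad blocks" question: HDFS ships an fsck tool for exactly this. Against the cluster from this thread's config it would look something like the following (the /tmp/dm path comes from the hdfs.path setting above; a Hadoop client must be on the PATH):

```
# Report files, blocks, and block locations under the sink's target directory.
hdfs fsck /tmp/dm -files -blocks -locations

# Just list any corrupt files cluster-wide.
hdfs fsck / -list-corruptfileblocks
```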



Re: HDFS IO error

Posted by Ed Judge <ej...@gmail.com>.
I have been using 1.5 all along. I end up with a 0-length file, which is a little concerning. Not to mention that the timeout is adding 10 seconds to the overall transfer. Is this normal, or is there something I can do to prevent the timeout?

Thanks,
Ed. 

Sent from my iPhone



Re: HDFS IO error

Posted by Asim Zafir <as...@gmail.com>.
Ed,

Are you saying you resolved the problem with 1.5.0, or do you still have an
issue?

Thanks,

Asim Zafir.


Re: HDFS IO error

Posted by Ed Judge <ej...@gmail.com>.
Thanks for the replies.  We are using 1.5.0.
My observation is that Flume retries automatically (without my intervention) and that no data is lost.
The impact is (a) a delay of 10 seconds due to the timeout and (b) a zero-length file.

-Ed

On Oct 30, 2014, at 3:46 PM, Asim Zafir <as...@gmail.com> wrote:

> Please check if ur sinks i.e. hdfs data nodes that were receiving the writes are not having any bad blocks . Secondly I think you should also set hdfs roll interval or size to a higher value.  The reason this problem happens is because flume sink is not able to right to a data pipeline that was initially presented by hdfs. The solution in this case should be for hdfs to  initialize a new pipeline and present to flume. The hack currently Is to restart the flume process which then initializes a new hdfs pipeline enabling the sink to push backlogged events. There is a fix to this incorporated In flume 1.5 (i havent test it yet) but if u are on anything older the only way to make this work is restart the flume process
> 
> On Oct 30, 2014 11:54 AM, "Ed Judge" <ej...@gmail.com> wrote:
> I am running into the following problem.
> 
> 30 Oct 2014 18:43:26,375 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
> 	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
> 	at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
> 	at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
> 	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> 	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> 	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
> 	... 6 more
> 30 Oct 2014 18:43:27,717 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
> 30 Oct 2014 18:43:46,971 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
> 
> 
> The following is my configuration.  The source is just a script running a curl command and downloading files from S3.
> 
> 
> # Name the components on this agent
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
> 
> # Configure the source: STACK_S3
> a1.sources.r1.type = exec
> a1.sources.r1.command = ./conf/FlumeAgent.1.sh 
> a1.sources.r1.channels = c1
> 
> # Use a channel which buffers events in memory
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000000
> a1.channels.c1.transactionCapacity = 100
> 
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm 
> a1.sinks.k1.hdfs.filePrefix = dm-1-20 
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 0
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> a1.sinks.k1.hdfs.minBlockReplicas = 1
> a1.sinks.k1.hdfs.batchSize = 10
> 
> 
> I had the HDFS batch size at the default (100) but this issue was still happening.  Does anyone know what parameters I should change to make this error go away?
> No data is lost but I end up with a 0 byte file.
> 
> Thanks,
> Ed
> 


Re: HDFS IO error

Posted by Asim Zafir <as...@gmail.com>.
Please check that your sinks, i.e. the HDFS datanodes that were receiving
the writes, do not have any bad blocks. Secondly, I think you should also
set the HDFS roll interval or size to a higher value. The reason this
problem happens is that the Flume sink is not able to write to a data
pipeline that was initially presented by HDFS. The solution in this case
should be for HDFS to initialize a new pipeline and present it to Flume.
The current hack is to restart the Flume process, which then initializes a
new HDFS pipeline, enabling the sink to push the backlogged events. There
is a fix for this incorporated in Flume 1.5 (I haven't tested it yet), but
if you are on anything older, the only way to make this work is to restart
the Flume process.
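A concrete version of the "set roll interval or size to a higher value" suggestion, against the config posted in this thread (which disables rolling entirely by setting all three roll properties to 0) - the values below are illustrative, not from the thread:

```
# Roll a new HDFS file every 5 minutes or every 128 MB, whichever comes first.
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
```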

HDFS IO error

Posted by Ed Judge <ej...@gmail.com>.
I am running into the following problem.

30 Oct 2014 18:43:26,375 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
	at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
	at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:201)
	at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
	... 6 more
30 Oct 2014 18:43:27,717 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
30 Oct 2014 18:43:46,971 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10


The following is my configuration.  The source is just a script running a curl command and downloading files from S3.


# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source: STACK_S3
a1.sources.r1.type = exec
a1.sources.r1.command = ./conf/FlumeAgent.1.sh 
a1.sources.r1.channels = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm 
a1.sinks.k1.hdfs.filePrefix = dm-1-20 
a1.sinks.k1.hdfs.fileSuffix = .ds
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = TEXT
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.batchSize = 10


I had the HDFS batch size at the default (100) but this issue was still happening.  Does anyone know what parameters I should change to make this error go away?
No data is lost but I end up with a 0 byte file.

Thanks,
Ed


RE: Flume HTTP sink for binary data

Posted by j....@accenture.com.
Hello,

I’ve proposed a first (v0) patch for review. I’m still working on improvements, but I think it can be used as-is.

Link of the JIRA and patch: https://issues.apache.org/jira/browse/FLUME-2524

Your comments are more than welcome.
________________________________________________
Jean-François Guilmard
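[Editor's note: for readers following along, here is a minimal, self-contained sketch of the core of such a sink, posting an event's binary body over HTTP. It uses only the JDK (the patch itself uses Apache HttpClient), and the class, method, and endpoint names below are illustrative, not taken from FLUME-2524:]

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

public class HttpPostSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for the real endpoint: a local server that counts received bytes.
        AtomicInteger received = new AtomicInteger(-1);
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ingest", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            received.set(body.length);
            exchange.sendResponseHeaders(200, -1); // 200, no response body
            exchange.close();
        });
        server.start();

        byte[] eventBody = new byte[] {0x00, 0x01, 0x7F, (byte) 0xFF}; // arbitrary binary payload
        String endpoint = "http://localhost:" + server.getAddress().getPort() + "/ingest";
        int status = postEvent(endpoint, eventBody);
        System.out.println("status=" + status + " receivedBytes=" + received.get());
        server.stop(0);
    }

    // Core of what a custom sink's process() would do per event (or per batch):
    // POST the raw event body as application/octet-stream.
    static int postEvent(String endpoint, byte[] body) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int code = conn.getResponseCode();
        conn.disconnect();
        return code;
    }
}
```

In a real sink this logic would live inside process(), with the HTTP connection reused across events rather than opened per request.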

________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________

www.accenture.com<http://www.accenture.com>




Re: Flume HTTP sink for binary data

Posted by Behram Khan <be...@gmail.com>.
Hi
Could you please give an update and share the link to this project?
Regards



RE: Flume HTTP sink for binary data

Posted by j....@accenture.com.
I’ve created a new JIRA and will propose a patch. Due to your time constraints, I’m not sure I will be able to commit a 100% final version. It will be more of a good working base, open to comments, optimizations, and suggestions ☺

Regards

Jeff




Re: Flume HTTP sink for binary data

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Jean-François,

Great to hear!
There is no existing JIRA issue for this that I'm aware of.  I did see Hari
created a new JIRA issue yesterday about some HTTP Sink having some sort of
issues with SSL, but I wasn't aware of Flume containing any HTTP Sinks in
the first place, so I'm not sure what that issue is really about.
For example, I don't see any HTTP Sink here:
https://git-wip-us.apache.org/repos/asf?p=flume.git;a=tree;f=flume-ng-sinks;h=ea7df0714f788e92ddbb403b762623832228a0fe;hb=trunk

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



RE: Flume HTTP sink for binary data

Posted by j....@accenture.com.
Otis,

I’m about to propose something (today). I’m waiting for a formal authorization from my client: I’ve received an informal GO.

Is there an existing JIRA already?

Best regards



Re: Flume HTTP sink for binary data

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Shouldn't something like that exist in Flume?

Jean-François, is this something you could contribute?  Maybe we can use
yours or work on it if you can put up a patch?  If not, I think we can
take one of the existing implementations you mentioned, try to improve
it, and contribute that... but if your stuff is better, somewhat ready,
and contributable...

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



RE: Flume HTTP sink for binary data

Posted by j....@accenture.com.
Hi Pavel,

I have found 2 initiatives on this.

https://github.com/hammer/flume/blob/f140709502abf286d9e25a174caa24629a776448/plugins/http/src/java/com/cloudera/flume/handlers/http/HttpPostSink.java
https://github.com/josealvarezmuguerza/flume-http-sink/blob/master/src/main/java/org/apache/flume/sink/HttpSink.java

Neither of them had the code quality/flexibility I expected, so I think it might be worth going further.

Good luck and let me know if you find other alternatives. (I'm currently developing my own code, based on Apache HttpClient )

Best regards.
________________________________________________
Jean-François Guilmard
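[Editor's note: the "follow the same pattern as any other sink" advice from earlier in the thread boils down to a process() loop with transaction semantics: take a batch from the channel, deliver it, commit on success, roll back on failure so events are redelivered. Here is a runnable toy of that pattern. The Event/Channel/Status types below are simplified stand-ins for the real org.apache.flume interfaces, which differ in detail (e.g. the real API uses explicit Transaction objects):]

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

public class SinkPatternSketch {
    // Stand-in for org.apache.flume.Event: just a binary body.
    static class Event { final byte[] body; Event(byte[] b) { body = b; } }

    // Stand-in for a Flume channel with transaction semantics:
    // taken events are staged until commit() or rollback().
    static class Channel {
        final Queue<Event> q = new ArrayDeque<>();
        final Queue<Event> staged = new ArrayDeque<>();
        Event take() { Event e = q.poll(); if (e != null) staged.add(e); return e; }
        void commit() { staged.clear(); }
        void rollback() { q.addAll(staged); staged.clear(); }
    }

    enum Status { READY, BACKOFF }

    // The sink pattern: take up to a batch of events, deliver each one
    // (here via a pluggable 'post' predicate standing in for the HTTP call),
    // commit on success, roll back on failure.
    static Status process(Channel ch, Predicate<Event> post) {
        try {
            int sent = 0;
            for (int i = 0; i < 10; i++) {          // batch size 10
                Event e = ch.take();
                if (e == null) break;               // channel drained
                if (!post.test(e)) throw new RuntimeException("HTTP post failed");
                sent++;
            }
            ch.commit();
            return sent == 0 ? Status.BACKOFF : Status.READY;
        } catch (RuntimeException ex) {
            ch.rollback();                          // events will be redelivered
            return Status.BACKOFF;
        }
    }

    public static void main(String[] args) {
        Channel ch = new Channel();
        ch.q.add(new Event(new byte[] {1}));
        ch.q.add(new Event(new byte[] {2}));
        System.out.println(process(ch, e -> true));     // both posted, committed
        ch.q.add(new Event(new byte[] {3}));
        System.out.println(process(ch, e -> false));    // post fails, rolled back
        System.out.println("remaining=" + ch.q.size()); // failed event still queued
    }
}
```

The rollback path is what makes the sink safe against endpoint outages: nothing is removed from the channel until the HTTP delivery has succeeded.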


From: Pavel Zalunin [mailto:wr4bbit@gmail.com]
Sent: lundi 27 octobre 2014 16:44
To: user@flume.apache.org
Subject: Flume HTTP sink for binary data

Hi,

We need to send binary data to our http endpoint, I looked at built-in sinks ( https://flume.apache.org/FlumeUserGuide.html#flume-sinks) and can't find http sink. Are there opensourced implementations of such thing?

Pavel.

________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________

www.accenture.com