Posted to user@flume.apache.org by AD <st...@gmail.com> on 2011/10/18 22:55:41 UTC

flume dying on InterruptException (nanos)

Hello,

 My collector keeps dying with the following error. Is this a known issue?
Any idea how to prevent it, or how to find out what is causing it? Is
format("%{nanos}") an issue?

2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver:
Connector logicalNode flume1-18 exited with error: null
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)


source:  collectorSource("35853")
sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
-]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
-;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
attr2hbase("apache_logs","f1","","hbase_") }
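
For context: the InterruptedException above comes out of a timed write-lock
acquisition inside RollSink.close() being interrupted while an append still
holds the read lock (for example while a slow HBase write is in flight). Below
is a minimal, self-contained sketch of that general mechanism only; it is not
Flume's actual code, and the class name, sleeps, and timeouts are made up for
illustration.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RollCloseInterruptDemo {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Simulates an append holding the read lock while the back end is slow.
        Thread slowAppend = new Thread(() -> {
            lock.readLock().lock();
            try {
                Thread.sleep(5000);               // pretend the back end takes 5s to respond
            } catch (InterruptedException ignored) {
            } finally {
                lock.readLock().unlock();
            }
        });

        // Simulates close(): wait up to 1 second for the write lock.
        Thread closer = new Thread(() -> {
            try {
                if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
                    System.out.println("could not get write lock within 1s");
                    return;
                }
                try {
                    System.out.println("rolled file");
                } finally {
                    lock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                // This is the exception the DirectDriver log above reports as
                // "exited with error: null".
                System.out.println("close() interrupted while waiting: " + e);
            }
        });

        slowAppend.start();
        Thread.sleep(100);
        closer.start();
        Thread.sleep(200);
        closer.interrupt();                        // e.g. node shutdown/reconfigure
        closer.join();
        slowAppend.interrupt();
        slowAppend.join();
    }
}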

Re: flume dying on InterruptException (nanos)

Posted by Cameron Gandevia <cg...@gmail.com>.
I think Prasad is correct: if you increase your file handles, this error should go away. You will still experience the collector shutting down, though; I have commented on why in FLUME-798.

The current solution we have taken is to roll back to the 0.9.4 branch until we can patch the RollSink to fix our issue.

Let me know if you find any other fixes.

Sent from my iPad

On 2011-10-18, at 7:21 PM, AD <st...@gmail.com> wrote:

> were you able to work around it at all ?
> 
> On Tue, Oct 18, 2011 at 7:32 PM, Mingjie Lai <mj...@gmail.com> wrote:
> 
> May relate to these 2:
> https://issues.apache.org/jira/browse/FLUME-757
> https://issues.apache.org/jira/browse/FLUME-798
> 
> I saw it when I did some re-configures.
> 
> 
> 
> On 10/18/2011 01:55 PM, AD wrote:
> Hello,
> 
>  My collector keeps dying with the following error, is this a known
> issue? Any idea how to prevent or find out what is causing it ?  is
> format("%{nanos}" an issue ?
> 
> 2011-10-17 23:16:33,957 INFO
> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
> flume1-18 exited with error: null
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> 
> 
> source:  collectorSource("35853")
> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
> attr2hbase("apache_logs","f1","","hbase_") }
> 

Re: flume dying on InterruptException (nanos)

Posted by AD <st...@gmail.com>.
were you able to work around it at all ?

On Tue, Oct 18, 2011 at 7:32 PM, Mingjie Lai <mj...@gmail.com> wrote:

>
> May relate to these 2:
> https://issues.apache.org/jira/browse/FLUME-757
> https://issues.apache.org/jira/browse/FLUME-798
>
> I saw it when I did some re-configures.
>
>
>
> On 10/18/2011 01:55 PM, AD wrote:
>
>> Hello,
>>
>>  My collector keeps dying with the following error, is this a known
>> issue? Any idea how to prevent or find out what is causing it ?  is
>> format("%{nanos}" an issue ?
>>
>> 2011-10-17 23:16:33,957 INFO
>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>> flume1-18 exited with error: null
>> java.lang.InterruptedException
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>> at
>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>> at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>> at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>
>>
>> source:  collectorSource("35853")
>> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>> attr2hbase("apache_logs","f1","","hbase_") }
>>
>

Re: flume dying on InterruptException (nanos)

Posted by Mingjie Lai <mj...@gmail.com>.
May relate to these 2:
https://issues.apache.org/jira/browse/FLUME-757
https://issues.apache.org/jira/browse/FLUME-798

I saw it when I did some reconfigurations.


On 10/18/2011 01:55 PM, AD wrote:
> Hello,
>
>   My collector keeps dying with the following error, is this a known
> issue? Any idea how to prevent or find out what is causing it ?  is
> format("%{nanos}" an issue ?
>
> 2011-10-17 23:16:33,957 INFO
> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
> flume1-18 exited with error: null
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>
>
> source:  collectorSource("35853")
> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
> attr2hbase("apache_logs","f1","","hbase_") }

Re: flume dying on InterruptException (nanos)

Posted by AD <st...@gmail.com>.
hmm not yet, will try though.

On Tue, Oct 18, 2011 at 8:20 PM, Prasad Mujumdar <pr...@cloudera.com> wrote:

>
>       Have you tried increasing the maximum number of open files on the
> system?
>
> thanks
> Prasad
>
>
>
> On Tue, Oct 18, 2011 at 1:55 PM, AD <st...@gmail.com> wrote:
>
>> Hello,
>>
>>  My collector keeps dying with the following error, is this a known issue?
>> Any idea how to prevent or find out what is causing it ?  is
>> format("%{nanos}" an issue ?
>>
>> 2011-10-17 23:16:33,957 INFO
>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>> flume1-18 exited with error: null
>> java.lang.InterruptedException
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>  at
>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>  at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>> at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>
>>
>> source:  collectorSource("35853")
>> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>> attr2hbase("apache_logs","f1","","hbase_") }
>>
>
>

Re: flume dying on InterruptException (nanos)

Posted by Prasad Mujumdar <pr...@cloudera.com>.
      Have you tried increasing the maximum number of open files on the
system?

thanks
Prasad


On Tue, Oct 18, 2011 at 1:55 PM, AD <st...@gmail.com> wrote:

> Hello,
>
>  My collector keeps dying with the following error, is this a known issue?
> Any idea how to prevent or find out what is causing it ?  is
> format("%{nanos}" an issue ?
>
> 2011-10-17 23:16:33,957 INFO
> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
> flume1-18 exited with error: null
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>  at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>  at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>
>
> source:  collectorSource("35853")
> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
> attr2hbase("apache_logs","f1","","hbase_") }
>

Re: flume dying on InterruptException (nanos)

Posted by Cameron Gandevia <cg...@gmail.com>.
This would make sense to me. I guess the main concern would still be what
happens if the sink takes a very long time to respond. In its current state,
interrupting the task essentially kills the DirectDriver, because nothing
downstream deals with the interrupt.

It would be nice if the driver either didn't shut down or was automatically
restarted on failure.
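
To illustrate the "automatically restarted" idea in generic terms (this is a
hypothetical supervisor loop, not Flume's DirectDriver API; the names and the
backoff are made up):

import java.util.concurrent.TimeUnit;

public class RestartingDriverSketch {
    // Re-run a driver task when it exits with an unexpected runtime failure,
    // instead of leaving the node dead until someone restarts it by hand.
    public static void runWithRestart(Runnable driverTask, int maxRestarts) throws InterruptedException {
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            try {
                driverTask.run();
                return;                            // clean exit: stop supervising
            } catch (RuntimeException e) {
                System.err.println("driver failed (attempt " + attempt + "): " + e);
                TimeUnit.SECONDS.sleep(1);         // back off before restarting
            }
        }
        System.err.println("driver gave up after " + maxRestarts + " restarts");
    }

    public static void main(String[] args) throws InterruptedException {
        runWithRestart(() -> {
            throw new RuntimeException("simulated back-end outage");
        }, 3);
    }
}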

On Wed, Oct 19, 2011 at 6:08 PM, Prasad Mujumdar <pr...@cloudera.com> wrote:

>
>   hmm ... I am wondering if the Trigger thread should just bail out without
> resetting trigger if it can't get hold of the lock in 1 sec. The next append
> or next trigger should take care of rotating the files ..
>
> thanks
> Prasad
>
>
> On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>
>> We recently modified the RollSink to hide our problem by giving it a few
>> seconds to finish writing before rolling. We are going to test it out and if
>> it fixes our issue we will provide a patch later today.
>>  On Oct 19, 2011 1:27 PM, "AD" <st...@gmail.com> wrote:
>>
>>> Yea, i am using Hbase sink, so i guess its possible something is getting
>>> hung up there and causing the collector to die. The number of file
>>> descriptors seems more than safe under the limit.
>>>
>>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>>>
>>>> We were seeing the same issue when our HDFS instance was overloaded and
>>>> taking over a second to respond. I assume if whatever backend is down the
>>>> collector will die and need to be restarted when it becomes available again?
>>>> Doesn't seem very reliable
>>>>
>>>>
>>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <
>>>> ralph.goers@dslextreme.com> wrote:
>>>>
>>>>> We saw this problem when it was taking more than 1 second for a
>>>>> response from writing to Cassandra (our back end).  A single long response
>>>>> will kill the collector.  We had to revert back to the version of Flume that
>>>>> uses synchronization instead of read/write locking to get around this.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>>
>>>>> > Hello,
>>>>> >
>>>>> >  My collector keeps dying with the following error, is this a known
>>>>> issue? Any idea how to prevent or find out what is causing it ?  is
>>>>> format("%{nanos}" an issue ?
>>>>> >
>>>>> > 2011-10-17 23:16:33,957 INFO
>>>>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>>>>> flume1-18 exited with error: null
>>>>> > java.lang.InterruptedException
>>>>> >       at
>>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>>> >       at
>>>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>>> >       at
>>>>> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>>> >       at
>>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >       at
>>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >
>>>>> >
>>>>> > source:  collectorSource("35853")
>>>>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>>>>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>>>>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>>>>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>>>>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>>>>> attr2hbase("apache_logs","f1","","hbase_") }
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Cameron Gandevia
>>>>
>>>
>>>
>


-- 
Thanks

Cameron Gandevia

Re: flume dying on InterruptException (nanos)

Posted by Kamal Bahadur <ma...@gmail.com>.
I agree with Prasad's solution. Since we are all using different back ends
(I use Cassandra) to store data, we cannot rely on a fixed timeout there.

Thanks,
Kamal

On Wed, Oct 19, 2011 at 6:08 PM, Prasad Mujumdar <pr...@cloudera.com> wrote:

>
>   hmm ... I am wondering if the Trigger thread should just bail out without
> resetting trigger if it can't get hold of the lock in 1 sec. The next append
> or next trigger should take care of rotating the files ..
>
> thanks
> Prasad
>
>
> On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>
>> We recently modified the RollSink to hide our problem by giving it a few
>> seconds to finish writing before rolling. We are going to test it out and if
>> it fixes our issue we will provide a patch later today.
>>  On Oct 19, 2011 1:27 PM, "AD" <st...@gmail.com> wrote:
>>
>>> Yea, i am using Hbase sink, so i guess its possible something is getting
>>> hung up there and causing the collector to die. The number of file
>>> descriptors seems more than safe under the limit.
>>>
>>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>>>
>>>> We were seeing the same issue when our HDFS instance was overloaded and
>>>> taking over a second to respond. I assume if whatever backend is down the
>>>> collector will die and need to be restarted when it becomes available again?
>>>> Doesn't seem very reliable
>>>>
>>>>
>>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <
>>>> ralph.goers@dslextreme.com> wrote:
>>>>
>>>>> We saw this problem when it was taking more than 1 second for a
>>>>> response from writing to Cassandra (our back end).  A single long response
>>>>> will kill the collector.  We had to revert back to the version of Flume that
>>>>> uses synchronization instead of read/write locking to get around this.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>>
>>>>> > Hello,
>>>>> >
>>>>> >  My collector keeps dying with the following error, is this a known
>>>>> issue? Any idea how to prevent or find out what is causing it ?  is
>>>>> format("%{nanos}" an issue ?
>>>>> >
>>>>> > 2011-10-17 23:16:33,957 INFO
>>>>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>>>>> flume1-18 exited with error: null
>>>>> > java.lang.InterruptedException
>>>>> >       at
>>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>>> >       at
>>>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>>> >       at
>>>>> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>>> >       at
>>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >       at
>>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>>> >
>>>>> >
>>>>> > source:  collectorSource("35853")
>>>>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>>>>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>>>>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>>>>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>>>>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>>>>> attr2hbase("apache_logs","f1","","hbase_") }
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Cameron Gandevia
>>>>
>>>
>>>
>

Re: flume dying on InterruptException (nanos)

Posted by Prasad Mujumdar <pr...@cloudera.com>.
  hmm ... I am wondering if the Trigger thread should just bail out, without
resetting the trigger, if it can't get hold of the lock in 1 sec. The next
append or the next trigger should then take care of rotating the files.

thanks
Prasad
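
A rough sketch of that bail-out idea, purely for illustration (hypothetical
names, not the actual RollSink internals):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RollTriggerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean rollPending = false;

    // Trigger thread: give up quietly if the lock isn't available within 1s,
    // leaving the pending flag set so a later append or trigger rotates instead.
    void onTrigger() throws InterruptedException {
        rollPending = true;
        if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
            return;                                // bail out; do not reset the trigger
        }
        try {
            rotate();
            rollPending = false;                   // reset only after a successful roll
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Append path: pick up a roll that a previous trigger could not perform.
    void append(byte[] event) throws InterruptedException {
        if (rollPending) {
            onTrigger();
        }
        // ... write the event ...
    }

    void rotate() {
        System.out.println("rotated output file");
    }

    public static void main(String[] args) throws InterruptedException {
        new RollTriggerSketch().onTrigger();       // rotates immediately when uncontended
    }
}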

On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cg...@gmail.com> wrote:

> We recently modified the RollSink to hide our problem by giving it a few
> seconds to finish writing before rolling. We are going to test it out and if
> it fixes our issue we will provide a patch later today.
> On Oct 19, 2011 1:27 PM, "AD" <st...@gmail.com> wrote:
>
>> Yea, i am using Hbase sink, so i guess its possible something is getting
>> hung up there and causing the collector to die. The number of file
>> descriptors seems more than safe under the limit.
>>
>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>>
>>> We were seeing the same issue when our HDFS instance was overloaded and
>>> taking over a second to respond. I assume if whatever backend is down the
>>> collector will die and need to be restarted when it becomes available again?
>>> Doesn't seem very reliable
>>>
>>>
>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ralph.goers@dslextreme.com
>>> > wrote:
>>>
>>>> We saw this problem when it was taking more than 1 second for a response
>>>> from writing to Cassandra (our back end).  A single long response will kill
>>>> the collector.  We had to revert back to the version of Flume that uses
>>>> synchronization instead of read/write locking to get around this.
>>>>
>>>> Ralph
>>>>
>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> >  My collector keeps dying with the following error, is this a known
>>>> issue? Any idea how to prevent or find out what is causing it ?  is
>>>> format("%{nanos}" an issue ?
>>>> >
>>>> > 2011-10-17 23:16:33,957 INFO
>>>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>>>> flume1-18 exited with error: null
>>>> > java.lang.InterruptedException
>>>> >       at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>> >       at
>>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>> >       at
>>>> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>> >       at
>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >       at
>>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >
>>>> >
>>>> > source:  collectorSource("35853")
>>>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>>>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>>>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>>>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>>>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>>>> attr2hbase("apache_logs","f1","","hbase_") }
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Cameron Gandevia
>>>
>>
>>

Re: flume dying on InterruptException (nanos)

Posted by Cameron Gandevia <cg...@gmail.com>.
We recently modified the RollSink to hide our problem by giving it a few
seconds to finish writing before rolling. We are going to test it out, and if
it fixes our issue, we will provide a patch later today.
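
A rough sketch of what "giving it a few seconds" might look like, purely for
illustration (the timeout and names are made up; this is not the actual patch):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PatientRollSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long closeTimeoutSeconds;

    PatientRollSketch(long closeTimeoutSeconds) {
        this.closeTimeoutSeconds = closeTimeoutSeconds;
    }

    // Wait longer than one second for the in-flight write before rolling.
    void close() throws InterruptedException {
        if (lock.writeLock().tryLock(closeTimeoutSeconds, TimeUnit.SECONDS)) {
            try {
                System.out.println("closed and rolled cleanly");
            } finally {
                lock.writeLock().unlock();
            }
        } else {
            System.out.println("writer still busy after " + closeTimeoutSeconds + "s");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        new PatientRollSketch(5).close();
    }
}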
On Oct 19, 2011 1:27 PM, "AD" <st...@gmail.com> wrote:

> Yea, i am using Hbase sink, so i guess its possible something is getting
> hung up there and causing the collector to die. The number of file
> descriptors seems more than safe under the limit.
>
> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cg...@gmail.com> wrote:
>
>> We were seeing the same issue when our HDFS instance was overloaded and
>> taking over a second to respond. I assume if whatever backend is down the
>> collector will die and need to be restarted when it becomes available again?
>> Doesn't seem very reliable
>>
>>
>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ra...@dslextreme.com> wrote:
>>
>>> We saw this problem when it was taking more than 1 second for a response
>>> from writing to Cassandra (our back end).  A single long response will kill
>>> the collector.  We had to revert back to the version of Flume that uses
>>> synchronization instead of read/write locking to get around this.
>>>
>>> Ralph
>>>
>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>
>>> > Hello,
>>> >
>>> >  My collector keeps dying with the following error, is this a known
>>> issue? Any idea how to prevent or find out what is causing it ?  is
>>> format("%{nanos}" an issue ?
>>> >
>>> > 2011-10-17 23:16:33,957 INFO
>>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>>> flume1-18 exited with error: null
>>> > java.lang.InterruptedException
>>> >       at
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>> >       at
>>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>> >       at
>>> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>> >       at
>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>> >       at
>>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>> >
>>> >
>>> > source:  collectorSource("35853")
>>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>>> attr2hbase("apache_logs","f1","","hbase_") }
>>>
>>>
>>
>>
>> --
>> Thanks
>>
>> Cameron Gandevia
>>
>
>

Re: flume dying on InterruptException (nanos)

Posted by AD <st...@gmail.com>.
Yeah, I am using the HBase sink, so I guess it's possible something is
getting hung up there and causing the collector to die. The number of file
descriptors seems to be safely under the limit.

On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cg...@gmail.com> wrote:

> We were seeing the same issue when our HDFS instance was overloaded and
> taking over a second to respond. I assume if whatever backend is down the
> collector will die and need to be restarted when it becomes available again?
> Doesn't seem very reliable
>
>
> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ra...@dslextreme.com> wrote:
>
>> We saw this problem when it was taking more than 1 second for a response
>> from writing to Cassandra (our back end).  A single long response will kill
>> the collector.  We had to revert back to the version of Flume that uses
>> synchronization instead of read/write locking to get around this.
>>
>> Ralph
>>
>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>
>> > Hello,
>> >
>> >  My collector keeps dying with the following error, is this a known
>> issue? Any idea how to prevent or find out what is causing it ?  is
>> format("%{nanos}" an issue ?
>> >
>> > 2011-10-17 23:16:33,957 INFO
>> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
>> flume1-18 exited with error: null
>> > java.lang.InterruptedException
>> >       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>> >       at
>> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>> >       at
>> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>> >       at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>> >       at
>> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>> >
>> >
>> > source:  collectorSource("35853")
>> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
>> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
>> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
>> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
>> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
>> attr2hbase("apache_logs","f1","","hbase_") }
>>
>>
>
>
> --
> Thanks
>
> Cameron Gandevia
>

Re: flume dying on InterruptException (nanos)

Posted by Cameron Gandevia <cg...@gmail.com>.
We were seeing the same issue when our HDFS instance was overloaded and
taking over a second to respond. I assume that if whatever back end you use
goes down, the collector will die and need to be restarted when the back end
becomes available again? That doesn't seem very reliable.

On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ra...@dslextreme.com> wrote:

> We saw this problem when it was taking more than 1 second for a response
> from writing to Cassandra (our back end).  A single long response will kill
> the collector.  We had to revert back to the version of Flume that uses
> synchronization instead of read/write locking to get around this.
>
> Ralph
>
> On Oct 18, 2011, at 1:55 PM, AD wrote:
>
> > Hello,
> >
> >  My collector keeps dying with the following error, is this a known
> issue? Any idea how to prevent or find out what is causing it ?  is
> format("%{nanos}" an issue ?
> >
> > 2011-10-17 23:16:33,957 INFO
> com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode
> flume1-18 exited with error: null
> > java.lang.InterruptedException
> >       at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> >       at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> >       at
> com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> >       at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> >       at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> >
> >
> > source:  collectorSource("35853")
> > sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/:
> -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_
> -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte")
> format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:")
> split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) {
> attr2hbase("apache_logs","f1","","hbase_") }
>
>


-- 
Thanks

Cameron Gandevia

Re: flume dying on InterruptException (nanos)

Posted by Ralph Goers <ra...@dslextreme.com>.
We saw this problem when it was taking more than 1 second for a response from writing to Cassandra (our back end). A single long response will kill the collector. We had to revert to the version of Flume that uses synchronization instead of read/write locking to get around this.

Ralph
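
The contrast Ralph describes, sketched very roughly (a simplified
illustration, not the actual 0.9.x code): a synchronized close simply waits
for the in-flight append to finish, with no one-second deadline to miss and
no timed wait to be interrupted.

public class SynchronizedRollSketch {
    private final Object rollLock = new Object();

    void append(byte[] event) {
        synchronized (rollLock) {
            // the slow back-end write would happen here; close() below just waits for it
        }
    }

    void close() {
        synchronized (rollLock) {
            System.out.println("rolled after the current append finished");
        }
    }

    public static void main(String[] args) {
        SynchronizedRollSketch sink = new SynchronizedRollSketch();
        sink.append(new byte[0]);
        sink.close();
    }
}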

On Oct 18, 2011, at 1:55 PM, AD wrote:

> Hello,
> 
>  My collector keeps dying with the following error, is this a known issue? Any idea how to prevent or find out what is causing it ?  is format("%{nanos}" an issue ?
> 
> 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
> java.lang.InterruptedException
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> 	at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> 	at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> 	at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> 
> 
> source:  collectorSource("35853")
> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) { attr2hbase("apache_logs","f1","","hbase_") }