Posted to user@flink.apache.org by Gyula Fóra <gy...@gmail.com> on 2017/07/12 07:48:53 UTC

Why would a kafka source checkpoint take so long?

Hi,

I have noticed a strange behavior in one of our jobs: every once in a while
the Kafka source checkpointing time becomes extremely large compared to
what it usually is. (To be very specific, it is a Kafka source chained with
a stateless map operator.)

To be more specific, checkpointing the offsets usually takes around 10 ms,
which sounds reasonable, but in some checkpoints this goes into the 3-5
minute range, practically blocking the job for that period of time.
Yesterday I observed delays of even 10 minutes. At first I thought that some
sources might trigger checkpoints later than others, but after adding some
logging and comparing the timestamps it seems that the triggerCheckpoint
call was received at the same time.

Interestingly, only one of the 3 Kafka sources in the job seems to be
affected (last time I checked, at least). We are still using the 0.8
consumer with commit on checkpoints. Also, I don't see this happen in other
jobs.
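
(For context, by "commit on checkpoints" I mean roughly the pattern sketched
below: the source snapshots the current offsets when a checkpoint is taken and
only commits them back to Kafka/ZooKeeper once the checkpoint completes. This
is a simplified illustration, not the actual FlinkKafkaConsumer08 code; the
class and the fetch/commit helpers are made up, and the imports reflect the
1.3 APIs as far as I remember them.)

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.flink.runtime.state.CheckpointListener;
    import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Simplified sketch of a source that snapshots offsets on checkpoint and
    // commits them back to Kafka/ZooKeeper once the checkpoint completes.
    public class OffsetCommittingSource
            implements SourceFunction<String>, ListCheckpointed<Long>, CheckpointListener {

        private volatile boolean running = true;
        private long currentOffset;
        private final Map<Long, Long> pendingCommits = new HashMap<>(); // checkpointId -> offset

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String record = fetchNextRecord();       // made-up stand-in for the Kafka fetch
                synchronized (ctx.getCheckpointLock()) { // emission and offset update happen under the checkpoint lock
                    ctx.collect(record);
                    currentOffset++;
                }
            }
        }

        @Override
        public List<Long> snapshotState(long checkpointId, long timestamp) {
            // called under the checkpoint lock; normally a matter of milliseconds
            pendingCommits.put(checkpointId, currentOffset);
            return Collections.singletonList(currentOffset);
        }

        @Override
        public void restoreState(List<Long> state) {
            currentOffset = state.isEmpty() ? 0L : state.get(0);
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            Long offset = pendingCommits.remove(checkpointId);
            if (offset != null) {
                commitOffsetToKafka(offset);             // made-up stand-in for the actual commit
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private String fetchNextRecord() { return "record-" + currentOffset; } // placeholder
        private void commitOffsetToKafka(long offset) { /* placeholder */ }
    }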

Any clue on what might cause this?

Thanks :)
Gyula

Re: Why would a kafka source checkpoint take so long?

Posted by Vinay Patil <vi...@gmail.com>.
Hi Stephan,

Sure, will do that next time I observe it.

Regards,
Vinay Patil

On Thu, Jul 13, 2017 at 8:09 PM, Stephan Ewen <se...@apache.org> wrote:

> Is there any way you can pull a thread dump from the TMs at the point when
> that happens?

Re: Why would a kafka source checkpoint take so long?

Posted by Stephan Ewen <se...@apache.org>.
Is there any way you can pull a thread dump from the TMs at the point when
that happens?
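
Usually something like "jstack <TaskManager pid>" on the affected machine is
enough. If attaching to the process is not possible, even a dump logged from
inside the TaskManager JVM would help, along the lines of the sketch below
(plain JDK plus an SLF4J logger; this class is just an example, not a Flink
API):

    import java.util.Map;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Rough sketch: log the stack trace of every thread in the current JVM, e.g.
    // triggered from user code when a checkpoint is observed to take too long.
    public class ThreadDumpLogger {

        private static final Logger LOG = LoggerFactory.getLogger(ThreadDumpLogger.class);

        public static void logThreadDump() {
            StringBuilder dump = new StringBuilder("Thread dump:\n");
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                dump.append(entry.getKey()).append('\n');
                for (StackTraceElement frame : entry.getValue()) {
                    dump.append("    at ").append(frame).append('\n');
                }
            }
            LOG.info(dump.toString());
        }
    }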

On Wed, Jul 12, 2017 at 8:50 PM, vinay patil <vi...@gmail.com>
wrote:

> Hi Gyula,
>
> I have observed similar issue with FlinkConsumer09 and 010 and posted it
> to the mailing list as well . This issue is not consistent, however
> whenever it happens it leads to checkpoints getting failed or taking a long
> time to complete.
>
> Regards,
> Vinay Patil

Re: Why would a kafka source checkpoint take so long?

Posted by vinay patil <vi...@gmail.com>.
Hi Gyula,

I have observed a similar issue with FlinkKafkaConsumer09 and 010 and posted
it to the mailing list as well. This issue is not consistent; however,
whenever it happens it leads to checkpoints failing or taking a long time to
complete.

Regards,
Vinay Patil

On Wed, Jul 12, 2017 at 7:00 PM, Gyula Fóra wrote:

> I have added logging that will help determine this as well, next time this
> happens I will post the results. (Although there doesnt seem to be high
> backpressure)
>
> Thanks for the tips,
> Gyula

Re: Why would a kafka source checkpoint take so long?

Posted by Gyula Fóra <gy...@gmail.com>.
I have added logging that will help determine this as well; next time this
happens I will post the results. (Although there doesn't seem to be high
backpressure.)

Thanks for the tips,
Gyula

Stephan Ewen <se...@apache.org> wrote (on Wed, 12 Jul 2017, 15:27):

> Can it be that the checkpoint thread is waiting to grab the lock, which is
> held by the chain under backpressure?

Re: Why would a kafka source checkpoint take so long?

Posted by Gyula Fóra <gy...@gmail.com>.
Hi,

I saw this again yesterday; with the added logging it looks like acquiring
the lock took all the time. In this case it was pretty clear that the job
had started falling behind a few minutes before the checkpoint started, so
backpressure seems to be the culprit.

Thanks,
Gyula

Stephan Ewen <se...@apache.org> wrote (on Wed, 12 Jul 2017, 15:27):

> Can it be that the checkpoint thread is waiting to grab the lock, which is
> held by the chain under backpressure?

Re: Why would a kafka source checkpoint take so long?

Posted by Stephan Ewen <se...@apache.org>.
Can it be that the checkpoint thread is waiting to grab the lock, which is
held by the chain under backpressure?
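
What I mean is roughly the pattern below. This is a plain-Java illustration of
the locking, not actual Flink code: the source thread emits records while
holding the checkpoint lock, and the checkpoint needs that same lock, so an
emit that blocks on backpressure also blocks the snapshot.

    // Illustration only: emitLoop() holds the lock while a potentially blocking emit
    // is in progress; triggerCheckpoint() cannot snapshot until the lock is released.
    public class CheckpointLockContention {

        private final Object checkpointLock = new Object();
        private volatile boolean running = true;

        void emitLoop() throws InterruptedException {
            while (running) {
                synchronized (checkpointLock) {
                    emit();                     // under backpressure this call can block for minutes
                }
            }
        }

        void triggerCheckpoint() {
            synchronized (checkpointLock) {     // waits as long as emit() above is blocked
                snapshotOffsets();              // then shows up as a multi-minute "checkpoint"
            }
        }

        private void emit() throws InterruptedException {
            Thread.sleep(1);                    // placeholder for writing to the (backpressured) chain
        }

        private void snapshotOffsets() {
            // placeholder: the actual offset snapshot normally takes milliseconds
        }
    }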

On Wed, Jul 12, 2017 at 12:23 PM, Gyula Fóra <gy...@gmail.com> wrote:

> Yes thats definitely what I am about to do next but just thought maybe
> someone has seen this before.
>
> Will post info next time it happens. (Not guaranteed to happen soon as it
> didn't happen for a long time before)
>
> Gyula

Re: Why would a kafka source checkpoint take so long?

Posted by Gyula Fóra <gy...@gmail.com>.
Yes, that's definitely what I am about to do next, but I just thought maybe
someone has seen this before.

Will post info next time it happens. (Not guaranteed to happen soon, as it
didn't happen for a long time before.)

Gyula

On Wed, Jul 12, 2017, 12:13 Stefan Richter <s....@data-artisans.com>
wrote:

> Hi,
>
> could you introduce some logging to figure out from which method call the
> delay is introduced?
>
> Best,
> Stefan

Re: Why would a kafka source checkpoint take so long?

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

could you introduce some logging to figure out from which method call the delay is introduced?
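
For example, something along these lines (just a sketch; LOG stands for an
SLF4J logger, and checkpointLock / snapshotOffsets() are placeholders for
whatever the checkpoint path actually does):

    // Sketch: measure how long acquiring the lock takes vs. the snapshot itself.
    long start = System.currentTimeMillis();
    synchronized (checkpointLock) {
        long locked = System.currentTimeMillis();
        LOG.info("Waited {} ms for the checkpoint lock", locked - start);

        snapshotOffsets();
        LOG.info("Snapshotting the offsets took {} ms", System.currentTimeMillis() - locked);
    }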

Best,
Stefan

> On 12.07.2017 at 11:37, Gyula Fóra <gy...@gmail.com> wrote:
> 
> Hi,
> 
> We are using the latest 1.3.1
> 
> Gyula


Re: Why would a kafka source checkpoint take so long?

Posted by Gyula Fóra <gy...@gmail.com>.
Hi,

We are using the latest version, 1.3.1.

Gyula

Urs Schoenenberger <ur...@tngtech.com> wrote (on Wed, 12 Jul 2017, 10:44):

> Hi Gyula,
>
> I don't know the cause unfortunately, but we observed a similiar issue
> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
> Which version are you running on?
>
> Urs

Re: Why would a kafka source checkpoint take so long?

Posted by Urs Schoenenberger <ur...@tngtech.com>.
Hi Gyula,

I don't know the cause unfortunately, but we observed a similar issue
on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
Which version are you running on?

Urs

On 12.07.2017 09:48, Gyula Fóra wrote:
> Hi,
> 
> I have noticed a strange behavior in one of our jobs: every once in a while
> the Kafka source checkpointing time becomes extremely large compared to
> what it usually is. (To be very specific it is a kafka source chained with
> a stateless map operator)
> 
> To be more specific checkpointing the offsets usually takes around 10ms
> which sounds reasonable but in some checkpoints this goes into the 3-5
> minutes range practically blocking the job for that period of time.
> Yesterday I have observed even 10 minute delays. First I thought that some
> sources might trigger checkpoints later than others, but adding some
> logging and comparing it it seems that the triggerCheckpoint was received
> at the same time.
> 
> Interestingly only one of the 3 kafka sources in the job seems to be
> affected (last time I checked at least). We are still using the 0.8
> consumer with commit on checkpoints. Also I dont see this happen in other
> jobs.
> 
> Any clue on what might cause this?
> 
> Thanks :)
> Gyula

-- 
Urs Schönenberger - urs.schoenenberger@tngtech.com

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082