You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Akshay Sharma (Jira)" <ji...@apache.org> on 2020/08/20 09:54:00 UTC

[jira] [Commented] (KAFKA-8052) Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request

    [ https://issues.apache.org/jira/browse/KAFKA-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181079#comment-17181079 ] 

Akshay Sharma commented on KAFKA-8052:
--------------------------------------

[~rsivaram], Hi i'm seeing this issue in 2.3.0 kafka build also.

I've raised a bug for the same.

https://issues.apache.org/jira/browse/KAFKA-10423

> Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request 
> ----------------------------------------------------------------
>
>                 Key: KAFKA-8052
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8052
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.1.0
>            Reporter: Bartek Jakub
>            Assignee: Rajini Sivaram
>            Priority: Major
>             Fix For: 2.3.0
>
>
> I noticed in my logs some weird behavior. I see in logs intermittent log: 
> {noformat}
> 2019-03-06 14:02:13.024 INFO 1 --- [container-1-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-4, groupId=service-main] Node 2 was unable to process the fetch request with (sessionId=1321134604, epoch=125730): INVALID_FETCH_SESSION_EPOCH.{noformat}
> which happens every ~1 hour. 
>  
> I was wondering if it's my Kafka provider fault so I decided to investigate the problem and I tried to reproduce the issue on my local - with success. My configuration is:
>  * Kafka Clients version - 2.0.1
>  * Kafka - 2.12_2.1.0
>  
> I enabled trace logs for 'org.apache.kafka.clients' and that's what I get:
> {noformat}
> 2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=525) for node 1001. Added (), altered (), removed () out of (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
> 2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
> 2019-03-05 21:04:16.161 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=525,topics=[],forgotten_topics_data=[]} with correlation id 629 to node 1001
> 2019-03-05 21:04:16.664 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Completed receive from node 1001 for FETCH with correlation id 629, received {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]}
> 2019-03-05 21:04:16.664 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Node 1001 sent an incremental fetch response for session 197970881 with response=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
> 2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=526) for node 1001. Added (), altered (), removed () out of (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
> 2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
> 2019-03-05 21:04:16.665 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F630] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[]} with correlation id 630 to node 1001
> 2019-03-05 21:04:17.152 DEBUG 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main - ???] Sending Heartbeat request to coordinator localhost:9092 (id: 2147482646 rack: null)
> 2019-03-05 21:04:17.153 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - H631] Sending HEARTBEAT {group_id=service-main,generation_id=108,member_id=consumer-3-7e3c718e-f472-4bd8-9404-458ae5ed9a26} with correlation id 631 to node 2147482646
> 2019-03-05 21:04:17.254 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - ???] Completed receive from node 2147482646 for HEARTBEAT with correlation id 631, received {throttle_time_ms=0,error_code=0}
> // After two seconds FETCH response has been received...
> 2019-03-05 21:04:19.141 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F630] Completed receive from node 1001 for FETCH with correlation id 630, received {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]}
> 2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main] Received successful Heartbeat response
> // But in the same moment, Fethcer is sending FETCH with same epoch once again
> 2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=526) for node 1001. Added (), altered (), removed (itunes-command-19, itunes-command-18, itunes-command-11) out of (itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
> 2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(itunes-command-19, itunes-command-18, itunes-command-11), implied=(itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
> 2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main - F630 ] Node 1001 sent an incremental fetch response for session 197970881 with response=(), implied=(itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
> 2019-03-05 21:04:19.142 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F632] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[{topic=itunes-command,partitions=[19,18,11]}]} with correlation id 632 to node 1001
> 2019-03-05 21:04:19.152 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Completed receive from node 1001 for FETCH with correlation id 632, received {throttle_time_ms=0,error_code=71,session_id=0,responses=[]}
> 2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Found least loaded node localhost:9092 (id: 1001 rack: null)
> 2019-03-05 21:04:52.546 DEBUG 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending metadata request (type=MetadataRequest, topics=itunes-command) to node localhost:9092 (id: 1001 rack: null)
> 2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending METADATA {topics=[itunes-command],allow_auto_topic_creation=true} with correlation id 633 to node 1001
> 2019-03-05 21:04:52.547 INFO 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main] Group coordinator localhost:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
> 2019-03-05 21:12:04.206 INFO 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Node 1001 was unable to process the fetch request with (sessionId=197970881, epoch=527): INVALID_FETCH_SESSION_EPOCH.{noformat}
>  
> The first request has been sent with epoch 525 and that's ok.
> The second request has been sent with epoch 526, and... in the meantime, AbstractCoordinator starts sending Heartbeat request. Everything stops for ~2 seconds. After these 2 seconds, the response for FETCH request has been received. *But* *in the same moment*, Fetcher decides to send FETCH request *with the same epoch* once again! That results with INVALID_FETCH_SESSION_EPOCH response from Kafka node (which is understandable).
> I'm wondering if is there some workaround for this problem?
>  And the second question is - does it impact my service performance?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)