You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Bartek Jakub (JIRA)" <ji...@apache.org> on 2019/03/06 14:44:00 UTC

[jira] [Created] (KAFKA-8052) Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request

Bartek Jakub created KAFKA-8052:
-----------------------------------

             Summary: Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request 
                 Key: KAFKA-8052
                 URL: https://issues.apache.org/jira/browse/KAFKA-8052
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 2.1.0
            Reporter: Bartek Jakub


I noticed in my logs some weird symptom. I see in logs intermittent log:

 
{noformat}
2019-03-06 14:02:13.024 INFO 1 --- [container-1-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-4, groupId=service-main] Node 2 was unable to process the fetch request with (sessionId=1321134604, epoch=125730): INVALID_FETCH_SESSION_EPOCH.{noformat}
which happens every ~1 hour.

 

I was wondering if it's my Kafka provider fault so I decided to investigate the problem and I tried to reproduce the issue on my local - with success. My configuration is:
* Kafka Clients version - 2.0.1
* Kafka - 2.12_2.1.0

I enabled trace logs for 'org.apache.kafka.clients' and that's what I get:

 
{noformat}
2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=525) for node 1001. Added (), altered (), removed () out of (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
2019-03-05 21:04:16.161 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=525,topics=[],forgotten_topics_data=[]} with correlation id 629 to node 1001
2019-03-05 21:04:16.664 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Completed receive from node 1001 for FETCH with correlation id 629, received {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]}
2019-03-05 21:04:16.664 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Node 1001 sent an incremental fetch response for session 197970881 with response=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)



2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=526) for node 1001. Added (), altered (), removed () out of (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
2019-03-05 21:04:16.665 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F630] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[]} with correlation id 630 to node 1001
2019-03-05 21:04:17.152 DEBUG 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main - ???] Sending Heartbeat request to coordinator localhost:9092 (id: 2147482646 rack: null)
2019-03-05 21:04:17.153 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - H631] Sending HEARTBEAT {group_id=service-main,generation_id=108,member_id=consumer-3-7e3c718e-f472-4bd8-9404-458ae5ed9a26} with correlation id 631 to node 2147482646
2019-03-05 21:04:17.254 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - ???] Completed receive from node 2147482646 for HEARTBEAT with correlation id 631, received {throttle_time_ms=0,error_code=0}


// After two seconds FETCH response has been received...
2019-03-05 21:04:19.141 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F630] Completed receive from node 1001 for FETCH with correlation id 630, received {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]}
2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main] Received successful Heartbeat response

// But in the same moment, Fethcer is sending FETCH with same epoch once again
2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Built incremental fetch (sessionId=197970881, epoch=526) for node 1001. Added (), altered (), removed (itunes-command-19, itunes-command-18, itunes-command-11) out of (itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=service-main] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(itunes-command-19, itunes-command-18, itunes-command-11), implied=(itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null)
2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main - F630 ] Node 1001 sent an incremental fetch response for session 197970881 with response=(), implied=(itunes-command-10, itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, itunes-command-16)
2019-03-05 21:04:19.142 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main - F632] Sending FETCH {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[{topic=itunes-command,partitions=[19,18,11]}]} with correlation id 632 to node 1001
2019-03-05 21:04:19.152 TRACE 3052 --- [container-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Completed receive from node 1001 for FETCH with correlation id 632, received {throttle_time_ms=0,error_code=71,session_id=0,responses=[]}
2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Found least loaded node localhost:9092 (id: 1001 rack: null)
2019-03-05 21:04:52.546 DEBUG 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending metadata request (type=MetadataRequest, topics=itunes-command) to node localhost:9092 (id: 1001 rack: null)
2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, groupId=service-main] Sending METADATA {topics=[itunes-command],allow_auto_topic_creation=true} with correlation id 633 to node 1001
2019-03-05 21:04:52.547 INFO 3052 --- [ service-main] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=service-main] Group coordinator localhost:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
2019-03-05 21:12:04.206 INFO 3052 --- [container-0-C-1] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, groupId=service-main] Node 1001 was unable to process the fetch request with (sessionId=197970881, epoch=527): INVALID_FETCH_SESSION_EPOCH.{noformat}
 

The first request has been sent with epoch 525 and that's ok.

The second request has been sent with epoch 526, and... in the meantime, AbstractCoordinator starts sending Heartbeat request. Everything stops for ~2 seconds. After these 2 seconds, the response for FETCH request has been received. *But* *in the same moment*, Fetcher decides to send FETCH request *with the same epoch* once again! That results with INVALID_FETCH_SESSION_EPOCH response from Kafka node (which is understandable).

I'm wondering if is there some workaround for this problem?
And the second question is - does it impact my service performance?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)