Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2012/06/27 01:41:42 UTC

[jira] [Created] (KAFKA-376) we need to expose different data to fetch requests from the follower replicas and consumer clients

Jun Rao created KAFKA-376:
-----------------------------

             Summary: we need to expose different data to fetch requests from the follower replicas and consumer clients
                 Key: KAFKA-376
                 URL: https://issues.apache.org/jira/browse/KAFKA-376
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8
            Reporter: Jun Rao


Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.
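The proposed rule can be sketched in a few lines of Scala. This is a minimal illustration and not code from any KAFKA-376 patch: ReplicaOffsets and fetchUpperBound are hypothetical names, and the only thing taken from the issue is which offset bounds which kind of request.

```scala
object FetchBound {
  // Hypothetical stand-in for the two offsets tracked on the leader replica.
  final case class ReplicaOffsets(highWatermark: Long, logEndOffset: Long)

  // Followers may read up to the log end offset so they can catch up;
  // ordinary consumers may read only up to the high watermark.
  def fetchUpperBound(leader: ReplicaOffsets, fromFollower: Boolean): Long =
    if (fromFollower) leader.logEndOffset else leader.highWatermark
}
```

With a high watermark of 100 and a log end offset of 150, a follower would be bounded by 150 and a consumer by 100.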

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao closed KAFKA-376.
-------------------------

    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch, KAFKA-376-v5.patch, KAFKA-376-v6.patch, KAFKA-376-v7.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v5.patch

Thanks for taking a look, Jun.  I've attached a new patch that addresses the following items:
- readMessageSet now returns Either[Short, (MessageSet, Long)] where, on success, we return both the messages and the highwatermark of the leader.
- availableFetchBytes now logs at info level when it gets an UnknownTopicOrPartitionException, and logs an error for any other exception.
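To make the first item concrete, here is a small sketch of an Either-based return in the shape described. It is not the patch code: MessageSet is aliased to a plain Vector, the in-memory Map standing in for the log is hypothetical, and the error code value is illustrative only.

```scala
object ReadResult {
  // Hypothetical stand-ins; the real MessageSet and error codes live in Kafka.
  type MessageSet = Vector[String]
  val UnknownTopicOrPartitionCode: Short = 3 // illustrative value

  // Left carries an error code; Right carries the messages plus the leader's high watermark.
  def readMessageSet(log: Map[(String, Int), (MessageSet, Long)],
                     topic: String,
                     partition: Int): Either[Short, (MessageSet, Long)] =
    log.get((topic, partition)) match {
      case Some((messages, hw)) => Right((messages, hw))
      case None                 => Left(UnknownTopicOrPartitionCode)
    }

  // A caller handles both outcomes with a single pattern match.
  def describe(result: Either[Short, (MessageSet, Long)]): String = result match {
    case Right((messages, hw)) => s"${messages.size} message(s), leader hw=$hw"
    case Left(code)            => s"error code $code"
  }
}
```

Returning the high watermark alongside the messages lets the caller pass the leader's value on without a second lookup.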

I tried to fiddle with EasyMock to get around the optional argument issue, but had no luck with it.  Seems like Mockito gets around it by mocking out all possible invocations, permuting the parameters, which is a little nasty.

Let me know what you think.
                

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451751#comment-13451751 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v7. +1 Please commit to 0.8.
                

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v6.patch
    

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-376:
--------------------------

    Summary: expose different data to fetch requests from the follower replicas and consumer clients  (was: we need to expose different data to fetch requests from the follower replicas and consumer clients)
    

        

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425024#comment-13425024 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Prashanth,

Thanks for the patch. Some comments:

1. In KafkaApis, currently we allow fetch requests on follower replicas, in addition to the leader replica. This is potentially useful for debugging purposes and it would be good to keep this. For normal consumer clients, the fetch requests will still be routed to the leader. However, for tools like ConsoleConsumer, we can potentially consume from any replica.

2. SimpleFetchTest:
2.1 Could you add the Apache header and remove the author comment?
2.2 Since you mocked ReplicaManager, could we do the test with just 1 replica?
2.3 Not sure if bigFetch tests anything more than goodFetch in testNonReplicaSeesHwWhenFetching(). In addition to goodFetch, we should add a fetchRequest with an offset at HW and expect an empty MessageSet in the response.
2.4 In testNonReplicaSeesHwWhenFetching(), the second comment of " // there should be no response" is incorrect.
2.5 For verification, it's probably simpler if we just verify that all log.read requests made are correct.
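The extra case suggested in 2.3 can be illustrated with a toy model. Everything here is hypothetical (ToyLog, read) and it is not the SimpleFetchTest code: the model stores one message per offset as a simplification, and only shows what a fetch at the HW should return for a consumer versus a follower.

```scala
object HwFetchCheck {
  // Toy log: messages indexed by offset, with a high watermark below the log end.
  final case class ToyLog(messages: Vector[String], highWatermark: Int)

  // Consumers read up to the high watermark; followers read up to the log end.
  def read(log: ToyLog, offset: Int, fromFollower: Boolean): Vector[String] = {
    val end = if (fromFollower) log.messages.size else log.highWatermark
    log.messages.slice(offset, end) // empty when offset >= end
  }
}
```

For a log of five messages with the HW at 3, a consumer fetch at offset 3 yields an empty result, while a follower fetch at the same offset yields the two messages past the HW.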

Finally, we probably should have kafka-434 committed first since it's a big patch and has been rebased quite a few times. It's almost there and should be committed this week.
                

        

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448200#comment-13448200 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Actually, there is one other issue:

In readMessageSet(topic: String, partition: Int, offset: Long, maxSize: Int, fromFollower: Boolean), could we make sure that actualSize is always >= 0?
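One way to guarantee this is to clamp the computed size. The sketch below is hypothetical and is not taken from patch v6: it assumes actualSize is the smaller of maxSize and the distance from the fetch offset to the upper bound (high watermark or log end offset), and that offsets are byte positions in the log.

```scala
object ActualSize {
  // Bytes to read: at most maxSize, at most what lies below upperBound,
  // and never negative even when the fetch offset is past upperBound.
  def actualSize(offset: Long, maxSize: Int, upperBound: Long): Int =
    math.max(0L, math.min(maxSize.toLong, upperBound - offset)).toInt
}
```

Without the clamp, a fetch offset beyond the bound would produce a negative size, for example an offset of 120 against a bound of 100.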
                

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417864#comment-13417864 ] 

Jay Kreps commented on KAFKA-376:
---------------------------------

Hey Prashanth, how goes it? Let me know if you don't have time and I can pick this up.
                

        

[jira] [Resolved] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon resolved KAFKA-376.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8

Thanks Jun.  Committed to 0.8.
                

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451677#comment-13451677 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

Good catch, Jun.  V6 patch attached; if all is good, I'll go ahead and commit to the 0.8 branch.
                

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v1.patch

Sigh, I took way too long to take a look at this again.  Regardless, I've attached a new patch that addresses the points.  I'm still a little unhappy with the logic to determine offsets - though I believe it's functionally correct, something about it seems off.  I'll mull it over while comments come in, hopefully someone points something out.


                

        

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447732#comment-13447732 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v5, Prashanth. +1 on the patch.
                

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418310#comment-13418310 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

I actually didn't get a chance to look at this, though I'd really like to.  Tell you what, I'll try to start this weekend and if I'm unable to, you can take it off my hands :)
                

        

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v3.patch

Here we go, I've attached v3 of the patch.  It's effectively the same, but with some adjustments to compile against latest trunk.  Let me know what you guys think.
                

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v4.patch
    

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445023#comment-13445023 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Also, in availableFetchBytes(), we should probably distinguish UnknownTopicOrPartitionException from other exceptions. For the former, we just need to add an info level log. For the latter, we need to add an error level log with the stacktrace.
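A rough sketch of that distinction follows. It is an illustration under stated assumptions rather than the KafkaApis code: the exception class is a local stand-in for Kafka's, the log parameter is a hypothetical sink taking a level and a message, and falling back to 0 bytes after logging is an assumption.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.control.NonFatal

object FetchLogging {
  // Hypothetical stand-in for Kafka's exception of the same name.
  class UnknownTopicOrPartitionException(msg: String) extends RuntimeException(msg)

  // Renders a throwable together with its stack trace as one string.
  private def withStackTrace(e: Throwable): String = {
    val out = new StringWriter()
    e.printStackTrace(new PrintWriter(out, true))
    out.toString
  }

  // Runs `compute`; an unknown topic or partition is logged at INFO with the
  // message only, any other non-fatal failure at ERROR with the stack trace.
  // Returning 0 after a failure is an assumption made for this sketch.
  def availableOrZero(compute: () => Long, log: (String, String) => Unit): Long =
    try compute()
    catch {
      case e: UnknownTopicOrPartitionException =>
        log("INFO", e.getMessage)
        0L
      case NonFatal(e) =>
        log("ERROR", withStackTrace(e))
        0L
    }
}
```

The more specific case must come first, since the NonFatal case would otherwise match the unknown-partition exception as well.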
                

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-DRAFT.patch

Hi all, I feel like I should add a draft patch because the code has grown much more complex since the last time I touched it.  The patch itself is minimal but a little nasty, so I'd like to clean it up a little but I figured I'd submit a draft so more eyes can get on it.  Here goes ... an outline:

1. Modified KafkaApis.availableFetchBytes to check the leader replica's logEndOffset if the request is coming from a follower, otherwise use the highwaterMark.
2. Modified readMessageSet to not read beyond the highwaterMark of the replica's log (which is local because it's the leader) if the request is coming from a regular, non-follower consumer.

One thing I'm considering doing is putting the log read logic in the replica itself since it is aware of hw and leo.  There's also a ReplicaManager.getLeaderReplica that throws an exception if the leader replica doesn't exist but returns an Option[Replica] - can this just return a Replica?  
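The two shapes in question for the leader lookup can be contrasted in a short sketch. The names below (Replica, findLeader, leaderOrThrow) and the Map-backed lookup are hypothetical and do not reflect the actual ReplicaManager; the point is only that a method would normally either return an Option or throw, not both.

```scala
object LeaderLookup {
  // Hypothetical stand-in for Kafka's Replica.
  final case class Replica(brokerId: Int)

  // Option-returning form: a missing leader is an ordinary value, nothing is thrown.
  def findLeader(leaders: Map[(String, Int), Replica],
                 topic: String,
                 partition: Int): Option[Replica] =
    leaders.get((topic, partition))

  // Throwing form: callers receive a Replica directly and absence is an exception.
  def leaderOrThrow(leaders: Map[(String, Int), Replica],
                    topic: String,
                    partition: Int): Replica =
    findLeader(leaders, topic, partition).getOrElse(
      throw new NoSuchElementException(s"no leader replica for $topic-$partition"))
}
```

If the method already throws when the leader is missing, the Option wrapper adds nothing for callers, which is the motivation for returning a plain Replica.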

As for the test, I couldn't find a decent place to put it because it includes both replication and simple consumer tests.  Rather than split the tests into two separate classes, I thought it'd be better to group them together.

Looking forward to the suggestions.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>         Attachments: KAFKA-376-DRAFT.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434184#comment-13434184 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v1. It looks good overall. Some comments:

20. KafkaApis.availableFetchBytes: getting the leo of a replica can be done as leader.logEndOffset.

21. Could you rebase and make sure that single_host_multi_brokers under system_test passes?
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426692#comment-13426692 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

For 1, yes, you are right. We actually ensure that fetch requests can only be made on the leader right now. So, we can relax it in a separate jira.

Yes, I meant 343.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>         Attachments: KAFKA-376-DRAFT.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Neha Narkhede (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-376:
--------------------------------

    Assignee: Prashanth Menon
    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420800#comment-13420800 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

Minor update: I've started some of the work involved, but didn't get a chance to put as much time into it as I'd like.  Hoping to get a patch out by the middle or end of this week.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Comment Edited] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442253#comment-13442253 ] 

Prashanth Menon edited comment on KAFKA-376 at 8/27/12 1:51 PM:
----------------------------------------------------------------

Here we go, I've attached v3 of the patch.  It's effectively the same, but with some adjustments to compile against latest trunk.  Let me know what you guys think.

Minor edit: thanks for taking care of the other JIRAs.  The last few weeks have been fairly busy, but the good news is that I see a break which I hope to use to get back into the swing of things :)  As for the offsets API, can we take care of that in a separate JIRA?
                
      was (Author: prashanth.menon):
    Here we go, I've attached v3 of the patch.  It's effectively the same, but with some adjustments to compile against latest trunk.  Let me know what you guys think.
                  
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v2.patch

Hi all,

Attached is a second patch that addresses 20 and 21 (and some line spacing cleanup).  It passes all the tests except for BackwardsCompatibilityTest; that test uses a pre-constructed log-data file that's loaded up when the Kafka server starts.  The test then issues an offset request, pulls all the data back, and expects 100 messages total.  There are three issues here:

1) It looks like we need to handle the case where a topic/partition has a replication factor of one.  In such cases, updating the LEO of the log should also update the HW to the same value.  I took care of this in the patch, but I'd like some feedback on whether it's the correct approach.
2) What should the server do on startup if it is the leader of a specific topic/partition and finds local data available for it?  Presumably, it should truncate its log to the value in the checkpoint file, but that doesn't seem to be done now.  Is this thinking correct?
3) KafkaApis should use the leader's HW to perform a bounds check when responding to an offset request.  It currently checks for leader presence, so adding this check should be relatively simple.  I can attach another patch with that (and any other issues that come up) mid-week.
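
One possible shape for the bounds check in issue 3 is sketched below in Python. It only illustrates the idea of clamping the offsets returned for an offset request to the leader's HW; the function name `visible_offsets` and the list-of-offsets input are hypothetical and do not come from KafkaApis.

```python
from typing import List


def visible_offsets(candidate_offsets: List[int], leader_high_watermark: int) -> List[int]:
    """Clamp each candidate offset to the leader's high watermark.

    Offsets above the HW are replaced by the HW itself. Duplicates created
    by the clamping are dropped, and the original order is preserved.
    """
    seen = set()
    result = []
    for offset in candidate_offsets:
        bounded = min(offset, leader_high_watermark)
        if bounded not in seen:
            seen.add(bounded)
            result.append(bounded)
    return result
```

In this model a regular client never learns about an offset beyond the committed portion of the log, which is the property the BackwardsCompatibilityTest failure exposed.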
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437933#comment-13437933 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Prashanth, thanks for the updates. 

1) We have a separate jira kafka-420 to track the issue of maintaining HW with just 1 replica. It probably needs a bit more work than what's in your patch. First of all, we should increase HW as long as ISR (not assigned replica) has only 1 replica. Second, we need to check if we can increase HW in at least two other places (a) when a replica becomes the leader, (b) when ISR shrinks to size 1.

2) A new leader should never truncate its log. Only a follower truncates its log. This is because the new leader is guaranteed to have all committed data in its log and truncating any data from its log may risk losing a committed message.

3) Yes, we should fix the getOffset api. We will need to distinguish requests from a follower and a regular client, so that we can return either LEO or HW accordingly. We can add a replicaId to the getOffsetRequest like FetchRequest.

4) There are a couple of other jiras that complicate things a bit. (a) In kafka-461, we are trying to get rid of the support of magic byte 0 in Message, which will end up removing BackwardsCompatibilityTest. However, if BackwardsCompatibilityTest is the only test that fails for this jira, that means we don't have enough unit test coverage for this jira and kafka-420. So, we will need to add a new unit test. (b) kafka-351 is doing some refactoring of ReplicaManager, which will change a bit how KafkaApis interacts with ReplicaManager.

Ideally, we should probably fix things in the following order: (a) get kafka-351 in first (I hope to check it in around the middle of this week) so that we can use the cleaner api in ReplicaManager; (b) fix kafka-420 and add a new unit test; (c) fix this jira and patch OffsetRequest. Does this make sense to you? We can probably have someone else work on kafka-420 if you don't have the time.
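
To make point 1 above more concrete, here is a small Python model of HW maintenance driven by the ISR rather than by the assigned replicas. It is a sketch under assumptions, not the kafka-420 fix: `PartitionState` and its methods are hypothetical names, and computing the HW as the smallest log end offset among ISR members is assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PartitionState:
    """Toy leader-side state for one partition (hypothetical, not a Kafka class)."""
    leader_id: int
    isr: Set[int]  # in-sync replica ids; assumed to always contain the leader
    # Log end offset known for each replica, keyed by replica id.
    log_end_offsets: Dict[int, int] = field(default_factory=dict)
    high_watermark: int = 0

    def maybe_advance_hw(self) -> bool:
        """Move the HW up to the smallest LEO in the ISR; never move it back."""
        candidate = min(self.log_end_offsets.get(r, 0) for r in self.isr)
        if candidate > self.high_watermark:
            self.high_watermark = candidate
            return True
        return False

    def append(self, num_bytes: int) -> None:
        """Leader appends data, then re-checks the HW."""
        current = self.log_end_offsets.get(self.leader_id, 0)
        self.log_end_offsets[self.leader_id] = current + num_bytes
        self.maybe_advance_hw()

    def shrink_isr(self, replica_id: int) -> None:
        """Drop a follower from the ISR, then re-check the HW."""
        if replica_id == self.leader_id:
            raise ValueError("the leader cannot be removed from its own ISR")
        self.isr.discard(replica_id)
        self.maybe_advance_hw()
```

When the ISR contains only the leader, every append moves the HW to the leader's LEO, and shrinking the ISR to size 1 immediately releases data that was waiting on the removed follower. A replica that has just become the leader would call `maybe_advance_hw` in the same way.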
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418337#comment-13418337 ] 

Jay Kreps commented on KAFKA-376:
---------------------------------

Awesome, sounds great! No pressure, just didn't want it to fall through the cracks. :-)
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451713#comment-13451713 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v6. We should probably do the length check inside Log.read() instead of readMessageSet. This way, we make sure that the caller's offset is checked by Log.findRange and an OffsetOutOfRangeException can be thrown if needed. In Log.read(), after Log.findRange, we can check whether length is <= 0 and, if so, immediately return an empty set.
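
The ordering suggested here (validate the offset first, then short-circuit on a non-positive length) can be sketched in Python as follows. This is a toy model over a single byte string, not the real Log class; `read`, `base_offset`, and `OffsetOutOfRangeError` are stand-in names chosen for the example.

```python
class OffsetOutOfRangeError(Exception):
    """Stand-in for Kafka's OffsetOutOfRangeException."""


def read(log: bytes, base_offset: int, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at `offset` from a toy log.

    `base_offset` is the offset of the first byte held in `log`.
    """
    log_end = base_offset + len(log)
    # Step 1: range check, playing the role of Log.findRange. It runs
    # before the length check so a bad offset is reported even when
    # length <= 0.
    if offset < base_offset or offset > log_end:
        raise OffsetOutOfRangeError(
            f"offset {offset} is outside [{base_offset}, {log_end}]"
        )
    # Step 2: nothing to read, return an empty set immediately.
    if length <= 0:
        return b""
    start = offset - base_offset
    return log[start:start + length]
```

If the length check lived in the caller instead, a request with an invalid offset and a zero length would silently get an empty result rather than an error.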
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch, KAFKA-376-v5.patch, KAFKA-376-v6.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v7.patch
    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch, KAFKA-376-v5.patch, KAFKA-376-v6.patch, KAFKA-376-v7.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445013#comment-13445013 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v4. The EasyMock issue is interesting. Maybe you have to mock getReplica(topic, partition) directly. 

Another thought. The only reason that we call getReplica in KafkaApis.readMessageSets(fetchRequest) is to get the highWatermark. We can change readMessageSet(topic: String, partition: Int, offset: Long, maxSize: Int, fromFollower: Boolean) to return Either[Short, (MessageSet, highWatermark)] instead. This way, we can avoid calling getReplica in the first readMessageSets(). In general, the fewer times we call getReplica the better, since there are then fewer places that need error handling.
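
A loose Python analogue of that return shape is shown below: one lookup per partition, returning either an error code or the data together with the high watermark. Everything here is an assumption made for illustration, including the `partitions` dictionary, the error code constant, and the use of raw bytes in place of a MessageSet. It also assumes `offset` is non-negative.

```python
from typing import Dict, Tuple, Union

# Hypothetical error code for this sketch.
UNKNOWN_TOPIC_OR_PARTITION = 3

# (topic, partition) -> (log bytes held by the leader, high watermark)
Partitions = Dict[Tuple[str, int], Tuple[bytes, int]]


def read_message_set(
    partitions: Partitions,
    topic: str,
    partition: int,
    offset: int,
    max_size: int,
    from_follower: bool,
) -> Union[int, Tuple[bytes, int]]:
    """Return an error code, or (data, high_watermark) on success."""
    # The only replica lookup: the caller gets the HW from the result
    # instead of looking the replica up a second time.
    state = partitions.get((topic, partition))
    if state is None:
        return UNKNOWN_TOPIC_OR_PARTITION
    log, high_watermark = state
    # Followers may read the whole log; consumers stop at the HW.
    upper = len(log) if from_follower else min(high_watermark, len(log))
    end = min(offset + max_size, upper)
    return log[offset:end], high_watermark
```

Callers then branch on the result type once, which keeps the error handling for a missing replica in a single place.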
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433821#comment-13433821 ] 

Joel Koshy commented on KAFKA-376:
----------------------------------

Hi Prashanth, the patch didn't apply cleanly for me. Can you svn up? 
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment:     (was: KAFKA-376-v4.patch)
    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v7.patch

Argh, should have run the test suite before submitting the patch.  I've attached a new one that moves the logic into Log.read().
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch, KAFKA-376-v5.patch, KAFKA-376-v6.patch, KAFKA-376-v7.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment: KAFKA-376-v4.patch

Hi all,

So I've attached a new patch that addresses all the points except the second point in 30.2.  It looks like using EasyMock to mock methods with optional arguments is tricky; the call to ReplicaManager.getReplica is causing issues because the brokerId is optional.  I suspect the Scala compiler produces several versions of the method that are chained together at runtime, but which EasyMock can't resolve for some reason.  I've tried some trickery to no avail.

It's a bad idea to modify code to accommodate faulty tests/tools, but I figured I'd attach a patch for review while I check out other options.  Let me know what you think.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.

[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444079#comment-13444079 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

Thanks for the review Jun.  30 and 31 are all silly mistakes by me.  I'll attach a patch later today to address them.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-376:
-----------------------------

    Labels: bugs  (was: )
    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Updated] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashanth Menon updated KAFKA-376:
----------------------------------

    Attachment:     (was: KAFKA-376-v7.patch)
    
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch, KAFKA-376-v4.patch, KAFKA-376-v5.patch, KAFKA-376-v6.patch, KAFKA-376-v7.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408658#comment-13408658 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

I can take a look at this, probably this weekend.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Prashanth Menon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426249#comment-13426249 ] 

Prashanth Menon commented on KAFKA-376:
---------------------------------------

Thanks for the comments, Jun.

1. Do we currently support this? Fetch requests always check that the leader is on the broker handling the request before actually reading from the log, so I'm not sure how a read on a follower would succeed.

2.1, 2.2, 2.3, 2.4, 2.5: All valid. I will upload a new patch later this week.

I assume you mean KAFKA-343? I've been following it, and yes, it's grown quite large/complex hahaha. I'll definitely wait for that to get ironed out and committed.
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>         Attachments: KAFKA-376-DRAFT.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.


[jira] [Commented] (KAFKA-376) expose different data to fetch requests from the follower replicas and consumer clients

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442463#comment-13442463 ] 

Jun Rao commented on KAFKA-376:
-------------------------------

Thanks for patch v3. +1 on the patch. The following are some minor comments. Once they are addressed, the patch can be checked in without another review.

30. KafkaApis:
30.1 availableFetchBytes(): Not all exceptions are no-leader exceptions, so we should log the exception and make the error message more general.
30.2 readMessageSets(fetchRequest): The comment about replica id value of -1 can be removed. There is no need to pass in brokerId for the first replicaManager.getReplica call since it defaults to local replica. There is no need to make the second replicaManager.getReplica call for the remote replica since topic and partitionId are available locally. 
30.3 readMessageSet(): The leader's log should always be available. If it is not, we should log an error in addition to returning an empty set.

31. SimpleFetchTest: There is no need to mock KafkaZooKeeper any more.

Also, could you create another JIRA to fix the getOffset API?
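
Comments 30.1 and 30.3 could look roughly like the sketch below. It is only an illustration under stated assumptions: LeaderReadSketch, its in-memory map of logs, and the LongSupplier parameter are hypothetical stand-ins and do not reflect the actual KafkaApis code, and returning 0 after a failure is an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of review comments 30.1 and 30.3; not the real KafkaApis code.
final class LeaderReadSketch {
    private static final Logger LOG = Logger.getLogger("LeaderReadSketch");

    // Stand-in for the broker's local logs, keyed by "topic-partition".
    private final Map<String, List<String>> localLogs = new HashMap<>();

    void addLog(String topic, int partition, List<String> messages) {
        localLogs.put(topic + "-" + partition, messages);
    }

    /** 30.3: a missing leader log is unexpected, so log an error and return an empty set. */
    List<String> readMessageSet(String topic, int partition) {
        List<String> log = localLogs.get(topic + "-" + partition);
        if (log == null) {
            LOG.severe("Leader log for " + topic + "-" + partition + " is not available");
            return Collections.emptyList();
        }
        return log;
    }

    /** 30.1: log the exception itself with a general message instead of assuming "no leader". */
    static long availableFetchBytes(String topic, int partition, LongSupplier compute) {
        try {
            return compute.getAsLong();
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE,
                    "Error while computing available fetch bytes for " + topic + "-" + partition, e);
            return 0L; // assumption: report nothing available after a failure
        }
    }

    public static void main(String[] args) {
        LeaderReadSketch broker = new LeaderReadSketch();
        broker.addLog("events", 0, List.of("m1", "m2"));
        System.out.println(broker.readMessageSet("events", 0).size());
        System.out.println(broker.readMessageSet("events", 1).isEmpty());
    }
}
```

Passing the exception to the logger keeps the stack trace, so a failure that is not a missing leader is still diagnosable from the broker log.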
                
> expose different data to fetch requests from the follower replicas and consumer clients
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-376
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>              Labels: bugs
>         Attachments: KAFKA-376-DRAFT.patch, KAFKA-376-v1.patch, KAFKA-376-v2.patch, KAFKA-376-v3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the broker always uses highwatermark to calculate the available bytes to a fetch request, no matter where the request is from. Instead, we should use highwatermark for requests coming from real consumer clients and use logendoffset for requests coming from follower replicas.
