You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by Parth-Brahmbhatt <gi...@git.apache.org> on 2015/01/20 20:36:38 UTC

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

GitHub user Parth-Brahmbhatt opened a pull request:

    https://github.com/apache/storm/pull/387

    Storm 631: refactoring kafka connector code.

    This also covers STORM-590. I could not come up with a better name for partitionCoordinator which is more like PartitionManagerCache but it also handles Task to PartitionManager Mapping so any suggestions are welcome. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Parth-Brahmbhatt/incubator-storm STORM-631

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #387
    
----
commit 9fc51c15565adb9a560f9612f616c2125b4e165c
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-01-20T06:38:10Z

    STORM-631: Refactoring storm-kafka connector.

commit 12eb96fc666a88d4906ce5b9215b6514858a58ce
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-01-20T19:15:42Z

    Merge remote-tracking branch 'upstream/master'

commit 1f732e895bbe1253139f93d9a32746057b234b2b
Author: Parth Brahmbhatt <br...@gmail.com>
Date:   2015-01-20T19:25:45Z

    STORM-631: Remvoing unintedend test files.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt closed the pull request at:

    https://github.com/apache/storm/pull/387


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72536213
  
    @ptgoetz  Makes sense.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by wurstmeister <gi...@git.apache.org>.
Github user wurstmeister commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72496938
  
    Hi, 
    
    i've created a backwards compatible alternative focused on using the Kafka API for reading partition information in https://github.com/apache/storm/pull/407. Please let me know what you think.
    
    Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by ptgoetz <gi...@git.apache.org>.
Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72551762
  
    STORM-650 created. Let's try to focus discussion/efforts there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-167640073
  
    closing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-112220073
  
    @Parth-Brahmbhatt @wurstmeister will this supersede STORM-590? ([pull request](https://github.com/apache/storm/pull/407))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-112223772
  
    I think we should be able to merge STORM-590 for now given that change is backward compatible and atleast allows users to move on to Kafka broker APIs. Personally I just want to re-write the whole kafka connector code. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-75582274
  
    Do we want to close this pull request for the time being and re-open after discussion in STORM-650?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by ogorun <gi...@git.apache.org>.
Github user ogorun commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72505185
  
    Hi @Parth-Brahmbhatt ,
    
    Thank you for pointing me to this pull request. I see that my pull request  https://github.com/apache/storm/pull/406 shares part of ideas with your branch  Storm 631 and think it would be nice to join the ideas in one place.
    The main point of intersection is isolating failure-related functionality in dedicated class.
    
    * The difference in my approach is that I see this class plugable since logic of failure handling can differ from topology to topology. For example, I implemented case when number of retries per message is restricted.  Of course, reasonable default should be provided.  If it is pluggable, the question is how to choose and configure it. I think two main possibilities are: either with extending of KafkaConfig class, or introducing registry that stores appropriate factory. Last  approach can be better in case we speak about choosing approach per topology that works with several KafkaSpouts, listening  to different topics.
    
    * One more (minor) point. I would move all fail() event handling to failure handler. Your implementation divides this logic to common part located in PartitionManager class and specific one - in failure handler. I don't see conceptual cause for this division. There is a technical issue related to access level of members in PartitionManager, but this might be changed by slight PartitionManager refactoring.
    
    I'd be glad to see your feedback to these points


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by wurstmeister <gi...@git.apache.org>.
Github user wurstmeister commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72336141
  
    Hi 
    
    thanks for starting this refactoring, i've had this on my list of things to do but unfortunately have not had the time to do it.
    
    A few suggestions:
    
    It would be nice to break up this change (maybe we can even break up the jira):
    - ITuple refactoring & renaming of packages 
    - addition of API based PartitionCoordinator 
    
    This makes it easier to review.
    
    Also do we really have to make this a breaking change? Could we not just have another implementation of ```IBrokerReader``` and ```PartitionCoordinator```, which are fairly clean interfaces. I’ve definitely found the static partition configuration helpful when debugging.
    
    The use of the marker interface is fairly localised (```KafkaUtils.makeBrokerReader``` and ```KafkaSpout.open```) so I don't think it's too big a problem.
    
    If we want to get away from the ```BrokerHosts``` marker interface we could add something more explicit to the ```KafkaConfig``` (e.g. ```config.partitionSource()``` which could return STATIC/ZK/KAFKA) - I think this is more a question of style rather than complexity. 
    
    Why did you remove the configuration for the batch coordinator in trident? I’ve found that quite useful when debugging. I also think that mixing trident and non trident configuration can be a bit confusing, some parameters, e.g. ```stateUpdateIntervalMs``` do not have a meaning in trident.
    
    Thanks again!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72542849
  
    @ptgoetz  makes sense. STORM-631 aiming to be a umbrella jira. Creating another jira is fine too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by d2r <gi...@git.apache.org>.
Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-167632432
  
    @Parth-Brahmbhatt should we close this pull request for now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72349915
  
    Thanks for the suggestions, I have tried to address all the concerns below.
    * I can create multiple jiras and separate the refactoring in two parts as you suggested.
    * I did not remove PartitionCoordinator, just moved it to a package called partition where I moved all the partition management stuff. 
    * I removed IBrokerReader because I felt the recommended way to get the kafka broker host is the only way we should support, like a final method that can not be overridden. The users of this connector should not care how does the connector get internal kafka information like a topics' broker->partition mapping. If others are of the opinion that we should make this extensible I can keep the interface around I personally vote against doing this. 
    * I have worked on few kafka connector issues and everytime I read the code the marker interfaces and alternate configs were actually the things that made reading/understanding the code harder. I could pass an instance of tridentConfig to non trident spout and nothing would work and no compile time errors/warnings will be issued. In my opinion we should not support 3 ways to get internal kafka details,  this does not just make the code harder to read but even the documentation to use the connector gets confusing with so many different ways to achieve something that should be internal to the connector code. 
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by ptgoetz <gi...@git.apache.org>.
Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72533983
  
    @Parth-Brahmbhatt, @wurstmeister, @ogorun, @harshach:
    
    Forgive me for hijacking this pull request discussion, but it seems like we have a number of efforts/JIRAs trying to address multiple points (refactoring, usability, error handling, etc.) with a fair amount overlap.  In the interest of improved communication and collaboration I'd like to create a parent/umbrella JIRA that would allow us to discuss and coordinate such efforts in a central location. Creating a feature branch in the main Storm repo is also an option if it would make collaboration easier/less painful.
    
    I think the first order of business is to define (at a high level) everything we hope to accomplish, then come up with a plan for implementation.
    
    If we're all in agreement, I'll go ahead and create the parent JIRA and get the ball rolling. How does that sound?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.
Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-70728873
  
    I also tested the changes against kafka 0.8.2 release candidate and was able to verify that bolts/spouts/trident states/emitter are able to read and write from kafka topics.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm 631: refactoring kafka connector code.

Posted by wurstmeister <gi...@git.apache.org>.
Github user wurstmeister commented on the pull request:

    https://github.com/apache/storm/pull/387#issuecomment-72537785
  
    sounds good!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---