You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by richard zhou <ri...@intel.com> on 2015/07/10 10:52:29 UTC

Re: Review Request 35591: Sqoop-1853: Kafka connector supporting FROM direction

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35591/
-----------------------------------------------------------

(Updated July 10, 2015, 8:52 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

This is an initial patch. It may not work currently. Please check whether this design of framework is fine or not.


Diffs (updated)
-----

  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConnector.java e3eafe87f7376031d79fec76739fd9e648df872b 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromDestroyer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromInitializer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaLoader.java ad9ec34755e36985078e880a0a076c46337ef411 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartition.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaUtils.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfig.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfiguration.java PRE-CREATION 

Diff: https://reviews.apache.org/r/35591/diff/


Testing
-------

Passed compile, but it still needs to be tested in the real cluster.


Thanks,

richard zhou


Re: Review Request 35591: Sqoop-1853: Kafka connector supporting FROM direction

Posted by richard zhou <ri...@intel.com>.

On Aug. 11, 2015, 2:09 p.m., richard zhou wrote:
> > Thanks for submitting a patch for this! I know my review has a lot of comments, but I wanted to put all of my thoughts out there. Feel free to discuss any of them.

Hey, Grant Henke, I real appreciate that you have left so many comments, which are very valuable. I will update the patch according to your comments soon.


- richard


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35591/#review94911
-----------------------------------------------------------


On Aug. 4, 2015, 8:25 a.m., richard zhou wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35591/
> -----------------------------------------------------------
> 
> (Updated Aug. 4, 2015, 8:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> This is an initial patch. It may not work currently. Please check whether this design of framework is fine or not.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/error/code/KafkaConnectorErrors.java 436b852b0c69258753d7f09c3411bc19d16957a3 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConnector.java e3eafe87f7376031d79fec76739fd9e648df872b 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConstants.java 9d3877db4ab53b09aec34511be6a29cc28611f63 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromDestroyer.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromInitializer.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartition.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfig.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfiguration.java PRE-CREATION 
>   connector/connector-kafka/src/main/resources/kafka-connector-config.properties d3e1e6f3f0de8bae60ac22bbf718dddd8330362a 
> 
> Diff: https://reviews.apache.org/r/35591/diff/
> 
> 
> Testing
> -------
> 
> Passed compile, but it still needs to be tested in the real cluster.
> 
> 
> Thanks,
> 
> richard zhou
> 
>


Re: Review Request 35591: Sqoop-1853: Kafka connector supporting FROM direction

Posted by Grant Henke <gr...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35591/#review94911
-----------------------------------------------------------


Can you add documentation to the existing Kafka Connector documentation in docs/src/site/sphinx/Connectors.rst?


connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConstants.java (lines 42 - 46)
<https://reviews.apache.org/r/35591/#comment149597>

    Are these configurable by the end user? Should the y be?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (lines 78 - 79)
<https://reviews.apache.org/r/35591/#comment149593>

    The broker list (in other kafka usages) usually takes a list like host1:port1,host2:port2 where at least one broker will be available. This assumes only 1 broker.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 83)
<https://reviews.apache.org/r/35591/#comment149602>

    This only allows all brokers to be on 1 port. Each broker can have a different port. Which is fairly common is psuedo-distributed test environments.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 86)
<https://reviews.apache.org/r/35591/#comment149594>

    This behavior would cause very unexpected behavior for a user with any topic that has more than 1 partition as it would only read the first and ignore all other data. I dont think a connector that can only read 1 hardcoded partition is usefull. The default should be reading all partitions. You should be able to use the KafkaPartition passed in by the Extractor to pick which partition you are consuming. Though I am not well versed in Sqoop2 that appears to be what the partitioners are for.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 98)
<https://reviews.apache.org/r/35591/#comment149595>

    Unfortunatly in the next realease of Kafka (Released October-ish) there will be a new consumer. It may be worth tracking that now instead of adding the connector on the old consumer that will no longer be used going forward. It would help the test & design effort of the new consumer as well. That may not be an option, but I figure I would suggest it.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 100)
<https://reviews.apache.org/r/35591/#comment149608>

    This implies we will only read an entire topic each time. Even if we don't support custom offsets. Supporting commiting offsets to Kafka would be nice and give you incrimental loads. If you look at ConsoleConsumer in Kafka it offers options like: --from-beginning, --delete-consumer-offsets (which is like saying --from-end), and picks up where you left off by default.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 116)
<https://reviews.apache.org/r/35591/#comment149596>

    DEFAULT_RETRY_TIME is a confusing name. Seams like its some form of MAX_ERRORS. Though I am not sure failing after X errors makes sense. I suspect either the error is unrecoverable or we can safely recover.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 119)
<https://reviews.apache.org/r/35591/#comment149598>

    This could indicate some issue in Kafka. In many cases its too low of a retention time configured for the topic and we may be loosing data. At a minimum we should warn log that this is occuring.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 124)
<https://reviews.apache.org/r/35591/#comment149599>

    Should this only occur for certain errors? ex: NotLeaderForPartitionCode, LeaderNotAvailableCode, BrokerNotAvailableCode



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 137)
<https://reviews.apache.org/r/35591/#comment149605>

    When/how does this happen?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 146)
<https://reviews.apache.org/r/35591/#comment149615>

    I am not sure about sqoop2 Serde support, but does this imply we only support topics that contain strings? What if it contains other formats?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 148)
<https://reviews.apache.org/r/35591/#comment149606>

    KAFKA_CONNECTOR_0004's error message is really generic



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 154)
<https://reviews.apache.org/r/35591/#comment149607>

    Would this do anything since in this case hasData = false and the while loop would kick out?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 178)
<https://reviews.apache.org/r/35591/#comment149610>

    Should this really return 0? What kind of errors can happen?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 182)
<https://reviews.apache.org/r/35591/#comment149611>

    When can this happen? should it return 0?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 189)
<https://reviews.apache.org/r/35591/#comment149603>

    3 seams like an arbitrary number. At least name it in a variable to convery its purpose.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 190)
<https://reviews.apache.org/r/35591/#comment149613>

    It looks like m_replicaBrokers is not needed or used. I am not a fan of mutating passed arguments. It makes state hard to track.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 200)
<https://reviews.apache.org/r/35591/#comment149612>

    Should at least do something here, or don't catch it.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 203)
<https://reviews.apache.org/r/35591/#comment149604>

    Broker failure is not the only reason for leader change. It could simply be a leader election.



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 204)
<https://reviews.apache.org/r/35591/#comment149616>

    Should this return null? The error message above says "Exiting".



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 207)
<https://reviews.apache.org/r/35591/#comment149600>

    Should this be findPartitionMetadata?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java (line 244)
<https://reviews.apache.org/r/35591/#comment149601>

    perhaps createSimpleConsumer since a makes a new one?



connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java (lines 42 - 46)
<https://reviews.apache.org/r/35591/#comment149614>

    I think this should be where you lookup all partitions in a topic and return them for use in the Extractor.


Thanks for submitting a patch for this! I know my review has a lot of comments, but I wanted to put all of my thoughts out there. Feel free to discuss any of them.

- Grant Henke


On Aug. 4, 2015, 8:25 a.m., richard zhou wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35591/
> -----------------------------------------------------------
> 
> (Updated Aug. 4, 2015, 8:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> This is an initial patch. It may not work currently. Please check whether this design of framework is fine or not.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/error/code/KafkaConnectorErrors.java 436b852b0c69258753d7f09c3411bc19d16957a3 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConnector.java e3eafe87f7376031d79fec76739fd9e648df872b 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConstants.java 9d3877db4ab53b09aec34511be6a29cc28611f63 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromDestroyer.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromInitializer.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartition.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfig.java PRE-CREATION 
>   connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfiguration.java PRE-CREATION 
>   connector/connector-kafka/src/main/resources/kafka-connector-config.properties d3e1e6f3f0de8bae60ac22bbf718dddd8330362a 
> 
> Diff: https://reviews.apache.org/r/35591/diff/
> 
> 
> Testing
> -------
> 
> Passed compile, but it still needs to be tested in the real cluster.
> 
> 
> Thanks,
> 
> richard zhou
> 
>


Re: Review Request 35591: Sqoop-1853: Kafka connector supporting FROM direction

Posted by richard zhou <ri...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35591/
-----------------------------------------------------------

(Updated Aug. 4, 2015, 8:25 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

This is an initial patch. It may not work currently. Please check whether this design of framework is fine or not.


Diffs (updated)
-----

  common/src/main/java/org/apache/sqoop/error/code/KafkaConnectorErrors.java 436b852b0c69258753d7f09c3411bc19d16957a3 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConnector.java e3eafe87f7376031d79fec76739fd9e648df872b 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConstants.java 9d3877db4ab53b09aec34511be6a29cc28611f63 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromDestroyer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromInitializer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartition.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfig.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfiguration.java PRE-CREATION 
  connector/connector-kafka/src/main/resources/kafka-connector-config.properties d3e1e6f3f0de8bae60ac22bbf718dddd8330362a 

Diff: https://reviews.apache.org/r/35591/diff/


Testing
-------

Passed compile, but it still needs to be tested in the real cluster.


Thanks,

richard zhou


Re: Review Request 35591: Sqoop-1853: Kafka connector supporting FROM direction

Posted by richard zhou <ri...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35591/
-----------------------------------------------------------

(Updated July 15, 2015, 7:41 a.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

This is an initial patch. It may not work currently. Please check whether this design of framework is fine or not.


Diffs (updated)
-----

  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConnector.java e3eafe87f7376031d79fec76739fd9e648df872b 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaConstants.java 9d3877db4ab53b09aec34511be6a29cc28611f63 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaExtractor.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromDestroyer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaFromInitializer.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaLoader.java ad9ec34755e36985078e880a0a076c46337ef411 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartition.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaPartitioner.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/KafkaUtils.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfig.java PRE-CREATION 
  connector/connector-kafka/src/main/java/org/apache/sqoop/connector/kafka/configuration/FromJobConfiguration.java PRE-CREATION 
  connector/connector-kafka/src/main/resources/kafka-connector-config.properties d3e1e6f3f0de8bae60ac22bbf718dddd8330362a 

Diff: https://reviews.apache.org/r/35591/diff/


Testing
-------

Passed compile, but it still needs to be tested in the real cluster.


Thanks,

richard zhou