You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/05 05:25:16 UTC

[GitHub] [incubator-hudi] garyli1019 opened a new pull request #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer

garyli1019 opened a new pull request #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486
 
 
   
   ## What is the purpose of the pull request
   
   Integrate the initial checkpoint provider with delta streamer
   
   ## Brief change log
   
     - Add two options to delta streamer to use the initial checkpoint provider
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - WIP
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406554063
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
 
 Review comment:
   totally understand

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406555030
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   I think this depends on how we design the migration flow for the user.
   What I did myself is I use Spark datasource to do a bulkInsert to convert all the plain parquet files to Hudi format, then the second job I'd like to use delta streamer to read from Kafka. So this initialCheckpointProvider should be the first delta streamer job when switching sources. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405413213
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
 
 Review comment:
   lets change the variable name from hiveConf to conf? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407233504
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   I still think its confusing to have it as hiveConf...  esp when you are actually creating ` HiveConf` from this down below.. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/5d717a28f45137bea71dffa31b0ae7ccbf1bda00&el=desc) will **decrease** coverage by `0.07%`.
   > The diff coverage is `78.78%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   - Coverage     72.23%   72.15%   -0.08%     
   - Complexity      289      294       +5     
   ============================================
     Files           338      373      +35     
     Lines         15947    16282     +335     
     Branches       1624     1638      +14     
   ============================================
   + Hits          11519    11748     +229     
   - Misses         3700     3798      +98     
   - Partials        728      736       +8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <50.00%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `89.28% <71.42%> (-3.03%)` | `14.00 <3.00> (+2.00)` | :arrow_down: |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `83.33% <83.33%> (ø)` | `1.00 <1.00> (?)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.93% <85.71%> (-1.22%)` | `11.00 <4.00> (+1.00)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.44% <100.00%> (ø)` | `37.00 <0.00> (ø)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `61.62% <0.00%> (-27.66%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh) | `57.50% <0.00%> (-25.63%)` | `0.00% <0.00%> (ø%)` | |
   | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | `38.46% <0.00%> (-15.39%)` | `0.00% <0.00%> (ø%)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [51 more](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [5d717a2...b923a97](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407273794
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -219,13 +230,13 @@ public static JavaSparkContext buildSparkContext(String appName, String sparkMas
   /**
    * Build Hoodie write client.
    *
-   * @param jsc Java Spark Context
-   * @param basePath Base Path
-   * @param schemaStr Schema
+   * @param jsc         Java Spark Context
 
 Review comment:
   are these from intellij formatting... 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406617783
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
 
 Review comment:
   same as above, it will need a code refactoring if we wanna get rid of null

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405719163
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   I keep `hiveConf` here for better readability. There are two other configuration `cfg` `props` here, so I believe `hiveConf` could be more clear for the user to know what this is for. WDYT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407326317
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -219,13 +230,13 @@ public static JavaSparkContext buildSparkContext(String appName, String sparkMas
   /**
    * Build Hoodie write client.
    *
-   * @param jsc Java Spark Context
-   * @param basePath Base Path
-   * @param schemaStr Schema
+   * @param jsc         Java Spark Context
 
 Review comment:
   yes, these might cause by the different settings when I switched machines. I removed all `this` and `final` manually instead of reverting commit, so there are still a few lines changing due to the initial formatting. I can revert those comments as well if it's preferable. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405413639
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -371,7 +379,7 @@ public static void main(String[] args) throws Exception {
      */
     private transient DeltaSync deltaSync;
 
-    public DeltaSyncService(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+    public DeltaSyncService(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   ditto. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609361716
 
 
   @pratyakshsharma @vinothchandar 
   I agree with Vinoth's idea that having those two options in the delta streamer. Is this implementation makes sense to you guys?
   I will add a test case if this approach looks good. 
   Thanks

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/eaf6cc2d90bf27c0d9414a4ea18dbd1b61f58e50&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   + Coverage     71.54%   71.56%   +0.01%     
   - Complexity      261      262       +1     
   ============================================
     Files           336      337       +1     
     Lines         15744    15755      +11     
     Branches       1610     1611       +1     
   ============================================
   + Hits          11264    11275      +11     
   + Misses         3759     3758       -1     
   - Partials        721      722       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `63.97% <0.00%> (-1.45%)` | `21.00 <0.00> (ø)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.29% <33.33%> (-1.32%)` | `8.00 <0.00> (ø)` | |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `100.00% <100.00%> (ø)` | `1.00 <1.00> (?)` | |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `92.00% <100.00%> (-0.31%)` | `12.00 <1.00> (ø)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `72.22% <0.00%> (+13.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [eaf6cc2...5329fc8](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-613162710
 
 
   @vinothchandar Thanks for all the feedback! Very helpful! 
   Comments addressed. Please take a look.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407793452
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/InitialCheckPointProvider.java
 ##########
 @@ -18,14 +18,38 @@
 
 package org.apache.hudi.utilities.checkpointing;
 
+import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.exception.HoodieException;
 
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 /**
  * Provide the initial checkpoint for delta streamer.
  */
-public interface InitialCheckPointProvider {
+public abstract class InitialCheckPointProvider {
+  protected transient Path path;
+  protected transient FileSystem fs;
+  protected transient TypedProperties props;
+
+  static class Config {
+    private static String CHECKPOINT_PROVIDER_PATH_PROP = "hoodie.deltastreamer.checkpoint.provider.path";
+  }
+
+  public InitialCheckPointProvider(TypedProperties props) {
+    this.props = props;
+    this.path = new Path(props.getString(Config.CHECKPOINT_PROVIDER_PATH_PROP));
+  }
+
+  /**
+   * Initialize the class with the current filesystem.
+   *
+   * @param fileSystem
+   */
+  public abstract void init(FileSystem fileSystem);
 
 Review comment:
   Good point. now it's not `hiveConf` any more :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405966361
 
 

 ##########
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##########
 @@ -394,6 +394,26 @@ public void testProps() {
         props.getString("hoodie.datasource.write.keygenerator.class"));
   }
 
+  @Test
+  public void testInitialCheckpointProvider() throws IOException {
 
 Review comment:
   testKafkaConnectCheckpointProvider? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405968782
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/InitialCheckPointProvider.java
 ##########
 @@ -20,12 +20,23 @@
 
 import org.apache.hudi.exception.HoodieException;
 
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 /**
  * Provide the initial checkpoint for delta streamer.
  */
-public interface InitialCheckPointProvider {
+public abstract class InitialCheckPointProvider {
+  protected final Path path;
+  protected final FileSystem fs;
+
+  public InitialCheckPointProvider(final Path basePath, final FileSystem fileSystem) {
 
 Review comment:
   it may be worth turning this into  `abstract void init(Path, FileSystem)` method? that way all subclasses are forced to implement the right one.. 
   
   Also can we pass the `props` or the master list of properties (like we do for key generators) into this abstraction and let the CheckpointProvider use an explicit property like `hoodie.deltastreamer.checkpointprovider.kafka.connect.path` to derive the bootstrap path 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/eaf6cc2d90bf27c0d9414a4ea18dbd1b61f58e50&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   + Coverage     71.54%   71.56%   +0.01%     
   - Complexity      261      262       +1     
   ============================================
     Files           336      337       +1     
     Lines         15744    15755      +11     
     Branches       1610     1611       +1     
   ============================================
   + Hits          11264    11275      +11     
   + Misses         3759     3758       -1     
   - Partials        721      722       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `63.97% <0.00%> (-1.45%)` | `21.00 <0.00> (ø)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.29% <33.33%> (-1.32%)` | `8.00 <0.00> (ø)` | |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `100.00% <100.00%> (ø)` | `1.00 <1.00> (?)` | |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `92.00% <100.00%> (-0.31%)` | `12.00 <1.00> (ø)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `72.22% <0.00%> (+13.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [eaf6cc2...5329fc8](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407274109
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/InitialCheckPointProvider.java
 ##########
 @@ -18,14 +18,38 @@
 
 package org.apache.hudi.utilities.checkpointing;
 
+import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.exception.HoodieException;
 
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 /**
  * Provide the initial checkpoint for delta streamer.
  */
-public interface InitialCheckPointProvider {
+public abstract class InitialCheckPointProvider {
+  protected transient Path path;
+  protected transient FileSystem fs;
+  protected transient TypedProperties props;
+
+  static class Config {
+    private static String CHECKPOINT_PROVIDER_PATH_PROP = "hoodie.deltastreamer.checkpoint.provider.path";
+  }
+
+  public InitialCheckPointProvider(TypedProperties props) {
+    this.props = props;
+    this.path = new Path(props.getString(Config.CHECKPOINT_PROVIDER_PATH_PROP));
+  }
+
+  /**
+   * Initialize the class with the current filesystem.
+   *
+   * @param fileSystem
+   */
+  public abstract void init(FileSystem fileSystem);
 
 Review comment:
   would `Configuration` be a better choice here? is more general and the user can construct the filesystem from that? It will also carry other configurations picked up sparkContext say aws creds etc

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407793139
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   I believe the initial checkpoint provider should be just used once when the user wants to switch from one source to another. After that, the delta streamer should be able to get the checkpoint from the previous commit. We can improve this once the bootstrap is ready. At this point, I am not sure how to put everything together if we want one step to handling everything.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406555444
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -293,6 +295,12 @@ public Operation convert(String value) throws ParameterException {
     @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
     public String checkpoint = null;
 
+    @Parameter(names = {"--bootstrap-from"}, description = "Initial bootstrap from this path")
 
 Review comment:
   yea, with the `props` file, we can get the path there. Don't need this field anymore

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/5d717a28f45137bea71dffa31b0ae7ccbf1bda00&el=desc) will **decrease** coverage by `0.07%`.
   > The diff coverage is `78.78%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   - Coverage     72.23%   72.15%   -0.08%     
   - Complexity      289      294       +5     
   ============================================
     Files           338      373      +35     
     Lines         15947    16282     +335     
     Branches       1624     1638      +14     
   ============================================
   + Hits          11519    11748     +229     
   - Misses         3700     3798      +98     
   - Partials        728      736       +8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <50.00%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `89.28% <71.42%> (-3.03%)` | `14.00 <3.00> (+2.00)` | :arrow_down: |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `83.33% <83.33%> (ø)` | `1.00 <1.00> (?)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.93% <85.71%> (-1.22%)` | `11.00 <4.00> (+1.00)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.44% <100.00%> (ø)` | `37.00 <0.00> (ø)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `61.62% <0.00%> (-27.66%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh) | `57.50% <0.00%> (-25.63%)` | `0.00% <0.00%> (ø%)` | |
   | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | `38.46% <0.00%> (-15.39%)` | `0.00% <0.00%> (ø%)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [51 more](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [5d717a2...b923a97](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407244940
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   changed to `conf`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406617505
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
 
 Review comment:
   unfortunately, the `null` was served as a flag in many places in the delta streamer, because the command-line tool will produce `null`. If we want to change this, it might need a code refactoring.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407328044
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
 
 Review comment:
   lets file a separate issue for this? :) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407233388
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -319,18 +331,18 @@ private static Boolean tableExists(Connection conn, Map<String, String> options)
    * @return
    * @throws Exception
    */
-  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
-    Connection conn = createConnectionFactory(options);
-    String url = options.get(JDBCOptions.JDBC_URL());
-    String table = options.get(JDBCOptions.JDBC_TABLE_NAME());
-    boolean tableExists = tableExists(conn,options);
+  public static Schema getJDBCSchema(final Map<String, String> options) throws Exception {
+    final Connection conn = createConnectionFactory(options);
 
 Review comment:
   I get where you are coming from for `final` but given we are not following this everywhere.. can we please limit this PR to the minimal changes needed for this functionality.. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407867948
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   okay.. lets revisit once we have bootstrap support.. cc @bvaradar as fyi 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405966015
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   IIUC setting `cfg.checkpoint` will force use of that timestamp instead of what we normally do - read from the last commit? 
   
   Should we do this only when creating the dataset for the first time.. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405412784
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   Let us change the variable name to conf? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/5d717a28f45137bea71dffa31b0ae7ccbf1bda00&el=desc) will **decrease** coverage by `0.07%`.
   > The diff coverage is `78.78%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   - Coverage     72.23%   72.15%   -0.08%     
   - Complexity      289      294       +5     
   ============================================
     Files           338      373      +35     
     Lines         15947    16282     +335     
     Branches       1624     1638      +14     
   ============================================
   + Hits          11519    11748     +229     
   - Misses         3700     3798      +98     
   - Partials        728      736       +8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <50.00%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `89.28% <71.42%> (-3.03%)` | `14.00 <3.00> (+2.00)` | :arrow_down: |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `83.33% <83.33%> (ø)` | `1.00 <1.00> (?)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.93% <85.71%> (-1.22%)` | `11.00 <4.00> (+1.00)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.44% <100.00%> (ø)` | `37.00 <0.00> (ø)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `61.62% <0.00%> (-27.66%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh) | `57.50% <0.00%> (-25.63%)` | `0.00% <0.00%> (ø%)` | |
   | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | `38.46% <0.00%> (-15.39%)` | `0.00% <0.00%> (ø%)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [51 more](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [5d717a2...b923a97](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405413318
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   ditto.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407868180
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -293,6 +296,12 @@ public Operation convert(String value) throws ParameterException {
     @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
     public String checkpoint = null;
 
+    @Parameter(names = {"--initial-checkpoint-provider"}, description = "Generate check point for delta streamer "
+        + "for the first run. This field will override the checkpoint of last commit using the checkpoint field. "
+        + "Use this field only when switch source, for example, from DFS source to Kafka Source. Check the class "
+        + "org.apache.hudi.utilities.checkpointing for details")
 
 Review comment:
   `InitialCheckPointProvider` did you intend to write the name of the class here? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405967820
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -293,6 +295,12 @@ public Operation convert(String value) throws ParameterException {
     @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
     public String checkpoint = null;
 
+    @Parameter(names = {"--bootstrap-from"}, description = "Initial bootstrap from this path")
 
 Review comment:
   this is a path where the current dataset resides? I can see that we can eventually use this for actually bootstrapping the dataset.. but wondering if for now, the CheckpointProvider just takes a property containing the base path for reading/.computing checkpoints? 
   
   i.e we can remove `--bootstrap-from`.. it almost sounds like we are actually bootstrapping the data.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407274513
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
+      InitialCheckPointProvider checkPointProvider =
+          UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new Path(cfg.bootstrapFromPath), fs);
+      cfg.checkpoint = checkPointProvider.getCheckpoint();
 
 Review comment:
   yes.. do you think if we made it such that even if someone runs delta streamer few times after initial bootstrap, the initial checkpoint provider is used just once?  otherwise, you need to scramble to stop the delta streamer after the first run or manually run it by hand once before scheduling it using airflow or deploying in --continuous mode?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406553968
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   This config was only used by hive sync, if the user don't config this it will use the default Hadoop config. I don't have a strong opinion here and ok to go with `conf`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407273922
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -219,13 +230,13 @@ public static JavaSparkContext buildSparkContext(String appName, String sparkMas
   /**
    * Build Hoodie write client.
    *
-   * @param jsc Java Spark Context
-   * @param basePath Base Path
-   * @param schemaStr Schema
+   * @param jsc         Java Spark Context
+   * @param basePath    Base Path
+   * @param schemaStr   Schema
    * @param parallelism Parallelism
    */
   public static HoodieWriteClient createHoodieClient(JavaSparkContext jsc, String basePath, String schemaStr,
-      int parallelism, Option<String> compactionStrategyClass, TypedProperties properties) {
+                                                     int parallelism, Option<String> compactionStrategyClass, TypedProperties properties) {
 
 Review comment:
   I think there is a setting not to format the lines untouched.. may be we can avoid all these whitespace changes that way.. 
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407875123
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -293,6 +296,12 @@ public Operation convert(String value) throws ParameterException {
     @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
     public String checkpoint = null;
 
+    @Parameter(names = {"--initial-checkpoint-provider"}, description = "Generate check point for delta streamer "
+        + "for the first run. This field will override the checkpoint of last commit using the checkpoint field. "
+        + "Use this field only when switch source, for example, from DFS source to Kafka Source. Check the class "
+        + "org.apache.hudi.utilities.checkpointing for details")
 
 Review comment:
   Changed the description to match with `--schemaprovider-class`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-610701882
 
 
   Add https://github.com/apache/incubator-hudi/pull/1493 into this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1486: WIP[HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609480755
 
 
   LGTM

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405958980
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
 
 Review comment:
   same here. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-612171042
 
 
   Addressed some comments, summary:
   
   - Removed `--bootstrap-from` option in the delta streamer. Use `hoodie.deltastreamer.checkpoint.provider.path` field in the props instead. 
   - Use TypedProperty to construct the `InitialCheckPointProvider` and `init(FileSystem fs)` to initialize the class
   - Keep `hiveConf` as the variable name even change `HiveConf` type to `Configuration`. Open to discussion if you guys don't agree.
   - Not able to replace all `null` in this PR because `null` was served as a flag in the delta streamer workflow, this might change the behavior of other classes using the `TypedProperty` field. Will need a separate PR to do the code refactoring. 
   - The style check tool automatically adds `final` and `this` to match the stylecheck.xml. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407791400
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
 
 Review comment:
   yes https://issues.apache.org/jira/browse/HUDI-791

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407274259
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
 
 Review comment:
   got it.. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar merged pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406617865
 
 

 ##########
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##########
 @@ -394,6 +394,26 @@ public void testProps() {
         props.getString("hoodie.datasource.write.keygenerator.class"));
   }
 
+  @Test
+  public void testInitialCheckpointProvider() throws IOException {
 
 Review comment:
   done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406589335
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   ok...After a closer look, I'd still vote for naming this as `hiveConf`. There are `cfg` `props` in `HoodieDeltaStreamer` and `DeltaSync`. If we name it conf, I personally will get confused while reading the code. This `hiveConf` is specifically for Hive Sync, even it's not `HiveConf` type anymore, but `hiveConf` will still represent what it does? WDYT?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407872338
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -293,6 +296,12 @@ public Operation convert(String value) throws ParameterException {
     @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
     public String checkpoint = null;
 
+    @Parameter(names = {"--initial-checkpoint-provider"}, description = "Generate check point for delta streamer "
+        + "for the first run. This field will override the checkpoint of last commit using the checkpoint field. "
+        + "Use this field only when switch source, for example, from DFS source to Kafka Source. Check the class "
+        + "org.apache.hudi.utilities.checkpointing for details")
 
 Review comment:
   Do you mean we should add `InitialCheckPointProvider` here or we should remove this description? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407274239
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -130,7 +130,7 @@
   /**
    * Hive Config.
    */
-  private transient HiveConf hiveConf;
+  private transient Configuration conf;
 
 Review comment:
   :). thank you. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r406552799
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/InitialCheckPointProvider.java
 ##########
 @@ -20,12 +20,23 @@
 
 import org.apache.hudi.exception.HoodieException;
 
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 /**
  * Provide the initial checkpoint for delta streamer.
  */
-public interface InitialCheckPointProvider {
+public abstract class InitialCheckPointProvider {
+  protected final Path path;
+  protected final FileSystem fs;
+
+  public InitialCheckPointProvider(final Path basePath, final FileSystem fileSystem) {
 
 Review comment:
   Adding `props` is a good idea. It will add more flexibility for other providers.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405957805
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -153,7 +153,7 @@
   private transient HoodieWriteClient writeClient;
 
   public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, SchemaProvider schemaProvider,
-                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
+                   TypedProperties props, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
 
 Review comment:
   but it can be misleading for e.g if user does not even configure hive sync right? I also prefer `conf`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407328188
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -219,13 +230,13 @@ public static JavaSparkContext buildSparkContext(String appName, String sparkMas
   /**
    * Build Hoodie write client.
    *
-   * @param jsc Java Spark Context
-   * @param basePath Base Path
-   * @param schemaStr Schema
+   * @param jsc         Java Spark Context
 
 Review comment:
   Thanks for doing it.. looks okay for now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/eaf6cc2d90bf27c0d9414a4ea18dbd1b61f58e50&el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `80.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   + Coverage     71.54%   71.59%   +0.04%     
   - Complexity      261      265       +4     
   ============================================
     Files           336      337       +1     
     Lines         15744    15756      +12     
     Branches       1610     1611       +1     
   ============================================
   + Hits          11264    11280      +16     
   + Misses         3759     3754       -5     
   - Partials        721      722       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <33.33%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `78.84% <85.71%> (+0.23%)` | `10.00 <1.00> (+2.00)` | |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `100.00% <100.00%> (ø)` | `1.00 <1.00> (?)` | |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `92.00% <100.00%> (-0.31%)` | `12.00 <1.00> (ø)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `72.22% <0.00%> (+13.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [eaf6cc2...c904218](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/c0f96e072650d39433929c6efe3bc0b2cd882a39&el=desc) will **decrease** coverage by `0.08%`.
   > The diff coverage is `78.83%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   - Coverage     72.25%   72.16%   -0.09%     
   - Complexity      289      293       +4     
   ============================================
     Files           338      339       +1     
     Lines         15946    15956      +10     
     Branches       1624     1625       +1     
   ============================================
   - Hits          11521    11515       -6     
   - Misses         3697     3712      +15     
   - Partials        728      729       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <72.22%> (-0.71%)` | `22.00 <12.00> (+1.00)` | :arrow_down: |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `80.00% <80.00%> (ø)` | `1.00 <1.00> (?)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.93% <80.23%> (-1.22%)` | `11.00 <7.00> (+1.00)` | :arrow_down: |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `92.30% <88.88%> (ø)` | `13.00 <6.00> (+1.00)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.44% <100.00%> (ø)` | `37.00 <0.00> (ø)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [c0f96e0...639b972](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405958797
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
 
 Review comment:
   can we not pass `null` as a sentinel.. its risky business.. would an empty properties/map work?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609493045
 
 
   Test added. Thanks for the review

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/eaf6cc2d90bf27c0d9414a4ea18dbd1b61f58e50&el=desc) will **increase** coverage by `0.04%`.
   > The diff coverage is `80.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   + Coverage     71.54%   71.59%   +0.04%     
   - Complexity      261      265       +4     
   ============================================
     Files           336      337       +1     
     Lines         15744    15756      +12     
     Branches       1610     1611       +1     
   ============================================
   + Hits          11264    11280      +16     
   + Misses         3759     3754       -5     
   - Partials        721      722       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <33.33%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `78.84% <85.71%> (+0.23%)` | `10.00 <1.00> (+2.00)` | |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `100.00% <100.00%> (ø)` | `1.00 <1.00> (?)` | |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `92.00% <100.00%> (-0.31%)` | `12.00 <1.00> (ø)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `72.22% <0.00%> (+13.88%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [eaf6cc2...c904218](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #1486: [HUDI-759] Integrate checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-609364046
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=h1) Report
   > Merging [#1486](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/5d717a28f45137bea71dffa31b0ae7ccbf1bda00&el=desc) will **decrease** coverage by `0.07%`.
   > The diff coverage is `78.78%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1486/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1486      +/-   ##
   ============================================
   - Coverage     72.23%   72.15%   -0.08%     
   - Complexity      289      294       +5     
   ============================================
     Files           338      373      +35     
     Lines         15947    16282     +335     
     Branches       1624     1638      +14     
   ============================================
   + Hits          11519    11748     +229     
   - Misses         3700     3798      +98     
   - Partials        728      736       +8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <50.00%> (-0.71%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `89.28% <71.42%> (-3.03%)` | `14.00 <3.00> (+2.00)` | :arrow_down: |
   | [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `83.33% <83.33%> (ø)` | `1.00 <1.00> (?)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `77.93% <85.71%> (-1.22%)` | `11.00 <4.00> (+1.00)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.44% <100.00%> (ø)` | `37.00 <0.00> (ø)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `40.00% <0.00%> (-40.00%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh) | `61.62% <0.00%> (-27.66%)` | `0.00% <0.00%> (ø%)` | |
   | [.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh) | `57.50% <0.00%> (-25.63%)` | `0.00% <0.00%> (ø%)` | |
   | [...hudi/common/fs/inline/InLineFsDataInputStream.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGc0RhdGFJbnB1dFN0cmVhbS5qYXZh) | `38.46% <0.00%> (-15.39%)` | `0.00% <0.00%> (ø%)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `58.33% <0.00%> (-13.89%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [51 more](https://codecov.io/gh/apache/incubator-hudi/pull/1486/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=footer). Last update [5d717a2...b923a97](https://codecov.io/gh/apache/incubator-hudi/pull/1486?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r405959193
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##########
 @@ -90,35 +90,33 @@
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()));
+        jssc.hadoopConfiguration(), null);
   }
 
   public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, TypedProperties props) throws IOException {
     this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath, jssc.hadoopConfiguration()),
-        getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+        jssc.hadoopConfiguration(), props);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf,
-                             TypedProperties properties) throws IOException {
-    this.cfg = cfg;
-    this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf, properties);
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf) throws IOException {
+    this(cfg, jssc, fs, hiveConf, null);
   }
 
-  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, HiveConf hiveConf) throws IOException {
+  public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs, Configuration hiveConf,
+                             TypedProperties properties) throws IOException {
+    if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null && cfg.checkpoint == null) {
 
 Review comment:
   let's use Option instead of nulls? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-611897189
 
 
   I added the save action and checkstyle as documented, not sure which one triggered all those `final` and `this`. Is that ok to add those? As a scala programmer I do prefer to use `val` all the time and the Google Java Guide does encourage using final, but I am not sure what is the preference of Hudi. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-611891832
 
 
   hmm... Looks like the checkstyle auto-fix something...Let me see what's going...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services