Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/10 23:16:24 UTC

[GitHub] [hudi] prashantwason opened a new pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

prashantwason opened a new pull request #2565:
URL: https://github.com/apache/hudi/pull/2565


   
   ## What is the purpose of the pull request
   During the bootstrap of the Metadata Table, every directory that contains the partition metadata directory is assumed to be a partition and is added to the Metadata Table.
   
   In our HDFS clusters, we have directories such as .backup and .temp that various teams use for non-Hoodie purposes (e.g. .backup may hold a snapshot of the dataset). During bootstrap, the Metadata Table ends up listing those paths as partitions as well.
   
   This patch introduces a configuration in HoodieMetadataConfig that filters out directories based on a regular expression string.
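
The filtering idea can be illustrated with a minimal sketch. This is not the code from this patch: it assumes the regular expression is matched in full against each directory's name, and the class `DirectoryFilterSketch`, the method `filterDirectories`, and the sample expression are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DirectoryFilterSketch {

  // Hypothetical helper (not a Hudi API): returns the directory names that do
  // NOT fully match the filter regex. A null or empty regex disables filtering.
  public static List<String> filterDirectories(List<String> dirNames, String filterRegex) {
    if (filterRegex == null || filterRegex.isEmpty()) {
      return new ArrayList<>(dirNames);
    }
    Pattern pattern = Pattern.compile(filterRegex);
    List<String> kept = new ArrayList<>();
    for (String name : dirNames) {
      // matches() requires the whole name to match the expression.
      if (!pattern.matcher(name).matches()) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> dirs = Arrays.asList("2021", ".backup", ".temp", "2020");
    // "\\..*" matches any name that starts with a dot, such as .backup or .temp.
    System.out.println(filterDirectories(dirs, "\\..*")); // prints [2021, 2020]
  }
}
```

Treating an empty expression as "no filtering" keeps the default behavior unchanged for tables that do not need the filter; whether the actual patch does the same is an assumption here.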
   
   ## Brief change log
   
   Added a config to HoodieMetadataConfig.
   Added a unit test.
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   Extended the unit test TestHoodieBackedMetadata#testOnlyValidPartitionsAdded 
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-io edited a comment on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-777120159


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=h1) Report
   > Merging [#2565](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=desc) (6cb664d) into [master](https://codecov.io/gh/apache/hudi/commit/26da4f546275e8ab6496537743efe73510cb723d?el=desc) (26da4f5) will **increase** coverage by `0.11%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2565/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2565      +/-   ##
   ============================================
   + Coverage     50.92%   51.04%   +0.11%     
   - Complexity     3168     3174       +6     
   ============================================
     Files           433      433              
     Lines         19812    19901      +89     
     Branches       2033     2065      +32     
   ============================================
   + Hits          10090    10159      +69     
   - Misses         8902     8919      +17     
   - Partials        820      823       +3     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.90% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.36% <0.00%> (-0.03%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `43.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `70.00% <ø> (+0.49%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...pache/hudi/common/config/HoodieMetadataConfig.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Ib29kaWVNZXRhZGF0YUNvbmZpZy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `73.27% <0.00%> (+2.41%)` | `57.00% <0.00%> (+6.00%)` | |
   





[GitHub] [hudi] n3nash commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-780972942


   @prashantwason can you take care of @nsivabalan's comment and concern? We can merge then.





[GitHub] [hudi] nbalajee commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
nbalajee commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-779996470


   LGTM.





[GitHub] [hudi] n3nash closed pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
n3nash closed pull request #2565:
URL: https://github.com/apache/hudi/pull/2565


   





[GitHub] [hudi] codecov-io edited a comment on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-777120159


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=h1) Report
   > Merging [#2565](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=desc) (9e7e18b) into [master](https://codecov.io/gh/apache/hudi/commit/26da4f546275e8ab6496537743efe73510cb723d?el=desc) (26da4f5) will **decrease** coverage by `41.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2565/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2565       +/-   ##
   ============================================
   - Coverage     50.92%   9.68%   -41.25%     
   + Complexity     3168      48     -3120     
   ============================================
     Files           433      53      -380     
     Lines         19812    1931    -17881     
     Branches       2033     230     -1803     
   ============================================
   - Hits          10090     187     -9903     
   + Misses         8902    1731     -7171     
   + Partials        820      13      -807     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.84%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [409 more](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] prashantwason commented on a change in pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#discussion_r578801683



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -326,9 +327,15 @@ private void bootstrapFromFilesystem(HoodieEngineContext engineContext, HoodieTa
       }, listingParallelism);
       pathsToList.clear();
 
+

Review comment:
       removed







[GitHub] [hudi] prashantwason commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-781686174


   > Just curious: the actual bootstrap should have some config along these lines, right? While bootstrapping data into Hudi, filter directories based on some predicate. Can't we reuse the same?
   > CC @bvaradar @n3nash
   
   I did not find any such config in HoodieBootstrapConfig. I feel keeping it separate may be better:
   1. so as not to confuse the two features, which work independently (also, the bootstrap configs have the prefix hoodie.bootstrap)
   2. building the HoodieWriteConfig would be awkward if a HoodieBootstrapConfig had to be provided when enabling the Metadata Table
   





[GitHub] [hudi] n3nash merged pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
n3nash merged pull request #2565:
URL: https://github.com/apache/hudi/pull/2565


   





[GitHub] [hudi] nsivabalan commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-780556494


   Just curious: the actual bootstrap should have some config along these lines, right? While bootstrapping data into Hudi, filter directories based on some predicate. Can't we reuse the same?
   CC @bvaradar @n3nash





[GitHub] [hudi] codecov-io edited a comment on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-777120159


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=h1) Report
   > Merging [#2565](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=desc) (6cb664d) into [master](https://codecov.io/gh/apache/hudi/commit/26da4f546275e8ab6496537743efe73510cb723d?el=desc) (26da4f5) will **decrease** coverage by `1.48%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2565/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2565      +/-   ##
   ============================================
   - Coverage     50.92%   49.44%   -1.49%     
   + Complexity     3168     3002     -166     
   ============================================
     Files           433      399      -34     
     Lines         19812    18382    -1430     
     Branches       2033     1848     -185     
   ============================================
   - Hits          10090     9089    -1001     
   + Misses         8902     8569     -333     
   + Partials        820      724      -96     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.90% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.36% <0.00%> (-0.03%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `43.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.46% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...pache/hudi/common/config/HoodieMetadataConfig.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Ib29kaWVNZXRhZGF0YUNvbmZpZy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0luY3JlbWVudGFsUmVsYXRpb24uc2NhbGE=) | | | |
   | [...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh) | | | |
   | [...nal/HoodieDataSourceInternalBatchWriteBuilder.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlQnVpbGRlci5qYXZh) | | | |
   | [...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZURlbHRhU3RyZWFtZXJFeGNlcHRpb24uamF2YQ==) | | | |
   | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | | | |
   | [...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=) | | | |
   | [.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrVXRpbHMuc2NhbGE=) | | | |
   | [...i/internal/HoodieBulkInsertDataInternalWriter.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0hvb2RpZUJ1bGtJbnNlcnREYXRhSW50ZXJuYWxXcml0ZXIuamF2YQ==) | | | |
   | ... and [26 more](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] codecov-io commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-777120159


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=h1) Report
   > Merging [#2565](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=desc) (6cb664d) into [master](https://codecov.io/gh/apache/hudi/commit/26da4f546275e8ab6496537743efe73510cb723d?el=desc) (26da4f5) will **decrease** coverage by `1.81%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2565/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2565      +/-   ##
   ============================================
   - Coverage     50.92%   49.10%   -1.82%     
   + Complexity     3168     2819     -349     
   ============================================
     Files           433      378      -55     
     Lines         19812    16948    -2864     
     Branches       2033     1714     -319     
   ============================================
   - Hits          10090     8323    -1767     
   + Misses         8902     7967     -935     
   + Partials        820      658     -162     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.90% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.36% <0.00%> (-0.03%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `43.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.46% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...pache/hudi/common/config/HoodieMetadataConfig.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Ib29kaWVNZXRhZGF0YUNvbmZpZy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   | [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | | | |
   | [...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=) | | | |
   | [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | | | |
   | [...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh) | | | |
   | [...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=) | | | |
   | [.../src/main/java/org/apache/hudi/dla/util/Utils.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL3V0aWwvVXRpbHMuamF2YQ==) | | | |
   | [...in/scala/org/apache/hudi/HoodieEmptyRelation.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUVtcHR5UmVsYXRpb24uc2NhbGE=) | | | |
   | [...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=) | | | |
   | ... and [47 more](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] nbalajee commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
nbalajee commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-779996283


   > @nbalajee Can you review this?
   
   Reviewed the changes. LGTM.





[GitHub] [hudi] nsivabalan commented on a change in pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#discussion_r577606882



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -326,9 +327,15 @@ private void bootstrapFromFilesystem(HoodieEngineContext engineContext, HoodieTa
       }, listingParallelism);
       pathsToList.clear();
 
+

Review comment:
       Can we remove the extra line breaks?







[GitHub] [hudi] n3nash commented on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-779609442


   @nbalajee Can you review this?





[GitHub] [hudi] codecov-io edited a comment on pull request #2565: [HUDI-1611] Added a configuration to allow specific directories to be filtered out during Metadata Table bootstrap.

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2565:
URL: https://github.com/apache/hudi/pull/2565#issuecomment-777120159


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=h1) Report
   > Merging [#2565](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=desc) (6cb664d) into [master](https://codecov.io/gh/apache/hudi/commit/26da4f546275e8ab6496537743efe73510cb723d?el=desc) (26da4f5) will **decrease** coverage by `0.01%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2565/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2565      +/-   ##
   ============================================
   - Coverage     50.92%   50.91%   -0.02%     
   + Complexity     3168     3167       -1     
   ============================================
     Files           433      433              
     Lines         19812    19816       +4     
     Branches       2033     2034       +1     
   ============================================
   - Hits          10090    10089       -1     
   - Misses         8902     8906       +4     
   - Partials        820      821       +1     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.90% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.36% <0.00%> (-0.03%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `43.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.46% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2565?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...pache/hudi/common/config/HoodieMetadataConfig.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Ib29kaWVNZXRhZGF0YUNvbmZpZy5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2565/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   

