Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/21 00:36:46 UTC
[GitHub] [incubator-hudi] bvaradar opened a new pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
bvaradar opened a new pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542
@lamber-ken : This is something I missed when reviewing the cleaner repair code changes. The repair command has a serious bug: it might delete inflight instants of other actions, not just clean actions.
cc @vinothchandar
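For readers unfamiliar with the timeline, the bug described above can be illustrated with a minimal sketch. The `Instant` class and both helper methods below are hypothetical stand-ins written for this note, not Hudi's `HoodieInstant` or timeline API; only the action name `clean` is taken from the thread's logs. It shows why filtering on pending state alone also selects pending instants of other actions, while additionally filtering on the action restricts the repair to clean metadata.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of a timeline; not Hudi's actual classes.
final class PendingCleanFilter {

  static final class Instant {
    final String timestamp;
    final String action;   // e.g. "clean", "commit", "compaction"
    final boolean pending; // true for requested or inflight instants

    Instant(String timestamp, String action, boolean pending) {
      this.timestamp = timestamp;
      this.action = action;
      this.pending = pending;
    }
  }

  // Behaviour before the fix, as described in the PR:
  // every pending instant is inspected, whatever its action.
  static List<Instant> allPending(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.pending)
        .collect(Collectors.toList());
  }

  // Behaviour after the fix: only pending instants of the clean action.
  static List<Instant> pendingCleans(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.pending && "clean".equals(i.action))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("001", "commit", false),
        new Instant("002", "clean", true),
        new Instant("003", "compaction", true));
    System.out.println("all pending: " + allPending(timeline).size());
    System.out.println("pending cleans: " + pendingCleans(timeline).size());
  }
}
```

In this model, the pending compaction at `003` would be a deletion candidate under the first filter but not under the second.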
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-hudi] lamber-ken commented on pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
lamber-ken commented on pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#issuecomment-626420797
Hi @bvaradar, the unit test `TestCleaner#testCleanPreviousCorruptedCleanFiles` already covers this case. IMO, `TestRepairsCommand.java` is redundant. WDYT?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#discussion_r412703133
##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##########
@@ -147,14 +149,16 @@ public String overwriteHoodieProperties(
public void removeCorruptedPendingCleanAction() {
HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
- HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-
- activeTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
+ HoodieTimeline cleanerTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline().getCleanerTimeline();
Review comment:
oh wow :)
[GitHub] [incubator-hudi] codecov-io commented on pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#issuecomment-626396467
# [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=h1) Report
> Merging [#1542](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/fa6aba751d8de16d9d109a8cfc21150b17b59cff&el=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1542/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #1542 +/- ##
============================================
- Coverage 71.78% 71.77% -0.02%
Complexity 1087 1087
============================================
Files 385 385
Lines 16575 16575
Branches 1668 1668
============================================
- Hits 11899 11897 -2
- Misses 3947 3949 +2
Partials 729 729
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/incubator-hudi/pull/1542/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `0.00% <0.00%> (ø%)` | |
| [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1542/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `76.92% <0.00%> (+0.96%)` | `0.00% <0.00%> (ø%)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=footer). Last update [fa6aba7...2c72c50](https://codecov.io/gh/apache/incubator-hudi/pull/1542?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
lamber-ken commented on a change in pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#discussion_r412008079
##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##########
@@ -147,14 +149,16 @@ public String overwriteHoodieProperties(
public void removeCorruptedPendingCleanAction() {
HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
- HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-
- activeTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
+ HoodieTimeline cleanerTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline().getCleanerTimeline();
+ LOG.info("Inspecting pending clean metadata in timeline for corrupted files");
+ cleanerTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
try {
CleanerUtils.getCleanerPlan(client, instant);
- } catch (IOException e) {
- LOG.warn("try to remove corrupted instant file: " + instant);
+ } catch (AvroRuntimeException e) {
Review comment:
`AvroRuntimeException` will never be caught here. `Not an Avro data file` is thrown as an `IOException`.
```java
// org.apache.avro.file.DataFileReader#openReader
public static <D> FileReader<D> openReader(SeekableInput in,
                                           DatumReader<D> reader)
    throws IOException {
  if (in.length() < MAGIC.length)
    throw new IOException("Not an Avro data file");
  // read magic header
  byte[] magic = new byte[MAGIC.length];
  in.seek(0);
  for (int c = 0; c < magic.length; c = in.read(magic, c, magic.length-c)) {}
  in.seek(0);
  if (Arrays.equals(MAGIC, magic))                  // current format
    return new DataFileReader<D>(in, reader);
  if (Arrays.equals(DataFileReader12.MAGIC, magic)) // 1.2 format
    return new DataFileReader12<D>(in, reader);
  throw new IOException("Not an Avro data file");
}
```
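The point of this review comment can be reproduced without Avro on the classpath. The sketch below is an assumption-laden illustration: `FormatRuntimeException` is a hypothetical unchecked exception standing in for `AvroRuntimeException`, and `checkMagic` is a simplified imitation of the header check quoted above, not the library method. It shows that a `catch` clause for an unchecked exception type does not intercept a checked `IOException`.

```java
import java.io.IOException;
import java.util.Arrays;

final class MagicCheckSketch {

  // Magic bytes of an Avro object container file: 'O', 'b', 'j', 1.
  static final byte[] MAGIC = {'O', 'b', 'j', 1};

  // Hypothetical unchecked exception, standing in for AvroRuntimeException.
  static final class FormatRuntimeException extends RuntimeException {
    FormatRuntimeException(String message) {
      super(message);
    }
  }

  // Simplified imitation of the header check: too-short or mismatching
  // content is reported with a checked IOException.
  static void checkMagic(byte[] content) throws IOException {
    if (content.length < MAGIC.length
        || !Arrays.equals(MAGIC, Arrays.copyOf(content, MAGIC.length))) {
      throw new IOException("Not an Avro data file");
    }
  }

  // The inner catch mirrors the clause under review; the outer catch shows
  // where the IOException actually ends up.
  static String classify(byte[] content) {
    try {
      try {
        checkMagic(content);
        return "ok";
      } catch (FormatRuntimeException e) {
        return "caught unchecked";
      }
    } catch (IOException e) {
      return "caught IOException: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(new byte[0]));                    // empty, i.e. corrupted
    System.out.println(classify(new byte[] {'O', 'b', 'j', 1}));  // valid header
  }
}
```

For an empty input the inner clause is skipped and the outer `IOException` handler runs, which is the situation the comment describes.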
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
lamber-ken commented on a change in pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#discussion_r422728172
##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##########
@@ -147,14 +149,16 @@ public String overwriteHoodieProperties(
public void removeCorruptedPendingCleanAction() {
HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
- HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-
- activeTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
+ HoodieTimeline cleanerTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline().getCleanerTimeline();
+ LOG.info("Inspecting pending clean metadata in timeline for corrupted files");
+ cleanerTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
try {
CleanerUtils.getCleanerPlan(client, instant);
- } catch (IOException e) {
- LOG.warn("try to remove corrupted instant file: " + instant);
+ } catch (AvroRuntimeException e) {
Review comment:
👍
[GitHub] [incubator-hudi] lamber-ken commented on issue #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
lamber-ken commented on issue #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#issuecomment-617608459
> Randomly saw this in test logs..
>
> ```
> 298541 [main] WARN org.apache.hudi.common.HoodieClientTestHarness - Closing file-system instance used in previous test-run
> 298630 [main] WARN org.apache.hudi.table.action.clean.CleanActionExecutor - Failed to perform previous clean operation, instant: [==>000000023__clean__INFLIGHT]
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file
> at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:212)
> at org.apache.hudi.table.action.clean.CleanActionExecutor.lambda$execute$5(CleanActionExecutor.java:261)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:258)
> at org.apache.hudi.table.HoodieCopyOnWriteTable.clean(HoodieCopyOnWriteTable.java:197)
> at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:630)
> at org.apache.hudi.table.TestCleaner.runCleaner(TestCleaner.java:425)
> at org.apache.hudi.table.TestCleaner.runCleaner(TestCleaner.java:414)
> at org.apache.hudi.table.TestCleaner.testCleanPreviousCorruptedCleanFiles(TestCleaner.java:1000)
> ```
>
> (May be totally a false alarm, but just saying we should understand if all these are expected)..
These are expected. The `TestCleaner#testCleanPreviousCorruptedCleanFiles` test case first creates corrupted clean files and then runs the cleaner.
[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
lamber-ken commented on a change in pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#discussion_r412009000
##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##########
@@ -147,14 +149,16 @@ public String overwriteHoodieProperties(
public void removeCorruptedPendingCleanAction() {
HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
- HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-
- activeTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
+ HoodieTimeline cleanerTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline().getCleanerTimeline();
Review comment:
Good catch! `.getCleanerTimeline()` 👍
[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#issuecomment-626396467
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#discussion_r422684688
##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java
##########
@@ -147,14 +149,16 @@ public String overwriteHoodieProperties(
public void removeCorruptedPendingCleanAction() {
HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
- HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
-
- activeTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
+ HoodieTimeline cleanerTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline().getCleanerTimeline();
+ LOG.info("Inspecting pending clean metadata in timeline for corrupted files");
+ cleanerTimeline.filterInflightsAndRequested().getInstants().forEach(instant -> {
try {
CleanerUtils.getCleanerPlan(client, instant);
- } catch (IOException e) {
- LOG.warn("try to remove corrupted instant file: " + instant);
+ } catch (AvroRuntimeException e) {
Review comment:
@lamber-ken : Thanks. I have made changes to specifically look for this message in the exception to detect corruption. Please take a look.
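The revised code is not quoted in this thread, so the following is only a rough sketch of what message-based detection could look like. The class and method names are hypothetical, and walking the cause chain is an assumption made here because the test logs in this thread show the message surfacing through a wrapping `HoodieIOException`.

```java
import java.io.IOException;

final class CorruptionDetector {

  // Message thrown by the Avro header check quoted earlier in the thread.
  static final String NOT_AVRO = "Not an Avro data file";

  // Returns true if the throwable, or any of its causes, carries the
  // "Not an Avro data file" message. The depth limit guards against
  // pathological cyclic cause chains.
  static boolean looksCorrupted(Throwable t) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      String message = cur.getMessage();
      if (message != null && message.contains(NOT_AVRO)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable wrapped = new RuntimeException("read failed", new IOException(NOT_AVRO));
    System.out.println(looksCorrupted(wrapped));
    System.out.println(looksCorrupted(new IOException("connection reset")));
  }
}
```

Matching on a message string is brittle if the library ever changes its wording, but it avoids treating unrelated I/O failures (for example a transient file-system error) as corruption and deleting a healthy instant file.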
[GitHub] [incubator-hudi] vinothchandar commented on issue #1542: [HUDI-820] cleaner repair command should only inspect clean metadata files
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1542:
URL: https://github.com/apache/incubator-hudi/pull/1542#issuecomment-617606373
Randomly saw this in test logs..
```
298541 [main] WARN org.apache.hudi.common.HoodieClientTestHarness - Closing file-system instance used in previous test-run
298630 [main] WARN org.apache.hudi.table.action.clean.CleanActionExecutor - Failed to perform previous clean operation, instant: [==>000000023__clean__INFLIGHT]
org.apache.hudi.exception.HoodieIOException: Not an Avro data file
at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:212)
at org.apache.hudi.table.action.clean.CleanActionExecutor.lambda$execute$5(CleanActionExecutor.java:261)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:258)
at org.apache.hudi.table.HoodieCopyOnWriteTable.clean(HoodieCopyOnWriteTable.java:197)
at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:630)
at org.apache.hudi.table.TestCleaner.runCleaner(TestCleaner.java:425)
at org.apache.hudi.table.TestCleaner.runCleaner(TestCleaner.java:414)
at org.apache.hudi.table.TestCleaner.testCleanPreviousCorruptedCleanFiles(TestCleaner.java:1000)
```
(May be totally a false alarm, but just saying we should understand if all these are expected)..