Posted to commits@hudi.apache.org by "CTTY (via GitHub)" <gi...@apache.org> on 2023/03/15 04:32:58 UTC

[GitHub] [hudi] CTTY opened a new pull request, #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

CTTY opened a new pull request, #8190:
URL: https://github.com/apache/hudi/pull/8190

   ### Change Logs
   
   Fixed a potential serialization issue when Hudi runs on a FileSystem implementation whose FileStatus is not serializable.
   
   ### Impact
   
   no impact
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1469318467

   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1193222415


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+
+/**
+ * A serializable wrapper for FileStatus.
+ * <p>
+ * Hadoop 2.x FileStatus does not implement Serializable and can cause issues. (HUDI-5936)
+ * This class is supposed to make sure FileStatus can be safely serialized by wrapping FileStatus
+ * with it, and it should be only used when we absolutely need to serialize FileStatus.
+ */
+public class HoodieSerializableFileStatus extends FileStatus implements Serializable {
+
+  Path path;
+  long length;
+  boolean isDirectory;
+  short blockReplication;
+  long blockSize;

Review Comment:
   Good point, I just updated it to reuse `HoodieFileStatus`. It works well.





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1667186889

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 6304fd2f1a14e2918839fccd93f9068d9bc6e2d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17496) 
   * 3691e528a97dbc0487a517da8b76a76856af9a58 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19157) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1182135994


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Did you send the feedback to EMR technical support?





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1543473706

   ## CI report:
   
   * 687ea9631daa66ba4f0144146b2c73636891bb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928) 
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 UNKNOWN
   


[GitHub] [hudi] danny0405 commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1538303538

   Can you rebase with the latest master and re-trigger the test?




[GitHub] [hudi] yihua commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1326627251


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -169,9 +170,9 @@ private List<String> getPartitionPathWithPathPrefixUsingFilterExpression(String
 
       // List all directories in parallel
       engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all partitions with prefix " + relativePathPrefix);
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {

Review Comment:
   @CTTY It looks like the latest master no longer returns `FileStatus` here (`Path` instances are used instead). Is this PR still needed?





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1469478065

   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1190814040


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   I just noticed that we already have an Avro spec, `HoodieFileStatus`. Can we use that?





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1537673187

   ## CI report:
   
   * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762) 
   * 687ea9631daa66ba4f0144146b2c73636891bb04 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1185709070


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+
+import java.io.IOException;
+import java.io.Serializable;
+
+public class HoodieSerializableFileStatus extends FileStatus implements Serializable {
+  FileStatus status;

Review Comment:
   Can we add some documentation for it? Can we reuse this class for Spark?





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1548213275

   ## CI report:
   
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * afa9e28856dbdf05addf7f6a83db1a2046eac193 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091) 
   


[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1183184527


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Yes, I'm from EMR and we reviewed EMRFS with its owner. It would be very tricky to fix this at the FS level, so it makes more sense to fix it within Hudi by making sure the objects are serializable.





[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1139186361


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Sorry for the confusion. `FileStatus` in Hadoop 2 is not serializable. For some customized FS implementations such as EMRFS, it could also be an issue if they do not account for serialization.
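
The approach in the hunk above (mapping each listed status to a small serializable pair before it leaves the parallel task) can be illustrated with a minimal JDK-only sketch. It deliberately avoids Hadoop and Hudi classes: `RawStatus`, `ListingEntry`, `project`, `canSerialize`, and `roundTrip` are hypothetical names invented for this illustration, and plain Java serialization stands in for whatever serializer the engine actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SerializableListingSketch {

  // Hypothetical stand-in for a file status type that does NOT implement Serializable.
  static final class RawStatus {
    final String path;
    final boolean directory;

    RawStatus(String path, boolean directory) {
      this.path = path;
      this.directory = directory;
    }
  }

  // Serializable projection that keeps only the fields the caller needs.
  static final class ListingEntry implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;
    final boolean directory;

    ListingEntry(String path, boolean directory) {
      this.path = path;
      this.directory = directory;
    }

    static ListingEntry from(RawStatus status) {
      return new ListingEntry(status.path, status.directory);
    }
  }

  // Convert the raw listing into serializable entries (assumed to run inside the task).
  static List<ListingEntry> project(List<RawStatus> statuses) {
    return statuses.stream()
        .map(ListingEntry::from)
        .collect(Collectors.toCollection(ArrayList::new));
  }

  // Returns false when Java serialization rejects the object graph.
  static boolean canSerialize(Object value) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(value);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Serialize and deserialize the projected entries, as a shuffle or collect might.
  @SuppressWarnings("unchecked")
  static List<ListingEntry> roundTrip(List<ListingEntry> entries) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(entries);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (List<ListingEntry>) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    List<RawStatus> raw = Arrays.asList(
        new RawStatus("/table/2023/03/15", true),
        new RawStatus("/table/.hoodie", true));

    System.out.println("raw listing serializable: " + canSerialize(raw));
    List<ListingEntry> copy = roundTrip(project(raw));
    System.out.println("entries after round trip: " + copy.size());
  }
}
```

The trade-off is that only the projected fields survive, so this fits call sites that need just the path and the directory flag, which is what the hunk above extracts.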





[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1190954590


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+
+import java.io.Serializable;
+
+/**
+ * A serializable wrapper for FileStatus.
+ * <p>
+ * Hadoop 2.x FileStatus does not implement Serializable and can cause issues. (HUDI-5936)
+ * This class is supposed to make sure FileStatus can be safely serialized by wrapping FileStatus
+ * with it, and it should be only used when we absolutely need to serialize FileStatus.
+ */
+public class HoodieSerializableFileStatus extends FileStatus implements Serializable {
+
+  Path path;
+  long length;
+  boolean isDirectory;
+  short blockReplication;
+  long blockSize;

Review Comment:
   There is already a `HoodieFileStatus`; can we use that instead?
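
   For comparison, a rough sketch of a standalone serializable value class that copies the fields visible in the diff above instead of extending `FileStatus`. `StatusSnapshot` and `roundTrip` are hypothetical names, not part of the PR or of `HoodieFileStatus`, and the path is held as a `String` so the sketch does not assume `Path` itself is serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical value class; field names follow the fields visible in the diff above.
final class StatusSnapshot implements Serializable {
  private static final long serialVersionUID = 1L;

  final String path; // kept as a String so no assumption is made about Path
  final long length;
  final boolean isDirectory;
  final short blockReplication;
  final long blockSize;

  StatusSnapshot(String path, long length, boolean isDirectory,
                 short blockReplication, long blockSize) {
    this.path = path;
    this.length = length;
    this.isDirectory = isDirectory;
    this.blockReplication = blockReplication;
    this.blockSize = blockSize;
  }

  // Serializes the snapshot to bytes and reads it back with default Java serialization.
  static StatusSnapshot roundTrip(StatusSnapshot snapshot) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(snapshot);
      }
      try (ObjectInputStream input =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (StatusSnapshot) input.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   Because every field is a primitive or a `String`, default serialization is enough here and the class does not depend on any filesystem implementation.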





[GitHub] [hudi] danny0405 commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1548074042

   @CTTY You need to rebase onto the latest master to trigger the Azure CI; there have been some changes to the Azure CI conf files.




[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1184548905


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Makes sense. Do you think we need to implement a custom SE/DE for the file status, or can we just use the default Java serialization?
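
   To make the two options concrete, here is a hedged sketch of the custom route using the JDK's `writeObject`/`readObject` hooks. `RawStatus` and `StatusHolder` are hypothetical names standing in for a non-serializable status type and its holder; nothing here is taken from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical non-serializable type standing in for a file status.
final class RawStatus {
  final String path;
  final long length;

  RawStatus(String path, long length) {
    this.path = path;
    this.length = length;
  }
}

final class StatusHolder implements Serializable {
  private static final long serialVersionUID = 1L;

  // transient: default serialization skips this field, the hooks below handle it.
  private transient RawStatus status;

  StatusHolder(RawStatus status) {
    this.status = status;
  }

  RawStatus get() {
    return status;
  }

  // Custom serializer: write only the fields we choose, in a fixed order.
  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    out.writeUTF(status.path);
    out.writeLong(status.length);
  }

  // Custom deserializer: read the fields back in the same order and rebuild the status.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    String path = in.readUTF();
    long length = in.readLong();
    status = new RawStatus(path, length);
  }

  // Round-trips a holder through Java serialization.
  static StatusHolder copy(StatusHolder holder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(holder);
      }
      try (ObjectInputStream input =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (StatusHolder) input.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   Default Java serialization only works when every non-transient field is itself serializable; the hooks trade a little boilerplate for control over exactly which fields are written.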





[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1181475001


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   > java.util.ConcurrentModificationException
   
   This looks like a thread-safety issue?





[GitHub] [hudi] CTTY commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1585445444

   > @CTTY It looks like Hive query fails in the bundle validation due to: `org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/avro/AvroMissingFieldException`. Could you take a look? This is likely because `hudi-hadoop-mr-bundle` relies on Avro 1.8 which does not have the class, while other modules can be compiled with a higher Avro version. And this PR uses Avro to serialize the file status.
   
   Thanks for pointing this out. 
   
   I guess we still need a new `SerializableFileStatus` so that we don't have to depend on the Avro-generated `HoodieFileStatus` if `hudi-hadoop-mr-bundle` has to use `hive.avro.version`, which is 1.8.2. I'll try to add it back later.
   
   
   




[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1182107824


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   This happens when users use EMR to interact with an S3 bucket with CSE enabled. It triggers some classes within EMRFS that are not serializable, but Kryo tries to serialize them anyway, so this exception is thrown. See a similar case [here](https://github.com/EsotericSoftware/kryo/issues/449).
   
   This scenario would also apply to other FSs that come with non-serializable classes.





[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1184471426


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Too many places in Hudi expose the `FileStatus` interface. Does all of that code need to be fixed? That would be a very tricky change.





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528868341

   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) 
   * 7c71b63797be01ee91268c2520f82b18b3f13b7c UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1548375291

   ## CI report:
   
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * afa9e28856dbdf05addf7f6a83db1a2046eac193 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091) 
   * 5c380789c304ea58597e0c848d068035777e25d3 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1547030431

   ## CI report:
   
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1569293665

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483) 
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 6304fd2f1a14e2918839fccd93f9068d9bc6e2d0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17496) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1569069350

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1543662318

   ## CI report:
   
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1569232461

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483) 
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1537649378

   ## CI report:
   
   * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762) 
   * 687ea9631daa66ba4f0144146b2c73636891bb04 UNKNOWN
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1138139043


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   What kind of `FileSystem` is not serializable?
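
One way to answer this kind of question empirically is to push a value through Java default serialization and see whether it is rejected. The sketch below is only an illustration under stated assumptions: `SerializationProbe`, `PlainStatus`, and `SerializableStatus` are hypothetical stand-ins and not Hadoop, EMRFS, or Hudi classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationProbe {

  // Hypothetical stand-in for a status object that does not implement Serializable.
  static class PlainStatus {
    final String path;

    PlainStatus(String path) {
      this.path = path;
    }
  }

  // Hypothetical stand-in for a status object that does implement Serializable.
  static class SerializableStatus implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;

    SerializableStatus(String path) {
      this.path = path;
    }
  }

  // Returns true if Java default serialization accepts the value, false if it
  // is rejected with NotSerializableException.
  static boolean isJavaSerializable(Object value) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(value);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isJavaSerializable(new PlainStatus("/tbl/p1")));
    System.out.println(isJavaSerializable(new SerializableStatus("/tbl/p1")));
  }
}
```

Running the same probe against a real status object returned by `listStatus` on the file system in question would presumably show which implementations are affected.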





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1469322774

   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1667219831

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 3691e528a97dbc0487a517da8b76a76856af9a58 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19157) 
   


[GitHub] [hudi] yihua closed pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua closed pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable
URL: https://github.com/apache/hudi/pull/8190




[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1183184527


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Yes, I'm from EMR, and we reviewed EMRFS with its owners. Fixing this at the FS level would be very tricky, so it makes more sense to fix it within Hudi by making sure the objects being passed around are serializable.
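
The diff above takes this approach by keeping only the path and the directory flag of each listed entry. A minimal, dependency-free sketch of the same idea follows; `ListingProjection` and `FakeStatus` are hypothetical stand-ins, paths are plain strings, and `AbstractMap.SimpleImmutableEntry` is used in place of Hudi's `Pair`, so this is an illustration and not the PR's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ListingProjection {

  // Hypothetical, non-serializable stand-in for one entry returned by a listing call.
  static class FakeStatus {
    private final String path;
    private final boolean directory;

    FakeStatus(String path, boolean directory) {
      this.path = path;
      this.directory = directory;
    }

    String getPath() {
      return path;
    }

    boolean isDirectory() {
      return directory;
    }
  }

  // Keep only (path, isDirectory); SimpleImmutableEntry implements Serializable.
  static List<SimpleImmutableEntry<String, Boolean>> project(FakeStatus[] statuses) {
    return Arrays.stream(statuses)
        .map(s -> new SimpleImmutableEntry<String, Boolean>(s.getPath(), s.isDirectory()))
        .collect(Collectors.toList());
  }

  // Serialize and deserialize the projected listing, as an engine might do
  // when shipping results between processes.
  @SuppressWarnings("unchecked")
  static List<SimpleImmutableEntry<String, Boolean>> roundTrip(
      List<SimpleImmutableEntry<String, Boolean>> listing) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(new ArrayList<>(listing));
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (List<SimpleImmutableEntry<String, Boolean>>) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    FakeStatus[] statuses = {
      new FakeStatus("/tbl/2023", true),
      new FakeStatus("/tbl/data.parquet", false)
    };
    System.out.println(roundTrip(project(statuses)));
  }
}
```

The trade-off of projecting early is that any field not copied (length, modification time, and so on) is no longer available downstream, which is acceptable only when the caller needs nothing beyond the path and the directory flag.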





[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1185501702


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Java default serialization should be sufficient here. Could you help me understand the benefit of having a custom serde here?





[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1185699362


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   I see. But a custom serde seems like overkill for the issue here. If we want to add one later, we can do that by having `HoodieSerializableFileStatus` extend the serde classes.
   What do you think?
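
To illustrate why a custom serde can be deferred, here is a rough sketch of a wrapper that callers only see as `Serializable`, with the wire format controlled by optional private hooks. `SerializableStatusSketch` and its fields are hypothetical and are not the actual `HoodieSerializableFileStatus`; the standard `writeObject`/`readObject` hooks stand in for whatever serde classes the project would eventually choose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableStatusSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Marked transient because the hooks below write these fields explicitly.
  // Dropping both hooks and the transient modifiers falls back to Java
  // default serialization without changing any caller.
  private transient String path;
  private transient long length;
  private transient boolean directory;

  SerializableStatusSketch(String path, long length, boolean directory) {
    this.path = path;
    this.length = length;
    this.directory = directory;
  }

  String getPath() {
    return path;
  }

  long getLength() {
    return length;
  }

  boolean isDirectory() {
    return directory;
  }

  // Custom write hook: emits the fields in a fixed order.
  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    out.writeUTF(path);
    out.writeLong(length);
    out.writeBoolean(directory);
  }

  // Custom read hook: reads the fields back in the same order.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    path = in.readUTF();
    length = in.readLong();
    directory = in.readBoolean();
  }

  // Helper for the example: serialize and deserialize one instance.
  static SerializableStatusSketch roundTrip(SerializableStatusSketch status) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(status);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SerializableStatusSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SerializableStatusSketch copy =
        roundTrip(new SerializableStatusSketch("/tbl/2023/03/15", 4096L, true));
    System.out.println(copy.getPath() + " " + copy.getLength() + " " + copy.isDirectory());
  }
}
```

Because the hooks are private to the wrapper, call sites that pass the wrapper through an engine context would not need to change if the serialization strategy is swapped later.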





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1548902476

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 5c380789c304ea58597e0c848d068035777e25d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1187407726


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+
+import java.io.IOException;
+import java.io.Serializable;
+
+/**
+ * A serializable wrapper for FileStatus.
+ *
+ * Hadoop 2.x FileStatus does not implement Serializable and can cause issues. (HUDI-5936)

Review Comment:
   Hadoop 2.x -> `<p>Hadoop 2.x`





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1569485206

   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 6304fd2f1a14e2918839fccd93f9068d9bc6e2d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17496) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528887656

   ## CI report:
   
   * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1185708479


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Yeah, makes sense.
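
The idea in the diff above (project each listing result onto a small serializable pair before the results leave the worker) can be sketched without Hadoop on the classpath. Everything here is an assumption for illustration: `OpaqueStatus` is a hypothetical stand-in for a `FileStatus` implementation that is not serializable, `SimpleImmutableEntry` stands in for Hudi's `Pair`, and plain Java serialization stands in for whatever serializer the engine actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a FileStatus implementation that is NOT Serializable.
final class OpaqueStatus {
  private final String path;
  private final boolean directory;

  OpaqueStatus(String path, boolean directory) {
    this.path = path;
    this.directory = directory;
  }

  String getPath() {
    return path;
  }

  boolean isDirectory() {
    return directory;
  }
}

final class ListingProjection {
  // Keep only the two fields the caller needs, in a type that is Serializable.
  static List<SimpleImmutableEntry<String, Boolean>> project(List<OpaqueStatus> statuses) {
    return statuses.stream()
        .map(s -> new SimpleImmutableEntry<>(s.getPath(), s.isDirectory()))
        .collect(Collectors.toList());
  }

  // Returns true if Java serialization accepts the object, false otherwise.
  static boolean javaSerializable(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (IOException e) {
      // NotSerializableException is a subclass of IOException.
      return false;
    }
  }

  public static void main(String[] args) {
    List<OpaqueStatus> raw = Arrays.asList(
        new OpaqueStatus("/tbl/2023", true),
        new OpaqueStatus("/tbl/part-0001.parquet", false));
    System.out.println("raw serializable: " + javaSerializable(raw));
    System.out.println("projected serializable: " + javaSerializable(project(raw)));
  }
}
```

The projection loses every field other than path and directory flag, which is acceptable only when the caller needs nothing else from the listing.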





[GitHub] [hudi] yihua commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1585429304

   @CTTY It looks like the Hive query fails in the bundle validation due to: `org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/avro/AvroMissingFieldException`.  Could you take a look?  This is likely because `hudi-hadoop-mr-bundle` relies on Avro 1.8, which does not have the class, while other modules can be compiled with a higher Avro version, and this PR uses Avro to serialize the file status.
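
One way to confirm a suspicion like this is to probe, inside the failing runtime, whether the class can be loaded at all. The sketch below is only a diagnostic aid under stated assumptions: `ClasspathProbe` and `isLoadable` are hypothetical names, and the shaded class name is copied from the error message above.

```java
// Hypothetical helper: reports whether a class can be loaded without initializing it.
final class ClasspathProbe {
  static boolean isLoadable(String className) {
    try {
      // initialize=false avoids running static initializers during the probe.
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Class name taken from the NoClassDefFoundError in the comment above.
    String shaded = "org.apache.hudi.org.apache.avro.AvroMissingFieldException";
    System.out.println(shaded + " loadable: " + isLoadable(shaded));
  }
}
```

Running this against the bundle jar in question would show whether the relocated Avro class is really absent there, as opposed to being hidden by a class loader ordering problem.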




[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1187408123


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+
+import java.io.IOException;
+import java.io.Serializable;
+
+/**
+ * A serializable wrapper for FileStatus.
+ *
+ * Hadoop 2.x FileStatus does not implement Serializable and can cause issues. (HUDI-5936)
+ * This class is supposed to make sure FileStatus can be safely serialized by wrapping FileStatus
+ * with it, and it should be only used when we absolutely need to serialize FileStatus

Review Comment:
   Nit: add the missing trailing period: `need to serialize FileStatus` -> `need to serialize FileStatus.`





[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1186726288


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hadoop.fs.FileStatus;
+
+import java.io.IOException;
+import java.io.Serializable;
+
+public class HoodieSerializableFileStatus extends FileStatus implements Serializable {
+  FileStatus status;

Review Comment:
   It might be unnecessary to reuse this class for the Spark part because: 1) that code is borrowed from [Spark](https://github.com/apache/spark/blob/5103e00c4ce5fcc4264ca9c4df12295d42557af6/core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala#L156); 2) the Spark `SerializableFileStatus` is tailored for `LocatedFileStatus` rather than vanilla `FileStatus`, so it also uses `SerializableBlockLocation`. I think reusing it there would mean `HoodieSerializableFileStatus` has to hold more state, but ideally we want to keep it as simple as possible so it can just be a serializable wrapper of `FileStatus`.
   
   I'll add doc to `HoodieSerializableFileStatus`
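
To make the "keep it as simple as possible" argument concrete, here is a minimal sketch of a serializable snapshot that copies a few plain fields and converts back on the other side. It is not the PR's `HoodieSerializableFileStatus`: `PlainStatus` is a hypothetical stand-in for Hadoop's `FileStatus`, the field set is an assumption, and plain Java serialization is used only to show the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a FileStatus that does not implement Serializable.
final class PlainStatus {
  final String path;
  final long length;
  final boolean directory;
  final long modificationTime;

  PlainStatus(String path, long length, boolean directory, long modificationTime) {
    this.path = path;
    this.length = length;
    this.directory = directory;
    this.modificationTime = modificationTime;
  }
}

// Serializable snapshot holding only simple field values, no block locations.
final class StatusSnapshot implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String path;
  private final long length;
  private final boolean directory;
  private final long modificationTime;

  private StatusSnapshot(PlainStatus s) {
    this.path = s.path;
    this.length = s.length;
    this.directory = s.directory;
    this.modificationTime = s.modificationTime;
  }

  static StatusSnapshot from(PlainStatus status) {
    return new StatusSnapshot(status);
  }

  PlainStatus toStatus() {
    return new PlainStatus(path, length, directory, modificationTime);
  }

  // Serializes and deserializes a snapshot in memory, as a shuffle or collect would.
  static StatusSnapshot roundTrip(StatusSnapshot in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream oin =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (StatusSnapshot) oin.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    PlainStatus original = new PlainStatus("/tbl/2023/part-0001.parquet", 4096L, false, 1678854778000L);
    PlainStatus copy = roundTrip(from(original)).toStatus();
    System.out.println(copy.path + " " + copy.length + " " + copy.directory);
  }
}
```

Supporting a located status would require extra serializable types for block locations, which is the additional state the comment above argues against carrying.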





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1537726679

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 687ea9631daa66ba4f0144146b2c73636891bb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] CTTY commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1470480710

   @hudi-bot run azure




[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1167305239


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Sure, here is the stack trace. I've also added it to the JIRA.
   ```
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2863)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2799)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2798)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2798)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1239)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1239)
   	at scala.Option.foreach(Option.scala:407)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1239)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3051)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2993)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1009)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2229)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2250)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2269)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2294)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
   	at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
   	at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
   	at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
   	at org.apache.hudi.client.common.HoodieSparkEngineContext.flatMap(HoodieSparkEngineContext.java:137)
   	at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getAllPartitionPaths(FileSystemBackedTableMetadata.java:86)
   	at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsForFullCleaning(CleanPlanner.java:214)
   	at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsForCleanByCommits(CleanPlanner.java:168)
   	at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsToClean(CleanPlanner.java:133)
   	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:106)
   	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:148)
   	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:173)
   	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:204)
   	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1354)
   	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
   	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:827)
   	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
   	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more
   Caused by: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
   ```
   





[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1326778825


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -169,9 +170,9 @@ private List<String> getPartitionPathWithPathPrefixUsingFilterExpression(String
 
       // List all directories in parallel
       engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all partitions with prefix " + relativePathPrefix);
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {

Review Comment:
   You are right, I don't think this is needed anymore





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1470798048

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1141581428


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   Okay, then we may need to introduce our own serializable `FileStatus` class, but before that, I want to see the specific error stack trace.





[GitHub] [hudi] danny0405 commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1543546942

   There are test failures. Can you squash and rebase onto the latest master? Let's see whether that fixes the failures.




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1543484173

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 687ea9631daa66ba4f0144146b2c73636891bb04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928) 
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1548388235

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091",
       "triggerID" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c380789c304ea58597e0c848d068035777e25d3",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097",
       "triggerID" : "5c380789c304ea58597e0c848d068035777e25d3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * afa9e28856dbdf05addf7f6a83db1a2046eac193 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091) 
   * 5c380789c304ea58597e0c848d068035777e25d3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1568899840

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091",
       "triggerID" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c380789c304ea58597e0c848d068035777e25d3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097",
       "triggerID" : "5c380789c304ea58597e0c848d068035777e25d3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483",
       "triggerID" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 5c380789c304ea58597e0c848d068035777e25d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097) 
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483) 
   




[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "CTTY (via GitHub)" <gi...@apache.org>.
CTTY commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1184505252


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   We should use this serializable file status only where we absolutely need to serialize a `FileStatus` object, not everywhere. 
   
   In fact, I just realized we already have something similar for the Spark datasource: https://github.com/apache/hudi/blob/21c913d826479931158e9eda30a99036c6e3c585/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/HoodieHadoopFSUtils.scala#LL349C1-L349C1
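
For illustration only (this is not code from the PR): a minimal, JDK-only sketch of the idea behind the diff above, which maps each listed status to a (path, isDirectory) pair before the results cross a serialization boundary. `OpaqueStatus` is a hypothetical stand-in for a `FileStatus` implementation that cannot be Java-serialized, and `SimpleImmutableEntry` is assumed here as a substitute for Hudi's `Pair`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;

public class SerializableListingSketch {

  // Hypothetical stand-in for a file status class that is not Java-serializable.
  static final class OpaqueStatus {
    private final String path;
    private final boolean directory;

    OpaqueStatus(String path, boolean directory) {
      this.path = path;
      this.directory = directory;
    }

    String getPath() {
      return path;
    }

    boolean isDirectory() {
      return directory;
    }
  }

  // Returns true if Java serialization accepts the value, false if it rejects it.
  static boolean canSerialize(Object value) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(value);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Keep only the two fields the caller needs, in a serializable container.
  static ArrayList<SimpleImmutableEntry<String, Boolean>> toSerializable(List<OpaqueStatus> statuses) {
    ArrayList<SimpleImmutableEntry<String, Boolean>> result = new ArrayList<>();
    for (OpaqueStatus status : statuses) {
      result.add(new SimpleImmutableEntry<>(status.getPath(), status.isDirectory()));
    }
    return result;
  }

  public static void main(String[] args) {
    List<OpaqueStatus> statuses = new ArrayList<>();
    statuses.add(new OpaqueStatus("s3://table-bucket/", true));

    System.out.println("raw statuses serializable: " + canSerialize(statuses));
    System.out.println("extracted pairs serializable: " + canSerialize(toSerializable(statuses)));
  }
}
```

The trade-off raised in the comment still applies: extracting fields like this is only needed at the points where statuses are actually shipped between processes, not at every listing call.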





[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1185678595


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -106,9 +106,9 @@ private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) t
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
+      List<Pair<Path, Boolean>> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return Arrays.stream(fileSystem.listStatus(path));
+        return Arrays.stream(fileSystem.listStatus(path)).map(fileStatus -> Pair.of(fileStatus.getPath(), fileStatus.isDirectory()));
       }, listingParallelism);

Review Comment:
   It has better compatibility and performance.





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1568833524

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091",
       "triggerID" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c380789c304ea58597e0c848d068035777e25d3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097",
       "triggerID" : "5c380789c304ea58597e0c848d068035777e25d3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 5c380789c304ea58597e0c848d068035777e25d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097) 
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1569286414

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091",
       "triggerID" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c380789c304ea58597e0c848d068035777e25d3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097",
       "triggerID" : "5c380789c304ea58597e0c848d068035777e25d3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483",
       "triggerID" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ada7e29d46179057b839f62ca4241a3ef4ac9c04",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ada7e29d46179057b839f62ca4241a3ef4ac9c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6304fd2f1a14e2918839fccd93f9068d9bc6e2d0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6304fd2f1a14e2918839fccd93f9068d9bc6e2d0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * 9288a102f2e4eac58612b0f6bab2ec339447d332 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483) 
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 6304fd2f1a14e2918839fccd93f9068d9bc6e2d0 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528869463

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) 
   * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762) 
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1193278783


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/fs/TestHoodieFileStatusSerialization.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.avro.model.HoodieFileStatus;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.bootstrap.FileStatusUtils;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.testutils.HoodieClientTestHarness;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestInstance;
+import org.junit.jupiter.api.TestInstance.Lifecycle;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Tests whether {@link HoodieFileStatus} is serializable.
+ */
+@TestInstance(Lifecycle.PER_CLASS)
+public class TestHoodieFileStatusSerialization extends HoodieClientTestHarness {
+
+  HoodieEngineContext engineContext;
+  List<Path> testPaths;
+
+  @BeforeAll
+  public void setUp() throws IOException {
+    initSparkContexts();
+    testPaths = new ArrayList<>(5);
+    for (int i = 0; i < 5; i++) {
+      testPaths.add(new Path("s3://table-bucket/"));
+    }
+    engineContext = new HoodieSparkEngineContext(jsc);
+  }
+
+  @Test
+  public void testNonSerializableFileStatus() {
+    try {
+      // this is supposed to throw exception
+      List<FileStatus> statuses = engineContext.flatMap(testPaths, path -> {
+        FileSystem fileSystem = new NonSerializableFileSystem();
+        return Arrays.stream(fileSystem.listStatus(path));
+      }, 5);
+    } catch (Exception e) {
+      System.out.println("Exception message:" + e.getMessage());

Review Comment:
   Can we use `org.junit.jupiter.api.Assertions.assertThrows` instead? Also, tests should not print to stdout.
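
For illustration only (not code from this PR): a JDK-only sketch of the pattern the comment asks for, where the test fails if the expected exception is not thrown instead of catching it and printing the message. `expectThrows` and `ThrowingBody` are hypothetical stand-ins written for this sketch. In a JUnit 5 test the same check would be a call to `Assertions.assertThrows(Class, Executable)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class ExpectThrowsSketch {

  // Hypothetical functional interface: a test body that may throw anything.
  interface ThrowingBody {
    void run() throws Exception;
  }

  // Hypothetical helper mirroring the contract of an assertThrows-style check:
  // returns the exception if it has the expected type, fails otherwise.
  static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingBody body) {
    try {
      body.run();
    } catch (Throwable actual) {
      if (expected.isInstance(actual)) {
        return expected.cast(actual);
      }
      throw new AssertionError("Expected " + expected.getName() + " but got " + actual, actual);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }

  // Java-serializes the value into an in-memory buffer.
  static void serialize(Object value) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(value);
    }
  }

  public static void main(String[] args) {
    // A plain Object is not Serializable, so this must throw; no stdout output is needed.
    NotSerializableException error =
        expectThrows(NotSerializableException.class, () -> serialize(new Object()));
    if (error == null) {
      throw new AssertionError("helper should return the caught exception");
    }
  }
}
```

Unlike the try/catch in the quoted test, this form cannot pass silently when the code under test stops throwing.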





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1547150618

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1253db4e3d1372ebad8ec4e8c1bf143bb5947693 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017) 
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1667181751

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762",
       "triggerID" : "7c71b63797be01ee91268c2520f82b18b3f13b7c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16928",
       "triggerID" : "687ea9631daa66ba4f0144146b2c73636891bb04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17017",
       "triggerID" : "1253db4e3d1372ebad8ec4e8c1bf143bb5947693",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75",
       "triggerType" : "PUSH"
     }, {
       "hash" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "30d043fb18dca954f9df59c450671106a7fa070e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17091",
       "triggerID" : "afa9e28856dbdf05addf7f6a83db1a2046eac193",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c380789c304ea58597e0c848d068035777e25d3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17097",
       "triggerID" : "5c380789c304ea58597e0c848d068035777e25d3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17483",
       "triggerID" : "9288a102f2e4eac58612b0f6bab2ec339447d332",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ada7e29d46179057b839f62ca4241a3ef4ac9c04",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ada7e29d46179057b839f62ca4241a3ef4ac9c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6304fd2f1a14e2918839fccd93f9068d9bc6e2d0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17496",
       "triggerID" : "6304fd2f1a14e2918839fccd93f9068d9bc6e2d0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3691e528a97dbc0487a517da8b76a76856af9a58",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3691e528a97dbc0487a517da8b76a76856af9a58",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1ca7cd7dcd3534fa8d21fcf80eefc760c16e6f75 UNKNOWN
   * 30d043fb18dca954f9df59c450671106a7fa070e UNKNOWN
   * ada7e29d46179057b839f62ca4241a3ef4ac9c04 UNKNOWN
   * 6304fd2f1a14e2918839fccd93f9068d9bc6e2d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17496) 
   * 3691e528a97dbc0487a517da8b76a76856af9a58 UNKNOWN
   




[GitHub] [hudi] yihua commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8190:
URL: https://github.com/apache/hudi/pull/8190#discussion_r1326784115


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -169,9 +170,9 @@ private List<String> getPartitionPathWithPathPrefixUsingFilterExpression(String
 
       // List all directories in parallel
       engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all partitions with prefix " + relativePathPrefix);
-      List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {

Review Comment:
   Got it.  I'll close this PR.





[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8190:
URL: https://github.com/apache/hudi/pull/8190#issuecomment-1470486109

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726",
       "triggerID" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734",
       "triggerID" : "1470480710",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) 
   

