Posted to commits@hudi.apache.org by "stream2000 (via GitHub)" <gi...@apache.org> on 2023/04/19 09:08:02 UTC

[GitHub] [hudi] stream2000 opened a new issue, #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

stream2000 opened a new issue, #8500:
URL: https://github.com/apache/hudi/issues/8500

   Hive sync discovers new partitions by collecting the partitions written since `lastCommitTimeSynced`. After syncing those partitions, it records the newest completed instant time in Hive as the new `lastCommitTimeSynced`. In a multi-writer scenario, however, an instant that started before `lastCommitTimeSynced` may complete only after `lastCommitTimeSynced` has been synced to Hive. The partitions written by such an instant will never be picked up by any later sync.
   
   My suggestion is to sync only the partitions from instants before the oldest inflight instant, just as we do in archiving and incremental clean.
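
The race described above, and the proposed bound, can be sketched with a toy model. This is only an illustration under stated assumptions: `Instant`, `naive_sync`, and `bounded_sync` are hypothetical names, not Hudi classes or APIs, and instant times are assumed to be fixed-width strings that sort lexicographically.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Instant:
    """Toy stand-in for a timeline instant (hypothetical, not a Hudi class)."""
    instant_time: str            # start time of the instant, e.g. "001"
    partitions: FrozenSet[str]   # partitions written by this instant
    completed: bool              # False while the instant is still inflight


def naive_sync(timeline: Iterable[Instant],
               last_synced: Optional[str]) -> Tuple[Set[str], Optional[str]]:
    """Sync every completed instant newer than last_synced, then advance to the newest one."""
    done = [i for i in timeline
            if i.completed and (last_synced is None or i.instant_time > last_synced)]
    partitions = set().union(*(i.partitions for i in done))
    return partitions, max((i.instant_time for i in done), default=last_synced)


def bounded_sync(timeline: Iterable[Instant],
                 last_synced: Optional[str]) -> Tuple[Set[str], Optional[str]]:
    """Like naive_sync, but only consider instants older than the oldest inflight instant."""
    timeline = list(timeline)
    oldest_inflight = min((i.instant_time for i in timeline if not i.completed),
                          default=None)
    eligible = [i for i in timeline
                if oldest_inflight is None or i.instant_time < oldest_inflight]
    return naive_sync(eligible, last_synced)


p1, p2 = frozenset({"dt=2023-04-18"}), frozenset({"dt=2023-04-19"})
# Writer A starts instant 001 first, but writer B's instant 002 completes first.
before = [Instant("001", p1, completed=False), Instant("002", p2, completed=True)]
after = [Instant("001", p1, completed=True), Instant("002", p2, completed=True)]

for sync in (naive_sync, bounded_sync):
    synced, last = set(), None
    for timeline in (before, after):  # one sync run per timeline state
        parts, last = sync(timeline, last)
        synced |= parts
    print(sync.__name__, sorted(synced))
```

In this model the naive strategy ends up with only `dt=2023-04-19`, because the second run looks past `002` and never revisits `001`. The bounded strategy syncs nothing on the first run and both partitions on the second, at the cost of delaying the sync while an older instant is still inflight.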
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1523567035

   Since we have a tracking ticket, can we go ahead and close the GitHub issue?
   




[GitHub] [hudi] stream2000 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "stream2000 (via GitHub)" <gi...@apache.org>.
stream2000 commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1525023072

   > Since we have a tracking ticket, can we go ahead and close the GitHub issue?
   
   Hi, I don't really understand how I should proceed with this issue. Do you mean that I should create a related Jira ticket and close this GitHub issue? Or just close this issue?




[GitHub] [hudi] danny0405 closed issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 closed issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario
URL: https://github.com/apache/hudi/issues/8500




[GitHub] [hudi] stream2000 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "stream2000 (via GitHub)" <gi...@apache.org>.
stream2000 commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1535671020

   Sure, you can assign the issue to me.
   
   On May 5, 2023, at 12:06, Danny Chan wrote:
   
   > @stream2000, are you interested in opening a PR to fix the Hive meta sync issue? #7627 has landed.




[GitHub] [hudi] danny0405 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1525210337

   We plan to introduce completion time on the timeline: https://github.com/apache/hudi/pull/7627. After that, we can use the completion time to filter the timeline and resolve this issue.
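
As a rough sketch of how filtering by completion time could close the gap: the model below checkpoints on completion time instead of start time. The names `TimedInstant` and `sync_by_completion_time` are hypothetical and do not reflect the API introduced in the PR above. Times are assumed to be fixed-width strings that sort lexicographically.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class TimedInstant:
    """Toy instant carrying both start and completion time (hypothetical model)."""
    start_time: str
    completion_time: Optional[str]  # None while the instant is still inflight
    partitions: FrozenSet[str]


def sync_by_completion_time(
    timeline: Iterable[TimedInstant],
    last_synced_completion: Optional[str],
) -> Tuple[Set[str], Optional[str]]:
    """Sync instants that completed after the checkpoint, whatever their start time."""
    newly_completed = [
        i for i in timeline
        if i.completion_time is not None
        and (last_synced_completion is None
             or i.completion_time > last_synced_completion)
    ]
    partitions = set().union(*(i.partitions for i in newly_completed))
    checkpoint = max((i.completion_time for i in newly_completed),
                     default=last_synced_completion)
    return partitions, checkpoint


p1, p2 = frozenset({"dt=2023-04-18"}), frozenset({"dt=2023-04-19"})
# Instant 001 starts first but completes at 004, after instant 002 completed at 003.
t_before = [TimedInstant("001", None, p1), TimedInstant("002", "003", p2)]
t_after = [TimedInstant("001", "004", p1), TimedInstant("002", "003", p2)]

parts, checkpoint = sync_by_completion_time(t_before, None)
print(sorted(parts), checkpoint)
parts, checkpoint = sync_by_completion_time(t_after, checkpoint)
print(sorted(parts), checkpoint)
```

Because the checkpoint follows completion order, the late-finishing instant `001` is still newer than the checkpoint on the second run, so its partition is picked up without waiting on inflight instants.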




[GitHub] [hudi] danny0405 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1535669578

   @stream2000, are you interested in opening a PR to fix the Hive meta sync issue? #7627 has landed.




[GitHub] [hudi] danny0405 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8500:
URL: https://github.com/apache/hudi/issues/8500#issuecomment-1515654829

   You are right. That is why we have recently been trying to address the issue by introducing the real transition time on the timeline: https://github.com/apache/hudi/pull/7627. By using the transition time instead, the instants can be sorted by completion time.

