Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/13 08:01:31 UTC

[GitHub] [hudi] novisfff opened a new issue, #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

novisfff opened a new issue, #6095:
URL: https://github.com/apache/hudi/issues/6095

   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : 
   
   * Running on Docker? (yes/no) : no
   
   
   
   TimelineServerBasedWriteMarkers writes marker files to the file system asynchronously. If an error crashes the task, will markers that were not written in time be lost? Will this cause any problems?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] novisfff commented on issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

Posted by GitBox <gi...@apache.org>.
novisfff commented on issue #6095:
URL: https://github.com/apache/hudi/issues/6095#issuecomment-1211483503

   > No, it will never lose data. If the process crashes midway, the commit has also failed midway, so the next time you restart your pipeline, a rollback of that partially failed commit is triggered.
   > 
   > The main purpose of marker files is that, during rollbacks, instead of listing the entire data directory, we can get exactly the data files written as part of the commit of interest. We also have a contract whereby a data file is created only after its marker file has been created. So unless marker creation succeeds, the corresponding data file will not be created.
   > 
   > So, given how timeline server based markers are designed, you should not see any data loss.
   > 
   > Let me know if you need any more clarification. Happy to help.
   
   Thank you very much, I understand now. Sorry, I had misunderstood how timeline server based markers work.




[GitHub] [hudi] nsivabalan commented on issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6095:
URL: https://github.com/apache/hudi/issues/6095#issuecomment-1210108807

   No, it will never lose data. If the process crashes midway, the commit has also failed midway, so the next time you restart your pipeline, a rollback of that partially failed commit is triggered.
   
   The main purpose of marker files is that, during rollbacks, instead of listing the entire data directory, we can get exactly the data files written as part of the commit of interest. We also have a contract whereby a data file is created only after its marker file has been created. So unless marker creation succeeds, the corresponding data file will not be created.
   
   So, given how timeline server based markers are designed, you should not see any data loss.
   
   Let me know if you need any more clarification. Happy to help.
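
The marker-before-data contract described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hudi's implementation: `ToyTable` and all of its methods are hypothetical names, storage is modeled as in-memory sets, and `create_marker` is assumed to return only once the marker is safely recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for table storage (hypothetical, not a Hudi class)."""
    markers: dict = field(default_factory=dict)   # instant -> set of data file paths
    data_files: set = field(default_factory=set)  # data files present on storage
    committed: set = field(default_factory=set)   # instants that completed

    def create_marker(self, instant, path):
        # Assumption: this returns only after the marker is recorded.
        self.markers.setdefault(instant, set()).add(path)

    def write_data_file(self, instant, path):
        # Contract from the thread: marker first, data file second.
        self.create_marker(instant, path)
        self.data_files.add(path)

    def commit(self, instant):
        self.committed.add(instant)
        self.markers.pop(instant, None)  # markers are no longer needed

    def rollback_failed(self):
        """Delete files of uncommitted instants using markers, not a full listing."""
        removed = set()
        for instant in list(self.markers):
            if instant in self.committed:
                continue
            for path in self.markers.pop(instant):
                # A marker can exist without its data file if the crash
                # happened between the two steps; that case is skipped.
                if path in self.data_files:
                    self.data_files.remove(path)
                    removed.add(path)
        return removed


table = ToyTable()
table.write_data_file("001", "p1/a.parquet")
table.commit("001")
table.write_data_file("002", "p1/b.parquet")  # commit "002" never completes
table.create_marker("002", "p1/c.parquet")    # crash before the data file is written
removed = table.rollback_failed()
```

In this model a crash can leave a marker without a data file, but never a data file without a marker, so the rollback sees every file the failed commit wrote and the committed file from instant "001" is untouched.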




[GitHub] [hudi] novisfff closed issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

Posted by GitBox <gi...@apache.org>.
novisfff closed issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?
URL: https://github.com/apache/hudi/issues/6095




[GitHub] [hudi] nsivabalan commented on issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6095:
URL: https://github.com/apache/hudi/issues/6095#issuecomment-1212671710

   thanks!

