Posted to commits@hudi.apache.org by "sydneyhoran (via GitHub)" <gi...@apache.org> on 2023/04/20 19:59:18 UTC

[GitHub] [hudi] sydneyhoran opened a new issue, #8522: [SUPPORT] Parameter `--checkpoint` is different for DebeziumSource

sydneyhoran opened a new issue, #8522:
URL: https://github.com/apache/hudi/issues/8522

   **Describe the problem you faced**
   
   This is not a bug but an observation I wanted to post so that others can find it later: how to pass a checkpoint string to a Deltastreamer job that uses a Debezium source. Could we add a note about this to the documentation?
   
   I've been searching and experimenting for a while to work out how to pass `--checkpoint 12345` (for example) so that Deltastreamer resumes from a specific checkpoint. The goal is to jump around in the Kafka topic and test unexpected messages that have been found to break the job.
   
   According to [this doc](https://hudi.apache.org/docs/hoodie_deltastreamer/#checkpointing), the checkpoint is the Kafka offset when using a Kafka source. However, since we are using a Debezium source, the checkpoint string looks a little different.
   
   It consists of the topic name followed by `,0:<new_desired_checkpoint>`, where the `0` appears to be the Kafka partition number. For example, you have to explicitly pass `--checkpoint "users.public.transactions,0:106600"`.
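To make the format concrete, here is a minimal sketch that assembles such a string. The helper name `build_checkpoint` is hypothetical and not part of Hudi. Only the single-partition form `topic,0:offset` is confirmed by the logs in this issue; joining several `partition:offset` pairs with commas is an assumption extrapolated from that shape.

```python
from typing import Dict


def build_checkpoint(topic: str, offsets: Dict[int, int]) -> str:
    """Return 'topic,partition:offset[,partition:offset...]'.

    Hypothetical helper for illustration only. The multi-partition
    form is an assumption; the issue only shows partition 0.
    """
    if not offsets:
        raise ValueError("at least one partition offset is required")
    # Emit pairs in ascending partition order so the result is deterministic.
    pairs = [f"{partition}:{offset}" for partition, offset in sorted(offsets.items())]
    return ",".join([topic, *pairs])


if __name__ == "__main__":
    # Value to pass after --checkpoint, quoted on the command line.
    print(build_checkpoint("users.public.transactions", {0: 106600}))
```

The printed value is what would go inside the quotes after `--checkpoint`.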
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Use Deltastreamer with the Kafka PostgresDebeziumSource and pass `--checkpoint 106600` to try to force it to jump to that offset.
   - Results in log `DeltaSync: Checkpoint to resume from : Option{val=106600}`
   - Job begins streaming from wherever it left off (105524).
   
   2. Change the parameter to be more explicit, matching the "New checkpoint string" values printed in past job logs: `--checkpoint "users.public.transactions,0:106600"`
   - Results in log `DeltaSync: Checkpoint to resume from : Option{val=users.public.transactions,0:106600}`
   - Job begins streaming from the correct location (106600).
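Since a bare offset was accepted in step 1 but did not move the job, a small pre-flight check can catch the mistake before launching. The sketch below is only an illustration: `parse_checkpoint` and its regular expression are hypothetical and encode the `topic,partition:offset` shape seen in the logs above, not Hudi's own validation logic.

```python
import re
from typing import Dict, Tuple

# Assumed shape: a topic name, then one or more ',partition:offset' pairs.
_CHECKPOINT_RE = re.compile(r"^(?P<topic>[^,]+)(?P<pairs>(?:,\d+:\d+)+)$")


def parse_checkpoint(value: str) -> Tuple[str, Dict[int, int]]:
    """Split a checkpoint string into (topic, {partition: offset}).

    Raises ValueError when the string does not have the expected shape,
    for example when only a bare offset such as '106600' is given.
    """
    match = _CHECKPOINT_RE.match(value)
    if match is None:
        raise ValueError(f"not in 'topic,partition:offset' form: {value!r}")
    offsets: Dict[int, int] = {}
    for pair in match.group("pairs").lstrip(",").split(","):
        partition, offset = pair.split(":")
        offsets[int(partition)] = int(offset)
    return match.group("topic"), offsets


if __name__ == "__main__":
    print(parse_checkpoint("users.public.transactions,0:106600"))
    try:
        parse_checkpoint("106600")
    except ValueError as err:
        print("rejected:", err)
```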
   
   
   **Expected behavior**
   
   Being able to pass a new checkpoint string to Deltastreamer with a Debezium source. This is achievable, but it was difficult to find an example of how to do it, and it is not as simple as passing the offset number alone.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.1
   
   * Hive version : N/A
   
   * Hadoop version : N/A
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : both
   
   
   **Additional context**
   
   N/A
   
   **Stacktrace**
   
   N/A
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ad1happy2go commented on issue #8522: [SUPPORT] Parameter `--checkpoint` is different for DebeziumSource

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8522:
URL: https://github.com/apache/hudi/issues/8522#issuecomment-1538271008

   Thanks @sydneyhoran. I created a JIRA ticket to track this change: https://issues.apache.org/jira/browse/HUDI-6191




[GitHub] [hudi] danny0405 commented on issue #8522: [SUPPORT] Parameter `--checkpoint` is different for DebeziumSource

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8522:
URL: https://github.com/apache/hudi/issues/8522#issuecomment-1517151256

   Thanks for the nice findings. Would you mind opening a PR against the asf-site branch?

