Posted to commits@hudi.apache.org by "bithw1 (via GitHub)" <gi...@apache.org> on 2023/04/01 09:01:55 UTC

[GitHub] [hudi] bithw1 opened a new issue, #8348: [SUPPORT]One question about flink hudi streaming query

bithw1 opened a new issue, #8348:
URL: https://github.com/apache/hudi/issues/8348

   Hi,
   
   I am reading at https://hudi.apache.org/docs/flink-quick-start-guide#streaming-query
   
   The example query is as follows:
   
   ```
   CREATE TABLE t1(
     uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
     name VARCHAR(10),
     age INT,
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
     'connector' = 'hudi',
     'path' = '${path}',
     'table.type' = 'MERGE_ON_READ',
     'read.streaming.enabled' = 'true',  -- this option enables streaming read
     'read.start-commit' = '20210316134557', -- specifies the start commit instant time
     'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s.
   );
   
   -- Then query the table in stream mode
   select * from t1;
   
   ```
   I have a question about the option `read.start-commit`:
   When I run the query for the first time, `read.start-commit` specifies where the query starts. The query then runs for a while (e.g., one day) and stops; many new Hudi commits are made during this period.
   
   When I restart the query, how should I handle the commit time? Should I manually specify a newer start commit? (It is quite likely that I don't know which commits the Flink query has already processed.)
   Is there a checkpoint mechanism for `read.start-commit`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] bithw1 commented on issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #8348:
URL: https://github.com/apache/hudi/issues/8348#issuecomment-1493793156

   Thanks @danny0405. Is there a guide to the checkpoint capability for `read.start-commit` and how it works together with a Flink query (different queries would have different start commit times)? It is like a Kafka consumer checkpoint, where the offset and the consumer group work together to make the checkpointed offset meaningful.
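   For illustration, a sketch of how two independent streaming queries over the same table could each be given their own start instant through Flink SQL dynamic table options (`OPTIONS` hints), assuming a Flink version that supports them. The instant times are placeholders, not values from this thread. Unlike a Kafka consumer group, there is no shared group id here: each job's read position would live in that job's own Flink state.

   ```sql
   -- Allow per-query OPTIONS hints (already the default in recent Flink releases)
   SET 'table.dynamic-table-options.enabled' = 'true';

   -- Job A: streaming read of t1 starting from one (placeholder) instant
   SELECT * FROM t1 /*+ OPTIONS('read.start-commit' = '20210316134557') */;

   -- Job B: a separate job over the same table, starting from another (placeholder) instant
   SELECT * FROM t1 /*+ OPTIONS('read.start-commit' = '20210317000000') */;
   ```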


[GitHub] [hudi] ad1happy2go commented on issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8348:
URL: https://github.com/apache/hudi/issues/8348#issuecomment-1569717084

   @bithw1 Do you need any other help on this?


[GitHub] [hudi] danny0405 commented on issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8348:
URL: https://github.com/apache/hudi/issues/8348#issuecomment-1493683006

   > Is there a checkpoint mechanism for read.start-commit that could automatically handle this situation when the Flink query restarts?
   
   Yes, the checkpoint remembers the last consumed instant time.
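   A minimal sketch of the Flink SQL client settings this relies on, assuming Flink 1.x configuration keys: the consumed position is only saved if checkpointing is enabled, and it can only be reused if the checkpoint survives the job being stopped. A durable `state.checkpoints.dir` is also assumed to be configured, typically in the Flink configuration file. The 30 s interval is an arbitrary example value.

   ```sql
   -- Snapshot operator state (including the source's last issued instant) every 30 s
   SET 'execution.checkpointing.interval' = '30s';

   -- Keep the latest checkpoint when the job is cancelled, so a later run can resume from it
   SET 'execution.checkpointing.externalized-checkpoint-retention' = 'RETAIN_ON_CANCELLATION';
   ```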


[GitHub] [hudi] bithw1 closed issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 closed issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query
URL: https://github.com/apache/hudi/issues/8348


[GitHub] [hudi] danny0405 commented on issue #8348: [SUPPORT]One question about read.start-commit for flink hudi streaming query

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8348:
URL: https://github.com/apache/hudi/issues/8348#issuecomment-1493862409

   Not really. You can take a look at `StreamReadMonitoringFunction#issuedInstant` to see how it advances and is checkpointed; each source manages its offset in a per-source scope.
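   As a hedged sketch of resuming, assuming the Flink 1.x SQL client (newer Flink releases may use a different key for the restore path) and an unchanged query, so the restored state can be matched to the source operator: point the new submission at the savepoint or retained checkpoint of the stopped job. The path below is a placeholder. Because the state belongs to that one job, a query started without a restore path would fall back to `read.start-commit` instead.

   ```sql
   -- Placeholder: savepoint or retained checkpoint of the previous run of this query
   SET 'execution.savepoint.path' = 'hdfs:///path/to/savepoint';

   -- Resubmit the same streaming query; on restore, the source should continue
   -- from the instant recorded in its state rather than from 'read.start-commit'
   SELECT * FROM t1;
   ```

   In the SQL client the restore path stays set for the rest of the session, so it may need to be cleared with `RESET 'execution.savepoint.path';` before submitting unrelated jobs.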

