Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/20 23:55:03 UTC

[GitHub] [iceberg] MohamedAdelHsn opened a new issue, #6465: Iceberg not rolling files to hdfs while flink streaming job running

MohamedAdelHsn opened a new issue, #6465:
URL: https://github.com/apache/iceberg/issues/6465

   ### Query engine
   
   Flink SQL 
   
   ### Question
   
   Hello,
   I am using Flink SQL to stream CDC data to a Hive table on HDFS.
   I created two Flink tables: a Kafka source table and a Hive sink table.
   While the streaming job is running, I cannot see any data rolled to HDFS or in the Hive table. If I cancel the Flink job, the data appears in HDFS and the Hive table.
   
   Thanks for your feedback ASAP.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] closed issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #6465: Iceberg not rolling files to hdfs while flink streaming job running 
URL: https://github.com/apache/iceberg/issues/6465




[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
SHuixo commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1360915033

   @MohamedAdelHsn Can you provide more details about the code?




[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
MohamedAdelHsn commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361121107

   I am using the SQL client. I went through the Flink and Iceberg documentation and could not find anything like this. Sorry, can you explain more? I am not an expert in Flink.




[GitHub] [iceberg] hililiwei commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
hililiwei commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361230094

   > Is checkpointing enabled?
   
   Is your current problem that you cannot find any data in the Iceberg table?
   
   As mentioned above, please confirm whether checkpointing is enabled. If it is not, the data will not be committed to the Iceberg table.
   
   When Flink writes data, it rolls the current file when its size exceeds the maximum target file size, or when Flink performs a checkpoint.
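   
   The size-based condition can be illustrated from a SQL client session. This is only a minimal sketch, assuming that the "max target file size" corresponds to the Iceberg table property `write.target-file-size-bytes` and that the sink is the `ice.ice.user_log_sink` table from the report; the 128 MB value is purely illustrative.
   
   ```sql
   -- Assumption: the sink table name is taken from the report; adjust to the real table.
   -- Assumption: 134217728 bytes (128 MB) is an illustrative value, not a recommendation.
   ALTER TABLE ice.ice.user_log_sink SET ('write.target-file-size-bytes' = '134217728');
   ```
   
   Rolling only closes the data file that is being written. The rows become visible in the table once a completed checkpoint commits them, so this setting does not replace enabling checkpointing.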




[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
MohamedAdelHsn commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1368262792

   Thank you all, it is working now after enabling checkpoints in Flink.




[GitHub] [iceberg] github-actions[bot] commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1644809084

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'




[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
MohamedAdelHsn commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361037555

   @SHuixo @luoyuxia 




[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
SHuixo commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361267227

   > I am using the SQL client. I went through the Flink and Iceberg documentation and could not find anything like this. Sorry, can you explain more? I am not an expert in Flink. Is this configuration in the YAML conf file? Why does Flink not commit files to HDFS? I searched a lot in the Flink documentation and could not find anything that helped me.
   
   You can modify the **checkpointing** configuration directly in the **flink-conf.yaml** file. The official reference is [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/).
   The configuration items can be added as follows:
   
   ```
   execution.checkpointing.interval: 5000
   execution.checkpointing.mode: EXACTLY_ONCE
   state.backend: filesystem
   state.checkpoints.dir: hdfs:///flink/checkpoints
   state.savepoints.dir: hdfs:///flink/checkpoints
   execution.checkpointing.timeout: 600000
   execution.checkpointing.min-pause: 500
   execution.checkpointing.max-concurrent-checkpoints: 1
   state.checkpoints.num-retained: 3
   execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
   ```
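   
   For a SQL client session, the checkpointing settings can probably be applied without editing flink-conf.yaml. The following is a minimal sketch, assuming a Flink version whose SQL client accepts quoted `SET 'key' = 'value';` statements (1.13 or later). The interval is only an illustration, and the INSERT statement is the one from the original report.
   
   ```sql
   -- Assumption: run these in the Flink SQL client before submitting the streaming INSERT.
   -- Enable periodic checkpoints; the Iceberg sink commits data files when a checkpoint completes.
   SET 'execution.checkpointing.interval' = '5s';
   SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';
   
   -- Statement from the original report; table names are as given there.
   INSERT INTO hive_iceberg_table SELECT * FROM flink_table_source;
   ```
   
   Settings applied with SET last only for the current session, whereas entries in flink-conf.yaml apply to every job started with that configuration.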
   




[GitHub] [iceberg] luoyuxia commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
luoyuxia commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1360921223

   Is checkpointing enabled?




[GitHub] [iceberg] MohamedAdelHsn commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
MohamedAdelHsn commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361037210

   This is the code I used:
   
   CREATE TABLE flink_table_source (
   id BIGINT
   ,data VARCHAR
   ) WITH (
   'connector' = 'kafka'
   ,'topic' = 'flink'
   ,'properties.bootstrap.servers' = 'localhost:9092'
   ,'properties.group.id' = 'user_log_x'
   ,'scan.startup.mode' = 'earliest-offset'
   ,'format' = 'json'
   );
   
   This table uses default_catalog and default_database. When I start publishing data to the Kafka topic, the following query works fine:
   select * from flink_table_source;
   
   I want to insert data from that table into an Iceberg Hive table, and I am facing an issue with the SQL below:
   
   insert into hive_iceberg_table select * from flink_table_source;
   
   My hive_iceberg_table is created as below:
   
   CREATE CATALOG ice WITH (
   'type'='iceberg',
   'catalog-type'='hive',
   'uri'='thrift://localhost:9083',
   'clients'='5',
   'property-version'='2',
   'warehouse'='hdfs://localhost:9000/user/hive/warehouse'
   );
   
   USE ice;
   
   CREATE DATABASE iceberg;
   
   USE iceberg;
   
   /* Hive table definition */
   CREATE TABLE ice.ice.user_log_sink (
   user_id INT,
   item_id STRING,
   PRIMARY KEY (user_id) NOT ENFORCED
   ) WITH (
   'engine.hive.enabled' = 'true'
   );




[GitHub] [iceberg] SHuixo commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by GitBox <gi...@apache.org>.
SHuixo commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1361102904

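   How do you configure the Flink execution environment? For example, like this:
   
   ```
   // Imports assume the Flink 1.14 DataStream API.
   import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
   import org.apache.flink.streaming.api.CheckpointingMode;
   import org.apache.flink.streaming.api.environment.CheckpointConfig;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   
   // Checkpoint every 60 s with exactly-once guarantees.
   env.enableCheckpointing(60000, CheckpointingMode.EXACTLY_ONCE);
   env.setStateBackend(new HashMapStateBackend());
   
   final CheckpointConfig checkpointConfig = env.getCheckpointConfig();
   checkpointConfig.setMinPauseBetweenCheckpoints(500);
   checkpointConfig.setCheckpointTimeout(60000);
   checkpointConfig.setTolerableCheckpointFailureNumber(2);
   checkpointConfig.setMaxConcurrentCheckpoints(1);
   checkpointConfig.enableUnalignedCheckpoints();
   // Keep externalized checkpoints when the job is cancelled.
   checkpointConfig.enableExternalizedCheckpoints(
           CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
   // path is the checkpoint storage location, defined elsewhere.
   checkpointConfig.setCheckpointStorage(path);
   .....
   ```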
   




[GitHub] [iceberg] github-actions[bot] commented on issue #6465: Iceberg not rolling files to hdfs while flink streaming job running

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6465:
URL: https://github.com/apache/iceberg/issues/6465#issuecomment-1622720593

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

