Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/05 00:56:02 UTC

[GitHub] [iceberg] zhaobangcai opened a new issue #3846: Inserted data is randomly lost when inserting in a loop

zhaobangcai opened a new issue #3846:
URL: https://github.com/apache/iceberg/issues/3846


   Versions: iceberg 0.13; flink 1.13.2;
   Catalog: this happens with both hadoop and hive catalogs;
   No obvious error messages;
   The code is as follows:
   EnvironmentSettings settings = null;
       TableEnvironment tableEnv = null;
       String dbName = "test_ods_preview_read_db";
       String tableName = "test_ods_preview_read_data";
       String warehouse="hdfs://cdh2:8020/user/hive/warehouse/zbc";
       @Before
       public  void init() {
   
           settings = EnvironmentSettings.newInstance().build();
   
           tableEnv = TableEnvironment.create(settings);
   
           String toWithClause = "(\n" +
                   "  'type'='iceberg',\n" +
                   "  'catalog-type'='hadoop',\n" +
                   "  'clients'='2',\n" +
                   "  'property-version'='1',\n" +
                   "  'warehouse'='hdfs://cdh2:8020/user/hive/warehouse/zbc'\n" +
                   ")";
   
           tableEnv.executeSql("CREATE CATALOG test_hadoop_catalog WITH " + toWithClause);
           tableEnv.executeSql("USE CATALOG test_hadoop_catalog");
           tableEnv.executeSql("CREATE DATABASE IF NOT EXISTS test_ods_preview_read_db");
           tableEnv.executeSql("USE test_ods_preview_read_db");
           tableEnv.executeSql("CREATE TABLE IF NOT EXISTS test_ods_preview_read_db.test_ods_preview_read_data(sensor_id STRING, ts BIGINT)");
           
           for (int i = 0;i< 10;i++){
               String sql = String.format("INSERT INTO test_ods_preview_read_data SELECT 'sensor_id_%d',%d" ,i ,i);
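               tableEnv.executeSql(sql); // Executed 10 times, yet a random number of rows (1 to 10) ends up written; not every insert succeeds, and there is no obvious error.
               // Thread.sleep(2000); with this sleep enabled, the write success rate goes up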
           }
   
           tableEnv.getConfig().getConfiguration().setBoolean("table.dynamic-table-options.enabled", true);
       }
   
   I have not been able to track down the root cause. Help! Thanks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on issue #3846: Inserted data is randomly lost when inserting in a loop

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #3846:
URL: https://github.com/apache/iceberg/issues/3846#issuecomment-1006184999


   Agree with @Initial-neko. Some of the "insert" statements might fail.
   To verify this, run the following query after the loop finishes to list all snapshots. If every "insert" succeeded, there will be 10 snapshots.
   ```
   select * from test_ods_preview_read_data.snapshots
   ```




[GitHub] [iceberg] Initial-neko commented on issue #3846: Inserted data is randomly lost when inserting in a loop

Posted by GitBox <gi...@apache.org>.
Initial-neko commented on issue #3846:
URL: https://github.com/apache/iceberg/issues/3846#issuecomment-1005327948


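   On the file system side, you will find that the version numbers in the metadata conflict. The cause is a commit conflict. Normal checkpoint commits are nowhere near this frequent, so this basically does not happen in real-world scenarios. After adding the sleep, the probability of a conflict becomes smaller.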
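
The commit conflict described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `CommitConflictSketch`, `VersionedTable`, `tryCommit`, `overlappingCommits` and `sequentialCommits` are hypothetical names invented for this example, the model has no retry logic, and it is not Iceberg's actual commit code. It shows only the general idea of an optimistic commit: a writer succeeds when the metadata version it started from is still the current one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of optimistic table commits; NOT Iceberg's real commit path.
class CommitConflictSketch {

    // Hypothetical stand-in for a table's metadata version counter.
    static final class VersionedTable {
        private final AtomicInteger version = new AtomicInteger(0);

        int currentVersion() {
            return version.get();
        }

        // Succeeds only if nobody else committed since baseVersion was read.
        boolean tryCommit(int baseVersion) {
            return version.compareAndSet(baseVersion, baseVersion + 1);
        }
    }

    // Every writer reads the base version before any writer commits.
    // This models jobs submitted back to back without waiting for each other.
    static int overlappingCommits(int writers) {
        VersionedTable table = new VersionedTable();
        List<Integer> bases = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            bases.add(table.currentVersion());
        }
        int succeeded = 0;
        for (int base : bases) {
            if (table.tryCommit(base)) {
                succeeded++;
            }
        }
        return succeeded;
    }

    // Each writer reads and commits before the next one starts.
    // This models waiting for each job to finish before submitting the next.
    static int sequentialCommits(int writers) {
        VersionedTable table = new VersionedTable();
        int succeeded = 0;
        for (int i = 0; i < writers; i++) {
            if (table.tryCommit(table.currentVersion())) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("overlapping: " + overlappingCommits(10));
        System.out.println("sequential:  " + sequentialCommits(10));
    }
}
```

In the Flink program from the report, `executeSql` submits each INSERT job asynchronously and returns a `TableResult` right away, so the ten jobs can overlap. The counterpart of the sequential case would be calling `await()` on that `TableResult` before submitting the next statement. Whether that fully removes the data loss in this particular setup is not confirmed in the thread.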

