You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/06 08:04:57 UTC

[GitHub] [iceberg] xuzhiwen1255 opened a new issue, #6531: Flink:Reading the UPsert-enabled table will cause checkpoint failure

xuzhiwen1255 opened a new issue, #6531:
URL: https://github.com/apache/iceberg/issues/6531

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   After upsert is enabled on my flink table, I write 300w data in advance and read the table in stream read mode. As a result, all checkpoint entries fail. After all data in the table is read,checkpoint works properly
   
   According to my analysis, the upstream monitor sent splits and blocked the pipeline, so the reader operator could not get the checkpoint barrier at the first time. As a result, checkpoint could not start, and finally, checkpoint timeout problem
   
   Like the following : 
       --------------------------------------------------
       barrier, split , split , split , split , split , split , split  丨  --- split --->  reader : processing split  --> reader File
       --------------------------------------------------
                                     .. processing ..
       --------------------------------------------------
     split , barrier, split , split , split , barrier , split , split 丨  --- split --->  reader : processing barrier --> trigger checkpoint
       --------------------------------------------------
   The barrier is processed only after all the splits before it have been processed, triggering a checkpoint.
   
   According to the actual situation of my test, because upsert is enabled, a lot of delete data will be generated, which will lead to a long time to process split. In the case of reader operator, the processing speed is slow, resulting in backpressure
   
   The split read is performed by the flink task thread, and since the same thread is used, a blocking problem occurs
   
   Here is my sql
   ```sql
   
   
   CREATE TABLE dg (
       id INT,c1 VARCHAR,c2 VARCHAR,c3 VARCHAR,c4 VARCHAR,c5 VARCHAR,c6 VARCHAR,c7 VARCHAR,c8 VARCHAR,c9 VARCHAR,c10 VARCHAR,c11 VARCHAR,c12 VARCHAR,c13 VARCHAR,c14 VARCHAR,c15 VARCHAR,c16 VARCHAR,c17 VARCHAR,c18 VARCHAR,c19 VARCHAR,c20 VARCHAR,c21 VARCHAR,c22 VARCHAR,c23 VARCHAR,c24 VARCHAR,c25 VARCHAR,c26 VARCHAR,c27 VARCHAR,c28 VARCHAR,c29 VARCHAR,c30 VARCHAR,c31 VARCHAR,c32 VARCHAR 
   )
   WITH (
       'connector' = 'datagen',
       'rows-per-second' = '10000',
       'number-of-rows' = '300000',
       'fields.id.min'='0',
       'fields.id.max'='50000001',
       'fields.c1.length' = '20','fields.c2.length' = '20','fields.c3.length' = '20','fields.c4.length' = '20','fields.c5.length' = '20','fields.c6.length' = '20','fields.c7.length' = '20','fields.c8.length' = '20','fields.c9.length' = '20','fields.c10.length' = '20','fields.c11.length' = '20','fields.c12.length' = '20','fields.c13.length' = '20','fields.c14.length' = '20','fields.c15.length' = '20','fields.c16.length' = '20','fields.c17.length' = '20','fields.c18.length' = '20','fields.c19.length' = '20','fields.c20.length' = '20','fields.c21.length' = '20','fields.c22.length' = '20','fields.c23.length' = '20','fields.c24.length' = '20','fields.c25.length' = '20','fields.c26.length' = '20','fields.c27.length' = '20','fields.c28.length' = '20','fields.c29.length' = '20','fields.c30.length' = '20','fields.c31.length' = '20','fields.c32.length' = '20'
     );
   
   
   create table hc.db.test1(
    id INT,c1 VARCHAR,c2 VARCHAR,c3 VARCHAR,c4 VARCHAR,c5 VARCHAR,c6 VARCHAR,c7 VARCHAR,c8 VARCHAR,c9 VARCHAR,c10 VARCHAR,c11 VARCHAR,c12 VARCHAR,c13 VARCHAR,c14 VARCHAR,c15 VARCHAR,c16 VARCHAR,c17 VARCHAR,c18 VARCHAR,c19 VARCHAR,c20 VARCHAR,c21 VARCHAR,c22 VARCHAR,c23 VARCHAR,c24 VARCHAR,c25 VARCHAR,c26 VARCHAR,c27 VARCHAR,c28 VARCHAR,c29 VARCHAR,c30 VARCHAR,c31 VARCHAR,c32 VARCHAR
   , PRIMARY KEY (`id`) NOT ENFORCED
   ) with (
    'format-version'='2',
    'write.upsert.enabled' = 'true'
    );
   
   
   create table hc.db.test4(
    id INT,c1 VARCHAR,c2 VARCHAR,c3 VARCHAR,c4 VARCHAR,c5 VARCHAR,c6 VARCHAR,c7 VARCHAR,c8 VARCHAR,c9 VARCHAR,c10 VARCHAR,c11 VARCHAR,c12 VARCHAR,c13 VARCHAR,c14 VARCHAR,c15 VARCHAR,c16 VARCHAR,c17 VARCHAR,c18 VARCHAR,c19 VARCHAR,c20 VARCHAR,c21 VARCHAR,c22 VARCHAR,c23 VARCHAR,c24 VARCHAR,c25 VARCHAR,c26 VARCHAR,c27 VARCHAR,c28 VARCHAR,c29 VARCHAR,c30 VARCHAR,c31 VARCHAR,c32 VARCHAR
   , PRIMARY KEY (`id`) NOT ENFORCED
   ) with (
    'format-version'='2',
    'write.upsert.enabled' = 'true'
    );
   
   -- streaming read
   insert into hc.db.test4  select * from hc.db.test1 /*+ OPTIONS('streaming'='true', 'monitor-interval'='10s')*/;
   
   ```
   
   
   ![image](https://user-images.githubusercontent.com/105710753/210957322-b79123ef-ab9d-4b6d-8b63-2ec395354a4a.png)
   
   
   ![image](https://user-images.githubusercontent.com/105710753/210957211-7fd03036-c528-4af0-a3cb-baba16232848.png)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] xuzhiwen1255 commented on issue #6531: Flink:Reading the Upsert-enabled table will cause checkpoint failure

Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 commented on issue #6531:
URL: https://github.com/apache/iceberg/issues/6531#issuecomment-1373327193

   This should be an existing problem, regardless of version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] xuzhiwen1255 commented on issue #6531: Flink:Reading the UPsert-enabled table will cause checkpoint failure

Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 commented on issue #6531:
URL: https://github.com/apache/iceberg/issues/6531#issuecomment-1373325280

   @stevenzwu Can you look at this problem? thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] xuzhiwen1255 closed issue #6531: Flink:Reading the Upsert-enabled table will cause checkpoint failure

Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 closed issue #6531: Flink:Reading the Upsert-enabled table will cause checkpoint failure
URL: https://github.com/apache/iceberg/issues/6531


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org