Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/30 11:53:42 UTC

[GitHub] [iceberg] Shane-Yu opened a new issue, #5671: The upsert mode can query the historical version of the data under certain conditions

Shane-Yu opened a new issue, #5671:
URL: https://github.com/apache/iceberg/issues/5671

   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Hive
   
   ### Please describe the bug 🐞
   
   In Iceberg upsert mode, create a v2 table like this:
   
   > create table upsert_update_time_test(
   > id bigint comment 'pk',
   > data bigint comment 'data',
   > update_time string comment 'update_time'
   > )
   >  comment 'upsert_update_time_test'
   > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   > TBLPROPERTIES (
   > 'engine.hive.enabled'='true',
   > 'write.metadata.delete-after-commit.enabled'='true',
   > 'write.target-file-size-bytes'='134217728',
   > 'write.metadata.previous-versions-max'='5',
   > 'write.metadata.metrics.default'='full',
   > 'format-version'='2'
   >  );
   
   
   Write data to Iceberg with Flink using code like the following:
   
   > FlinkSink.forRow(rowDataStream, tableSchema)
   >                 .tableLoader(tableLoader)
   >                 .tableSchema(tableSchema)
   >                 .upsert(true)
   >                 .writeParallelism(1)
   >                 .equalityFieldColumns(ImmutableList.of("id"))
   >                 .append();
   
   Then send data through the socket like this:
   > $ nc -lk 3287
   > I,1,101,2022-08-26 15:44:50
   > U,1,103,2022-08-26 15:45:23
   ![image](https://user-images.githubusercontent.com/26053387/187426590-57fc1d33-e07e-4892-b148-96253eebf786.png)
   
   
   Finally, querying with both Hive and Spark returns the following results:
   
   >    select * from upsert_update_time_test;
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	103	2022-08-26 15:45:23
   > Time taken: 0.107 seconds, Fetched: 1 row(s)
   > hive (iceberg_yx)> select * from upsert_update_time_test where update_time <= '2022-08-26 15:45:00';
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	101	2022-08-26 15:44:50
   > Time taken: 0.76 seconds, Fetched: 1 row(s)
   > hive (iceberg_yx)> select * from upsert_update_time_test where update_time <= '2022-08-26 15:46:00';
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	103	2022-08-26 15:45:23
   > Time taken: 1.26 seconds, Fetched: 1 row(s)
   > hive (iceberg_yx)>
   >                  > select * from upsert_update_time_test where data <= 102;
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	101	2022-08-26 15:44:50
   > Time taken: 0.119 seconds, Fetched: 1 row(s)
   > hive (iceberg_yx)>
   >                  > select * from upsert_update_time_test where data <= 103;
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	103	2022-08-26 15:45:23
   > Time taken: 0.114 seconds, Fetched: 1 row(s)
   > hive (iceberg_yx)>
   >                  > select * from upsert_update_time_test where id = 1;
   > OK
   > upsert_update_time_test.id	upsert_update_time_test.data	upsert_update_time_test.update_time
   > 1	103	2022-08-26 15:45:23
   > Time taken: 0.134 seconds, Fetched: 1 row(s)
   
   ![image](https://user-images.githubusercontent.com/26053387/187428640-94f5e24d-6381-4a27-acc4-b4341d7d5242.png)
   
   ![image](https://user-images.githubusercontent.com/26053387/187427949-a83c029d-66b2-4c45-9866-d96abbe07033.png)
   
   
   The query results above show that a v2 table can **_return a historical version of a row when the filter matches the historical data_**. Is this a bug, or is there something wrong with how I am using it? Has anybody else run into this?
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] closed issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #5671:  The upsert mode can query the historical version of the data under certain conditions
URL: https://github.com/apache/iceberg/issues/5671




[GitHub] [iceberg] Shane-Yu closed issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by GitBox <gi...@apache.org>.
Shane-Yu closed issue #5671:  The upsert mode can query the historical version of the data under certain conditions
URL: https://github.com/apache/iceberg/issues/5671




[GitHub] [iceberg] github-actions[bot] commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1613941865

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] github-actions[bot] commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1636576962

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'




[GitHub] [iceberg] Shane-Yu commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by GitBox <gi...@apache.org>.
Shane-Yu commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1231565348

   @rdblue @openinx @stevenzwu @kbendick   Can you guys take some time to look at this?




[GitHub] [iceberg] openinx commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1232842923

   Do you mean that it should return an empty row set when executing the following SQL?
   
   ```sql
   hive (iceberg_yx)> select * from upsert_update_time_test where update_time <= '2022-08-26 15:45:00';
   OK
   upsert_update_time_test.id upsert_update_time_test.data upsert_update_time_test.update_time
   1 101 2022-08-26 15:44:50
   ```




[GitHub] [iceberg] Shane-Yu commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by GitBox <gi...@apache.org>.
Shane-Yu commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1236900418

   > Do you mean that it should return an empty row set when executing the following SQL?
   > 
   > ```sql
   > hive (iceberg_yx)> select * from upsert_update_time_test where update_time <= '2022-08-26 15:45:00';
   > OK
   > upsert_update_time_test.id upsert_update_time_test.data upsert_update_time_test.update_time
   > 1 101 2022-08-26 15:44:50
   > ```
   > 
   > I'm not quite sure whether your input stream `rowDataStream` really transforms the `U, 1, .. ` record into an `UPDATE` event; you can try to confirm this.
   
   
   Yeah, the result should be empty. This seems to be related to this PR: https://github.com/apache/iceberg/pull/4316#issuecomment-1066097462. However, the problem only occurs for Parquet files written with 'write.metadata.metrics.default'='full'. After setting 'write.metadata.metrics.default'='count', the problem goes away.
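   
   A minimal sketch of that workaround in Hive, assuming the storage handler forwards table properties to the Iceberg metadata (not verified for this setup). The Iceberg configuration docs list this metrics mode as `counts` (alongside `none`, `truncate(length)` and `full`), so that spelling is used here. The change should only affect files written after it is applied; files that already exist keep the metrics they were written with.
   
   ```sql
   -- Assumption: ALTER TABLE ... SET TBLPROPERTIES reaches the Iceberg table
   -- through HiveIcebergStorageHandler in the deployed version.
   ALTER TABLE upsert_update_time_test SET TBLPROPERTIES (
     'write.metadata.metrics.default'='counts'
   );
   ```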
   




[GitHub] [iceberg] chenwyi2 commented on issue #5671: The upsert mode can query the historical version of the data under certain conditions

Posted by GitBox <gi...@apache.org>.
chenwyi2 commented on issue #5671:
URL: https://github.com/apache/iceberg/issues/5671#issuecomment-1358800244

   So the problem is write.metadata.metrics.default=full?

