Posted to commits@hudi.apache.org by "Yann Byron (Jira)" <ji...@apache.org> on 2022/01/11 14:12:00 UTC

[jira] [Updated] (HUDI-3213) compaction should not change the commit time

     [ https://issues.apache.org/jira/browse/HUDI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yann Byron updated HUDI-3213:
-----------------------------
    Description: 
After the sixth operation in `TestMORDataSource.testCount`, which inserts two records and triggers `compaction`, `hudiIncDF6.count()` returns 152. Of these, 150 are records that `compaction` has just rewritten: 100 records updated in the second and third operations and 50 records updated in the fifth operation. The other 2 are the records inserted in the sixth operation.

The correct result is 2; the 150 compacted records should not be counted.

The cause is that `compaction` changes the commit time of some records that were later updated and stored in log files.
{code:java}
val hudiIncDF6 = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
  .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
  .load(basePath)
// compaction updated 150 rows + inserted 2 new rows
assertEquals(152, hudiIncDF6.count()) {code}
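
To illustrate the arithmetic, here is a minimal, Spark-free sketch of the behavior described above. It is not Hudi code: `Rec`, `incrementalCount`, and the instant times `"003"`, `"005"`, `"006"` are hypothetical, and the sketch assumes an incremental read keeps records whose commit time is after the begin instant and not after the end instant. It also simplifies by stamping all 100 earlier-updated records with a single instant.

```scala
object IncrementalCountSketch {
  // Hypothetical record model: a key plus the instant time of the commit
  // that is recorded as having last written the record.
  final case class Rec(key: String, commitTime: String)

  // Assumed incremental-read semantics: begin exclusive, end inclusive.
  // Instant times here are fixed-width strings, so compareTo orders them
  // chronologically.
  def incrementalCount(records: Seq[Rec], begin: String, end: String): Int =
    records.count { r =>
      r.commitTime.compareTo(begin) > 0 && r.commitTime.compareTo(end) <= 0
    }

  // Hypothetical instant times standing in for the test's commits.
  val commit3 = "003"
  val commit5 = "005"
  val commit6 = "006"

  val updatedEarlier: Seq[Rec]  = (1 to 100).map(i => Rec(s"a$i", commit3))
  val updatedInFifth: Seq[Rec]  = (1 to 50).map(i => Rec(s"b$i", commit5))
  val insertedInSixth: Seq[Rec] = (1 to 2).map(i => Rec(s"c$i", commit6))

  // Expected behavior: compaction keeps each record's original commit time.
  val preserved: Seq[Rec] = updatedEarlier ++ updatedInFifth ++ insertedInSixth

  // Reported behavior: compaction restamps the 150 rewritten records.
  val restamped: Seq[Rec] =
    (updatedEarlier ++ updatedInFifth).map(_.copy(commitTime = commit6)) ++ insertedInSixth

  def main(args: Array[String]): Unit = {
    println(incrementalCount(preserved, commit5, commit6)) // prints 2
    println(incrementalCount(restamped, commit5, commit6)) // prints 152
  }
}
```

Under these assumptions, only the restamped variant pulls the 150 compacted records into the `commit5Time` to `commit6Time` window, which matches the 152 observed in the test.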
 

 

  was:
After the sixth operation in `TestMORDataSource.testCount`, which inserts two records and triggers `compaction`, `hudiIncDF6.count()` returns 152. Of these, 150 are records that `compaction` has just rewritten: 100 records updated in the second and third operations and 50 records updated in the fifth operation. The other 2 are the records inserted in the sixth operation.

The correct result is 2; the 150 compacted records should not be counted.

 
{code:java}
val hudiIncDF6 = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
  .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
  .load(basePath)
// compaction updated 150 rows + inserted 2 new rows
assertEquals(152, hudiIncDF6.count()) {code}
 

 


> compaction should not change the commit time
> --------------------------------------------
>
>                 Key: HUDI-3213
>                 URL: https://issues.apache.org/jira/browse/HUDI-3213
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration, Writer Core
>            Reporter: Yann Byron
>            Assignee: Yann Byron
>            Priority: Major
>             Fix For: 0.11.0
>
>
> After the sixth operation in `TestMORDataSource.testCount`, which inserts two records and triggers `compaction`, `hudiIncDF6.count()` returns 152. Of these, 150 are records that `compaction` has just rewritten: 100 records updated in the second and third operations and 50 records updated in the fifth operation. The other 2 are the records inserted in the sixth operation.
> The correct result is 2; the 150 compacted records should not be counted.
> The cause is that `compaction` changes the commit time of some records that were later updated and stored in log files.
> {code:java}
> val hudiIncDF6 = spark.read.format("org.apache.hudi")
>   .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
>   .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
>   .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
>   .load(basePath)
> // compaction updated 150 rows + inserted 2 new rows
> assertEquals(152, hudiIncDF6.count()) {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)