Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/02/10 12:02:01 UTC

[jira] [Created] (HUDI-1608) MOR w/ global bloom fetches all records for read optimized query

sivabalan narayanan created HUDI-1608:
-----------------------------------------

             Summary: MOR w/ global bloom fetches all records for read optimized query
                 Key: HUDI-1608
                 URL: https://issues.apache.org/jira/browse/HUDI-1608
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.7.0
            Reporter: sivabalan narayanan


Script to reproduce in local spark:

[https://gist.github.com/nsivabalan/7250b794788516f1aec35650c2632364]

```
scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, id, __op from hudi_trips_snapshot order by _hoodie_record_key").show(false)
+-------------------+------------------+----------------------+---+----+
|_hoodie_commit_time|_hoodie_record_key|_hoodie_partition_path|id |__op|
+-------------------+------------------+----------------------+---+----+
|20210210065058     |1                 |1970-01-01            |1  |null|
|20210210065127     |2                 |2020-01-04            |2  |D   |
|20210210065127     |3                 |1970-01-01            |3  |D   |
|20210210065127     |4                 |2020-01-01            |4  |U   |
|20210210065058     |5                 |2020-01-01            |5  |I   |
|20210210065127     |6                 |1998-04-13            |6  |I   |
+-------------------+------------------+----------------------+---+----+
```

After an upsert, the read optimized query returns records from both commits, C1 (20210210065058) and C2 (20210210065127).

Also, I don't see any log files in the partitions; all of the files are parquet files.

 

ls /tmp/hudi_trips_cow/1998-04-13/
0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet
0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet

ls /tmp/hudi_trips_cow/1970-01-01/
7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet
7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet
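
One way to read the listings above: in each partition the two parquet files share the same file ID and differ only in write token and commit time, so they look like two versions of one file group, one written by each commit. The following is a minimal sketch in plain Scala (no Spark or Hudi needed) that groups the file names by file ID and keeps the newest one. It assumes base file names follow the `<fileId>_<writeToken>_<instantTime>.parquet` pattern; `BaseFile` and `BaseFileNames` are hypothetical helpers written for illustration and are not part of Hudi.

```scala
// Hypothetical helper types for illustration only; not Hudi classes.
final case class BaseFile(fileId: String, writeToken: String, instantTime: String)

object BaseFileNames {
  // Assumes the name pattern <fileId>_<writeToken>_<instantTime>.parquet.
  def parse(name: String): Option[BaseFile] =
    name.stripSuffix(".parquet").split("_") match {
      case Array(fileId, writeToken, instantTime) =>
        Some(BaseFile(fileId, writeToken, instantTime))
      case _ => None // name does not match the assumed pattern
    }

  // Keeps, for each file ID, the base file with the greatest instant time.
  // Instant times are fixed-width timestamps, so string comparison orders them.
  def latestPerFileGroup(names: Seq[String]): Map[String, BaseFile] =
    names.flatMap(parse).groupBy(_.fileId).map {
      case (fileId, files) => fileId -> files.maxBy(_.instantTime)
    }
}

// File names copied from the two partition listings in this report.
val listing = Seq(
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-23-12025_20210210065058.parquet",
  "0d1e6a84-d036-42e9-806e-a3075b6bc677-0_1-61-25595_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_2-61-25596_20210210065127.parquet",
  "7b836833-a656-485d-967a-871bdc653dc3-0_3-23-12027_20210210065058.parquet"
)

val latest = BaseFileNames.latestPerFileGroup(listing)
latest.values.foreach(f => println(s"${f.fileId} -> ${f.instantTime}"))
```

For both file IDs the sketch keeps the file written at 20210210065127, i.e. the one from C2. This only illustrates how the listing can be interpreted; it does not reproduce Hudi's actual file selection logic.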

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)