Posted to dev@hudi.apache.org by crazymb <cr...@163.com> on 2020/04/14 15:56:49 UTC

how to understand incremental query?

Hello, everyone,
 
I am new to Hudi. After reading the quickstart, I started experimenting, but I am confused about incremental queries. My question: each time I insert new data, I re-run the previous incremental query, but the newly inserted data does not appear. Why?


My test environment is Hudi 0.5.2.


1. Create a table following the quickstart and insert 10 records (table type: COPY_ON_WRITE).
2. Generate 10 new records and append them to the table.
3. Following the quickstart, run an incremental query; it returns 10 results.
4. Generate 10 more records and append them to the table.
5. Run only the SQL of the incremental query (without re-running spark.read.load and createOrReplaceTempView); it still returns 10 results. Shouldn't it return 20?


Thanks a lot!

Re: how to understand incremental query?

Posted by Bhavani Sudha <bh...@gmail.com>.

Hi there,

Thanks for trying the quickstart. You might want to reload the data in step
5 before running the incremental query. Since the temp view is not refreshed,
it keeps serving the old data. The quickstart uses only Spark so that users
can try the Hudi APIs without external dependencies. Ideally, though, if the
incremental query is against a table registered with Hive or in S3, the last
step should work without a reload.
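
A minimal sketch of that reload step, in PySpark style. It assumes an existing
SparkSession named spark with the Hudi bundle on its classpath, and it uses the
datasource option keys from the Hudi quickstart. The helper names
(incremental_read_options, refresh_incremental_view), the view name, and the
base_path and begin_time arguments are hypothetical and only for illustration.

```python
# Sketch only: `spark` is assumed to be a SparkSession with Hudi available.

def incremental_read_options(begin_time):
    """Build the Hudi datasource options for an incremental read.

    begin_time is a Hudi commit instant (a timestamp string); the incremental
    query returns changes committed after that instant.
    """
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_time,
    }


def refresh_incremental_view(spark, base_path, begin_time,
                             view_name="hudi_trips_incremental"):
    """Re-read the table and re-register the temp view.

    The view only reflects the DataFrame it was registered from, so this is
    called again after each write to make later commits visible.
    """
    df = (
        spark.read.format("hudi")
        .options(**incremental_read_options(begin_time))
        .load(base_path)
    )
    df.createOrReplaceTempView(view_name)
    return df
```

With this in place, step 5 would call refresh_incremental_view(spark, base_path,
begin_time) first and then run the same SQL against the view. Which records come
back still depends on the begin_time that is passed in.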

Hope that helps!

Thanks,
Sudha
