Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2020/01/01 01:14:00 UTC

[jira] [Commented] (HUDI-485) Check for where clause is wrong in HiveIncrementalPuller

    [ https://issues.apache.org/jira/browse/HUDI-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006277#comment-17006277 ] 

Vinoth Chandar commented on HUDI-485:
-------------------------------------

tbh, it's a simple enough utility.. feel free to change it as you see fit..

One thing to keep in mind: when you query from Hive, you need the where `_hoodie_commit_time` > ... filter. We can only filter out file (slice)s using the split-level filtering that Hive does. Back in the day, we tried adding an extra filter pushdown to also select only the rows matching the commit time ranges from the incrementally pulled files, but ran into some issue. If you are interested, we can also revisit that and fix it more nicely.

> Check for where clause is wrong in HiveIncrementalPuller
> --------------------------------------------------------
>
>                 Key: HUDI-485
>                 URL: https://issues.apache.org/jira/browse/HUDI-485
>             Project: Apache Hudi (incubating)
>          Issue Type: Sub-task
>          Components: Incremental Pull, newbie
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>
> HiveIncrementalPuller checks the clause in incrementalSqlFile like this -> 
> if (!incrementalSQL.contains("`_hoodie_commit_time` > '%targetBasePath'")) {
>   LOG.info("Incremental SQL : " + incrementalSQL
>       + " does not contain `_hoodie_commit_time` > %targetBasePath. Please add "
>       + "this clause for incremental to work properly.");
>   throw new HoodieIncrementalPullSQLException(
>       "Incremental SQL does not have clause `_hoodie_commit_time` > '%targetBasePath', which "
>           + "means its not pulling incrementally");
> }
> Basically, we are trying to add a placeholder here, which is later replaced with config.fromCommitTime here -
> incrementalPullSQLtemplate.add("incrementalSQL", String.format(incrementalSQL, config.fromCommitTime));
> Hence, the above check needs to be replaced with `_hoodie_commit_time` > %targetBasePath
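
For illustration only, here is a minimal sketch of the check-then-substitute flow described above. It assumes the placeholder in the incremental SQL is the standard %s specifier that String.format understands; the issue text does not confirm the exact placeholder. The class name IncrementalSqlCheck, the method render, and the constant REQUIRED_CLAUSE are hypothetical and are not part of HiveIncrementalPuller.

```java
// Hypothetical helper, not Hudi code: validates the incremental SQL template
// and then fills in the starting commit time.
final class IncrementalSqlCheck {

    // Assumption: the template marks the commit-time bound with a quoted %s.
    static final String REQUIRED_CLAUSE = "`_hoodie_commit_time` > '%s'";

    // Rejects templates that lack the commit-time clause, then substitutes
    // fromCommitTime for the %s placeholder via String.format.
    static String render(String incrementalSQL, String fromCommitTime) {
        if (!incrementalSQL.contains(REQUIRED_CLAUSE)) {
            throw new IllegalArgumentException(
                "Incremental SQL does not have clause " + REQUIRED_CLAUSE
                    + ", so it would not pull incrementally");
        }
        return String.format(incrementalSQL, fromCommitTime);
    }

    public static void main(String[] args) {
        // Table name and commit time are made-up sample values.
        String template =
            "select * from db.tbl where `_hoodie_commit_time` > '%s'";
        System.out.println(render(template, "20200101000000"));
    }
}
```

The point of the sketch is that the string being checked for and the string handed to String.format must agree on the placeholder; if the check looks for one token and the template carries another, valid templates get rejected.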



--
This message was sent by Atlassian Jira
(v8.3.4#803005)