Posted to dev@hudi.apache.org by Bhavani Sudha Saktheeswaran <bh...@uber.com.INVALID> on 2019/05/20 21:12:35 UTC

Re: [DISCUSS] HIP-4: Faster Hive incremental pull queries

Created the HIP here: https://cwiki.apache.org/confluence/display/HUDI/HIP-4
Please share your thoughts.

Thanks,
Sudha

On Mon, May 20, 2019 at 11:31 AM Bhavani Sudha Saktheeswaran <
bhasudha@uber.com> wrote:

> Works now. Thanks Vinoth!
>
> -Sudha
>
> On Mon, May 20, 2019 at 11:05 AM Vinoth Chandar <vi...@apache.org> wrote:
>
>> I just gave you wiki access. Can you try again?
>>
>> On Mon, May 20, 2019 at 10:53 AM Bhavani Sudha Saktheeswaran
>> <bh...@uber.com.invalid> wrote:
>>
>> > Hi,
>> >
>> > I am trying to create a HIP in cwiki (username: bhasudha). It seems I
>> > need some access to create a HIP. Can you grant me permission?
>> >
>> > Thanks,
>> > Sudha
>> >
>> > On Sun, May 19, 2019 at 5:04 PM Bhavani Sudha Saktheeswaran <
>> > bhasudha@uber.com> wrote:
>> >
>> > > Hello all,
>> > >
>> > > Hive incremental queries on Hoodie currently suffer from a limitation:
>> > > when a datestr is not present, they list all partitions (.hoodie and
>> > > the partitions) and end up throwing away many of the files, since the
>> > > `_hoodie_commit_time` column filter excludes them. This can be very
>> > > expensive: it increases query planning time and sometimes causes
>> > > timeouts if the table is large.
>> > > https://issues.apache.org/jira/browse/HUDI-25 tracks the issue.
>> > >
>> > > If we can leverage the timeline and the partitions touched by the
>> > > commits involved in the incremental pull, then we can avoid listing
>> > > all partitions and hence reduce the query planning time. I am planning
>> > > to send a HIP to discuss this further. Please share your thoughts.
>> > >
>> > > Thanks,
>> > > Sudha
>> > >
>> >
>>
>
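
The idea in the original proposal (use the timeline to find the partitions
touched by the commits in the incremental range, and list only those) can be
illustrated with a small sketch. This is not Hudi code and not the design in
HIP-4: the `Commit` class, the `partitions_to_list` function, and the sample
timestamps and partition paths are all hypothetical, and the sketch assumes
that commit times are sortable strings and that the begin time is exclusive.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for one completed commit on the timeline."""
    commit_time: str  # sortable timestamp string (assumed format)
    partitions: frozenset = field(default_factory=frozenset)  # partition paths written


def partitions_to_list(timeline, begin_time, max_commits=None):
    """Return the sorted partition paths written by commits after begin_time.

    Only these partitions would need to be listed during query planning,
    instead of every partition in the table.
    """
    # Keep commits strictly after the begin time, oldest first.
    later = sorted(
        (c for c in timeline if c.commit_time > begin_time),
        key=lambda c: c.commit_time,
    )
    # Optionally cap how many commits the incremental pull consumes.
    if max_commits is not None:
        later = later[:max_commits]
    touched = set()
    for commit in later:
        touched.update(commit.partitions)
    return sorted(touched)


# Made-up timeline: three commits touching date-style partitions.
timeline = [
    Commit("20190517090000", frozenset({"2019/05/17"})),
    Commit("20190518090000", frozenset({"2019/05/17", "2019/05/18"})),
    Commit("20190519090000", frozenset({"2019/05/19"})),
]

print(partitions_to_list(timeline, "20190518090000"))  # ['2019/05/19']
```

In this made-up example, pulling everything after the second commit needs a
listing of one partition rather than three; on a large table the saving in
planning time would depend on how many partitions the selected commits touch.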

Re: [DISCUSS] HIP-4: Faster Hive incremental pull queries

Posted by Vinoth Chandar <vi...@apache.org>.

LGTM. Thanks!

On Mon, May 20, 2019 at 2:19 PM Bhavani Sudha Saktheeswaran
<bh...@uber.com.invalid> wrote:

> Created the HIP here:
> https://cwiki.apache.org/confluence/display/HUDI/HIP-4
> Please share your thoughts.
>
> Thanks,
> Sudha