Posted to dev@hudi.apache.org by nishith agarwal <na...@apache.org> on 2019/05/17 18:54:56 UTC

Upgrade HUDI to Hive 2.x

All,

Is anyone using Hudi with Hive 1.x? Currently, Hudi has a dependency on
Hive 1.x and works against Hive 2.x by using specific profiles.
There are backward-incompatible changes in the HiveRecordReader between
Hive 1.x and Hive 2.x. I'm planning to upgrade to Hive 2.x, which would
essentially mean that Hudi's realtime view (HudiRealtimeInputFormat) will
NOT work with Hive 1.x anymore (particularly if the schema has nested
columns). Also, I'm unsure whether the Hive 2.x protocol is backward
compatible with Hive 1.x (right now, we depend on forward compatibility
for Hudi to work with 2.x and beyond).
Let me know what you guys think.

Thanks,
Nishith

Re: Upgrade HUDI to Hive 2.x

Posted by Balaji Varadarajan <v....@ymail.com.INVALID>.
Thanks Kabeer. That would be very helpful.
Balaji.V

On Sunday, May 19, 2019, 3:26:22 PM PDT, Kabeer Ahmed <ka...@linuxmail.org> wrote:
> Hi Balaji,
>
> I never had this issue with the 0.4.5 release. The only release where I hit it was 0.4.6-SNAPSHOT, about 3 to 5 weeks ago. I resorted to making the changes manually in my local copy to get some testing done. I am a bit confused as to how CDH 5.7 is working with 0.4.6, as the Hive version is the same in CDH 5.7 and CDH 5.13.
> I shall take the latest 0.4.6 release, retest, and then come back to you with my findings. It shouldn't take much time. I shall circle back in 2 days.
> Thanks
> Kabeer.


Re: Upgrade HUDI to Hive 2.x

Posted by Kabeer Ahmed <ka...@linuxmail.org>.
Hi Balaji,

I never had this issue with the 0.4.5 release. The only release where I hit it was 0.4.6-SNAPSHOT, about 3 to 5 weeks ago. I resorted to making the changes manually in my local copy to get some testing done. I am a bit confused as to how CDH 5.7 is working with 0.4.6, as the Hive version is the same in CDH 5.7 and CDH 5.13.
I shall take the latest 0.4.6 release, retest, and then come back to you with my findings. It shouldn't take much time. I shall circle back in 2 days.
Thanks
Kabeer.

On May 19 2019, at 8:43 pm, vbalaji@apache.org wrote:
> +1 on deprecating Hive 1.1.
> On the other point Kabeer raised:
> Hey Kabeer,
> The Hive integration with CDH 5.7.x still works fine. We internally use the hive-sync capability of the latest version of Hudi to let DeltaStreamer sync to Hive 1.x tables. We do not have a CDH 5.13 setup. Did older versions of Hudi (pre-0.4.6) work fine for you with CDH 5.13?
> Balaji.V


Re: Upgrade HUDI to Hive 2.x

Posted by "vbalaji@apache.org" <vb...@apache.org>.
+1 on deprecating Hive 1.1.
On the other point Kabeer raised:
Hey Kabeer,
The Hive integration with CDH 5.7.x still works fine. We internally use the hive-sync capability of the latest version of Hudi to let DeltaStreamer sync to Hive 1.x tables. We do not have a CDH 5.13 setup. Did older versions of Hudi (pre-0.4.6) work fine for you with CDH 5.13?
Balaji.V

On Sunday, May 19, 2019, 11:45:24 AM PDT, Kabeer Ahmed <ka...@linuxmail.org> wrote:
> Hi,
>
> I think it is OK to deprecate Hive 1.1 support. With the 0.4.6-SNAPSHOT build I was using (the latest as of about 3 weeks ago), I did face issues when working with the Hive 1.1 that is bundled as part of the CDH 5.13 Docker image. I had to make the manual tweaks listed at https://github.com/bvaradar/hudi/commit/e189734a07b8782ea1d21b3c780dfc61c2ab8f2b to get it to work.
> I thought CDH was being used at Uber. Have you upgraded to CDH 6.x?
> In summary, my experience has been that Hive 1.1 has issues without the changes listed above, so 0.4.6-SNAPSHOT didn't work for me. I think it would be great to document that Hive 1.1 support is deprecated and doesn't work beyond 0.4.5.
> Thanks
> Kabeer.


Re: Upgrade HUDI to Hive 2.x

Posted by Kabeer Ahmed <ka...@linuxmail.org>.
Hi,

I think it is OK to deprecate Hive 1.1 support. With the 0.4.6-SNAPSHOT build I was using (the latest as of about 3 weeks ago), I did face issues when working with the Hive 1.1 that is bundled as part of the CDH 5.13 Docker image. I had to make the manual tweaks listed at https://github.com/bvaradar/hudi/commit/e189734a07b8782ea1d21b3c780dfc61c2ab8f2b to get it to work.
I thought CDH was being used at Uber. Have you upgraded to CDH 6.x?
In summary, my experience has been that Hive 1.1 has issues without the changes listed above, so 0.4.6-SNAPSHOT didn't work for me. I think it would be great to document that Hive 1.1 support is deprecated and doesn't work beyond 0.4.5.
Thanks
Kabeer.

On May 17 2019, at 11:18 pm, Vinoth Chandar <vi...@apache.org> wrote:
> I am in favor of deprecating Hive 1.x unless someone has a strong
> objection. Most cloud offerings, like EMR and Dataproc, support Hive 2.x,
> and Hive 3.x adoption is going to grow.
> This seems like a move in the right direction.
>
> /thanks/vinoth


Re: Upgrade HUDI to Hive 2.x

Posted by Vinoth Chandar <vi...@apache.org>.
I am in favor of deprecating Hive 1.x unless someone has a strong
objection. Most cloud offerings, like EMR and Dataproc, support Hive 2.x,
and Hive 3.x adoption is going to grow.
This seems like a move in the right direction.

/thanks/vinoth

On Fri, May 17, 2019 at 11:55 AM nishith agarwal <na...@apache.org>
wrote:

> All,
>
> Is anyone using Hudi with Hive 1.x? Currently, Hudi has a dependency on
> Hive 1.x and works against Hive 2.x by using specific profiles.
> There are backward-incompatible changes in the HiveRecordReader between
> Hive 1.x and Hive 2.x. I'm planning to upgrade to Hive 2.x, which would
> essentially mean that Hudi's realtime view (HudiRealtimeInputFormat) will
> NOT work with Hive 1.x anymore (particularly if the schema has nested
> columns). Also, I'm unsure whether the Hive 2.x protocol is backward
> compatible with Hive 1.x (right now, we depend on forward compatibility
> for Hudi to work with 2.x and beyond).
> Let me know what you guys think.
>
> Thanks,
> Nishith
>