Posted to user@crunch.apache.org by Josh Wills <jw...@cloudera.com> on 2014/02/10 08:17:47 UTC

HCatalog Source for Crunch

Hey all,

I wanted to solicit feedback on
https://issues.apache.org/jira/browse/CRUNCH-340, which adds HCatalog
Source and Target types to Crunch. I'm of two minds about adding support
for HCatalog to the core project. In general, I like for there to be more
ways to read and write data from Crunch, esp. when we can add
interoperability w/Hive, which is such a natural complement to Crunch. On
the other hand, I'm concerned about the costs that Hive's not-super-nice
dependency framework imposes on Crunch clients and the project; that is to
say, I'm worried that bringing in HCat dependencies makes it harder for us
to update existing dependencies and to add new dependencies that would
conflict with the Hive/HCat dependencies. I'd like to hear from at least a
few folks that the HCat support would be useful for them before we promote
it from a useful extension to a feature of the core project.

Thanks!
Josh

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: HCatalog Source for Crunch

Posted by Matthias Friedrich <ma...@mafr.de>.
Hi,

As someone who has had to deal with dependency hell many times, I'd really
appreciate a separate Maven module for HCat.

Regards,
  Matthias


Re: HCatalog Source for Crunch

Posted by Josh Wills <jw...@cloudera.com>.
+1, I think that's fine.


On Wed, Feb 12, 2014 at 6:58 AM, Chao Shi <st...@live.com> wrote:

> Hi there,
>
> I think we have agreed to make it a separate module. I will update
> CRUNCH-340 soon if there's no opposition.

Re: HCatalog Source for Crunch

Posted by Chao Shi <st...@live.com>.
Hi there,

I think we have agreed to make it a separate module. I will update
CRUNCH-340 soon if there's no opposition.



Re: HCatalog Source for Crunch

Posted by Gabriel Reid <ga...@gmail.com>.
I'm not currently in a position to make use of the HCat support, but
it definitely sounds like a cool thing to have in Crunch.

I also agree that it should be in a separate Maven module.

- Gabriel

