Posted to dev@accumulo.apache.org by Sean Busbey <bu...@cloudera.com> on 2013/10/21 19:28:50 UTC

contrib vs. pushed to other projects

Heya,

I'm working on the docs for our contrib projects, and I noticed the Hive
Serde wasn't about. ACCUMULO-143[1] shows patch available, but it didn't
make it into a repo before the git move.

Based on the discussion back around the flume sink, my guess is things like
this should be in other projects if possible. However, this particular
contrib is basically the equivalent of the Pig Storage contrib[2].

This leaves me with a few questions I'd like to get some consensus on:

1) Generally, do we want the contrib project guide to explicitly say that
interoperability projects should favor pushing Accumulo-specific components
into other projects? (e.g. an Accumulo Sink for Flume)

2) Specifically for Hive, do we want a contrib project added for this Serde,
or try to get it into Hive?

3) For consistency, would we prefer Accumulo-Pig get pushed into Pig (or
more likely Piggybank)?


[1]: https://issues.apache.org/jira/browse/ACCUMULO-143
[2]: https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git;a=summary
-- 
Sean

Re: contrib vs. pushed to other projects

Posted by Josh Elser <jo...@gmail.com>.
Had this sitting in a draft... but I think I'm for a case-by-case basis, 
leaning towards upstream contributions when at all possible.

Some more stuff inline.

On 10/21/13 10:28 AM, Sean Busbey wrote:
> Heya,
>
> I'm working on the docs for our contrib projects, and I noticed the Hive
> Serde wasn't about. ACCUMULO-143[1] shows patch available, but it didn't
> make it into a repo before the git move.

Should we make a repo for the Hive integration? Seems potentially 
worthwhile as we figure out what to do with it.

>
> Based on the discussion back around the flume sink, my guess is things like
> this should be in other projects if possible. However, this particular
> contrib is basically the equivalent of the Pig Storage contrib[2].
>
> This leaves me with a few questions I'd like to get some consensus on:
>
> 1) Generally, do we want the contrib project guide to explicitly say that
> interoperability projects should favor pushing Accumulo specific components
> into other projects? (e.g. an Accumulo Sink for Flume)
>
> 2) Specifically for Hive, do we want a contrib project added for this Serde,
> or try to get it into Hive?
>
> 3) For consistency, would we prefer Accumulo-Pig get pushed into Pig (or
> more likely Piggybank)?

As I bring myself up to speed on Pig, my intent was to leave it on its 
own. Once I feel confident in how it all works and all that, my intent 
was to bring it up to the Pig community to see what they would think 
about upstream inclusion. This would mirror a bit of what exists for 
HBaseStorage.

I'm positive there are pros/cons; I just don't know what they are yet 
:). Any other feedback/knowledge/advice would definitely be welcome. 
(https://issues.apache.org/jira/browse/ACCUMULO-1783)

>
> [1]: https://issues.apache.org/jira/browse/ACCUMULO-143
> [2]: https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git;a=summary
>

Re: contrib vs. pushed to other projects

Posted by Donald Miner <dm...@clearedgeit.com>.
Speaking from my recent experience with the flume/accumulo integration,
here are my 2 cents. The Accumulo sink was suggested to get committed into
Flume, which is what we are doing now.

First, I was impressed by how willing the Flume community was to
help us conform to their standards to get it committed. This appears pretty
normal in my experience across projects. However, reaching out to someone
in the flume community and playing by their rules makes it slightly harder
for me to contribute something like this.

So, regardless of whether it makes sense technically, maybe right now it's just
mostly about getting the contributions in. Today, I think it's more likely
an Accumulo developer wants to integrate with other things than the other
way around... so accumulo-contrib is the easiest path for someone like me
who wants to make these types of contributions. I think this is a corollary
to Chris's #2: Who is going to be responsible for maintaining it?

When other projects start doing Accumulo integration
independently, we'll know Accumulo "made it" ;)

-don


On Mon, Oct 21, 2013 at 3:27 PM, Sean Busbey <bu...@cloudera.com> wrote:

> On Mon, Oct 21, 2013 at 2:05 PM, Christopher <ct...@apache.org> wrote:
>
> > I think the answer to where things should go depends on two main factors:
> >
> > 1) Which project(s) does it benefit the most? (does it benefit
> > Accumulo users more to have another way to access Accumulo, or does it
> > benefit Hive users more to have another database to query from?), and
> > 2) Who is going to be responsible for maintaining it?
> >
> > The first question is probably a very subjective one, so I expect the
> > second to play a bigger role. Perhaps the discussion should involve
> > both communities to consolidate potentially multiple efforts?
> >
>
> I think the second question also comes down to community-specific
> subjectivity. For some projects, being in core doesn't imply a different
> level of maintenance than being in contrib or being in an outside repo (see
> the discussion from this summer around Trevni in Hive) -- if no one uses
> something it doesn't get maintained. If that happens long enough, it gets
> cut. I don't think we should use that lack of maintenance assurance to mean
> that we keep things in Accumulo just because we care about them being
> maintained.
>
> I tend to favor Jon's reasoning[1], which mostly focuses on which API is
> more likely to change in a way that requires maintenance. In the case of
> things like Flume, Hive, or Pig, I think the level of familiarity needed to
> maintain an integration point requires more knowledge of the non-Accumulo
> side.
>
> If the answer is that it's all case-by-case, then I can just put wording to
> that end in the contrib document. I just want to make sure people have some
> idea of our reasoning as a project without reading our mail archive or
> jiras.
>
> [1]:
>
> http://mail-archives.apache.org/mod_mbox/accumulo-dev/201307.mbox/%3CCAAha9a23xdnJOQyZBT7SOfDtb-Eg2Y2vUJ%2BVH3Eh3AA-rF0sbQ%40mail.gmail.com%3E
>
> --
> Sean
>

Re: contrib vs. pushed to other projects

Posted by Sean Busbey <bu...@cloudera.com>.
On Mon, Oct 21, 2013 at 2:05 PM, Christopher <ct...@apache.org> wrote:

> I think the answer to where things should go depends on two main factors:
>
> 1) Which project(s) does it benefit the most? (does it benefit
> Accumulo users more to have another way to access Accumulo, or does it
> benefit Hive users more to have another database to query from?), and
> 2) Who is going to be responsible for maintaining it?
>
> The first question is probably a very subjective one, so I expect the
> second to play a bigger role. Perhaps the discussion should involve
> both communities to consolidate potentially multiple efforts?
>

I think the second question also comes down to community-specific
subjectivity. For some projects, being in core doesn't imply a different
level of maintenance than being in contrib or being in an outside repo (see
the discussion from this summer around Trevni in Hive) -- if no one uses
something it doesn't get maintained. If that happens long enough, it gets
cut. I don't think we should use that lack of maintenance assurance to mean
that we keep things in Accumulo just because we care about them being
maintained.

I tend to favor Jon's reasoning[1], which mostly focuses on which API is
more likely to change in a way that requires maintenance. In the case of
things like Flume, Hive, or Pig, I think the level of familiarity needed to
maintain an integration point requires more knowledge of the non-Accumulo
side.

If the answer is that it's all case-by-case, then I can just put wording to
that end in the contrib document. I just want to make sure people have some
idea of our reasoning as a project without reading our mail archive or
jiras.

[1]:
http://mail-archives.apache.org/mod_mbox/accumulo-dev/201307.mbox/%3CCAAha9a23xdnJOQyZBT7SOfDtb-Eg2Y2vUJ%2BVH3Eh3AA-rF0sbQ%40mail.gmail.com%3E

-- 
Sean

Re: contrib vs. pushed to other projects

Posted by Christopher <ct...@apache.org>.
I think the answer to where things should go depends on two main factors:

1) Which project(s) does it benefit the most? (does it benefit
Accumulo users more to have another way to access Accumulo, or does it
benefit Hive users more to have another database to query from?), and
2) Who is going to be responsible for maintaining it?

The first question is probably a very subjective one, so I expect the
second to play a bigger role. Perhaps the discussion should involve
both communities to consolidate potentially multiple efforts?

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Mon, Oct 21, 2013 at 1:28 PM, Sean Busbey <bu...@cloudera.com> wrote:
> Heya,
>
> I'm working on the docs for our contrib projects, and I noticed the Hive
> Serde wasn't about. ACCUMULO-143[1] shows patch available, but it didn't
> make it into a repo before the git move.
>
> Based on the discussion back around the flume sink, my guess is things like
> this should be in other projects if possible. However, this particular
> contrib is basically the equivalent of the Pig Storage contrib[2].
>
> This leaves me with a few questions I'd like to get some consensus on:
>
> 1) Generally, do we want the contrib project guide to explicitly say that
> interoperability projects should favor pushing Accumulo specific components
> into other projects? (e.g. an Accumulo Sink for Flume)
>
> 2) Specifically for Hive, do we want a contrib project added for this Serde,
> or try to get it into Hive?
>
> 3) For consistency, would we prefer Accumulo-Pig get pushed into Pig (or
> more likely Piggybank)?
>
>
> [1]: https://issues.apache.org/jira/browse/ACCUMULO-143
> [2]: https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git;a=summary
> --
> Sean