Posted to dev@accumulo.apache.org by Edward Seidl <et...@live.com> on 2019/10/25 22:40:24 UTC

new contributor intro

Hi all,
I'm not exactly new to contributing to Accumulo, but I just did my first PR through GitHub today, so I thought I'd send the recommended intro email.  (I am trying from a different email address, since apparently my last post from work got sent to most people's junk folder.)  I've been using Accumulo for a number of years (it was still called Cloudbase when I started).  I'm currently involved in evaluating how Accumulo interacts with HDFS erasure coding, and I'm also generally interested in Accumulo performance.

Cheers,
Ed Seidl

Re: new contributor intro

Posted by Edward Seidl <et...@live.com>.
Thanks Christopher.

I haven't yet hit any blockers to using EC with Accumulo.  There's still a lot of work to be done testing performance and rooting out any hidden gotchas.  The only big issue I've run across, which I mention in my blog post, is that the WAL absolutely cannot be written to an erasure-coded directory.  It might be a good idea to add some guard code to the DfsLogger that checks the policies set on the WAL directory and at least logs a warning if EC is detected.
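
A minimal sketch of what such a guard might look like, in Java. This is not existing Accumulo code: the class name WalEcGuard, the check method, and the example path are all hypothetical. The policy lookup is passed in as a function so the sketch stays self-contained; on Hadoop 3 it could presumably be backed by DistributedFileSystem.getErasureCodingPolicy(Path), which returns null when no EC policy applies.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical guard for the WAL directory; illustrative only, not part of Accumulo. */
final class WalEcGuard {
  private WalEcGuard() {}

  /**
   * Returns a warning message if an erasure coding policy is in effect on the WAL directory.
   *
   * @param walDir the WAL directory to check
   * @param policyLookup assumed helper that returns the name of the EC policy in effect
   *        for a path, or an empty Optional if the path is plainly replicated
   */
  static Optional<String> check(String walDir,
      Function<String, Optional<String>> policyLookup) {
    return policyLookup.apply(walDir)
        .map(policy -> "WAL directory " + walDir + " uses erasure coding policy " + policy
            + "; write-ahead logs should be written to a replicated directory");
  }
}
```

A caller could log the returned message at WARN level, or throw if a stricter check is wanted. Taking the lookup as a parameter also keeps the guard itself free of any direct dependency on HDFS classes.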

I've been working on usability improvements to make working with EC easier.  Right now, setting the policy for a table requires using the "hdfs ec" command and setting policies on /accumulo/tables/<id> and its children manually.  I'm trying to add per-namespace/table properties to control EC (and storage policy), the idea being that a user sets an encoding policy for a namespace, and then any tables created within that namespace will have their tablet directories set to that policy.  I'm also trying to implement changing the EC policy at the directory level whenever the encoding policy property is changed via the shell.  I'd like to invite any interested parties to check out my fork at https://github.com/etseidl/accumulo/tree/ecprops-2.1.  It's already out of date since Keith just checked in some conflicting changes, but you can at least see what I'm trying to accomplish.  I'd appreciate some feedback to let me know if I'm on a reasonable track.  In particular, the propagation of changes is pretty raw (in the past I used the table config observer to detect changes, but that disappeared).  I'd also like to know whether my approach would work with how you envision abstracting the filesystem.  I currently check for DistributedFileSystem in VolumeManagerImpl, but am not keen on using instanceof.
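
To make the inheritance idea concrete, here is a rough Java sketch. It is not the code in the fork: the class EcPolicyResolver, its methods, and the property name "table.ec.policy" are made up for illustration. It shows a table-level setting overriding a namespace-level one, and builds the equivalent "hdfs ec -setPolicy" invocation for a table directory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; illustrative only, not taken from the Accumulo code base. */
final class EcPolicyResolver {
  /** Assumed property name, chosen for this sketch only. */
  static final String EC_POLICY_PROP = "table.ec.policy";

  private EcPolicyResolver() {}

  /** A table-level setting wins; otherwise the namespace setting is inherited. */
  static Optional<String> resolve(Map<String, String> namespaceProps,
      Map<String, String> tableProps) {
    String policy = tableProps.get(EC_POLICY_PROP);
    if (policy == null) {
      policy = namespaceProps.get(EC_POLICY_PROP);
    }
    return Optional.ofNullable(policy);
  }

  /** Builds the manual "hdfs ec" command line for a table directory. */
  static List<String> setPolicyCommand(String tableId, String policy) {
    return Arrays.asList("hdfs", "ec", "-setPolicy",
        "-path", "/accumulo/tables/" + tableId,
        "-policy", policy);
  }
}
```

For example, a namespace set to RS-6-3-1024k with no table-level override resolves to RS-6-3-1024k, and setPolicyCommand("2a", "RS-6-3-1024k") yields the arguments for "hdfs ec -setPolicy -path /accumulo/tables/2a -policy RS-6-3-1024k" (the table id 2a is just an example).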

I don't know if any of this is baked enough for a pull request, but I will open one if you'd prefer.

Thanks,
Ed

________________________________
From: Christopher <ct...@apache.org>
Sent: Wednesday, October 30, 2019 3:07 PM
To: accumulo-dev <de...@accumulo.apache.org>
Subject: Re: new contributor intro

Awesome! Thanks for the intro Ed. I'm very curious if there's any
improvements or features we need to change in Accumulo to better
support erasure coding in HDFS (and especially if we can do so without
increasing our entanglement with Hadoop HDFS, specifically, as it is a
long-term goal of mine to abstract our DFS-related code, to more
easily use alternative implementations).



Re: new contributor intro

Posted by Christopher <ct...@apache.org>.
Awesome! Thanks for the intro, Ed. I'm very curious whether there are any
improvements or features we need to add or change in Accumulo to better
support erasure coding in HDFS (especially if we can do so without
increasing our entanglement with Hadoop HDFS specifically, as it is a
long-term goal of mine to abstract our DFS-related code so that we can
more easily use alternative implementations).

On Tue, Oct 29, 2019 at 3:54 AM Nikhil Manchanda <sl...@gmail.com> wrote:
>
>
> Nice to (virtually) meet you, Ed!
>
> Great to see you taking a look at Accumulo's interactions with
> HDFS erasure coding. I've been discussing this informally with
> Chris Green and I know he's also very interested in how well this
> works. We haven't really had a chance to run any concrete
> experiments yet, so would love to see what you discover.
>
> Cheers,
> Nikhil
>
> On Mon, Oct 28 2019, Mike Miller <mm...@apache.org> wrote:
> > Thanks for the intro Ed and welcome back!
> >
> > On Fri, Oct 25, 2019 at 6:40 PM Edward Seidl <et...@live.com>
> > wrote:
> >
> >> Hi all,
> >> I'm not exactly new to contributing to Accumulo, but just did
> >> my first PR
> >> through github today, so I thought I'd send the recommended
> >> intro email.
> >> (I am trying from a different email address, since apparently
> >> my last post
> >> from work got sent to most people's junk folder.)  I've been
> >> using Accumulo
> >> for a number of years (it was still called cloudbase when I
> >> started).  I'm
> >> currently involved in evaluating how Accumulo interacts with
> >> HDFS erasure
> >> coding, and also generally interested in Accumulo performance.
> >>
> >> Cheers,
> >> Ed Seidl
> >>

Re: new contributor intro

Posted by Nikhil Manchanda <sl...@gmail.com>.
Nice to (virtually) meet you, Ed!

Great to see you taking a look at Accumulo's interactions with 
HDFS erasure coding. I've been discussing this informally with 
Chris Green and I know he's also very interested in how well this 
works. We haven't really had a chance to run any concrete 
experiments yet, so would love to see what you discover.

Cheers,
Nikhil

On Mon, Oct 28 2019, Mike Miller <mm...@apache.org> wrote:
> Thanks for the intro Ed and welcome back!
>
> On Fri, Oct 25, 2019 at 6:40 PM Edward Seidl <et...@live.com> 
> wrote:
>
>> Hi all,
>> I'm not exactly new to contributing to Accumulo, but just did 
>> my first PR
>> through github today, so I thought I'd send the recommended 
>> intro email.
>> (I am trying from a different email address, since apparently 
>> my last post
>> from work got sent to most people's junk folder.)  I've been 
>> using Accumulo
>> for a number of years (it was still called cloudbase when I 
>> started).  I'm
>> currently involved in evaluating how Accumulo interacts with 
>> HDFS erasure
>> coding, and also generally interested in Accumulo performance.
>>
>> Cheers,
>> Ed Seidl
>>

Re: new contributor intro

Posted by Mike Miller <mm...@apache.org>.
Thanks for the intro Ed and welcome back!

On Fri, Oct 25, 2019 at 6:40 PM Edward Seidl <et...@live.com> wrote:

> Hi all,
> I'm not exactly new to contributing to Accumulo, but just did my first PR
> through github today, so I thought I'd send the recommended intro email.
> (I am trying from a different email address, since apparently my last post
> from work got sent to most people's junk folder.)  I've been using Accumulo
> for a number of years (it was still called cloudbase when I started).  I'm
> currently involved in evaluating how Accumulo interacts with HDFS erasure
> coding, and also generally interested in Accumulo performance.
>
> Cheers,
> Ed Seidl
>