Posted to general@incubator.apache.org by Srihari Srinivasan <ar...@yahoo.com.INVALID> on 2016/07/15 13:49:33 UTC

Request for advice, collaboration

Hi Folks,
I am Hari, a developer with a company called ThoughtWorks. We've been developing data pipelines on Hadoop, Spark, etc. for a while now. From our experience with different customers, we've noticed a recurring need to carry out tasks such as data preparation and data anonymization on large datasets using Java MapReduce and Spark. Based on this experience, we have been working on building a couple of libraries targeted at data preparation and data protection to begin with. They are hosted under an umbrella project called Data Commons at the moment (inspired by the Apache Commons project, which is organized around a similar theme).
At the moment this is a fledgling project and its contributions are driven by our data team. However, we are very keen on making this part of the larger Apache collective and making it a community-driven effort.
Hence, I am reaching out to you folks for advice on the best way forward for this effort. We are also open to exploring collaborations with existing projects that are already part of Apache. Please share your thoughts and advice.
-- Hari

Re: Request for advice, collaboration

Posted by Venkatesh Seetharam <ve...@apache.org>.

Hi Hari,

I'm on the Apache Falcon PMC. Since Falcon is a data pipeline management
solution for Hadoop, there might be enough interest to explore whether we
can collaborate, either as part of Falcon or as a separate project.

Can you please elaborate on the scope, and on whether orchestration is part
of it? Falcon also integrates with Apache Atlas, a metadata solution that
I'm part of as well.

Thanks!
Venkatesh

Re: Request for advice, collaboration

Posted by Nick Kew <ni...@apache.org>.

On Fri, 2016-07-15 at 13:49 +0000, Srihari Srinivasan wrote:
> Hence, I am reaching out to you folks for advice on the best way forward for this effort. We are also open to exploring collaborations with existing projects that are already part of Apache. Please share your thoughts and advice.
> -- Hari
> 
You might want to start by talking to the projects you consider
most relevant.  If you find interest there, they'd be the people who
might help bring your project to the incubator.  If not, you could
consider why not and see where that takes you.

Not everyone does that.  But since you're asking, I guess you
don't already have a clear alternative idea.

-- 
Nick Kew

