Posted to user@kudu.apache.org by Mike Percy <mp...@apache.org> on 2018/07/17 18:39:27 UTC

Growing the Kudu community

Hi Apache Kudu community,

Apologies for cross-posting; we just wanted to reach a broad audience for
this topic.

Grant and I have been brainstorming about what we can do to grow the
community of Kudu developers and users. We think Kudu has a lot going for
it, but not everybody knows what it is and what it’s capable of. Focusing
and combining our collective efforts to increase awareness (marketing) and
to reduce barriers to contribution and adoption could be a good way to
achieve organic growth.

We’d like to hear your ideas about what barriers and pain points exist and
any ideas you may have to fix some of those things -- especially ideas
requiring minimal effort and maximum impact.

To kick this off, here are some ideas Grant and I have come up with so far,
in sort of a rough priority order:

Ideas for general improvements

   1. Java MiniCluster support out of the box (KUDU-2411)
      1. This will enable integration with other projects in a way that
      allows them to test against a running Kudu cluster and ensure quality
      without having to build it themselves.
      2. Create a dedicated Maven-consumable Java module for a Kudu
      MiniCluster
      3. Pre-built binary artifacts (for testing use only) downloadable
      with MiniCluster (Linux / macOS)
      4. Ship all dependencies (even security deps, which will not be fixed
      if CVEs are found)
      5. Make the binaries Linux distro-independent by building on an old
      distro (EL6)
   2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
      1. Remove barrier to submitting a patch
      2. Latest version of Gerrit has a fix for the bad GitHub login
      redirect
   3. Upstream pre-built packages for production use (start with RHEL 7,
   maybe Ubuntu)
      1. This is potentially a pretty large effort, depending on the number
      of platforms we want to support
      2. Tarballs -- per-OS / per-distro
      3. yum install, apt-get: per-OS / per-distro
      4. Homebrew?
   4. CLI-based tools with zero dependencies for quick experiments/demos
      1. Create, describe, alter tables
      2. Cat data out, pipe data in.
      3. Or simple Python examples to do similar
   5. Create developer-oriented docs and FAQs (wiki style?)
   6. CONTRIBUTING.adoc in repo
      1. Simplified
      2. Quick “assume nothing” tutorial
      3. Video guide?

Ongoing marketing and engagement

   1. Quarterly email to the dev / users list
      1. Recognize new contributors
      2. Call out beginner JIRAs
      3. Summarize ongoing projects
   2. Consistently use the beginner / newbie tag in JIRA
      1. Document how to find beginner JIRAs in the contributing docs
   3. Regular blog posts
      1. Developer and community contributors
      2. Invite people from other projects that integrate w/ Kudu to post
      on our blog
      3. Document how to contribute a blog post
      4. Topics: Compile and maintain a list of blog post ideas in case
      people want inspiration -- Grant has been gathering ideas for this
   4. Archive Slack to a mailing list to be indexed by search engines
   (SlackArchive.io has shut down)

Please offer your suggestions for where we can get a good bang for our
collective buck, and if there is anything you would like to work on, by all
means please either speak up or feel free to reach out directly.

Thanks,

Grant and Mike

Re: Growing the Kudu community

Posted by Sailesh Mukil <sa...@cloudera.com.INVALID>.
On Tue, Jul 17, 2018 at 11:39 AM, Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so far,
> in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>       1. This will enable integration with other projects in a way that
>       allows them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable Java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / macOS)
>       4. Ship all dependencies (even security deps, which will not be fixed
>       if CVEs are found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (start with RHEL 7,
>    maybe Ubuntu)
>       1. This is potentially a pretty large effort, depending on the number
>       of platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. yum install, apt-get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI-based tools with zero dependencies for quick experiments/demos
>       1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>

One suggestion to add to the easily downloadable pre-built packages: provide
easily accessible, downloadable example test data that's fairly
representative of real-world datasets (it doesn't have to be too large).
Additionally, we can write tutorials in kudu/examples/ that use this test
data, to give new users a better feel for the system.


>       3. Or simple Python examples to do similar
>    5. Create developer-oriented docs and FAQs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>       1. Simplified
>       2. Quick “assume nothing” tutorial
>       3. Video guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>       1. Recognize new contributors
>       2. Call out beginner JIRAs
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>       1. Document how to find beginner JIRAs in the contributing docs
>    3. Regular blog posts
>       1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>

Re: Growing the Kudu community

Posted by Mauricio Aristizabal <ma...@impact.com>.
Thanks very much Grant! I will start participating in #kudu-backup


-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com
   <https://www.facebook.com/ImpactMarTech/>
<https://twitter.com/impactmartech>

Re: Growing the Kudu community

Posted by Grant Henke <gh...@cloudera.com.INVALID>.
Thank you for the quick feedback, Tim and Mauricio.

Tim, I have some rough work on the Java/Maven/Gradle related parts to the
MiniCluster that I have been experimenting with locally. It would be great
to coordinate and collaborate with you on those contributions.

Mauricio, we have been doing a lot of work on Kudu's backup features as a
top priority. The formal design communication exists on the dev mailing
list here, but also a lot of conversation is happening in the Slack
#kudu-backup channel. Your feedback on the design docs would be great! Duly
noted on the conference feedback.


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Growing the Kudu community

Posted by Mauricio Aristizabal <ma...@impact.com>.
My new-user thoughts: MiniCluster is nice, but right now we get by launching
a Docker instance in tests, and it's pretty fast.  What's really hurting
adoption at our org is the lack of a proper backup/snapshot/replication
feature.  As for marketing, I think conferences are crucial, so I was
disappointed that Strata SJ 2018 didn't have a single session on Kudu,
there were no committers in attendance that I could tell, and it wasn't
being highlighted at all in the Cloudera booth.  Between Strata and
ScalaDays I must have enthusiastically mentioned the product to 15 people
and none had heard of it. -m

On Tue, Jul 17, 2018 at 11:40 AM Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so
> far, in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>    1. This will enable integration with other projects in a way that
>       allows them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / MacOS)
>       4. Ship all dependencies (even security deps, which will not be
>       fixed if CVEs found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (Start rhel7, maybe
>    ubuntu)
>       1. This is potentially a pretty large effort, depending on the number
>       of platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. Yum install, apt get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI based tools with zero dependencies for quick experiments/demos
>       1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>       3. Or simple Python examples to do similar
>    5. Create developer oriented docs and faqs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>       1. Simplified
>       2. Quick “assume nothing tutorial”
>       3. Video Guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>       1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>       1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>       1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>
>

-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com
   <https://www.facebook.com/ImpactMarTech/>
<https://twitter.com/impactmartech>

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
On Wed, Jul 18, 2018 at 8:52 AM Tim Robertson <ti...@gmail.com>
wrote:

> Perhaps we should continue this on the dev@ list discussion I started a
> few weeks back [2]?
>
> [2]
> https://lists.apache.org/thread.html/ee697a022b72bbca2761b1af0581773d8fb708f701fc969bc259fc2d@%3Cdev.kudu.apache.org%3E

Sure, let's continue the conversation on that thread.

Mike

Re: Growing the Kudu community

Posted by Tim Robertson <ti...@gmail.com>.
On the MiniCluster as a Maven artifact:
The Beam KuduIO is in progress here [1], with an integration test (currently
using Docker), and I just refactored the code so I could mock a KuduService
for unit tests.
It is an ideal time to try your current work, Mike/Grant, as I'd use a
MiniCluster instead of mocking. Perhaps we should continue this on the dev@
list discussion I started a few weeks back [2]?
Do you have a build for OS X by any chance?

> I'm not really sure there is a lot of overlap between creating a Docker
> image and the kind of relocatable artifacts I'm trying to build, aside
> from the actual compiling part.

I've had to manipulate hosts files with fake entries to get the Docker
images to work while the minicluster seemingly has a FakeDNS thing (?).
That might just be me doing things wrong though.




[1]
https://github.com/timrobertson100/beam/tree/BEAM-2661-KuduIO/sdks/java/io/kudu

[2]
https://lists.apache.org/thread.html/ee697a022b72bbca2761b1af0581773d8fb708f701fc969bc259fc2d@%3Cdev.kudu.apache.org%3E

On Wed, Jul 18, 2018 at 4:37 AM, Mike Percy <mp...@apache.org> wrote:

> On Tue, Jul 17, 2018 at 12:22 PM Grant Henke <gh...@cloudera.com.invalid>
> wrote:
>
> > I have started a document for blog post ideas/topics here:
> >
> > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> >
>
> Nice list, Grant. Actually I think that quarterly email would probably make
> for a better blog post instead and I've added it as a suggestion on that
> doc.
>
> On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal
> <mauricio@impact.com> wrote:
>
> > I was disappointed that Strata SJ 2018 didn't have a single session on
> > Kudu, there were no committers in attendance that I could tell, and it
> > wasn't being highlighted at all in the Cloudera booth.  Between Strata
> > and ScalaDays I must have enthusiastically mentioned the product to 15
> > people and none had heard of it.
> >
>
> Hmm, that is disappointing, and a bit surprising. Perhaps everybody thought
> everybody else was going to submit... actually I had intended to submit a
> talk proposal to Strata this year but got busy and missed the deadline. :(
>
> I wonder if folks using Kudu would like to present on their use case? I'm
> sure conference-goers would like to hear from more people using Kudu "in
> anger" (hopefully not angrily).
>
> On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> <sailesh@cloudera.com.invalid> wrote:
>
> > A suggestion to add on to the easily downloadable pre-built packages, is
> > to have easily accessible/downloadable example test-data that's fairly
> > representative of real world datasets (but it doesn't have to be too
> > large). Additionally, we can write tutorials in kudu/examples/ that use
> > this test data, to give new users a better feel for the system.
>
>
> That sounds useful. Any ideas on where we could find such a data set?
>
> On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > ++1 on the mini cluster
> > Perhaps include a docker image build at the same time which presumably
> > wouldn't be much effort?
> >
>
> I'm not really sure there is a lot of overlap between creating a Docker
> image and the kind of relocatable artifacts I'm trying to build, aside from
> the actual compiling part. But I think it would be valuable for Docker
> users to be able to easily pull down a Kudu image.
>
>
> > I'll be happy to contribute on the Java / maven related parts to that. I
> > will use this for the testing framework for the Apache Beam KuduIO and
> > will certainly help test / write a blog.
> >
>
> I don't really know how to handle the Maven part where we unpack the
> tarball and set it up somewhere so we can invoke it from the
> KuduMiniCluster. Maybe that would require writing a custom Maven plugin?
>
> I'd love to see a blog post about how to use Kudu with Beam!
>
> Mike
>

Re:Re: Growing the Kudu community

Posted by Quanlong Huang <hu...@126.com>.
+1 for archiving Kudu Slack topics and quarterly updates on ongoing projects or the roadmap.

We're always curious about how Kudu is doing now and what the contributors have been working on lately.


Thanks,
Quanlong
--
Quanlong Huang
Software Developer, Hulu

At 2018-07-25 07:20:41, "Mike Percy" <mp...@apache.org> wrote:
>Hey Andrew, hmm that would definitely shorten the first edit-compile-test
>cycle of a new contributor. The Docker image could even come with ccache
>and ninja already set up. We might have to figure out some file ownership
>details for individual user accounts but it should be doable.
>
>This is pretty orthogonal, but another use of Docker I've heard people
>suggest is automatic rebuilding of docs, so the docs push could be a quick
>one-liner approval on Gerrit.
>
>Mike
>
>On Mon, Jul 23, 2018 at 9:20 PM Andrew Wong <aw...@cloudera.com.invalid>
>wrote:
>
>> >
>> > I wanted to contribute a while back, but there was hardly any support
>> > on new contributors and especially the ones who wanted to write larger
>> > features and bugs.
>> >
>>
>>
>> > Upstream pre-built packages for production use (Start rhel7, maybe
>> > ubuntu)
>> >
>>
>> A similar, developer-focused idea I've been toying with is hosting a Docker
>> image that contains a pre-built thirdparty directory. Depending on the
>> machine, running thirdparty/build-if-necessary.sh could take on the order
>> of hours, after which the build itself might slap on some more.
>>
>> Republishing a thirdparty Docker image whenever we update thirdparty could
>> help lower the barrier to entry for a developer wanting to write and test
>> more meaningful patches. The idea then would be that we could expose some
>> build-with-docker.sh, or have users develop, build, and test, all in the
>> Docker image itself. Would be interested in hearing if there's interest in
>> that.
>>
>> On Mon, Jul 23, 2018 at 4:19 PM Mike Percy <mp...@apache.org> wrote:
>>
>> > Thank you for the feedback!
>> >
>> > I think that pointing to the newbie label <https://s.apache.org/OcI6> on
>> > JIRA (not always tagged reliably) and the design docs
>> > <https://github.com/apache/kudu/tree/master/docs/design-docs> section of
>> > our Git repo would be a good start for improving CONTRIBUTING.adoc or
>> > http://kudu.apache.org/docs/contributing.html (rather than talking about
>> > style guides and stuff).
>> >
>> > Mike
>> >
>> > On Mon, Jul 23, 2018 at 2:36 PM Atri Sharma <at...@linux.com> wrote:
>> >
>> > > A potential ideas list, with internals architecture would be great.
>> > > On Mon, Jul 23, 2018 at 11:40 PM Mike Percy <mp...@apache.org> wrote:
>> > > >
>> > > > Hi Atri, do you have suggestions on what we could do better next
>> time?
>> > > >
>> > > > Mike
>> > > >
>> > > > On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:
>> > > >
>> > > > > I wanted to contribute a while back, but there was hardly any
>> support
>> > > > > on new contributors and especially the ones who wanted to write
>> > larger
>> > > > > features and bugs.
>> > > > >
>> > > > > Just my 2c
>> > > > > On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
>> > > > > <sa...@cloudera.com.invalid> wrote:
>> > > > > >
>> > > > > > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org>
>> > > wrote:
>> > > > > >
>> > > > > > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
>> > > > > <gh...@cloudera.com.invalid>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I have started a document for blog post ideas/topics here:
>> > > > > > > >
>> > > > > > > >
>> > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6U
>> > > > > > > byVt1D3NaTl7lI/edit?usp=sharing
>> > > > > > > >
>> > > > > > >
>> > > > > > > Nice list, Grant. Actually I think that quarterly email would
>> > > probably
>> > > > > make
>> > > > > > > for a better blog post instead and I've added it as a
>> suggestion
>> > on
>> > > > > that
>> > > > > > > doc.
>> > > > > > >
>> > > > > > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
>> > > > > mauricio@impact.com
>> > > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I was disappointed that Strata SJ 2018 didn't have a single
>> > > session
>> > > > > on
>> > > > > > > > Kudu, there were no committers in attendance that I could
>> tell,
>> > > and
>> > > > > it
>> > > > > > > > wasn't being highlighted at all in the Cloudera booth.
>> Between
>> > > > > Strata
>> > > > > > > and
>> > > > > > > > ScalaDays I must have enthusiastically mentioned the product
>> to
>> > > 15
>> > > > > people
>> > > > > > > > and none had heard of it.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Hmm, that is disappointing, and a bit surprising. Perhaps
>> > everybody
>> > > > > thought
>> > > > > > > everybody else was going to submit... actually I had intended
>> to
>> > > > > submit a
>> > > > > > > talk proposal to Strata this year but got busy and missed the
>> > > > > deadline. :(
>> > > > > > >
>> > > > > > > I wonder if folks using Kudu would like to present on their use
>> > > case?
>> > > > > I'm
>> > > > > > > sure conference-goers would like to hear from more people using
>> > > Kudu
>> > > > > "in
>> > > > > > > anger" (hopefully not angrily).
>> > > > > > >
>> > > > > > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
>> > > > > <sailesh@cloudera.com.invalid
>> > > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > A suggestion to add on to the easily downloadable pre-built
>> > > > > packages, is
>> > > > > > > to
>> > > > > > > > have easily accessible/downloadable example test-data that's
>> > > fairly
>> > > > > > > > representative of real world datasets (but it doesn't have to
>> > be
>> > > too
>> > > > > > > > large). Additionally, we can write tutorials in
>> kudu/examples/
>> > > that
>> > > > > use
>> > > > > > > > this test data, to give new users a better feel for the
>> system.
>> > > > > > >
>> > > > > > >
>> > > > > > > That sounds useful. Any ideas on where we could find such a
>> data
>> > > set?
>> > > > > > >
>> > > > > >
>> > > > > > Starting with a small scale factor of TPC-H and TPC-DS might not
>> be
>> > > a bad
>> > > > > > idea.
>> > > > > >
>> > > > > >
>> > > > > > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
>> > > > > timrobertson100@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > ++1 on the mini cluster
>> > > > > > > > Perhaps include a docker image build at the same time which
>> > > > > presumably
>> > > > > > > > wouldn't be much effort?
>> > > > > > > >
>> > > > > > >
>> > > > > > > I'm not really sure there is a lot of overlap between creating
>> a
>> > > Docker
>> > > > > > > image and the kind of relocatable artifacts I'm trying to
>> build,
>> > > aside
>> > > > > from
>> > > > > > > the actual compiling part. But I think it would be valuable for
>> > > Docker
>> > > > > > > users to be able to easily pull down a Kudu image.
>> > > > > > >
>> > > > > > >
>> > > > > > > > l'll be happy to contribute on the Java / maven related parts
>> > to
>> > > > > that. I
>> > > > > > > > will use this for the testing framework for the Apache Beam
>> > > KuduIO
>> > > > > and
>> > > > > > > will
>> > > > > > > > certainly help test / write a blog.
>> > > > > > > >
>> > > > > > >
>> > > > > > > I don't really know how to handle the Maven part where we
>> unpack
>> > > the
>> > > > > > > tarball and set it up somewhere so we can invoke it from the
>> > > > > > > KuduMiniCluster. Maybe it that would require writing a custom
>> > Maven
>> > > > > plugin?
>> > > > > > >
>> > > > > > > I'd love to see a blog post about how to use Kudu with Beam!
>> > > > > > >
>> > > > > > > Mike
>> > > > > > >
>> > > > >
>> > >
>> >
>>
>>
>> --
>> Andrew Wong
>>

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
Hey Andrew, hmm that would definitely shorten the first edit-compile-test
cycle of a new contributor. The Docker image could even come with ccache
and ninja already set up. We might have to figure out some file ownership
details for individual user accounts but it should be doable.

This is pretty orthogonal, but another use of Docker I've heard people
suggest is automatic rebuilding of docs, so the docs push could be a quick
one-liner approval on Gerrit.

Mike

On Mon, Jul 23, 2018 at 9:20 PM Andrew Wong <aw...@cloudera.com.invalid>
wrote:

> >
> > I wanted to contribute a while back, but there was hardly any support
> > on new contributors and especially the ones who wanted to write larger
> > features and bugs.
> >
>
>
> > Upstream pre-built packages for production use (Start rhel7, maybe
> > ubuntu)
> >
>
> A similar, developer-focused idea I've been toying with is hosting a Docker
> image that contains a pre-built thirdparty directory. Depending on the
> machine, running thirdparty/build-if-necessary.sh could take on the order
> of hours, after which the build itself might slap on some more.
>
> Republishing a thirdparty Docker image whenever we update thirdparty could
> help lower the barrier to entry for a developer wanting to write and test
> more meaningful patches. The idea then would be that we could expose some
> build-with-docker.sh, or have users develop, build, and test, all in the
> Docker image itself. Would be interested in hearing if there's interest in
> that.
>
> On Mon, Jul 23, 2018 at 4:19 PM Mike Percy <mp...@apache.org> wrote:
>
> > Thank you for the feedback!
> >
> > I think that pointing to the newbie label <https://s.apache.org/OcI6> on
> > JIRA (not always tagged reliably) and the design docs
> > <https://github.com/apache/kudu/tree/master/docs/design-docs> section of
> > our Git repo would be a good start for improving CONTRIBUTING.adoc or
> > http://kudu.apache.org/docs/contributing.html (rather than talking about
> > style guides and stuff).
> >
> > Mike
> >
> > On Mon, Jul 23, 2018 at 2:36 PM Atri Sharma <at...@linux.com> wrote:
> >
> > > A potential ideas list, with internals architecture would be great.
> > > On Mon, Jul 23, 2018 at 11:40 PM Mike Percy <mp...@apache.org> wrote:
> > > >
> > > > Hi Atri, do you have suggestions on what we could do better next
> time?
> > > >
> > > > Mike
> > > >
> > > > On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:
> > > >
> > > > > I wanted to contribute a while back, but there was hardly any
> support
> > > > > on new contributors and especially the ones who wanted to write
> > larger
> > > > > features and bugs.
> > > > >
> > > > > Just my 2c
> > > > > On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
> > > > > <sa...@cloudera.com.invalid> wrote:
> > > > > >
> > > > > > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
> > > > > <gh...@cloudera.com.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I have started a document for blog post ideas/topics here:
> > > > > > > >
> > > > > > > >
> > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6U
> > > > > > > byVt1D3NaTl7lI/edit?usp=sharing
> > > > > > > >
> > > > > > >
> > > > > > > Nice list, Grant. Actually I think that quarterly email would
> > > probably
> > > > > make
> > > > > > > for a better blog post instead and I've added it as a
> suggestion
> > on
> > > > > that
> > > > > > > doc.
> > > > > > >
> > > > > > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
> > > > > mauricio@impact.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I was disappointed that Strata SJ 2018 didn't have a single
> > > session
> > > > > on
> > > > > > > > Kudu, there were no committers in attendance that I could
> tell,
> > > and
> > > > > it
> > > > > > > > wasn't being highlighted at all in the Cloudera booth.
> Between
> > > > > Strata
> > > > > > > and
> > > > > > > > ScalaDays I must have enthusiastically mentioned the product
> to
> > > 15
> > > > > people
> > > > > > > > and none had heard of it.
> > > > > > > >
> > > > > > >
> > > > > > > Hmm, that is disappointing, and a bit surprising. Perhaps
> > everybody
> > > > > thought
> > > > > > > everybody else was going to submit... actually I had intended
> to
> > > > > submit a
> > > > > > > talk proposal to Strata this year but got busy and missed the
> > > > > deadline. :(
> > > > > > >
> > > > > > > I wonder if folks using Kudu would like to present on their use
> > > case?
> > > > > I'm
> > > > > > > sure conference-goers would like to hear from more people using
> > > Kudu
> > > > > "in
> > > > > > > anger" (hopefully not angrily).
> > > > > > >
> > > > > > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> > > > > <sailesh@cloudera.com.invalid
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > A suggestion to add on to the easily downloadable pre-built
> > > > > packages, is
> > > > > > > to
> > > > > > > > have easily accessible/downloadable example test-data that's
> > > fairly
> > > > > > > > representative of real world datasets (but it doesn't have to
> > be
> > > too
> > > > > > > > large). Additionally, we can write tutorials in
> kudu/examples/
> > > that
> > > > > use
> > > > > > > > this test data, to give new users a better feel for the
> system.
> > > > > > >
> > > > > > >
> > > > > > > That sounds useful. Any ideas on where we could find such a
> data
> > > set?
> > > > > > >
> > > > > >
> > > > > > Starting with a small scale factor of TPC-H and TPC-DS might not
> be
> > > a bad
> > > > > > idea.
> > > > > >
> > > > > >
> > > > > > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
> > > > > timrobertson100@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > ++1 on the mini cluster
> > > > > > > > Perhaps include a docker image build at the same time which
> > > > > presumably
> > > > > > > > wouldn't be much effort?
> > > > > > > >
> > > > > > >
> > > > > > > I'm not really sure there is a lot of overlap between creating
> a
> > > Docker
> > > > > > > image and the kind of relocatable artifacts I'm trying to
> build,
> > > aside
> > > > > from
> > > > > > > the actual compiling part. But I think it would be valuable for
> > > Docker
> > > > > > > users to be able to easily pull down a Kudu image.
> > > > > > >
> > > > > > >
> > > > > > > > l'll be happy to contribute on the Java / maven related parts
> > to
> > > > > that. I
> > > > > > > > will use this for the testing framework for the Apache Beam
> > > KuduIO
> > > > > and
> > > > > > > will
> > > > > > > > certainly help test / write a blog.
> > > > > > > >
> > > > > > >
> > > > > > > I don't really know how to handle the Maven part where we
> unpack
> > > the
> > > > > > > tarball and set it up somewhere so we can invoke it from the
> > > > > > > KuduMiniCluster. Maybe it that would require writing a custom
> > Maven
> > > > > plugin?
> > > > > > >
> > > > > > > I'd love to see a blog post about how to use Kudu with Beam!
> > > > > > >
> > > > > > > Mike
> > > > > > >
> > > > >
> > >
> >
>
>
> --
> Andrew Wong
>

Re: Growing the Kudu community

Posted by Andrew Wong <aw...@cloudera.com.INVALID>.
>
> I wanted to contribute a while back, but there was hardly any support
> on new contributors and especially the ones who wanted to write larger
> features and bugs.
>


> Upstream pre-built packages for production use (Start rhel7, maybe ubuntu)
>

A similar, developer-focused idea I've been toying with is hosting a Docker
image that contains a pre-built thirdparty directory. Depending on the
machine, running thirdparty/build-if-necessary.sh could take on the order
of hours, after which the build itself might slap on some more.

Republishing a thirdparty Docker image whenever we update thirdparty could
help lower the barrier to entry for a developer wanting to write and test
more meaningful patches. The idea then would be that we could expose some
build-with-docker.sh, or have users develop, build, and test, all in the
Docker image itself. Would be interested in hearing if there's interest in
that.

On Mon, Jul 23, 2018 at 4:19 PM Mike Percy <mp...@apache.org> wrote:

> Thank you for the feedback!
>
> I think that pointing to the newbie label <https://s.apache.org/OcI6> on
> JIRA (not always tagged reliably) and the design docs
> <https://github.com/apache/kudu/tree/master/docs/design-docs> section of
> our Git repo would be a good start for improving CONTRIBUTING.adoc or
> http://kudu.apache.org/docs/contributing.html (rather than talking about
> style guides and stuff).
>
> Mike
>
> On Mon, Jul 23, 2018 at 2:36 PM Atri Sharma <at...@linux.com> wrote:
>
> > A potential ideas list, with internals architecture would be great.
> > On Mon, Jul 23, 2018 at 11:40 PM Mike Percy <mp...@apache.org> wrote:
> > >
> > > Hi Atri, do you have suggestions on what we could do better next time?
> > >
> > > Mike
> > >
> > > On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:
> > >
> > > > I wanted to contribute a while back, but there was hardly any support
> > > > on new contributors and especially the ones who wanted to write
> larger
> > > > features and bugs.
> > > >
> > > > Just my 2c
> > > > On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
> > > > <sa...@cloudera.com.invalid> wrote:
> > > > >
> > > > > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org>
> > wrote:
> > > > >
> > > > > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
> > > > <gh...@cloudera.com.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > I have started a document for blog post ideas/topics here:
> > > > > > >
> > > > > > >
> > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6U
> > > > > > byVt1D3NaTl7lI/edit?usp=sharing
> > > > > > >
> > > > > >
> > > > > > Nice list, Grant. Actually I think that quarterly email would
> > probably
> > > > make
> > > > > > for a better blog post instead and I've added it as a suggestion
> on
> > > > that
> > > > > > doc.
> > > > > >
> > > > > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
> > > > mauricio@impact.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I was disappointed that Strata SJ 2018 didn't have a single
> > session
> > > > on
> > > > > > > Kudu, there were no committers in attendance that I could tell,
> > and
> > > > it
> > > > > > > wasn't being highlighted at all in the Cloudera booth.  Between
> > > > Strata
> > > > > > and
> > > > > > > ScalaDays I must have enthusiastically mentioned the product to
> > 15
> > > > people
> > > > > > > and none had heard of it.
> > > > > > >
> > > > > >
> > > > > > Hmm, that is disappointing, and a bit surprising. Perhaps
> everybody
> > > > thought
> > > > > > everybody else was going to submit... actually I had intended to
> > > > submit a
> > > > > > talk proposal to Strata this year but got busy and missed the
> > > > deadline. :(
> > > > > >
> > > > > > I wonder if folks using Kudu would like to present on their use
> > case?
> > > > I'm
> > > > > > sure conference-goers would like to hear from more people using
> > Kudu
> > > > "in
> > > > > > anger" (hopefully not angrily).
> > > > > >
> > > > > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> > > > <sailesh@cloudera.com.invalid
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > A suggestion to add on to the easily downloadable pre-built
> > > > packages, is
> > > > > > to
> > > > > > > have easily accessible/downloadable example test-data that's
> > fairly
> > > > > > > representative of real world datasets (but it doesn't have to
> be
> > too
> > > > > > > large). Additionally, we can write tutorials in kudu/examples/
> > that
> > > > use
> > > > > > > this test data, to give new users a better feel for the system.
> > > > > >
> > > > > >
> > > > > > That sounds useful. Any ideas on where we could find such a data
> > set?
> > > > > >
> > > > >
> > > > > Starting with a small scale factor of TPC-H and TPC-DS might not be
> > a bad
> > > > > idea.
> > > > >
> > > > >
> > > > > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
> > > > timrobertson100@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > ++1 on the mini cluster
> > > > > > > Perhaps include a docker image build at the same time which
> > > > presumably
> > > > > > > wouldn't be much effort?
> > > > > > >
> > > > > >
> > > > > > I'm not really sure there is a lot of overlap between creating a
> > Docker
> > > > > > image and the kind of relocatable artifacts I'm trying to build,
> > aside
> > > > from
> > > > > > the actual compiling part. But I think it would be valuable for
> > Docker
> > > > > > users to be able to easily pull down a Kudu image.
> > > > > >
> > > > > >
> > > > > > > l'll be happy to contribute on the Java / maven related parts
> to
> > > > that. I
> > > > > > > will use this for the testing framework for the Apache Beam
> > KuduIO
> > > > and
> > > > > > will
> > > > > > > certainly help test / write a blog.
> > > > > > >
> > > > > >
> > > > > > I don't really know how to handle the Maven part where we unpack
> > the
> > > > > > tarball and set it up somewhere so we can invoke it from the
> > > > > > KuduMiniCluster. Maybe that would require writing a custom
> Maven
> > > > plugin?
> > > > > >
> > > > > > I'd love to see a blog post about how to use Kudu with Beam!
> > > > > >
> > > > > > Mike
> > > > > >
> > > >
> >
>


-- 
Andrew Wong

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
Thank you for the feedback!

I think that pointing to the newbie label <https://s.apache.org/OcI6> on
JIRA (not always tagged reliably) and the design docs
<https://github.com/apache/kudu/tree/master/docs/design-docs> section of
our Git repo would be a good start for improving CONTRIBUTING.adoc or
http://kudu.apache.org/docs/contributing.html (rather than talking about
style guides and stuff).

Mike

On Mon, Jul 23, 2018 at 2:36 PM Atri Sharma <at...@linux.com> wrote:

> A list of potential project ideas, together with an overview of the
> internal architecture, would be great.
> On Mon, Jul 23, 2018 at 11:40 PM Mike Percy <mp...@apache.org> wrote:
> >
> > Hi Atri, do you have suggestions on what we could do better next time?
> >
> > Mike
> >
> > On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:
> >
> > > I wanted to contribute a while back, but there was hardly any support
> > > for new contributors, especially those who wanted to work on larger
> > > features and bug fixes.
> > >
> > > Just my 2c
> > > On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
> > > <sa...@cloudera.com.invalid> wrote:
> > > >
> > > > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org>
> wrote:
> > > >
> > > > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
> > > <gh...@cloudera.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > I have started a document for blog post ideas/topics here:
> > > > > >
> > > > > >
> > > > > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> > > > > >
> > > > >
> > > > > Nice list, Grant. Actually I think that quarterly email would
> probably
> > > make
> > > > > for a better blog post instead and I've added it as a suggestion on
> > > that
> > > > > doc.
> > > > >
> > > > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
> > > mauricio@impact.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I was disappointed that Strata SJ 2018 didn't have a single
> session
> > > on
> > > > > > Kudu, there were no committers in attendance that I could tell,
> and
> > > it
> > > > > > wasn't being highlighted at all in the Cloudera booth.  Between
> > > Strata
> > > > > and
> > > > > > ScalaDays I must have enthusiastically mentioned the product to
> 15
> > > people
> > > > > > and none had heard of it.
> > > > > >
> > > > >
> > > > > Hmm, that is disappointing, and a bit surprising. Perhaps everybody
> > > thought
> > > > > everybody else was going to submit... actually I had intended to
> > > submit a
> > > > > talk proposal to Strata this year but got busy and missed the
> > > deadline. :(
> > > > >
> > > > > I wonder if folks using Kudu would like to present on their use
> case?
> > > I'm
> > > > > sure conference-goers would like to hear from more people using
> Kudu
> > > "in
> > > > > anger" (hopefully not angrily).
> > > > >
> > > > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> > > <sailesh@cloudera.com.invalid
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > A suggestion to add on to the easily downloadable pre-built
> > > packages, is
> > > > > to
> > > > > > have easily accessible/downloadable example test-data that's
> fairly
> > > > > > representative of real world datasets (but it doesn't have to be
> too
> > > > > > large). Additionally, we can write tutorials in kudu/examples/
> that
> > > use
> > > > > > this test data, to give new users a better feel for the system.
> > > > >
> > > > >
> > > > > That sounds useful. Any ideas on where we could find such a data
> set?
> > > > >
> > > >
> > > > Starting with a small scale factor of TPC-H and TPC-DS might not be
> a bad
> > > > idea.
> > > >
> > > >
> > > > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
> > > timrobertson100@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > ++1 on the mini cluster
> > > > > > Perhaps include a docker image build at the same time which
> > > presumably
> > > > > > wouldn't be much effort?
> > > > > >
> > > > >
> > > > > I'm not really sure there is a lot of overlap between creating a
> Docker
> > > > > image and the kind of relocatable artifacts I'm trying to build,
> aside
> > > from
> > > > > the actual compiling part. But I think it would be valuable for
> Docker
> > > > > users to be able to easily pull down a Kudu image.
> > > > >
> > > > >
> > > > > > I'll be happy to contribute on the Java / Maven related parts to
> > > that. I
> > > > > > will use this for the testing framework for the Apache Beam
> KuduIO
> > > and
> > > > > will
> > > > > > certainly help test / write a blog.
> > > > > >
> > > > >
> > > > > I don't really know how to handle the Maven part where we unpack
> the
> > > > > tarball and set it up somewhere so we can invoke it from the
> > > > > KuduMiniCluster. Maybe that would require writing a custom Maven
> > > plugin?
> > > > >
> > > > > I'd love to see a blog post about how to use Kudu with Beam!
> > > > >
> > > > > Mike
> > > > >
> > >
>

Re: Growing the Kudu community

Posted by Atri Sharma <at...@linux.com>.
A list of potential project ideas, together with an overview of the
internal architecture, would be great.
On Mon, Jul 23, 2018 at 11:40 PM Mike Percy <mp...@apache.org> wrote:
>
> Hi Atri, do you have suggestions on what we could do better next time?
>
> Mike
>
> On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:
>
> > I wanted to contribute a while back, but there was hardly any support
> > for new contributors, especially those who wanted to work on larger
> > features and bug fixes.
> >
> > Just my 2c
> > On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
> > <sa...@cloudera.com.invalid> wrote:
> > >
> > > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:
> > >
> > > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
> > <gh...@cloudera.com.invalid>
> > > > wrote:
> > > >
> > > > > I have started a document for blog post ideas/topics here:
> > > > >
> > > > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> > > > >
> > > >
> > > > Nice list, Grant. Actually I think that quarterly email would probably
> > make
> > > > for a better blog post instead and I've added it as a suggestion on
> > that
> > > > doc.
> > > >
> > > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
> > mauricio@impact.com
> > > > >
> > > > wrote:
> > > >
> > > > > I was disappointed that Strata SJ 2018 didn't have a single session
> > on
> > > > > Kudu, there were no committers in attendance that I could tell, and
> > it
> > > > > wasn't being highlighted at all in the Cloudera booth.  Between
> > Strata
> > > > and
> > > > > ScalaDays I must have enthusiastically mentioned the product to 15
> > people
> > > > > and none had heard of it.
> > > > >
> > > >
> > > > Hmm, that is disappointing, and a bit surprising. Perhaps everybody
> > thought
> > > > everybody else was going to submit... actually I had intended to
> > submit a
> > > > talk proposal to Strata this year but got busy and missed the
> > deadline. :(
> > > >
> > > > I wonder if folks using Kudu would like to present on their use case?
> > I'm
> > > > sure conference-goers would like to hear from more people using Kudu
> > "in
> > > > anger" (hopefully not angrily).
> > > >
> > > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> > <sailesh@cloudera.com.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > A suggestion to add on to the easily downloadable pre-built
> > packages, is
> > > > to
> > > > > have easily accessible/downloadable example test-data that's fairly
> > > > > representative of real world datasets (but it doesn't have to be too
> > > > > large). Additionally, we can write tutorials in kudu/examples/ that
> > use
> > > > > this test data, to give new users a better feel for the system.
> > > >
> > > >
> > > > That sounds useful. Any ideas on where we could find such a data set?
> > > >
> > >
> > > Starting with a small scale factor of TPC-H and TPC-DS might not be a bad
> > > idea.
> > >
> > >
> > > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
> > timrobertson100@gmail.com>
> > > > wrote:
> > > >
> > > > > ++1 on the mini cluster
> > > > > Perhaps include a docker image build at the same time which
> > presumably
> > > > > wouldn't be much effort?
> > > > >
> > > >
> > > > I'm not really sure there is a lot of overlap between creating a Docker
> > > > image and the kind of relocatable artifacts I'm trying to build, aside
> > from
> > > > the actual compiling part. But I think it would be valuable for Docker
> > > > users to be able to easily pull down a Kudu image.
> > > >
> > > >
> > > > > I'll be happy to contribute on the Java / Maven related parts to
> > that. I
> > > > > will use this for the testing framework for the Apache Beam KuduIO
> > and
> > > > will
> > > > > certainly help test / write a blog.
> > > > >
> > > >
> > > > I don't really know how to handle the Maven part where we unpack the
> > > > tarball and set it up somewhere so we can invoke it from the
> > > > KuduMiniCluster. Maybe that would require writing a custom Maven
> > plugin?
> > > >
> > > > I'd love to see a blog post about how to use Kudu with Beam!
> > > >
> > > > Mike
> > > >
> >

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
Hi Atri, do you have suggestions on what we could do better next time?

Mike

On Mon, Jul 23, 2018 at 10:56 AM Atri Sharma <at...@linux.com> wrote:

> I wanted to contribute a while back, but there was hardly any support
> for new contributors, especially those who wanted to work on larger
> features and bug fixes.
>
> Just my 2c
> On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
> <sa...@cloudera.com.invalid> wrote:
> >
> > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:
> >
> > > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke
> <gh...@cloudera.com.invalid>
> > > wrote:
> > >
> > > > I have started a document for blog post ideas/topics here:
> > > >
> > > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> > > >
> > >
> > > Nice list, Grant. Actually I think that quarterly email would probably
> make
> > > for a better blog post instead and I've added it as a suggestion on
> that
> > > doc.
> > >
> > > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <
> mauricio@impact.com
> > > >
> > > wrote:
> > >
> > > > I was disappointed that Strata SJ 2018 didn't have a single session
> on
> > > > Kudu, there were no committers in attendance that I could tell, and
> it
> > > > wasn't being highlighted at all in the Cloudera booth.  Between
> Strata
> > > and
> > > > ScalaDays I must have enthusiastically mentioned the product to 15
> people
> > > > and none had heard of it.
> > > >
> > >
> > > Hmm, that is disappointing, and a bit surprising. Perhaps everybody
> thought
> > > everybody else was going to submit... actually I had intended to
> submit a
> > > talk proposal to Strata this year but got busy and missed the
> deadline. :(
> > >
> > > I wonder if folks using Kudu would like to present on their use case?
> I'm
> > > sure conference-goers would like to hear from more people using Kudu
> "in
> > > anger" (hopefully not angrily).
> > >
> > > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil
> <sailesh@cloudera.com.invalid
> > > >
> > > wrote:
> > >
> > > > A suggestion to add on to the easily downloadable pre-built
> packages, is
> > > to
> > > > have easily accessible/downloadable example test-data that's fairly
> > > > representative of real world datasets (but it doesn't have to be too
> > > > large). Additionally, we can write tutorials in kudu/examples/ that
> use
> > > > this test data, to give new users a better feel for the system.
> > >
> > >
> > > That sounds useful. Any ideas on where we could find such a data set?
> > >
> >
> > Starting with a small scale factor of TPC-H and TPC-DS might not be a bad
> > idea.
> >
> >
> > > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <
> timrobertson100@gmail.com>
> > > wrote:
> > >
> > > > ++1 on the mini cluster
> > > > Perhaps include a docker image build at the same time which
> presumably
> > > > wouldn't be much effort?
> > > >
> > >
> > > I'm not really sure there is a lot of overlap between creating a Docker
> > > image and the kind of relocatable artifacts I'm trying to build, aside
> from
> > > the actual compiling part. But I think it would be valuable for Docker
> > > users to be able to easily pull down a Kudu image.
> > >
> > >
> > > > I'll be happy to contribute on the Java / Maven related parts to
> that. I
> > > > will use this for the testing framework for the Apache Beam KuduIO
> and
> > > will
> > > > certainly help test / write a blog.
> > > >
> > >
> > > I don't really know how to handle the Maven part where we unpack the
> > > tarball and set it up somewhere so we can invoke it from the
> > > KuduMiniCluster. Maybe that would require writing a custom Maven
> plugin?
> > >
> > > I'd love to see a blog post about how to use Kudu with Beam!
> > >
> > > Mike
> > >
>

Re: Growing the Kudu community

Posted by Atri Sharma <at...@linux.com>.
I wanted to contribute a while back, but there was hardly any support
for new contributors, especially those who wanted to work on larger
features and bug fixes.

Just my 2c
On Mon, Jul 23, 2018 at 11:16 PM Sailesh Mukil
<sa...@cloudera.com.invalid> wrote:
>
> On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:
>
> > On Tue, Jul 17, 2018 at 12:22 PM Grant Henke <gh...@cloudera.com.invalid>
> > wrote:
> >
> > > I have started a document for blog post ideas/topics here:
> > >
> > > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> > >
> >
> > Nice list, Grant. Actually I think that quarterly email would probably make
> > for a better blog post instead and I've added it as a suggestion on that
> > doc.
> >
> > On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <mauricio@impact.com
> > >
> > wrote:
> >
> > > I was disappointed that Strata SJ 2018 didn't have a single session on
> > > Kudu, there were no committers in attendance that I could tell, and it
> > > wasn't being highlighted at all in the Cloudera booth.  Between Strata
> > and
> > > ScalaDays I must have enthusiastically mentioned the product to 15 people
> > > and none had heard of it.
> > >
> >
> > Hmm, that is disappointing, and a bit surprising. Perhaps everybody thought
> > everybody else was going to submit... actually I had intended to submit a
> > talk proposal to Strata this year but got busy and missed the deadline. :(
> >
> > I wonder if folks using Kudu would like to present on their use case? I'm
> > sure conference-goers would like to hear from more people using Kudu "in
> > anger" (hopefully not angrily).
> >
> > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil <sailesh@cloudera.com.invalid
> > >
> > wrote:
> >
> > > A suggestion to add on to the easily downloadable pre-built packages, is
> > to
> > > have easily accessible/downloadable example test-data that's fairly
> > > representative of real world datasets (but it doesn't have to be too
> > > large). Additionally, we can write tutorials in kudu/examples/ that use
> > > this test data, to give new users a better feel for the system.
> >
> >
> > That sounds useful. Any ideas on where we could find such a data set?
> >
>
> Starting with a small scale factor of TPC-H and TPC-DS might not be a bad
> idea.
>
>
> > On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <ti...@gmail.com>
> > wrote:
> >
> > > ++1 on the mini cluster
> > > Perhaps include a docker image build at the same time which presumably
> > > wouldn't be much effort?
> > >
> >
> > I'm not really sure there is a lot of overlap between creating a Docker
> > image and the kind of relocatable artifacts I'm trying to build, aside from
> > the actual compiling part. But I think it would be valuable for Docker
> > users to be able to easily pull down a Kudu image.
> >
> >
> > > I'll be happy to contribute on the Java / Maven related parts to that. I
> > > will use this for the testing framework for the Apache Beam KuduIO and
> > will
> > > certainly help test / write a blog.
> > >
> >
> > I don't really know how to handle the Maven part where we unpack the
> > tarball and set it up somewhere so we can invoke it from the
> > KuduMiniCluster. Maybe that would require writing a custom Maven plugin?
> >
> > I'd love to see a blog post about how to use Kudu with Beam!
> >
> > Mike
> >

Re: Growing the Kudu community

Posted by Brock Noland <br...@phdata.io>.
On Mon, Jul 23, 2018 at 1:06 PM Mike Percy <mp...@apache.org> wrote:

> On Mon, Jul 23, 2018 at 10:46 AM Sailesh Mukil
> <sa...@cloudera.com.invalid>
> wrote:
>
> > On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:
>
> Once backup and restore has stabilized we could push some example data sets
> to S3 and allow people to restore locally from the bucket. That could make
> a nice basis for a quickstart tutorial.



This would be really useful.

>

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
On Mon, Jul 23, 2018 at 10:46 AM Sailesh Mukil <sa...@cloudera.com.invalid>
wrote:

> On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:
> > On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil <sa...@cloudera.com>
> wrote:
> >
> > > A suggestion to add on to the easily downloadable pre-built packages,
> is to
> > > have easily accessible/downloadable example test-data that's fairly
> > > representative of real world datasets (but it doesn't have to be too
> > > large). Additionally, we can write tutorials in kudu/examples/ that use
> > > this test data, to give new users a better feel for the system.
> >
> > That sounds useful. Any ideas on where we could find such a data set?
>
> Starting with a small scale factor of TPC-H and TPC-DS might not be a bad
> idea.
>

Once backup and restore has stabilized we could push some example data sets
to S3 and allow people to restore locally from the bucket. That could make
a nice basis for a quickstart tutorial.

Mike

Re: Growing the Kudu community

Posted by Sailesh Mukil <sa...@cloudera.com.INVALID>.
On Tue, Jul 17, 2018 at 7:37 PM, Mike Percy <mp...@apache.org> wrote:

> On Tue, Jul 17, 2018 at 12:22 PM Grant Henke <gh...@cloudera.com.invalid>
> wrote:
>
> > I have started a document for blog post ideas/topics here:
> >
> > https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
> >
>
> Nice list, Grant. Actually I think that quarterly email would probably make
> for a better blog post instead and I've added it as a suggestion on that
> doc.
>
> On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <mauricio@impact.com
> >
> wrote:
>
> > I was disappointed that Strata SJ 2018 didn't have a single session on
> > Kudu, there were no committers in attendance that I could tell, and it
> > wasn't being highlighted at all in the Cloudera booth.  Between Strata
> and
> > ScalaDays I must have enthusiastically mentioned the product to 15 people
> > and none had heard of it.
> >
>
> Hmm, that is disappointing, and a bit surprising. Perhaps everybody thought
> everybody else was going to submit... actually I had intended to submit a
> talk proposal to Strata this year but got busy and missed the deadline. :(
>
> I wonder if folks using Kudu would like to present on their use case? I'm
> sure conference-goers would like to hear from more people using Kudu "in
> anger" (hopefully not angrily).
>
> On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil <sailesh@cloudera.com.invalid
> >
> wrote:
>
> > A suggestion to add on to the easily downloadable pre-built packages, is
> to
> > have easily accessible/downloadable example test-data that's fairly
> > representative of real world datasets (but it doesn't have to be too
> > large). Additionally, we can write tutorials in kudu/examples/ that use
> > this test data, to give new users a better feel for the system.
>
>
> That sounds useful. Any ideas on where we could find such a data set?
>

Starting with a small scale factor of TPC-H and TPC-DS might not be a bad
idea.


> On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > ++1 on the mini cluster
> > Perhaps include a docker image build at the same time which presumably
> > wouldn't be much effort?
> >
>
> I'm not really sure there is a lot of overlap between creating a Docker
> image and the kind of relocatable artifacts I'm trying to build, aside from
> the actual compiling part. But I think it would be valuable for Docker
> users to be able to easily pull down a Kudu image.
>
>
> > I'll be happy to contribute on the Java / Maven related parts to that. I
> > will use this for the testing framework for the Apache Beam KuduIO and
> will
> > certainly help test / write a blog.
> >
>
> I don't really know how to handle the Maven part where we unpack the
> tarball and set it up somewhere so we can invoke it from the
> KuduMiniCluster. Maybe that would require writing a custom Maven plugin?
>
> I'd love to see a blog post about how to use Kudu with Beam!
>
> Mike
>

Re: Growing the Kudu community

Posted by Mike Percy <mp...@apache.org>.
On Tue, Jul 17, 2018 at 12:22 PM Grant Henke <gh...@cloudera.com.invalid>
wrote:

> I have started a document for blog post ideas/topics here:
>
> https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing
>

Nice list, Grant. Actually I think that quarterly email would probably make
for a better blog post instead and I've added it as a suggestion on that
doc.

On Tue, Jul 17, 2018 at 12:04 PM Mauricio Aristizabal <ma...@impact.com>
wrote:

> I was disappointed that Strata SJ 2018 didn't have a single session on
> Kudu, there were no committers in attendance that I could tell, and it
> wasn't being highlighted at all in the Cloudera booth.  Between Strata and
> ScalaDays I must have enthusiastically mentioned the product to 15 people
> and none had heard of it.
>

Hmm, that is disappointing, and a bit surprising. Perhaps everybody thought
everybody else was going to submit... actually I had intended to submit a
talk proposal to Strata this year but got busy and missed the deadline. :(

I wonder if folks using Kudu would like to present on their use case? I'm
sure conference-goers would like to hear from more people using Kudu "in
anger" (hopefully not angrily).

On Tue, Jul 17, 2018 at 2:59 PM Sailesh Mukil <sa...@cloudera.com.invalid>
wrote:

> A suggestion to add on to the easily downloadable pre-built packages, is to
> have easily accessible/downloadable example test-data that's fairly
> representative of real world datasets (but it doesn't have to be too
> large). Additionally, we can write tutorials in kudu/examples/ that use
> this test data, to give new users a better feel for the system.


That sounds useful. Any ideas on where we could find such a data set?

On Tue, Jul 17, 2018 at 11:59 AM Tim Robertson <ti...@gmail.com>
wrote:

> ++1 on the mini cluster
> Perhaps include a docker image build at the same time which presumably
> wouldn't be much effort?
>

I'm not really sure there is a lot of overlap between creating a Docker
image and the kind of relocatable artifacts I'm trying to build, aside from
the actual compiling part. But I think it would be valuable for Docker
users to be able to easily pull down a Kudu image.


> I'll be happy to contribute on the Java / Maven related parts to that. I
> will use this for the testing framework for the Apache Beam KuduIO and will
> certainly help test / write a blog.
>

I don't really know how to handle the Maven part where we unpack the
tarball and set it up somewhere so we can invoke it from the
KuduMiniCluster. Maybe that would require writing a custom Maven plugin?
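
For illustration only, the unpack-and-locate step taken on its own could look
roughly like the Python sketch below. It is not a Maven plugin and says nothing
about how KUDU-2411 would actually be implemented; the function name
`unpack_and_find`, the tarball layout, and the binary name passed in are all
assumptions made up for the example.

```python
import tarfile
from pathlib import Path


def unpack_and_find(tarball: Path, dest: Path, binary_name: str) -> Path:
    """Extract `tarball` under `dest` and return the path to `binary_name`.

    Hypothetical helper: the archive layout is not assumed, so the extracted
    tree is searched recursively for a regular file with the given name.
    """
    dest.mkdir(parents=True, exist_ok=True)

    # tarfile.open() detects the compression (gz, bz2, xz) automatically.
    with tarfile.open(tarball) as tar:
        tar.extractall(dest)

    # Sort so the result is deterministic if the name occurs more than once.
    matches = sorted(p for p in dest.rglob(binary_name) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"{binary_name} not found in {tarball}")
    return matches[0]
```

A test harness could call something like this once per run and pass the
returned path to whatever launches the mini cluster. Wiring the same step into
a Maven build is the open question above.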

I'd love to see a blog post about how to use Kudu with Beam!

Mike

Re: Growing the Kudu community

Posted by Grant Henke <gh...@cloudera.com.INVALID>.
I have started a document for blog post ideas/topics here:
https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing

Feel free to add ideas to the document, comment/vote on existing ideas, and
take ideas to make a post.

We don't currently have a guide on contributing blogs, but it is fairly
straightforward. It follows the standard contribution process
<http://kudu.apache.org/docs/contributing.html#contributing-to-apache-kudu> but
instead of working on the master branch, you work on the gh-pages
<https://github.com/apache/kudu/tree/gh-pages> branch. Here is a sample
commit for reference:
https://github.com/apache/kudu/commit/35fe952f2684fe47af14dc8906a2e1869bf6d584

Thank you,
Grant

On Tue, Jul 17, 2018 at 1:40 PM Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so
> far, in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>       1. This will enable integration with other projects in a way that
>       allows them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable Java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / macOS)
>       4. Ship all dependencies (even security deps, which will not be
>       fixed if CVEs are found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (start with RHEL 7,
>    maybe Ubuntu)
>       1. This is potentially a pretty large effort, depending on the number
>       of platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. yum install, apt-get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI-based tools with zero dependencies for quick experiments/demos
>       1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>       3. Or simple Python examples to do similar
>    5. Create developer-oriented docs and FAQs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>       1. Simplified
>       2. Quick “assume nothing” tutorial
>       3. Video guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>    1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>    1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>    1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>
>

-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Growing the Kudu community

Posted by Tim Robertson <ti...@gmail.com>.
Thanks for this initiative Grant and Mike,

++1 on the mini cluster
Perhaps include a docker image build at the same time which presumably
wouldn't be much effort?
I'll be happy to contribute to the Java / Maven related parts of that. I
will use this for the testing framework for the Apache Beam KuduIO and will
certainly help test / write a blog post.

Tim





On Tue, Jul 17, 2018 at 8:39 PM, Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so far,
> in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>    1. This will enable integration with other projects in a way that allows
>       them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / MacOS)
>       4. Ship all dependencies (even security deps, which will not be fixed
>       if CVEs found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (Start rhel7, maybe
>    ubuntu)
>    1. This is potentially a pretty large effort, depending in the number of
>       platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. Yum install, apt get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI based tools with zero dependencies for quick experiments/demos
>    1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>       3. Or simple Python examples to do similar
>    5. Create developer oriented docs and faqs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>    1. Simplified
>       2. Quick “assume nothing tutorial”
>       3. Video Guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>    1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>    1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>    1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>

Re: Growing the Kudu community

Posted by Martin Weindel <ma...@gmail.com>.
- Regarding the Java MiniCluster support:
I think there are already a few existing solutions based on Docker,
which are more flexible.
From my side, there is a test integration using Docker Compose for the
Presto-Kudu connector, see [1].
Currently it can spawn a cluster with one or three tservers.

- Regarding pre-built packages for production use:
For my needs, I have already set up a project to provide RPM packages
for CentOS 7 (and rhel7), see [2].
Maybe it can be used as a starting point for broader support. The tool
chain is probably not sufficient to support other distros.
But I can imagine helping with such an effort.

I would also like to see some support for performing backups of Kudu tables.
In the project I'm currently involved in, we are thinking about a
command-line tool based on Kudu-Spark to export a table or parts of it
to Parquet files.

Best,
Martin Weindel

[1] 
https://github.com/MartinWeindel/presto/blob/fa59fadc2df7c70744ce3ab8d654b12fa185df39/presto-kudu/bin/run_kudu_tests.sh, 
https://github.com/MartinWeindel/presto/blob/fa59fadc2df7c70744ce3ab8d654b12fa185df39/presto-kudu/conf
[2] https://github.com/MartinWeindel/kudu-rpm

Re: Growing the Kudu community

Posted by Sailesh Mukil <sa...@cloudera.com>.
On Tue, Jul 17, 2018 at 11:39 AM, Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so far,
> in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>    1. This will enable integration with other projects in a way that allows
>       them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / MacOS)
>       4. Ship all dependencies (even security deps, which will not be fixed
>       if CVEs found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (Start rhel7, maybe
>    ubuntu)
>    1. This is potentially a pretty large effort, depending in the number of
>       platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. Yum install, apt get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI based tools with zero dependencies for quick experiments/demos
>    1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>

One suggestion to add to the easily downloadable pre-built packages is to
have easily accessible, downloadable example test data that's fairly
representative of real-world datasets (it doesn't have to be too
large). Additionally, we can write tutorials in kudu/examples/ that use
this test data, to give new users a better feel for the system.


>       3. Or simple Python examples to do similar
>    5. Create developer oriented docs and faqs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>    1. Simplified
>       2. Quick “assume nothing tutorial”
>       3. Video Guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>    1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>    1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>    1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>

Re: Growing the Kudu community

Posted by Grant Henke <gh...@cloudera.com>.
I have started a document for blog post ideas/topics here:
https://docs.google.com/document/d/12QFRIhNDMoOI1kOQBgch64xYJ9t6UbyVt1D3NaTl7lI/edit?usp=sharing

Feel free to add ideas to the document, comment/vote on existing ideas, and
take ideas to make a post.

We don't currently have a guide on contributing blog posts, but it is fairly
straightforward. It follows the standard contribution process
<http://kudu.apache.org/docs/contributing.html#contributing-to-apache-kudu>, but
instead of working on the master branch you work on the gh-pages
<https://github.com/apache/kudu/tree/gh-pages> branch. Here is a sample
commit for reference:
https://github.com/apache/kudu/commit/35fe952f2684fe47af14dc8906a2e1869bf6d584

Thank you,
Grant

On Tue, Jul 17, 2018 at 1:40 PM Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so
> far, in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>    1. This will enable integration with other projects in a way that
>       allows them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / MacOS)
>       4. Ship all dependencies (even security deps, which will not be
>       fixed if CVEs found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (Start rhel7, maybe
>    ubuntu)
>    1. This is potentially a pretty large effort, depending in the number
>       of platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. Yum install, apt get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI based tools with zero dependencies for quick experiments/demos
>    1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>       3. Or simple Python examples to do similar
>    5. Create developer oriented docs and faqs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>    1. Simplified
>       2. Quick “assume nothing tutorial”
>       3. Video Guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>    1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>    1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>    1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>
>

-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Growing the Kudu community

Posted by Mauricio Aristizabal <ma...@impact.com>.
My new-user thoughts: MiniCluster is nice, but right now we get by launching
a Docker instance in tests, which is pretty fast.  What's really hurting
adoption at our org is the lack of a proper backup/snapshot/replication
feature.  As for marketing, I think conferences are crucial, so I was
disappointed that Strata SJ 2018 didn't have a single session on Kudu,
there were no committers in attendance that I could tell, and it wasn't
being highlighted at all in the Cloudera booth.  Between Strata and
ScalaDays I must have enthusiastically mentioned the product to 15 people
and none had heard of it. -m

On Tue, Jul 17, 2018 at 11:40 AM Mike Percy <mp...@apache.org> wrote:

> Hi Apache Kudu community,
>
> Apologies for cross-posting, we just wanted to reach a broad audience for
> this topic.
>
> Grant and I have been brainstorming about what we can do to grow the
> community of Kudu developers and users. We think Kudu has a lot going for
> it, but not everybody knows what it is and what it’s capable of. Focusing
> and combining our collective efforts to increase awareness (marketing) and
> to reduce barriers to contribution and adoption could be a good way to
> achieve organic growth.
>
> We’d like to hear your ideas about what barriers and pain points exist and
> any ideas you may have to fix some of those things -- especially ideas
> requiring minimal effort and maximum impact.
>
> To kick this off, here are some ideas Grant and I have come up with so
> far, in sort of a rough priority order:
>
> Ideas for general improvements
>
>    1. Java MiniCluster support out of the box (KUDU-2411)
>    1. This will enable integration with other projects in a way that
>       allows them to test against a running Kudu cluster and ensure quality
>       without having to build it themselves.
>       2. Create a dedicated Maven-consumable java module for a Kudu
>       MiniCluster
>       3. Pre-built binary artifacts (for testing use only) downloadable
>       with MiniCluster (Linux / MacOS)
>       4. Ship all dependencies (even security deps, which will not be
>       fixed if CVEs found)
>       5. Make the binaries Linux distro-independent by building on an old
>       distro (EL6)
>    2. Upgrade Gerrit to fix the “New UI” GitHub Login Bug (KUDU-2402)
>       1. Remove barrier to submitting a patch
>       2. Latest version of Gerrit has a fix for the bad GitHub login
>       redirect
>    3. Upstream pre-built packages for production use (Start rhel7, maybe
>    ubuntu)
>    1. This is potentially a pretty large effort, depending in the number
>       of platforms we want to support
>       2. Tarballs -- per-OS / per-distro
>       3. Yum install, apt get: per-OS / per-distro
>       4. Homebrew?
>    4. CLI based tools with zero dependencies for quick experiments/demos
>    1. Create, describe, alter tables
>       2. Cat data out, pipe data in.
>       3. Or simple Python examples to do similar
>    5. Create developer oriented docs and faqs (wiki style?)
>    6. CONTRIBUTING.adoc in repo
>    1. Simplified
>       2. Quick “assume nothing tutorial”
>       3. Video Guide?
>
> Ongoing marketing and engagement
>
>    1. Quarterly email to the dev / users list
>    1. Recognize new contributors
>       2. Call out beginner jiras
>       3. Summarize ongoing projects
>    2. Consistently use the beginner / newbie tag in JIRA
>    1. Doc how to find beginner jiras in the contributing docs
>    3. Regular blog posts
>    1. Developer and community contributors
>       2. Invite people from other projects that integrate w/ Kudu to post
>       on our Blog
>       3. Document how to contribute a blog post
>       4. Topics: Compile and maintain a list of blog post ideas in case
>       people want inspiration -- Grant has been gathering ideas for this
>    4. Archive Slack to a mailing list to be indexed by search engines
>    (SlackArchive.io has shut down)
>
> Please offer your suggestions for where we can get a good bang for our
> collective buck, and if there is anything you would like to work on by all
> means please either speak up or feel free to reach out directly.
>
> Thanks,
>
> Grant and Mike
>
>

-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com
