Posted to dev@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2013/06/16 20:02:34 UTC

Supporting an independent build farm

Hive's unit test suite has gotten larger as we have added more features,
and thus it takes longer to run. On a single dual-core machine with solid
state disks I have to start a test run at night, and then check the next
morning to see if the run has finished. (I have been running tests for
maybe 2 hours and am up to escape.q.)

::opinion::
Also, for a long time the distribution of which features get reviewed,
tested, and committed has been unfair. With more people involved in the
project this situation has gotten better; however, it is still not fair.
What sometimes ends up happening is that a good feature, which is reviewed
and +1ed, sits uncommitted for months or years.

Some committers or groups of committers have an agenda and dedicated testing
resources, and others do not. This unbalances the project. It means that
small incremental improvements and new features not important to 'large
company with testing resources x' sit ready to be committed while other
people working in pairs further the project according to their agenda. (This
last statement is not a condemnation of anyone, just possibly a fact of
life.)

::suggestion::
1) The project should sponsor an open and independent build/test farm.
2) Once a ticket is marked 'patch available', this build farm should
automatically notice this and begin testing the patch.
3) Patches/issues which pass tests first should be considered first for
inclusion.
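
A minimal sketch of how steps 2 and 3 might fit together, written in Python
only for brevity. Everything here is an assumption for illustration: the
Ticket class, the status string, the ticket keys, and the run_tests callback
are hypothetical, and nothing talks to a real issue tracker or build service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Ticket:
    key: str              # made-up ticket key, for illustration only
    status: str           # tracker status, e.g. "Patch Available"
    tested: bool = False  # has the farm already run this patch?
    passed: bool = False  # result of the most recent run


def pending_patches(tickets: Iterable[Ticket]) -> List[Ticket]:
    """Step 2: pick tickets marked 'Patch Available' that are not yet tested."""
    return [t for t in tickets if t.status == "Patch Available" and not t.tested]


def run_farm(tickets: Iterable[Ticket],
             run_tests: Callable[[Ticket], bool]) -> List[str]:
    """Test each pending patch; return passing keys in the order they passed.

    The returned order is the review order proposed in step 3.
    """
    passed_in_order = []
    for ticket in pending_patches(tickets):
        ticket.passed = run_tests(ticket)  # hypothetical hook into the test run
        ticket.tested = True
        if ticket.passed:
            passed_in_order.append(ticket.key)
    return passed_in_order
```

In a real farm run_tests would apply the patch and start the full suite; here
it is just a callback so the selection and ordering logic can be read on its
own.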

We can use a hosted testing service such as:
http://www.cloudbees.com/platform/pricing/devcloud.cb

Q. Do any committers/interested parties like the idea?
Q. Would anyone be interested in dedicating financial resources to getting
this off the ground? (I am.)

Q. Does anyone have ideas for a better platform or a better system?

Re: Supporting an independent build farm

Posted by Brock Noland <br...@cloudera.com>.
Hi Edward and Vinod,

I agree with what has been said here, and it's an area in which I have
taken a particular interest.

I think the project modularity could be improved. For example, the ql module
is quite large. Additionally, using unit tests with mocked components, as
opposed to .q file tests or tests which start some kind of server component,
would improve unit test performance while increasing the precision of
tests. HIVE-4290 <https://issues.apache.org/jira/browse/HIVE-4290> should
also improve build and test times.
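
To illustrate the mocked-component idea, here is a small sketch in Python
using the standard unittest.mock module. The function count_partitions, the
list_partitions method, and the table name are hypothetical stand-ins and not
Hive APIs; the point is only that the test never starts a server.

```python
from unittest.mock import Mock


def count_partitions(metastore, table_name):
    """Code under test: asks a metastore-like client for partitions."""
    return len(metastore.list_partitions(table_name))


def test_count_partitions_with_mock():
    # Stand-in for a server-backed client; no server component is started.
    metastore = Mock()
    metastore.list_partitions.return_value = ["ds=2013-06-15", "ds=2013-06-16"]

    assert count_partitions(metastore, "page_views") == 2
    # The mock also records the call, so the test can check it precisely.
    metastore.list_partitions.assert_called_once_with("page_views")
```

Because the test only exercises count_partitions, a failure points at that
one function rather than at whichever layer of a full query run broke.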

As some of you are aware, a few weeks ago I posted
HIVE-4675 <https://issues.apache.org/jira/browse/HIVE-4675>, which is a
new parallel unit test framework. I created that framework for
myself as I was frustrated by the build times for the Hive project. I've
implemented it at my employer, Cloudera, and it's been very successful. To
ensure the framework is usable by the community at large, Cloudera has agreed
to sponsor some virtual test infrastructure. I should have more details
soon on exactly what that infrastructure will look like, and I've created
HIVE-4739 <https://issues.apache.org/jira/browse/HIVE-4739> to track this
effort. If anyone is interested in sponsoring a portion of that
infrastructure, please say so on that JIRA and we can work out the
details.

Cheers!
Brock


On Sun, Jun 16, 2013 at 2:31 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> This is from someone from Hadoop who's been on and off in Hive.
>
> Dedicated test resources are good, but there are other (simpler?) things
> worth pursuing to begin with - suggestions from the peanut gallery:
>  - Split the project into modules. Without thinking much, a simple split
> could be client, execution engine, metastore. We did the module split in
> Hadoop; it is a bit of a pain initially but pays back a lot in the future.
> And whenever there are isolated module changes, only those modules need to
> be tested. It also has the added benefit of clear modularity.
>  - A separate candidate suite of pre-commit tests. It can be a subset of
> all the tests, maybe even hand-picked. Sure, they won't catch some bugs,
> but it is a reasonable compromise that worked in Hadoop.
>  - And wire the pre-commit tests with JIRA/Jenkins.
>
> Thanks,
> +Vinod


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

Re: Supporting an independent build farm

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
This is from someone from Hadoop who's been on and off in Hive.

Dedicated test resources are good, but there are other (simpler?) things worth pursuing to begin with - suggestions from the peanut gallery:
 - Split the project into modules. Without thinking much, a simple split could be client, execution engine, metastore. We did the module split in Hadoop; it is a bit of a pain initially but pays back a lot in the future. And whenever there are isolated module changes, only those modules need to be tested. It also has the added benefit of clear modularity.
 - A separate candidate suite of pre-commit tests. It can be a subset of all the tests, maybe even hand-picked. Sure, they won't catch some bugs, but it is a reasonable compromise that worked in Hadoop.
 - And wire the pre-commit tests with JIRA/Jenkins.
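
A rough sketch of the "only test the changed modules" idea, in Python for brevity. The directory names in MODULES are hypothetical (one per module in the split suggested above), and the fallback of testing everything when a change lands outside a known module is an assumption, not something Hadoop or Hive is known to do.

```python
from typing import Iterable, List, Set

# Hypothetical top-level directories, one per module in the suggested split.
MODULES = ("client", "exec", "metastore")


def modules_to_test(changed_paths: Iterable[str]) -> List[str]:
    """Return only the modules that contain a changed file.

    A change outside every known module (a shared build file, for example)
    falls back to testing all modules.
    """
    touched: Set[str] = set()
    for path in changed_paths:
        top = path.split("/", 1)[0]  # first path component names the module
        if top not in MODULES:
            return list(MODULES)
        touched.add(top)
    # Keep a stable order so runs are reproducible.
    return [m for m in MODULES if m in touched]
```

A pre-commit job wired to JIRA/Jenkins could feed the list of files in a patch to a function like this and run only the returned modules' tests, or a hand-picked subset of them.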

Thanks,
+Vinod
