Posted to dev@drill.apache.org by Parth Chandra <pa...@apache.org> on 2015/07/24 06:58:24 UTC

[DISCUSS] Publishing advanced/functional tests

Drill devs use a set of tests that are not available as part of the Apache
distribution. These tests are a pre-requisite for all commits, but are not
available to any contributors outside the current devs.

This thread is to discuss various options to make these tests available.

Assumptions and requirements -
1) A functional test (as opposed to a unit test) needs to be closer to the
end user environment than a development environment. As such, we should be
running functional tests in a cluster environment, connecting through
zookeeper, etc.
2) Functional tests will keep increasing in number, get more complex and
take a longer and longer time to execute as we go along.
3) Some requirements are:
    a) We want to be strict in enforcing the pre-commit requirements, but
not penalize the contributor who has a minor fix.
    b) All parts of the product (especially various 'certified' storage
plugins like Hive and HBase) should get tested
    c) It should be easy to debug issues when a test fails. Tests should
fail deterministically. If a test fails, it should always fail and always
fail in the same way (easier said than done).

Some suggestions -
1) Tests should be a top-level maven module within the drill project
        a) We want the integration tests to run as part of drill's
maven build process
        b) The build step for the integration-tests module would launch an
embedded drillbit and run tests against it
        c) The tests will be a separate target so they need not be run all
the time
2) Tests should be divided into multiple suites that are based on
components. For example, a test suite for testing datatypes will contain the
tests for various datatypes including complex types. A contributor or
developer can then run these tests more frequently as an issue is being
addressed and run the entire suite only once before commit.
3) Provide the tests as a hosted service
4) Set up a bot to fire the tests on an AWS cluster and post the results to
the JIRA (Hive does this). Or some variant of this idea.
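
To make suggestion 2 a little more concrete, here is a minimal sketch of
component-based suite selection. It is only an illustration, not the existing
test framework: the suite names, the labels, and the select_suites helper are
all hypothetical. It also assumes that suites needing extra setup (hive, hbase)
are skipped unless that setup is declared available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    """A test suite tagged with component labels (all names are hypothetical)."""
    name: str
    labels: frozenset
    needs_setup: frozenset = frozenset()  # e.g. {"hive"} for suites that need Hive


def select_suites(suites, wanted_labels, available_setup=frozenset()):
    """Return the names of suites that carry at least one wanted label and
    whose extra setup requirements are all available."""
    wanted = frozenset(wanted_labels)
    available = frozenset(available_setup)
    return [
        s.name
        for s in suites
        if s.labels & wanted and s.needs_setup <= available
    ]


# Hypothetical suite registry, for illustration only.
SUITES = [
    Suite("datatypes_basic", frozenset({"datatypes"})),
    Suite("datatypes_complex", frozenset({"datatypes", "complex"})),
    Suite("hive_storage", frozenset({"storage", "hive"}), frozenset({"hive"})),
]

if __name__ == "__main__":
    # While working on a datatype issue, run only the datatype suites.
    print(select_suites(SUITES, {"datatypes"}))
    # Storage suites run only when the required setup is declared available.
    print(select_suites(SUITES, {"storage"}, available_setup={"hive"}))
```

With a scheme like this, a contributor could run a small labelled subset often
and leave the full run for the pre-commit step.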


Some questions -
1) What do some other projects do?
2) Are there any technologies we can leverage that will make this easier?
3) How do we make it easier to debug failing tests?


Please feel free to question the assumptions and requirements. Be creative
with your suggestions.

Parth

Re: [DISCUSS] Publishing advanced/functional tests

Posted by rahul challapalli <ch...@gmail.com>.
Ramana,

Yes, the plan is to have it out with 1.2, and the work is in progress.

- Rahul

On Mon, Aug 17, 2015 at 10:27 AM, Chun Chang <cc...@maprtech.com> wrote:

> Hi Ramana,
>
> Glad to see your post here. I agree with your point that we should have a
> way for the public to run all the pre-commit tests. I feel that's a higher
> priority than anything else since with that, people can commit their
> patches.
>
> Thanks,
> Chun
>
> On Fri, Aug 14, 2015 at 11:33 AM, Ramana I N <in...@gmail.com> wrote:
>
> > So what is the status on this? It would be nice to have this out with 1.2
> > coming out.
> >
> > Regards
> > Ramana
> >
> >
> >
> > On Wed, Aug 5, 2015 at 11:08 AM, Abhishek Girish <
> > abhishek.girish@gmail.com>
> > wrote:
> >
> > > Ramana,
> > >
> > > I think the issue with licenses is mostly resolved. It was discussed that
> > > for TPC-*, since we shall not be redistributing the data-gen software, but
> > > distributing a randomized variant of the data generated by it, we should be
> > > okay to include it as part of our framework. For other datasets, we shall
> > > either provide a copy of their license with our framework, or simply provide
> > > a link for users to download data before they execute.
> > >
> > > For now we should focus on having the framework out with minimal cleanup.
> > > In the near future we can work on setting up infrastructure and enhancing
> > > the framework itself.
> > >
> > > -Abhishek
> > >
> > > On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N <inramana@gmail.com> wrote:
> > >
> > > > @Jacques, Ted
> > > >
> > > > > in the mean time, we risk patches being merged that have less than
> > > > > complete testing.
> > > >
> > > > While I agree with the premise of getting the tests out as soon as
> > > > possible, it does not help us achieve anything except transparency. Your
> > > > statement that getting the tests out will increase quality is dependent on
> > > > someone actually being able to run the tests once they have access to them.
> > > >
> > > > Maybe we should focus on making a jenkins job to run the tests publicly.
> > > > With that in place we can exclude the TPC* datasets as well as the yelp
> > > > data sets from the framework and avoid licensing issues.
> > > >
> > > > Regards
> > > > Ramana
> > > >
> > > >
> > > > On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <abhishek.girish@gmail.com> wrote:
> > > >
> > > > > We not only re-distribute external data-sets as-is, but also include
> > > > > variants of those (text -> parquet, json, ...). So the challenge here is
> > > > > not simply about disabling automatic downloads via the framework and
> > > > > pointing users to manually download the files before running the
> > > > > framework, but also about how we will handle tests which require variants
> > > > > of the data sets. It simply isn't practical for users of the framework to
> > > > > (1) download data-gen manually, (2) use a specific seed / options before
> > > > > generating data, (3) convert them to parquet, etc., (4) move them to
> > > > > specific locations inside their copy of the framework.
> > > > >
> > > > > Something we'll need to know is how other projects are handling
> > > > > benchmark & other external datasets.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > >
> > > > > > Thanks for your inputs.
> > > > > >
> > > > > > One issue with just publishing the tests in their current state is that
> > > > > > the framework re-distributes tpch, tpcds, yelp data sets without
> > > > > > requiring the users to accept their relevant licenses. A good number of
> > > > > > tests use these data sets. Any thoughts on how to handle this?
> > > > > >
> > > > > > - Rahul
> > > > > >
> > > > > > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > >
> > > > > > > +1.  Get it out there.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > > >
> > > > > > > > Hey Rahul,
> > > > > > > >
> > > > > > > > My suggestion would be to lower the bar--do the absolute bare minimum to
> > > > > > > > get the tests out there.  For example, simply remove proprietary
> > > > > > > > information and then get it on a public github (whether your personal
> > > > > > > > github or a corporate one).  From there, people can help by submitting
> > > > > > > > pull requests to improve the infrastructure and harness.  Making things
> > > > > > > > easier is something that can be done over time.  For example, we've had
> > > > > > > > offers from a couple different Linux Admins to help on something.  I'm
> > > > > > > > sure that they could help with a number of the items you've identified.
> > > > > > > > In the mean time, we risk patches being merged that have less than
> > > > > > > > complete testing.
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jacques Nadeau
> > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > >
> > > > > > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Jacques,
> > > > > > > > >
> > > > > > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can add/prioritize
> > > > > > > > > these tasks.
> > > > > > > > >
> > > > > > > > > Columns: Item # / Task / Sub-Task / Comments / Priority
> > > > > > > > >
> > > > > > > > > 1. *Publish the tests*
> > > > > > > > >    - Remove Proprietary Data & Queries (Priority: 0)
> > > > > > > > >    - Redact Proprietary Data/Queries
> > > > > > > > >    - Move tests into drill repo
> > > > > > > > >      Comments: This requires some refactoring to the framework code since
> > > > > > > > >      the test framework uses a 2-level directory structure
> > > > > > > > >    - Organize the tests using a label based approach
> > > > > > > > >      Comments: This involves code changes and moving a lot of files. When
> > > > > > > > >      doing a one time push it might be better to do this before
> > > > > > > > >      publishing the tests?
> > > > > > > > >    - Each suite should be independent
> > > > > > > > >      Comments: Some suites wrongly assume that the data is present. They
> > > > > > > > >      should be identified and fixed
> > > > > > > > >    - Cleanup hardcoded dependencies during data generation
> > > > > > > > >      Comments: Some data-gen scripts have hard-coded references
> > > > > > > > >    - Cleanup downloads
> > > > > > > > >      Comments: The same dataset is being downloaded multiple times by
> > > > > > > > >      different suites
> > > > > > > > >    - Licenses for downloads
> > > > > > > > >      Comments: The framework downloads some files automatically. These
> > > > > > > > >      files are publicly available. However before downloading them users
> > > > > > > > >      need to agree to certain terms. By using the framework users might
> > > > > > > > >      be skipping this step. We should look into this
> > > > > > > > >
> > > > > > > > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > > > > > > >
> > > > > > > > > 3. *Local debugging of tests*
> > > > > > > > >    - Add an optional maven target for running tests on a local machine
> > > > > > > > >      Comments: Tests can launch an embedded drillbit or they can connect
> > > > > > > > >      to a running drillbit through zookeeper
> > > > > > > > >    - Running suites which require additional setup (hive, hbase etc)
> > > > > > > > >      should be made optional
> > > > > > > > >
> > > > > > > > > 4. *Documentation*
> > > > > > > > >    - Running Tests (options available and also listing the assumed
> > > > > > > > >      defaults)
> > > > > > > > >    - Explaining how tests are organized
> > > > > > > > >    - Process for adding a new suite
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > > > > >
> > > > > > > > > > Let's get number one done (tests out there so all community members can
> > > > > > > > > > run them).  Then the whole community can work together to solve the rest.
> > > > > > > > > >
> > > > > > > > > > I don't think the base install should include integration test execution.
> > > > > > > > > > I do think the tests should be in the main repo (as opposed to a
> > > > > > > > > > secondary).
> > > > > > > > > >
> > > > > > > > > > We should strive to ultimately make running these integration tests a
> > > > > > > > > > requirement for merging.  We need to complete all the steps before we can
> > > > > > > > > > impose that.  I should be able to help on the global run component and
> > > > > > > > > > supporting infrastructure.
> > > > > > > > > >
> > > > > > > > > > J
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Jacques Nadeau
> > > > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Ramana,
> > > > > > > > > > >
> > > > > > > > > > > You are right. We are trying to address multiple issues here, but not
> > > > > > > > > > > with a single solution. I am summarizing them:
> > > > > > > > > > >
> > > > > > > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > > > > > > > environment. Parth had a suggestion (#4) in his original email.
> > > > > > > > > > > 3. Developers should be able to debug the majority of the tests on
> > > > > > > > > > > their local environment. I made a few suggestions above in this regard.
> > > > > > > > > > >
> > > > > > > > > > > - Rahul
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > One important thing which we need to be clear on here is what are we
> > > > > > > > > > > > trying to address?
> > > > > > > > > > > >
> > > > > > > > > > > > I feel there are two separate issues here and I do not think one
> > > > > > > > > > > > solution will fit both the issues.
> > > > > > > > > > > >
> > > > > > > > > > > >    1. Allowing developers to run tests on their local box so they know
> > > > > > > > > > > >    the changes they have are not completely wrong.
> > > > > > > > > > > >    2. Allowing transparency in the integration tests process which is
> > > > > > > > > > > >    currently a black box.
> > > > > > > > > > > >
> > > > > > > > > > > > 1 is needed for developers to make changes and have an idea that their
> > > > > > > > > > > > changes are not going to fail tests en masse in the integration suite.
> > > > > > > > > > > > 2 is needed because it's a prerequisite for changes to be committed.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Regards
> > > > > > > > > > > > Ramana
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Ramana,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let me fill in more details.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Before we accept a patch we want to make sure the tests run in a
> > > > > > > > > > > > > cluster environment. No exceptions here.
> > > > > > > > > > > > > 2. We want the contributors to be able to debug the failing tests on
> > > > > > > > > > > > > their laptops in as many cases as possible. This requires:
> > > > > > > > > > > > >         1. Tests should run on top of a local file system. (Tests can
> > > > > > > > > > > > > launch an embedded drillbit or they can connect to a running drillbit
> > > > > > > > > > > > > through zookeeper)
> > > > > > > > > > > > >         2. Running suites which require additional setup (hive, hbase
> > > > > > > > > > > > > etc) should be made optional and sufficient documentation should be
> > > > > > > > > > > > > provided for enabling and disabling these tests.
> > > > > > > > > > > > > 3. In my opinion making these new tests part of drill would make it
> > > > > > > > > > > > > easier for the developers to debug and run tests instead of having a
> > > > > > > > > > > > > different repository. But as you said it might bloat the drill project
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Rahul
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The Hadoop family of projects has some software that integrates a
> > > > > > > > > > > > > > continuous integration system so that every time a JIRA is marked as
> > > > > > > > > > > > > > patch-available, the associated patch attached to the bug will have
> > > > > > > > > > > > > > integration tests run against it.  I believe that there has been some
> > > > > > > > > > > > > > process to use git hashes instead of patches.  The CI results are put
> > > > > > > > > > > > > > back on the JIRA.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is done using a fairly simple set of scripts.  Apache Yetus is
> > > > > > > > > > > > > > just forming as a direct-to-top-level spinoff from Hadoop
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Proposal is here (don't be fooled by the fact that it looks like an
> > > > > > > > > > > > > > incubation proposal):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Early code can be found here (don't guess that this is very real yet).
> > > > > > > > > > > > > > More links can be found in the proposal.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The project has not yet been formed and there are no mailing lists or
> > > > > > > > > > > > > > git repo yet.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <inramana@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As someone who worked on this for a while, including it as part of
> > > > > > > > > > > > > > > drill may bloat drill a bit too much. Also not a big fan of running
> > > > > > > > > > > > > > > against an embedded drillbit. Does not replicate an actual production
> > > > > > > > > > > > > > > use case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Additionally, setting up hive, hbase and other components may be
> > > > > > > > > > > > > > > painful and unnecessary for most ppl. It would deter people from ever
> > > > > > > > > > > > > > > contributing to drill. We could spin up in-memory hive and hbase but
> > > > > > > > > > > > > > > that's similar to an embedded drillbit. Does not replicate a
> > > > > > > > > > > > > > > production scenario.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Would prefer the hive way with a central Jenkins server hosted on aws
> > > > > > > > > > > > > > > and accessible to everyone.  Users should be able to submit a git url
> > > > > > > > > > > > > > > and that should be able to deploy and fire off tests. Should then
> > > > > > > > > > > > > > > have a way to easily communicate failures to contributors and, on
> > > > > > > > > > > > > > > success, notify the committers to commit the change.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ps: if hive's way is open source maybe we can look into reuse rather
> > > > > > > > > > > > > > > than doing it from scratch. Esp the Jenkins and configuration stuff.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > > Ramana
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <parthc@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Drill devs use a set of tests that are not available as part of the
> > > > > > > > > > > > > > > > Apache distribution. These tests are a pre-requisite for all commits,
> > > > > > > > > > > > > > > > but are not available to any contributors outside the current devs.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This thread is to discuss various options to make these tests
> > > > > > > > > > > > > > > > available.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Assumptions and requirements -
> > > > > > > > > > > > > > > > 1) A functional test (as opposed to a unit test) needs to be closer
> > > > > > > > > > > > > > > > to the end user environment than a development environment. As such,
> > > > > > > > > > > > > > > > we should be running functional tests in a cluster environment,
> > > > > > > > > > > > > > > > connecting through zookeeper, etc.
> > > > > > > > > > > > > > > > 2) Functional tests will keep increasing in number, get more complex
> > > > > > > > > > > > > > > > and take a longer and longer time to execute as we go along.
> > > > > > > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > > > > > > >     a) We want to be strict in enforcing the pre-commit requirements,
> > > > > > > > > > > > > > > > but not penalize the contributor who has a minor fix.
> > > > > > > > > > > > > > > >     b) All parts of the product (especially various 'certified'
> > > > > > > > > > > > > > > > storage plugins like Hive and HBase) should get tested
> > > > > > > > > > > > > > > >     c) It should be easy to debug issues when a test fails. Tests
> > > > > > > > > > > > > > > > should fail deterministically. If a test fails, it should always fail
> > > > > > > > > > > > > > > > and always fail in the same way (easier said than done).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Some suggestions -
> > > > > > > > > > > > > > > > 1) Tests should be a top-level maven module within the drill project
> > > > > > > > > > > > > > > >         a) We want the integration tests to run as part of drill's
> > > > > > > > > > > > > > > > maven build process
> > > > > > > > > > > > > > > >         b) The build step for the integration-tests module would
> > > > > > > > > > > > > > > > launch an embedded drillbit and run tests against it
> > > > > > > > > > > > > > > >         c) The tests will be a separate target so they need not be
> > > > > > > > > > > > > > > > run all the time
> > > > > > > > > > > > > > > > 2) Tests should be divided into multiple suites that are based on
> > > > > > > > > > > > > > > > components. For example, a test suite for testing datatypes will
> > > > > > > > > > > > > > > > contain the tests for various datatypes including complex types. A
> > > > > > > > > > > > > > > > contributor or developer can then run these tests more frequently as
> > > > > > > > > > > > > > > > an issue is being addressed and run the entire suite only once before
> > > > > > > > > > > > > > > > commit.
> > > > > > > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > > > > > > 4) Set up a bot to fire the tests on an AWS cluster and post the
> > > > > > > > > > > > > > > > results to the JIRA (Hive does this). Or some variant of this idea.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Some questions -
> > > > > > > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > > > > > > 2) Are there any technologies we can leverage that will make this
> > > > > > > > > > > > > > > > easier?
> > > > > > > > > > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please feel free to question the assumptions and requirements. Be
> > > > > > > > > > > > > > > > creative with your suggestions.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Parth
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Chun Chang <cc...@maprtech.com>.
Hi Ramana,

Glad to see your post here. I agree with your point that we should have a
way for the public to run all the pre-commit tests. I feel that's a higher
priority than anything else since with that, people can commit their
patches.

Thanks,
Chun

On Fri, Aug 14, 2015 at 11:33 AM, Ramana I N <in...@gmail.com> wrote:

> So what is the status on this? It would be nice to have this out with 1.2
> coming out.
>
> Regards
> Ramana
>
>
>
> On Wed, Aug 5, 2015 at 11:08 AM, Abhishek Girish <
> abhishek.girish@gmail.com>
> wrote:
>
> > Ramana,
> >
> > I think the issue with licenses is mostly resolved. It was discussed that
> > for TPC-*, since we shall not be redistributing the data-gen software, but
> > distributing a randomized variant of the data generated by it, we should be
> > okay to include it as part of our framework. For other datasets, we shall
> > either provide a copy of their license with our framework, or simply provide
> > a link for users to download data before they execute.
> >
> > For now we should focus on having the framework out with minimal cleanup.
> > In the near future we can work on setting up infrastructure and enhancing
> > the framework itself.
> >
> > -Abhishek
> >
> > On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N <inramana@gmail.com> wrote:
> >
> > > @Jacques, Ted
> > >
> > > > in the mean time, we risk patches being merged that have less than
> > > > complete testing.
> > >
> > > While I agree with the premise of getting the tests out as soon as
> > > possible, it does not help us achieve anything except transparency. Your
> > > statement that getting the tests out will increase quality is dependent on
> > > someone actually being able to run the tests once they have access to them.
> > >
> > > Maybe we should focus on making a jenkins job to run the tests publicly.
> > > With that in place we can exclude the TPC* datasets as well as the yelp
> > > data sets from the framework and avoid licensing issues.
> > >
> > > Regards
> > > Ramana
> > >
> > >
> > > On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <abhishek.girish@gmail.com> wrote:
> > >
> > > > We not only re-distribute external data-sets as-is, but also include
> > > > variants of those (text -> parquet, json, ...). So the challenge here is
> > > > not simply about disabling automatic downloads via the framework and
> > > > pointing users to manually download the files before running the
> > > > framework, but also about how we will handle tests which require variants
> > > > of the data sets. It simply isn't practical for users of the framework to
> > > > (1) download data-gen manually, (2) use a specific seed / options before
> > > > generating data, (3) convert them to parquet, etc., (4) move them to
> > > > specific locations inside their copy of the framework.
> > > >
> > > > Something we'll need to know is how other projects are handling
> > > > benchmark & other external datasets.
> > > >
> > > > -Abhishek
> > > >
> > > > On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <
> > > > challapallirahul@gmail.com> wrote:
> > > >
> > > > > Thanks for your inputs.
> > > > >
> > > > > One issue with just publishing the tests in their current state is
> > > > > that the framework re-distributes tpch, tpcds, yelp data sets without
> > > > > requiring the users to accept their relevant licenses. A good number
> > > > > of tests use these data sets. Any thoughts on how to handle this?
> > > > >
> > > > > - Rahul
> > > > >
> > > > > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <
> > > > > ted.dunning@gmail.com> wrote:
> > > > >
> > > > > > +1.  Get it out there.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <
> > > > > > jacques@dremio.com> wrote:
> > > > > >
> > > > > > > Hey Rahul,
> > > > > > >
> > > > > > > My suggestion would be to lower the bar--do the absolute bare
> > > > > > > minimum to get the tests out there.  For example, simply remove
> > > > > > > proprietary information and then get it on a public github
> > > > > > > (whether your personal github or a corporate one).  From there,
> > > > > > > people can help by submitting pull requests to improve the
> > > > > > > infrastructure and harness.  Making things easier is something
> > > > > > > that can be done over time.  For example, we've had offers from
> > > > > > > a couple different Linux Admins to help on something.  I'm sure
> > > > > > > that they could help with a number of the items you've
> > > > > > > identified.  In the mean time, we risk patches being merged that
> > > > > > > have less than complete testing.
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jacques Nadeau
> > > > > > > CTO and Co-Founder, Dremio
> > > > > > >
> > > > > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > >
> > > > > > > > Jacques,
> > > > > > > >
> > > > > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > > > > > add/prioritize these tasks.
> > > > > > > >
> > > > > > > > Item # | Task | Sub-Task | Comments | Priority
> > > > > > > >
> > > > > > > > 1. *Publish the tests*
> > > > > > > >    - Remove Proprietary Data & Queries (Priority: 0)
> > > > > > > >    - Redact Proprietary Data/Queries
> > > > > > > >    - Move tests into drill repo
> > > > > > > >      This requires some refactoring to the framework code
> > > > > > > >      since the test framework uses a 2-level directory
> > > > > > > >      structure.
> > > > > > > >    - Organize the tests using a label based approach
> > > > > > > >      This involves code changes and moving a lot of files.
> > > > > > > >      When doing a one time push it might be better to do this
> > > > > > > >      before publishing the tests?
> > > > > > > >    - Each suite should be independent
> > > > > > > >      Some suites wrongly assume that the data is present.
> > > > > > > >      They should be identified and fixed.
> > > > > > > >    - Cleanup hardcoded dependencies during data generation
> > > > > > > >      Some data-gen scripts have hard-coded references.
> > > > > > > >    - Cleanup downloads
> > > > > > > >      The same dataset is being downloaded multiple times by
> > > > > > > >      different suites.
> > > > > > > >    - Licenses for downloads
> > > > > > > >      The framework downloads some files automatically. These
> > > > > > > >      files are publicly available. However, before
> > > > > > > >      downloading them users need to agree to certain terms.
> > > > > > > >      By using the framework users might be skipping this
> > > > > > > >      step. We should look into this.
> > > > > > > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > > > > > > 3. *Local debugging of tests*
> > > > > > > >    - Add an optional maven target for running tests on a
> > > > > > > >      local machine
> > > > > > > >      Tests can launch an embedded drillbit or they can
> > > > > > > >      connect to a running drillbit through zookeeper.
> > > > > > > >    - Running suites which require additional setup (hive,
> > > > > > > >      hbase etc) should be made optional
> > > > > > > > 4. *Documentation*
> > > > > > > >    - Running Tests (options available and also listing the
> > > > > > > >      assumed defaults)
> > > > > > > >    - Explaining how tests are organized
> > > > > > > >    - Process for adding a new suite
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <
> > > > > > > > jacques@dremio.com> wrote:
> > > > > > > >
> > > > > > > > > Let's get number one done (tests out there so all community
> > > > > > > > > members can run them).  Then the whole community can work
> > > > > > > > > together to solve the rest.
> > > > > > > > >
> > > > > > > > > I don't think the base install should include integration
> > > > > > > > > test execution. I do think the tests should be in the main
> > > > > > > > > repo (as opposed to a secondary).
> > > > > > > > >
> > > > > > > > > We should strive to ultimately make running these
> > > > > > > > > integration tests a requirement for merging.  We need to
> > > > > > > > > complete all the steps before we can impose that.  I should
> > > > > > > > > be able to help on the global run component and supporting
> > > > > > > > > infrastructure.
> > > > > > > > >
> > > > > > > > > J
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jacques Nadeau
> > > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Ramana,
> > > > > > > > > >
> > > > > > > > > > You are right. We are trying to address multiple issues
> > > > > > > > > > here, but not with a single solution. I am summarizing
> > > > > > > > > > them:
> > > > > > > > > >
> > > > > > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > > > > > 2. Before applying a patch we should run tests in a
> > > > > > > > > > clustered environment. Parth had a suggestion (#4) in his
> > > > > > > > > > original email.
> > > > > > > > > > 3. Developers should be able to debug the majority of the
> > > > > > > > > > tests in their local environment. I made a few suggestions
> > > > > > > > > > above in this regard.
> > > > > > > > > >
> > > > > > > > > > - Rahul
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <
> > > > > > > > > > inramana@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > One important thing which we need to be clear on here is
> > > > > > > > > > > what are we trying to address?
> > > > > > > > > > >
> > > > > > > > > > > I feel there are two separate issues here and I do not
> > > > > > > > > > > think one solution will fit both the issues.
> > > > > > > > > > >
> > > > > > > > > > >    1. Allowing developers to run tests on their local box
> > > > > > > > > > >    so they know the changes they have are not completely
> > > > > > > > > > >    wrong.
> > > > > > > > > > >    2. Allowing transparency in the integration tests
> > > > > > > > > > >    process which is currently a black box.
> > > > > > > > > > >
> > > > > > > > > > > 1 is needed for developers to make changes and have an
> > > > > > > > > > > idea that their changes are not going to fail tests en
> > > > > > > > > > > masse in the integration suite. 2 is needed because it's
> > > > > > > > > > > a prerequisite for changes to be committed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Regards
> > > > > > > > > > > Ramana
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Ramana,
> > > > > > > > > > > >
> > > > > > > > > > > > Let me fill in more details.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Before we accept a patch we want to make sure the
> > > > > > > > > > > > tests run in a cluster environment. No exceptions here.
> > > > > > > > > > > > 2. We want the contributors to be able to debug the
> > > > > > > > > > > > failing tests on their laptops in as many cases as
> > > > > > > > > > > > possible. This requires:
> > > > > > > > > > > >         1. Tests should run on top of a local file
> > > > > > > > > > > > system. (Tests can launch an embedded drillbit or they
> > > > > > > > > > > > can connect to a running drillbit through zookeeper)
> > > > > > > > > > > >         2. Running suites which require additional
> > > > > > > > > > > > setup (hive, hbase etc) should be made optional and
> > > > > > > > > > > > sufficient documentation should be provided for
> > > > > > > > > > > > enabling and disabling these tests.
> > > > > > > > > > > > 3. In my opinion making these new tests part of drill
> > > > > > > > > > > > would make it easier for the developers to debug and
> > > > > > > > > > > > run tests instead of having a different repository.
> > > > > > > > > > > > But as you said it might bloat the drill project
> > > > > > > > > > > >
> > > > > > > > > > > > - Rahul
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > > > > > > > > > > > ted.dunning@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > > > > > > integrates a continuous integration system so that
> > > > > > > > > > > > > every time a JIRA is marked as patch-available, the
> > > > > > > > > > > > > associated patch attached to the bug will have
> > > > > > > > > > > > > integration tests run against it.  I believe that
> > > > > > > > > > > > > there has been some process to use git hashes instead
> > > > > > > > > > > > > of patches.  The CI results are put back on the JIRA.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is done using a fairly simple set of scripts.
> > > > > > > > > > > > > Apache Yetus is just forming as a direct-to-top-level
> > > > > > > > > > > > > spinoff from Hadoop.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Proposal is here (don't be fooled by the fact that it
> > > > > > > > > > > > > looks like an incubation proposal):
> > > > > > > > > > > > >
> > > > > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > > > > >
> > > > > > > > > > > > > Early code can be found here (don't guess that this is
> > > > > > > > > > > > > very real yet). More links can be found in the
> > > > > > > > > > > > > proposal.
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > > > > >
> > > > > > > > > > > > > The project has not yet been formed and there are no
> > > > > > > > > > > > > mailing lists or git repo yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> > > > > > > > > > > > > inramana@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > As someone who worked on this for a while, including
> > > > > > > > > > > > > > it as part of drill may bloat drill a bit too much.
> > > > > > > > > > > > > > Also not a big fan of running against an embedded
> > > > > > > > > > > > > > drillbit. Does not replicate an actual production
> > > > > > > > > > > > > > use case.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Additionally, setting up hive, hbase and other
> > > > > > > > > > > > > > components may be painful and unnecessary for most
> > > > > > > > > > > > > > ppl. It would deter people from ever contributing to
> > > > > > > > > > > > > > drill. We could spin up in memory hive and hbase but
> > > > > > > > > > > > > > that's similar to an embedded drill bit. Does not
> > > > > > > > > > > > > > replicate a production scenario.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Would prefer the hive way with a central Jenkins
> > > > > > > > > > > > > > server hosted on aws and accessible to everyone.
> > > > > > > > > > > > > > Users should be able to submit a git url and that
> > > > > > > > > > > > > > should be able to deploy and fire off tests. Should
> > > > > > > > > > > > > > then have a way to easily communicate failures to
> > > > > > > > > > > > > > contributors and if success notify the committers to
> > > > > > > > > > > > > > commit the change.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ps: if hive's way is open source maybe we can look
> > > > > > > > > > > > > > into reuse rather than doing it from scratch. Esp
> > > > > > > > > > > > > > the Jenkins and configuration stuff.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > Ramana
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <
> > > > > > > > > > > > > > parthc@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Drill devs use a set of tests that are not
> > > > > > > > > > > > > > > available as part of the Apache distribution.
> > > > > > > > > > > > > > > These tests are a pre-requisite for all commits,
> > > > > > > > > > > > > > > but are not available to any contributors outside
> > > > > > > > > > > > > > > the current devs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This thread is to discuss various options to make
> > > > > > > > > > > > > > > these tests available.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Assumptions and requirements  -
> > > > > > > > > > > > > > > 1) A functional test (as opposed to a unit test)
> > > > > > > > > > > > > > > needs to be closer to the end user environment
> > > > > > > > > > > > > > > than a development environment. As such, we
> > > > > > > > > > > > > > > should be running functional tests in a cluster
> > > > > > > > > > > > > > > environment, connect using zookeeper etc.
> > > > > > > > > > > > > > > 2) Functional test will keep increasing in
> > > > > > > > > > > > > > > number, get more complex and take a longer and
> > > > > > > > > > > > > > > longer time to execute as we go along.
> > > > > > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > > > > > >     a) We want to be strict in enforcing the
> > > > > > > > > > > > > > > pre-commit requirements, but not penalize the
> > > > > > > > > > > > > > > contributor who has a minor fix.
> > > > > > > > > > > > > > >     b) All parts of the product (especially
> > > > > > > > > > > > > > > various 'certified' storage plugins like Hive and
> > > > > > > > > > > > > > > Hbase should get tested)
> > > > > > > > > > > > > > >     c) It should be easy to debug issues when a
> > > > > > > > > > > > > > > test fails. Tests should fail deterministically.
> > > > > > > > > > > > > > > If a test fails, it should always fail and always
> > > > > > > > > > > > > > > fail in the same way (easier said than done).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Some suggestions -
> > > > > > > > > > > > > > > 1) Tests should be a top-level maven module
> > > > > > > > > > > > > > > within the drill project
> > > > > > > > > > > > > > >         a) We want the integration tests to run
> > > > > > > > > > > > > > > as part of the drill's maven build process
> > > > > > > > > > > > > > >         b) The build step for the
> > > > > > > > > > > > > > > integration-tests module would launch an embedded
> > > > > > > > > > > > > > > drillbit and runs tests against it
> > > > > > > > > > > > > > >         c) The tests will be a separate target
> > > > > > > > > > > > > > > so they need not be run all the time
> > > > > > > > > > > > > > >  2) Tests should be divided into multiple suites
> > > > > > > > > > > > > > > that are based on components. For example a test
> > > > > > > > > > > > > > > suite for testing datatypes will contain the
> > > > > > > > > > > > > > > tests for various datatypes including complex
> > > > > > > > > > > > > > > types. A contributor or developer can then run
> > > > > > > > > > > > > > > these tests more frequently as an issue is being
> > > > > > > > > > > > > > > addressed and run the entire suite only once
> > > > > > > > > > > > > > > before commit.
> > > > > > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster
> > > > > > > > > > > > > > > and post the results to the JIRA  (Hive does
> > > > > > > > > > > > > > > this). Or some variant of this idea.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Some questions -
> > > > > > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > > > > > 2) Are there any technologies we can leverage
> > > > > > > > > > > > > > > that will make this easier?
> > > > > > > > > > > > > > > 3) How do we make it easier to debug failing
> > > > > > > > > > > > > > > tests.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please feel free to question the assumptions and
> > > > > > > > > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Parth
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ramana I N <in...@gmail.com>.
So what is the status on this? It would be nice to have this out with 1.2
coming out.

Regards
Ramana



On Wed, Aug 5, 2015 at 11:08 AM, Abhishek Girish <ab...@gmail.com>
wrote:

> Ramana,
>
> I think the issue with licenses is mostly resolved. It was discussed that
> for TPC-*, since we shall not be redistributing the data-gen software, but
> distributing a randomized variant of the data generated by it, we should be
> okay to include it as part of our framework. For other datasets, we shall
> either provide a copy of their license with our framework, or simply provide
> a link for users to download data before they execute.
>
> For now we should focus on having the framework out with minimal cleanup.
> In near future we can work on setting up infrastructure and enhancing the
> framework itself.
>
> -Abhishek
>
> On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N <inramana@gmail.com> wrote:
>
> > @Jacques, Ted
> >
> > > In the mean time, we risk patches being merged that have less than
> > > complete testing.
> >
> > While I agree with the premise of getting the tests out as soon as
> > possible, it does not help us achieve anything except transparency. Your
> > statement that getting the tests out will increase quality is dependent on
> > someone actually being able to run the tests once they have access to them.
> >
> > Maybe we should focus on making a Jenkins job to run the tests publicly.
> > With that in place we can exclude the TPC* datasets as well as the yelp
> > data sets from the framework and avoid licensing issues.
> >
> > Regards
> > Ramana
> >
> >
> > On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <
> > abhishek.girish@gmail.com> wrote:
> >
> > > We not only re-distribute external data-sets as-is, but also include
> > > variants for those (text -> parquet, json, ...). So the challenge here is
> > > not simply disabling automatic downloads via the framework and pointing
> > > users to manually download the files before running the framework, but
> > > also how we will handle tests which require variants of the data sets. It
> > > simply isn't practical for users of the framework to (1) download
> > > data-gen manually, (2) use specific seed / options before generating
> > > data, (3) convert them to parquet, etc., and (4) move them to specific
> > > locations inside their copy of the framework.
> > >
> > > Something we'll need to know is how other projects are handling benchmark
> > > & other external datasets.
> > >
> > > -Abhishek
> > >
> > > On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <
> > > challapallirahul@gmail.com> wrote:
> > >
> > > > Thanks for your inputs.
> > > >
> > > > One issue with just publishing the tests in their current state is that
> > > > the framework re-distributes tpch, tpcds, yelp data sets without
> > > > requiring the users to accept their relevant licenses. A good number of
> > > > tests use these data sets. Any thoughts on how to handle this?
> > > >
> > > > - Rahul
> > > >
> > > > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > > >
> > > > > +1.  Get it out there.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <
> > > > > jacques@dremio.com> wrote:
> > > > >
> > > > > > Hey Rahul,
> > > > > >
> > > > > > My suggestion would be to lower the bar--do the absolute bare
> > > > > > minimum to get the tests out there.  For example, simply remove
> > > > > > proprietary information and then get it on a public github (whether
> > > > > > your personal github or a corporate one).  From there, people can
> > > > > > help by submitting pull requests to improve the infrastructure and
> > > > > > harness.  Making things easier is something that can be done over
> > > > > > time.  For example, we've had offers from a couple different Linux
> > > > > > Admins to help on something.  I'm sure that they could help with a
> > > > > > number of the items you've identified.  In the mean time, we risk
> > > > > > patches being merged that have less than complete testing.
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > > > > > challapallirahul@gmail.com> wrote:
> > > > > >
> > > > > > > Jacques,
> > > > > > >
> > > > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > > > > add/prioritize these tasks.
> > > > > > >
> > > > > > > Item # | Task | Sub-Task | Comments | Priority
> > > > > > >
> > > > > > > 1. *Publish the tests*
> > > > > > >    - Remove Proprietary Data & Queries (Priority: 0)
> > > > > > >    - Redact Proprietary Data/Queries
> > > > > > >    - Move tests into drill repo
> > > > > > >      This requires some refactoring to the framework code since
> > > > > > >      the test framework uses a 2-level directory structure.
> > > > > > >    - Organize the tests using a label based approach
> > > > > > >      This involves code changes and moving a lot of files. When
> > > > > > >      doing a one time push it might be better to do this before
> > > > > > >      publishing the tests?
> > > > > > >    - Each suite should be independent
> > > > > > >      Some suites wrongly assume that the data is present. They
> > > > > > >      should be identified and fixed.
> > > > > > >    - Cleanup hardcoded dependencies during data generation
> > > > > > >      Some data-gen scripts have hard-coded references.
> > > > > > >    - Cleanup downloads
> > > > > > >      The same dataset is being downloaded multiple times by
> > > > > > >      different suites.
> > > > > > >    - Licenses for downloads
> > > > > > >      The framework downloads some files automatically. These
> > > > > > >      files are publicly available. However, before downloading
> > > > > > >      them users need to agree to certain terms. By using the
> > > > > > >      framework users might be skipping this step. We should look
> > > > > > >      into this.
> > > > > > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > > > > > 3. *Local debugging of tests*
> > > > > > >    - Add an optional maven target for running tests on a local
> > > > > > >      machine
> > > > > > >      Tests can launch an embedded drillbit or they can connect
> > > > > > >      to a running drillbit through zookeeper.
> > > > > > >    - Running suites which require additional setup (hive, hbase
> > > > > > >      etc) should be made optional
> > > > > > > 4. *Documentation*
> > > > > > >    - Running Tests (options available and also listing the
> > > > > > >      assumed defaults)
> > > > > > >    - Explaining how tests are organized
> > > > > > >    - Process for adding a new suite
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <
> > > > > > > jacques@dremio.com> wrote:
> > > > > > >
> > > > > > > > Let's get number one done (tests out there so all community
> > > > > > > > members can run them).  Then the whole community can work
> > > > > > > > together to solve the rest.
> > > > > > > >
> > > > > > > > I don't think the base install should include integration test
> > > > > > > > execution. I do think the tests should be in the main repo (as
> > > > > > > > opposed to a secondary).
> > > > > > > >
> > > > > > > > We should strive to ultimately make running these integration
> > > > > > > > tests a requirement for merging.  We need to complete all the
> > > > > > > > steps before we can impose that.  I should be able to help on
> > > > > > > > the global run component and supporting infrastructure.
> > > > > > > >
> > > > > > > > J
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jacques Nadeau
> > > > > > > > CTO and Co-Founder, Dremio
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Ramana,
> > > > > > > > >
> > > > > > > > > You are right. We are trying to address multiple issues
> > > > > > > > > here, but not with a single solution. I am summarizing them:
> > > > > > > > >
> > > > > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > > > > > environment. Parth had a suggestion (#4) in his original
> > > > > > > > > email.
> > > > > > > > > 3. Developers should be able to debug the majority of the
> > > > > > > > > tests in their local environment. I made a few suggestions
> > > > > > > > > above in this regard.
> > > > > > > > >
> > > > > > > > > - Rahul
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > One important thing which we need to be clear on here is
> > > > > > > > > > what are we trying to address?
> > > > > > > > > >
> > > > > > > > > > I feel there are two separate issues here and I do not
> > > > > > > > > > think one solution will fit both the issues.
> > > > > > > > > >
> > > > > > > > > >    1. Allowing developers to run tests on their local box
> > > > > > > > > >       so they know the changes they have are not completely
> > > > > > > > > >       wrong.
> > > > > > > > > >    2. Allowing transparency in the integration tests
> > > > > > > > > >       process which is currently a black box.
> > > > > > > > > >
> > > > > > > > > > 1 is needed for developers to make changes and have an idea
> > > > > > > > > > that their changes are not going to fail tests en masse in
> > > > > > > > > > the integration suite. 2 is needed because it's a
> > > > > > > > > > prerequisite for changes to be committed.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Ramana
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli
> > > > > > > > > > <challapallirahul@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Ramana,
> > > > > > > > > > >
> > > > > > > > > > > Let me fill in more details.
> > > > > > > > > > >
> > > > > > > > > > > 1. Before we accept a patch we want to make sure the
> > > > > > > > > > >    tests run in a cluster environment. No exceptions here.
> > > > > > > > > > > 2. We want the contributors to be able to debug the
> > > > > > > > > > >    failing tests on their laptops in as many cases as
> > > > > > > > > > >    possible. This requires:
> > > > > > > > > > >     1. Tests should run on top of a local file system.
> > > > > > > > > > >        (Tests can launch an embedded drillbit or they can
> > > > > > > > > > >        connect to a running drillbit through zookeeper)
> > > > > > > > > > >     2. Running suites which require additional setup
> > > > > > > > > > >        (hive, hbase etc) should be made optional and
> > > > > > > > > > >        sufficient documentation should be provided for
> > > > > > > > > > >        enabling and disabling these tests.
> > > > > > > > > > > 3. In my opinion making these new tests part of drill
> > > > > > > > > > >    would make it easier for the developers to debug and
> > > > > > > > > > >    run tests instead of having a different repository.
> > > > > > > > > > >    But as you said it might bloat the drill project.
> > > > > > > > > > >
> > > > > > > > > > > - Rahul
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning
> > > > > > > > > > > <ted.dunning@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > > > > > integrates a continuous integration system so that
> > > > > > > > > > > > every time a JIRA is marked as patch-available, the
> > > > > > > > > > > > associated patch attached to the bug will have
> > > > > > > > > > > > integration tests run against it. I believe that there
> > > > > > > > > > > > has been some process to use git hashes instead of
> > > > > > > > > > > > patches. The CI results are put back on the JIRA.
> > > > > > > > > > > >
> > > > > > > > > > > > This is done using a fairly simple set of scripts.
> > > > > > > > > > > > Apache Yetus is just forming as a direct-to-top-level
> > > > > > > > > > > > spinoff from Hadoop.
> > > > > > > > > > > >
> > > > > > > > > > > > Proposal is here (don't be fooled by the fact that it
> > > > > > > > > > > > looks like an incubation proposal):
> > > > > > > > > > > >
> > > > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > > > >
> > > > > > > > > > > > Early code can be found here (don't guess that this is
> > > > > > > > > > > > very real yet). More links can be found in the
> > > > > > > > > > > > proposal.
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > > > >
> > > > > > > > > > > > The project has not yet been formed and there are no
> > > > > > > > > > > > mailing lists or git repo yet.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N
> > > > > > > > > > > > <inramana@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > As someone who worked on this for a while, including
> > > > > > > > > > > > > it as part of drill may bloat drill a bit too much.
> > > > > > > > > > > > > Also not a big fan of running against an embedded
> > > > > > > > > > > > > drillbit. Does not replicate an actual production use
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Additionally, setting up hive, hbase and other
> > > > > > > > > > > > > components may be painful and unnecessary for most
> > > > > > > > > > > > > people. It would deter people from ever contributing
> > > > > > > > > > > > > to drill. We could spin up in-memory hive and hbase
> > > > > > > > > > > > > but that's similar to an embedded drillbit. Does not
> > > > > > > > > > > > > replicate a production scenario.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Would prefer the hive way with a central Jenkins
> > > > > > > > > > > > > server hosted on aws and accessible to everyone.
> > > > > > > > > > > > > Users should be able to submit a git url and that
> > > > > > > > > > > > > should be able to deploy and fire off tests. Should
> > > > > > > > > > > > > then have a way to easily communicate failures to
> > > > > > > > > > > > > contributors and, on success, notify the committers
> > > > > > > > > > > > > to commit the change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ps: if hive's way is open source maybe we can look
> > > > > > > > > > > > > into reuse rather than doing it from scratch. Esp the
> > > > > > > > > > > > > Jenkins and configuration stuff.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards
> > > > > > > > > > > > > Ramana
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra
> > > > > > > > > > > > > <parthc@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Drill devs use a set of tests that are not available
> > > > > > > > > > > > > > as part of the Apache distribution. These tests are a
> > > > > > > > > > > > > > pre-requisite for all commits, but are not available
> > > > > > > > > > > > > > to any contributors outside the current devs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This thread is to discuss various options to make
> > > > > > > > > > > > > > these tests available.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Assumptions and requirements -
> > > > > > > > > > > > > > 1) A functional test (as opposed to a unit test)
> > > > > > > > > > > > > >    needs to be closer to the end user environment
> > > > > > > > > > > > > >    than a development environment. As such, we should
> > > > > > > > > > > > > >    be running functional tests in a cluster
> > > > > > > > > > > > > >    environment, connect using zookeeper etc.
> > > > > > > > > > > > > > 2) Functional test will keep increasing in number,
> > > > > > > > > > > > > >    get more complex and take a longer and longer time
> > > > > > > > > > > > > >    to execute as we go along.
> > > > > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > > > > >     a) We want to be strict in enforcing the
> > > > > > > > > > > > > >        pre-commit requirements, but not penalize the
> > > > > > > > > > > > > >        contributor who has a minor fix.
> > > > > > > > > > > > > >     b) All parts of the product (especially various
> > > > > > > > > > > > > >        'certified' storage plugins like Hive and
> > > > > > > > > > > > > >        Hbase should get tested)
> > > > > > > > > > > > > >     c) It should be easy to debug issues when a test
> > > > > > > > > > > > > >        fails. Tests should fail deterministically. If
> > > > > > > > > > > > > >        a test fails, it should always fail and always
> > > > > > > > > > > > > >        fail in the same way (easier said than done).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some suggestions -
> > > > > > > > > > > > > > 1) Tests should be a top-level maven module within
> > > > > > > > > > > > > >    the drill project
> > > > > > > > > > > > > >     a) We want the integration tests to run as part
> > > > > > > > > > > > > >        of the drill's maven build process
> > > > > > > > > > > > > >     b) The build step for the integration-tests
> > > > > > > > > > > > > >        module would launch an embedded drillbit and
> > > > > > > > > > > > > >        runs tests against it
> > > > > > > > > > > > > >     c) The tests will be a separate target so they
> > > > > > > > > > > > > >        need not be run all the time
> > > > > > > > > > > > > > 2) Tests should be divided into multiple suites that
> > > > > > > > > > > > > >    are based on components. For example a test suite
> > > > > > > > > > > > > >    for testing datatypes will contain the tests for
> > > > > > > > > > > > > >    various datatypes including complex types. A
> > > > > > > > > > > > > >    contributor or developer can then run these tests
> > > > > > > > > > > > > >    more frequently as an issue is being addressed and
> > > > > > > > > > > > > >    run the entire suite only once before commit.
> > > > > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and
> > > > > > > > > > > > > >    post the results to the JIRA (Hive does this). Or
> > > > > > > > > > > > > >    some variant of this idea.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some questions -
> > > > > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > > > > 2) Are there any technologies we can leverage that
> > > > > > > > > > > > > >    will make this easier?
> > > > > > > > > > > > > > 3) How do we make it easier to debug failing tests.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please feel free to question the assumptions and
> > > > > > > > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Parth

[DISCUSS] Publishing advanced/functional tests

Posted by Abhishek Girish <ab...@gmail.com>.
Ramana,

I think the issue with licenses is mostly resolved. It was discussed that
for TPC-*, since we will not be redistributing the data-gen software but
only a randomized variant of the data generated by it, we should be okay to
include that data as part of our framework. For other datasets, we will
either ship a copy of their license with our framework, or simply provide
a link for users to download the data before they run the tests.
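
To make the two options for the other datasets concrete, here is a minimal
sketch of a pre-flight check that could run before the tests. It is not part
of the framework: the Dataset fields, the directory layout, and the URL used
in the usage note are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Dataset:
    # All fields are illustrative assumptions, not the framework's real schema.
    name: str
    local_dir: str                      # location relative to the framework root
    license_file: Optional[str] = None  # set when the data ships with its license
    download_url: Optional[str] = None  # set when users must fetch the data themselves


def missing_datasets(root: Path, datasets: List[Dataset]) -> List[str]:
    """Return one human-readable message per dataset that is not ready to use."""
    problems = []
    for ds in datasets:
        data_dir = root / ds.local_dir
        if not data_dir.is_dir():
            if ds.download_url:
                # Option 2: the user downloads the data (and accepts its terms).
                problems.append(
                    f"{ds.name}: download from {ds.download_url} into {data_dir}"
                )
            else:
                problems.append(f"{ds.name}: expected bundled data in {data_dir}")
        elif ds.license_file and not (data_dir / ds.license_file).is_file():
            # Option 1: bundled data must be accompanied by its license.
            problems.append(f"{ds.name}: license file {ds.license_file} is missing")
    return problems
```

A runner could call missing_datasets(Path("."), datasets) at startup, print
the returned messages, and skip the affected suites instead of downloading
anything automatically.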

For now we should focus on getting the framework out with minimal cleanup.
In the near future we can work on setting up infrastructure and enhancing
the framework itself.

-Abhishek

On Wed, Aug 5, 2015 at 10:46 AM, Ramana I N <inramana@gmail.com> wrote:

> @Jacques, Ted
>
> > in the mean time, we risk patches being merged that have less than
> > complete testing.
>
> While I agree with the premise of getting the tests out as soon as possible
> it does not help us achieve anything except transparency. Your statement
> that getting the tests out will increase quality is dependent on someone
> actually being able to run the tests once they have access to it.
>
> Maybe we should focus on making a jenkins job to run the tests publicly.
> With that in place we can exclude the TPC* datasets as well as the yelp
> data sets from the framework and avoid licensing issues.
>
> Regards
> Ramana
>
>
> On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <abhishek.girish@gmail.com>
> wrote:
>
> > We not only re-distribute external data-sets as-is, but also include
> > variants of them (text -> parquet, json, ...). So the challenge here is
> > not simply disabling automatic downloads via the framework and pointing
> > users to manually download the files before running the framework, but
> > also how we will handle tests which require variants of the data sets. It
> > simply isn't practical for users of the framework to (1) download
> > data-gen manually, (2) use specific seed / options before generating
> > data, (3) convert them to parquet, etc., and (4) move them to specific
> > locations inside their copy of the framework.
> >
> > Something we'll need to know is how other projects are handling benchmark
> > & other external datasets.
> >
> > -Abhishek
> >
> > On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli
> > <challapallirahul@gmail.com> wrote:
> >
> > > Thanks for your inputs.
> > >
> > > One issue with just publishing the tests in their current state is that
> > > the framework re-distributes tpch, tpcds, yelp data sets without
> > > requiring the users to accept their relevant licenses. A good number of
> > > tests use these data sets. Any thoughts on how to handle this?
> > >
> > > - Rahul
> > >
> > > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <ted.dunning@gmail.com>
> > > wrote:
> > >
> > > > +1.  Get it out there.
> > > >
> > > >
> > > >
> > > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <jacques@dremio.com>
> > > > wrote:
> > > >
> > > > > Hey Rahul,
> > > > >
> > > > > My suggestion would be to lower the bar: do the absolute bare
> > > > > minimum to get the tests out there. For example, simply remove
> > > > > proprietary information and then get it on a public github (whether
> > > > > your personal github or a corporate one). From there, people can
> > > > > help by submitting pull requests to improve the infrastructure and
> > > > > harness. Making things easier is something that can be done over
> > > > > time. For example, we've had offers from a couple different Linux
> > > > > Admins to help on something. I'm sure that they could help with a
> > > > > number of the items you've identified. In the mean time, we risk
> > > > > patches being merged that have less than complete testing.
> > > > >
> > > > >
> > > > > --
> > > > > Jacques Nadeau
> > > > > CTO and Co-Founder, Dremio
> > > > >
> > > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli
> > > > > <challapallirahul@gmail.com> wrote:
> > > > >
> > > > > > Jacques,
> > > > > >
> > > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > > > add/prioritize these tasks
> > > > > >
> > > > > > Columns: Item #, Task, Sub-Task, Comments, Priority
> > > > > >
> > > > > > 1. *Publish the tests*
> > > > > >    - Remove proprietary data & queries (priority: 0)
> > > > > >    - Redact proprietary data/queries
> > > > > >    - Move tests into the drill repo. This requires some refactoring
> > > > > >      of the framework code, since the test framework uses a 2-level
> > > > > >      directory structure.
> > > > > >    - Organize the tests using a label-based approach. This involves
> > > > > >      code changes and moving a lot of files. When doing a one-time
> > > > > >      push it might be better to do this before publishing the
> > > > > >      tests?
> > > > > >    - Each suite should be independent. Some suites wrongly assume
> > > > > >      that the data is present. They should be identified and fixed.
> > > > > >    - Clean up hard-coded dependencies during data generation. Some
> > > > > >      data-gen scripts have hard-coded references.
> > > > > >    - Clean up downloads. The same dataset is being downloaded
> > > > > >      multiple times by different suites.
> > > > > >    - Licenses for downloads. The framework downloads some files
> > > > > >      automatically. These files are publicly available. However,
> > > > > >      before downloading them users need to agree to certain terms.
> > > > > >      By using the framework users might be skipping this step. We
> > > > > >      should look into this.
> > > > > > 2. *Set up a cluster infrastructure to run the pre-commit tests*
> > > > > > 3. *Local debugging of tests*
> > > > > >    - Add an optional maven target for running tests on a local
> > > > > >      machine. Tests can launch an embedded drillbit or they can
> > > > > >      connect to a running drillbit through zookeeper.
> > > > > >    - Running suites which require additional setup (hive, hbase
> > > > > >      etc) should be made optional.
> > > > > > 4. *Documentation*
> > > > > >    - Running tests (available options and the assumed defaults)
> > > > > >    - Explaining how tests are organized
> > > > > >    - Process for adding a new suite
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <jacques@dremio.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Let's get number one done (tests out there so all community
> > > > > > > members can run them). Then the whole community can work together
> > > > > > > to solve the rest.
> > > > > > >
> > > > > > > I don't think the base install should include integration test
> > > > > > > execution. I do think the tests should be in the main repo (as
> > > > > > > opposed to a secondary).
> > > > > > >
> > > > > > > We should strive to ultimately make running these integration
> > > > > > > tests a requirement for merging. We need to complete all the
> > > > > > > steps before we can impose that. I should be able to help on the
> > > > > > > global run component and supporting infrastructure.
> > > > > > >
> > > > > > > J
> > > > > > >
> > > > > > > --
> > > > > > > Jacques Nadeau
> > > > > > > CTO and Co-Founder, Dremio
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli
> > > > > > > <challapallirahul@gmail.com> wrote:
> > > > > > >
> > > > > > > > Ramana,
> > > > > > > >
> > > > > > > > You are right. We are trying to address multiple issues here,
> > > > > > > > but not with a single solution. I am summarizing them:
> > > > > > > >
> > > > > > > > 1. Tests should be visible to everyone (implicit goal).
> > > > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > > > >    environment. Parth had a suggestion (#4) in his original
> > > > > > > >    email.
> > > > > > > > 3. Developers should be able to debug the majority of the tests
> > > > > > > >    in their local environment. I made a few suggestions above
> > > > > > > >    in this regard.
> > > > > > > >
> > > > > > > > - Rahul
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > One important thing which we need to be clear on here is what
> > > > > > > > > are we trying to address?
> > > > > > > > >
> > > > > > > > > I feel there are two separate issues here and I do not think
> > > > > > > > > one solution will fit both the issues.
> > > > > > > > >
> > > > > > > > >    1. Allowing developers to run tests on their local box so
> > > > > > > > >       they know the changes they have are not completely
> > > > > > > > >       wrong.
> > > > > > > > >    2. Allowing transparency in the integration tests process
> > > > > > > > >       which is currently a black box.
> > > > > > > > >
> > > > > > > > > 1 is needed for developers to make changes and have an idea
> > > > > > > > > that their changes are not going to fail tests en masse in
> > > > > > > > > the integration suite. 2 is needed because it's a
> > > > > > > > > prerequisite for changes to be committed.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Ramana
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli
> > > > > > > > > <challapallirahul@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Ramana,
> > > > > > > > > >
> > > > > > > > > > Let me fill in more details.
> > > > > > > > > >
> > > > > > > > > > 1. Before we accept a patch we want to make sure the tests
> > > > > > > > > >    run in a cluster environment. No exceptions here.
> > > > > > > > > > 2. We want the contributors to be able to debug the failing
> > > > > > > > > >    tests on their laptops in as many cases as possible.
> > > > > > > > > >    This requires:
> > > > > > > > > >     1. Tests should run on top of a local file system.
> > > > > > > > > >        (Tests can launch an embedded drillbit or they can
> > > > > > > > > >        connect to a running drillbit through zookeeper)
> > > > > > > > > >     2. Running suites which require additional setup (hive,
> > > > > > > > > >        hbase etc) should be made optional and sufficient
> > > > > > > > > >        documentation should be provided for enabling and
> > > > > > > > > >        disabling these tests.
> > > > > > > > > > 3. In my opinion making these new tests part of drill would
> > > > > > > > > >    make it easier for the developers to debug and run tests
> > > > > > > > > >    instead of having a different repository. But as you
> > > > > > > > > >    said it might bloat the drill project.
> > > > > > > > > >
> > > > > > > > > > - Rahul
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning
> > > > > > > > > > <ted.dunning@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > > > > integrates a continuous integration system so that every
> > > > > > > > > > > time a JIRA is marked as patch-available, the associated
> > > > > > > > > > > patch attached to the bug will have integration tests run
> > > > > > > > > > > against it. I believe that there has been some process to
> > > > > > > > > > > use git hashes instead of patches. The CI results are put
> > > > > > > > > > > back on the JIRA.
> > > > > > > > > > >
> > > > > > > > > > > This is done using a fairly simple set of scripts. Apache
> > > > > > > > > > > Yetus is just forming as a direct-to-top-level spinoff
> > > > > > > > > > > from Hadoop.
> > > > > > > > > > >
> > > > > > > > > > > Proposal is here (don't be fooled by the fact that it
> > > > > > > > > > > looks like an incubation proposal):
> > > > > > > > > > >
> > > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > > >
> > > > > > > > > > > Early code can be found here (don't guess that this is
> > > > > > > > > > > very real yet). More links can be found in the proposal.
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > > >
> > > > > > > > > > > The project has not yet been formed and there are no
> > > > > > > > > > > mailing lists or git repo yet.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N
> > > > > > > > > > > <inramana@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > As someone who worked on this for a while, including it
> > > > > > > > > > > > as part of drill may bloat drill a bit too much. Also
> > > > > > > > > > > > not a big fan of running against an embedded drillbit.
> > > > > > > > > > > > Does not replicate an actual production use case.
> > > > > > > > > > > >
> > > > > > > > > > > > Additionally, setting up hive, hbase and other
> > > > > > > > > > > > components may be painful and unnecessary for most
> > > > > > > > > > > > people. It would deter people from ever contributing to
> > > > > > > > > > > > drill. We could spin up in-memory hive and hbase but
> > > > > > > > > > > > that's similar to an embedded drillbit. Does not
> > > > > > > > > > > > replicate a production scenario.
> > > > > > > > > > > >
> > > > > > > > > > > > Would prefer the hive way with a central Jenkins server
> > > > > > > > > > > > hosted on aws and accessible to everyone. Users should
> > > > > > > > > > > > be able to submit a git url and that should be able to
> > > > > > > > > > > > deploy and fire off tests. Should then have a way to
> > > > > > > > > > > > easily communicate failures to contributors and, on
> > > > > > > > > > > > success, notify the committers to commit the change.
> > > > > > > > > > > >
> > > > > > > > > > > > Ps: if hive's way is open source maybe we can look into
> > > > > > > > > > > > reuse rather than doing it from scratch. Esp the
> > > > > > > > > > > > Jenkins and configuration stuff.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards
> > > > > > > > > > > > Ramana
> > > > > > > > > > > >
> > > > > > > > > > > >

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ramana I N <in...@gmail.com>.
@Jacques, Ted

> in the mean time, we risk patches being merged that have less than complete
> testing.


While I agree with the premise of getting the tests out as soon as possible,
that alone does not help us achieve anything except transparency. Your
statement that getting the tests out will increase quality depends on someone
actually being able to run the tests once they have access to them.

Maybe we should focus on setting up a Jenkins job that runs the tests
publicly. With that in place we can exclude the TPC* and Yelp data sets
from the framework and avoid licensing issues.
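
As a rough sketch of how a public job could skip the suites that depend on
the licensed data sets (the suite names and the `datasets` field are
hypothetical here, not the framework's actual layout):

```python
from dataclasses import dataclass

# Data sets named in this thread as needing license acceptance.
# The lowercase identifiers are an assumption made for this sketch.
RESTRICTED = frozenset({"tpch", "tpcds", "yelp"})


@dataclass(frozen=True)
class Suite:
    name: str            # hypothetical suite name
    datasets: frozenset  # data sets the suite reads


def public_suites(suites, restricted=RESTRICTED):
    """Return only the suites that touch none of the restricted data sets."""
    return [s for s in suites if not (s.datasets & restricted)]


suites = [
    Suite("datatypes", frozenset({"generated"})),
    Suite("tpch_queries", frozenset({"tpch"})),
    Suite("json_complex", frozenset({"generated", "yelp"})),
]
print([s.name for s in public_suites(suites)])
```

A filter like this only works if every suite declares its data sets up
front, which ties in with the "each suite should be independent" item in
Rahul's list.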

Regards
Ramana


On Tue, Aug 4, 2015 at 11:39 AM, Abhishek Girish <ab...@gmail.com>
wrote:

> We not only re-distribute external data-sets as-is, but also include
> variants for those (text -> parquet, json, ...). So the challenge here is
> not simply disabling automatic downloads via the framework, and point users
> to manually download the files before running the framework, but also about
> how we will handle tests which require variants of the data sets. It simply
> isn't practical to users of the framework to (1) download data-gen manually
> (2) use specific seed / options before generating data, (3) convert them to
> parquet, etc.. (4) move them to specific locations inside their copy of the
> framework.
>
> Something we'll need to know is how other projects are handling bench-mark
> & other external datasets.
>
> -Abhishek
>
> On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Thanks for your inputs.
> >
> > One issue with just publishing the tests in their current state is that
> > the framework re-distributes tpch, tpcds, yelp data sets without requiring
> > the users to accept their relevant licenses. A good number of tests use
> > these data sets. Any thoughts on how to handle this?
> >
> > - Rahul
> >
> > On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > +1.  Get it out there.
> > >
> > >
> > >
> > > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <ja...@dremio.com>
> > > wrote:
> > >
> > > > Hey Rahul,
> > > >
> > > > > My suggestion would be to lower the bar--do the absolute bare minimum
> > > > > to get the tests out there.  For example, simply remove proprietary
> > > > > information and then get it on a public github (whether your personal
> > > > > github or a corporate one).  From there, people can help by submitting
> > > > > pull requests to improve the infrastructure and harness.  Making things
> > > > > easier is something that can be done over time.  For example, we've had
> > > > > offers from a couple different Linux Admins to help on something.  I'm
> > > > > sure that they could help with a number of the items you've identified.
> > > > > In the mean time, we risk patches being merged that have less than
> > > > > complete testing.
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > > > challapallirahul@gmail.com> wrote:
> > > >
> > > > > Jacques,
> > > > >
> > > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > > > add/prioritize these tasks.
> > > > > >
> > > > > > (Columns: Item # / Task / Sub-Task / Comments / Priority)
> > > > > >
> > > > > > 1) *Publish the tests*
> > > > > >     a) Remove Proprietary Data & Queries (priority 0)
> > > > > >     b) Redact Proprietary Data/Queries
> > > > > >     c) Move tests into drill repo
> > > > > >        This requires some refactoring to the framework code since
> > > > > >        the test framework uses a 2-level directory structure
> > > > > >     d) Organize the tests using a label based approach
> > > > > >        This involves code changes and moving a lot of files. When
> > > > > >        doing a one time push it might be better to do this before
> > > > > >        publishing the tests?
> > > > > >     e) Each suite should be independent
> > > > > >        Some suites wrongly assume that the data is present. They
> > > > > >        should be identified and fixed
> > > > > >     f) Cleanup hardcoded dependencies during data generation
> > > > > >        Some data-gen scripts have hard-coded references
> > > > > >     g) Cleanup downloads
> > > > > >        The same dataset is being downloaded multiple times by
> > > > > >        different suites
> > > > > >     h) Licenses for downloads
> > > > > >        The framework downloads some files automatically. These
> > > > > >        files are publicly available. However before downloading
> > > > > >        them users need to agree to certain terms. By using the
> > > > > >        framework users might be skipping this step. We should
> > > > > >        look into this
> > > > > >
> > > > > > 2) *Setup a cluster infrastructure to run the pre-commit tests*
> > > > > >
> > > > > > 3) *Local debugging of tests*
> > > > > >     a) Add an optional maven target for running tests on a local
> > > > > >        machine
> > > > > >        Tests can launch an embedded drillbit or they can connect
> > > > > >        to a running drillbit through zookeeper
> > > > > >     b) Running suites which require additional setup (hive, hbase
> > > > > >        etc) should be made optional
> > > > > >
> > > > > > 4) *Documentation*
> > > > > >     a) Running Tests (options available and also listing the
> > > > > >        assumed defaults)
> > > > > >     b) Explaining how tests are organized
> > > > > >     c) Process for adding a new suite
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <
> jacques@dremio.com>
> > > > > wrote:
> > > > >
> > > > > > Let's get number one done (tests out there so all community
> members
> > > can
> > > > > run
> > > > > > them).  Then the whole community can work together to solve the
> > rest.
> > > > > >
> > > > > > I don't think the base install should include integration test
> > > > execution.
> > > > > > I do think the tests should be in the main repo (as opposed to a
> > > > > > secondary).
> > > > > >
> > > > > > We should strive to ultimately make running these integration
> > tests a
> > > > > > requirement for merging.  We need to complete all the steps
> before
> > we
> > > > can
> > > > > > impose that.  I should be able to help on the global run
> component
> > > and
> > > > > > supporting infrastructure.
> > > > > >
> > > > > > J
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > > > challapallirahul@gmail.com> wrote:
> > > > > >
> > > > > > > Ramana,
> > > > > > >
> > > > > > > You are right. We are trying to address multiple issues here,
> but
> > > not
> > > > > > with
> > > > > > > a single solution. I am summarizing them
> > > > > > >
> > > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > > environment.
> > > > > > > Parth had a suggestion(#4) in his original email.
> > > > > > > 3. Developers should be able to debug majority of the tests on
> > > their
> > > > > > local
> > > > > > > environment. I made a few suggestions above to this regard
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <
> inramana@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > One important thing which we need to be clear on here is what
> > are
> > > > we
> > > > > > > trying
> > > > > > > > to address?
> > > > > > > >
> > > > > > > > I feel there are two separate issues here and I do not think
> > one
> > > > > > solution
> > > > > > > > will fit both the issues.
> > > > > > > >
> > > > > > > >    1. Allowing developers to run tests on their local box so
> > they
> > > > > know
> > > > > > > the
> > > > > > > >    changes they have are not completely wrong.
> > > > > > > >    2. Allowing transparency in the integration tests process
> > > which
> > > > is
> > > > > > > >    currently a black box.
> > > > > > > >
> > > > > > > > 1 is needed for developers to make changes and have an idea
> > that
> > > > > their
> > > > > > > > changes are not going to fail tests en masse in the
> integration
> > > > > suite.
> > > > > > 2
> > > > > > > is
> > > > > > > > needed because its a prerequisite for changes to be
> committed.
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ramana
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Ramana,
> > > > > > > > >
> > > > > > > > > Let me fill in more details.
> > > > > > > > >
> > > > > > > > > 1. Before we accept a patch we want to make sure the tests
> > run
> > > > in a
> > > > > > > > cluster
> > > > > > > > > environment. No exceptions here.
> > > > > > > > > 2. We want  the contributors to be able to debug the
> failing
> > > > tests
> > > > > on
> > > > > > > > their
> > > > > > > > > laptops in as many cases as possbile. This requires :
> > > > > > > > >         1. Tests should run on top of a local file system.
> > > (Tests
> > > > > can
> > > > > > > > > launch an embedded drillbit or they can connect to a
> running
> > > > > drillbit
> > > > > > > > > through zookeeper)
> > > > > > > > >         2. Running suites which require additional setup
> > (hive,
> > > > > hbase
> > > > > > > > etc)
> > > > > > > > > should be made optional and sufficient documentation should
> > be
> > > > > > provided
> > > > > > > > for
> > > > > > > > > enabling and disabling these tests.
> > > > > > > > > 3. In my opinion making these new tests part of drill would
> > > make
> > > > it
> > > > > > > > easier
> > > > > > > > > for the developers to debug and run tests instead of
> having a
> > > > > > different
> > > > > > > > > repository. But as you said it might bloat the drill
> project
> > > > > > > > >
> > > > > > > > > - Rahul
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > > > > ted.dunning@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > The Hadoop family of projects has some software that
> > > > integrates a
> > > > > > > > > > continuous integration system so that every time a JIRA
> is
> > > > marked
> > > > > > as
> > > > > > > > > > patch-available, the associated patch attached to the bug
> > > will
> > > > > have
> > > > > > > > > > integration tests run against it.  I believe that there
> has
> > > > been
> > > > > > some
> > > > > > > > > > process to use git hashes instead of patches.  The CI
> > results
> > > > are
> > > > > > put
> > > > > > > > > back
> > > > > > > > > > on the JIRA.
> > > > > > > > > >
> > > > > > > > > > This is done using a fairly simple set of scripts.
> Apache
> > > > Yetus
> > > > > is
> > > > > > > > just
> > > > > > > > > > forming as a direct-to-top-level spinoff from Hadoop
> > > > > > > > > >
> > > > > > > > > > Proposal is here (don't be fooled by the fact that it
> looks
> > > > like
> > > > > an
> > > > > > > > > > incubation proposal):
> > > > > > > > > >
> > > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > > >
> > > > > > > > > > Early code can be found here (don't guess that this is
> very
> > > > real
> > > > > > > yet).
> > > > > > > > > > More links can be found in the proposal.
> > > > > > > > > >
> > > > > > > > > >
> > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > > >
> > > > > > > > > > The project has not yet been formed and there are no
> > mailing
> > > > > lists
> > > > > > or
> > > > > > > > git
> > > > > > > > > > repo yet.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> > > > inramana@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > As someone who worked on this for a while, including it
> > as
> > > > part
> > > > > > of
> > > > > > > > > drill
> > > > > > > > > > > may bloat drill a bit too much. Also not a big fan of
> > > running
> > > > > > > against
> > > > > > > > > an
> > > > > > > > > > > embedded drillbit. Does not replicate an actual
> > production
> > > > use
> > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > Additionally, setting up hive hbase and other
> components
> > > > maybe
> > > > > > > > painful
> > > > > > > > > > and
> > > > > > > > > > > unnecessary for most ppl. It would deter people from
> ever
> > > > > > > > contributing
> > > > > > > > > to
> > > > > > > > > > > drill. We could spin up in memory hive and hbase but
> > that's
> > > > > > similar
> > > > > > > > to
> > > > > > > > > an
> > > > > > > > > > > embedded drill bit. Does not replicate a production
> > > scenario.
> > > > > > > > > > >
> > > > > > > > > > > Would prefer the hive way with a central Jenkins server
> > > > hosted
> > > > > on
> > > > > > > aws
> > > > > > > > > and
> > > > > > > > > > > accessible to everyone.  Users should be able to
> submit a
> > > git
> > > > > url
> > > > > > > and
> > > > > > > > > > that
> > > > > > > > > > > should be able to deploy and fire off tests. Should
> then
> > > > have a
> > > > > > way
> > > > > > > > to
> > > > > > > > > > > easily communicate failures to contributors and if
> > success
> > > > > notify
> > > > > > > the
> > > > > > > > > > > commiters to commit the change.
> > > > > > > > > > >
> > > > > > > > > > > Ps: if hive's way is open source maybe we can look into
> > > reuse
> > > > > > > rather
> > > > > > > > > than
> > > > > > > > > > > doing it from scratch. Esp the Jenkins and
> configuration
> > > > stuff.
> > > > > > > > > > >
> > > > > > > > > > > Regards
> > > > > > > > > > > Ramana
> > > > > > > > > > >
> > > > > > > > > > >

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Abhishek Girish <ab...@gmail.com>.
We not only re-distribute external data sets as-is, but also include
variants of them (text -> parquet, json, ...). So the challenge here is
not simply disabling automatic downloads via the framework and pointing
users to manually download the files before running it, but also how we
handle tests which require variants of the data sets. It simply isn't
practical for users of the framework to (1) download data-gen manually,
(2) use a specific seed / options when generating data, (3) convert the
data to parquet, etc., and (4) move the results to specific locations
inside their copy of the framework.
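
To make the cost concrete, here is a minimal sketch of the manual steps
expanded from a per-data-set description. Everything in it (the
`DatasetSpec` fields, the seed value, the destination path) is hypothetical
and only illustrates how the work multiplies per format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    name: str       # e.g. "tpch" (illustrative entry)
    seed: int       # seed the expected results were generated with
    formats: tuple  # variants the tests need, e.g. ("parquet", "json")
    dest: str       # location inside the user's copy of the framework


def preparation_steps(spec):
    """List the manual steps a user would have to perform for one data set."""
    steps = [
        f"download data-gen for {spec.name}",
        f"generate {spec.name} text data with seed {spec.seed}",
    ]
    for fmt in spec.formats:
        steps.append(f"convert {spec.name} text -> {fmt}")
        steps.append(f"move {spec.name} {fmt} files to {spec.dest}/{fmt}")
    return steps


spec = DatasetSpec("tpch", seed=1, formats=("parquet", "json"), dest="data/tpch")
for step in preparation_steps(spec):
    print(step)
```

Two fixed steps plus two per format, for every data set, is more than we
can reasonably ask of someone who just wants to verify a patch.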

Something we'll need to know is how other projects are handling benchmark
& other external data sets.

-Abhishek

On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Thanks for your inputs.
>
> One issue with just publishing the tests in their current state is that
> the framework re-distributes tpch, tpcds, yelp data sets without requiring
> the users to accept their relevant licenses. A good number of tests use
> these data sets. Any thoughts on how to handle this?
>
> - Rahul
>
> On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > +1.  Get it out there.
> >
> >
> >
> > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <ja...@dremio.com>
> > wrote:
> >
> > > Hey Rahul,
> > >
> > > > My suggestion would be to lower the bar--do the absolute bare minimum
> > > > to get the tests out there.  For example, simply remove proprietary
> > > > information and then get it on a public github (whether your personal
> > > > github or a corporate one).  From there, people can help by submitting
> > > > pull requests to improve the infrastructure and harness.  Making things
> > > > easier is something that can be done over time.  For example, we've had
> > > > offers from a couple different Linux Admins to help on something.  I'm
> > > > sure that they could help with a number of the items you've identified.
> > > > In the mean time, we risk patches being merged that have less than
> > > > complete testing.
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > > challapallirahul@gmail.com> wrote:
> > >
> > > > Jacques,
> > > >
> > > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > > add/prioritize these tasks.
> > > > >
> > > > > (Columns: Item # / Task / Sub-Task / Comments / Priority)
> > > > >
> > > > > 1) *Publish the tests*
> > > > >     a) Remove Proprietary Data & Queries (priority 0)
> > > > >     b) Redact Proprietary Data/Queries
> > > > >     c) Move tests into drill repo
> > > > >        This requires some refactoring to the framework code since
> > > > >        the test framework uses a 2-level directory structure
> > > > >     d) Organize the tests using a label based approach
> > > > >        This involves code changes and moving a lot of files. When
> > > > >        doing a one time push it might be better to do this before
> > > > >        publishing the tests?
> > > > >     e) Each suite should be independent
> > > > >        Some suites wrongly assume that the data is present. They
> > > > >        should be identified and fixed
> > > > >     f) Cleanup hardcoded dependencies during data generation
> > > > >        Some data-gen scripts have hard-coded references
> > > > >     g) Cleanup downloads
> > > > >        The same dataset is being downloaded multiple times by
> > > > >        different suites
> > > > >     h) Licenses for downloads
> > > > >        The framework downloads some files automatically. These
> > > > >        files are publicly available. However before downloading
> > > > >        them users need to agree to certain terms. By using the
> > > > >        framework users might be skipping this step. We should
> > > > >        look into this
> > > > >
> > > > > 2) *Setup a cluster infrastructure to run the pre-commit tests*
> > > > >
> > > > > 3) *Local debugging of tests*
> > > > >     a) Add an optional maven target for running tests on a local
> > > > >        machine
> > > > >        Tests can launch an embedded drillbit or they can connect
> > > > >        to a running drillbit through zookeeper
> > > > >     b) Running suites which require additional setup (hive, hbase
> > > > >        etc) should be made optional
> > > > >
> > > > > 4) *Documentation*
> > > > >     a) Running Tests (options available and also listing the
> > > > >        assumed defaults)
> > > > >     b) Explaining how tests are organized
> > > > >     c) Process for adding a new suite
> > > >
> > > >
> > > >
> > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <ja...@dremio.com>
> > > > wrote:
> > > >
> > > > > Let's get number one done (tests out there so all community members
> > can
> > > > run
> > > > > them).  Then the whole community can work together to solve the
> rest.
> > > > >
> > > > > I don't think the base install should include integration test
> > > execution.
> > > > > I do think the tests should be in the main repo (as opposed to a
> > > > > secondary).
> > > > >
> > > > > We should strive to ultimately make running these integration
> tests a
> > > > > requirement for merging.  We need to complete all the steps before
> we
> > > can
> > > > > impose that.  I should be able to help on the global run component
> > and
> > > > > supporting infrastructure.
> > > > >
> > > > > J
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jacques Nadeau
> > > > > CTO and Co-Founder, Dremio
> > > > >
> > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > > challapallirahul@gmail.com> wrote:
> > > > >
> > > > > > Ramana,
> > > > > >
> > > > > > You are right. We are trying to address multiple issues here, but
> > not
> > > > > with
> > > > > > a single solution. I am summarizing them
> > > > > >
> > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > environment.
> > > > > > Parth had a suggestion(#4) in his original email.
> > > > > > 3. Developers should be able to debug majority of the tests on
> > their
> > > > > local
> > > > > > environment. I made a few suggestions above to this regard
> > > > > >
> > > > > > - Rahul
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > One important thing which we need to be clear on here is what
> are
> > > we
> > > > > > trying
> > > > > > > to address?
> > > > > > >
> > > > > > > I feel there are two separate issues here and I do not think
> one
> > > > > solution
> > > > > > > will fit both the issues.
> > > > > > >
> > > > > > >    1. Allowing developers to run tests on their local box so
> they
> > > > know
> > > > > > the
> > > > > > >    changes they have are not completely wrong.
> > > > > > >    2. Allowing transparency in the integration tests process
> > which
> > > is
> > > > > > >    currently a black box.
> > > > > > >
> > > > > > > 1 is needed for developers to make changes and have an idea
> that
> > > > their
> > > > > > > changes are not going to fail tests en masse in the integration
> > > > suite.
> > > > > 2
> > > > > > is
> > > > > > > needed because its a prerequisite for changes to be committed.
> > > > > > >
> > > > > > >
> > > > > > > Regards
> > > > > > > Ramana
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > >
> > > > > > > > Ramana,
> > > > > > > >
> > > > > > > > Let me fill in more details.
> > > > > > > >
> > > > > > > > 1. Before we accept a patch we want to make sure the tests
> run
> > > in a
> > > > > > > cluster
> > > > > > > > environment. No exceptions here.
> > > > > > > > 2. We want  the contributors to be able to debug the failing
> > > tests
> > > > on
> > > > > > > their
> > > > > > > > laptops in as many cases as possbile. This requires :
> > > > > > > >         1. Tests should run on top of a local file system.
> > (Tests
> > > > can
> > > > > > > > launch an embedded drillbit or they can connect to a running
> > > > drillbit
> > > > > > > > through zookeeper)
> > > > > > > >         2. Running suites which require additional setup
> (hive,
> > > > hbase
> > > > > > > etc)
> > > > > > > > should be made optional and sufficient documentation should
> be
> > > > > provided
> > > > > > > for
> > > > > > > > enabling and disabling these tests.
> > > > > > > > 3. In my opinion making these new tests part of drill would
> > make
> > > it
> > > > > > > easier
> > > > > > > > for the developers to debug and run tests instead of having a
> > > > > different
> > > > > > > > repository. But as you said it might bloat the drill project
> > > > > > > >
> > > > > > > > - Rahul
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > > > ted.dunning@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The Hadoop family of projects has some software that
> > > integrates a
> > > > > > > > > continuous integration system so that every time a JIRA is
> > > marked
> > > > > as
> > > > > > > > > patch-available, the associated patch attached to the bug
> > will
> > > > have
> > > > > > > > > integration tests run against it.  I believe that there has
> > > been
> > > > > some
> > > > > > > > > process to use git hashes instead of patches.  The CI
> results
> > > are
> > > > > put
> > > > > > > > back
> > > > > > > > > on the JIRA.
> > > > > > > > >
> > > > > > > > > This is done using a fairly simple set of scripts.  Apache
> > > Yetus
> > > > is
> > > > > > > just
> > > > > > > > > forming as a direct-to-top-level spinoff from Hadoop
> > > > > > > > >
> > > > > > > > > Proposal is here (don't be fooled by the fact that it looks
> > > like
> > > > an
> > > > > > > > > incubation proposal):
> > > > > > > > >
> > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > >
> > > > > > > > > Early code can be found here (don't guess that this is very
> > > real
> > > > > > yet).
> > > > > > > > > More links can be found in the proposal.
> > > > > > > > >
> > > > > > > > >
> > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > >
> > > > > > > > > The project has not yet been formed and there are no
> mailing
> > > > lists
> > > > > or
> > > > > > > git
> > > > > > > > > repo yet.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> > > inramana@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > As someone who worked on this for a while, including it
> as
> > > part
> > > > > of
> > > > > > > > drill
> > > > > > > > > > may bloat drill a bit too much. Also not a big fan of
> > running
> > > > > > against
> > > > > > > > an
> > > > > > > > > > embedded drillbit. Does not replicate an actual
> production
> > > use
> > > > > > case.
> > > > > > > > > >
> > > > > > > > > > Additionally, setting up hive hbase and other components
> > > maybe
> > > > > > > painful
> > > > > > > > > and
> > > > > > > > > > unnecessary for most ppl. It would deter people from ever
> > > > > > > contributing
> > > > > > > > to
> > > > > > > > > > drill. We could spin up in memory hive and hbase but
> that's
> > > > > similar
> > > > > > > to
> > > > > > > > an
> > > > > > > > > > embedded drill bit. Does not replicate a production
> > scenario.
> > > > > > > > > >
> > > > > > > > > > Would prefer the hive way with a central Jenkins server
> > > hosted
> > > > on
> > > > > > aws
> > > > > > > > and
> > > > > > > > > > accessible to everyone.  Users should be able to submit a
> > git
> > > > url
> > > > > > and
> > > > > > > > > that
> > > > > > > > > > should be able to deploy and fire off tests. Should then
> > > have a
> > > > > way
> > > > > > > to
> > > > > > > > > > easily communicate failures to contributors and if
> success
> > > > notify
> > > > > > the
> > > > > > > > > > committers to commit the change.
> > > > > > > > > >
> > > > > > > > > > Ps: if hive's way is open source maybe we can look into
> > reuse
> > > > > > rather
> > > > > > > > than
> > > > > > > > > > doing it from scratch. Esp the Jenkins and configuration
> > > stuff.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Ramana
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <
> > parthc@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Drill devs use a set of tests that are not available as
> > > part
> > > > of
> > > > > > the
> > > > > > > > > > Apache
> > > > > > > > > > > distribution. These tests are a pre-requisite for all
> > > > commits,
> > > > > > but
> > > > > > > > are
> > > > > > > > > > not
> > > > > > > > > > > available to any contributors outside the current devs.
> > > > > > > > > > >
> > > > > > > > > > > This thread is to discuss various options to make these
> > > tests
> > > > > > > > > available.
> > > > > > > > > > >
> > > > > > > > > > > Assumptions and requirements  -
> > > > > > > > > > > 1) A functional test (as opposed to a unit test) needs
> to
> > > be
> > > > > > closer
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > end user environment than a development environment. As
> > > such,
> > > > > we
> > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > running functional tests in a cluster environment,
> > connect
> > > > > using
> > > > > > > > > > zookeeper
> > > > > > > > > > > etc.
> > > > > > > > > > > 2) Functional test will keep increasing in number, get
> > more
> > > > > > complex
> > > > > > > > and
> > > > > > > > > > > take a longer and longer time to execute as we go
> along.
> > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > > > > > requirements,
> > > > > > > > > but
> > > > > > > > > > > not penalize the contributor who has a minor fix.
> > > > > > > > > > >     b) All parts of the product (especially various
> > > > 'certified'
> > > > > > > > storage
> > > > > > > > > > > plugins like Hive and Hbase should get tested)
> > > > > > > > > > >     c) It should be easy to debug issues when a test
> > fails.
> > > > > Tests
> > > > > > > > > should
> > > > > > > > > > > fail deterministically. If a test fails, it should
> always
> > > > fail
> > > > > > and
> > > > > > > > > always
> > > > > > > > > > > fail in the same way (easier said than done).
> > > > > > > > > > >
> > > > > > > > > > > Some suggestions -
> > > > > > > > > > > 1) Tests should be a top-level maven module within the
> > > drill
> > > > > > > project
> > > > > > > > > > >         a) We want  the integration tests to run as
> part
> > of
> > > > the
> > > > > > > > drill's
> > > > > > > > > > > maven build process
> > > > > > > > > > >         b) The build step for the integration-tests
> > module
> > > > > would
> > > > > > > > launch
> > > > > > > > > > an
> > > > > > > > > > > embedded drillbit and runs tests against it
> > > > > > > > > > >         c) The tests will be a separate target so they
> > need
> > > > not
> > > > > > be
> > > > > > > > run
> > > > > > > > > > all
> > > > > > > > > > > the time
> > > > > > > > > > >  2) Tests should be divided into multiple suites that
> are
> > > > based
> > > > > > on
> > > > > > > > > > > components. For example a test suite for testing
> > datatypes
> > > > will
> > > > > > > > contain
> > > > > > > > > > the
> > > > > > > > > > > tests for various datatypes including complex types. A
> > > > > > contributor
> > > > > > > or
> > > > > > > > > > > developer can then run these tests more frequently as
> an
> > > > issue
> > > > > is
> > > > > > > > being
> > > > > > > > > > > addressed and run the entire suite only once before
> > commit.
> > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and
> > post
> > > > the
> > > > > > > > results
> > > > > > > > > to
> > > > > > > > > > > the JIRA  (Hive does this). Or some variant of this
> idea.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Some questions -
> > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > 2) Are there any technologies we can leverage that will
> > > make
> > > > > this
> > > > > > > > > easier?
> > > > > > > > > > > 3) How do we make it easier to debug failing tests.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Please feel free to question the assumptions and
> > > > requirements.
> > > > > Be
> > > > > > > > > > creative
> > > > > > > > > > > with your suggestions.
> > > > > > > > > > >
> > > > > > > > > > > Parth
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by rahul challapalli <ch...@gmail.com>.
Thanks for your inputs.

One issue with just publishing the tests in their current state is that
the framework redistributes the tpch, tpcds, and yelp data sets without
requiring users to accept the relevant licenses. A good number of tests
use these data sets. Any thoughts on how to handle this?

- Rahul
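
One possible way to handle the licensing concern, sketched here as an
illustration only: have the framework refuse to fetch a data set until the
user has explicitly recorded acceptance of its terms, and skip (rather than
fail) suites whose data sets have not been accepted. The suite names, the
data set keys, and both function names below are hypothetical and are not
taken from the actual test framework.

```python
class LicenseNotAccepted(Exception):
    """Raised when a suite needs a data set whose terms were not accepted."""


def check_licenses(required, accepted):
    """Return the required data set names if all were accepted, else raise.

    `required` and `accepted` are iterables of data set names. How acceptance
    is recorded (config file, prompt, flag) is left open in this sketch.
    """
    missing = sorted(set(required) - set(accepted))
    if missing:
        raise LicenseNotAccepted(
            "accept the license terms first for: " + ", ".join(missing)
        )
    return sorted(set(required))


def runnable_suites(suites, accepted):
    """Split suites into (runnable, skipped) based on accepted licenses.

    `suites` maps a suite name to the data sets it needs. Suites that need
    no licensed data always run; the others run only after acceptance.
    """
    runnable, skipped = [], []
    for suite, required in sorted(suites.items()):
        try:
            check_licenses(required, accepted)
            runnable.append(suite)
        except LicenseNotAccepted:
            skipped.append(suite)
    return runnable, skipped


if __name__ == "__main__":
    # Hypothetical suite layout, for illustration only.
    suites = {"datatypes": [], "tpch_queries": ["tpch"], "yelp_json": ["yelp"]}
    print(runnable_suites(suites, accepted={"tpch"}))
```

With a gate like this, nothing is redistributed or downloaded on the user's
behalf until the terms are accepted, and tests that do not depend on the
licensed data sets stay usable in the meantime.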

On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <te...@gmail.com> wrote:

> +1.  Get it out there.
>
>
>
> On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > Hey Rahul,
> >
> > My suggestion would be to lower the bar--do the absolute bare minimum to
> > get the tests out there.  For example, simply remove proprietary
> > information and then get it on a public github (whether your personal
> > github or a corporate one).  From there, people can help by submitting
> > pull requests to improve the infrastructure and harness.  Making things
> > easier is something that can be done over time.  For example, we've had
> > offers from a couple different Linux Admins to help on something.  I'm
> > sure that they could help with a number of the items you've identified.
> > In the meantime, we risk patches being merged that have less than
> > complete testing.
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Jacques,
> > >
> > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > add/prioritize these tasks.
> > >
> > > (Table columns: Item # | Task | Sub-Task | Comments | Priority)
> > >
> > > 1. *Publish the tests*
> > >    - Remove Proprietary Data & Queries (Priority: 0)
> > >    - Redact Proprietary Data/Queries
> > >    - Move tests into drill repo
> > >      Comments: This requires some refactoring to the framework code
> > >      since the test framework uses a 2-level directory structure.
> > >    - Organize the tests using a label based approach
> > >      Comments: This involves code changes and moving a lot of files.
> > >      When doing a one time push it might be better to do this before
> > >      publishing the tests?
> > >    - Each suite should be independent
> > >      Comments: Some suites wrongly assume that the data is present.
> > >      They should be identified and fixed.
> > >    - Cleanup hardcoded dependencies during data generation
> > >      Comments: Some data-gen scripts have hard-coded references.
> > >    - Cleanup downloads
> > >      Comments: The same dataset is being downloaded multiple times by
> > >      different suites.
> > >    - Licenses for downloads
> > >      Comments: The framework downloads some files automatically.
> > >      These files are publicly available. However, before downloading
> > >      them users need to agree to certain terms. By using the framework
> > >      users might be skipping this step. We should look into this.
> > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > 3. *Local debugging of tests*
> > >    - Add an optional maven target for running tests on a local machine
> > >      Comments: Tests can launch an embedded drillbit or they can
> > >      connect to a running drillbit through zookeeper.
> > >    - Running suites which require additional setup (hive, hbase etc)
> > >      should be made optional
> > > 4. *Documentation*
> > >    - Running Tests (options available and also listing the assumed
> > >      defaults)
> > >    - Explaining how tests are organized
> > >    - Process for adding a new suite
> > >
> > >
> > >
> > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <ja...@dremio.com>
> > > wrote:
> > >
> > > > Let's get number one done (tests out there so all community members
> can
> > > run
> > > > them).  Then the whole community can work together to solve the rest.
> > > >
> > > > I don't think the base install should include integration test
> > execution.
> > > > I do think the tests should be in the main repo (as opposed to a
> > > > secondary).
> > > >
> > > > We should strive to ultimately make running these integration tests a
> > > > requirement for merging.  We need to complete all the steps before we
> > can
> > > > impose that.  I should be able to help on the global run component
> and
> > > > supporting infrastructure.
> > > >
> > > > J
> > > >
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > challapallirahul@gmail.com> wrote:
> > > >
> > > > > Ramana,
> > > > >
> > > > > You are right. We are trying to address multiple issues here, but
> not
> > > > with
> > > > > a single solution. I am summarizing them
> > > > >
> > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > 2. Before applying a patch we should run tests in a clustered
> > > > environment.
> > > > > Parth had a suggestion(#4) in his original email.
> > > > > 3. Developers should be able to debug majority of the tests on
> their
> > > > local
> > > > > environment. I made a few suggestions above to this regard
> > > > >
> > > > > - Rahul
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <in...@gmail.com>
> > > wrote:
> > > > >
> > > > > > One important thing which we need to be clear on here is what are
> > we
> > > > > trying
> > > > > > to address?
> > > > > >
> > > > > > I feel there are two separate issues here and I do not think one
> > > > solution
> > > > > > will fit both the issues.
> > > > > >
> > > > > >    1. Allowing developers to run tests on their local box so they
> > > know
> > > > > the
> > > > > >    changes they have are not completely wrong.
> > > > > >    2. Allowing transparency in the integration tests process
> which
> > is
> > > > > >    currently a black box.
> > > > > >
> > > > > > 1 is needed for developers to make changes and have an idea that
> > > their
> > > > > > changes are not going to fail tests en masse in the integration
> > > suite.
> > > > 2
> > > > > is
> > > > > > needed because it's a prerequisite for changes to be committed.
> > > > > >
> > > > > >
> > > > > > Regards
> > > > > > Ramana
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > challapallirahul@gmail.com> wrote:
> > > > > >
> > > > > > > Ramana,
> > > > > > >
> > > > > > > Let me fill in more details.
> > > > > > >
> > > > > > > 1. Before we accept a patch we want to make sure the tests run
> > in a
> > > > > > cluster
> > > > > > > environment. No exceptions here.
> > > > > > > 2. We want  the contributors to be able to debug the failing
> > tests
> > > on
> > > > > > their
> > > > > > > laptops in as many cases as possible. This requires:
> > > > > > >         1. Tests should run on top of a local file system.
> (Tests
> > > can
> > > > > > > launch an embedded drillbit or they can connect to a running
> > > drillbit
> > > > > > > through zookeeper)
> > > > > > >         2. Running suites which require additional setup (hive,
> > > hbase
> > > > > > etc)
> > > > > > > should be made optional and sufficient documentation should be
> > > > provided
> > > > > > for
> > > > > > > enabling and disabling these tests.
> > > > > > > 3. In my opinion making these new tests part of drill would
> make
> > it
> > > > > > easier
> > > > > > > for the developers to debug and run tests instead of having a
> > > > different
> > > > > > > repository. But as you said it might bloat the drill project
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > > ted.dunning@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The Hadoop family of projects has some software that
> > integrates a
> > > > > > > > continuous integration system so that every time a JIRA is
> > marked
> > > > as
> > > > > > > > patch-available, the associated patch attached to the bug
> will
> > > have
> > > > > > > > integration tests run against it.  I believe that there has
> > been
> > > > some
> > > > > > > > process to use git hashes instead of patches.  The CI results
> > are
> > > > put
> > > > > > > back
> > > > > > > > on the JIRA.
> > > > > > > >
> > > > > > > > This is done using a fairly simple set of scripts.  Apache
> > Yetus
> > > is
> > > > > > just
> > > > > > > > forming as a direct-to-top-level spinoff from Hadoop
> > > > > > > >
> > > > > > > > Proposal is here (don't be fooled by the fact that it looks
> > like
> > > an
> > > > > > > > incubation proposal):
> > > > > > > >
> > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > >
> > > > > > > > Early code can be found here (don't guess that this is very
> > real
> > > > > yet).
> > > > > > > > More links can be found in the proposal.
> > > > > > > >
> > > > > > > >
> https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > >
> > > > > > > > The project has not yet been formed and there are no mailing
> > > lists
> > > > or
> > > > > > git
> > > > > > > > repo yet.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> > inramana@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > As someone who worked on this for a while, including it as
> > part
> > > > of
> > > > > > > drill
> > > > > > > > > may bloat drill a bit too much. Also not a big fan of
> running
> > > > > against
> > > > > > > an
> > > > > > > > > embedded drillbit. Does not replicate an actual production
> > use
> > > > > case.
> > > > > > > > >
> > > > > > > > > Additionally, setting up hive hbase and other components
> > maybe
> > > > > > painful
> > > > > > > > and
> > > > > > > > > unnecessary for most ppl. It would deter people from ever
> > > > > > contributing
> > > > > > > to
> > > > > > > > > drill. We could spin up in memory hive and hbase but that's
> > > > similar
> > > > > > to
> > > > > > > an
> > > > > > > > > embedded drill bit. Does not replicate a production
> scenario.
> > > > > > > > >
> > > > > > > > > Would prefer the hive way with a central Jenkins server
> > hosted
> > > on
> > > > > aws
> > > > > > > and
> > > > > > > > > accessible to everyone.  Users should be able to submit a
> git
> > > url
> > > > > and
> > > > > > > > that
> > > > > > > > > should be able to deploy and fire off tests. Should then
> > have a
> > > > way
> > > > > > to
> > > > > > > > > easily communicate failures to contributors and if success
> > > notify
> > > > > the
> > > > > > > > > committers to commit the change.
> > > > > > > > >
> > > > > > > > > Ps: if hive's way is open source maybe we can look into
> reuse
> > > > > rather
> > > > > > > than
> > > > > > > > > doing it from scratch. Esp the Jenkins and configuration
> > stuff.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Ramana
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <
> parthc@apache.org
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Drill devs use a set of tests that are not available as
> > part
> > > of
> > > > > the
> > > > > > > > > Apache
> > > > > > > > > > distribution. These tests are a pre-requisite for all
> > > commits,
> > > > > but
> > > > > > > are
> > > > > > > > > not
> > > > > > > > > > available to any contributors outside the current devs.
> > > > > > > > > >
> > > > > > > > > > This thread is to discuss various options to make these
> > tests
> > > > > > > > available.
> > > > > > > > > >
> > > > > > > > > > Assumptions and requirements  -
> > > > > > > > > > 1) A functional test (as opposed to a unit test) needs to
> > be
> > > > > closer
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > end user environment than a development environment. As
> > such,
> > > > we
> > > > > > > should
> > > > > > > > > be
> > > > > > > > > > running functional tests in a cluster environment,
> connect
> > > > using
> > > > > > > > > zookeeper
> > > > > > > > > > etc.
> > > > > > > > > > 2) Functional test will keep increasing in number, get
> more
> > > > > complex
> > > > > > > and
> > > > > > > > > > take a longer and longer time to execute as we go along.
> > > > > > > > > > 3) Some requirements are:
> > > > > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > > > > requirements,
> > > > > > > > but
> > > > > > > > > > not penalize the contributor who has a minor fix.
> > > > > > > > > >     b) All parts of the product (especially various
> > > 'certified'
> > > > > > > storage
> > > > > > > > > > plugins like Hive and Hbase should get tested)
> > > > > > > > > >     c) It should be easy to debug issues when a test
> fails.
> > > > Tests
> > > > > > > > should
> > > > > > > > > > fail deterministically. If a test fails, it should always
> > > fail
> > > > > and
> > > > > > > > always
> > > > > > > > > > fail in the same way (easier said than done).
> > > > > > > > > >
> > > > > > > > > > Some suggestions -
> > > > > > > > > > 1) Tests should be a top-level maven module within the
> > drill
> > > > > > project
> > > > > > > > > >         a) We want  the integration tests to run as part
> of
> > > the
> > > > > > > drill's
> > > > > > > > > > maven build process
> > > > > > > > > >         b) The build step for the integration-tests
> module
> > > > would
> > > > > > > launch
> > > > > > > > > an
> > > > > > > > > > embedded drillbit and runs tests against it
> > > > > > > > > >         c) The tests will be a separate target so they
> need
> > > not
> > > > > be
> > > > > > > run
> > > > > > > > > all
> > > > > > > > > > the time
> > > > > > > > > >  2) Tests should be divided into multiple suites that are
> > > based
> > > > > on
> > > > > > > > > > components. For example a test suite for testing
> datatypes
> > > will
> > > > > > > contain
> > > > > > > > > the
> > > > > > > > > > tests for various datatypes including complex types. A
> > > > > contributor
> > > > > > or
> > > > > > > > > > developer can then run these tests more frequently as an
> > > issue
> > > > is
> > > > > > > being
> > > > > > > > > > addressed and run the entire suite only once before
> commit.
> > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and
> post
> > > the
> > > > > > > results
> > > > > > > > to
> > > > > > > > > > the JIRA  (Hive does this). Or some variant of this idea.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Some questions -
> > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > 2) Are there any technologies we can leverage that will
> > make
> > > > this
> > > > > > > > easier?
> > > > > > > > > > 3) How do we make it easier to debug failing tests.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please feel free to question the assumptions and
> > > requirements.
> > > > Be
> > > > > > > > > creative
> > > > > > > > > > with your suggestions.
> > > > > > > > > >
> > > > > > > > > > Parth
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ted Dunning <te...@gmail.com>.
+1.  Get it out there.



On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Hey Rahul,
>
> My suggestion would be to lower the bar--do the absolute bare minimum to
> get the tests out there.  For example, simply remove proprietary
> information and then get it on a public github (whether your personal
> github or a corporate one).  From there, people can help by submitting
> pull requests to improve the infrastructure and harness.  Making things
> easier is something that can be done over time.  For example, we've had
> offers from a couple different Linux Admins to help on something.  I'm
> sure that they could help with a number of the items you've identified.
> In the meantime, we risk patches being merged that have less than
> complete testing.
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Jacques,
> >
> > I am breaking down steps 1,2 & 3 into sub-tasks so we can add/prioritize
> > these tasks
> >
> > (Table columns: Item # | Task | Sub-Task | Comments | Priority)
> >
> > 1. *Publish the tests*
> >    - Remove Proprietary Data & Queries (Priority: 0)
> >    - Redact Proprietary Data/Queries
> >    - Move tests into drill repo
> >      Comments: This requires some refactoring to the framework code
> >      since the test framework uses a 2-level directory structure.
> >    - Organize the tests using a label based approach
> >      Comments: This involves code changes and moving a lot of files.
> >      When doing a one time push it might be better to do this before
> >      publishing the tests?
> >    - Each suite should be independent
> >      Comments: Some suites wrongly assume that the data is present.
> >      They should be identified and fixed.
> >    - Cleanup hardcoded dependencies during data generation
> >      Comments: Some data-gen scripts have hard-coded references.
> >    - Cleanup downloads
> >      Comments: The same dataset is being downloaded multiple times by
> >      different suites.
> >    - Licenses for downloads
> >      Comments: The framework downloads some files automatically.
> >      These files are publicly available. However, before downloading
> >      them users need to agree to certain terms. By using the framework
> >      users might be skipping this step. We should look into this.
> > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > 3. *Local debugging of tests*
> >    - Add an optional maven target for running tests on a local machine
> >      Comments: Tests can launch an embedded drillbit or they can
> >      connect to a running drillbit through zookeeper.
> >    - Running suites which require additional setup (hive, hbase etc)
> >      should be made optional
> > 4. *Documentation*
> >    - Running Tests (options available and also listing the assumed
> >      defaults)
> >    - Explaining how tests are organized
> >    - Process for adding a new suite
> >
> >
> >
> > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <ja...@dremio.com>
> > wrote:
> >
> > > Let's get number one done (tests out there so all community members can
> > run
> > > them).  Then the whole community can work together to solve the rest.
> > >
> > > I don't think the base install should include integration test
> execution.
> > > I do think the tests should be in the main repo (as opposed to a
> > > secondary).
> > >
> > > We should strive to ultimately make running these integration tests a
> > > requirement for merging.  We need to complete all the steps before we
> can
> > > impose that.  I should be able to help on the global run component and
> > > supporting infrastructure.
> > >
> > > J
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > challapallirahul@gmail.com> wrote:
> > >
> > > > Ramana,
> > > >
> > > > You are right. We are trying to address multiple issues here, but not
> > > with
> > > > a single solution. I am summarizing them
> > > >
> > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > 2. Before applying a patch we should run tests in a clustered
> > > environment.
> > > > Parth had a suggestion(#4) in his original email.
> > > > 3. Developers should be able to debug majority of the tests on their
> > > local
> > > > environment. I made a few suggestions above to this regard
> > > >
> > > > - Rahul
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <in...@gmail.com>
> > wrote:
> > > >
> > > > > One important thing which we need to be clear on here is what are
> we
> > > > trying
> > > > > to address?
> > > > >
> > > > > I feel there are two separate issues here and I do not think one
> > > solution
> > > > > will fit both the issues.
> > > > >
> > > > >    1. Allowing developers to run tests on their local box so they
> > know
> > > > the
> > > > >    changes they have are not completely wrong.
> > > > >    2. Allowing transparency in the integration tests process which
> is
> > > > >    currently a black box.
> > > > >
> > > > > 1 is needed for developers to make changes and have an idea that
> > their
> > > > > changes are not going to fail tests en masse in the integration
> > suite.
> > > 2
> > > > is
> > > > > needed because it's a prerequisite for changes to be committed.
> > > > >
> > > > >
> > > > > Regards
> > > > > Ramana
> > > > >
> > > > >
> > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > challapallirahul@gmail.com> wrote:
> > > > >
> > > > > > Ramana,
> > > > > >
> > > > > > Let me fill in more details.
> > > > > >
> > > > > > 1. Before we accept a patch we want to make sure the tests run
> in a
> > > > > cluster
> > > > > > environment. No exceptions here.
> > > > > > 2. We want  the contributors to be able to debug the failing
> tests
> > on
> > > > > their
> > > > > > laptops in as many cases as possible. This requires:
> > > > > >         1. Tests should run on top of a local file system. (Tests
> > can
> > > > > > launch an embedded drillbit or they can connect to a running
> > drillbit
> > > > > > through zookeeper)
> > > > > >         2. Running suites which require additional setup (hive,
> > hbase
> > > > > etc)
> > > > > > should be made optional and sufficient documentation should be
> > > provided
> > > > > for
> > > > > > enabling and disabling these tests.
> > > > > > 3. In my opinion making these new tests part of drill would make
> it
> > > > > easier
> > > > > > for the developers to debug and run tests instead of having a
> > > different
> > > > > > repository. But as you said it might bloat the drill project
> > > > > >
> > > > > > - Rahul
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > ted.dunning@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > The Hadoop family of projects has some software that
> integrates a
> > > > > > > continuous integration system so that every time a JIRA is
> marked
> > > as
> > > > > > > patch-available, the associated patch attached to the bug will
> > have
> > > > > > > integration tests run against it.  I believe that there has
> been
> > > some
> > > > > > > process to use git hashes instead of patches.  The CI results
> are
> > > put
> > > > > > back
> > > > > > > on the JIRA.
> > > > > > >
> > > > > > > This is done using a fairly simple set of scripts.  Apache
> Yetus
> > is
> > > > > just
> > > > > > > forming as a direct-to-top-level spinoff from Hadoop
> > > > > > >
> > > > > > > Proposal is here (don't be fooled by the fact that it looks
> like
> > an
> > > > > > > incubation proposal):
> > > > > > >
> > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > >
> > > > > > > Early code can be found here (don't guess that this is very
> real
> > > > yet).
> > > > > > > More links can be found in the proposal.
> > > > > > >
> > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > >
> > > > > > > The project has not yet been formed and there are no mailing
> > lists
> > > or
> > > > > git
> > > > > > > repo yet.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> inramana@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > As someone who worked on this for a while, including it as
> part
> > > of
> > > > > > drill
> > > > > > > > may bloat drill a bit too much. Also not a big fan of running
> > > > against
> > > > > > an
> > > > > > > > embedded drillbit. Does not replicate an actual production
> use
> > > > case.
> > > > > > > >
> > > > > > > > Additionally, setting up hive hbase and other components
> maybe
> > > > > painful
> > > > > > > and
> > > > > > > > unnecessary for most ppl. It would deter people from ever
> > > > > contributing
> > > > > > to
> > > > > > > > drill. We could spin up in memory hive and hbase but that's
> > > similar
> > > > > to
> > > > > > an
> > > > > > > > embedded drill bit. Does not replicate a production scenario.
> > > > > > > >
> > > > > > > > Would prefer the hive way with a central Jenkins server
> hosted
> > on
> > > > aws
> > > > > > and
> > > > > > > > accessible to everyone.  Users should be able to submit a git
> > url
> > > > and
> > > > > > > that
> > > > > > > > should be able to deploy and fire off tests. Should then
> have a
> > > way
> > > > > to
> > > > > > > > easily communicate failures to contributors and if success
> > notify
> > > > the
> > > > > > > > commiters to commit the change.
> > > > > > > >
> > > > > > > > Ps: if hive's way is open source maybe we can look into reuse
> > > > rather
> > > > > > than
> > > > > > > > doing it from scratch. Esp the Jenkins and configuration
> stuff.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ramana
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thursday, July 23, 2015, Parth Chandra <parthc@apache.org
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Drill devs use a set of tests that are not available as
> part
> > of
> > > > the
> > > > > > > > Apache
> > > > > > > > > distribution. These tests are a pre-requisite for all
> > commits,
> > > > but
> > > > > > are
> > > > > > > > not
> > > > > > > > > available to any contributors outside the current devs.
> > > > > > > > >
> > > > > > > > > This thread is to discuss various options to make these
> tests
> > > > > > > available.
> > > > > > > > >
> > > > > > > > > Assumptions and requirements  -
> > > > > > > > > 1) A functional test (as opposed to a unit test) needs to
> be
> > > > closer
> > > > > > to
> > > > > > > > the
> > > > > > > > > end user environment than a development environment. As
> such,
> > > we
> > > > > > should
> > > > > > > > be
> > > > > > > > > running functional tests in a cluster environment, connect
> > > using
> > > > > > > > zookeeper
> > > > > > > > > etc.
> > > > > > > > > 2) Functional test will keep increasing in number, get more
> > > > complex
> > > > > > and
> > > > > > > > > take a longer and longer time to execute as we go along.
> > > > > > > > > 3) Some requirements are:
> > > > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > > > requirements,
> > > > > > > but
> > > > > > > > > not penalize the contributor who has a minor fix.
> > > > > > > > >     b) All parts of the product (especially various
> > 'certified'
> > > > > > storage
> > > > > > > > > plugins like Hive and Hbase should get tested)
> > > > > > > > >     c) It should be easy to debug issues when a test fails.
> > > Tests
> > > > > > > should
> > > > > > > > > fail deterministically. If a test fails, it should always
> > fail
> > > > and
> > > > > > > always
> > > > > > > > > fail in the same way (easier said than done).
> > > > > > > > >
> > > > > > > > > Some suggestions -
> > > > > > > > > 1) Tests should be a top-level maven module within the
> drill
> > > > > project
> > > > > > > > >         a) We want  the integration tests to run as part of
> > the
> > > > > > drill's
> > > > > > > > > maven build process
> > > > > > > > >         b) The build step for the integration-tests module
> > > would
> > > > > > launch
> > > > > > > > an
> > > > > > > > > embedded drillbit and runs tests against it
> > > > > > > > >         c) The tests will be a separate target so they need
> > not
> > > > be
> > > > > > run
> > > > > > > > all
> > > > > > > > > the time
> > > > > > > > >  2) Tests should be divided into multiple suites that are
> > based
> > > > on
> > > > > > > > > components. For example a test suite for testing datatypes
> > will
> > > > > > contain
> > > > > > > > the
> > > > > > > > > tests for various datatypes including complex types. A
> > > > contributor
> > > > > or
> > > > > > > > > developer can then run these tests more frequently as an
> > issue
> > > is
> > > > > > being
> > > > > > > > > addressed and run the entire suite only once before commit.
> > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and post
> > the
> > > > > > results
> > > > > > > to
> > > > > > > > > the JIRA  (Hive does this). Or some variant of this idea.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Some questions -
> > > > > > > > > 1) What do some other projects do?
> > > > > > > > > 2) Are there any technologies we can leverage that will
> make
> > > this
> > > > > > > easier?
> > > > > > > > > 3) How do we make it easier to debug failing tests.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please feel free to question the assumptions and
> > requirements.
> > > Be
> > > > > > > > creative
> > > > > > > > > with your suggestions.
> > > > > > > > >
> > > > > > > > > Parth
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Jacques Nadeau <ja...@dremio.com>.
Hey Rahul,

My suggestion would be to lower the bar: do the absolute bare minimum to
get the tests out there.  For example, simply remove proprietary
information and then get it on a public github (whether your personal
github or a corporate one).  From there, people can help by submitting pull
requests to improve the infrastructure and harness.  Making things easier
is something that can be done over time.  For example, we've had offers
from a couple of different Linux admins to help with something.  I'm sure
that they could help with a number of the items you've identified.  In the
meantime, we risk patches being merged that have less than complete testing.


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Jacques,
>
> I am breaking down steps 1,2 & 3 into sub-tasks so we can add/prioritize
> these tasks

Re: [DISCUSS] Publishing advanced/functional tests

Posted by rahul challapalli <ch...@gmail.com>.
Jacques,

I am breaking down steps 1,2 & 3 into sub-tasks so we can add/prioritize
these tasks

Columns: Item # | Task | Sub-Task | Comments | Priority

1. *Publish the tests*
   - Remove Proprietary Data & Queries
     Priority: 0
   - Redact Proprietary Data/Queries
   - Move tests into drill repo
     Comments: This requires some refactoring to the framework code since
     the test framework uses a 2-level directory structure.
   - Organize the tests using a label-based approach
     Comments: This involves code changes and moving a lot of files. When
     doing a one-time push it might be better to do this before publishing
     the tests?
   - Each suite should be independent
     Comments: Some suites wrongly assume that the data is present. They
     should be identified and fixed.
   - Clean up hardcoded dependencies during data generation
     Comments: Some data-gen scripts have hard-coded references.
   - Clean up downloads
     Comments: The same dataset is being downloaded multiple times by
     different suites.
   - Licenses for downloads
     Comments: The framework downloads some files automatically. These
     files are publicly available. However, before downloading them users
     need to agree to certain terms. By using the framework users might be
     skipping this step. We should look into this.

2. *Set up a cluster infrastructure to run the pre-commit tests*

3. *Local debugging of tests*
   - Add an optional maven target for running tests on a local machine
     Comments: Tests can launch an embedded drillbit or they can connect to
     a running drillbit through zookeeper.
   - Running suites which require additional setup (hive, hbase etc) should
     be made optional

4. *Documentation*
   - Running Tests (options available and also listing the assumed defaults)
   - Explaining how tests are organized
   - Process for adding a new suite
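
To make two of the items above concrete (label-based organization with
optional Hive/HBase suites, and embedded vs. zookeeper connections), here is
a rough sketch. It is not the actual framework code: the suite names, the
labels, the `Suite` class, and both helper functions are hypothetical. The
JDBC URL shapes follow Drill's documented forms (`zk=local` for an embedded
drillbit, `zk=<quorum>/<directory>/<cluster id>` for a cluster), and the
default path `drill/drillbits1` is an assumption about a stock install.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Suite:
    """One functional test suite (hypothetical structure, for illustration)."""
    name: str
    labels: frozenset = field(default_factory=frozenset)
    # Extra services the suite needs beyond a drillbit, e.g. {"hive"}.
    requires: frozenset = field(default_factory=frozenset)


# Hypothetical suite registry; the real framework may store this differently.
SUITES = [
    Suite("datatypes", frozenset({"core", "precommit"})),
    Suite("complex_types", frozenset({"core"})),
    Suite("hive_storage", frozenset({"storage", "precommit"}), frozenset({"hive"})),
    Suite("hbase_storage", frozenset({"storage"}), frozenset({"hbase"})),
]


def select_suites(suites, labels, available_services=()):
    """Return the names of suites that carry at least one requested label
    and whose extra services are all available. Suites that need Hive or
    HBase are skipped unless those services are explicitly listed."""
    wanted = set(labels)
    available = set(available_services)
    return [
        s.name
        for s in suites
        if s.labels & wanted and s.requires <= available
    ]


def drill_jdbc_url(zk_quorum=None, zk_path="drill/drillbits1"):
    """Build a Drill JDBC URL. With no ZooKeeper quorum, target an embedded
    drillbit; otherwise connect to a running cluster through ZooKeeper."""
    if zk_quorum is None:
        return "jdbc:drill:zk=local"
    return f"jdbc:drill:zk={zk_quorum}/{zk_path}"
```

On a laptop with no Hive or HBase, `select_suites(SUITES, {"precommit"})`
would pick only the suites that need nothing beyond a drillbit, while the
cluster run would pass the full list of available services.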



On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Let's get number one done (tests out there so all community members can run
> them).  Then the whole community can work together to solve the rest.
>
> I don't think the base install should include integration test execution.
> I do think the tests should be in the main repo (as opposed to a
> secondary).
>
> We should strive to ultimately make running these integration tests a
> requirement for merging.  We need to complete all the steps before we can
> impose that.  I should be able to help on the global run component and
> supporting infrastructure.
>
> J
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Ramana,
> >
> > You are right. We are trying to address multiple issues here, but not
> with
> > a single solution. I am summarizing them
> >
> > 1. Tests should be visible to everyone (Implicit goal)
> > 2. Before applying a patch we should run tests in a clustered
> environment.
> > Parth had a suggestion(#4) in his original email.
> > 3. Developers should be able to debug majority of the tests on their
> local
> > environment. I made a few suggestions above to this regard
> >
> > - Rahul
> >
> >
> >
> >
> >
> > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <in...@gmail.com> wrote:
> >
> > > One important thing which we need to be clear on here is what are we
> > trying
> > > to address?
> > >
> > > I feel there are two separate issues here and I do not think one
> solution
> > > will fit both the issues.
> > >
> > >    1. Allowing developers to run tests on their local box so they know
> > the
> > >    changes they have are not completely wrong.
> > >    2. Allowing transparency in the integration tests process which is
> > >    currently a black box.
> > >
> > > 1 is needed for developers to make changes and have an idea that their
> > > changes are not going to fail tests en masse in the integration suite.
> 2
> > is
> > > needed because its a prerequisite for changes to be committed.
> > >
> > >
> > > Regards
> > > Ramana
> > >
> > >
> > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > challapallirahul@gmail.com> wrote:
> > >
> > > > Ramana,
> > > >
> > > > Let me fill in more details.
> > > >
> > > > 1. Before we accept a patch we want to make sure the tests run in a
> > > cluster
> > > > environment. No exceptions here.
> > > > 2. We want  the contributors to be able to debug the failing tests on
> > > their
> > > > laptops in as many cases as possbile. This requires :
> > > >         1. Tests should run on top of a local file system. (Tests can
> > > > launch an embedded drillbit or they can connect to a running drillbit
> > > > through zookeeper)
> > > >         2. Running suites which require additional setup (hive, hbase
> > > etc)
> > > > should be made optional and sufficient documentation should be
> provided
> > > for
> > > > enabling and disabling these tests.
> > > > 3. In my opinion making these new tests part of drill would make it
> > > easier
> > > > for the developers to debug and run tests instead of having a
> different
> > > > repository. But as you said it might bloat the drill project
> > > >
> > > > - Rahul
> > > >
> > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <te...@gmail.com>
> > > > wrote:
> > > >
> > > > > The Hadoop family of projects has some software that integrates a
> > > > > continuous integration system so that every time a JIRA is marked
> as
> > > > > patch-available, the associated patch attached to the bug will have
> > > > > integration tests run against it.  I believe that there has been
> some
> > > > > process to use git hashes instead of patches.  The CI results are
> put
> > > > back
> > > > > on the JIRA.
> > > > >
> > > > > This is done using a fairly simple set of scripts.  Apache Yetus is
> > > just
> > > > > forming as a direct-to-top-level spinoff from Hadoop
> > > > >
> > > > > Proposal is here (don't be fooled by the fact that it looks like an
> > > > > incubation proposal):
> > > > >
> > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > >
> > > > > Early code can be found here (don't guess that this is very real
> > yet).
> > > > > More links can be found in the proposal.
> > > > >
> > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > >
> > > > > The project has not yet been formed and there are no mailing lists
> or
> > > git
> > > > > repo yet.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <in...@gmail.com>
> > > wrote:
> > > > >
> > > > > > As someone who worked on this for a while, including it as part
> of
> > > > drill
> > > > > > may bloat drill a bit too much. Also not a big fan of running
> > against
> > > > an
> > > > > > embedded drillbit. Does not replicate an actual production use
> > case.
> > > > > >
> > > > > > Additionally, setting up hive hbase and other components maybe
> > > painful
> > > > > and
> > > > > > unnecessary for most ppl. It would deter people from ever
> > > contributing
> > > > to
> > > > > > drill. We could spin up in memory hive and hbase but that's
> similar
> > > to
> > > > an
> > > > > > embedded drill bit. Does not replicate a production scenario.
> > > > > >
> > > > > > Would prefer the hive way with a central Jenkins server hosted on
> > aws
> > > > and
> > > > > > accessible to everyone.  Users should be able to submit a git url
> > and
> > > > > that
> > > > > > should be able to deploy and fire off tests. Should then have a
> way
> > > to
> > > > > > easily communicate failures to contributors and if success notify
> > the
> > > > > > commiters to commit the change.
> > > > > >
> > > > > > Ps: if hive's way is open source maybe we can look into reuse
> > rather
> > > > than
> > > > > > doing it from scratch. Esp the Jenkins and configuration stuff.
> > > > > >
> > > > > > Regards
> > > > > > Ramana
> > > > > >
> > > > > >
> > > > > > On Thursday, July 23, 2015, Parth Chandra <pa...@apache.org>
> > wrote:
> > > > > >
> > > > > > > Drill devs use a set of tests that are not available as part of
> > the
> > > > > > Apache
> > > > > > > distribution. These tests are a pre-requisite for all commits,
> > but
> > > > are
> > > > > > not
> > > > > > > available to any contributors outside the current devs.
> > > > > > >
> > > > > > > This thread is to discuss various options to make these tests
> > > > > available.
> > > > > > >
> > > > > > > Assumptions and requirements  -
> > > > > > > 1) A functional test (as opposed to a unit test) needs to be
> > closer
> > > > to
> > > > > > the
> > > > > > > end user environment than a development environment. As such,
> we
> > > > should
> > > > > > be
> > > > > > > running functional tests in a cluster environment, connect
> using
> > > > > > zookeeper
> > > > > > > etc.
> > > > > > > 2) Functional test will keep increasing in number, get more
> > complex
> > > > and
> > > > > > > take a longer and longer time to execute as we go along.
> > > > > > > 3) Some requirements are:
> > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > requirements,
> > > > > but
> > > > > > > not penalize the contributor who has a minor fix.
> > > > > > >     b) All parts of the product (especially various 'certified'
> > > > storage
> > > > > > > plugins like Hive and Hbase should get tested)
> > > > > > >     c) It should be easy to debug issues when a test fails.
> Tests
> > > > > should
> > > > > > > fail deterministically. If a test fails, it should always fail
> > and
> > > > > always
> > > > > > > fail in the same way (easier said than done).
> > > > > > >
> > > > > > > Some suggestions -
> > > > > > > 1) Tests should be a top-level maven module within the drill
> > > project
> > > > > > >         a) We want  the integration tests to run as part of the
> > > > drill's
> > > > > > > maven build process
> > > > > > >         b) The build step for the integration-tests module
> would
> > > > launch
> > > > > > an
> > > > > > > embedded drillbit and runs tests against it
> > > > > > >         c) The tests will be a separate target so they need not
> > be
> > > > run
> > > > > > all
> > > > > > > the time
> > > > > > >  2) Tests should be divided into multiple suites that are based
> > on
> > > > > > > components. For example a test suite for testing datatypes will
> > > > contain
> > > > > > the
> > > > > > > tests for various datatypes including complex types. A
> > contributor
> > > or
> > > > > > > developer can then run these tests more frequently as an issue
> is
> > > > being
> > > > > > > addressed and run the entire suite only once before commit.
> > > > > > > 3) Provide the tests as a hosted service
> > > > > > > 4) Setup a bot to fire the test on an AWS cluster and post the
> > > > results
> > > > > to
> > > > > > > the JIRA  (Hive does this). Or some variant of this idea.
> > > > > > >
> > > > > > >
> > > > > > > Some questions -
> > > > > > > 1) What do some other projects do?
> > > > > > > 2) Are there any technologies we can leverage that will make
> this
> > > > > easier?
> > > > > > > 3) How do we make it easier to debug failing tests.
> > > > > > >
> > > > > > >
> > > > > > > Please feel free to question the assumptions and requirements.
> Be
> > > > > > creative
> > > > > > > with your suggestions.
> > > > > > >
> > > > > > > Parth
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ramana I N <in...@gmail.com>.
>
>  I should be able to help on the global run component and
> supporting infrastructure.


I can pitch in on that as well, let me know what help you need.

Regards
Ramana


On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Let's get number one done (tests out there so all community members can run
> them).  Then the whole community can work together to solve the rest.
>
> I don't think the base install should include integration test execution.
> I do think the tests should be in the main repo (as opposed to a
> secondary).
>
> We should strive to ultimately make running these integration tests a
> requirement for merging.  We need to complete all the steps before we can
> impose that.  I should be able to help on the global run component and
> supporting infrastructure.
>
> J
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Ramana,
> >
> > You are right. We are trying to address multiple issues here, but not
> with
> > a single solution. I am summarizing them
> >
> > 1. Tests should be visible to everyone (Implicit goal)
> > 2. Before applying a patch we should run tests in a clustered
> environment.
> > Parth had a suggestion(#4) in his original email.
> > 3. Developers should be able to debug majority of the tests on their
> local
> > environment. I made a few suggestions above to this regard
> >
> > - Rahul

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Jacques Nadeau <ja...@dremio.com>.
Let's get number one done (tests out there so all community members can run
them).  Then the whole community can work together to solve the rest.

I don't think the base install should include integration test execution.
I do think the tests should be in the main repo (as opposed to a secondary
repo).

We should strive to ultimately make running these integration tests a
requirement for merging.  We need to complete all the steps before we can
impose that.  I should be able to help on the global run component and
supporting infrastructure.

J



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Ramana,
>
> You are right. We are trying to address multiple issues here, but not with
> a single solution. To summarize:
>
> 1. Tests should be visible to everyone (implicit goal).
> 2. Before applying a patch we should run tests in a clustered environment.
> Parth had a suggestion (#4) in his original email.
> 3. Developers should be able to debug the majority of the tests in their
> local environment. I made a few suggestions above in this regard.
>
> - Rahul

Re: [DISCUSS] Publishing advanced/functional tests

Posted by rahul challapalli <ch...@gmail.com>.
Ramana,

You are right. We are trying to address multiple issues here, but not with
a single solution. To summarize:

1. Tests should be visible to everyone (implicit goal).
2. Before applying a patch we should run tests in a clustered environment.
Parth had a suggestion (#4) in his original email.
3. Developers should be able to debug the majority of the tests in their
local environment. I made a few suggestions above in this regard.

- Rahul





On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <in...@gmail.com> wrote:

> One important thing we need to be clear on here is what we are trying to
> address.
>
> I feel there are two separate issues here and I do not think one solution
> will fit both.
>
>    1. Allowing developers to run tests on their local box so they know
>    their changes are not completely wrong.
>    2. Allowing transparency in the integration test process, which is
>    currently a black box.
>
> 1 is needed so developers can make changes and have an idea that their
> changes are not going to fail tests en masse in the integration suite. 2 is
> needed because it's a prerequisite for changes to be committed.
>
>
> Regards
> Ramana

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ramana I N <in...@gmail.com>.
One important thing we need to be clear on here is what we are trying to
address.

I feel there are two separate issues here and I do not think one solution
will fit both.

   1. Allowing developers to run tests on their local box so they know
   their changes are not completely wrong.
   2. Allowing transparency in the integration test process, which is
   currently a black box.

1 is needed so developers can make changes and have an idea that their
changes are not going to fail tests en masse in the integration suite. 2 is
needed because it's a prerequisite for changes to be committed.


Regards
Ramana


On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Ramana,
>
> Let me fill in more details.
>
> 1. Before we accept a patch we want to make sure the tests run in a cluster
> environment. No exceptions here.
> 2. We want contributors to be able to debug the failing tests on their
> laptops in as many cases as possible. This requires:
>         1. Tests should run on top of a local file system. (Tests can
> launch an embedded drillbit or they can connect to a running drillbit
> through zookeeper.)
>         2. Running suites which require additional setup (hive, hbase etc.)
> should be made optional and sufficient documentation should be provided for
> enabling and disabling these tests.
> 3. In my opinion making these new tests part of drill would make it easier
> for the developers to debug and run tests instead of having a different
> repository. But as you said it might bloat the drill project.
>
> - Rahul

Re: [DISCUSS] Publishing advanced/functional tests

Posted by rahul challapalli <ch...@gmail.com>.
Ramana,

Let me fill in more details.

1. Before we accept a patch we want to make sure the tests run in a cluster
environment. No exceptions here.
2. We want contributors to be able to debug the failing tests on their
laptops in as many cases as possible. This requires:
        1. Tests should run on top of a local file system. (Tests can
launch an embedded drillbit or they can connect to a running drillbit
through zookeeper.)
        2. Running suites which require additional setup (hive, hbase etc.)
should be made optional and sufficient documentation should be provided for
enabling and disabling these tests.
3. In my opinion making these new tests part of drill would make it easier
for the developers to debug and run tests instead of having a different
repository. But as you said it might bloat the drill project.
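
To make point 2 concrete, here is a minimal sketch of how a test runner
might pick suites and a drillbit to connect to. Everything in it is an
assumption for illustration: the DRILL_TEST_* variable names, the suite
names other than "datatypes", and the helper functions are hypothetical and
not part of Drill or the existing test framework. The only Drill-specific
detail it relies on is the JDBC URL form, where "zk=local" asks for an
embedded drillbit and "zk=<host:port>" goes through zookeeper.

```python
# Hypothetical sketch: suite names, DRILL_TEST_* variables and helper names
# are illustrative assumptions, not part of Drill or its test framework.
import os

# Suites that need only a local file system and a drillbit.
CORE_SUITES = ["datatypes", "joins"]

# Suites that need extra services; each stays off unless its flag is "1".
OPTIONAL_SUITES = {"hive": "DRILL_TEST_HIVE", "hbase": "DRILL_TEST_HBASE"}


def select_suites(env):
    """Return the core suites plus every optional suite that is enabled."""
    enabled = [name for name, flag in OPTIONAL_SUITES.items()
               if env.get(flag) == "1"]
    return CORE_SUITES + sorted(enabled)


def connection_url(env):
    """Use an embedded drillbit by default, zookeeper when one is configured."""
    zk = env.get("DRILL_TEST_ZK")  # e.g. "zkhost:2181" (hypothetical variable)
    return f"jdbc:drill:zk={zk}" if zk else "jdbc:drill:zk=local"


if __name__ == "__main__":
    print(select_suites(os.environ))
    print(connection_url(os.environ))
```

With no variables set, a contributor would get only the core suites against
an embedded drillbit; the cluster run would set all of them.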

- Rahul

On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <te...@gmail.com> wrote:

> The Hadoop family of projects has some software that integrates a
> continuous integration system so that every time a JIRA is marked as
> patch-available, the associated patch attached to the bug will have
> integration tests run against it.  I believe that there has been some
> process to use git hashes instead of patches.  The CI results are put back
> on the JIRA.
>
> This is done using a fairly simple set of scripts.  Apache Yetus is just
> forming as a direct-to-top-level spinoff from Hadoop.
>
> Proposal is here (don't be fooled by the fact that it looks like an
> incubation proposal):
>
> http://wiki.apache.org/incubator/YetusProposal
>
> Early code can be found here (don't assume that it is very real yet).
> More links can be found in the proposal.
>
> https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
>
> The project has not yet been formed and there are no mailing lists or git
> repo yet.

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ted Dunning <te...@gmail.com>.
The Hadoop family of projects has some software that integrates a
continuous integration system so that every time a JIRA is marked as
patch-available, the associated patch attached to the bug will have
integration tests run against it.  I believe that there has been some
process to use git hashes instead of patches.  The CI results are put back
on the JIRA.
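
A minimal sketch of that flow, assuming an issue is represented as a plain
dict. The field names, the helper names and the +1/-1 comment layout are
assumptions for illustration, not the actual Hadoop scripts, and no real
JIRA API is called here.

```python
def should_run_precommit(issue):
    """Run only for issues marked patch-available that carry an attachment."""
    return issue.get("status") == "Patch Available" and bool(issue.get("attachments"))


def latest_patch(issue):
    """Pick the most recently uploaded attachment (ISO timestamps sort as text)."""
    return max(issue["attachments"], key=lambda a: a["created"])


def result_comment(patch_name, results):
    """Format the text a bot would put back on the JIRA.

    results maps a suite name to True (passed) or False (failed).
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    verdict = "-1" if failed else "+1"
    lines = [
        f"{verdict} overall for {patch_name}",
        f"{len(results) - len(failed)} of {len(results)} suites passed",
    ]
    lines += [f"  FAILED: {name}" for name in failed]
    return "\n".join(lines)
```

Switching from patches to git hashes would only change latest_patch; the
trigger and the reporting stay the same.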

This is done using a fairly simple set of scripts.  Apache Yetus is just
forming as a direct-to-top-level spinoff from Hadoop.

Proposal is here (don't be fooled by the fact that it looks like an
incubation proposal):

http://wiki.apache.org/incubator/YetusProposal

Early code can be found here (don't guess that this is very real yet).
More links can be found in the proposal.

https://github.com/sekikn/pre-yetus/tree/master/precommit/docs

The project has not yet been formed and there are no mailing lists or git
repo yet.



On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <in...@gmail.com> wrote:

> As someone who worked on this for a while, including it as part of drill
> may bloat drill a bit too much. Also not a big fan of running against an
> embedded drillbit. Does not replicate an actual production use case.
>
> Additionally, setting up hive, hbase and other components may be painful and
> unnecessary for most people. It would deter people from ever contributing to
> drill. We could spin up in-memory hive and hbase but that's similar to an
> embedded drillbit. Does not replicate a production scenario.
>
> Would prefer the hive way with a central Jenkins server hosted on AWS and
> accessible to everyone.  Users should be able to submit a git url and that
> should be able to deploy and fire off tests. Should then have a way to
> easily communicate failures to contributors and, on success, notify the
> committers to commit the change.
>
> PS: if hive's way is open source maybe we can look into reuse rather than
> doing it from scratch. Especially the Jenkins and configuration stuff.
>
> Regards
> Ramana

Re: [DISCUSS] Publishing advanced/functional tests

Posted by Ramana I N <in...@gmail.com>.
As someone who worked on this for a while, including it as part of drill
may bloat drill a bit too much. Also not a big fan of running against an
embedded drillbit. Does not replicate an actual production use case.

Additionally, setting up hive, hbase and other components may be painful and
unnecessary for most people. It would deter people from ever contributing to
drill. We could spin up in-memory hive and hbase but that's similar to an
embedded drillbit. Does not replicate a production scenario.

Would prefer the hive way with a central Jenkins server hosted on AWS and
accessible to everyone.  Users should be able to submit a git url and that
should be able to deploy and fire off tests. Should then have a way to
easily communicate failures to contributors and, on success, notify the
committers to commit the change.
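
A rough sketch of that submit, deploy, test and notify flow. The function
and parameter names are hypothetical, and deploy, run_tests and notify are
injected stand-ins for whatever the Jenkins job would actually call.

```python
def process(git_url, ref, contributor, deploy, run_tests, notify,
            committers="committers"):
    """Deploy the submitted ref, run the tests, and route the outcome.

    deploy(git_url, ref) returns a handle to the test cluster,
    run_tests(cluster) returns a list of failed test names,
    notify(recipient, message) delivers the result.
    """
    cluster = deploy(git_url, ref)
    failures = run_tests(cluster)
    if failures:
        # Failures go back to the contributor who submitted the url.
        notify(contributor, f"{len(failures)} test(s) failed: {', '.join(failures)}")
        return False
    # On success the committers are told the change is ready.
    notify(committers, f"{git_url}@{ref} passed; ready to commit")
    return True
```

Passing the three steps in as callables keeps the flow testable on a laptop
without the hosted cluster.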

PS: if hive's way is open source maybe we can look into reuse rather than
doing it from scratch. Especially the Jenkins and configuration stuff.

Regards
Ramana


On Thursday, July 23, 2015, Parth Chandra <pa...@apache.org> wrote:

> Drill devs use a set of tests that are not available as part of the Apache
> distribution. These tests are a pre-requisite for all commits, but are not
> available to any contributors outside the current devs.
>
> This thread is to discuss various options to make these tests available.
>
> Assumptions and requirements  -
> 1) A functional test (as opposed to a unit test) needs to be closer to the
> end user environment than a development environment. As such, we should be
> running functional tests in a cluster environment, connect using  zookeeper
> etc.
> 2) Functional tests will keep increasing in number, get more complex and
> take a longer and longer time to execute as we go along.
> 3) Some requirements are:
>     a) We want to be strict in enforcing the pre-commit requirements, but
> not penalize the contributor who has a minor fix.
>     b) All parts of the product (especially various 'certified' storage
> plugins like Hive and HBase) should get tested
>     c) It should be easy to debug issues when a test fails. Tests should
> fail deterministically. If a test fails, it should always fail and always
> fail in the same way (easier said than done).
>
> Some suggestions -
> 1) Tests should be a top-level maven module within the drill project
>         a) We want  the integration tests to run as part of the drill's
> maven build process
>         b) The build step for the integration-tests module would launch an
> embedded drillbit and run tests against it
>         c) The tests will be a separate target so they need not be run all
> the time
> 2) Tests should be divided into multiple suites that are based on
> components. For example a test suite for testing datatypes will contain the
> tests for various datatypes including complex types. A contributor or
> developer can then run these tests more frequently as an issue is being
> addressed and run the entire suite only once before commit.
> 3) Provide the tests as a hosted service
> 4) Set up a bot to fire the test on an AWS cluster and post the results to
> the JIRA  (Hive does this). Or some variant of this idea.
>
>
> Some questions -
> 1) What do some other projects do?
> 2) Are there any technologies we can leverage that will make this easier?
> 3) How do we make it easier to debug failing tests?
>
>
> Please feel free to question the assumptions and requirements. Be creative
> with your suggestions.
>
> Parth
>