
Posted to dev@impala.apache.org by Jim Apple <jb...@cloudera.com> on 2017/09/06 17:05:16 UTC

New Impala contributors: IMPALA-5754

If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going, but not so much as to
take away the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5754,
"rand() algorithm is very non-random"? This is a partial walkthrough
of how to get started.

Set up your development environment. Then look for a place to write a
failing test first. The test case given in the ticket is
"select count(distinct(rand(867-5309))), count(*) from alltypes a,
alltypes b;". Tests like this one, which run a full query, are called "end-to-end
tests".
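Why does that query expose the bug? If rand() returned independent, uniformly distributed doubles, repeated values among the joined rows would be rare, so count(distinct ...) would almost always equal count(*). The sketch below uses the birthday approximation C(n, 2) / m for the expected number of colliding pairs among n draws from m equally likely values; when n is much smaller than m, that also approximates how far the distinct count falls short of count(*). The 7,300-row size of functional.alltypes and the two value-space sizes (2^53 for full-precision doubles, 2^31 for a hypothetical weaker generator) are assumptions for illustration, not figures from the ticket.

```python
# Birthday-bound estimate for the ticket's self-join query.
# ROWS and both value-space sizes are illustrative assumptions.

def expected_colliding_pairs(n, m):
    """Expected number of equal pairs among n independent uniform draws
    from m equally likely values: C(n, 2) / m."""
    return n * (n - 1) / 2 / m

ROWS = 7300          # assumed row count of functional.alltypes
N = ROWS * ROWS      # rows produced by "alltypes a, alltypes b"

good = expected_colliding_pairs(N, 2 ** 53)  # full 53-bit doubles
weak = expected_colliding_pairs(N, 2 ** 31)  # hypothetical 31-bit generator

print(f"rows in join: {N}")
print(f"53-bit generator: about {good:.2f} expected colliding pairs")
print(f"31-bit generator: about {weak:,.0f} expected colliding pairs")
```

With a good generator, count(distinct) and count(*) should almost always match on this query; a large gap points at the generator.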

End-to-end tests are described by two kinds of files: .test files and .py files.

.test files contain queries and their expected results. For example:

====
---- QUERY
# Regression test for IMPALA-938
select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
interval smallint_col days)
from functional.alltypes where smallint_col = 1 limit 1
---- RESULTS
1,1,1970-01-02 00:00:00
---- TYPES
smallint, int, timestamp
====

That example is taken from
testdata/workloads/functional-query/queries/QueryTest/exprs.test.
That file is a good place to add a test case, since it tests
expressions ("exprs"), and the bug is in MathFunctions::Rand, which is defined in
be/src/exprs.
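The .test layout is line-oriented: "====" separates test cases, and each "---- NAME" line starts a section such as QUERY, RESULTS, or TYPES. As a hedged illustration of that structure only (Impala's test framework has its own parser, and this is not it), a few lines of Python can split the example above into its sections. The function name parse_test_case is hypothetical.

```python
from typing import Dict, List, Optional

def parse_test_case(text: str) -> Dict[str, List[str]]:
    """Hypothetical, simplified reader: map each section name
    (QUERY, RESULTS, TYPES, ...) of one test case to its lines."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("===="):        # test-case delimiter
            current = None
        elif line.startswith("---- "):     # section header
            current = line[len("---- "):].strip()
            sections[current] = []
        elif current is not None:          # body line of current section
            sections[current].append(line)
    return sections

EXAMPLE = """====
---- QUERY
# Regression test for IMPALA-938
select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
interval smallint_col days)
from functional.alltypes where smallint_col = 1 limit 1
---- RESULTS
1,1,1970-01-02 00:00:00
---- TYPES
smallint, int, timestamp
===="""

case = parse_test_case(EXAMPLE)
print(case["RESULTS"])  # ['1,1,1970-01-02 00:00:00']
print(case["TYPES"])    # ['smallint, int, timestamp']
```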

First, let's run all of the exprs tests to check that they pass. You can
see them called in tests/query_test/test_exprs.py. The Python scripts
in tests/ run these .test files by calling ImpalaTestSuite's
run_test_case() method with an abbreviated name of the .test file. In
test_exprs.py, this looks like:

self.run_test_case('QueryTest/exprs', vector)
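To make the "abbreviated name" concrete: 'QueryTest/exprs' corresponds to the exprs.test path mentioned earlier, minus the workload directory prefix and the .test suffix. The helper below is a hypothetical sketch of that mapping, assuming the functional-query workload and a path relative to the top of the Impala checkout; it is not ImpalaTestSuite's actual lookup code.

```python
import posixpath

def resolve_test_file(short_name, workload="functional-query"):
    """Hypothetical helper: map an abbreviated test name such as
    'QueryTest/exprs' to a path relative to the Impala checkout.
    The default workload is an assumption for this example."""
    return posixpath.join("testdata", "workloads", workload, "queries",
                          short_name + ".test")

print(resolve_test_file("QueryTest/exprs"))
# testdata/workloads/functional-query/queries/QueryTest/exprs.test
```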

That call is in the method TestExprs.test_exprs(); you can invoke it with:

./bin/impala-py.test tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity

This should take about 40 seconds and should pass, indicated by a
return value of 0 and a green line printed to the terminal reading:

...====== 1 passed in 39.85 seconds ======...
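If you want to script the pass/fail check rather than watch for the green line, the exit status is what to test. Here is a minimal sketch using Python's standard subprocess module. The helper name passed is made up, and the sketch assumes you run it from the root of an Impala checkout with the development environment already set up.

```python
import subprocess
import sys

def passed(cmd):
    """Run cmd (a list of arguments) and return True if it exits with 0."""
    return subprocess.run(cmd).returncode == 0

# Self-check with a trivial command rather than the real test suite.
print(passed([sys.executable, "-c", "pass"]))  # True
```

From the Impala checkout, the call for this test would be passed(["./bin/impala-py.test", "tests/query_test/test_exprs.py::TestExprs::test_exprs", "--sanity"]).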

Now add a test case, following the example from the ticket and the
format in exprs.test. Run the test again; it should fail.

Fix the bug and run the test again. Once the test is passing, follow
the instructions on the wiki to send your patch for code review:
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
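The post leaves the fix itself to you. As background only, the toy comparison below shows the general failure mode the new test is meant to catch: a generator can never return more distinct values than it has reachable states. TinyLcg is a hypothetical 16-bit linear congruential generator chosen for illustration. It is not Impala's MathFunctions::Rand. Python's random.Random, a Mersenne Twister, stands in for a higher-quality generator.

```python
import random

class TinyLcg:
    """Toy generator (hypothetical, for illustration only):
    x[k+1] = (25173 * x[k] + 13849) mod 2**16, scaled into [0, 1).
    These constants give the full period of 2**16 states."""

    def __init__(self, seed):
        self.state = seed % 2 ** 16

    def next_double(self):
        self.state = (25173 * self.state + 13849) % 2 ** 16
        return self.state / 2 ** 16


def distinct_count(draw, n):
    """Call draw() n times and count the distinct values returned."""
    return len({draw() for _ in range(n)})


N = 200_000
seed = 867 - 5309  # the ticket's seed expression, which evaluates to -4442

print("toy LCG:", distinct_count(TinyLcg(seed).next_double, N))
# toy LCG: 65536  (every reachable state once, then the sequence repeats)
print("Mersenne Twister:", distinct_count(random.Random(seed).random, N))
```

However many values you draw, the toy generator's distinct count stays pinned at its period, which is the kind of gap between count(distinct) and count(*) that the ticket's query reveals.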

Re: New Impala contributors: IMPALA-5754

Posted by Jim Apple <jb...@cloudera.com>.

I have posted a link on the ticket to
https://lists.apache.org/thread.html/6fbcfa650cbb920e2b517ae643bcd0859f1ba0368451d2949eda274d@%3Cdev.impala.apache.org%3E.
I hope to write some more of these, after which perhaps I should make
a space on the wiki to hold them all.

On Wed, Sep 6, 2017 at 10:08 AM, Todd Lipcon <to...@cloudera.com> wrote:
> Hey Jim,
>
> This is a great tutorial, thanks for posting it. One thought: would be
> great to put this somewhere on the web -- either as a blog post or wiki
> entry, so if someone googles they are more likely to find it. (sometimes
> mailing list archives are harder to bring up in google results)

Re: New Impala contributors: IMPALA-5754

Posted by Todd Lipcon <to...@cloudera.com>.

Hey Jim,

This is a great tutorial, thanks for posting it. One thought: it would be
great to put this somewhere on the web, either as a blog post or wiki
entry, so if someone googles they are more likely to find it. (Mailing
list archives are sometimes harder to bring up in Google results.)

-- 
Todd Lipcon
Software Engineer, Cloudera