Posted to dev@spark.apache.org by Imran Rashid <ir...@cloudera.com.INVALID> on 2018/08/24 16:53:26 UTC

python tests: any reason for a huge tests.py?

Hi,

Another question from looking more at Python recently.  Is there any reason
we've got a ton of tests in one humongous tests.py file, rather than
breaking it out into smaller files?

Having one huge file doesn't seem great for code organization, and it also
makes the test parallelization in run-tests.py not work as well.  On my
laptop, tests.py takes 150s, and the next longest test file takes only 20s.
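
To make the parallelization point concrete, here is a minimal sketch of a
greedy scheduler that hands each test file to the earliest-free worker,
longest file first. It is not how run-tests.py is implemented; the worker
count and every duration other than the 150s and 20s quoted above are
made-up assumptions.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time when each file goes to the earliest-free worker,
    taking the longest files first (a simple greedy heuristic)."""
    free_at = [0] * workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    for d in sorted(durations, reverse=True):
        start = heapq.heappop(free_at)     # earliest-free worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


# 150s and 20s are the figures from the message; the rest are hypothetical.
monolith = [150, 20, 15, 10, 5]
# Same total work, with the 150s file hypothetically split into six 25s files.
split = [25] * 6 + [20, 15, 10, 5]

print(makespan(monolith, workers=4))  # 150
print(makespan(split, workers=4))     # 50
```

With one 150s file, no number of workers can bring the run below 150s; once
the work is spread over smaller files, the same 200s of tests can finish in
roughly a quarter of that on four workers.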

Can we at least try to put new tests into smaller files?

thanks,
Imran

Re: python tests: any reason for a huge tests.py?

Posted by Bryan Cutler <cu...@gmail.com>.
Hi Imran,

I agree it would be good to split up the tests, but there might be a couple
of things to discuss first. Right now we have a single "tests.py" for each
subpackage. I think it makes sense to roughly have a test file for most
modules, e.g. "test_rdd.py", but it might not always be clear cut and there
could be other ways to split them up.  Also, should we put the test files
in the same directory as the source or in a subdirectory named "tests"? My
preference is for a subdirectory.  As for putting new tests into their own
files right away, it seems better to me to keep them with related tests for
now and do the separation as its own task to avoid fragmenting the test
suites. If it's done incrementally, I don't think merge conflicts will cause
a problem. Let me summarize this in SPARK-25344.
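
As a rough illustration of the "one test file per module, in a tests
subdirectory" layout, the sketch below writes two tiny test files into a
temporary "tests" directory and lets unittest discover them. Only the name
"test_rdd.py" comes from the text above; "test_context.py", the class names,
and the test bodies are hypothetical placeholders, not real PySpark tests.

```python
import os
import tempfile
import textwrap
import unittest

# Hypothetical per-module test files (placeholder bodies, no Spark involved).
FILES = {
    "test_rdd.py": """
        import unittest

        class RDDTests(unittest.TestCase):
            def test_placeholder(self):
                self.assertEqual(sum([1, 2, 3]), 6)
    """,
    "test_context.py": """
        import unittest

        class ContextTests(unittest.TestCase):
            def test_placeholder(self):
                self.assertTrue("spark".startswith("sp"))
    """,
}


def build_and_discover(root):
    """Create <root>/tests/test_*.py and return the discovered suite."""
    tests_dir = os.path.join(root, "tests")
    os.makedirs(tests_dir)
    for name, body in FILES.items():
        with open(os.path.join(tests_dir, name), "w") as f:
            f.write(textwrap.dedent(body))
    # Each file becomes its own module, so a runner can schedule them separately.
    return unittest.TestLoader().discover(tests_dir, pattern="test_*.py")


with tempfile.TemporaryDirectory() as root:
    suite = build_and_discover(root)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the files in a subdirectory means discovery can be pointed at that
directory alone, and each small file is a separate unit a parallel runner can
pick up.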

Thanks,
Bryan

On Wed, Sep 12, 2018 at 10:48 AM Imran Rashid <ir...@cloudera.com.invalid>
wrote:

> So I've had some offline discussion around this, so I'd like to clarify.
> SPARK-25344 may be some non-trivial work to do, as it's a significant
> refactoring.
>
> But can we agree on an *immediate* first step: all new python tests should
> go into their own files?  Is there some reason to not do that right away?
>
> I understand that in some cases, you'll want to add a test case that really
> is related to an existing test already in those giant files, and it makes
> sense for you to keep them close.  It's fine to decide on a case-by-case
> basis whether we should do the relevant refactoring for that bit at the
> same time or just put the new test in the same file.  But we should still
> have this *goal* in mind, so you should do it in the cases where the new
> tests are really independent.
>
> That avoids making the problem worse till we get to SPARK-25344, and
> furthermore it will allow work on SPARK-25344 to eventually proceed without
> never-ending merge conflicts with other changes that are also adding new
> tests.

Re: python tests: any reason for a huge tests.py?

Posted by Imran Rashid <ir...@cloudera.com.INVALID>.
So I've had some offline discussion around this, so I'd like to clarify.
SPARK-25344 may be some non-trivial work to do, as it's a significant
refactoring.

But can we agree on an *immediate* first step: all new python tests should
go into their own files?  Is there some reason to not do that right away?

I understand that in some cases, you'll want to add a test case that really
is related to an existing test already in those giant files, and it makes
sense for you to keep them close.  It's fine to decide on a case-by-case
basis whether we should do the relevant refactoring for that bit at the
same time or just put the new test in the same file.  But we should still
have this *goal* in mind, so you should do it in the cases where the new
tests are really independent.

That avoids making the problem worse till we get to SPARK-25344, and
furthermore it will allow work on SPARK-25344 to eventually proceed without
never-ending merge conflicts with other changes that are also adding new
tests.

On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <ir...@cloudera.com> wrote:

> I filed https://issues.apache.org/jira/browse/SPARK-25344

Re: python tests: any reason for a huge tests.py?

Posted by Imran Rashid <ir...@cloudera.com.INVALID>.
I filed https://issues.apache.org/jira/browse/SPARK-25344

On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <rx...@databricks.com> wrote:

> We should break it.

Re: python tests: any reason for a huge tests.py?

Posted by Reynold Xin <rx...@databricks.com>.
We should break it.

On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid <ir...@cloudera.com.invalid>
wrote:

> Hi,
>
> Another question from looking more at Python recently.  Is there any
> reason we've got a ton of tests in one humongous tests.py file, rather than
> breaking it out into smaller files?
>
> Having one huge file doesn't seem great for code organization, and it also
> makes the test parallelization in run-tests.py not work as well.  On my
> laptop, tests.py takes 150s, and the next longest test file takes only 20s.
>
> Can we at least try to put new tests into smaller files?
>
> thanks,
> Imran
>