Posted to dev@ignite.apache.org by Eduard Shangareev <ed...@gmail.com> on 2018/07/27 12:26:47 UTC

Splitting cache and basic TC configuration on pure in-memory and persistent one

Igniters,

I am working on the stability of our TC test runs.

Some of our execution timeouts (hangs, unexpected stops) happen because
of issues in the source code: the tests themselves, test runners,
configurations, bugs, the Linux OOM killer, and so on.

We can fix those by changing the code.

But almost all of the recent timeouts happened because many tests were
running disk-intensive operations on the same machine.

Examples:

https://ci.ignite.apache.org/viewLog.html?buildId=1543562&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery2
https://ci.ignite.apache.org/viewLog.html?buildId=1543518&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_Basic1

and so on.

To fix this problem, I propose extracting the persistence tests from
"Run Basic" and "Run Cache" into new, dedicated TC configurations.

Also, I would add a check that prevents new tests with persistence from
being added to other TC configurations in the future.

This would allow us to run almost all TC configurations on any agent, while
the configurations with persistence would have agent rules so that they do
not hit timeouts.

Thoughts?

Re: Splitting cache and basic TC configuration on pure in-memory and persistent one

Posted by Stanislav Lukyanov <st...@gmail.com>.

Hi all,

I'm reviving this thread because it seems to me that it might be better to
go back to the combined in-memory and PDS suites now.
As usual, it is a long one, so feel free to skip to the TLDR.

The decision to split was mostly driven by the desire for shorter suite
run times.
It seems that the decision (and related change
https://issues.apache.org/jira/browse/IGNITE-9100) was quite reasonable at
the time.
But as Vladimir said, the time required by the tests with persistence was
affected by the HDDs, and we do have SSDs now.

Having to split the tests is a burden on all developers. Persistence is
core functionality, and core tests would mostly want to check it.
Forcing developers to clone a test just to manage the execution time of a
suite seems like overkill.

I see a potential benefit in having a test suite that doesn't require a
fast disk, but for now it is just that: potential, while the additional
work to split the tests is quite real.

Finally, it is hard to make sense of how exactly to split the tests,
which suites to put them in, etc.
I believe it makes it more difficult for new community members to join,
which is the last thing we want.

TLDR
I propose to allow adding persistence tests to any suite and to remove
the PERSISTENCE_IN_TESTS_IS_ALLOWED_PROPERTY property
and the related functionality added by
https://issues.apache.org/jira/browse/IGNITE-9100.
Instead, to manage the execution time of the suites, we can make a
habit of splitting suites that take more than an hour,
as is being done now in https://issues.apache.org/jira/browse/IGNITE-8849.

WDYT?

Thanks,
Stan

On Fri, Jul 27, 2018 at 3:46 PM Eduard Shangareev <es...@gridgain.com>
wrote:

> Vladimir,
>
> I am talking only about Run Cache and Basic.
> I don't see any objection why we couldn't do so.
> Even with extra SSD, it could be worth to split configuration to have
> control over their impact on the disk system.
>
>
> On Fri, Jul 27, 2018 at 3:28 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Ed,
> >
> > We already discussed this some time ago. AFAIK SSD disks do not have this
> > problem, so all we need is to replace HDD with SSD.
>
> --
> Best regards,
> Eduard.
>

Re: Splitting cache and basic TC configuration on pure in-memory and persistent one

Posted by Eduard Shangareev <es...@gridgain.com>.

Vladimir,

I am talking only about Run Cache and Basic.
I don't see any reason why we couldn't do so.
Even with extra SSDs, it could be worth splitting the configurations to have
control over their impact on the disk system.


On Fri, Jul 27, 2018 at 3:28 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Ed,
>
> We already discussed this some time ago. AFAIK SSD disks do not have this
> problem, so all we need is to replace HDD with SSD.



-- 
Best regards,
Eduard.

Re: Splitting cache and basic TC configuration on pure in-memory and persistent one

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Ed,

We already discussed this some time ago. AFAIK SSDs do not have this
problem, so all we need is to replace the HDDs with SSDs.
