Posted to dev@ignite.apache.org by Nikolay Izhikov <ni...@apache.org> on 2020/05/21 07:27:00 UTC

[DISCUSSION] Ignite integration testing framework.

Hello, Igniters.

I created a PoC [1] for the integration tests of Ignite.

Let me briefly explain the gap I want to cover:

1. For now, we don’t have a solution for automated testing of Ignite on a «real cluster».
By «real cluster» I mean a cluster «like in production»:
	* client and server nodes deployed on different hosts.
	* thin clients perform queries from some other hosts
	* etc.

2. We don’t have a solution for automated benchmarks of some internal Ignite processes:
	* PME
	* rebalance.
This means we don’t know whether rebalance (or PME) in 2.7.0 is faster or slower than in 2.8.0 for the same cluster.

3. We don’t have a solution for automated testing of Ignite integration in a real-world environment:
Ignite-Spark integration can be taken as an example.
I think some ML solutions should also be tested in real-world deployments.

Solution:

I propose to use the ducktape library [2] from Confluent (Apache 2.0 license).
I tested it both on a real cluster (Yandex Cloud) and in a local environment (docker), and it works just fine.

The PoC contains the following tests:

    * Simple rebalance test (see the sketch right after this list):
        Start 2 server nodes,
        Create some data with an Ignite client,
        Start one more server node,
        Wait for rebalance to finish
    * Simple Ignite-Spark integration test:
        Start 1 Spark master, start 1 Spark worker,
        Start 1 Ignite server node,
        Create some data with an Ignite client,
        Check the data in an application that queries it from Spark.
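
For illustration, here is a minimal sketch of how the rebalance test above could look on top of ducktape. Only the ducktape pieces (Test, wait_until, the test context and logger) come from the library itself; IgniteService, IgniteApplicationService and has_log_message() are placeholder names standing in for the services implemented in the PoC, so treat them as assumptions rather than the actual API:

    import time

    from ducktape.tests.test import Test
    from ducktape.utils.util import wait_until

    class AddNodeRebalanceTest(Test):
        """Start 2 server nodes, preload data, add a node, wait for rebalance."""

        def test_add_node(self):
            # Hypothetical PoC services, not ducktape built-ins.
            servers = IgniteService(self.test_context, num_nodes=2)
            servers.start()

            loader = IgniteApplicationService(self.test_context,
                                              java_class_name="DataGenerationApplication")
            loader.run()  # preload some data through a client node

            new_node = IgniteService(self.test_context, num_nodes=1)
            start = time.monotonic()
            new_node.start()

            # Poll the new node's log until the rebalance-finished marker appears.
            wait_until(lambda: new_node.has_log_message("rebalanced=true"),
                       timeout_sec=120,
                       err_msg="Rebalance did not finish in time")

            # The returned dict ends up in the ducktape session report.
            return {"Rebalanced in (sec)": time.monotonic() - start}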

All tests are fully automated.
Logs collection works just fine.
You can see an example of the test report in [4].

Pros:

* Ability to test local changes (no need to publish changes to some remote repository or similar).
* Ability to parametrize the test environment (run the same tests with different JDK, JVM params, config, etc.); see the example after this list.
* Isolation by default so system tests are as reliable as possible.
* Utilities for pulling up and tearing down services easily in clusters in different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos, Docker, cloud providers, etc.)
* Easy to write unit tests for distributed systems
* Adopted and successfully used by another distributed open-source project - Apache Kafka.
* Collect results (e.g. logs, console output)
* Report results (e.g. expected conditions met, performance results, etc.)
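
As an example of that parametrization, ducktape lets a test declare the parameter combinations it should run against, and values can also be overridden at run time via the --parameters option described in [3]. The parameter names below (version, jvm_opts) are an assumption about how the PoC would wire them into the Ignite configuration, not an existing API:

    from ducktape.mark import matrix
    from ducktape.tests.test import Test

    class RebalanceBenchmark(Test):

        # Run the same test body for every combination of Ignite version and
        # JVM options; each combination shows up as a separate test_id in the report.
        @matrix(version=["2.7.6", "2.8.1", "dev"], jvm_opts=["-Xmx1G", "-Xmx4G"])
        def test_rebalance(self, version, jvm_opts):
            pass  # start nodes of the given version with the given JVM options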

WDYT?

[1] https://github.com/nizhikov/ignite/pull/15
[2] https://github.com/confluentinc/ducktape 
[3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
[4] https://yadi.sk/d/JC8ciJZjrkdndg

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
Discussed privately with Max.
Discussion results are available in the Slack channel [1].

[1] https://the-asf.slack.com/archives/C016F4PS8KV/p1595336751234500

On Wed, Jul 15, 2020 at 3:59 PM Max Shonichev <ms...@yandex.ru> wrote:

> Anton, Nikolay,
>
> I want to share some more findings about ducktests that I've stumbled upon
> while porting them to Tiden.
>
>
> First problem was that GridGain Tiden-based tests by default use real
> production-like configuration for Ignite nodes, notably:
>
>   - persistence enabled
>   - ~120 caches in ~40 groups
>   - data set size around 1M keys per cache
>   - primitive and POJO cache values
>   - extensive use of query entities (indices)
>
> When I tried to run 4 nodes with such a configuration in docker, my
> notebook nearly burned. Nevertheless, the grid started and worked OK,
> but for one little 'but': each successive version under test was
> starting slower and slower.
>
> 2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little bit slower, and your
> fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within
> the default 120-second timeout. In order to mimic the behavior of your tests
> I had to turn off persistence and use only 1 cache too.
>
> It's a pity that you completely ignore persistence and indices in your
> ducktests, otherwise you would quickly have hit the same limitation.
>
> I hope to adapt the Tiden docker PoC to our TeamCity soon, and we'll
> try to git-bisect in order to find where this slowdown comes from.
> After that I'll file a bug in the IGNITE Jira.
>
>
>
> Another problem with your rebalance benchmark is its low accuracy due
> to the granularity of measurements.
>
> You don't actually measure rebalance time; you measure the time it takes
> to find a specific string in the logs, which is confusing.
>
> The scenario of your test is as follows:
>
> 1. start 3 server nodes
> 2. start 1 data loading client, preload data, stop the client
> 3. start 1 more server node
> 4. wait till the new server joins the topology
> 5. wait till this server node completes the exchange and writes the
> 'rebalanced=true, wasRebalanced=false' message to the log
> 6. report the time taken by step 5 as 'Rebalance time'
>
> The confusing thing here is the 'wait till' implementation - you actually
> re-scan the logs continuously, sleeping a second between scans, until the
> message appears. That means the measured rebalance time has at least
> one-second granularity, or even coarser, although it is reported with
> nanosecond precision.
>
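> For reference, that wait loop boils down to roughly the following shape (a
> simplified sketch, not the PoC code itself; the one-second sleep is exactly
> what caps the measurement accuracy):
>
>     import re
>     import time
>
>     def wait_for_log_message(log_path, pattern, timeout_sec=120, poll_sec=1.0):
>         # Re-read the log once per second until the pattern shows up.
>         deadline = time.monotonic() + timeout_sec
>         while time.monotonic() < deadline:
>             with open(log_path) as f:
>                 if re.search(pattern, f.read()):
>                     return
>             time.sleep(poll_sec)  # anything faster than 1 second is invisible
>         raise TimeoutError("'%s' not found within %s s" % (pattern, timeout_sec))
>
>     start = time.monotonic()
>     wait_for_log_message("ignite.log", r"rebalanced=true, wasRebalanced=false")
>     print("Rebalanced in (sec):", time.monotonic() - start)  # 1 s granularity at best
>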
> But for such a lightweight configuration (a single in-memory cache) and such
> a small data set (1M keys only), rebalancing is very fast and usually
> completes in under 1 second, or just slightly over.
>
> Before waiting for the rebalance message you first wait for the topology
> message, and that wait also takes time to execute.
>
> So, by the time the Python part of the test performs the first scan of the
> logs, rebalancing is in most cases already done, and the time you report as
> '0.0760810375213623' is actually the time it takes to execute the
> log-scanning code.
>
> However, if rebalancing finishes just a little bit later after the topology
> update, then the first scan of the logs fails, you sleep for a whole second,
> rescan the logs, and there you get your message and report it as
> '1.02205491065979'.
>
> Under different conditions, a dockerized application may run a little
> slower or a little faster, depending on overall system load, free
> memory, etc. I've tried to increase the load on my laptop by running a
> browser or a maven build, and the time to scan the logs fluctuates from
> 0.02 to 0.09 or even 1.02 seconds. Note that in a CI environment, high
> system load from tenants is quite an ordinary situation.
>
> Suppose we adopted the rebalance improvements and all versions after 2.9.0
> performed within 1 second, just as 2.9.0 itself does. Then your benchmark
> might report a false negative (e.g. 0.02 for master and 0.03 for the
> PR), while on the next re-run it would pass (e.g. 0.07 for master
> and 0.03 for the PR). That's not quite the 'stable and non-flaky' test
> the Ignite community wants.
>
> What suggestions do you have to improve benchmark measurement accuracy?
>
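> (One direction, sketched here purely for illustration and under the
> assumption that the standard Ignite log4j timestamp layout is used: derive
> the interval from the timestamps the nodes themselves write to the log,
> rather than from the wall clock of the polling loop.)
>
>     from datetime import datetime
>
>     # Assumed log timestamp layout, e.g. "2020-07-06 12:00:01,384 ..." -
>     # adjust if the actual log4j pattern differs.
>     LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
>
>     def event_time(log_line):
>         """Timestamp written by the node itself, not the scanner's wall clock."""
>         return datetime.strptime(log_line[:23], LOG_TS_FORMAT)
>
>     def rebalance_duration_sec(joined_line, rebalanced_line):
>         # joined_line / rebalanced_line are the grepped log lines for the
>         # topology update and the 'rebalanced=true, wasRebalanced=false' marker.
>         return (event_time(rebalanced_line) - event_time(joined_line)).total_seconds()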
>
> A third question is about the PME free switch benchmark. Under some
> conditions, LongTxStreamerApplication actually hangs up PME. It needs to
> be investigated further, but either this was due to persistence being enabled
> or due to the missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false.
>
> Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX option?
> Also, have you performed a test of the PME free switch with
> persistence-enabled caches?
>
>
> On 09.07.2020 10:11, Max Shonichev wrote:
> > Anton,
> >
> > well, strange thing, but clean up and rerun helped.
> >
> >
> > Ubuntu 18.04
> >
> >
> ====================================================================================================
>
> >
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id:       2020-07-06--003
> > run time:         4 minutes 44.835 seconds
> > tests run:        5
> > passed:           5
> > failed:           0
> > ignored:          0
> >
> ====================================================================================================
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
>
> >
> > status:     PASS
> > run time:   41.927 seconds
> > {"Rebalanced in (sec)": 1.02205491065979}
> >
> ----------------------------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
>
> >
> > status:     PASS
> > run time:   51.985 seconds
> > {"Rebalanced in (sec)": 0.0760810375213623}
> >
> ----------------------------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
>
> >
> > status:     PASS
> > run time:   1 minute 4.283 seconds
> > {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst
> > latency (ms)": "31035"}
> >
> ----------------------------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
>
> >
> > status:     PASS
> > run time:   1 minute 13.089 seconds
> > {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst
> > latency (ms)": "139"}
> >
> ----------------------------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
>
> >
> > status:     PASS
> > run time:   53.332 seconds
> >
> ----------------------------------------------------------------------------------------------------
>
> >
> >
> >
> > MacBook
> >
> ================================================================================
>
> >
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id:       2020-07-06--001
> > run time:         6 minutes 58.612 seconds
> > tests run:        5
> > passed:           5
> > failed:           0
> > ignored:          0
> >
> ================================================================================
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
>
> >
> > status:     PASS
> > run time:   48.724 seconds
> > {"Rebalanced in (sec)": 3.2574470043182373}
> >
> --------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
>
> >
> > status:     PASS
> > run time:   1 minute 23.210 seconds
> > {"Rebalanced in (sec)": 2.165921211242676}
> >
> --------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
>
> >
> > status:     PASS
> > run time:   1 minute 12.659 seconds
> > {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency
> > (ms)": "31063"}
> >
> --------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
>
> >
> > status:     PASS
> > run time:   1 minute 57.257 seconds
> > {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst
> > latency (ms)": "1010"}
> >
> --------------------------------------------------------------------------------
>
> >
> > test_id:
> >
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
>
> >
> > status:     PASS
> > run time:   1 minute 36.317 seconds
> >
> > =============
> >
> > while the relative proportions remain the same for different Ignite
> > versions, the absolute numbers for mac/linux differ by more than a factor of two.
> >
> > I'm finalizing the code with the 'local Tiden' appliance for your tests. The PR
> > will be ready soon.
> >
> > Have you had a chance to deploy ducktests on bare metal?
> >
> >
> >
> > On 06.07.2020 14:27, Anton Vinogradov wrote:
> >> Max,
> >>
> >> Thanks for the check!
> >>
> >>> Is it OK for those tests to fail?
> >> No.
> >> I see really strange things in the logs.
> >> Looks like a concurrent ducktests run started unexpected
> >> services,
> >> and this broke the tests.
> >> Could you please clean up the docker environment (use the clean-up script [1]),
> >> compile the sources (use the script [2]), and rerun the tests.
> >>
> >> [1]
> >>
> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> >>
> >> [2]
> >>
> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
> >>
> >>
> >> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org>
> >> wrote:
> >>
> >>> Hello, Maxim.
> >>>
> >>> Thanks for writing down the minutes.
> >>>
> >>> There is no such thing as «Nikolay team» on the dev-list.
> >>> I propose to focus on product requirements and what we want to gain
> from
> >>> the framework instead of taking into account the needs of some team.
> >>>
> >>> Can you, please, write down your version of requirements so we can
> >>> reach a
> >>> consensus on that and therefore move to the discussion of the
> >>> implementation?
> >>>
> >>>> 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru>
> написал(а):
> >>>>
> >>>> Yes, Denis,
> >>>>
> >>>> common ground seems to be as follows:
> >>>> Anton Vinogradov and Nikolay Izhikov would try to prepare and run the PoC
> >>> over physical hosts and share benchmark results. In the meantime, while I
> >>> strongly believe that the dockerized approach to benchmarking is a road to
> >>> misleading results and false positives, I'll prepare a PoC of Tiden in a
> >>> dockerized environment to support the 'fast development prototyping' use case
> >>> Nikolay's team insists on. It should be a matter of a few days.
> >>>>
> >>>> As a side note, I've run Anton PoC locally and would like to have some
> >>> comments about results:
> >>>>
> >>>> Test system: Ubuntu 18.04, docker 19.03.6
> >>>> Test commands:
> >>>>
> >>>>
> >>>> git clone -b ignite-ducktape git@github.com:
> anton-vinogradov/ignite.git
> >>>> cd ignite
> >>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true
> >>> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> >>>> cd modules/ducktests/tests/docker
> >>>> ./run_tests.sh
> >>>>
> >>>> Test results:
> >>>>
> >>>
> ====================================================================================================
>
> >>>
> >>>> SESSION REPORT (ALL TESTS)
> >>>> ducktape version: 0.7.7
> >>>> session_id:       2020-07-05--004
> >>>> run time:         7 minutes 36.360 seconds
> >>>> tests run:        5
> >>>> passed:           3
> >>>> failed:           2
> >>>> ignored:          0
> >>>>
> >>>
> ====================================================================================================
>
> >>>
> >>>> test_id:
> >>>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
>
> >>>
> >>>> status:     FAIL
> >>>> run time:   3 minutes 12.232 seconds
> >>>>
> >>>
> ----------------------------------------------------------------------------------------------------
>
> >>>
> >>>> test_id:
> >>>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
>
> >>>
> >>>> status:     FAIL
> >>>> run time:   1 minute 33.076 seconds
> >>>>
> >>>>
> >>>> Is it OK for those tests to fail? Attached is full test report
> >>>>
> >>>>
> >>>> On 02.07.2020 17:46, Denis Magda wrote:
> >>>>> Folks,
> >>>>> Please share the summary of that Slack conversation here for records
> >>> once
> >>>>> you find common ground.
> >>>>> -
> >>>>> Denis
> >>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
> >>> wrote:
> >>>>>> Igniters.
> >>>>>>
> >>>>>> All who are interested in integration testing framework discussion
> >>>>>> are
> >>>>>> welcome into slack channel -
> >>>>>>
> >>>
> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> >>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org>
> >>>>>>> написал(а):
> >>>>>>>
> >>>>>>> Max,
> >>>>>>> Thanks for joining us.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts.
> >>>>>>> No. It is important to distinguish development, deploy, and
> >>>>>> orchestration.
> >>>>>>> All-in-one solutions have extremely limited usability.
> >>>>>>> As to Ducktests:
> >>>>>>> Docker is responsible for deployments during development.
> >>>>>>> CI/CD is responsible for deployments during release and nightly
> >>> checks.
> >>>>>> It's up to the team to choose AWS, VM, BareMetal, and even the OS.
> >>>>>>> Ducktape is responsible for orchestration.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>> fashion,
> >>>>>>>> while ducktape internally does all actions sequentially.
> >>>>>>> No. Ducktape may start any service in parallel. See Pme-free
> >>>>>>> benchmark
> >>>>>> [1] for details.
> >>>>>>>
> >>>>>>>> if we used ducktape solution we would have to instead prepare some
> >>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >>> with
> >>>>>>>> Ansible or Chef.
> >>>>>>> Sure, because the way of deployment depends on the infrastructure.
> >>>>>>> How can we be sure that the OS we use and the restrictions we have
> >>>>>>> will be
> >>>>>> compatible with Tiden?
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>> dependencies
> >>>>>>>> into one uber-image ...
> >>>>>>> and
> >>>>>>>> I guess we all know about docker hyped ability to run over
> >>> distributed
> >>>>>>>> virtual networks.
> >>>>>>> It is very important not to confuse the test's development (docker
> >>> image
> >>>>>> you're talking about) and real deployment.
> >>>>>>>
> >>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> >>>>>>> All actions can be performed in parallel.
> >>>>>>> See how Ducktests [2] starts cluster in parallel for example.
> >>>>>>>
> >>>>>>> [1]
> >>>>>>
> >>>
> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> >>>
> >>>>>>> [2]
> >>>>>>
> >>>
> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> >>>
> >>>>>>>
> >>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <
> nizhikov@apache.org>
> >>>>>> wrote:
> >>>>>>> Hello, Maxim.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>> dependencies being deployed by external scripts
> >>>>>>>
> >>>>>>> Why do you think that maintaining deploy scripts coupled with the
> >>>>>> testing framework is an advantage?
> >>>>>>> I thought we want to see and maintain deployment scripts separate
> >>>>>>> from
> >>>>>> the testing framework.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>>
> >>>>>>> Can you, please, clarify, what actions do you have in mind?
> >>>>>>> And why do we want to execute them concurrently?
> >>>>>>> Ignite node start and client application execution can be done
> >>> concurrently
> >>>>>> with the ducktape approach.
> >>>>>>>
> >>>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >>>>>> with
> >>>>>> Ansible or Chef
> >>>>>>>
> >>>>>>> We shouldn’t take some user approach as an argument in this
> >>> discussion.
> >>>>>> Let’s discuss a general approach for all users of Ignite. Anyway, what
> >>>>>> is wrong with the external deployment script approach?
> >>>>>>>
> >>>>>>> We, as a community, should provide several ways to run integration
> >>> tests
> >>>>>> out-of-the-box AND the ability to customize deployment regarding the
> >>> user
> >>>>>> landscape.
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>> dependencies into one uber-image and that looks like simple and
> >>>>>> elegant
> >>>>>> solution however, that effectively limits you to single-host
> testing.
> >>>>>>>
> >>>>>>> Docker image should be used only by the Ignite developers to test
> >>>>>> something locally.
> >>>>>>> It’s not intended for some real-world testing.
> >>>>>>>
> >>>>>>> The main issue with Tiden that I see is that it has been tested and
> >>>>>>> maintained as a closed-source solution.
> >>>>>>> This can lead to the hard to solve problems when we start using and
> >>>>>> maintaining it as an open-source solution.
> >>>>>>> Like, how many developers have used Tiden? And how many of those
> >>>>>>> developers were not authors of Tiden itself?
> >>>>>>>
> >>>>>>>
> >>>>>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
> >>>>>> написал(а):
> >>>>>>>>
> >>>>>>>> Anton, Nikolay,
> >>>>>>>>
> >>>>>>>> Let's agree on what we are arguing about: whether it is about
> "like
> >>> or
> >>>>>> don't like" or about technical properties of suggested solutions.
> >>>>>>>>
> >>>>>>>> If it is about likes and dislikes, then the whole discussion is
> >>>>>> meaningless. However, I hope together we can analyse pros and cons
> >>>>>> carefully.
> >>>>>>>>
> >>>>>>>> As far as I can understand now, two main differences between
> >>>>>>>> ducktape
> >>>>>> and tiden is that:
> >>>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>> dependencies being deployed by external scripts.
> >>>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>>>
> >>>>>>>> As for me, these are very important properties for distributed
> >>> testing
> >>>>>> framework.
> >>>>>>>>
> >>>>>>>> First property let us easily reuse tiden in existing
> >>>>>>>> infrastructures,
> >>>>>> for example, during Zookeeper IEP testing at Sberbank site we used
> >>>>>> the
> >>> same
> >>>>>> tiden scripts that we use in our lab, the only change was putting a
> >>> list of
> >>>>>> hosts into config.
> >>>>>>>>
> >>>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >>>>>> with
> >>>>>> Ansible or Chef.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>> dependencies into one uber-image and that looks like simple and
> >>>>>> elegant
> >>>>>> solution,
> >>>>>>>> however, that effectively limits you to single-host testing.
> >>>>>>>>
> >>>>>>>> I guess we all know about docker hyped ability to run over
> >>> distributed
> >>>>>> virtual networks. We used to go that way, but quickly found that
> >>>>>> it is
> >>> more
> >>>>>> of the hype than real work. In real environments, there are problems
> >>> with
> >>>>>> routing, DNS, multicast and broadcast traffic, and many others, that
> >>> turn
> >>>>>> docker-based distributed solution into a fragile hard-to-maintain
> >>> monster.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform a run of your PoC over
> at
> >>>>>> least two physical hosts and share results with us.
> >>>>>>>>
> >>>>>>>> If you consider that one physical docker host is enough, please,
> >>> don't
> >>>>>> overlook that we want to run real scale scenarios, with 50-100 cache
> >>>>>> groups, persistence enabled and millions of keys loaded.
> >>>>>>>>
> >>>>>>>> Practical limit for such configurations is 4-6 nodes per single
> >>>>>> physical host. Otherwise, tests become flaky due to resource
> >>> starvation.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform at least a 10 of runs of
> >>>>>> your PoC with other tests running at TC (we're targeting TeamCity,
> >>> right?)
> >>>>>> and share results so we could check if the numbers are reproducible.
> >>>>>>>>
> >>>>>>>> I stress this once more: functional integration tests are OK to
> run
> >>> in
> >>>>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Second property let us write tests that require real-parallel
> >>>>>>>> actions
> >>>>>> over hosts.
> >>>>>>>>
> >>>>>>>> For example, the agreed scenario for the PME benchmark during the "PME
> >>>>>>>> optimization stream" was as follows:
> >>>>>>>>
> >>>>>>>>   - 10 server nodes, preloaded with 1M of keys
> >>>>>>>>   - 4 client nodes perform transactional load  (client nodes
> >>> physically
> >>>>>> separated from server nodes)
> >>>>>>>>   - during load:
> >>>>>>>>   -- 5 server nodes stopped in parallel (see the sketch below)
> >>>>>>>>   -- after 1 minute, all 5 nodes are started in parallel
> >>>>>>>>   - load stopped, logs are analysed for exchange times.
> >>>>>>>>
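> >>>>>>>> (For illustration only - a framework-agnostic sketch of what 'stopped in
> >>>>>>>> parallel' means for the 5 server nodes above, assuming each node handle
> >>>>>>>> exposes stop()/start(); this is neither the Tiden nor the ducktape API:)
> >>>>>>>>
> >>>>>>>>     from concurrent.futures import ThreadPoolExecutor
> >>>>>>>>
> >>>>>>>>     def in_parallel(action, nodes):
> >>>>>>>>         # Fire the same action on all nodes at once and wait for all of
> >>>>>>>>         # them, so the resulting exchanges overlap and can be merged.
> >>>>>>>>         with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
> >>>>>>>>             list(pool.map(action, nodes))
> >>>>>>>>
> >>>>>>>>     # victim_nodes: the 5 server node handles chosen for restart
> >>>>>>>>     in_parallel(lambda n: n.stop(), victim_nodes)
> >>>>>>>>     # ... keep the transactional load running for 1 minute ...
> >>>>>>>>     in_parallel(lambda n: n.start(), victim_nodes)
> >>>>>>>>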
> >>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape
> does,
> >>>>>> then partition map exchange merge would not happen and we could not
> >>> have
> >>>>>> measured PME optimizations for that case.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> These are the limitations of ducktape that we believe are a more important
> >>>>>>>> argument "against" than the arguments you provide "for".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> >>>>>>>>> Folks,
> >>>>>>>>> First, I've created PR [1] with ducktests improvements
> >>>>>>>>> PR contains the following changes
> >>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >>>>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
> >>> 2.8)
> >>>>>>>>> - Global refactoring
> >>>>>>>>> -- benchmarks javacode simplification
> >>>>>>>>> -- services python and java classes code deduplication
> >>>>>>>>> -- fail-fast checks for java and python (eg. application should
> >>>>>> explicitly write it finished with success)
> >>>>>>>>> -- simple results extraction from tests and benchmarks
> >>>>>>>>> -- javacode now configurable from tests/benchmarks
> >>>>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
> >>>>>> operation and log results)
> >>>>>>>>> -- docker volume now marked as delegated to increase execution
> >>>>>>>>> speed
> >>>>>> for mac & win users
> >>>>>>>>> -- Ignite cluster now start in parallel (start speed-up)
> >>>>>>>>> -- Ignite can be configured at test/benchmark
> >>>>>>>>> - full and module assembly scripts added
> >>>>>>>> Great job done! But let me remind you of one of the Apache Ignite
> >>>>>>>> principles: a week of thinking saves months of development.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
> >>>>>> integration) as a target "PoC check & real topology benchmarking
> >>>>>> tool".
> >>>>>>>>> Ducktape pros
> >>>>>>>>> - Developed for distributed system by distributed system
> >>>>>>>>> developers.
> >>>>>>>> So does Tiden
> >>>>>>>>
> >>>>>>>>> - Developed since 2014, stable.
> >>>>>>>> Tiden is also pretty stable, and development start date is not a
> >>>>>>>> good
> >>>>>> argument, for example pytest is since 2004, pytest-xdist (plugin for
> >>>>>> distributed testing) since 2010, but we don't see it as an
> >>>>>> alternative at all.
> >>>>>>>>
> >>>>>>>>> - Proven usability by usage at Kafka.
> >>>>>>>> Tiden is proven usable by usage at GridGain and Sberbank
> >>>>>>>> deployments.
> >>>>>>>> Core, storage, sql and tx teams use benchmark results provided by
> >>>>>> Tiden on a daily basis.
> >>>>>>>>
> >>>>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great
> >>>>>>>>> example
> >>>>>> pack.
> >>>>>>>> We'll donate some of our suites to Ignite as I've mentioned in
> >>>>>> previous letter.
> >>>>>>>>
> >>>>>>>>> - Built-in Docker support for rapid development and checks.
> >>>>>>>> False, there's no specific 'docker support' in ducktape itself; you
> >>>>>> just wrap it in docker yourself, because ducktape lacks
> >>>>>> deployment abilities.
> >>>>>>>>
> >>>>>>>>> - Great for CI automation.
> >>>>>>>> False, there are no specific CI-enabled features in ducktape. Tiden, on
> >>>>>> the other hand, provides a generic xUnit reporting format, which is supported
> >>>>>> by both TeamCity and Jenkins. Also, instead of using private keys, Tiden
> >>>>>> can use an SSH agent, which is also great for CI, because both
> >>>>>>>> TeamCity and Jenkins store keys in secret storage available only for
> >>>>>> the ssh-agent and only for the time of the test.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> As an additional motivation, at least 3 teams
> >>>>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
> >>> Zabbix
> >>>>>> speed-up))
> >>>>>>>>> - Ignite SE Plugins team (to check that plugin's features do not
> >>>>>> slow down or break AI features)
> >>>>>>>>> - Ignite SE QA team (to append already developed
> >>>>>>>>> smoke/load/failover
> >>>>>> tests to AI codebase)
> >>>>>>>>
> >>>>>>>> Please, before recommending your tests to other teams, provide
> >>>>>>>> proofs
> >>>>>>>> that your tests are reproducible in real environment.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> now wait for the ducktests merge to start checking the cases they are
> >>>>>>>>> working on in the AI way.
> >>>>>>>>> Thoughts?
> >>>>>>>> Let us review both solutions together: we'll try to run your tests in
> >>>>>> our lab, and you'll try to at least check out Tiden and see if the same tests
> >>>>>> can be implemented with it?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
> >>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
> >>> nizhikov@apache.org
> >>>>>> <ma...@apache.org>> wrote:
> >>>>>>>>>     Hello, Maxim.
> >>>>>>>>>     Thank you for so detailed explanation.
> >>>>>>>>>     Can we put the content of this discussion somewhere on the
> >>>>>>>>> wiki?
> >>>>>>>>>     So It doesn’t get lost.
> >>>>>>>>>     I divide the answer in several parts. From the requirements
> to
> >>> the
> >>>>>>>>>     implementation.
> >>>>>>>>>     So, if we agreed on the requirements we can proceed with the
> >>>>>>>>>     discussion of the implementation.
> >>>>>>>>>     1. Requirements:
> >>>>>>>>>     The main goal I want to achieve is *reproducibility* of the
> >>> tests.
> >>>>>>>>>     I’m sick and tired with the zillions of flaky, rarely
> >>>>>>>>> failed, and
> >>>>>>>>>     almost never failed tests in Ignite codebase.
> >>>>>>>>>     We should start with the simplest scenarios that will be as
> >>>>>> reliable
> >>>>>>>>>     as steel :)
> >>>>>>>>>     I want to know for sure:
> >>>>>>>>>        - Is this PR makes rebalance quicker or not?
> >>>>>>>>>        - Is this PR makes PME quicker or not?
> >>>>>>>>>     So, your description of the complex test scenario looks as
> >>>>>>>>> a next
> >>>>>>>>>     step to me.
> >>>>>>>>>     Anyway, It’s cool we already have one.
> >>>>>>>>>     The second goal is to have a strict test lifecycle as we
> >>>>>>>>> have in
> >>>>>>>>>     JUnit and similar frameworks.
> >>>>>>>>>      > It covers production-like deployment and running a
> >>>>>>>>> scenarios
> >>>>>> over
> >>>>>>>>>     a single database instance.
> >>>>>>>>>     Do you mean «single cluster» or «single host»?
> >>>>>>>>>     2. Existing tests:
> >>>>>>>>>      > A Combinator suite allows to run set of operations
> >>> concurrently
> >>>>>>>>>     over given database instance.
> >>>>>>>>>      > A Consumption suite allows to run a set production-like
> >>> actions
> >>>>>>>>>     over given set of Ignite/GridGain versions and compare test
> >>> metrics
> >>>>>>>>>     across versions
> >>>>>>>>>      > A Yardstick suite
> >>>>>>>>>      > A Stress suite that simulates hardware environment
> >>>>>>>>> degradation
> >>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
> >>>>>> functional
> >>>>>>>>>     regression testing
> >>>>>>>>>      > Regression
> >>>>>>>>>     Great news that we already have so many choices for testing!
> >>>>>>>>>     Mature test base is a big +1 for Tiden.
> >>>>>>>>>     3. Comparison:
> >>>>>>>>>      > Criteria: Test configuration
> >>>>>>>>>      > Ducktape: single JSON string for all tests
> >>>>>>>>>      > Tiden: any number of YaML config files, command line
> option
> >>> for
> >>>>>>>>>     fine-grained test configuration, ability to select/modify
> >>>>>>>>> tests
> >>>>>>>>>     behavior based on Ignite version.
> >>>>>>>>>     1. Many YAML files can be hard to maintain.
> >>>>>>>>>     2. In ducktape, you can set parameters via «—parameters»
> >>>>>>>>> option.
> >>>>>>>>>     Please, take a look at the doc [1]
> >>>>>>>>>      > Criteria: Cluster control
> >>>>>>>>>      > Tiden: additionally can address cluster as a whole and
> >>>>>>>>> execute
> >>>>>>>>>     remote commands in parallel.
> >>>>>>>>>     It seems we implement this ability in the PoC, already.
> >>>>>>>>>      > Criteria: Test assertions
> >>>>>>>>>      > Tiden: simple asserts, also few customized assertion
> >>>>>>>>> helpers.
> >>>>>>>>>      > Ducktape: simple asserts.
> >>>>>>>>>     Can you, please, be more specific.
> >>>>>>>>>     What helpers do you have in mind?
> >>>>>>>>>     Ducktape has an asserts that waits for logfile messages or
> >>>>>>>>> some
> >>>>>>>>>     process finish.
> >>>>>>>>>      > Criteria: Test reporting
> >>>>>>>>>      > Ducktape: limited to its own text/HTML format
> >>>>>>>>>     Ducktape have
> >>>>>>>>>     1. Text reporter
> >>>>>>>>>     2. Customizable HTML reporter
> >>>>>>>>>     3. JSON reporter.
> >>>>>>>>>     We can show JSON with the any template or tool.
> >>>>>>>>>      > Criteria: Provisioning and deployment
> >>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for
> >>>>>>>>> test
> >>>>>>>>>     needs. However, that means, that test can’t be scaled without
> >>> test
> >>>>>>>>>     code changes. Does not do any deploy, relies on external
> >>>>>>>>> means,
> >>>>>> e.g.
> >>>>>>>>>     pre-packaged in docker image, as in PoC.
> >>>>>>>>>     This is not true.
> >>>>>>>>>     1. We can set explicit test parameters(node number) via
> >>> parameters.
> >>>>>>>>>     We can increase client count of cluster size without test
> code
> >>>>>> changes.
> >>>>>>>>>     2. We have many choices for the test environment. These
> >>>>>>>>> choices
> >>> are
> >>>>>>>>>     tested and used in other projects:
> >>>>>>>>>              * docker
> >>>>>>>>>              * vagrant
> >>>>>>>>>              * private cloud(ssh access)
> >>>>>>>>>              * ec2
> >>>>>>>>>     Please, take a look at Kafka documentation [2]
> >>>>>>>>>      > I can continue more on this, but it should be enough for
> >>>>>>>>> now:
> >>>>>>>>>     We need to go deeper! :)
> >>>>>>>>>     [1]
> >>>>>>>>>
> >>>>>>
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>>>>>>>>     [2]
> >>>>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>>>>>>>>      > 9 июня 2020 г., в 17:25, Max A. Shonichev
> >>>>>>>>> <mshonich@yandex.ru
> >>>>>>>>>     <ma...@yandex.ru>> написал(а):
> >>>>>>>>>      >
> >>>>>>>>>      > Greetings, Nikolay,
> >>>>>>>>>      >
> >>>>>>>>>      > First of all, thank you for your great effort preparing a
> >>>>>>>>> PoC of
> >>>>>>>>>     integration testing for the Ignite community.
> >>>>>>>>>      >
> >>>>>>>>>      > It’s a shame Ignite did not have at least some such
> >>>>>>>>> tests yet,
> >>>>>>>>>     however, GridGain, as a major contributor to Apache Ignite
> >>>>>>>>> had a
> >>>>>>>>>     profound collection of in-house tools to perform
> >>>>>>>>> integration and
> >>>>>>>>>     performance testing for years already and while we slowly
> >>> consider
> >>>>>>>>>     sharing our expertise with the community, your initiative
> >>>>>>>>> makes
> >>> us
> >>>>>>>>>     drive that process a bit faster, thanks a lot!
> >>>>>>>>>      >
> >>>>>>>>>      > I reviewed your PoC and want to share a little about
> >>>>>>>>> what we
> >>> do
> >>>>>>>>>     on our part, why and how, hope it would help community take
> >>> proper
> >>>>>>>>>     course.
> >>>>>>>>>      >
> >>>>>>>>>      > First I’ll do a brief overview of what decisions we made
> >>>>>>>>> and
> >>>>>> what
> >>>>>>>>>     we do have in our private code base, next I’ll describe
> >>>>>>>>> what we
> >>>>>> have
> >>>>>>>>>     already donated to the public and what we plan public next,
> >>>>>>>>> then
> >>>>>>>>>     I’ll compare both approaches highlighting deficiencies in
> >>>>>>>>> order
> >>> to
> >>>>>>>>>     spur public discussion on the matter.
> >>>>>>>>>      >
> >>>>>>>>>      > It might seem strange to use Python to run Bash to run
> Java
> >>>>>>>>>     applications because that introduces IT industry best of
> >>>>>>>>> breed’ –
> >>>>>>>>>     the Python dependency hell – to the Java application code
> >>>>>>>>> base.
> >>> The
> >>>>>>>>>     only strangest decision one can made is to use Maven to run
> >>> Docker
> >>>>>>>>>     to run Bash to run Python to run Bash to run Java, but
> >>>>>>>>> desperate
> >>>>>>>>>     times call for desperate measures I guess.
> >>>>>>>>>      >
> >>>>>>>>>      > There are Java-based solutions for integration testing
> >>>>>>>>> exists,
> >>>>>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they
> >>>>>>>>> might go
> >>>>>> well
> >>>>>>>>>     for Ignite community CI pipelines by themselves. But we also
> >>>>>> wanted
> >>>>>>>>>     to run performance tests and benchmarks, like the dreaded PME
> >>>>>>>>>     benchmark, and this is solved by totally different set of
> >>>>>>>>> tools
> >>> in
> >>>>>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> >>>>>>>>>      >
> >>>>>>>>>      > Speaking specifically about benchmarking, Apache Ignite
> >>>>>> community
> >>>>>>>>>     already has Yardstick [6], and there’s nothing wrong with
> >>>>>>>>> writing
> >>>>>>>>>     PME benchmark using Yardstick, but we also wanted to be
> >>>>>>>>> able to
> >>> run
> >>>>>>>>>     scenarios like this:
> >>>>>>>>>      > - put an X load to a Ignite database;
> >>>>>>>>>      > - perform an Y set of operations to check how Ignite copes
> >>> with
> >>>>>>>>>     operations under load.
> >>>>>>>>>      >
> >>>>>>>>>      > And yes, we also wanted applications under test be
> deployed
> >>>>>> ‘like
> >>>>>>>>>     in a production’, e.g. distributed over a set of hosts. This
> >>> arises
> >>>>>>>>>     questions about provisioning and nodes affinity which I’ll
> >>>>>>>>> cover
> >>> in
> >>>>>>>>>     detail later.
> >>>>>>>>>      >
> >>>>>>>>>      > So we decided to put a little effort to build a simple
> >>>>>>>>> tool to
> >>>>>>>>>     cover different integration and performance scenarios, and
> >>>>>>>>> our QA
> >>>>>>>>>     lab first attempt was PoC-Tester [7], currently open source
> >>>>>>>>> for
> >>> all
> >>>>>>>>>     but for reporting web UI. It’s a quite simple to use 95%
> >>> Java-based
> >>>>>>>>>     tool targeted to be run on a pre-release QA stage.
> >>>>>>>>>      >
> >>>>>>>>>      > It covers production-like deployment and running a
> >>>>>>>>> scenarios
> >>>>>> over
> >>>>>>>>>     a single database instance. PoC-Tester scenarios consists of
> a
> >>>>>>>>>     sequence of tasks running sequentially or in parallel.
> >>>>>>>>> After all
> >>>>>>>>>     tasks complete, or at any time during test, user can run logs
> >>>>>>>>>     collection task, logs are checked against exceptions and a
> >>> summary
> >>>>>>>>>     of found issues and task ops/latency statistics is
> >>>>>>>>> generated at
> >>> the
> >>>>>>>>>     end of scenario. One of the main PoC-Tester features is its
> >>>>>>>>>     fire-and-forget approach to task managing. That is, you can
> >>> deploy
> >>>>>> a
> >>>>>>>>>     grid and left it running for weeks, periodically firing some
> >>> tasks
> >>>>>>>>>     onto it.
> >>>>>>>>>      >
> >>>>>>>>>      > During earliest stages of PoC-Tester development it
> becomes
> >>>>>> quite
> >>>>>>>>>     clear that Java application development is a tedious
> >>>>>>>>> process and
> >>>>>>>>>     architecture decisions you take during development are slow
> >>>>>>>>> and
> >>>>>> hard
> >>>>>>>>>     to change.
> >>>>>>>>>      > For example, scenarios like this
> >>>>>>>>>      > - deploy two instances of GridGain with master-slave data
> >>>>>>>>>     replication configured;
> >>>>>>>>>      > - put a load on master;
> >>>>>>>>>      > - perform checks on slave,
> >>>>>>>>>      > or like this:
> >>>>>>>>>      > - preload a 1Tb of data by using your favorite tool of
> >>>>>>>>> choice
> >>> to
> >>>>>>>>>     an Apache Ignite of version X;
> >>>>>>>>>      > - run a set of functional tests running Apache Ignite
> >>>>>>>>> version
> >>> Y
> >>>>>>>>>     over preloaded data,
> >>>>>>>>>      > do not fit well in the PoC-Tester workflow.
> >>>>>>>>>      >
> >>>>>>>>>      > So, this is why we decided to use Python as a generic
> >>> scripting
> >>>>>>>>>     language of choice.
> >>>>>>>>>      >
> >>>>>>>>>      > Pros:
> >>>>>>>>>      > - quicker prototyping and development cycles
> >>>>>>>>>      > - easier to find DevOps/QA engineer with Python skills
> than
> >>> one
> >>>>>>>>>     with Java skills
> >>>>>>>>>      > - used extensively all over the world for DevOps/CI
> >>>>>>>>> pipelines
> >>>>>> and
> >>>>>>>>>     thus has rich set of libraries for all possible integration
> >>>>>>>>> uses
> >>>>>> cases.
> >>>>>>>>>      >
> >>>>>>>>>      > Cons:
> >>>>>>>>>      > - Nightmare with dependencies. Better stick to specific
> >>>>>>>>>     language/libraries version.
> >>>>>>>>>      >
> >>>>>>>>>      > Comparing alternatives for Python-based testing
> >>>>>>>>> framework we
> >>>>>> have
> >>>>>>>>>     considered following requirements, somewhat similar to what
> >>> you’ve
> >>>>>>>>>     mentioned for Confluent [8] previously:
> >>>>>>>>>      > - should be able run locally or distributed (bare metal
> >>>>>>>>> or in
> >>>>>> the
> >>>>>>>>>     cloud)
> >>>>>>>>>      > - should have built-in deployment facilities for
> >>>>>>>>> applications
> >>>>>>>>>     under test
> >>>>>>>>>      > - should separate test configuration and test code
> >>>>>>>>>      > -- be able to easily reconfigure tests by simple
> >>>>>>>>> configuration
> >>>>>>>>>     changes
> >>>>>>>>>      > -- be able to easily scale test environment by simple
> >>>>>>>>>     configuration changes
> >>>>>>>>>      > -- be able to perform regression testing by simple
> >>>>>>>>> switching
> >>>>>>>>>     artifacts under test via configuration
> >>>>>>>>>      > -- be able to run tests with different JDK version by
> >>>>>>>>> simple
> >>>>>>>>>     configuration changes
> >>>>>>>>>      > - should have human readable reports and/or reporting
> tools
> >>>>>>>>>     integration
> >>>>>>>>>      > - should allow simple test progress monitoring, one does
> >>>>>>>>> not
> >>>>>> want
> >>>>>>>>>     to run 6-hours test to find out that application actually
> >>>>>>>>> crashed
> >>>>>>>>>     during first hour.
> >>>>>>>>>      > - should allow parallel execution of test actions
> >>>>>>>>>      > - should have clean API for test writers
> >>>>>>>>>      > -- clean API for distributed remote commands execution
> >>>>>>>>>      > -- clean API for deployed applications start / stop and
> >>>>>>>>> other
> >>>>>>>>>     operations
> >>>>>>>>>      > -- clean API for performing check on results
> >>>>>>>>>      > - should be open source or at least source code should
> >>>>>>>>> allow
> >>>>>> ease
> >>>>>>>>>     change or extension
> >>>>>>>>>      >
> >>>>>>>>>      > Back at that time we found no better alternative than to
> >>>>>>>>> write
> >>>>>>>>>     our own framework, and here goes Tiden [9] as GridGain
> >>>>>>>>> framework
> >>> of
> >>>>>>>>>     choice for functional integration and performance testing.
> >>>>>>>>>      >
> >>>>>>>>>      > Pros:
> >>>>>>>>>      > - solves all the requirements above
> >>>>>>>>>      > Cons (for Ignite):
> >>>>>>>>>      > - (currently) closed GridGain source
> >>>>>>>>>      >
> >>>>>>>>>      > On top of Tiden we’ve built a set of test suites, some of
> >>> which
> >>>>>>>>>     you might have heard already.
> >>>>>>>>>      >
> >>>>>>>>>      > A Combinator suite allows to run set of operations
> >>> concurrently
> >>>>>>>>>     over given database instance. Proven to find at least 30+
> race
> >>>>>>>>>     conditions and NPE issues.
> >>>>>>>>>      >
> >>>>>>>>>      > A Consumption suite allows to run a set production-like
> >>> actions
> >>>>>>>>>     over given set of Ignite/GridGain versions and compare test
> >>> metrics
> >>>>>>>>>     across versions, like heap/disk/CPU consumption, time to
> >>>>>>>>> perform
> >>>>>>>>>     actions, like client PME, server PME, rebalancing time, data
> >>>>>>>>>     replication time, etc.
> >>>>>>>>>      >
> >>>>>>>>>      > A Yardstick suite is a thin layer of Python glue code to
> >>>>>>>>> run
> >>>>>>>>>     Apache Ignite pre-release benchmarks set. Yardstick itself
> >>>>>>>>> has a
> >>>>>>>>>     mediocre deployment capabilities, Tiden solves this easily.
> >>>>>>>>>      >
> >>>>>>>>>      > A Stress suite that simulates hardware environment
> >>>>>>>>> degradation
> >>>>>>>>>     during testing.
> >>>>>>>>>      >
> >>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
> >>>>>> functional
> >>>>>>>>>     regression testing of GridGain Ultimate Edition features like
> >>>>>>>>>     snapshots, security, data replication, rolling upgrades, etc.
> >>>>>>>>>      >
> >>>>>>>>>      > A Regression and some IEPs testing suites, like IEP-14,
> >>> IEP-15,
> >>>>>>>>>     etc, etc, etc.
> >>>>>>>>>      >
> >>>>>>>>>      > Most of the suites above use another in-house developed
> >>>>>>>>> Java
> >>>>>> tool
> >>>>>>>>>     – PiClient – to perform actual loading and miscellaneous
> >>> operations
> >>>>>>>>>     with Ignite under test. We use py4j Python-Java gateway
> >>>>>>>>> library
> >>> to
> >>>>>>>>>     control PiClient instances from the tests.
> >>>>>>>>>      >
> >>>>>>>>>      > When we considered CI, we put TeamCity out of scope,
> >>>>>>>>> because
> >>>>>>>>>     distributed integration and performance tests tend to run for
> >>> hours
> >>>>>>>>>     and TeamCity agents are scarce and costly resource. So,
> >>>>>>>>> bundled
> >>>>>> with
> >>>>>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines
> and
> >>>>>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward
> >>>>>>>>> aggregates
> >>>>>> test
> >>>>>>>>>     run reports across versions and has built in visualization
> >>> support
> >>>>>>>>>     for Combinator suite.
> >>>>>>>>>      >
> >>>>>>>>>      > All of the above is currently closed source, but we plan
> to
> >>> make
> >>>>>>>>>     it public for community, and publishing Tiden core [9] is the
> >>> first
> >>>>>>>>>     step on that way. You can review some examples of using
> >>>>>>>>> Tiden for
> >>>>>>>>>     tests at my repository [11], for start.
> >>>>>>>>>      >
> >>>>>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Language
> >>>>>>>>>      > Tiden: Python, 3.7
> >>>>>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> >>>>>>>>>     compatible, but actually can’t work with Python 3.7 due to
> >>>>>>>>> broken
> >>>>>>>>>     Zmq dependency.
> >>>>>>>>>      > Comment: Python 3.7 has a much better support for
> >>>>>>>>> async-style
> >>>>>>>>>     code which might be crucial for distributed application
> >>>>>>>>> testing.
> >>>>>>>>>      > Score: Tiden: 1, Ducktape: 0
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Test writers API
> >>>>>>>>>      > Supported integration test framework concepts are
> basically
> >>> the
> >>>>>> same:
> >>>>>>>>>      > - a test controller (test runner)
> >>>>>>>>>      > - a cluster
> >>>>>>>>>      > - a node
> >>>>>>>>>      > - an application (a service in Ducktape terms)
> >>>>>>>>>      > - a test
> >>>>>>>>>      > Score: Tiden: 5, Ducktape: 5
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Tests selection and run
> >>>>>>>>>      > Ducktape: suite-package-class-method level selection,
> >>>>>>>>> internal
> >>>>>>>>>     scheduler allows to run tests in suite in parallel.
> >>>>>>>>>      > Tiden: also suite-package-class-method level selection,
> >>>>>>>>>     additionally allows selecting subset of tests by attribute,
> >>>>>> parallel
> >>>>>>>>>     runs not built in, but allows merging test reports after
> >>> different
> >>>>>> runs.
> >>>>>>>>>      > Score: Tiden: 2, Ducktape: 2
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Test configuration
> >>>>>>>>>      > Ducktape: single JSON string for all tests
> >>>>>>>>>      > Tiden: any number of YaML config files, command line
> option
> >>> for
> >>>>>>>>>     fine-grained test configuration, ability to select/modify
> >>>>>>>>> tests
> >>>>>>>>>     behavior based on Ignite version.
> >>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Cluster control
> >>>>>>>>>      > Ducktape: allow execute remote commands by node
> granularity
> >>>>>>>>>      > Tiden: additionally can address cluster as a whole and
> >>>>>>>>> execute
> >>>>>>>>>     remote commands in parallel.
> >>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Logs control
> >>>>>>>>>      > Both frameworks have similar builtin support for remote
> >>>>>>>>> logs
> >>>>>>>>>     collection and grepping. Tiden has built-in plugin that can
> >>>>>>>>> zip,
> >>>>>>>>>     collect arbitrary log files from arbitrary locations at
> >>>>>>>>>     test/module/suite granularity and unzip if needed, also
> >>> application
> >>>>>>>>>     API to search / wait for messages in logs. Ducktape allows
> >>>>>>>>> each
> >>>>>>>>>     service declare its log files location (seemingly does not
> >>> support
> >>>>>>>>>     logs rollback), and a single entrypoint to collect service
> >>>>>>>>> logs.
> >>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Test assertions
> >>>>>>>>>      > Tiden: simple asserts, also few customized assertion
> >>>>>>>>> helpers.
> >>>>>>>>>      > Ducktape: simple asserts.
> >>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Test reporting
> >>>>>>>>>      > Ducktape: limited to its own text/html format
> >>>>>>>>>      > Tiden: provides text report, yaml report for reporting
> >>>>>>>>> tools
> >>>>>>>>>     integration, XML xUnit report for integration with
> >>>>>> Jenkins/TeamCity.
> >>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Provisioning and deployment
> >>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for
> >>>>>>>>> test
> >>>>>>>>>     needs. However, that means, that test can’t be scaled without
> >>> test
> >>>>>>>>>     code changes. Does not do any deploy, relies on external
> >>>>>>>>> means,
> >>>>>> e.g.
> >>>>>>>>>     pre-packaged in docker image, as in PoC.
> >>>>>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for
> the
> >>>>>> test.
> >>>>>>>>>     Provisioning should be done by external means. However,
> >>>>>>>>> provides
> >>> a
> >>>>>>>>>     conventional automated deployment routines.
> >>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > Criteria: Documentation and Extensibility
> >>>>>>>>>      > Tiden: current API documentation is limited, should
> >>>>>>>>> change as
> >>> we
> >>>>>>>>>     go open source. Tiden is easily extensible via hooks and
> >>>>>>>>> plugins,
> >>>>>>>>>     see example Maven plugin and Gatling application at [11].
> >>>>>>>>>      > Ducktape: basic documentation at readthedocs.io
> >>>>>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core
> is
> >>>>>>>>>     tightly coupled and hard to change. The only possible
> >>>>>>>>> extension
> >>>>>>>>>     mechanism is fork-and-rewrite.
> >>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>>      >
> >>>>>>>>>      > I can continue more on this, but it should be enough for
> >>>>>>>>> now:
> >>>>>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
> >>>>>>>>>      >
> >>>>>>>>>      > Time for discussion!
> >>>>>>>>>      >
> >>>>>>>>>      > ---
> >>>>>>>>>      > [1] - https://www.testcontainers.org/
> >>>>>>>>>      > [2] - http://arquillian.org/guides/getting_started/
> >>>>>>>>>      > [3] - https://jmeter.apache.org/index.html
> >>>>>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>>>>      > [5] - https://gatling.io/docs/current/
> >>>>>>>>>      > [6] - https://github.com/gridgain/yardstick
> >>>>>>>>>      > [7] - https://github.com/gridgain/poc-tester
> >>>>>>>>>      > [8] -
> >>>>>>>>>
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>>
> >>>>>>>>>      > [9] - https://github.com/gridgain/tiden
> >>>>>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
> >>>>>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
> >>>>>>>>>      >
> >>>>>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>>>>>>>>      >> Hello,
> >>>>>>>>>      >>
> >>>>>>>>>      >> Branch with duck tape created -
> >>>>>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>>      >>
> >>>>>>>>>      >> Any who are willing to contribute to PoC are welcome.
> >>>>>>>>>      >>
> >>>>>>>>>      >>
> >>>>>>>>>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
> >>>>>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
> >>>>>> написал(а):
> >>>>>>>>>      >>>
> >>>>>>>>>      >>> Hello, Denis.
> >>>>>>>>>      >>>
> >>>>>>>>>      >>> There is no rush with these improvements.
> >>>>>>>>>      >>> We can wait for Maxim proposal and compare two
> >>>>>>>>> solutions :)
> >>>>>>>>>      >>>
> >>>>>>>>>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <
> dmagda@apache.org
> >>>>>>>>>     <ma...@apache.org>> написал(а):
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>> Hi Nikolay,
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>> Thanks for kicking off this conversation and sharing
> >>>>>>>>> your
> >>>>>>>>>     findings with the
> >>>>>>>>>      >>>> results. That's the right initiative. I do agree that
> >>> Ignite
> >>>>>>>>>     needs to have
> >>>>>>>>>      >>>> an integration testing framework with capabilities
> >>>>>>>>> listed
> >>> by
> >>>>>> you.
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>> As we discussed privately, I would only check if
> >>>>>>>>> instead of
> >>>>>>>>>      >>>> Confluent's Ducktape library, we can use an integration
> >>>>>>>>>     testing framework
> >>>>>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
> >>>>>> clusters.
> >>>>>>>>>     That
> >>>>>>>>>      >>>> framework has been battle-tested and might be more
> >>>>>> convenient for
> >>>>>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim
> >>>>>>>>> Shonichev
> >>>>>>>>>      >>>> <mshonichev@gridgain.com
> >>>>>>>>> <ma...@gridgain.com>>
> >>>>>> who
> >>>>>>>>>     promised to join this thread once he finishes
> >>>>>>>>>      >>>> preparing the usage examples of the framework. To my
> >>>>>>>>>     knowledge, Max has
> >>>>>>>>>      >>>> already been working on that for several days.
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>> -
> >>>>>>>>>      >>>> Denis
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> >>>>>>>>>     <nizhikov@apache.org <ma...@apache.org>>
> >>>>>>>>>      >>>> wrote:
> >>>>>>>>>      >>>>
> >>>>>>>>>      >>>>> Hello, Igniters.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> I created a PoC [1] for the integration tests of
> >>>>>>>>> Ignite.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> Let me briefly explain the gap I want to cover:
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> 1. For now, we don’t have a solution for automated
> >>>>>>>>> testing
> >>>>>> of
> >>>>>>>>>     Ignite on
> >>>>>>>>>      >>>>> «real cluster».
> >>>>>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
> >>>>>>>>>      >>>>>       * client and server nodes deployed on different
> >>> hosts.
> >>>>>>>>>      >>>>>       * thin clients perform queries from some other
> >>>>>>>>> hosts
> >>>>>>>>>      >>>>>       * etc.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks
> of
> >>> some
> >>>>>>>>>     internal
> >>>>>>>>>      >>>>> Ignite process
> >>>>>>>>>      >>>>>       * PME
> >>>>>>>>>      >>>>>       * rebalance.
> >>>>>>>>>      >>>>> This means we don’t know - Do we perform
> >>>>>>>>> rebalance(or PME)
> >>>>>> in
> >>>>>>>>>     2.7.0 faster
> >>>>>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> 3. We don’t have a solution for automated testing of
> >>> Ignite
> >>>>>>>>>     integration in
> >>>>>>>>>      >>>>> a real-world environment:
> >>>>>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
> >>>>>>>>>      >>>>> I think some ML solutions also should be tested in
> >>>>>> real-world
> >>>>>>>>>     deployments.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> Solution:
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> I propose to use duck tape library from confluent
> >>>>>>>>> (apache
> >>>>>> 2.0
> >>>>>>>>>     license)
> >>>>>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud)
> >>>>>>>>> and on
> >>>>>> the
> >>>>>>>>>     local
> >>>>>>>>>      >>>>> environment(docker) and it works just fine.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> PoC contains following services:
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>>       * Simple rebalance test:
> >>>>>>>>>      >>>>>               Start 2 server nodes,
> >>>>>>>>>      >>>>>               Create some data with Ignite client,
> >>>>>>>>>      >>>>>               Start one more server node,
> >>>>>>>>>      >>>>>               Wait for rebalance finish
> >>>>>>>>>      >>>>>       * Simple Ignite-Spark integration test:
> >>>>>>>>>      >>>>>               Start 1 Spark master, start 1 Spark
> >>>>>>>>> worker,
> >>>>>>>>>      >>>>>               Start 1 Ignite server node
> >>>>>>>>>      >>>>>               Create some data with Ignite client,
> >>>>>>>>>      >>>>>               Check data in application that queries
> it
> >>> from
> >>>>>>>>>     Spark.
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> All tests are fully automated.
> >>>>>>>>>      >>>>> Logs collection works just fine.
> >>>>>>>>>      >>>>> You can see an example of the tests report - [4].
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> Pros:
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> * Ability to test local changes(no need to public
> >>>>>>>>> changes
> >>> to
> >>>>>>>>>     some remote
> >>>>>>>>>      >>>>> repository or similar).
> >>>>>>>>>      >>>>> * Ability to parametrize test environment(run the same
> >>> tests
> >>>>>>>>>     on different
> >>>>>>>>>      >>>>> JDK, JVM params, config, etc.)
> >>>>>>>>>      >>>>> * Isolation by default so system tests are as
> >>>>>>>>> reliable as
> >>>>>>>>>     possible.
> >>>>>>>>>      >>>>> * Utilities for pulling up and tearing down services
> >>> easily
> >>>>>>>>>     in clusters in
> >>>>>>>>>      >>>>> different environments (e.g. local, custom cluster,
> >>> Vagrant,
> >>>>>>>>>     K8s, Mesos,
> >>>>>>>>>      >>>>> Docker, cloud providers, etc.)
> >>>>>>>>>      >>>>> * Easy to write unit tests for distributed systems
> >>>>>>>>>      >>>>> * Adopted and successfully used by other distributed
> >>>>>>>>> open
> >>>>>>>>>     source project -
> >>>>>>>>>      >>>>> Apache Kafka.
> >>>>>>>>>      >>>>> * Collect results (e.g. logs, console output)
> >>>>>>>>>      >>>>> * Report results (e.g. expected conditions met,
> >>> performance
> >>>>>>>>>     results, etc.)
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> WDYT?
> >>>>>>>>>      >>>>>
> >>>>>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>>>>>>      >>>>> [3]
> >>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>> <2020-07-05--004.tar.gz>
> >>>
> >>>
> >>>
> >>
> >
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Max Shonichev <ms...@yandex.ru>.
Anton, Nikolay,

I want to share some more findings about the ducktests that I stumbled
upon while porting them to Tiden.


The first problem is that GridGain's Tiden-based tests use a real,
production-like configuration for Ignite nodes by default (a rough sketch
of such a configuration follows the list), notably:

  - persistence enabled
  - ~120 caches in ~40 groups
  - data set size of around 1M keys per cache
  - primitive and POJO cache values
  - extensive use of query entities (indices)
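
To make it concrete, here is a rough Java sketch of what I mean by a
'production-like' node configuration. Names and exact numbers are
illustrative only, not our actual Tiden suite configs:

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

/** Illustrative only: persistence, ~120 caches in ~40 groups, indexed query entities. */
public class ProductionLikeConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Native persistence enabled for the default data region.
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storage);

        // ~120 caches spread over ~40 cache groups, each with an indexed query field.
        List<CacheConfiguration<?, ?>> caches = new ArrayList<>();

        for (int i = 0; i < 120; i++) {
            CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("cache-" + i);
            cacheCfg.setGroupName("group-" + (i % 40));

            QueryEntity entity = new QueryEntity(Long.class.getName(), "Value" + i);
            entity.addQueryField("name", String.class.getName(), null);
            entity.setIndexes(Collections.singleton(new QueryIndex("name")));
            cacheCfg.setQueryEntities(Collections.singleton(entity));

            caches.add(cacheCfg);
        }

        cfg.setCacheConfiguration(caches.toArray(new CacheConfiguration[0]));

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true); // a persistent cluster requires activation
        }
    }
}
```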

When I tried to run 4 nodes with such a configuration in Docker, my
notebook nearly caught fire. Nevertheless, the grid started and worked OK,
but with one little 'but': each successive version under test started
slower and slower.

2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little slower, and your
fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within
the default 120-second timeout. In order to mimic the behavior of your
tests, I had to turn off persistence and use only 1 cache as well.

It's a pity that you completely ignore persistence and indices in your
ducktests; otherwise you would quickly have run into the same limitation.

I hope to adapt the Tiden Docker PoC to our TeamCity soon, and we'll try
to git-bisect in order to find where this slowdown comes from. After that
I'll file a bug in the IGNITE Jira.



Another problem with your rebalance benchmark is its low accuracy due to
the granularity of the measurements.

You don't actually measure rebalance time; you measure the time it takes
to find a specific string in the logs, which is confusing.

The scenario of your test is as follows:

1. start 3 server nodes
2. start 1 data-loading client, preload data, stop the client
3. start 1 more server node
4. wait till the server joins the topology
5. wait till this server node completes the exchange and writes the
'rebalanced=true, wasRebalanced=false' message to the log
6. report the time taken by step 5 as 'Rebalance time'

The confusing thing here is the 'wait till' implementation - you actually
re-scan the logs continuously, sleeping for a second between scans, until
the message appears. That means the measured rebalance time has at least
one-second granularity, or even coarser, even though it is reported with
nanosecond precision (see the sketch below).
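
To illustrate the granularity issue, here is a rough sketch of the wait
loop as I read it. It is in Java only for brevity - the actual ducktest
wait loop is Python - and the log path and marker handling are simplified:

```
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RebalanceWaitSketch {
    public static void main(String[] args) throws Exception {
        Path log = Paths.get("ignite.log"); // hypothetical node log location
        String marker = "rebalanced=true, wasRebalanced=false";

        long start = System.nanoTime();

        // Re-read the whole log once per second until the marker shows up.
        while (!new String(Files.readAllBytes(log)).contains(marker))
            Thread.sleep(1000);

        double rebalancedInSec = (System.nanoTime() - start) / 1e9;

        // If rebalance finished before the first scan, this mostly measures the
        // scan itself (tens of milliseconds); if it finished just after the first
        // scan, the sleep adds a whole extra second to the reported value.
        System.out.printf("Rebalanced in (sec): %s%n", rebalancedInSec);
    }
}
```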

But for such a lightweight configuration (a single in-memory cache) and
such a small data set (only 1M keys), rebalancing is very fast and usually
completes in under 1 second, or only slightly more.

Before waiting for the rebalance message, you first wait for the topology
message, and that wait also takes time to execute.

So, by the time the Python part of the test performs the first scan of the
logs, rebalancing is in most cases already done, and the time you report
as '0.0760810375213623' is actually the time to execute the log-scanning code.

However, if rebalancing runs just a little slower after the topology
update, then the first scan of the logs fails, you sleep for a whole
second, rescan the logs, find the message there, and report it as
'1.02205491065979'.

Under different conditions, a dockerized application may run a little
slower or a little faster, depending on the overall system load, free
memory, etc. I tried increasing the load on my laptop by running a browser
or a Maven build, and the time to scan the logs fluctuated from 0.02 to
0.09 or even 1.02 seconds. Note that in a CI environment, high system load
from other tenants is quite an ordinary situation.

Suppose we adopted the rebalance improvements and all versions after 2.9.0
performed within 1 second, just like 2.9.0 itself. Then your benchmark
could report a false negative on one run (e.g. 0.02 for master and 0.03
for the PR), while on the next re-run it would pass (e.g. 0.07 for master
and 0.03 for the PR). That's not quite the 'stable and non-flaky' test the
Ignite community wants.

What suggestions do you have for improving the benchmark measurement accuracy?


A third question is about the PME-free switch benchmark. Under some
conditions, LongTxStreamerApplication actually hangs PME. This needs to be
investigated further, but it was caused either by persistence being
enabled or by the missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false flag.

Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX option?
Also, have you run the PME-free switch test with persistence-enabled caches?
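
For context, my assumption (judging by the -D form above) is that this is
a plain JVM system property, so it has to be set before node start, either
on the command line or programmatically. A minimal sketch, with a
hypothetical config path:

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class AtomicOpsInTxFlagSketch {
    public static void main(String[] args) {
        // Equivalent to passing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false on the JVM
        // command line; it has to be set before the node starts.
        System.setProperty("IGNITE_ALLOW_ATOMIC_OPS_IN_TX", "false");

        Ignite ignite = Ignition.start("ignite-config.xml"); // hypothetical config path
    }
}
```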


On 09.07.2020 10:11, Max Shonichev wrote:
> Anton,
> 
> well, strange thing, but clean up and rerun helped.
> 
> 
> Ubuntu 18.04
> 
> ==================================================================================================== 
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--003
> run time:         4 minutes 44.835 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
> ==================================================================================================== 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
> 
> status:     PASS
> run time:   41.927 seconds
> {"Rebalanced in (sec)": 1.02205491065979}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 
> 
> status:     PASS
> run time:   51.985 seconds
> {"Rebalanced in (sec)": 0.0760810375213623}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
> 
> status:     PASS
> run time:   1 minute 4.283 seconds
> {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst 
> latency (ms)": "31035"}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev 
> 
> status:     PASS
> run time:   1 minute 13.089 seconds
> {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst 
> latency (ms)": "139"}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client 
> 
> status:     PASS
> run time:   53.332 seconds
> ---------------------------------------------------------------------------------------------------- 
> 
> 
> 
> MacBook
> ================================================================================ 
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--001
> run time:         6 minutes 58.612 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
> ================================================================================ 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
> 
> status:     PASS
> run time:   48.724 seconds
> {"Rebalanced in (sec)": 3.2574470043182373}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 
> 
> status:     PASS
> run time:   1 minute 23.210 seconds
> {"Rebalanced in (sec)": 2.165921211242676}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
> 
> status:     PASS
> run time:   1 minute 12.659 seconds
> {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency 
> (ms)": "31063"}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev 
> 
> status:     PASS
> run time:   1 minute 57.257 seconds
> {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst 
> latency (ms)": "1010"}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client 
> 
> status:     PASS
> run time:   1 minute 36.317 seconds
> 
> =============
> 
> while relative numbers proportion remains the same for different Ignite 
> versions, absolute number for mac/linux differ more then twice.
> 
> I'm finalizing code with 'local Tiden' appliance for your tests.  PR 
> would be ready soon.
> 
> Have you had a chance to deploy ducktests in bare metal?
> 
> 
> 
> On 06.07.2020 14:27, Anton Vinogradov wrote:
>> Max,
>>
>> Thanks for the check!
>>
>>> Is it OK for those tests to fail?
>> No.
>> I see really strange things at logs.
>> Looks like you have concurrent ducktests run started not expected 
>> services,
>> and this broke the tests.
>> Could you please clean up the docker (use clean-up script [1]).
>> Compile sources (use script [2]) and rerun the tests.
>>
>> [1]
>> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh 
>>
>> [2]
>> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh 
>>
>>
>> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org> 
>> wrote:
>>
>>> Hello, Maxim.
>>>
>>> Thanks for writing down the minutes.
>>>
>>> There is no such thing as «Nikolay team» on the dev-list.
>>> I propose to focus on product requirements and what we want to gain from
>>> the framework instead of taking into account the needs of some team.
>>>
>>> Can you, please, write down your version of requirements so we can 
>>> reach a
>>> consensus on that and therefore move to the discussion of the
>>> implementation?
>>>
>>>> 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru> написал(а):
>>>>
>>>> Yes, Denis,
>>>>
>>>> common ground seems to be as follows:
>>>> Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
>>> over physical hosts and share benchmark results. In the meantime, 
>>> while I
>>> strongly believe that dockerized approach to benchmarking is a road to
>>> misleading and false positives, I'll prepare a PoC of Tiden in 
>>> dockerized
>>> environment to support 'fast development prototyping' usecase Nikolay 
>>> team
>>> insist on. It should be a matter of few days.
>>>>
>>>> As a side note, I've run Anton PoC locally and would like to have some
>>> comments about results:
>>>>
>>>> Test system: Ubuntu 18.04, docker 19.03.6
>>>> Test commands:
>>>>
>>>>
>>>> git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
>>>> cd ignite
>>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true
>>> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
>>>> cd modules/ducktests/tests/docker
>>>> ./run_tests.sh
>>>>
>>>> Test results:
>>>>
>>> ==================================================================================================== 
>>>
>>>> SESSION REPORT (ALL TESTS)
>>>> ducktape version: 0.7.7
>>>> session_id:       2020-07-05--004
>>>> run time:         7 minutes 36.360 seconds
>>>> tests run:        5
>>>> passed:           3
>>>> failed:           2
>>>> ignored:          0
>>>>
>>> ==================================================================================================== 
>>>
>>>> test_id:
>>> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
>>>
>>>> status:     FAIL
>>>> run time:   3 minutes 12.232 seconds
>>>>
>>> ---------------------------------------------------------------------------------------------------- 
>>>
>>>> test_id:
>>> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
>>>
>>>> status:     FAIL
>>>> run time:   1 minute 33.076 seconds
>>>>
>>>>
>>>> Is it OK for those tests to fail? Attached is full test report
>>>>
>>>>
>>>> On 02.07.2020 17:46, Denis Magda wrote:
>>>>> Folks,
>>>>> Please share the summary of that Slack conversation here for records
>>> once
>>>>> you find common ground.
>>>>> -
>>>>> Denis
>>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
>>> wrote:
>>>>>> Igniters.
>>>>>>
>>>>>> All who are interested in integration testing framework discussion 
>>>>>> are
>>>>>> welcome into slack channel -
>>>>>>
>>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2 
>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> 
>>>>>>> написал(а):
>>>>>>>
>>>>>>> Max,
>>>>>>> Thanks for joining us.
>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>>>> dependencies being deployed by external scripts.
>>>>>>> No. It is important to distinguish development, deploy, and
>>>>>> orchestration.
>>>>>>> All-in-one solutions have extremely limited usability.
>>>>>>> As to Ducktests:
>>>>>>> Docker is responsible for deployments during development.
>>>>>>> CI/CD is responsible for deployments during release and nightly
>>> checks.
>>>>>> It's up to the team to chose AWS, VM, BareMetal, and even OS.
>>>>>>> Ducktape is responsible for orchestration.
>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion,
>>>>>>>> while ducktape internally does all actions sequentially.
>>>>>>> No. Ducktape may start any service in parallel. See Pme-free 
>>>>>>> benchmark
>>>>>> [1] for details.
>>>>>>>
>>>>>>>> if we used ducktape solution we would have to instead prepare some
>>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
>>> with
>>>>>>>> Ansible or Chef.
>>>>>>> Sure, because a way of deploy depends on infrastructure.
>>>>>>> How can we be sure that OS we use and the restrictions we have 
>>>>>>> will be
>>>>>> compatible with Tiden?
>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>> dependencies
>>>>>>>> into one uber-image ...
>>>>>>> and
>>>>>>>> I guess we all know about docker hyped ability to run over
>>> distributed
>>>>>>>> virtual networks.
>>>>>>> It is very important not to confuse the test's development (docker
>>> image
>>>>>> you're talking about) and real deployment.
>>>>>>>
>>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
>>>>>>> All actions can be performed in parallel.
>>>>>>> See how Ducktests [2] starts cluster in parallel for example.
>>>>>>>
>>>>>>> [1]
>>>>>>
>>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84 
>>>
>>>>>>> [2]
>>>>>>
>>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79 
>>>
>>>>>>>
>>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
>>>>>> wrote:
>>>>>>> Hello, Maxim.
>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>> dependencies being deployed by external scripts
>>>>>>>
>>>>>>> Why do you think that maintaining deploy scripts coupled with the
>>>>>> testing framework is an advantage?
>>>>>>> I thought we want to see and maintain deployment scripts separate 
>>>>>>> from
>>>>>> the testing framework.
>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>>
>>>>>>> Can you, please, clarify, what actions do you have in mind?
>>>>>>> And why we want to execute them concurrently?
>>>>>>> Ignite node start, Client application execution can be done
>>> concurrently
>>>>>> with the ducktape approach.
>>>>>>>
>>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, 
>>>>>> with
>>>>>> Ansible or Chef
>>>>>>>
>>>>>>> We shouldn’t take some user approach as an argument in this
>>> discussion.
>>>>>> Let’s discuss a general approach for all users of the Ignite. Anyway,
>>> what
>>>>>> is wrong with the external deployment script approach?
>>>>>>>
>>>>>>> We, as a community, should provide several ways to run integration
>>> tests
>>>>>> out-of-the-box AND the ability to customize deployment regarding the
>>> user
>>>>>> landscape.
>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>>>>> dependencies into one uber-image and that looks like simple and 
>>>>>> elegant
>>>>>> solution however, that effectively limits you to single-host testing.
>>>>>>>
>>>>>>> Docker image should be used only by the Ignite developers to test
>>>>>> something locally.
>>>>>>> It’s not intended for some real-world testing.
>>>>>>>
>>>>>>> The main issue with the Tiden that I see, it tested and 
>>>>>>> maintained as
>>> a
>>>>>> closed source solution.
>>>>>>> This can lead to the hard to solve problems when we start using and
>>>>>> maintaining it as an open-source solution.
>>>>>>> Like, how many developers used Tiden? And how many of developers 
>>>>>>> were
>>>>>> not authors of the Tiden itself?
>>>>>>>
>>>>>>>
>>>>>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
>>>>>> написал(а):
>>>>>>>>
>>>>>>>> Anton, Nikolay,
>>>>>>>>
>>>>>>>> Let's agree on what we are arguing about: whether it is about "like
>>> or
>>>>>> don't like" or about technical properties of suggested solutions.
>>>>>>>>
>>>>>>>> If it is about likes and dislikes, then the whole discussion is
>>>>>> meaningless. However, I hope together we can analyse pros and cons
>>>>>> carefully.
>>>>>>>>
>>>>>>>> As far as I can understand now, two main differences between 
>>>>>>>> ducktape
>>>>>> and tiden is that:
>>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>> dependencies being deployed by external scripts.
>>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>>>
>>>>>>>> As for me, these are very important properties for distributed
>>> testing
>>>>>> framework.
>>>>>>>>
>>>>>>>> First property let us easily reuse tiden in existing 
>>>>>>>> infrastructures,
>>>>>> for example, during Zookeeper IEP testing at Sberbank site we used 
>>>>>> the
>>> same
>>>>>> tiden scripts that we use in our lab, the only change was putting a
>>> list of
>>>>>> hosts into config.
>>>>>>>>
>>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, 
>>>>>> with
>>>>>> Ansible or Chef.
>>>>>>>>
>>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>>>>> dependencies into one uber-image and that looks like simple and 
>>>>>> elegant
>>>>>> solution,
>>>>>>>> however, that effectively limits you to single-host testing.
>>>>>>>>
>>>>>>>> I guess we all know about docker hyped ability to run over
>>> distributed
>>>>>> virtual networks. We used to go that way, but quickly found that 
>>>>>> it is
>>> more
>>>>>> of the hype than real work. In real environments, there are problems
>>> with
>>>>>> routing, DNS, multicast and broadcast traffic, and many others, that
>>> turn
>>>>>> docker-based distributed solution into a fragile hard-to-maintain
>>> monster.
>>>>>>>>
>>>>>>>> Please, if you believe otherwise, perform a run of your PoC over at
>>>>>> least two physical hosts and share results with us.
>>>>>>>>
>>>>>>>> If you consider that one physical docker host is enough, please,
>>> don't
>>>>>> overlook that we want to run real scale scenarios, with 50-100 cache
>>>>>> groups, persistence enabled and a millions of keys loaded.
>>>>>>>>
>>>>>>>> Practical limit for such configurations is 4-6 nodes per single
>>>>>> physical host. Otherwise, tests become flaky due to resource
>>> starvation.
>>>>>>>>
>>>>>>>> Please, if you believe otherwise, perform at least a 10 of runs of
>>>>>> your PoC with other tests running at TC (we're targeting TeamCity,
>>> right?)
>>>>>> and share results so we could check if the numbers are reproducible.
>>>>>>>>
>>>>>>>> I stress this once more: functional integration tests are OK to run
>>> in
>>>>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
>>>>>>>>
>>>>>>>>
>>>>>>>> Second property let us write tests that require real-parallel 
>>>>>>>> actions
>>>>>> over hosts.
>>>>>>>>
>>>>>>>> For example, agreed scenario for PME benchmarkduring "PME
>>> optimization
>>>>>> stream" was as follows:
>>>>>>>>
>>>>>>>>   - 10 server nodes, preloaded with 1M of keys
>>>>>>>>   - 4 client nodes perform transactional load  (client nodes
>>> physically
>>>>>> separated from server nodes)
>>>>>>>>   - during load:
>>>>>>>>   -- 5 server nodes stopped in parallel
>>>>>>>>   -- after 1 minute, all 5 nodes are started in parallel
>>>>>>>>   - load stopped, logs are analysed for exchange times.
>>>>>>>>
>>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
>>>>>> then partition map exchange merge would not happen and we could not
>>> have
>>>>>> measured PME optimizations for that case.
>>>>>>>>
>>>>>>>>
>>>>>>>> These are limitations of ducktape that we believe as a more 
>>>>>>>> important
>>>>>>>> argument "against" than you provide "for".
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
>>>>>>>>> Folks,
>>>>>>>>> First, I've created PR [1] with ducktests improvements
>>>>>>>>> PR contains the following changes
>>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>>>>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
>>> 2.8)
>>>>>>>>> - Global refactoring
>>>>>>>>> -- benchmarks javacode simplification
>>>>>>>>> -- services python and java classes code deduplication
>>>>>>>>> -- fail-fast checks for java and python (eg. application should
>>>>>> explicitly write it finished with success)
>>>>>>>>> -- simple results extraction from tests and benchmarks
>>>>>>>>> -- javacode now configurable from tests/benchmarks
>>>>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
>>>>>> operation and log results)
>>>>>>>>> -- docker volume now marked as delegated to increase execution 
>>>>>>>>> speed
>>>>>> for mac & win users
>>>>>>>>> -- Ignite cluster now start in parallel (start speed-up)
>>>>>>>>> -- Ignite can be configured at test/benchmark
>>>>>>>>> - full and module assembly scripts added
>>>>>>>> Great job done! But let me remind one of Apache Ignite principles:
>>>>>>>> week of thinking save months of development.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
>>>>>> integration) as a target "PoC check & real topology benchmarking 
>>>>>> tool".
>>>>>>>>> Ducktape pros
>>>>>>>>> - Developed for distributed system by distributed system 
>>>>>>>>> developers.
>>>>>>>> So does Tiden
>>>>>>>>
>>>>>>>>> - Developed since 2014, stable.
>>>>>>>> Tiden is also pretty stable, and development start date is not a 
>>>>>>>> good
>>>>>> argument, for example pytest is since 2004, pytest-xdist (plugin for
>>>>>> distributed testing) is since 2010, but we don't see it as a
>>> alternative at
>>>>>> all.
>>>>>>>>
>>>>>>>>> - Proven usability by usage at Kafka.
>>>>>>>> Tiden is proven usable by usage at GridGain and Sberbank 
>>>>>>>> deployments.
>>>>>>>> Core, storage, sql and tx teams use benchmark results provided by
>>>>>> Tiden on a daily basis.
>>>>>>>>
>>>>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great 
>>>>>>>>> example
>>>>>> pack.
>>>>>>>> We'll donate some of our suites to Ignite as I've mentioned in
>>>>>> previous letter.
>>>>>>>>
>>>>>>>>> - Built-in Docker support for rapid development and checks.
>>>>>>>> False, there's no specific 'docker support' in ducktape itself, you
>>>>>> just wrap it in docker by yourself, because ducktape is lacking
>>> deployment
>>>>>> abilities.
>>>>>>>>
>>>>>>>>> - Great for CI automation.
>>>>>>>> False, there's no specific CI-enabled features in ducktape. 
>>>>>>>> Tiden, on
>>>>>> the other hand, provide generic xUnit reporting format, which is
>>> supported
>>>>>> by both TeamCity and Jenkins. Also, instead of using private keys,
>>> Tiden
>>>>>> can use SSH agent, which is also great for CI, because both
>>>>>>>> TeamCity and Jenkins store keys in secret storage available only 
>>>>>>>> for
>>>>>> ssh-agent and only for the time of the test.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> As an additional motivation, at least 3 teams
>>>>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
>>> Zabbix
>>>>>> speed-up))
>>>>>>>>> - Ignite SE Plugins team (to check plugin's features does not
>>>>>> slow-down or broke AI features)
>>>>>>>>> - Ignite SE QA team (to append already developed 
>>>>>>>>> smoke/load/failover
>>>>>> tests to AI codebase)
>>>>>>>>
>>>>>>>> Please, before recommending your tests to other teams, provide 
>>>>>>>> proofs
>>>>>>>> that your tests are reproducible in real environment.
>>>>>>>>
>>>>>>>>
>>>>>>>>> now, wait for ducktest merge to start checking cases they 
>>>>>>>>> working on
>>>>>> in AI way.
>>>>>>>>> Thoughts?
>>>>>>>> Let us together review both solutions, we'll try to run your 
>>>>>>>> tests in
>>>>>> our lab, and you'll try to at least checkout tiden and see if same
>>> tests
>>>>>> can be implemented with it?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
>>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
>>> nizhikov@apache.org
>>>>>> <ma...@apache.org>> wrote:
>>>>>>>>>     Hello, Maxim.
>>>>>>>>>     Thank you for so detailed explanation.
>>>>>>>>>     Can we put the content of this discussion somewhere on the 
>>>>>>>>> wiki?
>>>>>>>>>     So It doesn’t get lost.
>>>>>>>>>     I divide the answer in several parts. From the requirements to
>>> the
>>>>>>>>>     implementation.
>>>>>>>>>     So, if we agreed on the requirements we can proceed with the
>>>>>>>>>     discussion of the implementation.
>>>>>>>>>     1. Requirements:
>>>>>>>>>     The main goal I want to achieve is *reproducibility* of the
>>> tests.
>>>>>>>>>     I’m sick and tired with the zillions of flaky, rarely 
>>>>>>>>> failed, and
>>>>>>>>>     almost never failed tests in Ignite codebase.
>>>>>>>>>     We should start with the simplest scenarios that will be as
>>>>>> reliable
>>>>>>>>>     as steel :)
>>>>>>>>>     I want to know for sure:
>>>>>>>>>        - Is this PR makes rebalance quicker or not?
>>>>>>>>>        - Is this PR makes PME quicker or not?
>>>>>>>>>     So, your description of the complex test scenario looks as 
>>>>>>>>> a next
>>>>>>>>>     step to me.
>>>>>>>>>     Anyway, It’s cool we already have one.
>>>>>>>>>     The second goal is to have a strict test lifecycle as we 
>>>>>>>>> have in
>>>>>>>>>     JUnit and similar frameworks.
>>>>>>>>>      > It covers production-like deployment and running a 
>>>>>>>>> scenarios
>>>>>> over
>>>>>>>>>     a single database instance.
>>>>>>>>>     Do you mean «single cluster» or «single host»?
>>>>>>>>>     2. Existing tests:
>>>>>>>>>      > A Combinator suite allows to run set of operations
>>> concurrently
>>>>>>>>>     over given database instance.
>>>>>>>>>      > A Consumption suite allows to run a set production-like
>>> actions
>>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>>> metrics
>>>>>>>>>     across versions
>>>>>>>>>      > A Yardstick suite
>>>>>>>>>      > A Stress suite that simulates hardware environment 
>>>>>>>>> degradation
>>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>>> functional
>>>>>>>>>     regression testing
>>>>>>>>>      > Regression
>>>>>>>>>     Great news that we already have so many choices for testing!
>>>>>>>>>     Mature test base is a big +1 for Tiden.
>>>>>>>>>     3. Comparison:
>>>>>>>>>      > Criteria: Test configuration
>>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>>      > Tiden: any number of YaML config files, command line option
>>> for
>>>>>>>>>     fine-grained test configuration, ability to select/modify 
>>>>>>>>> tests
>>>>>>>>>     behavior based on Ignite version.
>>>>>>>>>     1. Many YAML files can be hard to maintain.
>>>>>>>>>     2. In ducktape, you can set parameters via «—parameters» 
>>>>>>>>> option.
>>>>>>>>>     Please, take a look at the doc [1]
>>>>>>>>>      > Criteria: Cluster control
>>>>>>>>>      > Tiden: additionally can address cluster as a whole and 
>>>>>>>>> execute
>>>>>>>>>     remote commands in parallel.
>>>>>>>>>     It seems we implement this ability in the PoC, already.
>>>>>>>>>      > Criteria: Test assertions
>>>>>>>>>      > Tiden: simple asserts, also few customized assertion 
>>>>>>>>> helpers.
>>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>>     Can you, please, be more specific.
>>>>>>>>>     What helpers do you have in mind?
>>>>>>>>>     Ducktape has an asserts that waits for logfile messages or 
>>>>>>>>> some
>>>>>>>>>     process finish.
>>>>>>>>>      > Criteria: Test reporting
>>>>>>>>>      > Ducktape: limited to its own text/HTML format
>>>>>>>>>     Ducktape have
>>>>>>>>>     1. Text reporter
>>>>>>>>>     2. Customizable HTML reporter
>>>>>>>>>     3. JSON reporter.
>>>>>>>>>     We can show JSON with the any template or tool.
>>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for 
>>>>>>>>> test
>>>>>>>>>     needs. However, that means, that test can’t be scaled without
>>> test
>>>>>>>>>     code changes. Does not do any deploy, relies on external 
>>>>>>>>> means,
>>>>>> e.g.
>>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>>     This is not true.
>>>>>>>>>     1. We can set explicit test parameters(node number) via
>>> parameters.
>>>>>>>>>     We can increase client count of cluster size without test code
>>>>>> changes.
>>>>>>>>>     2. We have many choices for the test environment. These 
>>>>>>>>> choices
>>> are
>>>>>>>>>     tested and used in other projects:
>>>>>>>>>              * docker
>>>>>>>>>              * vagrant
>>>>>>>>>              * private cloud(ssh access)
>>>>>>>>>              * ec2
>>>>>>>>>     Please, take a look at Kafka documentation [2]
>>>>>>>>>      > I can continue more on this, but it should be enough for 
>>>>>>>>> now:
>>>>>>>>>     We need to go deeper! :)
>>>>>>>>>     [1]
>>>>>>>>>
>>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>>>>>>>>     [2]
>>>>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>>>>>>>>      > 9 июня 2020 г., в 17:25, Max A. Shonichev 
>>>>>>>>> <mshonich@yandex.ru
>>>>>>>>>     <ma...@yandex.ru>> написал(а):
>>>>>>>>>      >
>>>>>>>>>      > Greetings, Nikolay,
>>>>>>>>>      >
>>>>>>>>>      > First of all, thank you for you great effort preparing 
>>>>>>>>> PoC of
>>>>>>>>>     integration testing to Ignite community.
>>>>>>>>>      >
>>>>>>>>>      > It’s a shame Ignite did not have at least some such 
>>>>>>>>> tests yet,
>>>>>>>>>     however, GridGain, as a major contributor to Apache Ignite 
>>>>>>>>> had a
>>>>>>>>>     profound collection of in-house tools to perform 
>>>>>>>>> integration and
>>>>>>>>>     performance testing for years already and while we slowly
>>> consider
>>>>>>>>>     sharing our expertise with the community, your initiative 
>>>>>>>>> makes
>>> us
>>>>>>>>>     drive that process a bit faster, thanks a lot!
>>>>>>>>>      >
>>>>>>>>>      > I reviewed your PoC and want to share a little about 
>>>>>>>>> what we
>>> do
>>>>>>>>>     on our part, why and how, hope it would help community take
>>> proper
>>>>>>>>>     course.
>>>>>>>>>      >
>>>>>>>>>      > First I’ll do a brief overview of what decisions we made 
>>>>>>>>> and
>>>>>> what
>>>>>>>>>     we do have in our private code base, next I’ll describe 
>>>>>>>>> what we
>>>>>> have
>>>>>>>>>     already donated to the public and what we plan public next, 
>>>>>>>>> then
>>>>>>>>>     I’ll compare both approaches highlighting deficiencies in 
>>>>>>>>> order
>>> to
>>>>>>>>>     spur public discussion on the matter.
>>>>>>>>>      >
>>>>>>>>>      > It might seem strange to use Python to run Bash to run Java
>>>>>>>>>     applications because that introduces IT industry best of 
>>>>>>>>> breed’ –
>>>>>>>>>     the Python dependency hell – to the Java application code 
>>>>>>>>> base.
>>> The
>>>>>>>>>     only strangest decision one can made is to use Maven to run
>>> Docker
>>>>>>>>>     to run Bash to run Python to run Bash to run Java, but 
>>>>>>>>> desperate
>>>>>>>>>     times call for desperate measures I guess.
>>>>>>>>>      >
>>>>>>>>>      > There are Java-based solutions for integration testing 
>>>>>>>>> exists,
>>>>>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they 
>>>>>>>>> might go
>>>>>> well
>>>>>>>>>     for Ignite community CI pipelines by them selves. But we also
>>>>>> wanted
>>>>>>>>>     to run performance tests and benchmarks, like the dreaded PME
>>>>>>>>>     benchmark, and this is solved by totally different set of 
>>>>>>>>> tools
>>> in
>>>>>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>>>>>>>>      >
>>>>>>>>>      > Speaking specifically about benchmarking, Apache Ignite
>>>>>> community
>>>>>>>>>     already has Yardstick [6], and there’s nothing wrong with 
>>>>>>>>> writing
>>>>>>>>>     PME benchmark using Yardstick, but we also wanted to be 
>>>>>>>>> able to
>>> run
>>>>>>>>>     scenarios like this:
>>>>>>>>>      > - put an X load to a Ignite database;
>>>>>>>>>      > - perform an Y set of operations to check how Ignite copes
>>> with
>>>>>>>>>     operations under load.
>>>>>>>>>      >
>>>>>>>>>      > And yes, we also wanted applications under test be deployed
>>>>>> ‘like
>>>>>>>>>     in a production’, e.g. distributed over a set of hosts. This
>>> arises
>>>>>>>>>     questions about provisioning and nodes affinity which I’ll 
>>>>>>>>> cover
>>> in
>>>>>>>>>     detail later.
>>>>>>>>>      >
>>>>>>>>>      > So we decided to put a little effort to build a simple 
>>>>>>>>> tool to
>>>>>>>>>     cover different integration and performance scenarios, and 
>>>>>>>>> our QA
>>>>>>>>>     lab first attempt was PoC-Tester [7], currently open source 
>>>>>>>>> for
>>> all
>>>>>>>>>     but for reporting web UI. It’s a quite simple to use 95%
>>> Java-based
>>>>>>>>>     tool targeted to be run on a pre-release QA stage.
>>>>>>>>>      >
>>>>>>>>>      > It covers production-like deployment and running a 
>>>>>>>>> scenarios
>>>>>> over
>>>>>>>>>     a single database instance. PoC-Tester scenarios consists of a
>>>>>>>>>     sequence of tasks running sequentially or in parallel. 
>>>>>>>>> After all
>>>>>>>>>     tasks complete, or at any time during test, user can run logs
>>>>>>>>>     collection task, logs are checked against exceptions and a
>>> summary
>>>>>>>>>     of found issues and task ops/latency statistics is 
>>>>>>>>> generated at
>>> the
>>>>>>>>>     end of scenario. One of the main PoC-Tester features is its
>>>>>>>>>     fire-and-forget approach to task managing. That is, you can
>>> deploy
>>>>>> a
>>>>>>>>>     grid and left it running for weeks, periodically firing some
>>> tasks
>>>>>>>>>     onto it.
>>>>>>>>>      >
>>>>>>>>>      > During earliest stages of PoC-Tester development it becomes
>>>>>> quite
>>>>>>>>>     clear that Java application development is a tedious 
>>>>>>>>> process and
>>>>>>>>>     architecture decisions you take during development are slow 
>>>>>>>>> and
>>>>>> hard
>>>>>>>>>     to change.
>>>>>>>>>      > For example, scenarios like this
>>>>>>>>>      > - deploy two instances of GridGain with master-slave data
>>>>>>>>>     replication configured;
>>>>>>>>>      > - put a load on master;
>>>>>>>>>      > - perform checks on slave,
>>>>>>>>>      > or like this:
>>>>>>>>>      > - preload a 1Tb of data by using your favorite tool of 
>>>>>>>>> choice
>>> to
>>>>>>>>>     an Apache Ignite of version X;
>>>>>>>>>      > - run a set of functional tests running Apache Ignite 
>>>>>>>>> version
>>> Y
>>>>>>>>>     over preloaded data,
>>>>>>>>>      > do not fit well in the PoC-Tester workflow.
>>>>>>>>>      >
>>>>>>>>>      > So, this is why we decided to use Python as a generic
>>> scripting
>>>>>>>>>     language of choice.
>>>>>>>>>      >
>>>>>>>>>      > Pros:
>>>>>>>>>      > - quicker prototyping and development cycles
>>>>>>>>>      > - easier to find DevOps/QA engineer with Python skills than
>>> one
>>>>>>>>>     with Java skills
>>>>>>>>>      > - used extensively all over the world for DevOps/CI 
>>>>>>>>> pipelines
>>>>>> and
>>>>>>>>>     thus has rich set of libraries for all possible integration 
>>>>>>>>> uses
>>>>>> cases.
>>>>>>>>>      >
>>>>>>>>>      > Cons:
>>>>>>>>>      > - Nightmare with dependencies. Better stick to specific
>>>>>>>>>     language/libraries version.
>>>>>>>>>      >
>>>>>>>>>      > Comparing alternatives for Python-based testing 
>>>>>>>>> framework we
>>>>>> have
>>>>>>>>>     considered following requirements, somewhat similar to what
>>> you’ve
>>>>>>>>>     mentioned for Confluent [8] previously:
>>>>>>>>>      > - should be able run locally or distributed (bare metal 
>>>>>>>>> or in
>>>>>> the
>>>>>>>>>     cloud)
>>>>>>>>>      > - should have built-in deployment facilities for 
>>>>>>>>> applications
>>>>>>>>>     under test
>>>>>>>>>      > - should separate test configuration and test code
>>>>>>>>>      > -- be able to easily reconfigure tests by simple 
>>>>>>>>> configuration
>>>>>>>>>     changes
>>>>>>>>>      > -- be able to easily scale test environment by simple
>>>>>>>>>     configuration changes
>>>>>>>>>      > -- be able to perform regression testing by simple 
>>>>>>>>> switching
>>>>>>>>>     artifacts under test via configuration
>>>>>>>>>      > -- be able to run tests with different JDK version by 
>>>>>>>>> simple
>>>>>>>>>     configuration changes
>>>>>>>>>      > - should have human readable reports and/or reporting tools
>>>>>>>>>     integration
>>>>>>>>>      > - should allow simple test progress monitoring, one does 
>>>>>>>>> not
>>>>>> want
>>>>>>>>>     to run 6-hours test to find out that application actually 
>>>>>>>>> crashed
>>>>>>>>>     during first hour.
>>>>>>>>>      > - should allow parallel execution of test actions
>>>>>>>>>      > - should have clean API for test writers
>>>>>>>>>      > -- clean API for distributed remote commands execution
>>>>>>>>>      > -- clean API for deployed applications start / stop and 
>>>>>>>>> other
>>>>>>>>>     operations
>>>>>>>>>      > -- clean API for performing check on results
>>>>>>>>>      > - should be open source or at least source code should 
>>>>>>>>> allow
>>>>>> ease
>>>>>>>>>     change or extension
>>>>>>>>>      >
>>>>>>>>>      > Back at that time we found no better alternative than to 
>>>>>>>>> write
>>>>>>>>>     our own framework, and here goes Tiden [9] as GridGain 
>>>>>>>>> framework
>>> of
>>>>>>>>>     choice for functional integration and performance testing.
>>>>>>>>>      >
>>>>>>>>>      > Pros:
>>>>>>>>>      > - solves all the requirements above
>>>>>>>>>      > Cons (for Ignite):
>>>>>>>>>      > - (currently) closed GridGain source
>>>>>>>>>      >
>>>>>>>>>      > On top of Tiden we’ve built a set of test suites, some of
>>> which
>>>>>>>>>     you might have heard already.
>>>>>>>>>      >
>>>>>>>>>      > A Combinator suite allows to run set of operations
>>> concurrently
>>>>>>>>>     over given database instance. Proven to find at least 30+ race
>>>>>>>>>     conditions and NPE issues.
>>>>>>>>>      >
>>>>>>>>>      > A Consumption suite allows to run a set production-like
>>> actions
>>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>>> metrics
>>>>>>>>>     across versions, like heap/disk/CPU consumption, time to 
>>>>>>>>> perform
>>>>>>>>>     actions, like client PME, server PME, rebalancing time, data
>>>>>>>>>     replication time, etc.
>>>>>>>>>      >
>>>>>>>>>      > A Yardstick suite is a thin layer of Python glue code to 
>>>>>>>>> run
>>>>>>>>>     Apache Ignite pre-release benchmarks set. Yardstick itself 
>>>>>>>>> has a
>>>>>>>>>     mediocre deployment capabilities, Tiden solves this easily.
>>>>>>>>>      >
>>>>>>>>>      > A Stress suite that simulates hardware environment 
>>>>>>>>> degradation
>>>>>>>>>     during testing.
>>>>>>>>>      >
>>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>>> functional
>>>>>>>>>     regression testing of GridGain Ultimate Edition features like
>>>>>>>>>     snapshots, security, data replication, rolling upgrades, etc.
>>>>>>>>>      >
>>>>>>>>>      > A Regression and some IEPs testing suites, like IEP-14,
>>> IEP-15,
>>>>>>>>>     etc, etc, etc.
>>>>>>>>>      >
>>>>>>>>>      > Most of the suites above use another in-house developed 
>>>>>>>>> Java
>>>>>> tool
>>>>>>>>>     – PiClient – to perform actual loading and miscellaneous
>>> operations
>>>>>>>>>     with Ignite under test. We use py4j Python-Java gateway 
>>>>>>>>> library
>>> to
>>>>>>>>>     control PiClient instances from the tests.
>>>>>>>>>      >
>>>>>>>>>      > When we considered CI, we put TeamCity out of scope, 
>>>>>>>>> because
>>>>>>>>>     distributed integration and performance tests tend to run for
>>> hours
>>>>>>>>>     and TeamCity agents are scarce and costly resource. So, 
>>>>>>>>> bundled
>>>>>> with
>>>>>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines and
>>>>>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward 
>>>>>>>>> aggregates
>>>>>> test
>>>>>>>>>     run reports across versions and has built in visualization
>>> support
>>>>>>>>>     for Combinator suite.
>>>>>>>>>      >
>>>>>>>>>      > All of the above is currently closed source, but we plan to
>>> make
>>>>>>>>>     it public for community, and publishing Tiden core [9] is the
>>> first
>>>>>>>>>     step on that way. You can review some examples of using 
>>>>>>>>> Tiden for
>>>>>>>>>     tests at my repository [11], for start.
>>>>>>>>>      >
>>>>>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Language
>>>>>>>>>      > Tiden: Python, 3.7
>>>>>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>>>>>>>>     compatible, but actually can’t work with Python 3.7 due to 
>>>>>>>>> broken
>>>>>>>>>     Zmq dependency.
>>>>>>>>>      > Comment: Python 3.7 has a much better support for 
>>>>>>>>> async-style
>>>>>>>>>     code which might be crucial for distributed application 
>>>>>>>>> testing.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 0
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test writers API
>>>>>>>>>      > Supported integration test framework concepts are basically
>>> the
>>>>>> same:
>>>>>>>>>      > - a test controller (test runner)
>>>>>>>>>      > - a cluster
>>>>>>>>>      > - a node
>>>>>>>>>      > - an application (a service in Ducktape terms)
>>>>>>>>>      > - a test
>>>>>>>>>      > Score: Tiden: 5, Ducktape: 5
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Tests selection and run
>>>>>>>>>      > Ducktape: suite-package-class-method level selection, 
>>>>>>>>> internal
>>>>>>>>>     scheduler allows to run tests in suite in parallel.
>>>>>>>>>      > Tiden: also suite-package-class-method level selection,
>>>>>>>>>     additionally allows selecting subset of tests by attribute,
>>>>>> parallel
>>>>>>>>>     runs not built in, but allows merging test reports after
>>> different
>>>>>> runs.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 2
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test configuration
>>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>>      > Tiden: any number of YaML config files, command line option
>>> for
>>>>>>>>>     fine-grained test configuration, ability to select/modify 
>>>>>>>>> tests
>>>>>>>>>     behavior based on Ignite version.
>>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Cluster control
>>>>>>>>>      > Ducktape: allow execute remote commands by node granularity
>>>>>>>>>      > Tiden: additionally can address cluster as a whole and 
>>>>>>>>> execute
>>>>>>>>>     remote commands in parallel.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Logs control
>>>>>>>>>      > Both frameworks have similar builtin support for remote 
>>>>>>>>> logs
>>>>>>>>>     collection and grepping. Tiden has built-in plugin that can 
>>>>>>>>> zip,
>>>>>>>>>     collect arbitrary log files from arbitrary locations at
>>>>>>>>>     test/module/suite granularity and unzip if needed, also
>>> application
>>>>>>>>>     API to search / wait for messages in logs. Ducktape allows 
>>>>>>>>> each
>>>>>>>>>     service declare its log files location (seemingly does not
>>> support
>>>>>>>>>     logs rollback), and a single entrypoint to collect service 
>>>>>>>>> logs.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test assertions
>>>>>>>>>      > Tiden: simple asserts, also few customized assertion 
>>>>>>>>> helpers.
>>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test reporting
>>>>>>>>>      > Ducktape: limited to its own text/html format
>>>>>>>>>      > Tiden: provides text report, yaml report for reporting 
>>>>>>>>> tools
>>>>>>>>>     integration, XML xUnit report for integration with
>>>>>> Jenkins/TeamCity.
>>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for 
>>>>>>>>> test
>>>>>>>>>     needs. However, that means, that test can’t be scaled without
>>> test
>>>>>>>>>     code changes. Does not do any deploy, relies on external 
>>>>>>>>> means,
>>>>>> e.g.
>>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for the
>>>>>> test.
>>>>>>>>>     Provisioning should be done by external means. However, 
>>>>>>>>> provides
>>> a
>>>>>>>>>     conventional automated deployment routines.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Documentation and Extensibility
>>>>>>>>>      > Tiden: current API documentation is limited, should 
>>>>>>>>> change as
>>> we
>>>>>>>>>     go open source. Tiden is easily extensible via hooks and 
>>>>>>>>> plugins,
>>>>>>>>>     see example Maven plugin and Gatling application at [11].
>>>>>>>>>      > Ducktape: basic documentation at readthedocs.io
>>>>>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core is
>>>>>>>>>     tightly coupled and hard to change. The only possible 
>>>>>>>>> extension
>>>>>>>>>     mechanism is fork-and-rewrite.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > I can continue more on this, but it should be enough for 
>>>>>>>>> now:
>>>>>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
>>>>>>>>>      >
>>>>>>>>>      > Time for discussion!
>>>>>>>>>      >
>>>>>>>>>      > ---
>>>>>>>>>      > [1] - https://www.testcontainers.org/
>>>>>>>>>      > [2] - http://arquillian.org/guides/getting_started/
>>>>>>>>>      > [3] - https://jmeter.apache.org/index.html
>>>>>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>>>>>>>>      > [5] - https://gatling.io/docs/current/
>>>>>>>>>      > [6] - https://github.com/gridgain/yardstick
>>>>>>>>>      > [7] - https://github.com/gridgain/poc-tester
>>>>>>>>>      > [8] -
>>>>>>>>>
>>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements 
>>>
>>>>>>>>>      > [9] - https://github.com/gridgain/tiden
>>>>>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
>>>>>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
>>>>>>>>>      >
>>>>>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>>>>>>>>      >> Hello,
>>>>>>>>>      >>
>>>>>>>>>      >> Branch with duck tape created -
>>>>>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>>>      >>
>>>>>>>>>      >> Any who are willing to contribute to PoC are welcome.
>>>>>>>>>      >>
>>>>>>>>>      >>
>>>>>>>>>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
>>>>>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
>>>>>> написал(а):
>>>>>>>>>      >>>
>>>>>>>>>      >>> Hello, Denis.
>>>>>>>>>      >>>
>>>>>>>>>      >>> There is no rush with these improvements.
>>>>>>>>>      >>> We can wait for Maxim proposal and compare two 
>>>>>>>>> solutions :)
>>>>>>>>>      >>>
>>>>>>>>>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
>>>>>>>>>     <ma...@apache.org>> написал(а):
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> Hi Nikolay,
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> Thanks for kicking off this conversation and sharing 
>>>>>>>>> your
>>>>>>>>>     findings with the
>>>>>>>>>      >>>> results. That's the right initiative. I do agree that
>>> Ignite
>>>>>>>>>     needs to have
>>>>>>>>>      >>>> an integration testing framework with capabilities 
>>>>>>>>> listed
>>> by
>>>>>> you.
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> As we discussed privately, I would only check if 
>>>>>>>>> instead of
>>>>>>>>>      >>>> Confluent's Ducktape library, we can use an integration
>>>>>>>>>     testing framework
>>>>>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
>>>>>> clusters.
>>>>>>>>>     That
>>>>>>>>>      >>>> framework has been battle-tested and might be more
>>>>>> convenient for
>>>>>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim 
>>>>>>>>> Shonichev
>>>>>>>>>      >>>> <mshonichev@gridgain.com 
>>>>>>>>> <ma...@gridgain.com>>
>>>>>> who
>>>>>>>>>     promised to join this thread once he finishes
>>>>>>>>>      >>>> preparing the usage examples of the framework. To my
>>>>>>>>>     knowledge, Max has
>>>>>>>>>      >>>> already been working on that for several days.
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> -
>>>>>>>>>      >>>> Denis
>>>>>>>>>      >>>>
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>>>>>>>>     <nizhikov@apache.org <ma...@apache.org>>
>>>>>>>>>      >>>> wrote:
>>>>>>>>>      >>>>
>>>>>>>>>      >>>>> Hello, Igniters.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> I created a PoC [1] for the integration tests of 
>>>>>>>>> Ignite.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Let me briefly explain the gap I want to cover:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 1. For now, we don’t have a solution for automated 
>>>>>>>>> testing
>>>>>> of
>>>>>>>>>     Ignite on
>>>>>>>>>      >>>>> «real cluster».
>>>>>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
>>>>>>>>>      >>>>>       * client and server nodes deployed on different
>>> hosts.
>>>>>>>>>      >>>>>       * thin clients perform queries from some other 
>>>>>>>>> hosts
>>>>>>>>>      >>>>>       * etc.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks of
>>> some
>>>>>>>>>     internal
>>>>>>>>>      >>>>> Ignite process
>>>>>>>>>      >>>>>       * PME
>>>>>>>>>      >>>>>       * rebalance.
>>>>>>>>>      >>>>> This means we don’t know - Do we perform 
>>>>>>>>> rebalance(or PME)
>>>>>> in
>>>>>>>>>     2.7.0 faster
>>>>>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 3. We don’t have a solution for automated testing of
>>> Ignite
>>>>>>>>>     integration in
>>>>>>>>>      >>>>> a real-world environment:
>>>>>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
>>>>>>>>>      >>>>> I think some ML solutions also should be tested in
>>>>>> real-world
>>>>>>>>>     deployments.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Solution:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> I propose to use duck tape library from confluent 
>>>>>>>>> (apache
>>>>>> 2.0
>>>>>>>>>     license)
>>>>>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud) 
>>>>>>>>> and on
>>>>>> the
>>>>>>>>>     local
>>>>>>>>>      >>>>> environment(docker) and it works just fine.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> PoC contains following services:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>>       * Simple rebalance test:
>>>>>>>>>      >>>>>               Start 2 server nodes,
>>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>>      >>>>>               Start one more server node,
>>>>>>>>>      >>>>>               Wait for rebalance finish
>>>>>>>>>      >>>>>       * Simple Ignite-Spark integration test:
>>>>>>>>>      >>>>>               Start 1 Spark master, start 1 Spark 
>>>>>>>>> worker,
>>>>>>>>>      >>>>>               Start 1 Ignite server node
>>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>>      >>>>>               Check data in application that queries it
>>> from
>>>>>>>>>     Spark.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> All tests are fully automated.
>>>>>>>>>      >>>>> Logs collection works just fine.
>>>>>>>>>      >>>>> You can see an example of the tests report - [4].
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Pros:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> * Ability to test local changes(no need to public 
>>>>>>>>> changes
>>> to
>>>>>>>>>     some remote
>>>>>>>>>      >>>>> repository or similar).
>>>>>>>>>      >>>>> * Ability to parametrize test environment(run the same
>>> tests
>>>>>>>>>     on different
>>>>>>>>>      >>>>> JDK, JVM params, config, etc.)
>>>>>>>>>      >>>>> * Isolation by default so system tests are as 
>>>>>>>>> reliable as
>>>>>>>>>     possible.
>>>>>>>>>      >>>>> * Utilities for pulling up and tearing down services
>>> easily
>>>>>>>>>     in clusters in
>>>>>>>>>      >>>>> different environments (e.g. local, custom cluster,
>>> Vagrant,
>>>>>>>>>     K8s, Mesos,
>>>>>>>>>      >>>>> Docker, cloud providers, etc.)
>>>>>>>>>      >>>>> * Easy to write unit tests for distributed systems
>>>>>>>>>      >>>>> * Adopted and successfully used by other distributed 
>>>>>>>>> open
>>>>>>>>>     source project -
>>>>>>>>>      >>>>> Apache Kafka.
>>>>>>>>>      >>>>> * Collect results (e.g. logs, console output)
>>>>>>>>>      >>>>> * Report results (e.g. expected conditions met,
>>> performance
>>>>>>>>>     results, etc.)
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> WDYT?
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
>>>>>>>>>      >>>>> [3]
>>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> <2020-07-05--004.tar.gz>
>>>
>>>
>>>
>>
> 

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Max Shonichev <ms...@yandex.ru>.
Anton,

I've prepared a PoC of running Tiden in a dockerized environment.

The code is in a fork of your repo at 
https://github.com/mshonichev/ignite.git, branch 'ignite-ducktape', 
module 'integration-tests'.


Steps to run the PoC are as follows:
```
$ mkdir -p $HOME/tiden_poc
$ cd $HOME/tiden_poc
$ git clone -b ignite-ducktape https://github.com/mshonichev/ignite.git
$ cd ignite
$ scripts/build.sh
$ cd modules/integration-tests
$ mvn -DskipTests -Dmaven.javadoc.skip=true verify

```

A few changes in Tiden 0.6.5 include the ability to set num_nodes upon 
application start (see the sketch below) and minor API fixes.
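To illustrate what setting num_nodes upon application start means in 
practice, here is a minimal sketch. The class and argument names are 
hypothetical and do not reproduce the actual Tiden 0.6.5 API.

```python
# Hypothetical sketch: names below are illustrative, not real Tiden API.
from subprocess import Popen


class IgniteApp:
    """Stand-in for a Tiden-style application wrapper."""

    def __init__(self, start_script, num_nodes=1):
        # num_nodes is passed at application start instead of being
        # fixed once in the suite configuration.
        self.start_script = start_script
        self.num_nodes = num_nodes
        self.processes = []

    def start(self):
        # Launch one OS process per requested node.
        for node_id in range(self.num_nodes):
            self.processes.append(
                Popen(["bash", self.start_script, str(node_id)]))


# Scaling the cluster in a test is then a one-argument change.
app = IgniteApp("start_ignite_node.sh", num_nodes=4)
app.start()
```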

Most of the work to run in Docker is done by external bash scripts, so no 
changes to the tests themselves are required to run them on bare metal.

You can run them manually or as part of the Maven verify stage; they are 
hooked into pom.xml via maven-exec-plugin for that.

I've taken note of the comments in your PoC, and instead of a single 
Docker image we prepare a set of images:
  - tiden-master
  - tiden-slave:${JDK_VERSION}
  - tiden-artifacts-ignite:${IGNITE_VERSION}
  - tiden-artifacts-...

During the provisioning stage, all those images are linked via separate 
volumes into the container's /opt dir. Tiden itself is installed as a 
package, either of a specific version or in 'develop' mode.

Also, it turns out that using `docker run --user=...` works well only on 
macOS or on Ubuntu under the default user. Trying to run ducktests under 
a user with UID != 1000 produces inaccessible files in the `results` dir. 
So in the Tiden PoC I've fixed that too; you can review the Dockerfiles 
in the modules/integration-tests/tiden/docker/ dir.

Next, I've recreated all your benchmarks; their code is in 
modules/integration-tests/tiden/suites/benchmarks.

Your Java applications are copied with no changes into 
modules/integration-tests/src; a thin Python wrapper over them is in 
modules/integration-tests/apps/igniteaware.app
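For context, a thin wrapper of that kind essentially just launches the 
Java process and waits for a completion marker in its log. The sketch 
below is a hypothetical illustration, not the actual code from 
apps/igniteaware.app; the marker string and paths are assumptions.

```python
# Hypothetical illustration of a thin Python wrapper over a Java test app.
import subprocess
import time

SUCCESS_MARKER = "IGNITE_APPLICATION_FINISHED"  # assumed marker string


def run_java_app(jar_path, main_class, log_path, timeout_sec=120):
    """Start the Java application and assert it reported success in its log."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            ["java", "-cp", jar_path, main_class],
            stdout=log, stderr=subprocess.STDOUT)

    # Poll until the process exits or the timeout expires.
    deadline = time.time() + timeout_sec
    while proc.poll() is None and time.time() < deadline:
        time.sleep(1)

    with open(log_path) as log:
        assert SUCCESS_MARKER in log.read(), "application did not report success"
```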

Unfortunately, no version-filtering decorators are present yet, so 
instead of a single run across all versions, run-tests.sh internally runs 
Tiden several times.
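For the record, such a version-filtering decorator could look roughly 
like the hypothetical sketch below; the config lookup and skip call are 
assumptions, not existing Tiden API, which is exactly why run-tests.sh 
currently loops over versions externally.

```python
# Hypothetical sketch of a version-filtering decorator for test methods.
import functools


def ignite_versions(*versions):
    """Run the decorated test only for the listed Ignite versions."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapped(self, *args, **kwargs):
            current = self.config.get("ignite_version")  # assumed lookup
            if current not in versions:
                # assumed skip API, for illustration only
                return self.skip("not applicable for version %s" % current)
            return test_func(self, *args, **kwargs)
        return wrapped
    return decorator


# Intended usage, so that a single run covers 2.7.6, 2.8.1 and dev:
#
#     @ignite_versions("2.7.6", "2.8.1", "dev")
#     def test_add_node_rebalance(self):
#         ...
```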


Please check out the sources and share your thoughts.


On 09.07.2020 10:11, Max Shonichev wrote:
> Anton,
> 
> well, strange thing, but clean up and rerun helped.
> 
> 
> Ubuntu 18.04
> 
> ==================================================================================================== 
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--003
> run time:         4 minutes 44.835 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
> ==================================================================================================== 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
> 
> status:     PASS
> run time:   41.927 seconds
> {"Rebalanced in (sec)": 1.02205491065979}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 
> 
> status:     PASS
> run time:   51.985 seconds
> {"Rebalanced in (sec)": 0.0760810375213623}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
> 
> status:     PASS
> run time:   1 minute 4.283 seconds
> {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst 
> latency (ms)": "31035"}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev 
> 
> status:     PASS
> run time:   1 minute 13.089 seconds
> {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst 
> latency (ms)": "139"}
> ---------------------------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client 
> 
> status:     PASS
> run time:   53.332 seconds
> ---------------------------------------------------------------------------------------------------- 
> 
> 
> 
> MacBook
> ================================================================================ 
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--001
> run time:         6 minutes 58.612 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
> ================================================================================ 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
> 
> status:     PASS
> run time:   48.724 seconds
> {"Rebalanced in (sec)": 3.2574470043182373}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 
> 
> status:     PASS
> run time:   1 minute 23.210 seconds
> {"Rebalanced in (sec)": 2.165921211242676}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
> 
> status:     PASS
> run time:   1 minute 12.659 seconds
> {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency 
> (ms)": "31063"}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev 
> 
> status:     PASS
> run time:   1 minute 57.257 seconds
> {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst 
> latency (ms)": "1010"}
> -------------------------------------------------------------------------------- 
> 
> test_id: 
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client 
> 
> status:     PASS
> run time:   1 minute 36.317 seconds
> 
> =============
> 
> while relative numbers proportion remains the same for different Ignite 
> versions, absolute number for mac/linux differ more then twice.
> 
> I'm finalizing code with 'local Tiden' appliance for your tests.  PR 
> would be ready soon.
> 
> Have you had a chance to deploy ducktests in bare metal?
> 
> 
> 
> On 06.07.2020 14:27, Anton Vinogradov wrote:
>> Max,
>>
>> Thanks for the check!
>>
>>> Is it OK for those tests to fail?
>> No.
>> I see really strange things at logs.
>> Looks like you have concurrent ducktests run started not expected 
>> services,
>> and this broke the tests.
>> Could you please clean up the docker (use clean-up script [1]).
>> Compile sources (use script [2]) and rerun the tests.
>>
>> [1]
>> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh 
>>
>> [2]
>> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh 
>>
>>
>> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org> 
>> wrote:
>>
>>> Hello, Maxim.
>>>
>>> Thanks for writing down the minutes.
>>>
>>> There is no such thing as «Nikolay team» on the dev-list.
>>> I propose to focus on product requirements and what we want to gain from
>>> the framework instead of taking into account the needs of some team.
>>>
>>> Can you, please, write down your version of requirements so we can 
>>> reach a
>>> consensus on that and therefore move to the discussion of the
>>> implementation?
>>>
>>>> 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru> написал(а):
>>>>
>>>> Yes, Denis,
>>>>
>>>> common ground seems to be as follows:
>>>> Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
>>> over physical hosts and share benchmark results. In the meantime, 
>>> while I
>>> strongly believe that dockerized approach to benchmarking is a road to
>>> misleading and false positives, I'll prepare a PoC of Tiden in 
>>> dockerized
>>> environment to support 'fast development prototyping' usecase Nikolay 
>>> team
>>> insist on. It should be a matter of few days.
>>>>
>>>> As a side note, I've run Anton PoC locally and would like to have some
>>> comments about results:
>>>>
>>>> Test system: Ubuntu 18.04, docker 19.03.6
>>>> Test commands:
>>>>
>>>>
>>>> git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
>>>> cd ignite
>>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true
>>> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
>>>> cd modules/ducktests/tests/docker
>>>> ./run_tests.sh
>>>>
>>>> Test results:
>>>>
>>> ==================================================================================================== 
>>>
>>>> SESSION REPORT (ALL TESTS)
>>>> ducktape version: 0.7.7
>>>> session_id:       2020-07-05--004
>>>> run time:         7 minutes 36.360 seconds
>>>> tests run:        5
>>>> passed:           3
>>>> failed:           2
>>>> ignored:          0
>>>>
>>> ==================================================================================================== 
>>>
>>>> test_id:
>>> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 
>>>
>>>> status:     FAIL
>>>> run time:   3 minutes 12.232 seconds
>>>>
>>> ---------------------------------------------------------------------------------------------------- 
>>>
>>>> test_id:
>>> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 
>>>
>>>> status:     FAIL
>>>> run time:   1 minute 33.076 seconds
>>>>
>>>>
>>>> Is it OK for those tests to fail? Attached is full test report
>>>>
>>>>
>>>> On 02.07.2020 17:46, Denis Magda wrote:
>>>>> Folks,
>>>>> Please share the summary of that Slack conversation here for records
>>> once
>>>>> you find common ground.
>>>>> -
>>>>> Denis
>>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
>>> wrote:
>>>>>> Igniters.
>>>>>>
>>>>>> All who are interested in integration testing framework discussion 
>>>>>> are
>>>>>> welcome into slack channel -
>>>>>>
>>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2 
>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> 
>>>>>>> написал(а):
>>>>>>>
>>>>>>> Max,
>>>>>>> Thanks for joining us.
>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>>>> dependencies being deployed by external scripts.
>>>>>>> No. It is important to distinguish development, deploy, and
>>>>>> orchestration.
>>>>>>> All-in-one solutions have extremely limited usability.
>>>>>>> As to Ducktests:
>>>>>>> Docker is responsible for deployments during development.
>>>>>>> CI/CD is responsible for deployments during release and nightly
>>> checks.
>>>>>> It's up to the team to chose AWS, VM, BareMetal, and even OS.
>>>>>>> Ducktape is responsible for orchestration.
>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion,
>>>>>>>> while ducktape internally does all actions sequentially.
>>>>>>> No. Ducktape may start any service in parallel. See Pme-free 
>>>>>>> benchmark
>>>>>> [1] for details.
>>>>>>>
>>>>>>>> if we used ducktape solution we would have to instead prepare some
>>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
>>> with
>>>>>>>> Ansible or Chef.
>>>>>>> Sure, because a way of deploy depends on infrastructure.
>>>>>>> How can we be sure that OS we use and the restrictions we have 
>>>>>>> will be
>>>>>> compatible with Tiden?
>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>> dependencies
>>>>>>>> into one uber-image ...
>>>>>>> and
>>>>>>>> I guess we all know about docker hyped ability to run over
>>> distributed
>>>>>>>> virtual networks.
>>>>>>> It is very important not to confuse the test's development (docker
>>> image
>>>>>> you're talking about) and real deployment.
>>>>>>>
>>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
>>>>>>> All actions can be performed in parallel.
>>>>>>> See how Ducktests [2] starts cluster in parallel for example.
>>>>>>>
>>>>>>> [1]
>>>>>>
>>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84 
>>>
>>>>>>> [2]
>>>>>>
>>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79 
>>>
>>>>>>>
>>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
>>>>>> wrote:
>>>>>>> Hello, Maxim.
>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>> dependencies being deployed by external scripts
>>>>>>>
>>>>>>> Why do you think that maintaining deploy scripts coupled with the
>>>>>> testing framework is an advantage?
>>>>>>> I thought we want to see and maintain deployment scripts separate 
>>>>>>> from
>>>>>> the testing framework.
>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>>
>>>>>>> Can you, please, clarify, what actions do you have in mind?
>>>>>>> And why we want to execute them concurrently?
>>>>>>> Ignite node start, Client application execution can be done
>>> concurrently
>>>>>> with the ducktape approach.
>>>>>>>
>>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, 
>>>>>> with
>>>>>> Ansible or Chef
>>>>>>>
>>>>>>> We shouldn’t take some user approach as an argument in this
>>> discussion.
>>>>>> Let’s discuss a general approach for all users of the Ignite. Anyway,
>>> what
>>>>>> is wrong with the external deployment script approach?
>>>>>>>
>>>>>>> We, as a community, should provide several ways to run integration
>>> tests
>>>>>> out-of-the-box AND the ability to customize deployment regarding the
>>> user
>>>>>> landscape.
>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>>>>> dependencies into one uber-image and that looks like simple and 
>>>>>> elegant
>>>>>> solution however, that effectively limits you to single-host testing.
>>>>>>>
>>>>>>> Docker image should be used only by the Ignite developers to test
>>>>>> something locally.
>>>>>>> It’s not intended for some real-world testing.
>>>>>>>
>>>>>>> The main issue with the Tiden that I see, it tested and 
>>>>>>> maintained as
>>> a
>>>>>> closed source solution.
>>>>>>> This can lead to the hard to solve problems when we start using and
>>>>>> maintaining it as an open-source solution.
>>>>>>> Like, how many developers used Tiden? And how many of developers 
>>>>>>> were
>>>>>> not authors of the Tiden itself?
>>>>>>>
>>>>>>>
>>>>>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
>>>>>> написал(а):
>>>>>>>>
>>>>>>>> Anton, Nikolay,
>>>>>>>>
>>>>>>>> Let's agree on what we are arguing about: whether it is about "like
>>> or
>>>>>> don't like" or about technical properties of suggested solutions.
>>>>>>>>
>>>>>>>> If it is about likes and dislikes, then the whole discussion is
>>>>>> meaningless. However, I hope together we can analyse pros and cons
>>>>>> carefully.
>>>>>>>>
>>>>>>>> As far as I can understand now, two main differences between 
>>>>>>>> ducktape
>>>>>> and tiden is that:
>>>>>>>>
>>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>> dependencies being deployed by external scripts.
>>>>>>>>
>>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>>>
>>>>>>>> As for me, these are very important properties for distributed
>>> testing
>>>>>> framework.
>>>>>>>>
>>>>>>>> First property let us easily reuse tiden in existing 
>>>>>>>> infrastructures,
>>>>>> for example, during Zookeeper IEP testing at Sberbank site we used 
>>>>>> the
>>> same
>>>>>> tiden scripts that we use in our lab, the only change was putting a
>>> list of
>>>>>> hosts into config.
>>>>>>>>
>>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, 
>>>>>> with
>>>>>> Ansible or Chef.
>>>>>>>>
>>>>>>>>
>>>>>>>> You have solved this deficiency with docker by putting all
>>>>>> dependencies into one uber-image and that looks like simple and 
>>>>>> elegant
>>>>>> solution,
>>>>>>>> however, that effectively limits you to single-host testing.
>>>>>>>>
>>>>>>>> I guess we all know about docker hyped ability to run over
>>> distributed
>>>>>> virtual networks. We used to go that way, but quickly found that 
>>>>>> it is
>>> more
>>>>>> of the hype than real work. In real environments, there are problems
>>> with
>>>>>> routing, DNS, multicast and broadcast traffic, and many others, that
>>> turn
>>>>>> docker-based distributed solution into a fragile hard-to-maintain
>>> monster.
>>>>>>>>
>>>>>>>> Please, if you believe otherwise, perform a run of your PoC over at
>>>>>> least two physical hosts and share results with us.
>>>>>>>>
>>>>>>>> If you consider that one physical docker host is enough, please,
>>> don't
>>>>>> overlook that we want to run real scale scenarios, with 50-100 cache
>>>>>> groups, persistence enabled and a millions of keys loaded.
>>>>>>>>
>>>>>>>> Practical limit for such configurations is 4-6 nodes per single
>>>>>> physical host. Otherwise, tests become flaky due to resource
>>> starvation.
>>>>>>>>
>>>>>>>> Please, if you believe otherwise, perform at least a 10 of runs of
>>>>>> your PoC with other tests running at TC (we're targeting TeamCity,
>>> right?)
>>>>>> and share results so we could check if the numbers are reproducible.
>>>>>>>>
>>>>>>>> I stress this once more: functional integration tests are OK to run
>>> in
>>>>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
>>>>>>>>
>>>>>>>>
>>>>>>>> Second property let us write tests that require real-parallel 
>>>>>>>> actions
>>>>>> over hosts.
>>>>>>>>
>>>>>>>> For example, agreed scenario for PME benchmarkduring "PME
>>> optimization
>>>>>> stream" was as follows:
>>>>>>>>
>>>>>>>>   - 10 server nodes, preloaded with 1M of keys
>>>>>>>>   - 4 client nodes perform transactional load  (client nodes
>>> physically
>>>>>> separated from server nodes)
>>>>>>>>   - during load:
>>>>>>>>   -- 5 server nodes stopped in parallel
>>>>>>>>   -- after 1 minute, all 5 nodes are started in parallel
>>>>>>>>   - load stopped, logs are analysed for exchange times.
>>>>>>>>
>>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
>>>>>> then partition map exchange merge would not happen and we could not
>>> have
>>>>>> measured PME optimizations for that case.
>>>>>>>>
>>>>>>>>
>>>>>>>> These are limitations of ducktape that we believe as a more 
>>>>>>>> important
>>>>>>>> argument "against" than you provide "for".
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
>>>>>>>>> Folks,
>>>>>>>>> First, I've created PR [1] with ducktests improvements
>>>>>>>>> PR contains the following changes
>>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>>>>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
>>> 2.8)
>>>>>>>>> - Global refactoring
>>>>>>>>> -- benchmarks javacode simplification
>>>>>>>>> -- services python and java classes code deduplication
>>>>>>>>> -- fail-fast checks for java and python (eg. application should
>>>>>> explicitly write it finished with success)
>>>>>>>>> -- simple results extraction from tests and benchmarks
>>>>>>>>> -- javacode now configurable from tests/benchmarks
>>>>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
>>>>>> operation and log results)
>>>>>>>>> -- docker volume now marked as delegated to increase execution 
>>>>>>>>> speed
>>>>>> for mac & win users
>>>>>>>>> -- Ignite cluster now start in parallel (start speed-up)
>>>>>>>>> -- Ignite can be configured at test/benchmark
>>>>>>>>> - full and module assembly scripts added
>>>>>>>> Great job done! But let me remind one of Apache Ignite principles:
>>>>>>>> week of thinking save months of development.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
>>>>>> integration) as a target "PoC check & real topology benchmarking 
>>>>>> tool".
>>>>>>>>> Ducktape pros
>>>>>>>>> - Developed for distributed system by distributed system 
>>>>>>>>> developers.
>>>>>>>> So does Tiden
>>>>>>>>
>>>>>>>>> - Developed since 2014, stable.
>>>>>>>> Tiden is also pretty stable, and development start date is not a 
>>>>>>>> good
>>>>>> argument, for example pytest is since 2004, pytest-xdist (plugin for
>>>>>> distributed testing) is since 2010, but we don't see it as a
>>> alternative at
>>>>>> all.
>>>>>>>>
>>>>>>>>> - Proven usability by usage at Kafka.
>>>>>>>> Tiden is proven usable by usage at GridGain and Sberbank 
>>>>>>>> deployments.
>>>>>>>> Core, storage, sql and tx teams use benchmark results provided by
>>>>>> Tiden on a daily basis.
>>>>>>>>
>>>>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great 
>>>>>>>>> example
>>>>>> pack.
>>>>>>>> We'll donate some of our suites to Ignite as I've mentioned in
>>>>>> previous letter.
>>>>>>>>
>>>>>>>>> - Built-in Docker support for rapid development and checks.
>>>>>>>> False, there's no specific 'docker support' in ducktape itself, you
>>>>>> just wrap it in docker by yourself, because ducktape is lacking
>>> deployment
>>>>>> abilities.
>>>>>>>>
>>>>>>>>> - Great for CI automation.
>>>>>>>> False, there's no specific CI-enabled features in ducktape. 
>>>>>>>> Tiden, on
>>>>>> the other hand, provide generic xUnit reporting format, which is
>>> supported
>>>>>> by both TeamCity and Jenkins. Also, instead of using private keys,
>>> Tiden
>>>>>> can use SSH agent, which is also great for CI, because both
>>>>>>>> TeamCity and Jenkins store keys in secret storage available only 
>>>>>>>> for
>>>>>> ssh-agent and only for the time of the test.
>>>>>>>>
>>>>>>>>
>>>>>>>>>> As an additional motivation, at least 3 teams
>>>>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
>>> Zabbix
>>>>>> speed-up))
>>>>>>>>> - Ignite SE Plugins team (to check plugin's features does not
>>>>>> slow-down or broke AI features)
>>>>>>>>> - Ignite SE QA team (to append already developed 
>>>>>>>>> smoke/load/failover
>>>>>> tests to AI codebase)
>>>>>>>>
>>>>>>>> Please, before recommending your tests to other teams, provide 
>>>>>>>> proofs
>>>>>>>> that your tests are reproducible in real environment.
>>>>>>>>
>>>>>>>>
>>>>>>>>> now, wait for ducktest merge to start checking cases they 
>>>>>>>>> working on
>>>>>> in AI way.
>>>>>>>>> Thoughts?
>>>>>>>> Let us together review both solutions, we'll try to run your 
>>>>>>>> tests in
>>>>>> our lab, and you'll try to at least checkout tiden and see if same
>>> tests
>>>>>> can be implemented with it?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
>>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
>>> nizhikov@apache.org
>>>>>> <ma...@apache.org>> wrote:
>>>>>>>>>     Hello, Maxim.
>>>>>>>>>     Thank you for so detailed explanation.
>>>>>>>>>     Can we put the content of this discussion somewhere on the 
>>>>>>>>> wiki?
>>>>>>>>>     So It doesn’t get lost.
>>>>>>>>>     I divide the answer in several parts. From the requirements to
>>> the
>>>>>>>>>     implementation.
>>>>>>>>>     So, if we agreed on the requirements we can proceed with the
>>>>>>>>>     discussion of the implementation.
>>>>>>>>>     1. Requirements:
>>>>>>>>>     The main goal I want to achieve is *reproducibility* of the
>>> tests.
>>>>>>>>>     I’m sick and tired with the zillions of flaky, rarely 
>>>>>>>>> failed, and
>>>>>>>>>     almost never failed tests in Ignite codebase.
>>>>>>>>>     We should start with the simplest scenarios that will be as
>>>>>> reliable
>>>>>>>>>     as steel :)
>>>>>>>>>     I want to know for sure:
>>>>>>>>>        - Is this PR makes rebalance quicker or not?
>>>>>>>>>        - Is this PR makes PME quicker or not?
>>>>>>>>>     So, your description of the complex test scenario looks as 
>>>>>>>>> a next
>>>>>>>>>     step to me.
>>>>>>>>>     Anyway, It’s cool we already have one.
>>>>>>>>>     The second goal is to have a strict test lifecycle as we 
>>>>>>>>> have in
>>>>>>>>>     JUnit and similar frameworks.
>>>>>>>>>      > It covers production-like deployment and running a 
>>>>>>>>> scenarios
>>>>>> over
>>>>>>>>>     a single database instance.
>>>>>>>>>     Do you mean «single cluster» or «single host»?
>>>>>>>>>     2. Existing tests:
>>>>>>>>>      > A Combinator suite allows to run set of operations
>>> concurrently
>>>>>>>>>     over given database instance.
>>>>>>>>>      > A Consumption suite allows to run a set production-like
>>> actions
>>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>>> metrics
>>>>>>>>>     across versions
>>>>>>>>>      > A Yardstick suite
>>>>>>>>>      > A Stress suite that simulates hardware environment 
>>>>>>>>> degradation
>>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>>> functional
>>>>>>>>>     regression testing
>>>>>>>>>      > Regression
>>>>>>>>>     Great news that we already have so many choices for testing!
>>>>>>>>>     Mature test base is a big +1 for Tiden.
>>>>>>>>>     3. Comparison:
>>>>>>>>>      > Criteria: Test configuration
>>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>>      > Tiden: any number of YaML config files, command line option
>>> for
>>>>>>>>>     fine-grained test configuration, ability to select/modify 
>>>>>>>>> tests
>>>>>>>>>     behavior based on Ignite version.
>>>>>>>>>     1. Many YAML files can be hard to maintain.
>>>>>>>>>     2. In ducktape, you can set parameters via «—parameters» 
>>>>>>>>> option.
>>>>>>>>>     Please, take a look at the doc [1]
>>>>>>>>>      > Criteria: Cluster control
>>>>>>>>>      > Tiden: additionally can address cluster as a whole and 
>>>>>>>>> execute
>>>>>>>>>     remote commands in parallel.
>>>>>>>>>     It seems we implement this ability in the PoC, already.
>>>>>>>>>      > Criteria: Test assertions
>>>>>>>>>      > Tiden: simple asserts, also few customized assertion 
>>>>>>>>> helpers.
>>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>>     Can you, please, be more specific.
>>>>>>>>>     What helpers do you have in mind?
>>>>>>>>>     Ducktape has an asserts that waits for logfile messages or 
>>>>>>>>> some
>>>>>>>>>     process finish.
>>>>>>>>>      > Criteria: Test reporting
>>>>>>>>>      > Ducktape: limited to its own text/HTML format
>>>>>>>>>     Ducktape have
>>>>>>>>>     1. Text reporter
>>>>>>>>>     2. Customizable HTML reporter
>>>>>>>>>     3. JSON reporter.
>>>>>>>>>     We can show JSON with the any template or tool.
>>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for 
>>>>>>>>> test
>>>>>>>>>     needs. However, that means, that test can’t be scaled without
>>> test
>>>>>>>>>     code changes. Does not do any deploy, relies on external 
>>>>>>>>> means,
>>>>>> e.g.
>>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>>     This is not true.
>>>>>>>>>     1. We can set explicit test parameters(node number) via
>>> parameters.
>>>>>>>>>     We can increase client count of cluster size without test code
>>>>>> changes.
>>>>>>>>>     2. We have many choices for the test environment. These 
>>>>>>>>> choices
>>> are
>>>>>>>>>     tested and used in other projects:
>>>>>>>>>              * docker
>>>>>>>>>              * vagrant
>>>>>>>>>              * private cloud(ssh access)
>>>>>>>>>              * ec2
>>>>>>>>>     Please, take a look at Kafka documentation [2]
>>>>>>>>>      > I can continue more on this, but it should be enough for 
>>>>>>>>> now:
>>>>>>>>>     We need to go deeper! :)
>>>>>>>>>     [1]
>>>>>>>>>
>>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>>>>>>>>     [2]
>>>>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>>>>>>>>      > 9 июня 2020 г., в 17:25, Max A. Shonichev 
>>>>>>>>> <mshonich@yandex.ru
>>>>>>>>>     <ma...@yandex.ru>> написал(а):
>>>>>>>>>      >
>>>>>>>>>      > Greetings, Nikolay,
>>>>>>>>>      >
>>>>>>>>>      > First of all, thank you for you great effort preparing 
>>>>>>>>> PoC of
>>>>>>>>>     integration testing to Ignite community.
>>>>>>>>>      >
>>>>>>>>>      > It’s a shame Ignite did not have at least some such 
>>>>>>>>> tests yet,
>>>>>>>>>     however, GridGain, as a major contributor to Apache Ignite 
>>>>>>>>> had a
>>>>>>>>>     profound collection of in-house tools to perform 
>>>>>>>>> integration and
>>>>>>>>>     performance testing for years already and while we slowly
>>> consider
>>>>>>>>>     sharing our expertise with the community, your initiative 
>>>>>>>>> makes
>>> us
>>>>>>>>>     drive that process a bit faster, thanks a lot!
>>>>>>>>>      >
>>>>>>>>>      > I reviewed your PoC and want to share a little about 
>>>>>>>>> what we
>>> do
>>>>>>>>>     on our part, why and how, hope it would help community take
>>> proper
>>>>>>>>>     course.
>>>>>>>>>      >
>>>>>>>>>      > First I’ll do a brief overview of what decisions we made 
>>>>>>>>> and
>>>>>> what
>>>>>>>>>     we do have in our private code base, next I’ll describe 
>>>>>>>>> what we
>>>>>> have
>>>>>>>>>     already donated to the public and what we plan public next, 
>>>>>>>>> then
>>>>>>>>>     I’ll compare both approaches highlighting deficiencies in 
>>>>>>>>> order
>>> to
>>>>>>>>>     spur public discussion on the matter.
>>>>>>>>>      >
>>>>>>>>>      > It might seem strange to use Python to run Bash to run Java
>>>>>>>>>     applications because that introduces IT industry best of 
>>>>>>>>> breed’ –
>>>>>>>>>     the Python dependency hell – to the Java application code 
>>>>>>>>> base.
>>> The
>>>>>>>>>     only strangest decision one can made is to use Maven to run
>>> Docker
>>>>>>>>>     to run Bash to run Python to run Bash to run Java, but 
>>>>>>>>> desperate
>>>>>>>>>     times call for desperate measures I guess.
>>>>>>>>>      >
>>>>>>>>>      > There are Java-based solutions for integration testing 
>>>>>>>>> exists,
>>>>>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they 
>>>>>>>>> might go
>>>>>> well
>>>>>>>>>     for Ignite community CI pipelines by them selves. But we also
>>>>>> wanted
>>>>>>>>>     to run performance tests and benchmarks, like the dreaded PME
>>>>>>>>>     benchmark, and this is solved by totally different set of 
>>>>>>>>> tools
>>> in
>>>>>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>>>>>>>>      >
>>>>>>>>>      > Speaking specifically about benchmarking, Apache Ignite
>>>>>> community
>>>>>>>>>     already has Yardstick [6], and there’s nothing wrong with 
>>>>>>>>> writing
>>>>>>>>>     PME benchmark using Yardstick, but we also wanted to be 
>>>>>>>>> able to
>>> run
>>>>>>>>>     scenarios like this:
>>>>>>>>>      > - put an X load to a Ignite database;
>>>>>>>>>      > - perform an Y set of operations to check how Ignite copes
>>> with
>>>>>>>>>     operations under load.
>>>>>>>>>      >
>>>>>>>>>      > And yes, we also wanted applications under test be deployed
>>>>>> ‘like
>>>>>>>>>     in a production’, e.g. distributed over a set of hosts. This
>>> arises
>>>>>>>>>     questions about provisioning and nodes affinity which I’ll 
>>>>>>>>> cover
>>> in
>>>>>>>>>     detail later.
>>>>>>>>>      >
>>>>>>>>>      > So we decided to put a little effort to build a simple 
>>>>>>>>> tool to
>>>>>>>>>     cover different integration and performance scenarios, and 
>>>>>>>>> our QA
>>>>>>>>>     lab first attempt was PoC-Tester [7], currently open source 
>>>>>>>>> for
>>> all
>>>>>>>>>     but for reporting web UI. It’s a quite simple to use 95%
>>> Java-based
>>>>>>>>>     tool targeted to be run on a pre-release QA stage.
>>>>>>>>>      >
>>>>>>>>>      > It covers production-like deployment and running a 
>>>>>>>>> scenarios
>>>>>> over
>>>>>>>>>     a single database instance. PoC-Tester scenarios consists of a
>>>>>>>>>     sequence of tasks running sequentially or in parallel. 
>>>>>>>>> After all
>>>>>>>>>     tasks complete, or at any time during test, user can run logs
>>>>>>>>>     collection task, logs are checked against exceptions and a
>>> summary
>>>>>>>>>     of found issues and task ops/latency statistics is 
>>>>>>>>> generated at
>>> the
>>>>>>>>>     end of scenario. One of the main PoC-Tester features is its
>>>>>>>>>     fire-and-forget approach to task managing. That is, you can
>>> deploy
>>>>>> a
>>>>>>>>>     grid and left it running for weeks, periodically firing some
>>> tasks
>>>>>>>>>     onto it.
>>>>>>>>>      >
>>>>>>>>>      > During earliest stages of PoC-Tester development it becomes
>>>>>> quite
>>>>>>>>>     clear that Java application development is a tedious 
>>>>>>>>> process and
>>>>>>>>>     architecture decisions you take during development are slow 
>>>>>>>>> and
>>>>>> hard
>>>>>>>>>     to change.
>>>>>>>>>      > For example, scenarios like this
>>>>>>>>>      > - deploy two instances of GridGain with master-slave data
>>>>>>>>>     replication configured;
>>>>>>>>>      > - put a load on master;
>>>>>>>>>      > - perform checks on slave,
>>>>>>>>>      > or like this:
>>>>>>>>>      > - preload a 1Tb of data by using your favorite tool of 
>>>>>>>>> choice
>>> to
>>>>>>>>>     an Apache Ignite of version X;
>>>>>>>>>      > - run a set of functional tests running Apache Ignite 
>>>>>>>>> version
>>> Y
>>>>>>>>>     over preloaded data,
>>>>>>>>>      > do not fit well in the PoC-Tester workflow.
>>>>>>>>>      >
>>>>>>>>>      > So, this is why we decided to use Python as a generic
>>> scripting
>>>>>>>>>     language of choice.
>>>>>>>>>      >
>>>>>>>>>      > Pros:
>>>>>>>>>      > - quicker prototyping and development cycles
>>>>>>>>>      > - easier to find DevOps/QA engineer with Python skills than
>>> one
>>>>>>>>>     with Java skills
>>>>>>>>>      > - used extensively all over the world for DevOps/CI 
>>>>>>>>> pipelines
>>>>>> and
>>>>>>>>>     thus has rich set of libraries for all possible integration 
>>>>>>>>> uses
>>>>>> cases.
>>>>>>>>>      >
>>>>>>>>>      > Cons:
>>>>>>>>>      > - Nightmare with dependencies. Better stick to specific
>>>>>>>>>     language/libraries version.
>>>>>>>>>      >
>>>>>>>>>      > Comparing alternatives for Python-based testing 
>>>>>>>>> framework we
>>>>>> have
>>>>>>>>>     considered following requirements, somewhat similar to what
>>> you’ve
>>>>>>>>>     mentioned for Confluent [8] previously:
>>>>>>>>>      > - should be able run locally or distributed (bare metal 
>>>>>>>>> or in
>>>>>> the
>>>>>>>>>     cloud)
>>>>>>>>>      > - should have built-in deployment facilities for 
>>>>>>>>> applications
>>>>>>>>>     under test
>>>>>>>>>      > - should separate test configuration and test code
>>>>>>>>>      > -- be able to easily reconfigure tests by simple 
>>>>>>>>> configuration
>>>>>>>>>     changes
>>>>>>>>>      > -- be able to easily scale test environment by simple
>>>>>>>>>     configuration changes
>>>>>>>>>      > -- be able to perform regression testing by simple 
>>>>>>>>> switching
>>>>>>>>>     artifacts under test via configuration
>>>>>>>>>      > -- be able to run tests with different JDK version by 
>>>>>>>>> simple
>>>>>>>>>     configuration changes
>>>>>>>>>      > - should have human readable reports and/or reporting tools
>>>>>>>>>     integration
>>>>>>>>>      > - should allow simple test progress monitoring, one does 
>>>>>>>>> not
>>>>>> want
>>>>>>>>>     to run 6-hours test to find out that application actually 
>>>>>>>>> crashed
>>>>>>>>>     during first hour.
>>>>>>>>>      > - should allow parallel execution of test actions
>>>>>>>>>      > - should have clean API for test writers
>>>>>>>>>      > -- clean API for distributed remote commands execution
>>>>>>>>>      > -- clean API for deployed applications start / stop and 
>>>>>>>>> other
>>>>>>>>>     operations
>>>>>>>>>      > -- clean API for performing check on results
>>>>>>>>>      > - should be open source or at least source code should 
>>>>>>>>> allow
>>>>>> ease
>>>>>>>>>     change or extension
>>>>>>>>>      >
>>>>>>>>>      > Back at that time we found no better alternative than to 
>>>>>>>>> write
>>>>>>>>>     our own framework, and here goes Tiden [9] as GridGain 
>>>>>>>>> framework
>>> of
>>>>>>>>>     choice for functional integration and performance testing.
>>>>>>>>>      >
>>>>>>>>>      > Pros:
>>>>>>>>>      > - solves all the requirements above
>>>>>>>>>      > Cons (for Ignite):
>>>>>>>>>      > - (currently) closed GridGain source
>>>>>>>>>      >
>>>>>>>>>      > On top of Tiden we’ve built a set of test suites, some of
>>> which
>>>>>>>>>     you might have heard already.
>>>>>>>>>      >
>>>>>>>>>      > A Combinator suite allows to run set of operations
>>> concurrently
>>>>>>>>>     over given database instance. Proven to find at least 30+ race
>>>>>>>>>     conditions and NPE issues.
>>>>>>>>>      >
>>>>>>>>>      > A Consumption suite allows to run a set production-like
>>> actions
>>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>>> metrics
>>>>>>>>>     across versions, like heap/disk/CPU consumption, time to 
>>>>>>>>> perform
>>>>>>>>>     actions, like client PME, server PME, rebalancing time, data
>>>>>>>>>     replication time, etc.
>>>>>>>>>      >
>>>>>>>>>      > A Yardstick suite is a thin layer of Python glue code to 
>>>>>>>>> run
>>>>>>>>>     Apache Ignite pre-release benchmarks set. Yardstick itself 
>>>>>>>>> has a
>>>>>>>>>     mediocre deployment capabilities, Tiden solves this easily.
>>>>>>>>>      >
>>>>>>>>>      > A Stress suite that simulates hardware environment 
>>>>>>>>> degradation
>>>>>>>>>     during testing.
>>>>>>>>>      >
>>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>>> functional
>>>>>>>>>     regression testing of GridGain Ultimate Edition features like
>>>>>>>>>     snapshots, security, data replication, rolling upgrades, etc.
>>>>>>>>>      >
>>>>>>>>>      > A Regression and some IEPs testing suites, like IEP-14,
>>> IEP-15,
>>>>>>>>>     etc, etc, etc.
>>>>>>>>>      >
>>>>>>>>>      > Most of the suites above use another in-house developed 
>>>>>>>>> Java
>>>>>> tool
>>>>>>>>>     – PiClient – to perform actual loading and miscellaneous
>>> operations
>>>>>>>>>     with Ignite under test. We use py4j Python-Java gateway 
>>>>>>>>> library
>>> to
>>>>>>>>>     control PiClient instances from the tests.
>>>>>>>>>      >
>>>>>>>>>      > When we considered CI, we put TeamCity out of scope, 
>>>>>>>>> because
>>>>>>>>>     distributed integration and performance tests tend to run for
>>> hours
>>>>>>>>>     and TeamCity agents are scarce and costly resource. So, 
>>>>>>>>> bundled
>>>>>> with
>>>>>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines and
>>>>>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward 
>>>>>>>>> aggregates
>>>>>> test
>>>>>>>>>     run reports across versions and has built in visualization
>>> support
>>>>>>>>>     for Combinator suite.
>>>>>>>>>      >
>>>>>>>>>      > All of the above is currently closed source, but we plan to
>>> make
>>>>>>>>>     it public for community, and publishing Tiden core [9] is the
>>> first
>>>>>>>>>     step on that way. You can review some examples of using 
>>>>>>>>> Tiden for
>>>>>>>>>     tests at my repository [11], for start.
>>>>>>>>>      >
>>>>>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Language
>>>>>>>>>      > Tiden: Python, 3.7
>>>>>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>>>>>>>>     compatible, but actually can’t work with Python 3.7 due to 
>>>>>>>>> broken
>>>>>>>>>     Zmq dependency.
>>>>>>>>>      > Comment: Python 3.7 has a much better support for 
>>>>>>>>> async-style
>>>>>>>>>     code which might be crucial for distributed application 
>>>>>>>>> testing.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 0
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test writers API
>>>>>>>>>      > Supported integration test framework concepts are basically
>>> the
>>>>>> same:
>>>>>>>>>      > - a test controller (test runner)
>>>>>>>>>      > - a cluster
>>>>>>>>>      > - a node
>>>>>>>>>      > - an application (a service in Ducktape terms)
>>>>>>>>>      > - a test
>>>>>>>>>      > Score: Tiden: 5, Ducktape: 5
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Tests selection and run
>>>>>>>>>      > Ducktape: suite-package-class-method level selection, 
>>>>>>>>> internal
>>>>>>>>>     scheduler allows to run tests in suite in parallel.
>>>>>>>>>      > Tiden: also suite-package-class-method level selection,
>>>>>>>>>     additionally allows selecting subset of tests by attribute,
>>>>>> parallel
>>>>>>>>>     runs not built in, but allows merging test reports after
>>> different
>>>>>> runs.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 2
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test configuration
>>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>>      > Tiden: any number of YaML config files, command line option
>>> for
>>>>>>>>>     fine-grained test configuration, ability to select/modify 
>>>>>>>>> tests
>>>>>>>>>     behavior based on Ignite version.
>>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Cluster control
>>>>>>>>>      > Ducktape: allow execute remote commands by node granularity
>>>>>>>>>      > Tiden: additionally can address cluster as a whole and 
>>>>>>>>> execute
>>>>>>>>>     remote commands in parallel.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Logs control
>>>>>>>>>      > Both frameworks have similar builtin support for remote 
>>>>>>>>> logs
>>>>>>>>>     collection and grepping. Tiden has built-in plugin that can 
>>>>>>>>> zip,
>>>>>>>>>     collect arbitrary log files from arbitrary locations at
>>>>>>>>>     test/module/suite granularity and unzip if needed, also
>>> application
>>>>>>>>>     API to search / wait for messages in logs. Ducktape allows 
>>>>>>>>> each
>>>>>>>>>     service declare its log files location (seemingly does not
>>> support
>>>>>>>>>     logs rollback), and a single entrypoint to collect service 
>>>>>>>>> logs.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test assertions
>>>>>>>>>      > Tiden: simple asserts, also few customized assertion 
>>>>>>>>> helpers.
>>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Test reporting
>>>>>>>>>      > Ducktape: limited to its own text/html format
>>>>>>>>>      > Tiden: provides text report, yaml report for reporting 
>>>>>>>>> tools
>>>>>>>>>     integration, XML xUnit report for integration with
>>>>>> Jenkins/TeamCity.
>>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for 
>>>>>>>>> test
>>>>>>>>>     needs. However, that means, that test can’t be scaled without
>>> test
>>>>>>>>>     code changes. Does not do any deploy, relies on external 
>>>>>>>>> means,
>>>>>> e.g.
>>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for the
>>>>>> test.
>>>>>>>>>     Provisioning should be done by external means. However, 
>>>>>>>>> provides
>>> a
>>>>>>>>>     conventional automated deployment routines.
>>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > Criteria: Documentation and Extensibility
>>>>>>>>>      > Tiden: current API documentation is limited, should 
>>>>>>>>> change as
>>> we
>>>>>>>>>     go open source. Tiden is easily extensible via hooks and 
>>>>>>>>> plugins,
>>>>>>>>>     see example Maven plugin and Gatling application at [11].
>>>>>>>>>      > Ducktape: basic documentation at readthedocs.io
>>>>>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core is
>>>>>>>>>     tightly coupled and hard to change. The only possible 
>>>>>>>>> extension
>>>>>>>>>     mechanism is fork-and-rewrite.
>>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>>      >
>>>>>>>>>      > I can continue more on this, but it should be enough for 
>>>>>>>>> now:
>>>>>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
>>>>>>>>>      >
>>>>>>>>>      > Time for discussion!
>>>>>>>>>      >
>>>>>>>>>      > ---
>>>>>>>>>      > [1] - https://www.testcontainers.org/
>>>>>>>>>      > [2] - http://arquillian.org/guides/getting_started/
>>>>>>>>>      > [3] - https://jmeter.apache.org/index.html
>>>>>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>>>>>>>>      > [5] - https://gatling.io/docs/current/
>>>>>>>>>      > [6] - https://github.com/gridgain/yardstick
>>>>>>>>>      > [7] - https://github.com/gridgain/poc-tester
>>>>>>>>>      > [8] -
>>>>>>>>>
>>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements 
>>>
>>>>>>>>>      > [9] - https://github.com/gridgain/tiden
>>>>>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
>>>>>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
>>>>>>>>>      >
>>>>>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>>>>>>>>      >> Hello,
>>>>>>>>>      >>
>>>>>>>>>      >> Branch with duck tape created -
>>>>>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>>>      >>
>>>>>>>>>      >> Any who are willing to contribute to PoC are welcome.
>>>>>>>>>      >>
>>>>>>>>>      >>
>>>>>>>>>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
>>>>>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
>>>>>> написал(а):
>>>>>>>>>      >>>
>>>>>>>>>      >>> Hello, Denis.
>>>>>>>>>      >>>
>>>>>>>>>      >>> There is no rush with these improvements.
>>>>>>>>>      >>> We can wait for Maxim proposal and compare two 
>>>>>>>>> solutions :)
>>>>>>>>>      >>>
>>>>>>>>>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
>>>>>>>>>     <ma...@apache.org>> написал(а):
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> Hi Nikolay,
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> Thanks for kicking off this conversation and sharing 
>>>>>>>>> your
>>>>>>>>>     findings with the
>>>>>>>>>      >>>> results. That's the right initiative. I do agree that
>>> Ignite
>>>>>>>>>     needs to have
>>>>>>>>>      >>>> an integration testing framework with capabilities 
>>>>>>>>> listed
>>> by
>>>>>> you.
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> As we discussed privately, I would only check if 
>>>>>>>>> instead of
>>>>>>>>>      >>>> Confluent's Ducktape library, we can use an integration
>>>>>>>>>     testing framework
>>>>>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
>>>>>> clusters.
>>>>>>>>>     That
>>>>>>>>>      >>>> framework has been battle-tested and might be more
>>>>>> convenient for
>>>>>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim 
>>>>>>>>> Shonichev
>>>>>>>>>      >>>> <mshonichev@gridgain.com 
>>>>>>>>> <ma...@gridgain.com>>
>>>>>> who
>>>>>>>>>     promised to join this thread once he finishes
>>>>>>>>>      >>>> preparing the usage examples of the framework. To my
>>>>>>>>>     knowledge, Max has
>>>>>>>>>      >>>> already been working on that for several days.
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> -
>>>>>>>>>      >>>> Denis
>>>>>>>>>      >>>>
>>>>>>>>>      >>>>
>>>>>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>>>>>>>>     <nizhikov@apache.org <ma...@apache.org>>
>>>>>>>>>      >>>> wrote:
>>>>>>>>>      >>>>
>>>>>>>>>      >>>>> Hello, Igniters.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> I created a PoC [1] for the integration tests of 
>>>>>>>>> Ignite.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Let me briefly explain the gap I want to cover:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 1. For now, we don’t have a solution for automated 
>>>>>>>>> testing
>>>>>> of
>>>>>>>>>     Ignite on
>>>>>>>>>      >>>>> «real cluster».
>>>>>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
>>>>>>>>>      >>>>>       * client and server nodes deployed on different
>>> hosts.
>>>>>>>>>      >>>>>       * thin clients perform queries from some other 
>>>>>>>>> hosts
>>>>>>>>>      >>>>>       * etc.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks of
>>> some
>>>>>>>>>     internal
>>>>>>>>>      >>>>> Ignite process
>>>>>>>>>      >>>>>       * PME
>>>>>>>>>      >>>>>       * rebalance.
>>>>>>>>>      >>>>> This means we don’t know - Do we perform 
>>>>>>>>> rebalance(or PME)
>>>>>> in
>>>>>>>>>     2.7.0 faster
>>>>>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> 3. We don’t have a solution for automated testing of
>>> Ignite
>>>>>>>>>     integration in
>>>>>>>>>      >>>>> a real-world environment:
>>>>>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
>>>>>>>>>      >>>>> I think some ML solutions also should be tested in
>>>>>> real-world
>>>>>>>>>     deployments.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Solution:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> I propose to use duck tape library from confluent 
>>>>>>>>> (apache
>>>>>> 2.0
>>>>>>>>>     license)
>>>>>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud) 
>>>>>>>>> and on
>>>>>> the
>>>>>>>>>     local
>>>>>>>>>      >>>>> environment(docker) and it works just fine.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> PoC contains following services:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>>       * Simple rebalance test:
>>>>>>>>>      >>>>>               Start 2 server nodes,
>>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>>      >>>>>               Start one more server node,
>>>>>>>>>      >>>>>               Wait for rebalance finish
>>>>>>>>>      >>>>>       * Simple Ignite-Spark integration test:
>>>>>>>>>      >>>>>               Start 1 Spark master, start 1 Spark 
>>>>>>>>> worker,
>>>>>>>>>      >>>>>               Start 1 Ignite server node
>>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>>      >>>>>               Check data in application that queries it
>>> from
>>>>>>>>>     Spark.
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> All tests are fully automated.
>>>>>>>>>      >>>>> Logs collection works just fine.
>>>>>>>>>      >>>>> You can see an example of the tests report - [4].
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> Pros:
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> * Ability to test local changes(no need to public 
>>>>>>>>> changes
>>> to
>>>>>>>>>     some remote
>>>>>>>>>      >>>>> repository or similar).
>>>>>>>>>      >>>>> * Ability to parametrize test environment(run the same
>>> tests
>>>>>>>>>     on different
>>>>>>>>>      >>>>> JDK, JVM params, config, etc.)
>>>>>>>>>      >>>>> * Isolation by default so system tests are as 
>>>>>>>>> reliable as
>>>>>>>>>     possible.
>>>>>>>>>      >>>>> * Utilities for pulling up and tearing down services
>>> easily
>>>>>>>>>     in clusters in
>>>>>>>>>      >>>>> different environments (e.g. local, custom cluster,
>>> Vagrant,
>>>>>>>>>     K8s, Mesos,
>>>>>>>>>      >>>>> Docker, cloud providers, etc.)
>>>>>>>>>      >>>>> * Easy to write unit tests for distributed systems
>>>>>>>>>      >>>>> * Adopted and successfully used by other distributed 
>>>>>>>>> open
>>>>>>>>>     source project -
>>>>>>>>>      >>>>> Apache Kafka.
>>>>>>>>>      >>>>> * Collect results (e.g. logs, console output)
>>>>>>>>>      >>>>> * Report results (e.g. expected conditions met,
>>> performance
>>>>>>>>>     results, etc.)
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> WDYT?
>>>>>>>>>      >>>>>
>>>>>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
>>>>>>>>>      >>>>> [3]
>>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> <2020-07-05--004.tar.gz>
>>>
>>>
>>>
>>
> 

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
> Have you had a chance to deploy ducktests in bare metal?
Working on servers obtaining.

On Thu, Jul 9, 2020 at 10:11 AM Max Shonichev <ms...@yandex.ru> wrote:

> Anton,
>
> well, strange thing, but clean up and rerun helped.
>
>
> Ubuntu 18.04
>
>
> ====================================================================================================
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--003
> run time:         4 minutes 44.835 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
>
> ====================================================================================================
> test_id:
>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> status:     PASS
> run time:   41.927 seconds
> {"Rebalanced in (sec)": 1.02205491065979}
>
> ----------------------------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> status:     PASS
> run time:   51.985 seconds
> {"Rebalanced in (sec)": 0.0760810375213623}
>
> ----------------------------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> status:     PASS
> run time:   1 minute 4.283 seconds
> {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst
> latency (ms)": "31035"}
>
> ----------------------------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> status:     PASS
> run time:   1 minute 13.089 seconds
> {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst
> latency (ms)": "139"}
>
> ----------------------------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> status:     PASS
> run time:   53.332 seconds
>
> ----------------------------------------------------------------------------------------------------
>
>
> MacBook
>
> ================================================================================
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-06--001
> run time:         6 minutes 58.612 seconds
> tests run:        5
> passed:           5
> failed:           0
> ignored:          0
>
> ================================================================================
> test_id:
>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> status:     PASS
> run time:   48.724 seconds
> {"Rebalanced in (sec)": 3.2574470043182373}
>
> --------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> status:     PASS
> run time:   1 minute 23.210 seconds
> {"Rebalanced in (sec)": 2.165921211242676}
>
> --------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> status:     PASS
> run time:   1 minute 12.659 seconds
> {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency
> (ms)": "31063"}
>
> --------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> status:     PASS
> run time:   1 minute 57.257 seconds
> {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst
> latency (ms)": "1010"}
>
> --------------------------------------------------------------------------------
> test_id:
>
> ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> status:     PASS
> run time:   1 minute 36.317 seconds
>
> =============
>
> while the relative proportions remain the same across Ignite versions,
> the absolute numbers on Mac and Linux differ by more than a factor of two.
>
> I'm finalizing the code for a 'local Tiden' setup for your tests. A PR
> will be ready soon.
>
> Have you had a chance to deploy ducktests on bare metal?
>
>
>
> On 06.07.2020 14:27, Anton Vinogradov wrote:
> > Max,
> >
> > Thanks for the check!
> >
> >> Is it OK for those tests to fail?
> > No.
> > I see some really strange things in the logs.
> > Looks like a concurrent ducktests run started unexpected services,
> > and this broke the tests.
> > Could you please clean up the docker (use clean-up script [1]).
> > Compile sources (use script [2]) and rerun the tests.
> >
> > [1]
> >
> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> > [2]
> >
> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
> >
> > On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org>
> wrote:
> >
> >> Hello, Maxim.
> >>
> >> Thanks for writing down the minutes.
> >>
> >> There is no such thing as «Nikolay team» on the dev-list.
> >> I propose to focus on product requirements and what we want to gain from
> >> the framework instead of taking into account the needs of some team.
> >>
> >> Can you, please, write down your version of requirements so we can
> reach a
> >> consensus on that and therefore move to the discussion of the
> >> implementation?
> >>
> >>> 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru>
> написал(а):
> >>>
> >>> Yes, Denis,
> >>>
> >>> common ground seems to be as follows:
> >>> Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
> >> over physical hosts and share benchmark results. In the meantime, while
> I
> >> strongly believe that dockerized approach to benchmarking is a road to
> >> misleading and false positives, I'll prepare a PoC of Tiden in
> dockerized
> >> environment to support 'fast development prototyping' usecase Nikolay
> team
> >> insist on. It should be a matter of few days.
> >>>
> >>> As a side note, I've run Anton PoC locally and would like to have some
> >> comments about results:
> >>>
> >>> Test system: Ubuntu 18.04, docker 19.03.6
> >>> Test commands:
> >>>
> >>>
> >>> git clone -b ignite-ducktape git@github.com:
> anton-vinogradov/ignite.git
> >>> cd ignite
> >>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true
> >> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> >>> cd modules/ducktests/tests/docker
> >>> ./run_tests.sh
> >>>
> >>> Test results:
> >>>
> >>
> ====================================================================================================
> >>> SESSION REPORT (ALL TESTS)
> >>> ducktape version: 0.7.7
> >>> session_id:       2020-07-05--004
> >>> run time:         7 minutes 36.360 seconds
> >>> tests run:        5
> >>> passed:           3
> >>> failed:           2
> >>> ignored:          0
> >>>
> >>
> ====================================================================================================
> >>> test_id:
> >>
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> >>> status:     FAIL
> >>> run time:   3 minutes 12.232 seconds
> >>>
> >>
> ----------------------------------------------------------------------------------------------------
> >>> test_id:
> >>
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> >>> status:     FAIL
> >>> run time:   1 minute 33.076 seconds
> >>>
> >>>
> >>> Is it OK for those tests to fail? Attached is full test report
> >>>
> >>>
> >>> On 02.07.2020 17:46, Denis Magda wrote:
> >>>> Folks,
> >>>> Please share the summary of that Slack conversation here for records
> >> once
> >>>> you find common ground.
> >>>> -
> >>>> Denis
> >>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
> >> wrote:
> >>>>> Igniters.
> >>>>>
> >>>>> All who are interested in integration testing framework discussion
> are
> >>>>> welcome into slack channel -
> >>>>>
> >>
> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> >>>>>
> >>>>>
> >>>>>
> >>>>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org>
> написал(а):
> >>>>>>
> >>>>>> Max,
> >>>>>> Thanks for joining us.
> >>>>>>
> >>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>> dependencies being deployed by external scripts.
> >>>>>> No. It is important to distinguish development, deploy, and
> >>>>> orchestration.
> >>>>>> All-in-one solutions have extremely limited usability.
> >>>>>> As to Ducktests:
> >>>>>> Docker is responsible for deployments during development.
> >>>>>> CI/CD is responsible for deployments during release and nightly
> >> checks.
> >>>>> It's up to the team to choose AWS, VM, BareMetal, and even OS.
> >>>>>> Ducktape is responsible for orchestration.
> >>>>>>
> >>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>> fashion,
> >>>>>>> while ducktape internally does all actions sequentially.
> >>>>>> No. Ducktape may start any service in parallel. See Pme-free
> benchmark
> >>>>> [1] for details.
> >>>>>>
> >>>>>>> if we used ducktape solution we would have to instead prepare some
> >>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >> with
> >>>>>>> Ansible or Chef.
> >>>>>> Sure, because a way of deploy depends on infrastructure.
> >>>>>> How can we be sure that OS we use and the restrictions we have will
> be
> >>>>> compatible with Tiden?
> >>>>>>
> >>>>>>> You have solved this deficiency with docker by putting all
> >> dependencies
> >>>>>>> into one uber-image ...
> >>>>>> and
> >>>>>>> I guess we all know about docker hyped ability to run over
> >> distributed
> >>>>>>> virtual networks.
> >>>>>> It is very important not to confuse the test's development (docker
> >> image
> >>>>> you're talking about) and real deployment.
> >>>>>>
> >>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> >>>>>> All actions can be performed in parallel.
> >>>>>> See how Ducktests [2] starts cluster in parallel for example.
> >>>>>>
> >>>>>> [1]
> >>>>>
> >>
> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> >>>>>> [2]
> >>>>>
> >>
> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> >>>>>>
> >>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhikov@apache.org
> >
> >>>>> wrote:
> >>>>>> Hello, Maxim.
> >>>>>>
> >>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>> dependencies being deployed by external scripts
> >>>>>>
> >>>>>> Why do you think that maintaining deploy scripts coupled with the
> >>>>> testing framework is an advantage?
> >>>>>> I thought we want to see and maintain deployment scripts separate
> from
> >>>>> the testing framework.
> >>>>>>
> >>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>
> >>>>>> Can you, please, clarify, what actions do you have in mind?
> >>>>>> And why we want to execute them concurrently?
> >>>>>> Ignite node start, Client application execution can be done
> >> concurrently
> >>>>> with the ducktape approach.
> >>>>>>
> >>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> with
> >>>>> Ansible or Chef
> >>>>>>
> >>>>>> We shouldn’t take some user approach as an argument in this
> >> discussion.
> >>>>> Let’s discuss a general approach for all users of the Ignite. Anyway,
> >> what
> >>>>> is wrong with the external deployment script approach?
> >>>>>>
> >>>>>> We, as a community, should provide several ways to run integration
> >> tests
> >>>>> out-of-the-box AND the ability to customize deployment regarding the
> >> user
> >>>>> landscape.
> >>>>>>
> >>>>>>> You have solved this deficiency with docker by putting all
> >>>>> dependencies into one uber-image and that looks like simple and
> elegant
> >>>>> solution however, that effectively limits you to single-host testing.
> >>>>>>
> >>>>>> Docker image should be used only by the Ignite developers to test
> >>>>> something locally.
> >>>>>> It’s not intended for some real-world testing.
> >>>>>>
> >>>>>> The main issue with Tiden that I see is that it is tested and
> >>>>>> maintained as a closed-source solution.
> >>>>>> This can lead to hard-to-solve problems when we start using and
> >>>>>> maintaining it as an open-source solution.
> >>>>>> For example, how many developers have used Tiden? And how many of
> >>>>>> those developers were not authors of Tiden itself?
> >>>>>>
> >>>>>>
> >>>>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
> >>>>> написал(а):
> >>>>>>>
> >>>>>>> Anton, Nikolay,
> >>>>>>>
> >>>>>>> Let's agree on what we are arguing about: whether it is about "like
> >> or
> >>>>> don't like" or about technical properties of suggested solutions.
> >>>>>>>
> >>>>>>> If it is about likes and dislikes, then the whole discussion is
> >>>>> meaningless. However, I hope together we can analyse pros and cons
> >>>>> carefully.
> >>>>>>>
> >>>>>>> As far as I can understand now, two main differences between
> ducktape
> >>>>> and tiden is that:
> >>>>>>>
> >>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>> dependencies being deployed by external scripts.
> >>>>>>>
> >>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>>
> >>>>>>> As for me, these are very important properties for distributed
> >> testing
> >>>>> framework.
> >>>>>>>
> >>>>>>> First property let us easily reuse tiden in existing
> infrastructures,
> >>>>> for example, during Zookeeper IEP testing at Sberbank site we used
> the
> >> same
> >>>>> tiden scripts that we use in our lab, the only change was putting a
> >> list of
> >>>>> hosts into config.
> >>>>>>>
> >>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> with
> >>>>> Ansible or Chef.
> >>>>>>>
> >>>>>>>
> >>>>>>> You have solved this deficiency with docker by putting all
> >>>>> dependencies into one uber-image and that looks like simple and
> elegant
> >>>>> solution,
> >>>>>>> however, that effectively limits you to single-host testing.
> >>>>>>>
> >>>>>>> I guess we all know about docker hyped ability to run over
> >> distributed
> >>>>> virtual networks. We used to go that way, but quickly found that it
> is
> >> more
> >>>>> of the hype than real work. In real environments, there are problems
> >> with
> >>>>> routing, DNS, multicast and broadcast traffic, and many others, that
> >> turn
> >>>>> docker-based distributed solution into a fragile hard-to-maintain
> >> monster.
> >>>>>>>
> >>>>>>> Please, if you believe otherwise, perform a run of your PoC over at
> >>>>> least two physical hosts and share results with us.
> >>>>>>>
> >>>>>>> If you consider that one physical docker host is enough, please,
> >> don't
> >>>>> overlook that we want to run real scale scenarios, with 50-100 cache
> >>>>> groups, persistence enabled and a millions of keys loaded.
> >>>>>>>
> >>>>>>> Practical limit for such configurations is 4-6 nodes per single
> >>>>> physical host. Otherwise, tests become flaky due to resource
> >> starvation.
> >>>>>>>
> >>>>>>> Please, if you believe otherwise, perform at least a 10 of runs of
> >>>>> your PoC with other tests running at TC (we're targeting TeamCity,
> >> right?)
> >>>>> and share results so we could check if the numbers are reproducible.
> >>>>>>>
> >>>>>>> I stress this once more: functional integration tests are OK to run
> >> in
> >>>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
> >>>>>>>
> >>>>>>>
> >>>>>>> Second property let us write tests that require real-parallel
> actions
> >>>>> over hosts.
> >>>>>>>
> >>>>>>> For example, agreed scenario for PME benchmarkduring "PME
> >> optimization
> >>>>> stream" was as follows:
> >>>>>>>
> >>>>>>>   - 10 server nodes, preloaded with 1M of keys
> >>>>>>>   - 4 client nodes perform transactional load  (client nodes
> >> physically
> >>>>> separated from server nodes)
> >>>>>>>   - during load:
> >>>>>>>   -- 5 server nodes stopped in parallel
> >>>>>>>   -- after 1 minute, all 5 nodes are started in parallel
> >>>>>>>   - load stopped, logs are analysed for exchange times.
> >>>>>>>
> >>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
> >>>>> then partition map exchange merge would not happen and we could not
> >> have
> >>>>> measured PME optimizations for that case.
> >>>>>>>
> >>>>>>>
> >>>>>>> These are limitations of ducktape that we believe as a more
> important
> >>>>>>> argument "against" than you provide "for".
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> >>>>>>>> Folks,
> >>>>>>>> First, I've created PR [1] with ducktests improvements
> >>>>>>>> PR contains the following changes
> >>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >>>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
> >> 2.8)
> >>>>>>>> - Global refactoring
> >>>>>>>> -- benchmarks javacode simplification
> >>>>>>>> -- services python and java classes code deduplication
> >>>>>>>> -- fail-fast checks for java and python (eg. application should
> >>>>> explicitly write it finished with success)
> >>>>>>>> -- simple results extraction from tests and benchmarks
> >>>>>>>> -- javacode now configurable from tests/benchmarks
> >>>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
> >>>>> operation and log results)
> >>>>>>>> -- docker volume now marked as delegated to increase execution
> speed
> >>>>> for mac & win users
> >>>>>>>> -- Ignite cluster now start in parallel (start speed-up)
> >>>>>>>> -- Ignite can be configured at test/benchmark
> >>>>>>>> - full and module assembly scripts added
> >>>>>>> Great job done! But let me remind one of Apache Ignite principles:
> >>>>>>> week of thinking save months of development.
> >>>>>>>
> >>>>>>>
> >>>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
> >>>>> integration) as a target "PoC check & real topology benchmarking
> tool".
> >>>>>>>> Ducktape pros
> >>>>>>>> - Developed for distributed system by distributed system
> developers.
> >>>>>>> So does Tiden
> >>>>>>>
> >>>>>>>> - Developed since 2014, stable.
> >>>>>>> Tiden is also pretty stable, and development start date is not a
> good
> >>>>> argument, for example pytest is since 2004, pytest-xdist (plugin for
> >>>>> distributed testing) is since 2010, but we don't see it as a
> >> alternative at
> >>>>> all.
> >>>>>>>
> >>>>>>>> - Proven usability by usage at Kafka.
> >>>>>>> Tiden is proven usable by usage at GridGain and Sberbank
> deployments.
> >>>>>>> Core, storage, sql and tx teams use benchmark results provided by
> >>>>> Tiden on a daily basis.
> >>>>>>>
> >>>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great
> example
> >>>>> pack.
> >>>>>>> We'll donate some of our suites to Ignite as I've mentioned in
> >>>>> previous letter.
> >>>>>>>
> >>>>>>>> - Built-in Docker support for rapid development and checks.
> >>>>>>> False, there's no specific 'docker support' in ducktape itself, you
> >>>>> just wrap it in docker by yourself, because ducktape is lacking
> >> deployment
> >>>>> abilities.
> >>>>>>>
> >>>>>>>> - Great for CI automation.
> >>>>>>> False, there's no specific CI-enabled features in ducktape. Tiden,
> on
> >>>>> the other hand, provide generic xUnit reporting format, which is
> >> supported
> >>>>> by both TeamCity and Jenkins. Also, instead of using private keys,
> >> Tiden
> >>>>> can use SSH agent, which is also great for CI, because both
> >>>>>>> TeamCity and Jenkins store keys in secret storage available only
> for
> >>>>> ssh-agent and only for the time of the test.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> As an additional motivation, at least 3 teams
> >>>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
> >> Zabbix
> >>>>> speed-up))
> >>>>>>>> - Ignite SE Plugins team (to check plugin's features does not
> >>>>> slow-down or broke AI features)
> >>>>>>>> - Ignite SE QA team (to append already developed
> smoke/load/failover
> >>>>> tests to AI codebase)
> >>>>>>>
> >>>>>>> Please, before recommending your tests to other teams, provide
> proofs
> >>>>>>> that your tests are reproducible in real environment.
> >>>>>>>
> >>>>>>>
> >>>>>>>> now, wait for ducktest merge to start checking cases they working
> on
> >>>>> in AI way.
> >>>>>>>> Thoughts?
> >>>>>>> Let us together review both solutions, we'll try to run your tests
> in
> >>>>> our lab, and you'll try to at least checkout tiden and see if same
> >> tests
> >>>>> can be implemented with it?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> [1] https://github.com/apache/ignite/pull/7967
> >>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
> >> nizhikov@apache.org
> >>>>> <ma...@apache.org>> wrote:
> >>>>>>>>     Hello, Maxim.
> >>>>>>>>     Thank you for so detailed explanation.
> >>>>>>>>     Can we put the content of this discussion somewhere on the
> wiki?
> >>>>>>>>     So It doesn’t get lost.
> >>>>>>>>     I divide the answer in several parts. From the requirements to
> >> the
> >>>>>>>>     implementation.
> >>>>>>>>     So, if we agreed on the requirements we can proceed with the
> >>>>>>>>     discussion of the implementation.
> >>>>>>>>     1. Requirements:
> >>>>>>>>     The main goal I want to achieve is *reproducibility* of the
> >> tests.
> >>>>>>>>     I’m sick and tired with the zillions of flaky, rarely failed,
> and
> >>>>>>>>     almost never failed tests in Ignite codebase.
> >>>>>>>>     We should start with the simplest scenarios that will be as
> >>>>> reliable
> >>>>>>>>     as steel :)
> >>>>>>>>     I want to know for sure:
> >>>>>>>>        - Is this PR makes rebalance quicker or not?
> >>>>>>>>        - Is this PR makes PME quicker or not?
> >>>>>>>>     So, your description of the complex test scenario looks as a
> next
> >>>>>>>>     step to me.
> >>>>>>>>     Anyway, It’s cool we already have one.
> >>>>>>>>     The second goal is to have a strict test lifecycle as we have
> in
> >>>>>>>>     JUnit and similar frameworks.
> >>>>>>>>      > It covers production-like deployment and running a
> scenarios
> >>>>> over
> >>>>>>>>     a single database instance.
> >>>>>>>>     Do you mean «single cluster» or «single host»?
> >>>>>>>>     2. Existing tests:
> >>>>>>>>      > A Combinator suite allows to run set of operations
> >> concurrently
> >>>>>>>>     over given database instance.
> >>>>>>>>      > A Consumption suite allows to run a set production-like
> >> actions
> >>>>>>>>     over given set of Ignite/GridGain versions and compare test
> >> metrics
> >>>>>>>>     across versions
> >>>>>>>>      > A Yardstick suite
> >>>>>>>>      > A Stress suite that simulates hardware environment
> degradation
> >>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
> >>>>> functional
> >>>>>>>>     regression testing
> >>>>>>>>      > Regression
> >>>>>>>>     Great news that we already have so many choices for testing!
> >>>>>>>>     Mature test base is a big +1 for Tiden.
> >>>>>>>>     3. Comparison:
> >>>>>>>>      > Criteria: Test configuration
> >>>>>>>>      > Ducktape: single JSON string for all tests
> >>>>>>>>      > Tiden: any number of YaML config files, command line option
> >> for
> >>>>>>>>     fine-grained test configuration, ability to select/modify
> tests
> >>>>>>>>     behavior based on Ignite version.
> >>>>>>>>     1. Many YAML files can be hard to maintain.
> >>>>>>>>     2. In ducktape, you can set parameters via the «--parameters»
> option.
> >>>>>>>>     Please, take a look at the doc [1]
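
     For illustration, a parametrized run following that doc might look like
     the line below (the test path and the "version" parameter name are only
     assumptions used as an example):

         ducktape ignitetest/tests/benchmarks/add_node_rebalance_test.py \
             --parameters '{"version": "dev"}'
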
> >>>>>>>>      > Criteria: Cluster control
> >>>>>>>>      > Tiden: additionally can address cluster as a whole and
> execute
> >>>>>>>>     remote commands in parallel.
> >>>>>>>>     It seems we implement this ability in the PoC, already.
> >>>>>>>>      > Criteria: Test assertions
> >>>>>>>>      > Tiden: simple asserts, also few customized assertion
> helpers.
> >>>>>>>>      > Ducktape: simple asserts.
> >>>>>>>>     Can you, please, be more specific.
> >>>>>>>>     What helpers do you have in mind?
> >>>>>>>>     Ducktape has asserts that wait for log-file messages or for a
> >>>>>>>>     process to finish.
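
     As a small sketch of such an assert (the "ignite" service object and its
     alive(node) check below are assumptions for illustration), ducktape's
     wait_until helper can be used like this:

         from ducktape.utils.util import wait_until

         # block until the condition holds or the timeout expires,
         # then fail the test with err_msg
         wait_until(lambda: ignite.alive(node),
                    timeout_sec=60,
                    backoff_sec=1,
                    err_msg="Ignite node did not join the topology in 60 seconds")
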
> >>>>>>>>      > Criteria: Test reporting
> >>>>>>>>      > Ducktape: limited to its own text/HTML format
> >>>>>>>>     Ducktape have
> >>>>>>>>     1. Text reporter
> >>>>>>>>     2. Customizable HTML reporter
> >>>>>>>>     3. JSON reporter.
> >>>>>>>>     We can show JSON with the any template or tool.
> >>>>>>>>      > Criteria: Provisioning and deployment
> >>>>>>>>      > Ducktape: can provision subset of hosts from cluster for
> test
> >>>>>>>>     needs. However, that means, that test can’t be scaled without
> >> test
> >>>>>>>>     code changes. Does not do any deploy, relies on external
> means,
> >>>>> e.g.
> >>>>>>>>     pre-packaged in docker image, as in PoC.
> >>>>>>>>     This is not true.
> >>>>>>>>     1. We can set explicit test parameters(node number) via
> >> parameters.
> >>>>>>>>     We can increase client count of cluster size without test code
> >>>>> changes.
> >>>>>>>>     2. We have many choices for the test environment. These
> choices
> >> are
> >>>>>>>>     tested and used in other projects:
> >>>>>>>>              * docker
> >>>>>>>>              * vagrant
> >>>>>>>>              * private cloud(ssh access)
> >>>>>>>>              * ec2
> >>>>>>>>     Please, take a look at Kafka documentation [2]
> >>>>>>>>      > I can continue more on this, but it should be enough for
> now:
> >>>>>>>>     We need to go deeper! :)
> >>>>>>>>     [1]
> >>>>>>>>
> >>>>>
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>>>>>>>     [2]
> >>>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>>>>>>>      > 9 июня 2020 г., в 17:25, Max A. Shonichev <
> mshonich@yandex.ru
> >>>>>>>>     <ma...@yandex.ru>> написал(а):
> >>>>>>>>      >
> >>>>>>>>      > Greetings, Nikolay,
> >>>>>>>>      >
> >>>>>>>>      > First of all, thank you for you great effort preparing PoC
> of
> >>>>>>>>     integration testing to Ignite community.
> >>>>>>>>      >
> >>>>>>>>      > It’s a shame Ignite did not have at least some such tests
> yet,
> >>>>>>>>     however, GridGain, as a major contributor to Apache Ignite
> had a
> >>>>>>>>     profound collection of in-house tools to perform integration
> and
> >>>>>>>>     performance testing for years already and while we slowly
> >> consider
> >>>>>>>>     sharing our expertise with the community, your initiative
> makes
> >> us
> >>>>>>>>     drive that process a bit faster, thanks a lot!
> >>>>>>>>      >
> >>>>>>>>      > I reviewed your PoC and want to share a little about what
> we
> >> do
> >>>>>>>>     on our part, why and how, hope it would help community take
> >> proper
> >>>>>>>>     course.
> >>>>>>>>      >
> >>>>>>>>      > First I’ll do a brief overview of what decisions we made
> and
> >>>>> what
> >>>>>>>>     we do have in our private code base, next I’ll describe what
> we
> >>>>> have
> >>>>>>>>     already donated to the public and what we plan public next,
> then
> >>>>>>>>     I’ll compare both approaches highlighting deficiencies in
> order
> >> to
> >>>>>>>>     spur public discussion on the matter.
> >>>>>>>>      >
> >>>>>>>>      > It might seem strange to use Python to run Bash to run Java
> >>>>>>>>     applications because that introduces IT industry best of
> breed’ –
> >>>>>>>>     the Python dependency hell – to the Java application code
> base.
> >> The
> >>>>>>>>     only strangest decision one can made is to use Maven to run
> >> Docker
> >>>>>>>>     to run Bash to run Python to run Bash to run Java, but
> desperate
> >>>>>>>>     times call for desperate measures I guess.
> >>>>>>>>      >
> >>>>>>>>      > There are Java-based solutions for integration testing
> exists,
> >>>>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they might
> go
> >>>>> well
> >>>>>>>>     for Ignite community CI pipelines by them selves. But we also
> >>>>> wanted
> >>>>>>>>     to run performance tests and benchmarks, like the dreaded PME
> >>>>>>>>     benchmark, and this is solved by totally different set of
> tools
> >> in
> >>>>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> >>>>>>>>      >
> >>>>>>>>      > Speaking specifically about benchmarking, Apache Ignite
> >>>>> community
> >>>>>>>>     already has Yardstick [6], and there’s nothing wrong with
> writing
> >>>>>>>>     PME benchmark using Yardstick, but we also wanted to be able
> to
> >> run
> >>>>>>>>     scenarios like this:
> >>>>>>>>      > - put an X load to a Ignite database;
> >>>>>>>>      > - perform an Y set of operations to check how Ignite copes
> >> with
> >>>>>>>>     operations under load.
> >>>>>>>>      >
> >>>>>>>>      > And yes, we also wanted applications under test be deployed
> >>>>> ‘like
> >>>>>>>>     in a production’, e.g. distributed over a set of hosts. This
> >> arises
> >>>>>>>>     questions about provisioning and nodes affinity which I’ll
> cover
> >> in
> >>>>>>>>     detail later.
> >>>>>>>>      >
> >>>>>>>>      > So we decided to put a little effort to build a simple
> tool to
> >>>>>>>>     cover different integration and performance scenarios, and
> our QA
> >>>>>>>>     lab first attempt was PoC-Tester [7], currently open source
> for
> >> all
> >>>>>>>>     but for reporting web UI. It’s a quite simple to use 95%
> >> Java-based
> >>>>>>>>     tool targeted to be run on a pre-release QA stage.
> >>>>>>>>      >
> >>>>>>>>      > It covers production-like deployment and running a
> scenarios
> >>>>> over
> >>>>>>>>     a single database instance. PoC-Tester scenarios consists of a
> >>>>>>>>     sequence of tasks running sequentially or in parallel. After
> all
> >>>>>>>>     tasks complete, or at any time during test, user can run logs
> >>>>>>>>     collection task, logs are checked against exceptions and a
> >> summary
> >>>>>>>>     of found issues and task ops/latency statistics is generated
> at
> >> the
> >>>>>>>>     end of scenario. One of the main PoC-Tester features is its
> >>>>>>>>     fire-and-forget approach to task managing. That is, you can
> >> deploy
> >>>>> a
> >>>>>>>>     grid and left it running for weeks, periodically firing some
> >> tasks
> >>>>>>>>     onto it.
> >>>>>>>>      >
> >>>>>>>>      > During earliest stages of PoC-Tester development it becomes
> >>>>> quite
> >>>>>>>>     clear that Java application development is a tedious process
> and
> >>>>>>>>     architecture decisions you take during development are slow
> and
> >>>>> hard
> >>>>>>>>     to change.
> >>>>>>>>      > For example, scenarios like this
> >>>>>>>>      > - deploy two instances of GridGain with master-slave data
> >>>>>>>>     replication configured;
> >>>>>>>>      > - put a load on master;
> >>>>>>>>      > - perform checks on slave,
> >>>>>>>>      > or like this:
> >>>>>>>>      > - preload a 1Tb of data by using your favorite tool of
> choice
> >> to
> >>>>>>>>     an Apache Ignite of version X;
> >>>>>>>>      > - run a set of functional tests running Apache Ignite
> version
> >> Y
> >>>>>>>>     over preloaded data,
> >>>>>>>>      > do not fit well in the PoC-Tester workflow.
> >>>>>>>>      >
> >>>>>>>>      > So, this is why we decided to use Python as a generic
> >> scripting
> >>>>>>>>     language of choice.
> >>>>>>>>      >
> >>>>>>>>      > Pros:
> >>>>>>>>      > - quicker prototyping and development cycles
> >>>>>>>>      > - easier to find DevOps/QA engineer with Python skills than
> >> one
> >>>>>>>>     with Java skills
> >>>>>>>>      > - used extensively all over the world for DevOps/CI
> pipelines
> >>>>> and
> >>>>>>>>     thus has rich set of libraries for all possible integration
> uses
> >>>>> cases.
> >>>>>>>>      >
> >>>>>>>>      > Cons:
> >>>>>>>>      > - Nightmare with dependencies. Better stick to specific
> >>>>>>>>     language/libraries version.
> >>>>>>>>      >
> >>>>>>>>      > Comparing alternatives for Python-based testing framework
> we
> >>>>> have
> >>>>>>>>     considered following requirements, somewhat similar to what
> >> you’ve
> >>>>>>>>     mentioned for Confluent [8] previously:
> >>>>>>>>      > - should be able run locally or distributed (bare metal or
> in
> >>>>> the
> >>>>>>>>     cloud)
> >>>>>>>>      > - should have built-in deployment facilities for
> applications
> >>>>>>>>     under test
> >>>>>>>>      > - should separate test configuration and test code
> >>>>>>>>      > -- be able to easily reconfigure tests by simple
> configuration
> >>>>>>>>     changes
> >>>>>>>>      > -- be able to easily scale test environment by simple
> >>>>>>>>     configuration changes
> >>>>>>>>      > -- be able to perform regression testing by simple
> switching
> >>>>>>>>     artifacts under test via configuration
> >>>>>>>>      > -- be able to run tests with different JDK version by
> simple
> >>>>>>>>     configuration changes
> >>>>>>>>      > - should have human readable reports and/or reporting tools
> >>>>>>>>     integration
> >>>>>>>>      > - should allow simple test progress monitoring, one does
> not
> >>>>> want
> >>>>>>>>     to run 6-hours test to find out that application actually
> crashed
> >>>>>>>>     during first hour.
> >>>>>>>>      > - should allow parallel execution of test actions
> >>>>>>>>      > - should have clean API for test writers
> >>>>>>>>      > -- clean API for distributed remote commands execution
> >>>>>>>>      > -- clean API for deployed applications start / stop and
> other
> >>>>>>>>     operations
> >>>>>>>>      > -- clean API for performing check on results
> >>>>>>>>      > - should be open source or at least source code should
> allow
> >>>>> ease
> >>>>>>>>     change or extension
> >>>>>>>>      >
> >>>>>>>>      > Back at that time we found no better alternative than to
> write
> >>>>>>>>     our own framework, and here goes Tiden [9] as GridGain
> framework
> >> of
> >>>>>>>>     choice for functional integration and performance testing.
> >>>>>>>>      >
> >>>>>>>>      > Pros:
> >>>>>>>>      > - solves all the requirements above
> >>>>>>>>      > Cons (for Ignite):
> >>>>>>>>      > - (currently) closed GridGain source
> >>>>>>>>      >
> >>>>>>>>      > On top of Tiden we’ve built a set of test suites, some of
> >> which
> >>>>>>>>     you might have heard already.
> >>>>>>>>      >
> >>>>>>>>      > A Combinator suite allows to run set of operations
> >> concurrently
> >>>>>>>>     over given database instance. Proven to find at least 30+ race
> >>>>>>>>     conditions and NPE issues.
> >>>>>>>>      >
> >>>>>>>>      > A Consumption suite allows to run a set production-like
> >> actions
> >>>>>>>>     over given set of Ignite/GridGain versions and compare test
> >> metrics
> >>>>>>>>     across versions, like heap/disk/CPU consumption, time to
> perform
> >>>>>>>>     actions, like client PME, server PME, rebalancing time, data
> >>>>>>>>     replication time, etc.
> >>>>>>>>      >
> >>>>>>>>      > A Yardstick suite is a thin layer of Python glue code to
> run
> >>>>>>>>     Apache Ignite pre-release benchmarks set. Yardstick itself
> has a
> >>>>>>>>     mediocre deployment capabilities, Tiden solves this easily.
> >>>>>>>>      >
> >>>>>>>>      > A Stress suite that simulates hardware environment
> degradation
> >>>>>>>>     during testing.
> >>>>>>>>      >
> >>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
> >>>>> functional
> >>>>>>>>     regression testing of GridGain Ultimate Edition features like
> >>>>>>>>     snapshots, security, data replication, rolling upgrades, etc.
> >>>>>>>>      >
> >>>>>>>>      > A Regression and some IEPs testing suites, like IEP-14,
> >> IEP-15,
> >>>>>>>>     etc, etc, etc.
> >>>>>>>>      >
> >>>>>>>>      > Most of the suites above use another in-house developed
> Java
> >>>>> tool
> >>>>>>>>     – PiClient – to perform actual loading and miscellaneous
> >> operations
> >>>>>>>>     with Ignite under test. We use py4j Python-Java gateway
> library
> >> to
> >>>>>>>>     control PiClient instances from the tests.
> >>>>>>>>      >
> >>>>>>>>      > When we considered CI, we put TeamCity out of scope,
> because
> >>>>>>>>     distributed integration and performance tests tend to run for
> >> hours
> >>>>>>>>     and TeamCity agents are scarce and costly resource. So,
> bundled
> >>>>> with
> >>>>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines and
> >>>>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward
> aggregates
> >>>>> test
> >>>>>>>>     run reports across versions and has built in visualization
> >> support
> >>>>>>>>     for Combinator suite.
> >>>>>>>>      >
> >>>>>>>>      > All of the above is currently closed source, but we plan to
> >> make
> >>>>>>>>     it public for community, and publishing Tiden core [9] is the
> >> first
> >>>>>>>>     step on that way. You can review some examples of using Tiden
> for
> >>>>>>>>     tests at my repository [11], for start.
> >>>>>>>>      >
> >>>>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Language
> >>>>>>>>      > Tiden: Python, 3.7
> >>>>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> >>>>>>>>     compatible, but actually can’t work with Python 3.7 due to
> broken
> >>>>>>>>     Zmq dependency.
> >>>>>>>>      > Comment: Python 3.7 has a much better support for
> async-style
> >>>>>>>>     code which might be crucial for distributed application
> testing.
> >>>>>>>>      > Score: Tiden: 1, Ducktape: 0
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Test writers API
> >>>>>>>>      > Supported integration test framework concepts are basically
> >> the
> >>>>> same:
> >>>>>>>>      > - a test controller (test runner)
> >>>>>>>>      > - a cluster
> >>>>>>>>      > - a node
> >>>>>>>>      > - an application (a service in Ducktape terms)
> >>>>>>>>      > - a test
> >>>>>>>>      > Score: Tiden: 5, Ducktape: 5
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Tests selection and run
> >>>>>>>>      > Ducktape: suite-package-class-method level selection,
> internal
> >>>>>>>>     scheduler allows to run tests in suite in parallel.
> >>>>>>>>      > Tiden: also suite-package-class-method level selection,
> >>>>>>>>     additionally allows selecting subset of tests by attribute,
> >>>>> parallel
> >>>>>>>>     runs not built in, but allows merging test reports after
> >> different
> >>>>> runs.
> >>>>>>>>      > Score: Tiden: 2, Ducktape: 2
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Test configuration
> >>>>>>>>      > Ducktape: single JSON string for all tests
> >>>>>>>>      > Tiden: any number of YaML config files, command line option
> >> for
> >>>>>>>>     fine-grained test configuration, ability to select/modify
> tests
> >>>>>>>>     behavior based on Ignite version.
> >>>>>>>>      > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Cluster control
> >>>>>>>>      > Ducktape: allow execute remote commands by node granularity
> >>>>>>>>      > Tiden: additionally can address cluster as a whole and
> execute
> >>>>>>>>     remote commands in parallel.
> >>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Logs control
> >>>>>>>>      > Both frameworks have similar builtin support for remote
> logs
> >>>>>>>>     collection and grepping. Tiden has built-in plugin that can
> zip,
> >>>>>>>>     collect arbitrary log files from arbitrary locations at
> >>>>>>>>     test/module/suite granularity and unzip if needed, also
> >> application
> >>>>>>>>     API to search / wait for messages in logs. Ducktape allows
> each
> >>>>>>>>     service declare its log files location (seemingly does not
> >> support
> >>>>>>>>     logs rollback), and a single entrypoint to collect service
> logs.
> >>>>>>>>      > Score: Tiden: 1, Ducktape: 1
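
     As a sketch of the ducktape side (the service name and log path are
     assumptions for illustration), a service declares which remote files the
     framework should collect after a test via its "logs" attribute:

         from ducktape.services.service import Service

         class IgniteService(Service):
             # ducktape gathers every file listed here from each node
             # once the test finishes
             logs = {
                 "ignite_log": {
                     "path": "/mnt/ignite/work/log",  # hypothetical location
                     "collect_default": True,
                 }
             }
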
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Test assertions
> >>>>>>>>      > Tiden: simple asserts, also few customized assertion
> helpers.
> >>>>>>>>      > Ducktape: simple asserts.
> >>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Test reporting
> >>>>>>>>      > Ducktape: limited to its own text/html format
> >>>>>>>>      > Tiden: provides text report, yaml report for reporting
> tools
> >>>>>>>>     integration, XML xUnit report for integration with
> >>>>> Jenkins/TeamCity.
> >>>>>>>>      > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Provisioning and deployment
> >>>>>>>>      > Ducktape: can provision subset of hosts from cluster for
> test
> >>>>>>>>     needs. However, that means, that test can’t be scaled without
> >> test
> >>>>>>>>     code changes. Does not do any deploy, relies on external
> means,
> >>>>> e.g.
> >>>>>>>>     pre-packaged in docker image, as in PoC.
> >>>>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for the
> >>>>> test.
> >>>>>>>>     Provisioning should be done by external means. However,
> provides
> >> a
> >>>>>>>>     conventional automated deployment routines.
> >>>>>>>>      > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > Criteria: Documentation and Extensibility
> >>>>>>>>      > Tiden: current API documentation is limited, should change
> as
> >> we
> >>>>>>>>     go open source. Tiden is easily extensible via hooks and
> plugins,
> >>>>>>>>     see example Maven plugin and Gatling application at [11].
> >>>>>>>>      > Ducktape: basic documentation at readthedocs.io
> >>>>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core is
> >>>>>>>>     tightly coupled and hard to change. The only possible
> extension
> >>>>>>>>     mechanism is fork-and-rewrite.
> >>>>>>>>      > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>      >
> >>>>>>>>      > I can continue more on this, but it should be enough for
> now:
> >>>>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
> >>>>>>>>      >
> >>>>>>>>      > Time for discussion!
> >>>>>>>>      >
> >>>>>>>>      > ---
> >>>>>>>>      > [1] - https://www.testcontainers.org/
> >>>>>>>>      > [2] - http://arquillian.org/guides/getting_started/
> >>>>>>>>      > [3] - https://jmeter.apache.org/index.html
> >>>>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>>>      > [5] - https://gatling.io/docs/current/
> >>>>>>>>      > [6] - https://github.com/gridgain/yardstick
> >>>>>>>>      > [7] - https://github.com/gridgain/poc-tester
> >>>>>>>>      > [8] -
> >>>>>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>>>>>>>      > [9] - https://github.com/gridgain/tiden
> >>>>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
> >>>>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
> >>>>>>>>      >
> >>>>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>>>>>>>      >> Hello,
> >>>>>>>>      >>
> >>>>>>>>      >> Branch with duck tape created -
> >>>>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>      >>
> >>>>>>>>      >> Any who are willing to contribute to PoC are welcome.
> >>>>>>>>      >>
> >>>>>>>>      >>
> >>>>>>>>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
> >>>>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
> >>>>> написал(а):
> >>>>>>>>      >>>
> >>>>>>>>      >>> Hello, Denis.
> >>>>>>>>      >>>
> >>>>>>>>      >>> There is no rush with these improvements.
> >>>>>>>>      >>> We can wait for Maxim proposal and compare two solutions
> :)
> >>>>>>>>      >>>
> >>>>>>>>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
> >>>>>>>>     <ma...@apache.org>> написал(а):
> >>>>>>>>      >>>>
> >>>>>>>>      >>>> Hi Nikolay,
> >>>>>>>>      >>>>
> >>>>>>>>      >>>> Thanks for kicking off this conversation and sharing
> your
> >>>>>>>>     findings with the
> >>>>>>>>      >>>> results. That's the right initiative. I do agree that
> >> Ignite
> >>>>>>>>     needs to have
> >>>>>>>>      >>>> an integration testing framework with capabilities
> listed
> >> by
> >>>>> you.
> >>>>>>>>      >>>>
> >>>>>>>>      >>>> As we discussed privately, I would only check if
> instead of
> >>>>>>>>      >>>> Confluent's Ducktape library, we can use an integration
> >>>>>>>>     testing framework
> >>>>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
> >>>>> clusters.
> >>>>>>>>     That
> >>>>>>>>      >>>> framework has been battle-tested and might be more
> >>>>> convenient for
> >>>>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim
> Shonichev
> >>>>>>>>      >>>> <mshonichev@gridgain.com <mailto:
> mshonichev@gridgain.com>>
> >>>>> who
> >>>>>>>>     promised to join this thread once he finishes
> >>>>>>>>      >>>> preparing the usage examples of the framework. To my
> >>>>>>>>     knowledge, Max has
> >>>>>>>>      >>>> already been working on that for several days.
> >>>>>>>>      >>>>
> >>>>>>>>      >>>> -
> >>>>>>>>      >>>> Denis
> >>>>>>>>      >>>>
> >>>>>>>>      >>>>
> >>>>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> >>>>>>>>     <nizhikov@apache.org <ma...@apache.org>>
> >>>>>>>>      >>>> wrote:
> >>>>>>>>      >>>>
> >>>>>>>>      >>>>> Hello, Igniters.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> I created a PoC [1] for the integration tests of
> Ignite.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> Let me briefly explain the gap I want to cover:
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> 1. For now, we don’t have a solution for automated
> testing
> >>>>> of
> >>>>>>>>     Ignite on
> >>>>>>>>      >>>>> «real cluster».
> >>>>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
> >>>>>>>>      >>>>>       * client and server nodes deployed on different
> >> hosts.
> >>>>>>>>      >>>>>       * thin clients perform queries from some other
> hosts
> >>>>>>>>      >>>>>       * etc.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks of
> >> some
> >>>>>>>>     internal
> >>>>>>>>      >>>>> Ignite process
> >>>>>>>>      >>>>>       * PME
> >>>>>>>>      >>>>>       * rebalance.
> >>>>>>>>      >>>>> This means we don’t know - Do we perform rebalance(or
> PME)
> >>>>> in
> >>>>>>>>     2.7.0 faster
> >>>>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> 3. We don’t have a solution for automated testing of
> >> Ignite
> >>>>>>>>     integration in
> >>>>>>>>      >>>>> a real-world environment:
> >>>>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
> >>>>>>>>      >>>>> I think some ML solutions also should be tested in
> >>>>> real-world
> >>>>>>>>     deployments.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> Solution:
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> I propose to use duck tape library from confluent
> (apache
> >>>>> 2.0
> >>>>>>>>     license)
> >>>>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud) and
> on
> >>>>> the
> >>>>>>>>     local
> >>>>>>>>      >>>>> environment(docker) and it works just fine.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> PoC contains following services:
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>>       * Simple rebalance test:
> >>>>>>>>      >>>>>               Start 2 server nodes,
> >>>>>>>>      >>>>>               Create some data with Ignite client,
> >>>>>>>>      >>>>>               Start one more server node,
> >>>>>>>>      >>>>>               Wait for rebalance finish
> >>>>>>>>      >>>>>       * Simple Ignite-Spark integration test:
> >>>>>>>>      >>>>>               Start 1 Spark master, start 1 Spark
> worker,
> >>>>>>>>      >>>>>               Start 1 Ignite server node
> >>>>>>>>      >>>>>               Create some data with Ignite client,
> >>>>>>>>      >>>>>               Check data in application that queries it
> >> from
> >>>>>>>>     Spark.
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> All tests are fully automated.
> >>>>>>>>      >>>>> Logs collection works just fine.
> >>>>>>>>      >>>>> You can see an example of the tests report - [4].
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> Pros:
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> * Ability to test local changes(no need to public
> changes
> >> to
> >>>>>>>>     some remote
> >>>>>>>>      >>>>> repository or similar).
> >>>>>>>>      >>>>> * Ability to parametrize test environment(run the same
> >> tests
> >>>>>>>>     on different
> >>>>>>>>      >>>>> JDK, JVM params, config, etc.)
> >>>>>>>>      >>>>> * Isolation by default so system tests are as reliable
> as
> >>>>>>>>     possible.
> >>>>>>>>      >>>>> * Utilities for pulling up and tearing down services
> >> easily
> >>>>>>>>     in clusters in
> >>>>>>>>      >>>>> different environments (e.g. local, custom cluster,
> >> Vagrant,
> >>>>>>>>     K8s, Mesos,
> >>>>>>>>      >>>>> Docker, cloud providers, etc.)
> >>>>>>>>      >>>>> * Easy to write unit tests for distributed systems
> >>>>>>>>      >>>>> * Adopted and successfully used by other distributed
> open
> >>>>>>>>     source project -
> >>>>>>>>      >>>>> Apache Kafka.
> >>>>>>>>      >>>>> * Collect results (e.g. logs, console output)
> >>>>>>>>      >>>>> * Report results (e.g. expected conditions met,
> >> performance
> >>>>>>>>     results, etc.)
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> WDYT?
> >>>>>>>>      >>>>>
> >>>>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>>>>>      >>>>> [3]
> >>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>> <2020-07-05--004.tar.gz>
> >>
> >>
> >>
> >
>
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Max Shonichev <ms...@yandex.ru>.
Anton,

well, strangely enough, cleaning up and rerunning helped.


Ubuntu 18.04

====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-06--003
run time:         4 minutes 44.835 seconds
tests run:        5
passed:           5
failed:           0
ignored:          0
====================================================================================================
test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     PASS
run time:   41.927 seconds
{"Rebalanced in (sec)": 1.02205491065979}
----------------------------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status:     PASS
run time:   51.985 seconds
{"Rebalanced in (sec)": 0.0760810375213623}
----------------------------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     PASS
run time:   1 minute 4.283 seconds
{"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst 
latency (ms)": "31035"}
----------------------------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status:     PASS
run time:   1 minute 13.089 seconds
{"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst 
latency (ms)": "139"}
----------------------------------------------------------------------------------------------------
test_id: 
ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status:     PASS
run time:   53.332 seconds
----------------------------------------------------------------------------------------------------


MacBook
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-06--001
run time:         6 minutes 58.612 seconds
tests run:        5
passed:           5
failed:           0
ignored:          0
================================================================================
test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     PASS
run time:   48.724 seconds
{"Rebalanced in (sec)": 3.2574470043182373}
--------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status:     PASS
run time:   1 minute 23.210 seconds
{"Rebalanced in (sec)": 2.165921211242676}
--------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     PASS
run time:   1 minute 12.659 seconds
{"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency 
(ms)": "31063"}
--------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status:     PASS
run time:   1 minute 57.257 seconds
{"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst 
latency (ms)": "1010"}
--------------------------------------------------------------------------------
test_id: 
ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status:     PASS
run time:   1 minute 36.317 seconds

=============

while the relative proportions remain the same across Ignite versions, the
absolute numbers on Mac and Linux differ by more than a factor of two.

I'm finalizing the code for a 'local Tiden' setup for your tests. A PR
will be ready soon.

Have you had a chance to deploy ducktests on bare metal?



On 06.07.2020 14:27, Anton Vinogradov wrote:
> Max,
> 
> Thanks for the check!
> 
>> Is it OK for those tests to fail?
> No.
> I see some really strange things in the logs.
> Looks like a concurrent ducktests run started unexpected services,
> and this broke the tests.
> Could you please clean up the docker (use clean-up script [1]).
> Compile sources (use script [2]) and rerun the tests.
> 
> [1]
> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> [2]
> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
> 
> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org> wrote:
> 
>> Hello, Maxim.
>>
>> Thanks for writing down the minutes.
>>
>> There is no such thing as «Nikolay team» on the dev-list.
>> I propose to focus on product requirements and what we want to gain from
>> the framework instead of taking into account the needs of some team.
>>
>> Can you, please, write down your version of requirements so we can reach a
>> consensus on that and therefore move to the discussion of the
>> implementation?
>>
>>> 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru> написал(а):
>>>
>>> Yes, Denis,
>>>
>>> common ground seems to be as follows:
>>> Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
>> over physical hosts and share benchmark results. In the meantime, while I
>> strongly believe that dockerized approach to benchmarking is a road to
>> misleading and false positives, I'll prepare a PoC of Tiden in dockerized
>> environment to support 'fast development prototyping' usecase Nikolay team
>> insist on. It should be a matter of few days.
>>>
>>> As a side note, I've run Anton PoC locally and would like to have some
>> comments about results:
>>>
>>> Test system: Ubuntu 18.04, docker 19.03.6
>>> Test commands:
>>>
>>>
>>> git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
>>> cd ignite
>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true
>> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
>>> cd modules/ducktests/tests/docker
>>> ./run_tests.sh
>>>
>>> Test results:
>>>
>> ====================================================================================================
>>> SESSION REPORT (ALL TESTS)
>>> ducktape version: 0.7.7
>>> session_id:       2020-07-05--004
>>> run time:         7 minutes 36.360 seconds
>>> tests run:        5
>>> passed:           3
>>> failed:           2
>>> ignored:          0
>>>
>> ====================================================================================================
>>> test_id:
>> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
>>> status:     FAIL
>>> run time:   3 minutes 12.232 seconds
>>>
>> ----------------------------------------------------------------------------------------------------
>>> test_id:
>> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
>>> status:     FAIL
>>> run time:   1 minute 33.076 seconds
>>>
>>>
>>> Is it OK for those tests to fail? Attached is full test report
>>>
>>>
>>> On 02.07.2020 17:46, Denis Magda wrote:
>>>> Folks,
>>>> Please share the summary of that Slack conversation here for records
>> once
>>>> you find common ground.
>>>> -
>>>> Denis
>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
>> wrote:
>>>>> Igniters.
>>>>>
>>>>> All who are interested in integration testing framework discussion are
>>>>> welcome into slack channel -
>>>>>
>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
>>>>>
>>>>>
>>>>>
>>>>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> написал(а):
>>>>>>
>>>>>> Max,
>>>>>> Thanks for joining us.
>>>>>>
>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>>>> dependencies being deployed by external scripts.
>>>>>> No. It is important to distinguish development, deploy, and
>>>>> orchestration.
>>>>>> All-in-one solutions have extremely limited usability.
>>>>>> As to Ducktests:
>>>>>> Docker is responsible for deployments during development.
>>>>>> CI/CD is responsible for deployments during release and nightly
>> checks.
>>>>> It's up to the team to choose AWS, VM, BareMetal, and even OS.
>>>>>> Ducktape is responsible for orchestration.
>>>>>>
>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>> fashion,
>>>>>>> while ducktape internally does all actions sequentially.
>>>>>> No. Ducktape may start any service in parallel. See Pme-free benchmark
>>>>> [1] for details.
>>>>>>
>>>>>>> if we used ducktape solution we would have to instead prepare some
>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
>> with
>>>>>>> Ansible or Chef.
>>>>>> Sure, because the way of deployment depends on the infrastructure.
>>>>>> How can we be sure that the OS we use and the restrictions we have will be
>>>>> compatible with Tiden?
>>>>>>
>>>>>>> You have solved this deficiency with docker by putting all
>> dependencies
>>>>>>> into one uber-image ...
>>>>>> and
>>>>>>> I guess we all know about docker hyped ability to run over
>> distributed
>>>>>>> virtual networks.
>>>>>> It is very important not to confuse the test's development (docker
>> image
>>>>> you're talking about) and real deployment.
>>>>>>
>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
>>>>>> All actions can be performed in parallel.
>>>>>> See how Ducktests [2] starts cluster in parallel for example.
>>>>>>
>>>>>> [1]
>>>>>
>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
>>>>>> [2]
>>>>>
>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
>>>>> wrote:
>>>>>> Hello, Maxim.
>>>>>>
>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>> dependencies being deployed by external scripts
>>>>>>
>>>>>> Why do you think that maintaining deploy scripts coupled with the
>>>>> testing framework is an advantage?
>>>>>> I thought we want to see and maintain deployment scripts separate from
>>>>> the testing framework.
>>>>>>
>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>
>>>>>> Can you, please, clarify, what actions do you have in mind?
>>>>>> And why we want to execute them concurrently?
>>>>>> Ignite node start, Client application execution can be done
>> concurrently
>>>>> with the ducktape approach.
>>>>>>
>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>>>> Ansible or Chef
>>>>>>
>>>>>> We shouldn’t take some user approach as an argument in this
>> discussion.
>>>>> Let’s discuss a general approach for all users of the Ignite. Anyway,
>> what
>>>>> is wrong with the external deployment script approach?
>>>>>>
>>>>>> We, as a community, should provide several ways to run integration
>> tests
>>>>> out-of-the-box AND the ability to customize deployment regarding the
>> user
>>>>> landscape.
>>>>>>
>>>>>>> You have solved this deficiency with docker by putting all
>>>>> dependencies into one uber-image and that looks like simple and elegant
>>>>> solution however, that effectively limits you to single-host testing.
>>>>>>
>>>>>> Docker image should be used only by the Ignite developers to test
>>>>> something locally.
>>>>>> It’s not intended for some real-world testing.
>>>>>>
>>>>>> The main issue I see with Tiden is that it has been tested and maintained as a
>>>>>> closed-source solution.
>>>>>> This can lead to hard-to-solve problems when we start using and
>>>>> maintaining it as an open-source solution.
>>>>>> Like, how many developers have used Tiden? And how many of those developers were
>>>>> not authors of Tiden itself?
>>>>>>
>>>>>>
>>>>>>> On 2 July 2020, at 12:30, Max Shonichev <ms...@yandex.ru>
>>>>> wrote:
>>>>>>>
>>>>>>> Anton, Nikolay,
>>>>>>>
>>>>>>> Let's agree on what we are arguing about: whether it is about "like
>> or
>>>>> don't like" or about technical properties of suggested solutions.
>>>>>>>
>>>>>>> If it is about likes and dislikes, then the whole discussion is
>>>>> meaningless. However, I hope together we can analyse pros and cons
>>>>> carefully.
>>>>>>>
>>>>>>> As far as I can understand now, two main differences between ducktape
>>>>> and tiden are that:
>>>>>>>
>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>> dependencies being deployed by external scripts.
>>>>>>>
>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>>>> fashion, while ducktape internally does all actions sequentially.
>>>>>>>
>>>>>>> As for me, these are very important properties for distributed
>> testing
>>>>> framework.
>>>>>>>
>>>>>>> The first property lets us easily reuse tiden in existing infrastructures,
>>>>> for example, during Zookeeper IEP testing at Sberbank site we used the
>> same
>>>>> tiden scripts that we use in our lab, the only change was putting a
>> list of
>>>>> hosts into config.
>>>>>>>
>>>>>>> If we used ducktape solution we would have to instead prepare some
>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>>>> Ansible or Chef.
>>>>>>>
>>>>>>>
>>>>>>> You have solved this deficiency with docker by putting all
>>>>> dependencies into one uber-image and that looks like simple and elegant
>>>>> solution,
>>>>>>> however, that effectively limits you to single-host testing.
>>>>>>>
>>>>>>> I guess we all know about docker hyped ability to run over
>> distributed
>>>>> virtual networks. We used to go that way, but quickly found that it is
>> more
>>>>> of the hype than real work. In real environments, there are problems
>> with
>>>>> routing, DNS, multicast and broadcast traffic, and many others, that
>> turn
>>>>> docker-based distributed solution into a fragile hard-to-maintain
>> monster.
>>>>>>>
>>>>>>> Please, if you believe otherwise, perform a run of your PoC over at
>>>>> least two physical hosts and share results with us.
>>>>>>>
>>>>>>> If you consider that one physical docker host is enough, please,
>> don't
>>>>> overlook that we want to run real scale scenarios, with 50-100 cache
>>>>> groups, persistence enabled and millions of keys loaded.
>>>>>>>
>>>>>>> Practical limit for such configurations is 4-6 nodes per single
>>>>> physical host. Otherwise, tests become flaky due to resource
>> starvation.
>>>>>>>
>>>>>>> Please, if you believe otherwise, perform at least 10 runs of
>>>>> your PoC with other tests running at TC (we're targeting TeamCity,
>> right?)
>>>>> and share results so we could check if the numbers are reproducible.
>>>>>>>
>>>>>>> I stress this once more: functional integration tests are OK to run
>> in
>>>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
>>>>>>>
>>>>>>>
>>>>>>> Second property let us write tests that require real-parallel actions
>>>>> over hosts.
>>>>>>>
>>>>>>> For example, the agreed scenario for the PME benchmark during the "PME
>>>>>>> optimization stream" was as follows:
>>>>>>>
>>>>>>>   - 10 server nodes, preloaded with 1M of keys
>>>>>>>   - 4 client nodes perform transactional load  (client nodes
>> physically
>>>>> separated from server nodes)
>>>>>>>   - during load:
>>>>>>>   -- 5 server nodes stopped in parallel
>>>>>>>   -- after 1 minute, all 5 nodes are started in parallel
>>>>>>>   - load stopped, logs are analysed for exchange times.
>>>>>>>
>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
>>>>> then partition map exchange merge would not happen and we could not
>> have
>>>>> measured PME optimizations for that case.
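>>>>>>>
>>>>>>> To make the parallel stop/start step concrete, here is a rough,
>>>>>>> framework-agnostic sketch in Python; stop_node and start_node stand in for
>>>>>>> whatever the chosen framework provides, so this is only an illustration of
>>>>>>> the idea, not actual tiden or ducktape code:
>>>>>>>
>>>>>>> from concurrent.futures import ThreadPoolExecutor
>>>>>>> import time
>>>>>>>
>>>>>>> def bounce_nodes(nodes, stop_node, start_node, downtime_sec=60):
>>>>>>>     # stop all given nodes at the same time
>>>>>>>     with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
>>>>>>>         list(pool.map(stop_node, nodes))
>>>>>>>     time.sleep(downtime_sec)  # keep them down, e.g. for 1 minute
>>>>>>>     # bring them back simultaneously so exchange merge can actually happen
>>>>>>>     with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
>>>>>>>         list(pool.map(start_node, nodes))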
>>>>>>>
>>>>>>>
>>>>>>> These are limitations of ducktape that we believe are a more important
>>>>>>> argument "against" than the arguments you provide "for".
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
>>>>>>>> Folks,
>>>>>>>> First, I've created PR [1] with ducktests improvements
>>>>>>>> PR contains the following changes
>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>>>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
>> 2.8)
>>>>>>>> - Global refactoring
>>>>>>>> -- benchmarks javacode simplification
>>>>>>>> -- services python and java classes code deduplication
>>>>>>>> -- fail-fast checks for java and python (eg. application should
>>>>> explicitly write it finished with success)
>>>>>>>> -- simple results extraction from tests and benchmarks
>>>>>>>> -- javacode now configurable from tests/benchmarks
>>>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
>>>>> operation and log results)
>>>>>>>> -- docker volume now marked as delegated to increase execution speed
>>>>> for mac & win users
>>>>>>>> -- Ignite cluster now start in parallel (start speed-up)
>>>>>>>> -- Ignite can be configured at test/benchmark
>>>>>>>> - full and module assembly scripts added
>>>>>>> Great job done! But let me remind you of one of the Apache Ignite principles:
>>>>>>> a week of thinking saves months of development.
>>>>>>>
>>>>>>>
>>>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
>>>>> integration) as a target "PoC check & real topology benchmarking tool".
>>>>>>>> Ducktape pros
>>>>>>>> - Developed for distributed system by distributed system developers.
>>>>>>> So does Tiden
>>>>>>>
>>>>>>>> - Developed since 2014, stable.
>>>>>>> Tiden is also pretty stable, and the development start date is not a good
>>>>> argument: for example, pytest has existed since 2004 and pytest-xdist (a plugin
>>>>> for distributed testing) since 2010, but we don't see them as an alternative at
>>>>> all.
>>>>>>>
>>>>>>>> - Proven usability by usage at Kafka.
>>>>>>> Tiden has proven usable through usage in GridGain and Sberbank deployments.
>>>>>>> Core, storage, sql and tx teams use benchmark results provided by
>>>>> Tiden on a daily basis.
>>>>>>>
>>>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great example
>>>>> pack.
>>>>>>> We'll donate some of our suites to Ignite as I've mentioned in
>>>>> previous letter.
>>>>>>>
>>>>>>>> - Built-in Docker support for rapid development and checks.
>>>>>>> False, there's no specific 'docker support' in ducktape itself, you
>>>>> just wrap it in docker by yourself, because ducktape is lacking
>> deployment
>>>>> abilities.
>>>>>>>
>>>>>>>> - Great for CI automation.
>>>>>>> False, there are no specific CI-enabled features in ducktape. Tiden, on
>>>>> the other hand, provides a generic xUnit reporting format, which is supported
>> supported
>>>>> by both TeamCity and Jenkins. Also, instead of using private keys,
>> Tiden
>>>>> can use SSH agent, which is also great for CI, because both
>>>>>>> TeamCity and Jenkins store keys in secret storage available only for
>>>>> ssh-agent and only for the time of the test.
>>>>>>>
>>>>>>>
>>>>>>>>> As an additional motivation, at least 3 teams
>>>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
>> Zabbix
>>>>> speed-up))
>>>>>>>> - Ignite SE Plugins team (to check plugin's features does not
>>>>> slow-down or broke AI features)
>>>>>>>> - Ignite SE QA team (to append already developed smoke/load/failover
>>>>> tests to AI codebase)
>>>>>>>
>>>>>>> Please, before recommending your tests to other teams, provide proof
>>>>>>> that your tests are reproducible in a real environment.
>>>>>>>
>>>>>>>
>>>>>>>> now wait for the ducktest merge to start checking the cases they are working on
>>>>> in the AI way.
>>>>>>>> Thoughts?
>>>>>>> Let us review both solutions together: we'll try to run your tests in
>>>>> our lab, and you'll try to at least check out tiden and see if the same tests
>>>>> can be implemented with it?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
>> nizhikov@apache.org
>>>>> <ma...@apache.org>> wrote:
>>>>>>>>     Hello, Maxim.
>>>>>>>>     Thank you for so detailed explanation.
>>>>>>>>     Can we put the content of this discussion somewhere on the wiki?
>>>>>>>>     So It doesn’t get lost.
>>>>>>>>     I divide the answer in several parts. From the requirements to
>> the
>>>>>>>>     implementation.
>>>>>>>>     So, if we agreed on the requirements we can proceed with the
>>>>>>>>     discussion of the implementation.
>>>>>>>>     1. Requirements:
>>>>>>>>     The main goal I want to achieve is *reproducibility* of the
>> tests.
>>>>>>>>     I’m sick and tired of the zillions of flaky, rarely failing, and
>>>>>>>>     almost never failing tests in the Ignite codebase.
>>>>>>>>     We should start with the simplest scenarios that will be as
>>>>> reliable
>>>>>>>>     as steel :)
>>>>>>>>     I want to know for sure:
>>>>>>>>        - Does this PR make rebalance quicker or not?
>>>>>>>>        - Does this PR make PME quicker or not?
>>>>>>>>     So, your description of the complex test scenario looks like a next
>>>>>>>>     step to me.
>>>>>>>>     Anyway, It’s cool we already have one.
>>>>>>>>     The second goal is to have a strict test lifecycle as we have in
>>>>>>>>     JUnit and similar frameworks.
>>>>>>>>      > It covers production-like deployment and running scenarios
>>>>> over
>>>>>>>>     a single database instance.
>>>>>>>>     Do you mean «single cluster» or «single host»?
>>>>>>>>     2. Existing tests:
>>>>>>>>      > A Combinator suite allows to run set of operations
>> concurrently
>>>>>>>>     over given database instance.
>>>>>>>>      > A Consumption suite allows to run a set production-like
>> actions
>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>> metrics
>>>>>>>>     across versions
>>>>>>>>      > A Yardstick suite
>>>>>>>>      > A Stress suite that simulates hardware environment degradation
>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>> functional
>>>>>>>>     regression testing
>>>>>>>>      > Regression
>>>>>>>>     Great news that we already have so many choices for testing!
>>>>>>>>     Mature test base is a big +1 for Tiden.
>>>>>>>>     3. Comparison:
>>>>>>>>      > Criteria: Test configuration
>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>      > Tiden: any number of YaML config files, command line option
>> for
>>>>>>>>     fine-grained test configuration, ability to select/modify tests
>>>>>>>>     behavior based on Ignite version.
>>>>>>>>     1. Many YAML files can be hard to maintain.
>>>>>>>>     2. In ducktape, you can set parameters via «—parameters» option.
>>>>>>>>     Please, take a look at the doc [1]
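>>>>>>>>     For example, a run can be tuned with something like:
>>>>>>>>     ducktape ignitetest --parameters '{"num_nodes": 4}'
>>>>>>>>     (the test path and the parameter name here are only an illustration;
>>>>>>>>     see [1] for the exact syntax).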
>>>>>>>>      > Criteria: Cluster control
>>>>>>>>      > Tiden: additionally can address cluster as a whole and execute
>>>>>>>>     remote commands in parallel.
>>>>>>>>     It seems we have already implemented this ability in the PoC.
>>>>>>>>      > Criteria: Test assertions
>>>>>>>>      > Tiden: simple asserts, also few customized assertion helpers.
>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>     Can you, please, be more specific?
>>>>>>>>     What helpers do you have in mind?
>>>>>>>>     Ducktape has asserts that wait for log-file messages or for some
>>>>>>>>     process to finish.
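>>>>>>>>     A minimal sketch of that style of check, assuming the standard ducktape
>>>>>>>>     helper (the predicate below is a stand-in for a real log grep):
>>>>>>>>
>>>>>>>>     from ducktape.utils.util import wait_until
>>>>>>>>
>>>>>>>>     def rebalance_finished():
>>>>>>>>         return True  # a real test would search the node logs here
>>>>>>>>
>>>>>>>>     wait_until(rebalance_finished, timeout_sec=120, backoff_sec=5,
>>>>>>>>                err_msg="Rebalance did not finish in time")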
>>>>>>>>      > Criteria: Test reporting
>>>>>>>>      > Ducktape: limited to its own text/HTML format
>>>>>>>>     Ducktape has:
>>>>>>>>     1. Text reporter
>>>>>>>>     2. Customizable HTML reporter
>>>>>>>>     3. JSON reporter.
>>>>>>>>     We can render the JSON with any template or tool.
>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for test
>>>>>>>>     needs. However, that means, that test can’t be scaled without
>> test
>>>>>>>>     code changes. Does not do any deploy, relies on external means,
>>>>> e.g.
>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>     This is not true.
>>>>>>>>     1. We can set explicit test parameters (node number) via parameters.
>>>>>>>>     We can increase the client count or cluster size without test code
>>>>> changes.
>>>>>>>>     2. We have many choices for the test environment. These choices
>> are
>>>>>>>>     tested and used in other projects:
>>>>>>>>              * docker
>>>>>>>>              * vagrant
>>>>>>>>              * private cloud(ssh access)
>>>>>>>>              * ec2
>>>>>>>>     Please, take a look at Kafka documentation [2]
>>>>>>>>      > I can continue more on this, but it should be enough for now:
>>>>>>>>     We need to go deeper! :)
>>>>>>>>     [1]
>>>>>>>>
>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>>>>>>>     [2]
>>>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>>>>>>>      > On 9 June 2020, at 17:25, Max A. Shonichev <mshonich@yandex.ru
>>>>>>>>     <ma...@yandex.ru>> wrote:
>>>>>>>>      >
>>>>>>>>      > Greetings, Nikolay,
>>>>>>>>      >
>>>>>>>>      > First of all, thank you for you great effort preparing PoC of
>>>>>>>>     integration testing to Ignite community.
>>>>>>>>      >
>>>>>>>>      > It’s a shame Ignite did not have at least some such tests yet,
>>>>>>>>     however, GridGain, as a major contributor to Apache Ignite had a
>>>>>>>>     profound collection of in-house tools to perform integration and
>>>>>>>>     performance testing for years already and while we slowly
>> consider
>>>>>>>>     sharing our expertise with the community, your initiative makes
>> us
>>>>>>>>     drive that process a bit faster, thanks a lot!
>>>>>>>>      >
>>>>>>>>      > I reviewed your PoC and want to share a little about what we
>> do
>>>>>>>>     on our part, why and how, hope it would help community take
>> proper
>>>>>>>>     course.
>>>>>>>>      >
>>>>>>>>      > First I’ll do a brief overview of what decisions we made and
>>>>> what
>>>>>>>>     we do have in our private code base, next I’ll describe what we
>>>>> have
>>>>>>>>     already donated to the public and what we plan public next, then
>>>>>>>>     I’ll compare both approaches highlighting deficiencies in order
>> to
>>>>>>>>     spur public discussion on the matter.
>>>>>>>>      >
>>>>>>>>      > It might seem strange to use Python to run Bash to run Java
>>>>>>>>     applications because that introduces IT industry best of breed’ –
>>>>>>>>     the Python dependency hell – to the Java application code base.
>> The
>>>>>>>>     only strangest decision one can made is to use Maven to run
>> Docker
>>>>>>>>     to run Bash to run Python to run Bash to run Java, but desperate
>>>>>>>>     times call for desperate measures I guess.
>>>>>>>>      >
>>>>>>>>      > There are Java-based solutions for integration testing,
>>>>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they might go
>>>>> well
>>>>>>>>     for Ignite community CI pipelines by them selves. But we also
>>>>> wanted
>>>>>>>>     to run performance tests and benchmarks, like the dreaded PME
>>>>>>>>     benchmark, and this is solved by totally different set of tools
>> in
>>>>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>>>>>>>      >
>>>>>>>>      > Speaking specifically about benchmarking, Apache Ignite
>>>>> community
>>>>>>>>     already has Yardstick [6], and there’s nothing wrong with writing
>>>>>>>>     PME benchmark using Yardstick, but we also wanted to be able to
>> run
>>>>>>>>     scenarios like this:
>>>>>>>>      > - put an X load to a Ignite database;
>>>>>>>>      > - perform an Y set of operations to check how Ignite copes
>> with
>>>>>>>>     operations under load.
>>>>>>>>      >
>>>>>>>>      > And yes, we also wanted applications under test be deployed
>>>>> ‘like
>>>>>>>>     in a production’, e.g. distributed over a set of hosts. This
>> arises
>>>>>>>>     questions about provisioning and nodes affinity which I’ll cover
>> in
>>>>>>>>     detail later.
>>>>>>>>      >
>>>>>>>>      > So we decided to put a little effort to build a simple tool to
>>>>>>>>     cover different integration and performance scenarios, and our QA
>>>>>>>>     lab first attempt was PoC-Tester [7], currently open source for
>> all
>>>>>>>>     but for reporting web UI. It’s a quite simple to use 95%
>> Java-based
>>>>>>>>     tool targeted to be run on a pre-release QA stage.
>>>>>>>>      >
>>>>>>>>      > It covers production-like deployment and running scenarios
>>>>> over
>>>>>>>>     a single database instance. PoC-Tester scenarios consist of a
>>>>>>>>     sequence of tasks running sequentially or in parallel. After all
>>>>>>>>     tasks complete, or at any time during test, user can run logs
>>>>>>>>     collection task, logs are checked against exceptions and a
>> summary
>>>>>>>>     of found issues and task ops/latency statistics is generated at
>> the
>>>>>>>>     end of scenario. One of the main PoC-Tester features is its
>>>>>>>>     fire-and-forget approach to task managing. That is, you can
>> deploy
>>>>> a
>>>>>>>>     grid and left it running for weeks, periodically firing some
>> tasks
>>>>>>>>     onto it.
>>>>>>>>      >
>>>>>>>>      > During the earliest stages of PoC-Tester development it became
>>>>> quite
>>>>>>>>     clear that Java application development is a tedious process and
>>>>>>>>     architecture decisions you take during development are slow and
>>>>> hard
>>>>>>>>     to change.
>>>>>>>>      > For example, scenarios like this
>>>>>>>>      > - deploy two instances of GridGain with master-slave data
>>>>>>>>     replication configured;
>>>>>>>>      > - put a load on master;
>>>>>>>>      > - perform checks on slave,
>>>>>>>>      > or like this:
>>>>>>>>      > - preload a 1Tb of data by using your favorite tool of choice
>> to
>>>>>>>>     an Apache Ignite of version X;
>>>>>>>>      > - run a set of functional tests running Apache Ignite version
>> Y
>>>>>>>>     over preloaded data,
>>>>>>>>      > do not fit well in the PoC-Tester workflow.
>>>>>>>>      >
>>>>>>>>      > So, this is why we decided to use Python as a generic
>> scripting
>>>>>>>>     language of choice.
>>>>>>>>      >
>>>>>>>>      > Pros:
>>>>>>>>      > - quicker prototyping and development cycles
>>>>>>>>      > - easier to find DevOps/QA engineer with Python skills than
>> one
>>>>>>>>     with Java skills
>>>>>>>>      > - used extensively all over the world for DevOps/CI pipelines
>>>>> and
>>>>>>>>     thus has rich set of libraries for all possible integration uses
>>>>> cases.
>>>>>>>>      >
>>>>>>>>      > Cons:
>>>>>>>>      > - Nightmare with dependencies. Better stick to specific
>>>>>>>>     language/libraries version.
>>>>>>>>      >
>>>>>>>>      > Comparing alternatives for Python-based testing framework we
>>>>> have
>>>>>>>>     considered following requirements, somewhat similar to what
>> you’ve
>>>>>>>>     mentioned for Confluent [8] previously:
>>>>>>>>      > - should be able run locally or distributed (bare metal or in
>>>>> the
>>>>>>>>     cloud)
>>>>>>>>      > - should have built-in deployment facilities for applications
>>>>>>>>     under test
>>>>>>>>      > - should separate test configuration and test code
>>>>>>>>      > -- be able to easily reconfigure tests by simple configuration
>>>>>>>>     changes
>>>>>>>>      > -- be able to easily scale test environment by simple
>>>>>>>>     configuration changes
>>>>>>>>      > -- be able to perform regression testing by simple switching
>>>>>>>>     artifacts under test via configuration
>>>>>>>>      > -- be able to run tests with different JDK version by simple
>>>>>>>>     configuration changes
>>>>>>>>      > - should have human readable reports and/or reporting tools
>>>>>>>>     integration
>>>>>>>>      > - should allow simple test progress monitoring, one does not
>>>>> want
>>>>>>>>     to run 6-hours test to find out that application actually crashed
>>>>>>>>     during first hour.
>>>>>>>>      > - should allow parallel execution of test actions
>>>>>>>>      > - should have clean API for test writers
>>>>>>>>      > -- clean API for distributed remote commands execution
>>>>>>>>      > -- clean API for deployed applications start / stop and other
>>>>>>>>     operations
>>>>>>>>      > -- clean API for performing check on results
>>>>>>>>      > - should be open source or at least source code should allow
>>>>> ease
>>>>>>>>     change or extension
>>>>>>>>      >
>>>>>>>>      > Back at that time we found no better alternative than to write
>>>>>>>>     our own framework, and here goes Tiden [9] as GridGain framework
>> of
>>>>>>>>     choice for functional integration and performance testing.
>>>>>>>>      >
>>>>>>>>      > Pros:
>>>>>>>>      > - solves all the requirements above
>>>>>>>>      > Cons (for Ignite):
>>>>>>>>      > - (currently) closed GridGain source
>>>>>>>>      >
>>>>>>>>      > On top of Tiden we’ve built a set of test suites, some of
>> which
>>>>>>>>     you might have heard already.
>>>>>>>>      >
>>>>>>>>      > A Combinator suite allows running a set of operations concurrently
>>>>>>>>     over a given database instance. It has proven to find at least 30+ race
>>>>>>>>     conditions and NPE issues.
>>>>>>>>      >
>>>>>>>>      > A Consumption suite allows to run a set production-like
>> actions
>>>>>>>>     over given set of Ignite/GridGain versions and compare test
>> metrics
>>>>>>>>     across versions, like heap/disk/CPU consumption, time to perform
>>>>>>>>     actions, like client PME, server PME, rebalancing time, data
>>>>>>>>     replication time, etc.
>>>>>>>>      >
>>>>>>>>      > A Yardstick suite is a thin layer of Python glue code to run
>>>>>>>>     the Apache Ignite pre-release benchmark set. Yardstick itself has
>>>>>>>>     mediocre deployment capabilities; Tiden solves this easily.
>>>>>>>>      >
>>>>>>>>      > A Stress suite that simulates hardware environment degradation
>>>>>>>>     during testing.
>>>>>>>>      >
>>>>>>>>      > An Ultimate, DR and Compatibility suites that performs
>>>>> functional
>>>>>>>>     regression testing of GridGain Ultimate Edition features like
>>>>>>>>     snapshots, security, data replication, rolling upgrades, etc.
>>>>>>>>      >
>>>>>>>>      > A Regression and some IEPs testing suites, like IEP-14,
>> IEP-15,
>>>>>>>>     etc, etc, etc.
>>>>>>>>      >
>>>>>>>>      > Most of the suites above use another in-house developed Java
>>>>> tool
>>>>>>>>     – PiClient – to perform actual loading and miscellaneous
>> operations
>>>>>>>>     with Ignite under test. We use py4j Python-Java gateway library
>> to
>>>>>>>>     control PiClient instances from the tests.
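>>>>>>>>     Roughly, the control path looks like this on the Python side (PiClient
>>>>>>>>     itself is closed source, so the entry-point call below is only an
>>>>>>>>     assumption, not its real API):
>>>>>>>>
>>>>>>>>     from py4j.java_gateway import JavaGateway
>>>>>>>>
>>>>>>>>     gateway = JavaGateway()          # connects to a JVM running a py4j GatewayServer
>>>>>>>>     piclient = gateway.entry_point   # Java object exposed by that JVM
>>>>>>>>     piclient.startLoad("cache1", 100000)  # hypothetical PiClient-style call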
>>>>>>>>      >
>>>>>>>>      > When we considered CI, we put TeamCity out of scope, because
>>>>>>>>     distributed integration and performance tests tend to run for
>> hours
>>>>>>>>     and TeamCity agents are scarce and costly resource. So, bundled
>>>>> with
>>>>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines and
>>>>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
>>>>> test
>>>>>>>>     run reports across versions and has built in visualization
>> support
>>>>>>>>     for Combinator suite.
>>>>>>>>      >
>>>>>>>>      > All of the above is currently closed source, but we plan to
>> make
>>>>>>>>     it public for community, and publishing Tiden core [9] is the
>> first
>>>>>>>>     step on that way. You can review some examples of using Tiden for
>>>>>>>>     tests at my repository [11], for start.
>>>>>>>>      >
>>>>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
>>>>>>>>      >
>>>>>>>>      > Criteria: Language
>>>>>>>>      > Tiden: Python, 3.7
>>>>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>>>>>>>     compatible, but actually can’t work with Python 3.7 due to broken
>>>>>>>>     Zmq dependency.
>>>>>>>>      > Comment: Python 3.7 has a much better support for async-style
>>>>>>>>     code which might be crucial for distributed application testing.
>>>>>>>>      > Score: Tiden: 1, Ducktape: 0
>>>>>>>>      >
>>>>>>>>      > Criteria: Test writers API
>>>>>>>>      > Supported integration test framework concepts are basically
>> the
>>>>> same:
>>>>>>>>      > - a test controller (test runner)
>>>>>>>>      > - a cluster
>>>>>>>>      > - a node
>>>>>>>>      > - an application (a service in Ducktape terms)
>>>>>>>>      > - a test
>>>>>>>>      > Score: Tiden: 5, Ducktape: 5
>>>>>>>>      >
>>>>>>>>      > Criteria: Tests selection and run
>>>>>>>>      > Ducktape: suite-package-class-method level selection, internal
>>>>>>>>     scheduler allows running tests in a suite in parallel.
>>>>>>>>      > Tiden: also suite-package-class-method level selection,
>>>>>>>>     additionally allows selecting subset of tests by attribute,
>>>>> parallel
>>>>>>>>     runs not built in, but allows merging test reports after
>> different
>>>>> runs.
>>>>>>>>      > Score: Tiden: 2, Ducktape: 2
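>>>>>>>>     For reference, a single ducktape test is usually addressed on the command
>>>>>>>>     line as path/to/test_file.py::TestClass.test_method; the path here is just
>>>>>>>>     an illustration of the selection syntax, see the ducktape docs for details.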
>>>>>>>>      >
>>>>>>>>      > Criteria: Test configuration
>>>>>>>>      > Ducktape: single JSON string for all tests
>>>>>>>>      > Tiden: any number of YaML config files, command line option
>> for
>>>>>>>>     fine-grained test configuration, ability to select/modify tests
>>>>>>>>     behavior based on Ignite version.
>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Cluster control
>>>>>>>>      > Ducktape: allow execute remote commands by node granularity
>>>>>>>>      > Tiden: additionally can address cluster as a whole and execute
>>>>>>>>     remote commands in parallel.
>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Logs control
>>>>>>>>      > Both frameworks have similar builtin support for remote logs
>>>>>>>>     collection and grepping. Tiden has built-in plugin that can zip,
>>>>>>>>     collect arbitrary log files from arbitrary locations at
>>>>>>>>     test/module/suite granularity and unzip if needed, also
>> application
>>>>>>>>     API to search / wait for messages in logs. Ducktape allows each
>>>>>>>>     service declare its log files location (seemingly does not
>> support
>>>>>>>>     logs rollback), and a single entrypoint to collect service logs.
>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Test assertions
>>>>>>>>      > Tiden: simple asserts, also few customized assertion helpers.
>>>>>>>>      > Ducktape: simple asserts.
>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Test reporting
>>>>>>>>      > Ducktape: limited to its own text/html format
>>>>>>>>      > Tiden: provides text report, yaml report for reporting tools
>>>>>>>>     integration, XML xUnit report for integration with
>>>>> Jenkins/TeamCity.
>>>>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Provisioning and deployment
>>>>>>>>      > Ducktape: can provision subset of hosts from cluster for test
>>>>>>>>     needs. However, that means, that test can’t be scaled without
>> test
>>>>>>>>     code changes. Does not do any deploy, relies on external means,
>>>>> e.g.
>>>>>>>>     pre-packaged in docker image, as in PoC.
>>>>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for the
>>>>> test.
>>>>>>>>     Provisioning should be done by external means. However, provides
>> a
>>>>>>>>     conventional automated deployment routines.
>>>>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > Criteria: Documentation and Extensibility
>>>>>>>>      > Tiden: current API documentation is limited, should change as
>> we
>>>>>>>>     go open source. Tiden is easily extensible via hooks and plugins,
>>>>>>>>     see example Maven plugin and Gatling application at [11].
>>>>>>>>      > Ducktape: basic documentation at readthedocs.io
>>>>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core is
>>>>>>>>     tightly coupled and hard to change. The only possible extension
>>>>>>>>     mechanism is fork-and-rewrite.
>>>>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>>>>      >
>>>>>>>>      > I can continue more on this, but it should be enough for now:
>>>>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
>>>>>>>>      >
>>>>>>>>      > Time for discussion!
>>>>>>>>      >
>>>>>>>>      > ---
>>>>>>>>      > [1] - https://www.testcontainers.org/
>>>>>>>>      > [2] - http://arquillian.org/guides/getting_started/
>>>>>>>>      > [3] - https://jmeter.apache.org/index.html
>>>>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>>>>>>>      > [5] - https://gatling.io/docs/current/
>>>>>>>>      > [6] - https://github.com/gridgain/yardstick
>>>>>>>>      > [7] - https://github.com/gridgain/poc-tester
>>>>>>>>      > [8] -
>>>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
>>>>>>>>      > [9] - https://github.com/gridgain/tiden
>>>>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
>>>>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
>>>>>>>>      >
>>>>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>>>>>>>      >> Hello,
>>>>>>>>      >>
>>>>>>>>      >> Branch with duck tape created -
>>>>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>>>      >>
>>>>>>>>      >> Any who are willing to contribute to PoC are welcome.
>>>>>>>>      >>
>>>>>>>>      >>
>>>>>>>>      >>> On 21 May 2020, at 22:33, Nikolay Izhikov
>>>>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
>>>>> wrote:
>>>>>>>>      >>>
>>>>>>>>      >>> Hello, Denis.
>>>>>>>>      >>>
>>>>>>>>      >>> There is no rush with these improvements.
>>>>>>>>      >>> We can wait for Maxim proposal and compare two solutions :)
>>>>>>>>      >>>
>>>>>>>>      >>>> On 21 May 2020, at 22:24, Denis Magda <dmagda@apache.org
>>>>>>>>     <ma...@apache.org>> wrote:
>>>>>>>>      >>>>
>>>>>>>>      >>>> Hi Nikolay,
>>>>>>>>      >>>>
>>>>>>>>      >>>> Thanks for kicking off this conversation and sharing your
>>>>>>>>     findings with the
>>>>>>>>      >>>> results. That's the right initiative. I do agree that
>> Ignite
>>>>>>>>     needs to have
>>>>>>>>      >>>> an integration testing framework with capabilities listed
>> by
>>>>> you.
>>>>>>>>      >>>>
>>>>>>>>      >>>> As we discussed privately, I would only check if instead of
>>>>>>>>      >>>> Confluent's Ducktape library, we can use an integration
>>>>>>>>     testing framework
>>>>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
>>>>> clusters.
>>>>>>>>     That
>>>>>>>>      >>>> framework has been battle-tested and might be more
>>>>> convenient for
>>>>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>>>>>>>      >>>> <mshonichev@gridgain.com <ma...@gridgain.com>>
>>>>> who
>>>>>>>>     promised to join this thread once he finishes
>>>>>>>>      >>>> preparing the usage examples of the framework. To my
>>>>>>>>     knowledge, Max has
>>>>>>>>      >>>> already been working on that for several days.
>>>>>>>>      >>>>
>>>>>>>>      >>>> -
>>>>>>>>      >>>> Denis
>>>>>>>>      >>>>
>>>>>>>>      >>>>
>>>>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>>>>>>>     <nizhikov@apache.org <ma...@apache.org>>
>>>>>>>>      >>>> wrote:
>>>>>>>>      >>>>
>>>>>>>>      >>>>> Hello, Igniters.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> Let me briefly explain the gap I want to cover:
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> 1. For now, we don’t have a solution for automated testing
>>>>> of
>>>>>>>>     Ignite on
>>>>>>>>      >>>>> «real cluster».
>>>>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
>>>>>>>>      >>>>>       * client and server nodes deployed on different
>> hosts.
>>>>>>>>      >>>>>       * thin clients perform queries from some other hosts
>>>>>>>>      >>>>>       * etc.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks of
>> some
>>>>>>>>     internal
>>>>>>>>      >>>>> Ignite process
>>>>>>>>      >>>>>       * PME
>>>>>>>>      >>>>>       * rebalance.
>>>>>>>>      >>>>> This means we don’t know - Do we perform rebalance(or PME)
>>>>> in
>>>>>>>>     2.7.0 faster
>>>>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> 3. We don’t have a solution for automated testing of
>> Ignite
>>>>>>>>     integration in
>>>>>>>>      >>>>> a real-world environment:
>>>>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
>>>>>>>>      >>>>> I think some ML solutions also should be tested in
>>>>> real-world
>>>>>>>>     deployments.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> Solution:
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> I propose to use duck tape library from confluent (apache
>>>>> 2.0
>>>>>>>>     license)
>>>>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud) and on
>>>>> the
>>>>>>>>     local
>>>>>>>>      >>>>> environment(docker) and it works just fine.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> PoC contains following services:
>>>>>>>>      >>>>>
>>>>>>>>      >>>>>       * Simple rebalance test:
>>>>>>>>      >>>>>               Start 2 server nodes,
>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>      >>>>>               Start one more server node,
>>>>>>>>      >>>>>               Wait for rebalance finish
>>>>>>>>      >>>>>       * Simple Ignite-Spark integration test:
>>>>>>>>      >>>>>               Start 1 Spark master, start 1 Spark worker,
>>>>>>>>      >>>>>               Start 1 Ignite server node
>>>>>>>>      >>>>>               Create some data with Ignite client,
>>>>>>>>      >>>>>               Check data in application that queries it
>> from
>>>>>>>>     Spark.
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> All tests are fully automated.
>>>>>>>>      >>>>> Logs collection works just fine.
>>>>>>>>      >>>>> You can see an example of the tests report - [4].
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> Pros:
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> * Ability to test local changes(no need to public changes
>> to
>>>>>>>>     some remote
>>>>>>>>      >>>>> repository or similar).
>>>>>>>>      >>>>> * Ability to parametrize test environment(run the same
>> tests
>>>>>>>>     on different
>>>>>>>>      >>>>> JDK, JVM params, config, etc.)
>>>>>>>>      >>>>> * Isolation by default so system tests are as reliable as
>>>>>>>>     possible.
>>>>>>>>      >>>>> * Utilities for pulling up and tearing down services
>> easily
>>>>>>>>     in clusters in
>>>>>>>>      >>>>> different environments (e.g. local, custom cluster,
>> Vagrant,
>>>>>>>>     K8s, Mesos,
>>>>>>>>      >>>>> Docker, cloud providers, etc.)
>>>>>>>>      >>>>> * Easy to write unit tests for distributed systems
>>>>>>>>      >>>>> * Adopted and successfully used by other distributed open
>>>>>>>>     source project -
>>>>>>>>      >>>>> Apache Kafka.
>>>>>>>>      >>>>> * Collect results (e.g. logs, console output)
>>>>>>>>      >>>>> * Report results (e.g. expected conditions met,
>> performance
>>>>>>>>     results, etc.)
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> WDYT?
>>>>>>>>      >>>>>
>>>>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
>>>>>>>>      >>>>> [3]
>>>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>>>>>>
>>>>>
>>>>>
>>>>>
>>> <2020-07-05--004.tar.gz>
>>
>>
>>
> 


Re: [DISCUSSION] Ignite integration testing framework.

Posted by Ilya Suntsov <is...@gridgain.com>.
Nikolay,

Can you, please, write down your version of requirements so we can reach a
> consensus on that and therefore move to the discussion of the
> implementation?

I guess that Max's requirements are quite similar to yours:

   1. The framework should support deployment on hardware/docker/AWS ...
   2. Integration with TeamCity/Jenkins
   3. Client Java applications contain the basic test logic; Python is used for
   deployment/log analysis
   4. Tests can be executed against the dev branch/release build
   5. The framework should allow us to create stable performance tests
   6. Clear reporting

Max, please correct me if I missed something.
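
For illustration, requirement 4 above could look roughly like the following
ducktape-style sketch; the class name and version values are placeholders, not
the actual PoC code:

from ducktape.mark import parametrize
from ducktape.tests.test import Test


class RebalanceSmokeTest(Test):
    """Sketch only: run the same scenario against several Ignite versions."""

    @parametrize(version="2.7.6")
    @parametrize(version="2.8.1")
    @parametrize(version="dev")
    def test_add_node(self, version):
        # a real test would start Ignite nodes built for `version`,
        # preload data, add a node and wait for rebalance to finish
        assert version in ("2.7.6", "2.8.1", "dev")

Running it against the dev branch or a release build is then just a matter of
which artifact the deployment step puts on the hosts.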

Mon, 6 Jul 2020 at 14:27, Anton Vinogradov <av...@apache.org>:

> Max,
>
> Thanks for the check!
>
> > Is it OK for those tests to fail?
> No.
> I see really strange things in the logs.
> It looks like a concurrent ducktests run started unexpected services,
> and this broke the tests.
> Could you please clean up docker (use the clean-up script [1]).
> Then compile the sources (use script [2]) and rerun the tests.
>
> [1]
>
> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> [2]
>
> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
>
> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org>
> wrote:
>
> > Hello, Maxim.
> >
> > Thanks for writing down the minutes.
> >
> > There is no such thing as «Nikolay team» on the dev-list.
> > I propose to focus on product requirements and what we want to gain from
> > the framework instead of taking into account the needs of some team.
> >
> > Can you, please, write down your version of requirements so we can reach
> a
> > consensus on that and therefore move to the discussion of the
> > implementation?
> >
> > > On 6 July 2020, at 11:18, Max Shonichev <ms...@yandex.ru>
> wrote:
> > >
> > > Yes, Denis,
> > >
> > > common ground seems to be as follows:
> > > Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
> > over physical hosts and share benchmark results. In the meantime, while I
> > strongly believe that dockerized approach to benchmarking is a road to
> > misleading and false positives, I'll prepare a PoC of Tiden in dockerized
> > environment to support 'fast development prototyping' usecase Nikolay
> team
> > insist on. It should be a matter of few days.
> > >
> > > As a side note, I've run Anton PoC locally and would like to have some
> > comments about results:
> > >
> > > Test system: Ubuntu 18.04, docker 19.03.6
> > > Test commands:
> > >
> > >
> > > git clone -b ignite-ducktape git@github.com:
> anton-vinogradov/ignite.git
> > > cd ignite
> > > mvn clean install -DskipTests -Dmaven.javadoc.skip=true
> > -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> > > cd modules/ducktests/tests/docker
> > > ./run_tests.sh
> > >
> > > Test results:
> > >
> >
> ====================================================================================================
> > > SESSION REPORT (ALL TESTS)
> > > ducktape version: 0.7.7
> > > session_id:       2020-07-05--004
> > > run time:         7 minutes 36.360 seconds
> > > tests run:        5
> > > passed:           3
> > > failed:           2
> > > ignored:          0
> > >
> >
> ====================================================================================================
> > > test_id:
> >
> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > > status:     FAIL
> > > run time:   3 minutes 12.232 seconds
> > >
> >
> ----------------------------------------------------------------------------------------------------
> > > test_id:
> >
> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > > status:     FAIL
> > > run time:   1 minute 33.076 seconds
> > >
> > >
> > > Is it OK for those tests to fail? Attached is full test report
> > >
> > >
> > > On 02.07.2020 17:46, Denis Magda wrote:
> > >> Folks,
> > >> Please share the summary of that Slack conversation here for records
> > once
> > >> you find common ground.
> > >> -
> > >> Denis
> > >> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
> > wrote:
> > >>> Igniters.
> > >>>
> > >>> All who are interested in integration testing framework discussion
> are
> > >>> welcome into slack channel -
> > >>>
> >
> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> > >>>
> > >>>
> > >>>
> > >>>> On 2 July 2020, at 13:06, Anton Vinogradov <av...@apache.org>
> wrote:
> > >>>>
> > >>>> Max,
> > >>>> Thanks for joining us.
> > >>>>
> > >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> > >>>>> dependencies being deployed by external scripts.
> > >>>> No. It is important to distinguish development, deploy, and
> > >>> orchestration.
> > >>>> All-in-one solutions have extremely limited usability.
> > >>>> As to Ducktests:
> > >>>> Docker is responsible for deployments during development.
> > >>>> CI/CD is responsible for deployments during release and nightly
> > checks.
> > >>> It's up to the team to choose AWS, VM, BareMetal, and even OS.
> > >>>> Ducktape is responsible for orchestration.
> > >>>>
> > >>>>> 2. tiden can execute actions over remote nodes in real parallel
> > >>> fashion,
> > >>>>> while ducktape internally does all actions sequentially.
> > >>>> No. Ducktape may start any service in parallel. See Pme-free
> benchmark
> > >>> [1] for details.
> > >>>>
> > >>>>> if we used ducktape solution we would have to instead prepare some
> > >>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> > with
> > >>>>> Ansible or Chef.
> > >>>> Sure, because the way of deployment depends on the infrastructure.
> > >>>> How can we be sure that the OS we use and the restrictions we have will
> be
> > >>> compatible with Tiden?
> > >>>>
> > >>>>> You have solved this deficiency with docker by putting all
> > dependencies
> > >>>>> into one uber-image ...
> > >>>> and
> > >>>>> I guess we all know about docker hyped ability to run over
> > distributed
> > >>>>> virtual networks.
> > >>>> It is very important not to confuse the test's development (docker
> > image
> > >>> you're talking about) and real deployment.
> > >>>>
> > >>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> > >>>> All actions can be performed in parallel.
> > >>>> See how Ducktests [2] starts cluster in parallel for example.
> > >>>>
> > >>>> [1]
> > >>>
> >
> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> > >>>> [2]
> > >>>
> >
> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> > >>>>
> > >>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhikov@apache.org
> >
> > >>> wrote:
> > >>>> Hello, Maxim.
> > >>>>
> > >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> > >>> dependencies being deployed by external scripts
> > >>>>
> > >>>> Why do you think that maintaining deploy scripts coupled with the
> > >>> testing framework is an advantage?
> > >>>> I thought we want to see and maintain deployment scripts separate
> from
> > >>> the testing framework.
> > >>>>
> > >>>>> 2. tiden can execute actions over remote nodes in real parallel
> > >>> fashion, while ducktape internally does all actions sequentially.
> > >>>>
> > >>>> Can you, please, clarify, what actions do you have in mind?
> > >>>> And why we want to execute them concurrently?
> > >>>> Ignite node start, Client application execution can be done
> > concurrently
> > >>> with the ducktape approach.
> > >>>>
> > >>>>> If we used ducktape solution we would have to instead prepare some
> > >>> deployment scripts to pre-initialize Sberbank hosts, for example,
> with
> > >>> Ansible or Chef
> > >>>>
> > >>>> We shouldn’t take some user approach as an argument in this
> > discussion.
> > >>> Let’s discuss a general approach for all users of the Ignite. Anyway,
> > what
> > >>> is wrong with the external deployment script approach?
> > >>>>
> > >>>> We, as a community, should provide several ways to run integration
> > tests
> > >>> out-of-the-box AND the ability to customize deployment regarding the
> > user
> > >>> landscape.
> > >>>>
> > >>>>> You have solved this deficiency with docker by putting all
> > >>> dependencies into one uber-image and that looks like simple and
> elegant
> > >>> solution however, that effectively limits you to single-host testing.
> > >>>>
> > >>>> Docker image should be used only by the Ignite developers to test
> > >>> something locally.
> > >>>> It’s not intended for some real-world testing.
> > >>>>
> > >>>> The main issue I see with Tiden is that it has been tested and maintained
> > >>>> as a closed-source solution.
> > >>>> This can lead to hard-to-solve problems when we start using and
> > >>> maintaining it as an open-source solution.
> > >>>> Like, how many developers have used Tiden? And how many of those developers
> > >>> were not authors of Tiden itself?
> > >>>>
> > >>>>
> > >>>>> On 2 July 2020, at 12:30, Max Shonichev <ms...@yandex.ru>
> > >>> wrote:
> > >>>>>
> > >>>>> Anton, Nikolay,
> > >>>>>
> > >>>>> Let's agree on what we are arguing about: whether it is about "like
> > or
> > >>> don't like" or about technical properties of suggested solutions.
> > >>>>>
> > >>>>> If it is about likes and dislikes, then the whole discussion is
> > >>> meaningless. However, I hope together we can analyse pros and cons
> > >>> carefully.
> > >>>>>
> > >>>>> As far as I can understand now, two main differences between
> ducktape
> > >>> and tiden are that:
> > >>>>>
> > >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> > >>> dependencies being deployed by external scripts.
> > >>>>>
> > >>>>> 2. tiden can execute actions over remote nodes in real parallel
> > >>> fashion, while ducktape internally does all actions sequentially.
> > >>>>>
> > >>>>> As for me, these are very important properties for distributed
> > testing
> > >>> framework.
> > >>>>>
> > >>>>> The first property lets us easily reuse tiden in existing
> infrastructures,
> > >>> for example, during Zookeeper IEP testing at Sberbank site we used
> the
> > same
> > >>> tiden scripts that we use in our lab, the only change was putting a
> > list of
> > >>> hosts into config.
> > >>>>>
> > >>>>> If we used ducktape solution we would have to instead prepare some
> > >>> deployment scripts to pre-initialize Sberbank hosts, for example,
> with
> > >>> Ansible or Chef.
> > >>>>>
> > >>>>>
> > >>>>> You have solved this deficiency with docker by putting all
> > >>> dependencies into one uber-image and that looks like simple and
> elegant
> > >>> solution,
> > >>>>> however, that effectively limits you to single-host testing.
> > >>>>>
> > >>>>> I guess we all know about docker hyped ability to run over
> > distributed
> > >>> virtual networks. We used to go that way, but quickly found that it
> is
> > more
> > >>> of the hype than real work. In real environments, there are problems
> > with
> > >>> routing, DNS, multicast and broadcast traffic, and many others, that
> > turn
> > >>> docker-based distributed solution into a fragile hard-to-maintain
> > monster.
> > >>>>>
> > >>>>> Please, if you believe otherwise, perform a run of your PoC over at
> > >>> least two physical hosts and share results with us.
> > >>>>>
> > >>>>> If you consider that one physical docker host is enough, please,
> > don't
> > >>> overlook that we want to run real scale scenarios, with 50-100 cache
> > >>> groups, persistence enabled and millions of keys loaded.
> > >>>>>
> > >>>>> Practical limit for such configurations is 4-6 nodes per single
> > >>> physical host. Otherwise, tests become flaky due to resource
> > starvation.
> > >>>>>
> > >>>>> Please, if you believe otherwise, perform at least 10 runs of
> > >>> your PoC with other tests running at TC (we're targeting TeamCity,
> > right?)
> > >>> and share results so we could check if the numbers are reproducible.
> > >>>>>
> > >>>>> I stress this once more: functional integration tests are OK to run
> > in
> > >>> Docker and CI, but running benchmarks in Docker is a big NO GO.
> > >>>>>
> > >>>>>
> > >>>>> Second property let us write tests that require real-parallel
> actions
> > >>> over hosts.
> > >>>>>
> > >>>>> For example, the agreed scenario for the PME benchmark during the "PME
> > >>> optimization stream" was as follows:
> > >>>>>
> > >>>>>  - 10 server nodes, preloaded with 1M of keys
> > >>>>>  - 4 client nodes perform transactional load  (client nodes
> > physically
> > >>> separated from server nodes)
> > >>>>>  - during load:
> > >>>>>  -- 5 server nodes stopped in parallel
> > >>>>>  -- after 1 minute, all 5 nodes are started in parallel
> > >>>>>  - load stopped, logs are analysed for exchange times.
> > >>>>>
> > >>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
> > >>> then partition map exchange merge would not happen and we could not
> > have
> > >>> measured PME optimizations for that case.
> > >>>>>
> > >>>>>
> > >>>>> These are limitations of ducktape that we believe are a more important
> > >>>>> argument "against" than the arguments you provide "for".
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> > >>>>>> Folks,
> > >>>>>> First, I've created PR [1] with ducktests improvements
> > >>>>>> PR contains the following changes
> > >>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> > >>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
> > 2.8)
> > >>>>>> - Global refactoring
> > >>>>>> -- benchmarks javacode simplification
> > >>>>>> -- services python and java classes code deduplication
> > >>>>>> -- fail-fast checks for java and python (eg. application should
> > >>> explicitly write it finished with success)
> > >>>>>> -- simple results extraction from tests and benchmarks
> > >>>>>> -- javacode now configurable from tests/benchmarks
> > >>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
> > >>> operation and log results)
> > >>>>>> -- docker volume now marked as delegated to increase execution
> speed
> > >>> for mac & win users
> > >>>>>> -- Ignite cluster now start in parallel (start speed-up)
> > >>>>>> -- Ignite can be configured at test/benchmark
> > >>>>>> - full and module assembly scripts added
> > >>>>> Great job done! But let me remind you of one of the Apache Ignite principles:
> > >>>>> a week of thinking saves months of development.
> > >>>>>
> > >>>>>
> > >>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
> > >>> integration) as a target "PoC check & real topology benchmarking
> tool".
> > >>>>>> Ducktape pros
> > >>>>>> - Developed for distributed system by distributed system
> developers.
> > >>>>> So does Tiden
> > >>>>>
> > >>>>>> - Developed since 2014, stable.
> > >>>>> Tiden is also pretty stable, and the development start date is not a good
> > >>> argument: for example, pytest has existed since 2004 and pytest-xdist (a plugin
> > >>> for distributed testing) since 2010, but we don't see them as an alternative at
> > >>> all.
> > >>>>>
> > >>>>>> - Proven usability by usage at Kafka.
> > >>>>> Tiden is proven usable by usage at GridGain and Sberbank
> deployments.
> > >>>>> Core, storage, sql and tx teams use benchmark results provided by
> > >>> Tiden on a daily basis.
> > >>>>>
> > >>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great
> example
> > >>> pack.
> > >>>>> We'll donate some of our suites to Ignite as I've mentioned in
> > >>> previous letter.
> > >>>>>
> > >>>>>> - Built-in Docker support for rapid development and checks.
> > >>>>> False, there's no specific 'docker support' in ducktape itself, you
> > >>> just wrap it in docker by yourself, because ducktape is lacking
> > deployment
> > >>> abilities.
> > >>>>>
> > >>>>>> - Great for CI automation.
> > >>>>> False, there's no specific CI-enabled features in ducktape. Tiden,
> on
> > >>> the other hand, provide generic xUnit reporting format, which is
> > supported
> > >>> by both TeamCity and Jenkins. Also, instead of using private keys,
> > Tiden
> > >>> can use SSH agent, which is also great for CI, because both
> > >>>>> TeamCity and Jenkins store keys in secret storage available only
> for
> > >>> ssh-agent and only for the time of the test.
> > >>>>>
> > >>>>>
> > >>>>>>> As an additional motivation, at least 3 teams
> > >>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
> > Zabbix
> > >>> speed-up))
> > >>>>>> - Ignite SE Plugins team (to check plugin's features does not
> > >>> slow-down or broke AI features)
> > >>>>>> - Ignite SE QA team (to append already developed
> smoke/load/failover
> > >>> tests to AI codebase)
> > >>>>>
> > >>>>> Please, before recommending your tests to other teams, provide
> proofs
> > >>>>> that your tests are reproducible in real environment.
> > >>>>>
> > >>>>>
> > >>>>>> now, wait for ducktest merge to start checking cases they working
> on
> > >>> in AI way.
> > >>>>>> Thoughts?
> > >>>>> Let us together review both solutions, we'll try to run your tests
> in
> > >>> our lab, and you'll try to at least checkout tiden and see if same
> > tests
> > >>> can be implemented with it?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> [1] https://github.com/apache/ignite/pull/7967
> > >>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> > >>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
> > nizhikov@apache.org
> > >>> <ma...@apache.org>> wrote:
> > >>>>>>    Hello, Maxim.
> > >>>>>>    Thank you for so detailed explanation.
> > >>>>>>    Can we put the content of this discussion somewhere on the
> wiki?
> > >>>>>>    So It doesn’t get lost.
> > >>>>>>    I divide the answer in several parts. From the requirements to
> > the
> > >>>>>>    implementation.
> > >>>>>>    So, if we agreed on the requirements we can proceed with the
> > >>>>>>    discussion of the implementation.
> > >>>>>>    1. Requirements:
> > >>>>>>    The main goal I want to achieve is *reproducibility* of the
> > tests.
> > >>>>>>    I’m sick and tired with the zillions of flaky, rarely failed,
> and
> > >>>>>>    almost never failed tests in Ignite codebase.
> > >>>>>>    We should start with the simplest scenarios that will be as
> > >>> reliable
> > >>>>>>    as steel :)
> > >>>>>>    I want to know for sure:
> > >>>>>>       - Is this PR makes rebalance quicker or not?
> > >>>>>>       - Is this PR makes PME quicker or not?
> > >>>>>>    So, your description of the complex test scenario looks as a
> next
> > >>>>>>    step to me.
> > >>>>>>    Anyway, It’s cool we already have one.
> > >>>>>>    The second goal is to have a strict test lifecycle as we have
> in
> > >>>>>>    JUnit and similar frameworks.
> > >>>>>>     > It covers production-like deployment and running a scenarios
> > >>> over
> > >>>>>>    a single database instance.
> > >>>>>>    Do you mean «single cluster» or «single host»?
> > >>>>>>    2. Existing tests:
> > >>>>>>     > A Combinator suite allows to run set of operations
> > concurrently
> > >>>>>>    over given database instance.
> > >>>>>>     > A Consumption suite allows to run a set production-like
> > actions
> > >>>>>>    over given set of Ignite/GridGain versions and compare test
> > metrics
> > >>>>>>    across versions
> > >>>>>>     > A Yardstick suite
> > >>>>>>     > A Stress suite that simulates hardware environment
> degradation
> > >>>>>>     > An Ultimate, DR and Compatibility suites that performs
> > >>> functional
> > >>>>>>    regression testing
> > >>>>>>     > Regression
> > >>>>>>    Great news that we already have so many choices for testing!
> > >>>>>>    Mature test base is a big +1 for Tiden.
> > >>>>>>    3. Comparison:
> > >>>>>>     > Criteria: Test configuration
> > >>>>>>     > Ducktape: single JSON string for all tests
> > >>>>>>     > Tiden: any number of YaML config files, command line option
> > for
> > >>>>>>    fine-grained test configuration, ability to select/modify tests
> > >>>>>>    behavior based on Ignite version.
> > >>>>>>    1. Many YAML files can be hard to maintain.
> > >>>>>>    2. In ducktape, you can set parameters via «—parameters»
> option.
> > >>>>>>    Please, take a look at the doc [1]
> > >>>>>>     > Criteria: Cluster control
> > >>>>>>     > Tiden: additionally can address cluster as a whole and
> execute
> > >>>>>>    remote commands in parallel.
> > >>>>>>    It seems we implement this ability in the PoC, already.
> > >>>>>>     > Criteria: Test assertions
> > >>>>>>     > Tiden: simple asserts, also few customized assertion
> helpers.
> > >>>>>>     > Ducktape: simple asserts.
> > >>>>>>    Can you, please, be more specific.
> > >>>>>>    What helpers do you have in mind?
> > >>>>>>    Ducktape has an asserts that waits for logfile messages or some
> > >>>>>>    process finish.
> > >>>>>>     > Criteria: Test reporting
> > >>>>>>     > Ducktape: limited to its own text/HTML format
> > >>>>>>    Ducktape have
> > >>>>>>    1. Text reporter
> > >>>>>>    2. Customizable HTML reporter
> > >>>>>>    3. JSON reporter.
> > >>>>>>    We can show JSON with the any template or tool.
> > >>>>>>     > Criteria: Provisioning and deployment
> > >>>>>>     > Ducktape: can provision subset of hosts from cluster for
> test
> > >>>>>>    needs. However, that means, that test can’t be scaled without
> > test
> > >>>>>>    code changes. Does not do any deploy, relies on external means,
> > >>> e.g.
> > >>>>>>    pre-packaged in docker image, as in PoC.
> > >>>>>>    This is not true.
> > >>>>>>    1. We can set explicit test parameters(node number) via
> > parameters.
> > >>>>>>    We can increase client count of cluster size without test code
> > >>> changes.
> > >>>>>>    2. We have many choices for the test environment. These choices
> > are
> > >>>>>>    tested and used in other projects:
> > >>>>>>             * docker
> > >>>>>>             * vagrant
> > >>>>>>             * private cloud(ssh access)
> > >>>>>>             * ec2
> > >>>>>>    Please, take a look at Kafka documentation [2]
> > >>>>>>     > I can continue more on this, but it should be enough for
> now:
> > >>>>>>    We need to go deeper! :)
> > >>>>>>    [1]
> > >>>>>>
> > >>>
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> > >>>>>>    [2]
> > >>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> > >>>>>>     > 9 июня 2020 г., в 17:25, Max A. Shonichev <
> mshonich@yandex.ru
> > >>>>>>    <ma...@yandex.ru>> написал(а):
> > >>>>>>     >
> > >>>>>>     > Greetings, Nikolay,
> > >>>>>>     >
> > >>>>>>     > First of all, thank you for you great effort preparing PoC
> of
> > >>>>>>    integration testing to Ignite community.
> > >>>>>>     >
> > >>>>>>     > It’s a shame Ignite did not have at least some such tests
> yet,
> > >>>>>>    however, GridGain, as a major contributor to Apache Ignite had
> a
> > >>>>>>    profound collection of in-house tools to perform integration
> and
> > >>>>>>    performance testing for years already and while we slowly
> > consider
> > >>>>>>    sharing our expertise with the community, your initiative makes
> > us
> > >>>>>>    drive that process a bit faster, thanks a lot!
> > >>>>>>     >
> > >>>>>>     > I reviewed your PoC and want to share a little about what we
> > do
> > >>>>>>    on our part, why and how, hope it would help community take
> > proper
> > >>>>>>    course.
> > >>>>>>     >
> > >>>>>>     > First I’ll do a brief overview of what decisions we made and
> > >>> what
> > >>>>>>    we do have in our private code base, next I’ll describe what we
> > >>> have
> > >>>>>>    already donated to the public and what we plan public next,
> then
> > >>>>>>    I’ll compare both approaches highlighting deficiencies in order
> > to
> > >>>>>>    spur public discussion on the matter.
> > >>>>>>     >
> > >>>>>>     > It might seem strange to use Python to run Bash to run Java
> > >>>>>>    applications because that introduces IT industry best of
> breed’ –
> > >>>>>>    the Python dependency hell – to the Java application code base.
> > The
> > >>>>>>    only strangest decision one can made is to use Maven to run
> > Docker
> > >>>>>>    to run Bash to run Python to run Bash to run Java, but
> desperate
> > >>>>>>    times call for desperate measures I guess.
> > >>>>>>     >
> > >>>>>>     > There are Java-based solutions for integration testing
> exists,
> > >>>>>>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go
> > >>> well
> > >>>>>>    for Ignite community CI pipelines by them selves. But we also
> > >>> wanted
> > >>>>>>    to run performance tests and benchmarks, like the dreaded PME
> > >>>>>>    benchmark, and this is solved by totally different set of tools
> > in
> > >>>>>>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> > >>>>>>     >
> > >>>>>>     > Speaking specifically about benchmarking, Apache Ignite
> > >>> community
> > >>>>>>    already has Yardstick [6], and there’s nothing wrong with
> writing
> > >>>>>>    PME benchmark using Yardstick, but we also wanted to be able to
> > run
> > >>>>>>    scenarios like this:
> > >>>>>>     > - put an X load to a Ignite database;
> > >>>>>>     > - perform an Y set of operations to check how Ignite copes
> > with
> > >>>>>>    operations under load.
> > >>>>>>     >
> > >>>>>>     > And yes, we also wanted applications under test be deployed
> > >>> ‘like
> > >>>>>>    in a production’, e.g. distributed over a set of hosts. This
> > arises
> > >>>>>>    questions about provisioning and nodes affinity which I’ll
> cover
> > in
> > >>>>>>    detail later.
> > >>>>>>     >
> > >>>>>>     > So we decided to put a little effort to build a simple tool
> to
> > >>>>>>    cover different integration and performance scenarios, and our
> QA
> > >>>>>>    lab first attempt was PoC-Tester [7], currently open source for
> > all
> > >>>>>>    but for reporting web UI. It’s a quite simple to use 95%
> > Java-based
> > >>>>>>    tool targeted to be run on a pre-release QA stage.
> > >>>>>>     >
> > >>>>>>     > It covers production-like deployment and running a scenarios
> > >>> over
> > >>>>>>    a single database instance. PoC-Tester scenarios consists of a
> > >>>>>>    sequence of tasks running sequentially or in parallel. After
> all
> > >>>>>>    tasks complete, or at any time during test, user can run logs
> > >>>>>>    collection task, logs are checked against exceptions and a
> > summary
> > >>>>>>    of found issues and task ops/latency statistics is generated at
> > the
> > >>>>>>    end of scenario. One of the main PoC-Tester features is its
> > >>>>>>    fire-and-forget approach to task managing. That is, you can
> > deploy
> > >>> a
> > >>>>>>    grid and left it running for weeks, periodically firing some
> > tasks
> > >>>>>>    onto it.
> > >>>>>>     >
> > >>>>>>     > During earliest stages of PoC-Tester development it becomes
> > >>> quite
> > >>>>>>    clear that Java application development is a tedious process
> and
> > >>>>>>    architecture decisions you take during development are slow and
> > >>> hard
> > >>>>>>    to change.
> > >>>>>>     > For example, scenarios like this
> > >>>>>>     > - deploy two instances of GridGain with master-slave data
> > >>>>>>    replication configured;
> > >>>>>>     > - put a load on master;
> > >>>>>>     > - perform checks on slave,
> > >>>>>>     > or like this:
> > >>>>>>     > - preload a 1Tb of data by using your favorite tool of
> choice
> > to
> > >>>>>>    an Apache Ignite of version X;
> > >>>>>>     > - run a set of functional tests running Apache Ignite
> version
> > Y
> > >>>>>>    over preloaded data,
> > >>>>>>     > do not fit well in the PoC-Tester workflow.
> > >>>>>>     >
> > >>>>>>     > So, this is why we decided to use Python as a generic
> > scripting
> > >>>>>>    language of choice.
> > >>>>>>     >
> > >>>>>>     > Pros:
> > >>>>>>     > - quicker prototyping and development cycles
> > >>>>>>     > - easier to find DevOps/QA engineer with Python skills than
> > one
> > >>>>>>    with Java skills
> > >>>>>>     > - used extensively all over the world for DevOps/CI
> pipelines
> > >>> and
> > >>>>>>    thus has rich set of libraries for all possible integration
> uses
> > >>> cases.
> > >>>>>>     >
> > >>>>>>     > Cons:
> > >>>>>>     > - Nightmare with dependencies. Better stick to specific
> > >>>>>>    language/libraries version.
> > >>>>>>     >
> > >>>>>>     > Comparing alternatives for Python-based testing framework we
> > >>> have
> > >>>>>>    considered following requirements, somewhat similar to what
> > you’ve
> > >>>>>>    mentioned for Confluent [8] previously:
> > >>>>>>     > - should be able run locally or distributed (bare metal or
> in
> > >>> the
> > >>>>>>    cloud)
> > >>>>>>     > - should have built-in deployment facilities for
> applications
> > >>>>>>    under test
> > >>>>>>     > - should separate test configuration and test code
> > >>>>>>     > -- be able to easily reconfigure tests by simple
> configuration
> > >>>>>>    changes
> > >>>>>>     > -- be able to easily scale test environment by simple
> > >>>>>>    configuration changes
> > >>>>>>     > -- be able to perform regression testing by simple switching
> > >>>>>>    artifacts under test via configuration
> > >>>>>>     > -- be able to run tests with different JDK version by simple
> > >>>>>>    configuration changes
> > >>>>>>     > - should have human readable reports and/or reporting tools
> > >>>>>>    integration
> > >>>>>>     > - should allow simple test progress monitoring, one does not
> > >>> want
> > >>>>>>    to run 6-hours test to find out that application actually
> crashed
> > >>>>>>    during first hour.
> > >>>>>>     > - should allow parallel execution of test actions
> > >>>>>>     > - should have clean API for test writers
> > >>>>>>     > -- clean API for distributed remote commands execution
> > >>>>>>     > -- clean API for deployed applications start / stop and
> other
> > >>>>>>    operations
> > >>>>>>     > -- clean API for performing check on results
> > >>>>>>     > - should be open source or at least source code should allow
> > >>> ease
> > >>>>>>    change or extension
> > >>>>>>     >
> > >>>>>>     > Back at that time we found no better alternative than to
> write
> > >>>>>>    our own framework, and here goes Tiden [9] as GridGain
> framework
> > of
> > >>>>>>    choice for functional integration and performance testing.
> > >>>>>>     >
> > >>>>>>     > Pros:
> > >>>>>>     > - solves all the requirements above
> > >>>>>>     > Cons (for Ignite):
> > >>>>>>     > - (currently) closed GridGain source
> > >>>>>>     >
> > >>>>>>     > On top of Tiden we’ve built a set of test suites, some of
> > which
> > >>>>>>    you might have heard already.
> > >>>>>>     >
> > >>>>>>     > A Combinator suite allows to run set of operations
> > concurrently
> > >>>>>>    over given database instance. Proven to find at least 30+ race
> > >>>>>>    conditions and NPE issues.
> > >>>>>>     >
> > >>>>>>     > A Consumption suite allows to run a set production-like
> > actions
> > >>>>>>    over given set of Ignite/GridGain versions and compare test
> > metrics
> > >>>>>>    across versions, like heap/disk/CPU consumption, time to
> perform
> > >>>>>>    actions, like client PME, server PME, rebalancing time, data
> > >>>>>>    replication time, etc.
> > >>>>>>     >
> > >>>>>>     > A Yardstick suite is a thin layer of Python glue code to run
> > >>>>>>    Apache Ignite pre-release benchmarks set. Yardstick itself has
> a
> > >>>>>>    mediocre deployment capabilities, Tiden solves this easily.
> > >>>>>>     >
> > >>>>>>     > A Stress suite that simulates hardware environment
> degradation
> > >>>>>>    during testing.
> > >>>>>>     >
> > >>>>>>     > An Ultimate, DR and Compatibility suites that performs
> > >>> functional
> > >>>>>>    regression testing of GridGain Ultimate Edition features like
> > >>>>>>    snapshots, security, data replication, rolling upgrades, etc.
> > >>>>>>     >
> > >>>>>>     > A Regression and some IEPs testing suites, like IEP-14,
> > IEP-15,
> > >>>>>>    etc, etc, etc.
> > >>>>>>     >
> > >>>>>>     > Most of the suites above use another in-house developed Java
> > >>> tool
> > >>>>>>    – PiClient – to perform actual loading and miscellaneous
> > operations
> > >>>>>>    with Ignite under test. We use py4j Python-Java gateway library
> > to
> > >>>>>>    control PiClient instances from the tests.
> > >>>>>>     >
> > >>>>>>     > When we considered CI, we put TeamCity out of scope, because
> > >>>>>>    distributed integration and performance tests tend to run for
> > hours
> > >>>>>>    and TeamCity agents are scarce and costly resource. So, bundled
> > >>> with
> > >>>>>>    Tiden there is jenkins-job-builder [10] based CI pipelines and
> > >>>>>>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
> > >>> test
> > >>>>>>    run reports across versions and has built in visualization
> > support
> > >>>>>>    for Combinator suite.
> > >>>>>>     >
> > >>>>>>     > All of the above is currently closed source, but we plan to
> > make
> > >>>>>>    it public for community, and publishing Tiden core [9] is the
> > first
> > >>>>>>    step on that way. You can review some examples of using Tiden
> for
> > >>>>>>    tests at my repository [11], for start.
> > >>>>>>     >
> > >>>>>>     > Now, let’s compare Ducktape PoC and Tiden.
> > >>>>>>     >
> > >>>>>>     > Criteria: Language
> > >>>>>>     > Tiden: Python, 3.7
> > >>>>>>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> > >>>>>>    compatible, but actually can’t work with Python 3.7 due to
> broken
> > >>>>>>    Zmq dependency.
> > >>>>>>     > Comment: Python 3.7 has a much better support for
> async-style
> > >>>>>>    code which might be crucial for distributed application
> testing.
> > >>>>>>     > Score: Tiden: 1, Ducktape: 0
> > >>>>>>     >
> > >>>>>>     > Criteria: Test writers API
> > >>>>>>     > Supported integration test framework concepts are basically
> > the
> > >>> same:
> > >>>>>>     > - a test controller (test runner)
> > >>>>>>     > - a cluster
> > >>>>>>     > - a node
> > >>>>>>     > - an application (a service in Ducktape terms)
> > >>>>>>     > - a test
> > >>>>>>     > Score: Tiden: 5, Ducktape: 5
> > >>>>>>     >
> > >>>>>>     > Criteria: Tests selection and run
> > >>>>>>     > Ducktape: suite-package-class-method level selection,
> internal
> > >>>>>>    scheduler allows to run tests in suite in parallel.
> > >>>>>>     > Tiden: also suite-package-class-method level selection,
> > >>>>>>    additionally allows selecting subset of tests by attribute,
> > >>> parallel
> > >>>>>>    runs not built in, but allows merging test reports after
> > different
> > >>> runs.
> > >>>>>>     > Score: Tiden: 2, Ducktape: 2
> > >>>>>>     >
> > >>>>>>     > Criteria: Test configuration
> > >>>>>>     > Ducktape: single JSON string for all tests
> > >>>>>>     > Tiden: any number of YaML config files, command line option
> > for
> > >>>>>>    fine-grained test configuration, ability to select/modify tests
> > >>>>>>    behavior based on Ignite version.
> > >>>>>>     > Score: Tiden: 3, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Cluster control
> > >>>>>>     > Ducktape: allow execute remote commands by node granularity
> > >>>>>>     > Tiden: additionally can address cluster as a whole and
> execute
> > >>>>>>    remote commands in parallel.
> > >>>>>>     > Score: Tiden: 2, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Logs control
> > >>>>>>     > Both frameworks have similar builtin support for remote logs
> > >>>>>>    collection and grepping. Tiden has built-in plugin that can
> zip,
> > >>>>>>    collect arbitrary log files from arbitrary locations at
> > >>>>>>    test/module/suite granularity and unzip if needed, also
> > application
> > >>>>>>    API to search / wait for messages in logs. Ducktape allows each
> > >>>>>>    service declare its log files location (seemingly does not
> > support
> > >>>>>>    logs rollback), and a single entrypoint to collect service
> logs.
> > >>>>>>     > Score: Tiden: 1, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Test assertions
> > >>>>>>     > Tiden: simple asserts, also few customized assertion
> helpers.
> > >>>>>>     > Ducktape: simple asserts.
> > >>>>>>     > Score: Tiden: 2, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Test reporting
> > >>>>>>     > Ducktape: limited to its own text/html format
> > >>>>>>     > Tiden: provides text report, yaml report for reporting tools
> > >>>>>>    integration, XML xUnit report for integration with
> > >>> Jenkins/TeamCity.
> > >>>>>>     > Score: Tiden: 3, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Provisioning and deployment
> > >>>>>>     > Ducktape: can provision subset of hosts from cluster for
> test
> > >>>>>>    needs. However, that means, that test can’t be scaled without
> > test
> > >>>>>>    code changes. Does not do any deploy, relies on external means,
> > >>> e.g.
> > >>>>>>    pre-packaged in docker image, as in PoC.
> > >>>>>>     > Tiden: Given a set of hosts, Tiden uses all of them for the
> > >>> test.
> > >>>>>>    Provisioning should be done by external means. However,
> provides
> > a
> > >>>>>>    conventional automated deployment routines.
> > >>>>>>     > Score: Tiden: 1, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > Criteria: Documentation and Extensibility
> > >>>>>>     > Tiden: current API documentation is limited, should change
> as
> > we
> > >>>>>>    go open source. Tiden is easily extensible via hooks and
> plugins,
> > >>>>>>    see example Maven plugin and Gatling application at [11].
> > >>>>>>     > Ducktape: basic documentation at readthedocs.io
> > >>>>>>    <http://readthedocs.io>. Codebase is rigid, framework core is
> > >>>>>>    tightly coupled and hard to change. The only possible extension
> > >>>>>>    mechanism is fork-and-rewrite.
> > >>>>>>     > Score: Tiden: 2, Ducktape: 1
> > >>>>>>     >
> > >>>>>>     > I can continue more on this, but it should be enough for
> now:
> > >>>>>>     > Overall score: Tiden: 22, Ducktape: 14.
> > >>>>>>     >
> > >>>>>>     > Time for discussion!
> > >>>>>>     >
> > >>>>>>     > ---
> > >>>>>>     > [1] - https://www.testcontainers.org/
> > >>>>>>     > [2] - http://arquillian.org/guides/getting_started/
> > >>>>>>     > [3] - https://jmeter.apache.org/index.html
> > >>>>>>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> > >>>>>>     > [5] - https://gatling.io/docs/current/
> > >>>>>>     > [6] - https://github.com/gridgain/yardstick
> > >>>>>>     > [7] - https://github.com/gridgain/poc-tester
> > >>>>>>     > [8] -
> > >>>>>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> > >>>>>>     > [9] - https://github.com/gridgain/tiden
> > >>>>>>     > [10] - https://pypi.org/project/jenkins-job-builder/
> > >>>>>>     > [11] - https://github.com/mshonichev/tiden_examples
> > >>>>>>     >
> > >>>>>>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> > >>>>>>     >> Hello,
> > >>>>>>     >>
> > >>>>>>     >> Branch with duck tape created -
> > >>>>>>    https://github.com/apache/ignite/tree/ignite-ducktape
> > >>>>>>     >>
> > >>>>>>     >> Any who are willing to contribute to PoC are welcome.
> > >>>>>>     >>
> > >>>>>>     >>
> > >>>>>>     >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
> > >>>>>>    <nizhikov.dev@gmail.com <ma...@gmail.com>>
> > >>> написал(а):
> > >>>>>>     >>>
> > >>>>>>     >>> Hello, Denis.
> > >>>>>>     >>>
> > >>>>>>     >>> There is no rush with these improvements.
> > >>>>>>     >>> We can wait for Maxim proposal and compare two solutions
> :)
> > >>>>>>     >>>
> > >>>>>>     >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
> > >>>>>>    <ma...@apache.org>> написал(а):
> > >>>>>>     >>>>
> > >>>>>>     >>>> Hi Nikolay,
> > >>>>>>     >>>>
> > >>>>>>     >>>> Thanks for kicking off this conversation and sharing your
> > >>>>>>    findings with the
> > >>>>>>     >>>> results. That's the right initiative. I do agree that
> > Ignite
> > >>>>>>    needs to have
> > >>>>>>     >>>> an integration testing framework with capabilities listed
> > by
> > >>> you.
> > >>>>>>     >>>>
> > >>>>>>     >>>> As we discussed privately, I would only check if instead
> of
> > >>>>>>     >>>> Confluent's Ducktape library, we can use an integration
> > >>>>>>    testing framework
> > >>>>>>     >>>> developed by GridGain for testing of Ignite/GridGain
> > >>> clusters.
> > >>>>>>    That
> > >>>>>>     >>>> framework has been battle-tested and might be more
> > >>> convenient for
> > >>>>>>     >>>> Ignite-specific workloads. Let's wait for @Maksim
> Shonichev
> > >>>>>>     >>>> <mshonichev@gridgain.com <mailto:mshonichev@gridgain.com
> >>
> > >>> who
> > >>>>>>    promised to join this thread once he finishes
> > >>>>>>     >>>> preparing the usage examples of the framework. To my
> > >>>>>>    knowledge, Max has
> > >>>>>>     >>>> already been working on that for several days.
> > >>>>>>     >>>>
> > >>>>>>     >>>> -
> > >>>>>>     >>>> Denis
> > >>>>>>     >>>>
> > >>>>>>     >>>>
> > >>>>>>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> > >>>>>>    <nizhikov@apache.org <ma...@apache.org>>
> > >>>>>>     >>>> wrote:
> > >>>>>>     >>>>
> > >>>>>>     >>>>> Hello, Igniters.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> I created a PoC [1] for the integration tests of Ignite.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> Let me briefly explain the gap I want to cover:
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> 1. For now, we don’t have a solution for automated
> testing
> > >>> of
> > >>>>>>    Ignite on
> > >>>>>>     >>>>> «real cluster».
> > >>>>>>     >>>>> By «real cluster» I mean cluster «like a production»:
> > >>>>>>     >>>>>       * client and server nodes deployed on different
> > hosts.
> > >>>>>>     >>>>>       * thin clients perform queries from some other
> hosts
> > >>>>>>     >>>>>       * etc.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> 2. We don’t have a solution for automated benchmarks of
> > some
> > >>>>>>    internal
> > >>>>>>     >>>>> Ignite process
> > >>>>>>     >>>>>       * PME
> > >>>>>>     >>>>>       * rebalance.
> > >>>>>>     >>>>> This means we don’t know - Do we perform rebalance(or
> PME)
> > >>> in
> > >>>>>>    2.7.0 faster
> > >>>>>>     >>>>> or slower than in 2.8.0 for the same cluster?
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> 3. We don’t have a solution for automated testing of
> > Ignite
> > >>>>>>    integration in
> > >>>>>>     >>>>> a real-world environment:
> > >>>>>>     >>>>> Ignite-Spark integration can be taken as an example.
> > >>>>>>     >>>>> I think some ML solutions also should be tested in
> > >>> real-world
> > >>>>>>    deployments.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> Solution:
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> I propose to use duck tape library from confluent
> (apache
> > >>> 2.0
> > >>>>>>    license)
> > >>>>>>     >>>>> I tested it both on the real cluster(Yandex Cloud) and
> on
> > >>> the
> > >>>>>>    local
> > >>>>>>     >>>>> environment(docker) and it works just fine.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> PoC contains following services:
> > >>>>>>     >>>>>
> > >>>>>>     >>>>>       * Simple rebalance test:
> > >>>>>>     >>>>>               Start 2 server nodes,
> > >>>>>>     >>>>>               Create some data with Ignite client,
> > >>>>>>     >>>>>               Start one more server node,
> > >>>>>>     >>>>>               Wait for rebalance finish
> > >>>>>>     >>>>>       * Simple Ignite-Spark integration test:
> > >>>>>>     >>>>>               Start 1 Spark master, start 1 Spark
> worker,
> > >>>>>>     >>>>>               Start 1 Ignite server node
> > >>>>>>     >>>>>               Create some data with Ignite client,
> > >>>>>>     >>>>>               Check data in application that queries it
> > from
> > >>>>>>    Spark.
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> All tests are fully automated.
> > >>>>>>     >>>>> Logs collection works just fine.
> > >>>>>>     >>>>> You can see an example of the tests report - [4].
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> Pros:
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> * Ability to test local changes(no need to public
> changes
> > to
> > >>>>>>    some remote
> > >>>>>>     >>>>> repository or similar).
> > >>>>>>     >>>>> * Ability to parametrize test environment(run the same
> > tests
> > >>>>>>    on different
> > >>>>>>     >>>>> JDK, JVM params, config, etc.)
> > >>>>>>     >>>>> * Isolation by default so system tests are as reliable
> as
> > >>>>>>    possible.
> > >>>>>>     >>>>> * Utilities for pulling up and tearing down services
> > easily
> > >>>>>>    in clusters in
> > >>>>>>     >>>>> different environments (e.g. local, custom cluster,
> > Vagrant,
> > >>>>>>    K8s, Mesos,
> > >>>>>>     >>>>> Docker, cloud providers, etc.)
> > >>>>>>     >>>>> * Easy to write unit tests for distributed systems
> > >>>>>>     >>>>> * Adopted and successfully used by other distributed
> open
> > >>>>>>    source project -
> > >>>>>>     >>>>> Apache Kafka.
> > >>>>>>     >>>>> * Collect results (e.g. logs, console output)
> > >>>>>>     >>>>> * Report results (e.g. expected conditions met,
> > performance
> > >>>>>>    results, etc.)
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> WDYT?
> > >>>>>>     >>>>>
> > >>>>>>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> > >>>>>>     >>>>> [2] https://github.com/confluentinc/ducktape
> > >>>>>>     >>>>> [3]
> > >>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> > >>>>>>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> > >>>>
> > >>>
> > >>>
> > >>>
> > > <2020-07-05--004.tar.gz>
> >
> >
> >
>


-- 
Best Regards,
Ilya Suntsov
email: isuntsov@gridgain.com
*GridGain Systems*
www.gridgain.com

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
Max,

Thanks for the check!

> Is it OK for those tests to fail?
No.
I see some really strange things in the logs.
It looks like a concurrent ducktests run started services the tests did not
expect, and this broke the tests.
Could you please clean up Docker (use the clean-up script [1]),
recompile the sources (use the build script [2]), and rerun the tests?

[1]
https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
[2]
https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
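
A rough sequence, assuming the scripts can be run from the repository root
and take no extra arguments (adjust the paths to your checkout):

# clean up leftovers of previous ducktests runs in docker (script [1])
./modules/ducktests/tests/docker/clean_up.sh
# recompile the sources (script [2])
./scripts/build.sh
# rerun the suite
cd modules/ducktests/tests/docker && ./run_tests.sh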

On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <ni...@apache.org> wrote:

> Hello, Maxim.
>
> Thanks for writing down the minutes.
>
> There is no such thing as «Nikolay team» on the dev-list.
> I propose to focus on product requirements and what we want to gain from
> the framework instead of taking into account the needs of some team.
>
> Can you, please, write down your version of requirements so we can reach a
> consensus on that and therefore move to the discussion of the
> implementation?
>
> > 6 июля 2020 г., в 11:18, Max Shonichev <ms...@yandex.ru> написал(а):
> >
> > Yes, Denis,
> >
> > common ground seems to be as follows:
> > Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC
> over physical hosts and share benchmark results. In the meantime, while I
> strongly believe that dockerized approach to benchmarking is a road to
> misleading and false positives, I'll prepare a PoC of Tiden in dockerized
> environment to support 'fast development prototyping' usecase Nikolay team
> insist on. It should be a matter of few days.
> >
> > As a side note, I've run Anton PoC locally and would like to have some
> comments about results:
> >
> > Test system: Ubuntu 18.04, docker 19.03.6
> > Test commands:
> >
> >
> > git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
> > cd ignite
> > mvn clean install -DskipTests -Dmaven.javadoc.skip=true
> -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> > cd modules/ducktests/tests/docker
> > ./run_tests.sh
> >
> > Test results:
> >
> > ====================================================================================================
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id:       2020-07-05--004
> > run time:         7 minutes 36.360 seconds
> > tests run:        5
> > passed:           3
> > failed:           2
> > ignored:          0
> > ====================================================================================================
> > test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > status:     FAIL
> > run time:   3 minutes 12.232 seconds
> > ----------------------------------------------------------------------------------------------------
> > test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > status:     FAIL
> > run time:   1 minute 33.076 seconds
> >
> >
> > Is it OK for those tests to fail? Attached is the full test report.
> >
> >
> > On 02.07.2020 17:46, Denis Magda wrote:
> >> Folks,
> >> Please share the summary of that Slack conversation here for records
> once
> >> you find common ground.
> >> -
> >> Denis
> >> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org>
> wrote:
> >>> Igniters.
> >>>
> >>> All who are interested in the integration testing framework discussion are
> >>> welcome to join the Slack channel -
> >>>
> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> >>>
> >>>
> >>>
> >>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> написал(а):
> >>>>
> >>>> Max,
> >>>> Thanks for joining us.
> >>>>
> >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>> dependencies being deployed by external scripts.
> >>>> No. It is important to distinguish development, deploy, and
> >>> orchestration.
> >>>> All-in-one solutions have extremely limited usability.
> >>>> As to Ducktests:
> >>>> Docker is responsible for deployments during development.
> >>>> CI/CD is responsible for deployments during release and nightly
> checks.
> >>> It's up to the team to choose AWS, VM, BareMetal, and even the OS.
> >>>> Ducktape is responsible for orchestration.
> >>>>
> >>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>> fashion,
> >>>>> while ducktape internally does all actions sequentially.
> >>>> No. Ducktape may start any service in parallel. See Pme-free benchmark
> >>> [1] for details.
> >>>>
> >>>>> if we used ducktape solution we would have to instead prepare some
> >>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> with
> >>>>> Ansible or Chef.
> >>>> Sure, because a way of deploy depends on infrastructure.
> >>>> How can we be sure that OS we use and the restrictions we have will be
> >>> compatible with Tiden?
> >>>>
> >>>>> You have solved this deficiency with docker by putting all
> dependencies
> >>>>> into one uber-image ...
> >>>> and
> >>>>> I guess we all know about docker hyped ability to run over
> distributed
> >>>>> virtual networks.
> >>>> It is very important not to confuse the test's development (docker
> image
> >>> you're talking about) and real deployment.
> >>>>
> >>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> >>>> All actions can be performed in parallel.
> >>>> See how Ducktests [2] starts cluster in parallel for example.
> >>>>
> >>>> [1]
> >>>
> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> >>>> [2]
> >>>
> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> >>>>
> >>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
> >>> wrote:
> >>>> Hello, Maxim.
> >>>>
> >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>> dependencies being deployed by external scripts
> >>>>
> >>>> Why do you think that maintaining deploy scripts coupled with the
> >>> testing framework is an advantage?
> >>>> I thought we want to see and maintain deployment scripts separate from
> >>> the testing framework.
> >>>>
> >>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>> fashion, while ducktape internally does all actions sequentially.
> >>>>
> >>>> Can you, please, clarify, what actions do you have in mind?
> >>>> And why we want to execute them concurrently?
> >>>> Ignite node start and client application execution can be done
> >>>> concurrently with the ducktape approach.
> >>>>
> >>>>> If we used ducktape solution we would have to instead prepare some
> >>> deployment scripts to pre-initialize Sberbank hosts, for example, with
> >>> Ansible or Chef
> >>>>
> >>>> We shouldn’t take some user approach as an argument in this
> discussion.
> >>> Let’s discuss a general approach for all users of the Ignite. Anyway,
> what
> >>> is wrong with the external deployment script approach?
> >>>>
> >>>> We, as a community, should provide several ways to run integration
> tests
> >>> out-of-the-box AND the ability to customize deployment regarding the
> user
> >>> landscape.
> >>>>
> >>>>> You have solved this deficiency with docker by putting all
> >>> dependencies into one uber-image and that looks like simple and elegant
> >>> solution however, that effectively limits you to single-host testing.
> >>>>
> >>>> Docker image should be used only by the Ignite developers to test
> >>> something locally.
> >>>> It’s not intended for some real-world testing.
> >>>>
> >>>> The main issue with Tiden that I see is that it has been tested and
> >>>> maintained as a closed-source solution.
> >>>> This can lead to hard-to-solve problems when we start using and
> >>>> maintaining it as an open-source solution.
> >>>> For example, how many developers have used Tiden? And how many of them
> >>>> were not authors of Tiden itself?
> >>>>
> >>>>
> >>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
> >>> написал(а):
> >>>>>
> >>>>> Anton, Nikolay,
> >>>>>
> >>>>> Let's agree on what we are arguing about: whether it is about "like
> or
> >>> don't like" or about technical properties of suggested solutions.
> >>>>>
> >>>>> If it is about likes and dislikes, then the whole discussion is
> >>> meaningless. However, I hope together we can analyse pros and cons
> >>> carefully.
> >>>>>
> >>>>> As far as I can understand now, two main differences between ducktape
> >>> and tiden is that:
> >>>>>
> >>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>> dependencies being deployed by external scripts.
> >>>>>
> >>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>> fashion, while ducktape internally does all actions sequentially.
> >>>>>
> >>>>> As for me, these are very important properties for distributed
> testing
> >>> framework.
> >>>>>
> >>>>> The first property lets us easily reuse tiden in existing infrastructures.
> >>>>> For example, during Zookeeper IEP testing at the Sberbank site we used the
> >>>>> same tiden scripts that we use in our lab; the only change was putting a
> >>>>> list of hosts into the config.
> >>>>>
> >>>>> If we used ducktape solution we would have to instead prepare some
> >>> deployment scripts to pre-initialize Sberbank hosts, for example, with
> >>> Ansible or Chef.
> >>>>>
> >>>>>
> >>>>> You have solved this deficiency with docker by putting all
> >>> dependencies into one uber-image and that looks like simple and elegant
> >>> solution,
> >>>>> however, that effectively limits you to single-host testing.
> >>>>>
> >>>>> I guess we all know about docker hyped ability to run over
> distributed
> >>> virtual networks. We used to go that way, but quickly found that it is
> more
> >>> of the hype than real work. In real environments, there are problems
> with
> >>> routing, DNS, multicast and broadcast traffic, and many others, that
> turn
> >>> docker-based distributed solution into a fragile hard-to-maintain
> monster.
> >>>>>
> >>>>> Please, if you believe otherwise, perform a run of your PoC over at
> >>> least two physical hosts and share results with us.
> >>>>>
> >>>>> If you consider that one physical docker host is enough, please,
> don't
> >>> overlook that we want to run real scale scenarios, with 50-100 cache
> >>> groups, persistence enabled and millions of keys loaded.
> >>>>>
> >>>>> Practical limit for such configurations is 4-6 nodes per single
> >>> physical host. Otherwise, tests become flaky due to resource
> starvation.
> >>>>>
> >>>>> Please, if you believe otherwise, perform at least 10 runs of
> >>>>> your PoC with other tests running at TC (we're targeting TeamCity,
> >>>>> right?)
> >>>>> and share the results so we can check whether the numbers are reproducible.
> >>>>>
> >>>>> I stress this once more: functional integration tests are OK to run
> in
> >>> Docker and CI, but running benchmarks in Docker is a big NO GO.
> >>>>>
> >>>>>
> >>>>> The second property lets us write tests that require truly parallel actions
> >>>>> over hosts.
> >>>>>
> >>>>> For example, the agreed scenario for the PME benchmark during the "PME
> >>>>> optimization stream" was as follows:
> >>>>>
> >>>>>  - 10 server nodes, preloaded with 1M of keys
> >>>>>  - 4 client nodes perform transactional load  (client nodes
> physically
> >>> separated from server nodes)
> >>>>>  - during load:
> >>>>>  -- 5 server nodes stopped in parallel
> >>>>>  -- after 1 minute, all 5 nodes are started in parallel
> >>>>>  - load stopped, logs are analysed for exchange times.
> >>>>>
> >>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
> >>> then partition map exchange merge would not happen and we could not
> have
> >>> measured PME optimizations for that case.
> >>>>>
> >>>>>
> >>>>> These are limitations of ducktape that we believe are a more important
> >>>>> argument "against" than the arguments you provide "for".
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> >>>>>> Folks,
> >>>>>> First, I've created PR [1] with ducktests improvements
> >>>>>> PR contains the following changes
> >>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 &
> 2.8)
> >>>>>> - Global refactoring
> >>>>>> -- benchmarks javacode simplification
> >>>>>> -- services python and java classes code deduplication
> >>>>>> -- fail-fast checks for java and python (eg. application should
> >>> explicitly write it finished with success)
> >>>>>> -- simple results extraction from tests and benchmarks
> >>>>>> -- javacode now configurable from tests/benchmarks
> >>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
> >>> operation and log results)
> >>>>>> -- docker volume now marked as delegated to increase execution speed
> >>> for mac & win users
> >>>>>> -- Ignite cluster now start in parallel (start speed-up)
> >>>>>> -- Ignite can be configured at test/benchmark
> >>>>>> - full and module assembly scripts added
> >>>>> Great job done! But let me remind one of Apache Ignite principles:
> >>>>> week of thinking save months of development.
> >>>>>
> >>>>>
> >>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
> >>> integration) as a target "PoC check & real topology benchmarking tool".
> >>>>>> Ducktape pros
> >>>>>> - Developed for distributed system by distributed system developers.
> >>>>> So does Tiden
> >>>>>
> >>>>>> - Developed since 2014, stable.
> >>>>> Tiden is also pretty stable, and development start date is not a good
> >>> argument, for example pytest is since 2004, pytest-xdist (plugin for
> >>> distributed testing) is since 2010, but we don't see it as a
> alternative at
> >>> all.
> >>>>>
> >>>>>> - Proven usability by usage at Kafka.
> >>>>> Tiden is proven usable by usage at GridGain and Sberbank deployments.
> >>>>> Core, storage, sql and tx teams use benchmark results provided by
> >>> Tiden on a daily basis.
> >>>>>
> >>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great example
> >>> pack.
> >>>>> We'll donate some of our suites to Ignite as I've mentioned in
> >>> previous letter.
> >>>>>
> >>>>>> - Built-in Docker support for rapid development and checks.
> >>>>> False, there's no specific 'docker support' in ducktape itself, you
> >>> just wrap it in docker by yourself, because ducktape is lacking
> deployment
> >>> abilities.
> >>>>>
> >>>>>> - Great for CI automation.
> >>>>> False, there's no specific CI-enabled features in ducktape. Tiden, on
> >>> the other hand, provide generic xUnit reporting format, which is
> supported
> >>> by both TeamCity and Jenkins. Also, instead of using private keys,
> Tiden
> >>> can use SSH agent, which is also great for CI, because both
> >>>>> TeamCity and Jenkins store keys in secret storage available only for
> >>> ssh-agent and only for the time of the test.
> >>>>>
> >>>>>
> >>>>>>> As an additional motivation, at least 3 teams
> >>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and
> Zabbix
> >>> speed-up))
> >>>>>> - Ignite SE Plugins team (to check plugin's features does not
> >>> slow-down or broke AI features)
> >>>>>> - Ignite SE QA team (to append already developed smoke/load/failover
> >>> tests to AI codebase)
> >>>>>
> >>>>> Please, before recommending your tests to other teams, provide proofs
> >>>>> that your tests are reproducible in real environment.
> >>>>>
> >>>>>
> >>>>>> now, wait for ducktest merge to start checking cases they working on
> >>> in AI way.
> >>>>>> Thoughts?
> >>>>> Let us together review both solutions, we'll try to run your tests in
> >>> our lab, and you'll try to at least checkout tiden and see if same
> tests
> >>> can be implemented with it?
> >>>>>
> >>>>>
> >>>>>
> >>>>>> [1] https://github.com/apache/ignite/pull/7967
> >>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <
> nizhikov@apache.org
> >>> <ma...@apache.org>> wrote:
> >>>>>>    Hello, Maxim.
> >>>>>>    Thank you for so detailed explanation.
> >>>>>>    Can we put the content of this discussion somewhere on the wiki?
> >>>>>>    So It doesn’t get lost.
> >>>>>>    I divide the answer in several parts. From the requirements to
> the
> >>>>>>    implementation.
> >>>>>>    So, if we agreed on the requirements we can proceed with the
> >>>>>>    discussion of the implementation.
> >>>>>>    1. Requirements:
> >>>>>>    The main goal I want to achieve is *reproducibility* of the
> tests.
> >>>>>>    I’m sick and tired with the zillions of flaky, rarely failed, and
> >>>>>>    almost never failed tests in Ignite codebase.
> >>>>>>    We should start with the simplest scenarios that will be as
> >>> reliable
> >>>>>>    as steel :)
> >>>>>>    I want to know for sure:
> >>>>>>       - Is this PR makes rebalance quicker or not?
> >>>>>>       - Is this PR makes PME quicker or not?
> >>>>>>    So, your description of the complex test scenario looks as a next
> >>>>>>    step to me.
> >>>>>>    Anyway, It’s cool we already have one.
> >>>>>>    The second goal is to have a strict test lifecycle as we have in
> >>>>>>    JUnit and similar frameworks.
> >>>>>>     > It covers production-like deployment and running a scenarios
> >>> over
> >>>>>>    a single database instance.
> >>>>>>    Do you mean «single cluster» or «single host»?
> >>>>>>    2. Existing tests:
> >>>>>>     > A Combinator suite allows to run set of operations
> concurrently
> >>>>>>    over given database instance.
> >>>>>>     > A Consumption suite allows to run a set production-like
> actions
> >>>>>>    over given set of Ignite/GridGain versions and compare test
> metrics
> >>>>>>    across versions
> >>>>>>     > A Yardstick suite
> >>>>>>     > A Stress suite that simulates hardware environment degradation
> >>>>>>     > An Ultimate, DR and Compatibility suites that performs
> >>> functional
> >>>>>>    regression testing
> >>>>>>     > Regression
> >>>>>>    Great news that we already have so many choices for testing!
> >>>>>>    Mature test base is a big +1 for Tiden.
> >>>>>>    3. Comparison:
> >>>>>>     > Criteria: Test configuration
> >>>>>>     > Ducktape: single JSON string for all tests
> >>>>>>     > Tiden: any number of YaML config files, command line option
> for
> >>>>>>    fine-grained test configuration, ability to select/modify tests
> >>>>>>    behavior based on Ignite version.
> >>>>>>    1. Many YAML files can be hard to maintain.
> >>>>>>    2. In ducktape, you can set parameters via the «--parameters» option.
> >>>>>>    Please, take a look at the doc [1]
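> >>>>>>    For example (a sketch; the test path and parameter names here are
> >>>>>>    just placeholders):
> >>>>>>    ducktape <path/to/test.py> --parameters '{"nodes": 4, "version": "2.8.0"}'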
> >>>>>>     > Criteria: Cluster control
> >>>>>>     > Tiden: additionally can address cluster as a whole and execute
> >>>>>>    remote commands in parallel.
> >>>>>>    It seems we implement this ability in the PoC, already.
> >>>>>>     > Criteria: Test assertions
> >>>>>>     > Tiden: simple asserts, also few customized assertion helpers.
> >>>>>>     > Ducktape: simple asserts.
> >>>>>>    Can you, please, be more specific.
> >>>>>>    What helpers do you have in mind?
> >>>>>>    Ducktape has asserts that wait for logfile messages or for some
> >>>>>>    process to finish.
> >>>>>>     > Criteria: Test reporting
> >>>>>>     > Ducktape: limited to its own text/HTML format
> >>>>>>    Ducktape have
> >>>>>>    1. Text reporter
> >>>>>>    2. Customizable HTML reporter
> >>>>>>    3. JSON reporter.
> >>>>>>    We can show the JSON with any template or tool.
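> >>>>>>    For example, to eyeball it (assuming the JSON report lands in the
> >>>>>>    session's results directory, which is the default layout):
> >>>>>>    python -m json.tool results/<session_id>/report.json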
> >>>>>>     > Criteria: Provisioning and deployment
> >>>>>>     > Ducktape: can provision subset of hosts from cluster for test
> >>>>>>    needs. However, that means, that test can’t be scaled without
> test
> >>>>>>    code changes. Does not do any deploy, relies on external means,
> >>> e.g.
> >>>>>>    pre-packaged in docker image, as in PoC.
> >>>>>>    This is not true.
> >>>>>>    1. We can set explicit test parameters (node number) via the
> >>>>>>    «--parameters» option.
> >>>>>>    We can increase the client count or the cluster size without test code
> >>>>>>    changes.
> >>>>>>    2. We have many choices for the test environment. These choices
> are
> >>>>>>    tested and used in other projects:
> >>>>>>             * docker
> >>>>>>             * vagrant
> >>>>>>             * private cloud(ssh access)
> >>>>>>             * ec2
> >>>>>>    Please, take a look at Kafka documentation [2]
> >>>>>>     > I can continue more on this, but it should be enough for now:
> >>>>>>    We need to go deeper! :)
> >>>>>>    [1]
> >>>>>>
> >>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>>>>>    [2]
> >>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>>>>>     > 9 июня 2020 г., в 17:25, Max A. Shonichev <mshonich@yandex.ru
> >>>>>>    <ma...@yandex.ru>> написал(а):
> >>>>>>     >
> >>>>>>     > Greetings, Nikolay,
> >>>>>>     >
> >>>>>>     > First of all, thank you for you great effort preparing PoC of
> >>>>>>    integration testing to Ignite community.
> >>>>>>     >
> >>>>>>     > It’s a shame Ignite did not have at least some such tests yet,
> >>>>>>    however, GridGain, as a major contributor to Apache Ignite had a
> >>>>>>    profound collection of in-house tools to perform integration and
> >>>>>>    performance testing for years already and while we slowly
> consider
> >>>>>>    sharing our expertise with the community, your initiative makes
> us
> >>>>>>    drive that process a bit faster, thanks a lot!
> >>>>>>     >
> >>>>>>     > I reviewed your PoC and want to share a little about what we
> do
> >>>>>>    on our part, why and how, hope it would help community take
> proper
> >>>>>>    course.
> >>>>>>     >
> >>>>>>     > First I’ll do a brief overview of what decisions we made and
> >>> what
> >>>>>>    we do have in our private code base, next I’ll describe what we
> >>> have
> >>>>>>    already donated to the public and what we plan public next, then
> >>>>>>    I’ll compare both approaches highlighting deficiencies in order
> to
> >>>>>>    spur public discussion on the matter.
> >>>>>>     >
> >>>>>>     > It might seem strange to use Python to run Bash to run Java
> >>>>>>    applications because that introduces IT industry best of breed’ –
> >>>>>>    the Python dependency hell – to the Java application code base.
> The
> >>>>>>    only strangest decision one can made is to use Maven to run
> Docker
> >>>>>>    to run Bash to run Python to run Bash to run Java, but desperate
> >>>>>>    times call for desperate measures I guess.
> >>>>>>     >
> >>>>>>     > There are Java-based solutions for integration testing exists,
> >>>>>>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go
> >>> well
> >>>>>>    for Ignite community CI pipelines by them selves. But we also
> >>> wanted
> >>>>>>    to run performance tests and benchmarks, like the dreaded PME
> >>>>>>    benchmark, and this is solved by totally different set of tools
> in
> >>>>>>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> >>>>>>     >
> >>>>>>     > Speaking specifically about benchmarking, Apache Ignite
> >>> community
> >>>>>>    already has Yardstick [6], and there’s nothing wrong with writing
> >>>>>>    PME benchmark using Yardstick, but we also wanted to be able to
> run
> >>>>>>    scenarios like this:
> >>>>>>     > - put an X load to a Ignite database;
> >>>>>>     > - perform an Y set of operations to check how Ignite copes
> with
> >>>>>>    operations under load.
> >>>>>>     >
> >>>>>>     > And yes, we also wanted applications under test be deployed
> >>> ‘like
> >>>>>>    in a production’, e.g. distributed over a set of hosts. This
> arises
> >>>>>>    questions about provisioning and nodes affinity which I’ll cover
> in
> >>>>>>    detail later.
> >>>>>>     >
> >>>>>>     > So we decided to put a little effort to build a simple tool to
> >>>>>>    cover different integration and performance scenarios, and our QA
> >>>>>>    lab first attempt was PoC-Tester [7], currently open source for
> all
> >>>>>>    but for reporting web UI. It’s a quite simple to use 95%
> Java-based
> >>>>>>    tool targeted to be run on a pre-release QA stage.
> >>>>>>     >
> >>>>>>     > It covers production-like deployment and running a scenarios
> >>> over
> >>>>>>    a single database instance. PoC-Tester scenarios consists of a
> >>>>>>    sequence of tasks running sequentially or in parallel. After all
> >>>>>>    tasks complete, or at any time during test, user can run logs
> >>>>>>    collection task, logs are checked against exceptions and a
> summary
> >>>>>>    of found issues and task ops/latency statistics is generated at
> the
> >>>>>>    end of scenario. One of the main PoC-Tester features is its
> >>>>>>    fire-and-forget approach to task managing. That is, you can
> deploy
> >>> a
> >>>>>>    grid and left it running for weeks, periodically firing some
> tasks
> >>>>>>    onto it.
> >>>>>>     >
> >>>>>>     > During earliest stages of PoC-Tester development it becomes
> >>> quite
> >>>>>>    clear that Java application development is a tedious process and
> >>>>>>    architecture decisions you take during development are slow and
> >>> hard
> >>>>>>    to change.
> >>>>>>     > For example, scenarios like this
> >>>>>>     > - deploy two instances of GridGain with master-slave data
> >>>>>>    replication configured;
> >>>>>>     > - put a load on master;
> >>>>>>     > - perform checks on slave,
> >>>>>>     > or like this:
> >>>>>>     > - preload a 1Tb of data by using your favorite tool of choice
> to
> >>>>>>    an Apache Ignite of version X;
> >>>>>>     > - run a set of functional tests running Apache Ignite version
> Y
> >>>>>>    over preloaded data,
> >>>>>>     > do not fit well in the PoC-Tester workflow.
> >>>>>>     >
> >>>>>>     > So, this is why we decided to use Python as a generic
> scripting
> >>>>>>    language of choice.
> >>>>>>     >
> >>>>>>     > Pros:
> >>>>>>     > - quicker prototyping and development cycles
> >>>>>>     > - easier to find DevOps/QA engineer with Python skills than
> one
> >>>>>>    with Java skills
> >>>>>>     > - used extensively all over the world for DevOps/CI pipelines
> >>> and
> >>>>>>    thus has rich set of libraries for all possible integration uses
> >>> cases.
> >>>>>>     >
> >>>>>>     > Cons:
> >>>>>>     > - Nightmare with dependencies. Better stick to specific
> >>>>>>    language/libraries version.
> >>>>>>     >
> >>>>>>     > Comparing alternatives for Python-based testing framework we
> >>> have
> >>>>>>    considered following requirements, somewhat similar to what
> you’ve
> >>>>>>    mentioned for Confluent [8] previously:
> >>>>>>     > - should be able run locally or distributed (bare metal or in
> >>> the
> >>>>>>    cloud)
> >>>>>>     > - should have built-in deployment facilities for applications
> >>>>>>    under test
> >>>>>>     > - should separate test configuration and test code
> >>>>>>     > -- be able to easily reconfigure tests by simple configuration
> >>>>>>    changes
> >>>>>>     > -- be able to easily scale test environment by simple
> >>>>>>    configuration changes
> >>>>>>     > -- be able to perform regression testing by simple switching
> >>>>>>    artifacts under test via configuration
> >>>>>>     > -- be able to run tests with different JDK version by simple
> >>>>>>    configuration changes
> >>>>>>     > - should have human readable reports and/or reporting tools
> >>>>>>    integration
> >>>>>>     > - should allow simple test progress monitoring, one does not
> >>> want
> >>>>>>    to run 6-hours test to find out that application actually crashed
> >>>>>>    during first hour.
> >>>>>>     > - should allow parallel execution of test actions
> >>>>>>     > - should have clean API for test writers
> >>>>>>     > -- clean API for distributed remote commands execution
> >>>>>>     > -- clean API for deployed applications start / stop and other
> >>>>>>    operations
> >>>>>>     > -- clean API for performing check on results
> >>>>>>     > - should be open source or at least source code should allow
> >>> ease
> >>>>>>    change or extension
> >>>>>>     >
> >>>>>>     > Back at that time we found no better alternative than to write
> >>>>>>    our own framework, and here goes Tiden [9] as GridGain framework
> of
> >>>>>>    choice for functional integration and performance testing.
> >>>>>>     >
> >>>>>>     > Pros:
> >>>>>>     > - solves all the requirements above
> >>>>>>     > Cons (for Ignite):
> >>>>>>     > - (currently) closed GridGain source
> >>>>>>     >
> >>>>>>     > On top of Tiden we’ve built a set of test suites, some of
> which
> >>>>>>    you might have heard already.
> >>>>>>     >
> >>>>>>     > A Combinator suite allows to run set of operations
> concurrently
> >>>>>>    over given database instance. Proven to find at least 30+ race
> >>>>>>    conditions and NPE issues.
> >>>>>>     >
> >>>>>>     > A Consumption suite allows to run a set production-like
> actions
> >>>>>>    over given set of Ignite/GridGain versions and compare test
> metrics
> >>>>>>    across versions, like heap/disk/CPU consumption, time to perform
> >>>>>>    actions, like client PME, server PME, rebalancing time, data
> >>>>>>    replication time, etc.
> >>>>>>     >
> >>>>>>     > A Yardstick suite is a thin layer of Python glue code to run
> >>>>>>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
> >>>>>>    mediocre deployment capabilities, Tiden solves this easily.
> >>>>>>     >
> >>>>>>     > A Stress suite that simulates hardware environment degradation
> >>>>>>    during testing.
> >>>>>>     >
> >>>>>>     > An Ultimate, DR and Compatibility suites that performs
> >>> functional
> >>>>>>    regression testing of GridGain Ultimate Edition features like
> >>>>>>    snapshots, security, data replication, rolling upgrades, etc.
> >>>>>>     >
> >>>>>>     > A Regression and some IEPs testing suites, like IEP-14,
> IEP-15,
> >>>>>>    etc, etc, etc.
> >>>>>>     >
> >>>>>>     > Most of the suites above use another in-house developed Java
> >>> tool
> >>>>>>    – PiClient – to perform actual loading and miscellaneous
> operations
> >>>>>>    with Ignite under test. We use py4j Python-Java gateway library
> to
> >>>>>>    control PiClient instances from the tests.
> >>>>>>     >
> >>>>>>     > When we considered CI, we put TeamCity out of scope, because
> >>>>>>    distributed integration and performance tests tend to run for
> hours
> >>>>>>    and TeamCity agents are scarce and costly resource. So, bundled
> >>> with
> >>>>>>    Tiden there is jenkins-job-builder [10] based CI pipelines and
> >>>>>>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
> >>> test
> >>>>>>    run reports across versions and has built in visualization
> support
> >>>>>>    for Combinator suite.
> >>>>>>     >
> >>>>>>     > All of the above is currently closed source, but we plan to
> make
> >>>>>>    it public for community, and publishing Tiden core [9] is the
> first
> >>>>>>    step on that way. You can review some examples of using Tiden for
> >>>>>>    tests at my repository [11], for start.
> >>>>>>     >
> >>>>>>     > Now, let’s compare Ducktape PoC and Tiden.
> >>>>>>     >
> >>>>>>     > Criteria: Language
> >>>>>>     > Tiden: Python, 3.7
> >>>>>>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> >>>>>>    compatible, but actually can’t work with Python 3.7 due to broken
> >>>>>>    Zmq dependency.
> >>>>>>     > Comment: Python 3.7 has a much better support for async-style
> >>>>>>    code which might be crucial for distributed application testing.
> >>>>>>     > Score: Tiden: 1, Ducktape: 0
> >>>>>>     >
> >>>>>>     > Criteria: Test writers API
> >>>>>>     > Supported integration test framework concepts are basically
> the
> >>> same:
> >>>>>>     > - a test controller (test runner)
> >>>>>>     > - a cluster
> >>>>>>     > - a node
> >>>>>>     > - an application (a service in Ducktape terms)
> >>>>>>     > - a test
> >>>>>>     > Score: Tiden: 5, Ducktape: 5
> >>>>>>     >
> >>>>>>     > Criteria: Tests selection and run
> >>>>>>     > Ducktape: suite-package-class-method level selection, internal
> >>>>>>    scheduler allows to run tests in suite in parallel.
> >>>>>>     > Tiden: also suite-package-class-method level selection,
> >>>>>>    additionally allows selecting subset of tests by attribute,
> >>> parallel
> >>>>>>    runs not built in, but allows merging test reports after
> different
> >>> runs.
> >>>>>>     > Score: Tiden: 2, Ducktape: 2
> >>>>>>     >
> >>>>>>     > Criteria: Test configuration
> >>>>>>     > Ducktape: single JSON string for all tests
> >>>>>>     > Tiden: any number of YaML config files, command line option
> for
> >>>>>>    fine-grained test configuration, ability to select/modify tests
> >>>>>>    behavior based on Ignite version.
> >>>>>>     > Score: Tiden: 3, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Cluster control
> >>>>>>     > Ducktape: allow execute remote commands by node granularity
> >>>>>>     > Tiden: additionally can address cluster as a whole and execute
> >>>>>>    remote commands in parallel.
> >>>>>>     > Score: Tiden: 2, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Logs control
> >>>>>>     > Both frameworks have similar builtin support for remote logs
> >>>>>>    collection and grepping. Tiden has built-in plugin that can zip,
> >>>>>>    collect arbitrary log files from arbitrary locations at
> >>>>>>    test/module/suite granularity and unzip if needed, also
> application
> >>>>>>    API to search / wait for messages in logs. Ducktape allows each
> >>>>>>    service declare its log files location (seemingly does not
> support
> >>>>>>    logs rollback), and a single entrypoint to collect service logs.
> >>>>>>     > Score: Tiden: 1, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Test assertions
> >>>>>>     > Tiden: simple asserts, also few customized assertion helpers.
> >>>>>>     > Ducktape: simple asserts.
> >>>>>>     > Score: Tiden: 2, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Test reporting
> >>>>>>     > Ducktape: limited to its own text/html format
> >>>>>>     > Tiden: provides text report, yaml report for reporting tools
> >>>>>>    integration, XML xUnit report for integration with
> >>> Jenkins/TeamCity.
> >>>>>>     > Score: Tiden: 3, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Provisioning and deployment
> >>>>>>     > Ducktape: can provision subset of hosts from cluster for test
> >>>>>>    needs. However, that means, that test can’t be scaled without
> test
> >>>>>>    code changes. Does not do any deploy, relies on external means,
> >>> e.g.
> >>>>>>    pre-packaged in docker image, as in PoC.
> >>>>>>     > Tiden: Given a set of hosts, Tiden uses all of them for the
> >>> test.
> >>>>>>    Provisioning should be done by external means. However, provides
> a
> >>>>>>    conventional automated deployment routines.
> >>>>>>     > Score: Tiden: 1, Ducktape: 1
> >>>>>>     >
> >>>>>>     > Criteria: Documentation and Extensibility
> >>>>>>     > Tiden: current API documentation is limited, should change as
> we
> >>>>>>    go open source. Tiden is easily extensible via hooks and plugins,
> >>>>>>    see example Maven plugin and Gatling application at [11].
> >>>>>>     > Ducktape: basic documentation at readthedocs.io
> >>>>>>    <http://readthedocs.io>. Codebase is rigid, framework core is
> >>>>>>    tightly coupled and hard to change. The only possible extension
> >>>>>>    mechanism is fork-and-rewrite.
> >>>>>>     > Score: Tiden: 2, Ducktape: 1
> >>>>>>     >
> >>>>>>     > I can continue more on this, but it should be enough for now:
> >>>>>>     > Overall score: Tiden: 22, Ducktape: 14.
> >>>>>>     >
> >>>>>>     > Time for discussion!
> >>>>>>     >
> >>>>>>     > ---
> >>>>>>     > [1] - https://www.testcontainers.org/
> >>>>>>     > [2] - http://arquillian.org/guides/getting_started/
> >>>>>>     > [3] - https://jmeter.apache.org/index.html
> >>>>>>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>     > [5] - https://gatling.io/docs/current/
> >>>>>>     > [6] - https://github.com/gridgain/yardstick
> >>>>>>     > [7] - https://github.com/gridgain/poc-tester
> >>>>>>     > [8] -
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>>>>>     > [9] - https://github.com/gridgain/tiden
> >>>>>>     > [10] - https://pypi.org/project/jenkins-job-builder/
> >>>>>>     > [11] - https://github.com/mshonichev/tiden_examples
> >>>>>>     >
> >>>>>>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>>>>>     >> Hello,
> >>>>>>     >>
> >>>>>>     >> Branch with duck tape created -
> >>>>>>    https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>     >>
> >>>>>>     >> Any who are willing to contribute to PoC are welcome.
> >>>>>>     >>
> >>>>>>     >>
> >>>>>>     >>> On 21 May 2020, at 22:33, Nikolay Izhikov
> >>>>>>    <nizhikov.dev@gmail.com <ma...@gmail.com>>
> >>> wrote:
> >>>>>>     >>>
> >>>>>>     >>> Hello, Denis.
> >>>>>>     >>>
> >>>>>>     >>> There is no rush with these improvements.
> >>>>>>     >>> We can wait for Maxim proposal and compare two solutions :)
> >>>>>>     >>>
> >>>>>>     >>>> On 21 May 2020, at 22:24, Denis Magda <dmagda@apache.org
> >>>>>>    <ma...@apache.org>> wrote:
> >>>>>>     >>>>
> >>>>>>     >>>> Hi Nikolay,
> >>>>>>     >>>>
> >>>>>>     >>>> Thanks for kicking off this conversation and sharing your
> >>>>>>    findings with the
> >>>>>>     >>>> results. That's the right initiative. I do agree that
> Ignite
> >>>>>>    needs to have
> >>>>>>     >>>> an integration testing framework with capabilities listed
> by
> >>> you.
> >>>>>>     >>>>
> >>>>>>     >>>> As we discussed privately, I would only check if instead of
> >>>>>>     >>>> Confluent's Ducktape library, we can use an integration
> >>>>>>    testing framework
> >>>>>>     >>>> developed by GridGain for testing of Ignite/GridGain
> >>> clusters.
> >>>>>>    That
> >>>>>>     >>>> framework has been battle-tested and might be more
> >>> convenient for
> >>>>>>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>>>>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>>
> >>> who
> >>>>>>    promised to join this thread once he finishes
> >>>>>>     >>>> preparing the usage examples of the framework. To my
> >>>>>>    knowledge, Max has
> >>>>>>     >>>> already been working on that for several days.
> >>>>>>     >>>>
> >>>>>>     >>>> -
> >>>>>>     >>>> Denis
> >>>>>>     >>>>
> >>>>>>     >>>>
> >>>>>>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> >>>>>>    <nizhikov@apache.org <ma...@apache.org>>
> >>>>>>     >>>> wrote:
> >>>>>>     >>>>
> >>>>>>     >>>>> Hello, Igniters.
> >>>>>>     >>>>>
> >>>>>>     >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>>     >>>>>
> >>>>>>     >>>>> Let me briefly explain the gap I want to cover:
> >>>>>>     >>>>>
> >>>>>>     >>>>> 1. For now, we don’t have a solution for automated testing
> >>> of
> >>>>>>    Ignite on
> >>>>>>     >>>>> «real cluster».
> >>>>>>     >>>>> By «real cluster» I mean cluster «like a production»:
> >>>>>>     >>>>>       * client and server nodes deployed on different
> hosts.
> >>>>>>     >>>>>       * thin clients perform queries from some other hosts
> >>>>>>     >>>>>       * etc.
> >>>>>>     >>>>>
> >>>>>>     >>>>> 2. We don’t have a solution for automated benchmarks of
> some
> >>>>>>    internal
> >>>>>>     >>>>> Ignite process
> >>>>>>     >>>>>       * PME
> >>>>>>     >>>>>       * rebalance.
> >>>>>>     >>>>> This means we don’t know - Do we perform rebalance(or PME)
> >>> in
> >>>>>>    2.7.0 faster
> >>>>>>     >>>>> or slower than in 2.8.0 for the same cluster?
> >>>>>>     >>>>>
> >>>>>>     >>>>> 3. We don’t have a solution for automated testing of
> Ignite
> >>>>>>    integration in
> >>>>>>     >>>>> a real-world environment:
> >>>>>>     >>>>> Ignite-Spark integration can be taken as an example.
> >>>>>>     >>>>> I think some ML solutions also should be tested in
> >>> real-world
> >>>>>>    deployments.
> >>>>>>     >>>>>
> >>>>>>     >>>>> Solution:
> >>>>>>     >>>>>
> >>>>>>     >>>>> I propose to use duck tape library from confluent (apache
> >>> 2.0
> >>>>>>    license)
> >>>>>>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on
> >>> the
> >>>>>>    local
> >>>>>>     >>>>> environment(docker) and it works just fine.
> >>>>>>     >>>>>
> >>>>>>     >>>>> PoC contains following services:
> >>>>>>     >>>>>
> >>>>>>     >>>>>       * Simple rebalance test:
> >>>>>>     >>>>>               Start 2 server nodes,
> >>>>>>     >>>>>               Create some data with Ignite client,
> >>>>>>     >>>>>               Start one more server node,
> >>>>>>     >>>>>               Wait for rebalance finish
> >>>>>>     >>>>>       * Simple Ignite-Spark integration test:
> >>>>>>     >>>>>               Start 1 Spark master, start 1 Spark worker,
> >>>>>>     >>>>>               Start 1 Ignite server node
> >>>>>>     >>>>>               Create some data with Ignite client,
> >>>>>>     >>>>>               Check data in application that queries it
> from
> >>>>>>    Spark.
> >>>>>>     >>>>>
> >>>>>>     >>>>> All tests are fully automated.
> >>>>>>     >>>>> Logs collection works just fine.
> >>>>>>     >>>>> You can see an example of the tests report - [4].
> >>>>>>     >>>>>
> >>>>>>     >>>>> Pros:
> >>>>>>     >>>>>
> >>>>>>     >>>>> * Ability to test local changes(no need to public changes
> to
> >>>>>>    some remote
> >>>>>>     >>>>> repository or similar).
> >>>>>>     >>>>> * Ability to parametrize test environment(run the same
> tests
> >>>>>>    on different
> >>>>>>     >>>>> JDK, JVM params, config, etc.)
> >>>>>>     >>>>> * Isolation by default so system tests are as reliable as
> >>>>>>    possible.
> >>>>>>     >>>>> * Utilities for pulling up and tearing down services
> easily
> >>>>>>    in clusters in
> >>>>>>     >>>>> different environments (e.g. local, custom cluster,
> Vagrant,
> >>>>>>    K8s, Mesos,
> >>>>>>     >>>>> Docker, cloud providers, etc.)
> >>>>>>     >>>>> * Easy to write unit tests for distributed systems
> >>>>>>     >>>>> * Adopted and successfully used by other distributed open
> >>>>>>    source project -
> >>>>>>     >>>>> Apache Kafka.
> >>>>>>     >>>>> * Collect results (e.g. logs, console output)
> >>>>>>     >>>>> * Report results (e.g. expected conditions met,
> performance
> >>>>>>    results, etc.)
> >>>>>>     >>>>>
> >>>>>>     >>>>> WDYT?
> >>>>>>     >>>>>
> >>>>>>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>>>     >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>>>     >>>>> [3]
> >>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>>>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >>>>
> >>>
> >>>
> >>>
> > <2020-07-05--004.tar.gz>
>
>
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Maxim.

Thanks for writing down the minutes.

There is no such thing as «Nikolay team» on the dev-list.
I propose we focus on the product requirements and on what we want to gain from the framework, rather than on the needs of any particular team.

Could you please write down your version of the requirements, so we can reach a consensus on them and then move on to discussing the implementation?

> On 6 July 2020, at 11:18, Max Shonichev <ms...@yandex.ru> wrote:
> 
> Yes, Denis,
> 
> common ground seems to be as follows:
> Anton Vinogradov and Nikolay Izhikov would try to prepare and run PoC over physical hosts and share benchmark results. In the meantime, while I strongly believe that dockerized approach to benchmarking is a road to misleading and false positives, I'll prepare a PoC of Tiden in dockerized environment to support 'fast development prototyping' usecase Nikolay team insist on. It should be a matter of few days.
> 
> As a side note, I've run Anton PoC locally and would like to have some comments about results:
> 
> Test system: Ubuntu 18.04, docker 19.03.6
> Test commands:
> 
> 
> git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
> cd ignite
> mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> cd modules/ducktests/tests/docker
> ./run_tests.sh
> 
> Test results:
> ====================================================================================================
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.7.7
> session_id:       2020-07-05--004
> run time:         7 minutes 36.360 seconds
> tests run:        5
> passed:           3
> failed:           2
> ignored:          0
> ====================================================================================================
> test_id: ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> status:     FAIL
> run time:   3 minutes 12.232 seconds
> ----------------------------------------------------------------------------------------------------
> test_id: ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> status:     FAIL
> run time:   1 minute 33.076 seconds
> 
> 
> Is it OK for those tests to fail? Attached is full test report
> 
> 
> On 02.07.2020 17:46, Denis Magda wrote:
>> Folks,
>> Please share the summary of that Slack conversation here for records once
>> you find common ground.
>> -
>> Denis
>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org> wrote:
>>> Igniters.
>>> 
>>> All who are interested in integration testing framework discussion are
>>> welcome into slack channel -
>>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
>>> 
>>> 
>>> 
>>>> On 2 July 2020, at 13:06, Anton Vinogradov <av...@apache.org> wrote:
>>>> 
>>>> Max,
>>>> Thanks for joining us.
>>>> 
>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>>> dependencies being deployed by external scripts.
>>>> No. It is important to distinguish development, deploy, and
>>> orchestration.
>>>> All-in-one solutions have extremely limited usability.
>>>> As to Ducktests:
>>>> Docker is responsible for deployments during development.
>>>> CI/CD is responsible for deployments during release and nightly checks.
>>> It's up to the team to chose AWS, VM, BareMetal, and even OS.
>>>> Ducktape is responsible for orchestration.
>>>> 
>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>> fashion,
>>>>> while ducktape internally does all actions sequentially.
>>>> No. Ducktape may start any service in parallel. See Pme-free benchmark
>>> [1] for details.
>>>> 
>>>>> if we used ducktape solution we would have to instead prepare some
>>>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>>>> Ansible or Chef.
>>>> Sure, because a way of deploy depends on infrastructure.
>>>> How can we be sure that OS we use and the restrictions we have will be
>>> compatible with Tiden?
>>>> 
>>>>> You have solved this deficiency with docker by putting all dependencies
>>>>> into one uber-image ...
>>>> and
>>>>> I guess we all know about docker hyped ability to run over distributed
>>>>> virtual networks.
>>>> It is very important not to confuse the test's development (docker image
>>> you're talking about) and real deployment.
>>>> 
>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
>>>> All actions can be performed in parallel.
>>>> See how Ducktests [2] starts cluster in parallel for example.
>>>> 
>>>> [1]
>>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
>>>> [2]
>>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
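For illustration, a minimal sketch of such a parallel start using plain Python threads around ducktape-style services; the service objects and their start() method are assumptions here, not the actual classes from the PoC:

    from threading import Thread

    def start_in_parallel(services):
        # Run each service's start() in its own thread so all nodes
        # come up at the same time instead of one-by-one.
        threads = [Thread(target=service.start) for service in services]
        for t in threads:
            t.start()
        for t in threads:
            t.join()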
>>>> 
>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
>>> wrote:
>>>> Hello, Maxim.
>>>> 
>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>> dependencies being deployed by external scripts
>>>> 
>>>> Why do you think that maintaining deploy scripts coupled with the
>>> testing framework is an advantage?
>>>> I thought we want to see and maintain deployment scripts separate from
>>> the testing framework.
>>>> 
>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>> fashion, while ducktape internally does all actions sequentially.
>>>> 
>>>> Can you, please, clarify, what actions do you have in mind?
>>>> And why we want to execute them concurrently?
>>>> Ignite node start, Client application execution can be done concurrently
>>> with the ducktape approach.
>>>> 
>>>>> If we used ducktape solution we would have to instead prepare some
>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>> Ansible or Chef
>>>> 
>>>> We shouldn’t take some user approach as an argument in this discussion.
>>> Let’s discuss a general approach for all users of the Ignite. Anyway, what
>>> is wrong with the external deployment script approach?
>>>> 
>>>> We, as a community, should provide several ways to run integration tests
>>> out-of-the-box AND the ability to customize deployment regarding the user
>>> landscape.
>>>> 
>>>>> You have solved this deficiency with docker by putting all
>>> dependencies into one uber-image and that looks like simple and elegant
>>> solution however, that effectively limits you to single-host testing.
>>>> 
>>>> Docker image should be used only by the Ignite developers to test
>>> something locally.
>>>> It’s not intended for some real-world testing.
>>>> 
>>>> The main issue with the Tiden that I see, it tested and maintained as a
>>> closed source solution.
>>>> This can lead to the hard to solve problems when we start using and
>>> maintaining it as an open-source solution.
>>>> Like, how many developers used Tiden? And how many of developers were
>>> not authors of the Tiden itself?
>>>> 
>>>> 
>>>>> On 2 July 2020, at 12:30, Max Shonichev <ms...@yandex.ru>
>>> wrote:
>>>>> 
>>>>> Anton, Nikolay,
>>>>> 
>>>>> Let's agree on what we are arguing about: whether it is about "like or
>>> don't like" or about technical properties of suggested solutions.
>>>>> 
>>>>> If it is about likes and dislikes, then the whole discussion is
>>> meaningless. However, I hope together we can analyse pros and cons
>>> carefully.
>>>>> 
>>>>> As far as I can understand now, two main differences between ducktape
>>> and tiden is that:
>>>>> 
>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>> dependencies being deployed by external scripts.
>>>>> 
>>>>> 2. tiden can execute actions over remote nodes in real parallel
>>> fashion, while ducktape internally does all actions sequentially.
>>>>> 
>>>>> As for me, these are very important properties for distributed testing
>>> framework.
>>>>> 
>>>>> First property let us easily reuse tiden in existing infrastructures,
>>> for example, during Zookeeper IEP testing at Sberbank site we used the same
>>> tiden scripts that we use in our lab, the only change was putting a list of
>>> hosts into config.
>>>>> 
>>>>> If we used ducktape solution we would have to instead prepare some
>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>> Ansible or Chef.
>>>>> 
>>>>> 
>>>>> You have solved this deficiency with docker by putting all
>>> dependencies into one uber-image and that looks like simple and elegant
>>> solution,
>>>>> however, that effectively limits you to single-host testing.
>>>>> 
>>>>> I guess we all know about docker hyped ability to run over distributed
>>> virtual networks. We used to go that way, but quickly found that it is more
>>> of the hype than real work. In real environments, there are problems with
>>> routing, DNS, multicast and broadcast traffic, and many others, that turn
>>> docker-based distributed solution into a fragile hard-to-maintain monster.
>>>>> 
>>>>> Please, if you believe otherwise, perform a run of your PoC over at
>>> least two physical hosts and share results with us.
>>>>> 
>>>>> If you consider that one physical docker host is enough, please, don't
>>> overlook that we want to run real scale scenarios, with 50-100 cache
>>> groups, persistence enabled and a millions of keys loaded.
>>>>> 
>>>>> Practical limit for such configurations is 4-6 nodes per single
>>> physical host. Otherwise, tests become flaky due to resource starvation.
>>>>> 
>>>>> Please, if you believe otherwise, perform at least a 10 of runs of
>>> your PoC with other tests running at TC (we're targeting TeamCity, right?)
>>> and share results so we could check if the numbers are reproducible.
>>>>> 
>>>>> I stress this once more: functional integration tests are OK to run in
>>> Docker and CI, but running benchmarks in Docker is a big NO GO.
>>>>> 
>>>>> 
>>>>> Second property let us write tests that require real-parallel actions
>>> over hosts.
>>>>> 
>>>>> For example, agreed scenario for PME benchmarkduring "PME optimization
>>> stream" was as follows:
>>>>> 
>>>>>  - 10 server nodes, preloaded with 1M of keys
>>>>>  - 4 client nodes perform transactional load  (client nodes physically
>>> separated from server nodes)
>>>>>  - during load:
>>>>>  -- 5 server nodes stopped in parallel
>>>>>  -- after 1 minute, all 5 nodes are started in parallel
>>>>>  - load stopped, logs are analysed for exchange times.
>>>>> 
>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
>>> then partition map exchange merge would not happen and we could not have
>>> measured PME optimizations for that case.
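For illustration, the scenario above as a minimal Python outline; the cluster/clients objects and their helper methods are hypothetical placeholders, not APIs of either framework:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def pme_benchmark(cluster, clients):
        victims = cluster.server_nodes[:5]            # 5 of the 10 preloaded servers
        load = clients.start_transactional_load()     # 4 client nodes keep the load running
        with ThreadPoolExecutor(max_workers=len(victims)) as pool:
            list(pool.map(cluster.stop_node, victims))   # stop 5 nodes in parallel
            time.sleep(60)                               # keep them down for a minute
            list(pool.map(cluster.start_node, victims))  # start them back in parallel
        load.stop()
        return cluster.grep_logs("exchange")             # exchange times analysed from logs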
>>>>> 
>>>>> 
>>>>> These are limitations of ducktape that we believe as a more important
>>>>> argument "against" than you provide "for".
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
>>>>>> Folks,
>>>>>> First, I've created PR [1] with ducktests improvements
>>>>>> PR contains the following changes
>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
>>>>>> - Global refactoring
>>>>>> -- benchmarks javacode simplification
>>>>>> -- services python and java classes code deduplication
>>>>>> -- fail-fast checks for java and python (eg. application should
>>> explicitly write it finished with success)
>>>>>> -- simple results extraction from tests and benchmarks
>>>>>> -- javacode now configurable from tests/benchmarks
>>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
>>> operation and log results)
>>>>>> -- docker volume now marked as delegated to increase execution speed
>>> for mac & win users
>>>>>> -- Ignite cluster now start in parallel (start speed-up)
>>>>>> -- Ignite can be configured at test/benchmark
>>>>>> - full and module assembly scripts added
>>>>> Great job done! But let me remind one of Apache Ignite principles:
>>>>> week of thinking save months of development.
>>>>> 
>>>>> 
>>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
>>> integration) as a target "PoC check & real topology benchmarking tool".
>>>>>> Ducktape pros
>>>>>> - Developed for distributed system by distributed system developers.
>>>>> So does Tiden
>>>>> 
>>>>>> - Developed since 2014, stable.
>>>>> Tiden is also pretty stable, and development start date is not a good
>>> argument, for example pytest is since 2004, pytest-xdist (plugin for
>>> distributed testing) is since 2010, but we don't see it as a alternative at
>>> all.
>>>>> 
>>>>>> - Proven usability by usage at Kafka.
>>>>> Tiden is proven usable by usage at GridGain and Sberbank deployments.
>>>>> Core, storage, sql and tx teams use benchmark results provided by
>>> Tiden on a daily basis.
>>>>> 
>>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great example
>>> pack.
>>>>> We'll donate some of our suites to Ignite as I've mentioned in
>>> previous letter.
>>>>> 
>>>>>> - Built-in Docker support for rapid development and checks.
>>>>> False, there's no specific 'docker support' in ducktape itself, you
>>> just wrap it in docker by yourself, because ducktape is lacking deployment
>>> abilities.
>>>>> 
>>>>>> - Great for CI automation.
>>>>> False, there's no specific CI-enabled features in ducktape. Tiden, on
>>> the other hand, provide generic xUnit reporting format, which is supported
>>> by both TeamCity and Jenkins. Also, instead of using private keys, Tiden
>>> can use SSH agent, which is also great for CI, because both
>>>>> TeamCity and Jenkins store keys in secret storage available only for
>>> ssh-agent and only for the time of the test.
>>>>> 
>>>>> 
>>>>>>> As an additional motivation, at least 3 teams
>>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix
>>> speed-up))
>>>>>> - Ignite SE Plugins team (to check plugin's features does not
>>> slow-down or broke AI features)
>>>>>> - Ignite SE QA team (to append already developed smoke/load/failover
>>> tests to AI codebase)
>>>>> 
>>>>> Please, before recommending your tests to other teams, provide proofs
>>>>> that your tests are reproducible in real environment.
>>>>> 
>>>>> 
>>>>>> now, wait for ducktest merge to start checking cases they working on
>>> in AI way.
>>>>>> Thoughts?
>>>>> Let us together review both solutions, we'll try to run your tests in
>>> our lab, and you'll try to at least checkout tiden and see if same tests
>>> can be implemented with it?
>>>>> 
>>>>> 
>>>>> 
>>>>>> [1] https://github.com/apache/ignite/pull/7967
>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org
>>> <ma...@apache.org>> wrote:
>>>>>>    Hello, Maxim.
>>>>>>    Thank you for so detailed explanation.
>>>>>>    Can we put the content of this discussion somewhere on the wiki?
>>>>>>    So It doesn’t get lost.
>>>>>>    I divide the answer in several parts. From the requirements to the
>>>>>>    implementation.
>>>>>>    So, if we agreed on the requirements we can proceed with the
>>>>>>    discussion of the implementation.
>>>>>>    1. Requirements:
>>>>>>    The main goal I want to achieve is *reproducibility* of the tests.
>>>>>>    I’m sick and tired with the zillions of flaky, rarely failed, and
>>>>>>    almost never failed tests in Ignite codebase.
>>>>>>    We should start with the simplest scenarios that will be as
>>> reliable
>>>>>>    as steel :)
>>>>>>    I want to know for sure:
>>>>>>       - Is this PR makes rebalance quicker or not?
>>>>>>       - Is this PR makes PME quicker or not?
>>>>>>    So, your description of the complex test scenario looks as a next
>>>>>>    step to me.
>>>>>>    Anyway, It’s cool we already have one.
>>>>>>    The second goal is to have a strict test lifecycle as we have in
>>>>>>    JUnit and similar frameworks.
>>>>>>     > It covers production-like deployment and running a scenarios
>>> over
>>>>>>    a single database instance.
>>>>>>    Do you mean «single cluster» or «single host»?
>>>>>>    2. Existing tests:
>>>>>>     > A Combinator suite allows to run set of operations concurrently
>>>>>>    over given database instance.
>>>>>>     > A Consumption suite allows to run a set production-like actions
>>>>>>    over given set of Ignite/GridGain versions and compare test metrics
>>>>>>    across versions
>>>>>>     > A Yardstick suite
>>>>>>     > A Stress suite that simulates hardware environment degradation
>>>>>>     > An Ultimate, DR and Compatibility suites that performs
>>> functional
>>>>>>    regression testing
>>>>>>     > Regression
>>>>>>    Great news that we already have so many choices for testing!
>>>>>>    Mature test base is a big +1 for Tiden.
>>>>>>    3. Comparison:
>>>>>>     > Criteria: Test configuration
>>>>>>     > Ducktape: single JSON string for all tests
>>>>>>     > Tiden: any number of YaML config files, command line option for
>>>>>>    fine-grained test configuration, ability to select/modify tests
>>>>>>    behavior based on Ignite version.
>>>>>>    1. Many YAML files can be hard to maintain.
>>>>>>    2. In ducktape, you can set parameters via «—parameters» option.
>>>>>>    Please, take a look at the doc [1]
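For illustration, --parameters takes a single JSON string on the command line; the test path and parameter name below are only examples based on the test ids seen elsewhere in this thread:

    ducktape ignitetest/tests/benchmarks/add_node_rebalance_test.py \
        --parameters '{"version": "2.8.1"}'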
>>>>>>     > Criteria: Cluster control
>>>>>>     > Tiden: additionally can address cluster as a whole and execute
>>>>>>    remote commands in parallel.
>>>>>>    It seems we implement this ability in the PoC, already.
>>>>>>     > Criteria: Test assertions
>>>>>>     > Tiden: simple asserts, also few customized assertion helpers.
>>>>>>     > Ducktape: simple asserts.
>>>>>>    Can you, please, be more specific.
>>>>>>    What helpers do you have in mind?
>>>>>>    Ducktape has an asserts that waits for logfile messages or some
>>>>>>    process finish.
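For illustration, a minimal sketch of such a wait-style assertion built on ducktape's wait_until() helper; the service object and the log message are hypothetical:

    from ducktape.utils.util import wait_until

    def assert_rebalance_finished(ignite_service, timeout_sec=120):
        # Poll the (hypothetical) log-search helper until the expected
        # message appears or the timeout expires.
        wait_until(lambda: ignite_service.found_in_log("rebalanced=true"),
                   timeout_sec=timeout_sec,
                   err_msg="Rebalance did not finish in %s seconds" % timeout_sec)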
>>>>>>     > Criteria: Test reporting
>>>>>>     > Ducktape: limited to its own text/HTML format
>>>>>>    Ducktape have
>>>>>>    1. Text reporter
>>>>>>    2. Customizable HTML reporter
>>>>>>    3. JSON reporter.
>>>>>>    We can show JSON with the any template or tool.
>>>>>>     > Criteria: Provisioning and deployment
>>>>>>     > Ducktape: can provision subset of hosts from cluster for test
>>>>>>    needs. However, that means, that test can’t be scaled without test
>>>>>>    code changes. Does not do any deploy, relies on external means,
>>> e.g.
>>>>>>    pre-packaged in docker image, as in PoC.
>>>>>>    This is not true.
>>>>>>    1. We can set explicit test parameters(node number) via parameters.
>>>>>>    We can increase client count of cluster size without test code
>>> changes.
>>>>>>    2. We have many choices for the test environment. These choices are
>>>>>>    tested and used in other projects:
>>>>>>             * docker
>>>>>>             * vagrant
>>>>>>             * private cloud(ssh access)
>>>>>>             * ec2
>>>>>>    Please, take a look at Kafka documentation [2]
>>>>>>     > I can continue more on this, but it should be enough for now:
>>>>>>    We need to go deeper! :)
>>>>>>    [1]
>>>>>> 
>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>>>>>    [2]
>>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>>>>>     > On 9 June 2020, at 17:25, Max A. Shonichev <mshonich@yandex.ru
>>>>>>    <ma...@yandex.ru>> wrote:
>>>>>>     >
>>>>>>     > Greetings, Nikolay,
>>>>>>     >
>>>>>>     > First of all, thank you for you great effort preparing PoC of
>>>>>>    integration testing to Ignite community.
>>>>>>     >
>>>>>>     > It’s a shame Ignite did not have at least some such tests yet,
>>>>>>    however, GridGain, as a major contributor to Apache Ignite had a
>>>>>>    profound collection of in-house tools to perform integration and
>>>>>>    performance testing for years already and while we slowly consider
>>>>>>    sharing our expertise with the community, your initiative makes us
>>>>>>    drive that process a bit faster, thanks a lot!
>>>>>>     >
>>>>>>     > I reviewed your PoC and want to share a little about what we do
>>>>>>    on our part, why and how, hope it would help community take proper
>>>>>>    course.
>>>>>>     >
>>>>>>     > First I’ll do a brief overview of what decisions we made and
>>> what
>>>>>>    we do have in our private code base, next I’ll describe what we
>>> have
>>>>>>    already donated to the public and what we plan public next, then
>>>>>>    I’ll compare both approaches highlighting deficiencies in order to
>>>>>>    spur public discussion on the matter.
>>>>>>     >
>>>>>>     > It might seem strange to use Python to run Bash to run Java
>>>>>>    applications because that introduces IT industry best of breed’ –
>>>>>>    the Python dependency hell – to the Java application code base. The
>>>>>>    only strangest decision one can made is to use Maven to run Docker
>>>>>>    to run Bash to run Python to run Bash to run Java, but desperate
>>>>>>    times call for desperate measures I guess.
>>>>>>     >
>>>>>>     > There are Java-based solutions for integration testing exists,
>>>>>>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go
>>> well
>>>>>>    for Ignite community CI pipelines by them selves. But we also
>>> wanted
>>>>>>    to run performance tests and benchmarks, like the dreaded PME
>>>>>>    benchmark, and this is solved by totally different set of tools in
>>>>>>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>>>>>     >
>>>>>>     > Speaking specifically about benchmarking, Apache Ignite
>>> community
>>>>>>    already has Yardstick [6], and there’s nothing wrong with writing
>>>>>>    PME benchmark using Yardstick, but we also wanted to be able to run
>>>>>>    scenarios like this:
>>>>>>     > - put an X load to a Ignite database;
>>>>>>     > - perform an Y set of operations to check how Ignite copes with
>>>>>>    operations under load.
>>>>>>     >
>>>>>>     > And yes, we also wanted applications under test be deployed
>>> ‘like
>>>>>>    in a production’, e.g. distributed over a set of hosts. This arises
>>>>>>    questions about provisioning and nodes affinity which I’ll cover in
>>>>>>    detail later.
>>>>>>     >
>>>>>>     > So we decided to put a little effort to build a simple tool to
>>>>>>    cover different integration and performance scenarios, and our QA
>>>>>>    lab first attempt was PoC-Tester [7], currently open source for all
>>>>>>    but for reporting web UI. It’s a quite simple to use 95% Java-based
>>>>>>    tool targeted to be run on a pre-release QA stage.
>>>>>>     >
>>>>>>     > It covers production-like deployment and running a scenarios
>>> over
>>>>>>    a single database instance. PoC-Tester scenarios consists of a
>>>>>>    sequence of tasks running sequentially or in parallel. After all
>>>>>>    tasks complete, or at any time during test, user can run logs
>>>>>>    collection task, logs are checked against exceptions and a summary
>>>>>>    of found issues and task ops/latency statistics is generated at the
>>>>>>    end of scenario. One of the main PoC-Tester features is its
>>>>>>    fire-and-forget approach to task managing. That is, you can deploy
>>> a
>>>>>>    grid and left it running for weeks, periodically firing some tasks
>>>>>>    onto it.
>>>>>>     >
>>>>>>     > During earliest stages of PoC-Tester development it becomes
>>> quite
>>>>>>    clear that Java application development is a tedious process and
>>>>>>    architecture decisions you take during development are slow and
>>> hard
>>>>>>    to change.
>>>>>>     > For example, scenarios like this
>>>>>>     > - deploy two instances of GridGain with master-slave data
>>>>>>    replication configured;
>>>>>>     > - put a load on master;
>>>>>>     > - perform checks on slave,
>>>>>>     > or like this:
>>>>>>     > - preload a 1Tb of data by using your favorite tool of choice to
>>>>>>    an Apache Ignite of version X;
>>>>>>     > - run a set of functional tests running Apache Ignite version Y
>>>>>>    over preloaded data,
>>>>>>     > do not fit well in the PoC-Tester workflow.
>>>>>>     >
>>>>>>     > So, this is why we decided to use Python as a generic scripting
>>>>>>    language of choice.
>>>>>>     >
>>>>>>     > Pros:
>>>>>>     > - quicker prototyping and development cycles
>>>>>>     > - easier to find DevOps/QA engineer with Python skills than one
>>>>>>    with Java skills
>>>>>>     > - used extensively all over the world for DevOps/CI pipelines
>>> and
>>>>>>    thus has rich set of libraries for all possible integration uses
>>> cases.
>>>>>>     >
>>>>>>     > Cons:
>>>>>>     > - Nightmare with dependencies. Better stick to specific
>>>>>>    language/libraries version.
>>>>>>     >
>>>>>>     > Comparing alternatives for Python-based testing framework we
>>> have
>>>>>>    considered following requirements, somewhat similar to what you’ve
>>>>>>    mentioned for Confluent [8] previously:
>>>>>>     > - should be able run locally or distributed (bare metal or in
>>> the
>>>>>>    cloud)
>>>>>>     > - should have built-in deployment facilities for applications
>>>>>>    under test
>>>>>>     > - should separate test configuration and test code
>>>>>>     > -- be able to easily reconfigure tests by simple configuration
>>>>>>    changes
>>>>>>     > -- be able to easily scale test environment by simple
>>>>>>    configuration changes
>>>>>>     > -- be able to perform regression testing by simple switching
>>>>>>    artifacts under test via configuration
>>>>>>     > -- be able to run tests with different JDK version by simple
>>>>>>    configuration changes
>>>>>>     > - should have human readable reports and/or reporting tools
>>>>>>    integration
>>>>>>     > - should allow simple test progress monitoring, one does not
>>> want
>>>>>>    to run 6-hours test to find out that application actually crashed
>>>>>>    during first hour.
>>>>>>     > - should allow parallel execution of test actions
>>>>>>     > - should have clean API for test writers
>>>>>>     > -- clean API for distributed remote commands execution
>>>>>>     > -- clean API for deployed applications start / stop and other
>>>>>>    operations
>>>>>>     > -- clean API for performing check on results
>>>>>>     > - should be open source or at least source code should allow
>>> ease
>>>>>>    change or extension
>>>>>>     >
>>>>>>     > Back at that time we found no better alternative than to write
>>>>>>    our own framework, and here goes Tiden [9] as GridGain framework of
>>>>>>    choice for functional integration and performance testing.
>>>>>>     >
>>>>>>     > Pros:
>>>>>>     > - solves all the requirements above
>>>>>>     > Cons (for Ignite):
>>>>>>     > - (currently) closed GridGain source
>>>>>>     >
>>>>>>     > On top of Tiden we’ve built a set of test suites, some of which
>>>>>>    you might have heard already.
>>>>>>     >
>>>>>>     > A Combinator suite allows to run set of operations concurrently
>>>>>>    over given database instance. Proven to find at least 30+ race
>>>>>>    conditions and NPE issues.
>>>>>>     >
>>>>>>     > A Consumption suite allows to run a set production-like actions
>>>>>>    over given set of Ignite/GridGain versions and compare test metrics
>>>>>>    across versions, like heap/disk/CPU consumption, time to perform
>>>>>>    actions, like client PME, server PME, rebalancing time, data
>>>>>>    replication time, etc.
>>>>>>     >
>>>>>>     > A Yardstick suite is a thin layer of Python glue code to run
>>>>>>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
>>>>>>    mediocre deployment capabilities, Tiden solves this easily.
>>>>>>     >
>>>>>>     > A Stress suite that simulates hardware environment degradation
>>>>>>    during testing.
>>>>>>     >
>>>>>>     > An Ultimate, DR and Compatibility suites that performs
>>> functional
>>>>>>    regression testing of GridGain Ultimate Edition features like
>>>>>>    snapshots, security, data replication, rolling upgrades, etc.
>>>>>>     >
>>>>>>     > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
>>>>>>    etc, etc, etc.
>>>>>>     >
>>>>>>     > Most of the suites above use another in-house developed Java
>>> tool
>>>>>>    – PiClient – to perform actual loading and miscellaneous operations
>>>>>>    with Ignite under test. We use py4j Python-Java gateway library to
>>>>>>    control PiClient instances from the tests.
>>>>>>     >
>>>>>>     > When we considered CI, we put TeamCity out of scope, because
>>>>>>    distributed integration and performance tests tend to run for hours
>>>>>>    and TeamCity agents are scarce and costly resource. So, bundled
>>> with
>>>>>>    Tiden there is jenkins-job-builder [10] based CI pipelines and
>>>>>>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
>>> test
>>>>>>    run reports across versions and has built in visualization support
>>>>>>    for Combinator suite.
>>>>>>     >
>>>>>>     > All of the above is currently closed source, but we plan to make
>>>>>>    it public for community, and publishing Tiden core [9] is the first
>>>>>>    step on that way. You can review some examples of using Tiden for
>>>>>>    tests at my repository [11], for start.
>>>>>>     >
>>>>>>     > Now, let’s compare Ducktape PoC and Tiden.
>>>>>>     >
>>>>>>     > Criteria: Language
>>>>>>     > Tiden: Python, 3.7
>>>>>>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>>>>>    compatible, but actually can’t work with Python 3.7 due to broken
>>>>>>    Zmq dependency.
>>>>>>     > Comment: Python 3.7 has a much better support for async-style
>>>>>>    code which might be crucial for distributed application testing.
>>>>>>     > Score: Tiden: 1, Ducktape: 0
>>>>>>     >
>>>>>>     > Criteria: Test writers API
>>>>>>     > Supported integration test framework concepts are basically the
>>> same:
>>>>>>     > - a test controller (test runner)
>>>>>>     > - a cluster
>>>>>>     > - a node
>>>>>>     > - an application (a service in Ducktape terms)
>>>>>>     > - a test
>>>>>>     > Score: Tiden: 5, Ducktape: 5
>>>>>>     >
>>>>>>     > Criteria: Tests selection and run
>>>>>>     > Ducktape: suite-package-class-method level selection, internal
>>>>>>    scheduler allows to run tests in suite in parallel.
>>>>>>     > Tiden: also suite-package-class-method level selection,
>>>>>>    additionally allows selecting subset of tests by attribute,
>>> parallel
>>>>>>    runs not built in, but allows merging test reports after different
>>> runs.
>>>>>>     > Score: Tiden: 2, Ducktape: 2
>>>>>>     >
>>>>>>     > Criteria: Test configuration
>>>>>>     > Ducktape: single JSON string for all tests
>>>>>>     > Tiden: any number of YaML config files, command line option for
>>>>>>    fine-grained test configuration, ability to select/modify tests
>>>>>>    behavior based on Ignite version.
>>>>>>     > Score: Tiden: 3, Ducktape: 1
>>>>>>     >
>>>>>>     > Criteria: Cluster control
>>>>>>     > Ducktape: allow execute remote commands by node granularity
>>>>>>     > Tiden: additionally can address cluster as a whole and execute
>>>>>>    remote commands in parallel.
>>>>>>     > Score: Tiden: 2, Ducktape: 1
>>>>>>     >
>>>>>>     > Criteria: Logs control
>>>>>>     > Both frameworks have similar builtin support for remote logs
>>>>>>    collection and grepping. Tiden has built-in plugin that can zip,
>>>>>>    collect arbitrary log files from arbitrary locations at
>>>>>>    test/module/suite granularity and unzip if needed, also application
>>>>>>    API to search / wait for messages in logs. Ducktape allows each
>>>>>>    service declare its log files location (seemingly does not support
>>>>>>    logs rollback), and a single entrypoint to collect service logs.
>>>>>>     > Score: Tiden: 1, Ducktape: 1
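For illustration, the per-service log declaration in ducktape typically looks like the sketch below; the class name and paths are hypothetical, and the start/stop/clean hooks a real service needs are omitted:

    from ducktape.services.service import Service

    class IgniteLikeService(Service):
        PERSISTENT_ROOT = "/mnt/service"
        # Tells ducktape which files to collect from each node after a run.
        logs = {
            "console_log": {
                "path": PERSISTENT_ROOT + "/console.log",
                "collect_default": True,
            },
        }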
>>>>>>     >
>>>>>>     > Criteria: Test assertions
>>>>>>     > Tiden: simple asserts, also few customized assertion helpers.
>>>>>>     > Ducktape: simple asserts.
>>>>>>     > Score: Tiden: 2, Ducktape: 1
>>>>>>     >
>>>>>>     > Criteria: Test reporting
>>>>>>     > Ducktape: limited to its own text/html format
>>>>>>     > Tiden: provides text report, yaml report for reporting tools
>>>>>>    integration, XML xUnit report for integration with
>>> Jenkins/TeamCity.
>>>>>>     > Score: Tiden: 3, Ducktape: 1
>>>>>>     >
>>>>>>     > Criteria: Provisioning and deployment
>>>>>>     > Ducktape: can provision subset of hosts from cluster for test
>>>>>>    needs. However, that means, that test can’t be scaled without test
>>>>>>    code changes. Does not do any deploy, relies on external means,
>>> e.g.
>>>>>>    pre-packaged in docker image, as in PoC.
>>>>>>     > Tiden: Given a set of hosts, Tiden uses all of them for the
>>> test.
>>>>>>    Provisioning should be done by external means. However, provides a
>>>>>>    conventional automated deployment routines.
>>>>>>     > Score: Tiden: 1, Ducktape: 1
>>>>>>     >
>>>>>>     > Criteria: Documentation and Extensibility
>>>>>>     > Tiden: current API documentation is limited, should change as we
>>>>>>    go open source. Tiden is easily extensible via hooks and plugins,
>>>>>>    see example Maven plugin and Gatling application at [11].
>>>>>>     > Ducktape: basic documentation at readthedocs.io
>>>>>>    <http://readthedocs.io>. Codebase is rigid, framework core is
>>>>>>    tightly coupled and hard to change. The only possible extension
>>>>>>    mechanism is fork-and-rewrite.
>>>>>>     > Score: Tiden: 2, Ducktape: 1
>>>>>>     >
>>>>>>     > I can continue more on this, but it should be enough for now:
>>>>>>     > Overall score: Tiden: 22, Ducktape: 14.
>>>>>>     >
>>>>>>     > Time for discussion!
>>>>>>     >
>>>>>>     > ---
>>>>>>     > [1] - https://www.testcontainers.org/
>>>>>>     > [2] - http://arquillian.org/guides/getting_started/
>>>>>>     > [3] - https://jmeter.apache.org/index.html
>>>>>>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>>>>>     > [5] - https://gatling.io/docs/current/
>>>>>>     > [6] - https://github.com/gridgain/yardstick
>>>>>>     > [7] - https://github.com/gridgain/poc-tester
>>>>>>     > [8] -
>>>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
>>>>>>     > [9] - https://github.com/gridgain/tiden
>>>>>>     > [10] - https://pypi.org/project/jenkins-job-builder/
>>>>>>     > [11] - https://github.com/mshonichev/tiden_examples
>>>>>>     >
>>>>>>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>>>>>     >> Hello,
>>>>>>     >>
>>>>>>     >> Branch with duck tape created -
>>>>>>    https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>>     >>
>>>>>>     >> Any who are willing to contribute to PoC are welcome.
>>>>>>     >>
>>>>>>     >>
>>>>>>     >>> On 21 May 2020, at 22:33, Nikolay Izhikov
>>>>>>    <nizhikov.dev@gmail.com <ma...@gmail.com>>
>>> wrote:
>>>>>>     >>>
>>>>>>     >>> Hello, Denis.
>>>>>>     >>>
>>>>>>     >>> There is no rush with these improvements.
>>>>>>     >>> We can wait for Maxim proposal and compare two solutions :)
>>>>>>     >>>
>>>>>>     >>>> On 21 May 2020, at 22:24, Denis Magda <dmagda@apache.org
>>>>>>    <ma...@apache.org>> wrote:
>>>>>>     >>>>
>>>>>>     >>>> Hi Nikolay,
>>>>>>     >>>>
>>>>>>     >>>> Thanks for kicking off this conversation and sharing your
>>>>>>    findings with the
>>>>>>     >>>> results. That's the right initiative. I do agree that Ignite
>>>>>>    needs to have
>>>>>>     >>>> an integration testing framework with capabilities listed by
>>> you.
>>>>>>     >>>>
>>>>>>     >>>> As we discussed privately, I would only check if instead of
>>>>>>     >>>> Confluent's Ducktape library, we can use an integration
>>>>>>    testing framework
>>>>>>     >>>> developed by GridGain for testing of Ignite/GridGain
>>> clusters.
>>>>>>    That
>>>>>>     >>>> framework has been battle-tested and might be more
>>> convenient for
>>>>>>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>>>>>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>>
>>> who
>>>>>>    promised to join this thread once he finishes
>>>>>>     >>>> preparing the usage examples of the framework. To my
>>>>>>    knowledge, Max has
>>>>>>     >>>> already been working on that for several days.
>>>>>>     >>>>
>>>>>>     >>>> -
>>>>>>     >>>> Denis
>>>>>>     >>>>
>>>>>>     >>>>
>>>>>>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>>>>>    <nizhikov@apache.org <ma...@apache.org>>
>>>>>>     >>>> wrote:
>>>>>>     >>>>
>>>>>>     >>>>> Hello, Igniters.
>>>>>>     >>>>>
>>>>>>     >>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>>>     >>>>>
>>>>>>     >>>>> Let me briefly explain the gap I want to cover:
>>>>>>     >>>>>
>>>>>>     >>>>> 1. For now, we don’t have a solution for automated testing
>>> of
>>>>>>    Ignite on
>>>>>>     >>>>> «real cluster».
>>>>>>     >>>>> By «real cluster» I mean cluster «like a production»:
>>>>>>     >>>>>       * client and server nodes deployed on different hosts.
>>>>>>     >>>>>       * thin clients perform queries from some other hosts
>>>>>>     >>>>>       * etc.
>>>>>>     >>>>>
>>>>>>     >>>>> 2. We don’t have a solution for automated benchmarks of some
>>>>>>    internal
>>>>>>     >>>>> Ignite process
>>>>>>     >>>>>       * PME
>>>>>>     >>>>>       * rebalance.
>>>>>>     >>>>> This means we don’t know - Do we perform rebalance(or PME)
>>> in
>>>>>>    2.7.0 faster
>>>>>>     >>>>> or slower than in 2.8.0 for the same cluster?
>>>>>>     >>>>>
>>>>>>     >>>>> 3. We don’t have a solution for automated testing of Ignite
>>>>>>    integration in
>>>>>>     >>>>> a real-world environment:
>>>>>>     >>>>> Ignite-Spark integration can be taken as an example.
>>>>>>     >>>>> I think some ML solutions also should be tested in
>>> real-world
>>>>>>    deployments.
>>>>>>     >>>>>
>>>>>>     >>>>> Solution:
>>>>>>     >>>>>
>>>>>>     >>>>> I propose to use duck tape library from confluent (apache
>>> 2.0
>>>>>>    license)
>>>>>>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on
>>> the
>>>>>>    local
>>>>>>     >>>>> environment(docker) and it works just fine.
>>>>>>     >>>>>
>>>>>>     >>>>> PoC contains following services:
>>>>>>     >>>>>
>>>>>>     >>>>>       * Simple rebalance test:
>>>>>>     >>>>>               Start 2 server nodes,
>>>>>>     >>>>>               Create some data with Ignite client,
>>>>>>     >>>>>               Start one more server node,
>>>>>>     >>>>>               Wait for rebalance finish
>>>>>>     >>>>>       * Simple Ignite-Spark integration test:
>>>>>>     >>>>>               Start 1 Spark master, start 1 Spark worker,
>>>>>>     >>>>>               Start 1 Ignite server node
>>>>>>     >>>>>               Create some data with Ignite client,
>>>>>>     >>>>>               Check data in application that queries it from
>>>>>>    Spark.
>>>>>>     >>>>>
>>>>>>     >>>>> All tests are fully automated.
>>>>>>     >>>>> Logs collection works just fine.
>>>>>>     >>>>> You can see an example of the tests report - [4].
>>>>>>     >>>>>
>>>>>>     >>>>> Pros:
>>>>>>     >>>>>
>>>>>>     >>>>> * Ability to test local changes(no need to public changes to
>>>>>>    some remote
>>>>>>     >>>>> repository or similar).
>>>>>>     >>>>> * Ability to parametrize test environment(run the same tests
>>>>>>    on different
>>>>>>     >>>>> JDK, JVM params, config, etc.)
>>>>>>     >>>>> * Isolation by default so system tests are as reliable as
>>>>>>    possible.
>>>>>>     >>>>> * Utilities for pulling up and tearing down services easily
>>>>>>    in clusters in
>>>>>>     >>>>> different environments (e.g. local, custom cluster, Vagrant,
>>>>>>    K8s, Mesos,
>>>>>>     >>>>> Docker, cloud providers, etc.)
>>>>>>     >>>>> * Easy to write unit tests for distributed systems
>>>>>>     >>>>> * Adopted and successfully used by other distributed open
>>>>>>    source project -
>>>>>>     >>>>> Apache Kafka.
>>>>>>     >>>>> * Collect results (e.g. logs, console output)
>>>>>>     >>>>> * Report results (e.g. expected conditions met, performance
>>>>>>    results, etc.)
>>>>>>     >>>>>
>>>>>>     >>>>> WDYT?
>>>>>>     >>>>>
>>>>>>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>>>     >>>>> [2] https://github.com/confluentinc/ducktape
>>>>>>     >>>>> [3]
>>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>>>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>>>> 
>>> 
>>> 
>>> 
> <2020-07-05--004.tar.gz>



Re: [DISCUSSION] Ignite integration testing framework.

Posted by Max Shonichev <ms...@yandex.ru>.
Yes, Denis,

the common ground seems to be as follows:
Anton Vinogradov and Nikolay Izhikov will try to prepare and run the PoC 
over physical hosts and share benchmark results. In the meantime, while 
I strongly believe that a dockerized approach to benchmarking is a road to 
misleading results and false positives, I'll prepare a PoC of Tiden in a 
dockerized environment to support the 'fast development prototyping' use case 
the Nikolay team insists on. It should be a matter of a few days.

As a side note, I've run Anton's PoC locally and would like to get some 
comments on the results:

Test system: Ubuntu 18.04, docker 19.03.6
Test commands:


git clone -b ignite-ducktape git@github.com:anton-vinogradov/ignite.git
cd ignite
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
cd modules/ducktests/tests/docker
./run_tests.sh

Test results:
====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-05--004
run time:         7 minutes 36.360 seconds
tests run:        5
passed:           3
failed:           2
ignored:          0
====================================================================================================
test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     FAIL
run time:   3 minutes 12.232 seconds
----------------------------------------------------------------------------------------------------
test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     FAIL
run time:   1 minute 33.076 seconds


Is it OK for those tests to fail? The full test report is attached.
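
For the record, ducktape itself seems to support re-running a single test via
a path::class.method selector, so a failed benchmark can be retried in
isolation, e.g.:

ducktape ignitetest/tests/benchmarks/pme_free_switch_test.py::PmeFreeSwitchTest.test

The path here is only inferred from the test_id in the report above, so it may
need adjusting to the actual layout inside the docker image.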


On 02.07.2020 17:46, Denis Magda wrote:
> Folks,
> 
> Please share the summary of that Slack conversation here for records once
> you find common ground.
> 
> -
> Denis
> 
> 
> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org> wrote:
> 
>> Igniters.
>>
>> All who are interested in integration testing framework discussion are
>> welcome into slack channel -
>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
>>
>>
>>
>>> 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> написал(а):
>>>
>>> Max,
>>> Thanks for joining us.
>>>
>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>>>> dependencies being deployed by external scripts.
>>> No. It is important to distinguish development, deploy, and
>> orchestration.
>>> All-in-one solutions have extremely limited usability.
>>> As to Ducktests:
>>> Docker is responsible for deployments during development.
>>> CI/CD is responsible for deployments during release and nightly checks.
>> It's up to the team to choose AWS, VM, BareMetal, and even OS.
>>> Ducktape is responsible for orchestration.
>>>
>>>> 2. tiden can execute actions over remote nodes in real parallel
>> fashion,
>>>> while ducktape internally does all actions sequentially.
>>> No. Ducktape may start any service in parallel. See Pme-free benchmark
>> [1] for details.
>>>
>>>> if we used ducktape solution we would have to instead prepare some
>>>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>>>> Ansible or Chef.
>>> Sure, because a way of deploy depends on infrastructure.
>>> How can we be sure that OS we use and the restrictions we have will be
>> compatible with Tiden?
>>>
>>>> You have solved this deficiency with docker by putting all dependencies
>>>> into one uber-image ...
>>> and
>>>> I guess we all know about docker hyped ability to run over distributed
>>>> virtual networks.
>>> It is very important not to confuse the test's development (docker image
>> you're talking about) and real deployment.
>>>
>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
>>> All actions can be performed in parallel.
>>> See how Ducktests [2] starts cluster in parallel for example.
>>>
>>> [1]
>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
>>> [2]
>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
>>>
>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
>> wrote:
>>> Hello, Maxim.
>>>
>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>> dependencies being deployed by external scripts
>>>
>>> Why do you think that maintaining deploy scripts coupled with the
>> testing framework is an advantage?
>>> I thought we want to see and maintain deployment scripts separate from
>> the testing framework.
>>>
>>>> 2. tiden can execute actions over remote nodes in real parallel
>> fashion, while ducktape internally does all actions sequentially.
>>>
>>> Can you, please, clarify, what actions do you have in mind?
>>> And why we want to execute them concurrently?
>>> Ignite node start, Client application execution can be done concurrently
>> with the ducktape approach.
>>>
>>>> If we used ducktape solution we would have to instead prepare some
>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>> Ansible or Chef
>>>
>>> We shouldn’t take some user approach as an argument in this discussion.
>> Let’s discuss a general approach for all users of the Ignite. Anyway, what
>> is wrong with the external deployment script approach?
>>>
>>> We, as a community, should provide several ways to run integration tests
>> out-of-the-box AND the ability to customize deployment regarding the user
>> landscape.
>>>
>>>> You have solved this deficiency with docker by putting all
>> dependencies into one uber-image and that looks like simple and elegant
>> solution however, that effectively limits you to single-host testing.
>>>
>>> Docker image should be used only by the Ignite developers to test
>> something locally.
>>> It’s not intended for some real-world testing.
>>>
>>> The main issue with Tiden that I see is that it has been tested and maintained as a
>> closed-source solution.
>>> This can lead to hard-to-solve problems when we start using and
>> maintaining it as an open-source solution.
>>> Like, how many developers have used Tiden? And how many of those developers were
>> not authors of Tiden itself?
>>>
>>>
>>>> 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
>> написал(а):
>>>>
>>>> Anton, Nikolay,
>>>>
>>>> Let's agree on what we are arguing about: whether it is about "like or
>> don't like" or about technical properties of suggested solutions.
>>>>
>>>> If it is about likes and dislikes, then the whole discussion is
>> meaningless. However, I hope together we can analyse pros and cons
>> carefully.
>>>>
>>>> As far as I can understand now, two main differences between ducktape
>> and tiden are that:
>>>>
>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
>> dependencies being deployed by external scripts.
>>>>
>>>> 2. tiden can execute actions over remote nodes in real parallel
>> fashion, while ducktape internally does all actions sequentially.
>>>>
>>>> As for me, these are very important properties for distributed testing
>> framework.
>>>>
>>>> The first property lets us easily reuse tiden in existing infrastructures,
>> for example, during Zookeeper IEP testing at Sberbank site we used the same
>> tiden scripts that we use in our lab, the only change was putting a list of
>> hosts into config.
>>>>
>>>> If we used ducktape solution we would have to instead prepare some
>> deployment scripts to pre-initialize Sberbank hosts, for example, with
>> Ansible or Chef.
>>>>
>>>>
>>>> You have solved this deficiency with docker by putting all
>> dependencies into one uber-image and that looks like simple and elegant
>> solution,
>>>> however, that effectively limits you to single-host testing.
>>>>
>>>> I guess we all know about docker hyped ability to run over distributed
>> virtual networks. We used to go that way, but quickly found that it is more
>> of the hype than real work. In real environments, there are problems with
>> routing, DNS, multicast and broadcast traffic, and many others, that turn
>> docker-based distributed solution into a fragile hard-to-maintain monster.
>>>>
>>>> Please, if you believe otherwise, perform a run of your PoC over at
>> least two physical hosts and share results with us.
>>>>
>>>> If you consider that one physical docker host is enough, please, don't
>> overlook that we want to run real scale scenarios, with 50-100 cache
>> groups, persistence enabled and millions of keys loaded.
>>>>
>>>> Practical limit for such configurations is 4-6 nodes per single
>> physical host. Otherwise, tests become flaky due to resource starvation.
>>>>
>>>> Please, if you believe otherwise, perform at least 10 runs of
>> your PoC with other tests running at TC (we're targeting TeamCity, right?)
>> and share results so we could check if the numbers are reproducible.
>>>>
>>>> I stress this once more: functional integration tests are OK to run in
>> Docker and CI, but running benchmarks in Docker is a big NO GO.
>>>>
>>>>
>>>> The second property lets us write tests that require real-parallel actions
>> over hosts.
>>>>
>>>> For example, the agreed scenario for the PME benchmark during the "PME optimization
>> stream" was as follows:
>>>>
>>>>   - 10 server nodes, preloaded with 1M of keys
>>>>   - 4 client nodes perform transactional load  (client nodes physically
>> separated from server nodes)
>>>>   - during load:
>>>>   -- 5 server nodes stopped in parallel
>>>>   -- after 1 minute, all 5 nodes are started in parallel
>>>>   - load stopped, logs are analysed for exchange times.
>>>>
>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does,
>> then partition map exchange merge would not happen and we could not have
>> measured PME optimizations for that case.
>>>>
>>>>
>>>> These are limitations of ducktape that we believe as a more important
>>>> argument "against" than you provide "for".
>>>>
>>>>
>>>>
>>>>
>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
>>>>> Folks,
>>>>> First, I've created PR [1] with ducktests improvements
>>>>> PR contains the following changes
>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>>>>> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
>>>>> - Global refactoring
>>>>> -- benchmarks javacode simplification
>>>>> -- services python and java classes code deduplication
>>>>> -- fail-fast checks for java and python (eg. application should
>> explicitly write it finished with success)
>>>>> -- simple results extraction from tests and benchmarks
>>>>> -- javacode now configurable from tests/benchmarks
>>>>> -- proper SIGTERM handling at javacode (eg. it may finish last
>> operation and log results)
>>>>> -- docker volume now marked as delegated to increase execution speed
>> for mac & win users
>>>>> -- Ignite cluster now start in parallel (start speed-up)
>>>>> -- Ignite can be configured at test/benchmark
>>>>> - full and module assembly scripts added
>>>> Great job done! But let me remind you of one of the Apache Ignite principles:
>>>> a week of thinking saves months of development.
>>>>
>>>>
>>>>> Second, I'd like to propose to accept ducktests [2] (ducktape
>> integration) as a target "PoC check & real topology benchmarking tool".
>>>>> Ducktape pros
>>>>> - Developed for distributed system by distributed system developers.
>>>> So does Tiden
>>>>
>>>>> - Developed since 2014, stable.
>>>> Tiden is also pretty stable, and development start date is not a good
>> argument, for example pytest is since 2004, pytest-xdist (plugin for
>> distributed testing) is since 2010, but we don't see it as an alternative at
>> all.
>>>>
>>>>> - Proven usability by usage at Kafka.
>>>> Tiden is proven usable by usage at GridGain and Sberbank deployments.
>>>> Core, storage, sql and tx teams use benchmark results provided by
>> Tiden on a daily basis.
>>>>
>>>>> - Dozens of dozens tests and benchmarks at Kafka as a great example
>> pack.
>>>> We'll donate some of our suites to Ignite as I've mentioned in
>> previous letter.
>>>>
>>>>> - Built-in Docker support for rapid development and checks.
>>>> False, there's no specific 'docker support' in ducktape itself, you
>> just wrap it in docker by yourself, because ducktape is lacking deployment
>> abilities.
>>>>
>>>>> - Great for CI automation.
>>>> False, there are no specific CI-enabled features in ducktape. Tiden, on
>> the other hand, provides a generic xUnit reporting format, which is supported
>> by both TeamCity and Jenkins. Also, instead of using private keys, Tiden
>> can use SSH agent, which is also great for CI, because both
>>>> TeamCity and Jenkins store keys in secret storage available only for
>> ssh-agent and only for the time of the test.
>>>>
>>>>
>>>>>> As an additional motivation, at least 3 teams
>>>>> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix
>> speed-up))
>>>>> - Ignite SE Plugins team (to check plugin's features does not
>> slow-down or broke AI features)
>>>>> - Ignite SE QA team (to append already developed smoke/load/failover
>> tests to AI codebase)
>>>>
>>>> Please, before recommending your tests to other teams, provide proofs
>>>> that your tests are reproducible in real environment.
>>>>
>>>>
>>>>> now, wait for ducktest merge to start checking cases they working on
>> in AI way.
>>>>> Thoughts?
>>>> Let us together review both solutions, we'll try to run your tests in
>> our lab, and you'll try to at least checkout tiden and see if same tests
>> can be implemented with it?
>>>>
>>>>
>>>>
>>>>> [1] https://github.com/apache/ignite/pull/7967
>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org
>> <ma...@apache.org>> wrote:
>>>>>     Hello, Maxim.
>>>>>     Thank you for so detailed explanation.
>>>>>     Can we put the content of this discussion somewhere on the wiki?
>>>>>     So It doesn’t get lost.
>>>>>     I divide the answer in several parts. From the requirements to the
>>>>>     implementation.
>>>>>     So, if we agreed on the requirements we can proceed with the
>>>>>     discussion of the implementation.
>>>>>     1. Requirements:
>>>>>     The main goal I want to achieve is *reproducibility* of the tests.
>>>>>     I’m sick and tired of the zillions of flaky, rarely failing, and
>>>>>     almost never failing tests in the Ignite codebase.
>>>>>     We should start with the simplest scenarios that will be as
>> reliable
>>>>>     as steel :)
>>>>>     I want to know for sure:
>>>>>        - Does this PR make rebalance quicker or not?
>>>>>        - Does this PR make PME quicker or not?
>>>>>     So, your description of the complex test scenario looks as a next
>>>>>     step to me.
>>>>>     Anyway, It’s cool we already have one.
>>>>>     The second goal is to have a strict test lifecycle as we have in
>>>>>     JUnit and similar frameworks.
>>>>>      > It covers production-like deployment and running a scenarios
>> over
>>>>>     a single database instance.
>>>>>     Do you mean «single cluster» or «single host»?
>>>>>     2. Existing tests:
>>>>>      > A Combinator suite allows to run set of operations concurrently
>>>>>     over given database instance.
>>>>>      > A Consumption suite allows to run a set production-like actions
>>>>>     over given set of Ignite/GridGain versions and compare test metrics
>>>>>     across versions
>>>>>      > A Yardstick suite
>>>>>      > A Stress suite that simulates hardware environment degradation
>>>>>      > An Ultimate, DR and Compatibility suites that performs
>> functional
>>>>>     regression testing
>>>>>      > Regression
>>>>>     Great news that we already have so many choices for testing!
>>>>>     Mature test base is a big +1 for Tiden.
>>>>>     3. Comparison:
>>>>>      > Criteria: Test configuration
>>>>>      > Ducktape: single JSON string for all tests
>>>>>      > Tiden: any number of YaML config files, command line option for
>>>>>     fine-grained test configuration, ability to select/modify tests
>>>>>     behavior based on Ignite version.
>>>>>     1. Many YAML files can be hard to maintain.
>>>>>     2. In ducktape, you can set parameters via the «--parameters» option.
>>>>>     Please, take a look at the doc [1]
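>>>>>     For example (an illustrative command line; the --parameters option
>>>>>     itself is documented in [1], the test path is just for illustration):
>>>>>     ducktape ignitetest/tests/benchmarks/add_node_rebalance_test.py \
>>>>>         --parameters '{"version": "2.8.1"}'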
>>>>>      > Criteria: Cluster control
>>>>>      > Tiden: additionally can address cluster as a whole and execute
>>>>>     remote commands in parallel.
>>>>>     It seems we implement this ability in the PoC, already.
>>>>>      > Criteria: Test assertions
>>>>>      > Tiden: simple asserts, also few customized assertion helpers.
>>>>>      > Ducktape: simple asserts.
>>>>>     Can you, please, be more specific.
>>>>>     What helpers do you have in mind?
>>>>>     Ducktape has asserts that wait for logfile messages or for some
>>>>>     process to finish.
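>>>>>     For example, something like this inside a test method (a rough sketch
>>>>>     from memory of how the Kafka system tests use these ducktape helpers;
>>>>>     node/service come from the service class, the log path and the
>>>>>     rebalance_finished() predicate are hypothetical, please check the
>>>>>     exact signatures against the ducktape API):
>>>>>
>>>>>     from ducktape.utils.util import wait_until
>>>>>
>>>>>     # wait for a specific message to appear in a node's log
>>>>>     with node.account.monitor_log(ignite_log_path) as monitor:
>>>>>         service.start_node(node)
>>>>>         monitor.wait_until("Topology snapshot", timeout_sec=60,
>>>>>                            err_msg="Node did not join the topology")
>>>>>
>>>>>     # or wait for an arbitrary condition, e.g. rebalance completion
>>>>>     wait_until(lambda: rebalance_finished(), timeout_sec=300,
>>>>>                err_msg="Rebalance did not finish in time")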
>>>>>      > Criteria: Test reporting
>>>>>      > Ducktape: limited to its own text/HTML format
>>>>>     Ducktape has
>>>>>     1. Text reporter
>>>>>     2. Customizable HTML reporter
>>>>>     3. JSON reporter.
>>>>>     We can show JSON with the any template or tool.
>>>>>      > Criteria: Provisioning and deployment
>>>>>      > Ducktape: can provision subset of hosts from cluster for test
>>>>>     needs. However, that means, that test can’t be scaled without test
>>>>>     code changes. Does not do any deploy, relies on external means,
>> e.g.
>>>>>     pre-packaged in docker image, as in PoC.
>>>>>     This is not true.
>>>>>     1. We can set explicit test parameters(node number) via parameters.
>>>>>     We can increase client count or cluster size without test code
>> changes.
>>>>>     2. We have many choices for the test environment. These choices are
>>>>>     tested and used in other projects:
>>>>>              * docker
>>>>>              * vagrant
>>>>>              * private cloud(ssh access)
>>>>>              * ec2
>>>>>     Please, take a look at Kafka documentation [2]
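>>>>>     For the ssh-access case, as far as I understand the ducktape docs, you
>>>>>     point the runner at a JSON cluster file, roughly like this (the field
>>>>>     names are from memory and should be double-checked against the docs):
>>>>>
>>>>>     ducktape <tests> --cluster ducktape.cluster.json.JsonCluster \
>>>>>         --cluster-file cluster.json
>>>>>
>>>>>     where cluster.json lists the hosts, e.g.:
>>>>>
>>>>>     {"nodes": [
>>>>>       {"ssh_config": {"host": "10.0.0.1", "user": "ignite",
>>>>>                       "identityfile": "~/.ssh/id_rsa"}},
>>>>>       {"ssh_config": {"host": "10.0.0.2", "user": "ignite",
>>>>>                       "identityfile": "~/.ssh/id_rsa"}}
>>>>>     ]}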
>>>>>      > I can continue more on this, but it should be enough for now:
>>>>>     We need to go deeper! :)
>>>>>     [1]
>>>>>
>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>>>>     [2]
>> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>>>>      > 9 июня 2020 г., в 17:25, Max A. Shonichev <mshonich@yandex.ru
>>>>>     <ma...@yandex.ru>> написал(а):
>>>>>      >
>>>>>      > Greetings, Nikolay,
>>>>>      >
>>>>>      > First of all, thank you for your great effort preparing a PoC of
>>>>>     integration testing for the Ignite community.
>>>>>      >
>>>>>      > It’s a shame Ignite did not have at least some such tests yet,
>>>>>     however, GridGain, as a major contributor to Apache Ignite had a
>>>>>     profound collection of in-house tools to perform integration and
>>>>>     performance testing for years already and while we slowly consider
>>>>>     sharing our expertise with the community, your initiative makes us
>>>>>     drive that process a bit faster, thanks a lot!
>>>>>      >
>>>>>      > I reviewed your PoC and want to share a little about what we do
>>>>>     on our part, why and how, hope it would help community take proper
>>>>>     course.
>>>>>      >
>>>>>      > First I’ll do a brief overview of what decisions we made and
>> what
>>>>>     we do have in our private code base, next I’ll describe what we
>> have
>>>>>     already donated to the public and what we plan public next, then
>>>>>     I’ll compare both approaches highlighting deficiencies in order to
>>>>>     spur public discussion on the matter.
>>>>>      >
>>>>>      > It might seem strange to use Python to run Bash to run Java
>>>>>     applications because that introduces the IT industry’s ‘best of breed’ –
>>>>>     the Python dependency hell – to the Java application code base. The
>>>>>     only stranger decision one could make is to use Maven to run Docker
>>>>>     to run Bash to run Python to run Bash to run Java, but desperate
>>>>>     times call for desperate measures I guess.
>>>>>      >
>>>>>      > There are Java-based solutions for integration testing,
>>>>>     e.g. Testcontainers [1], Arquillian [2], etc, and they might go
>> well
>>>>>     for Ignite community CI pipelines by them selves. But we also
>> wanted
>>>>>     to run performance tests and benchmarks, like the dreaded PME
>>>>>     benchmark, and this is solved by totally different set of tools in
>>>>>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>>>>      >
>>>>>      > Speaking specifically about benchmarking, Apache Ignite
>> community
>>>>>     already has Yardstick [6], and there’s nothing wrong with writing
>>>>>     PME benchmark using Yardstick, but we also wanted to be able to run
>>>>>     scenarios like this:
>>>>>      > - put an X load on an Ignite database;
>>>>>      > - perform a Y set of operations to check how Ignite copes with
>>>>>     operations under load.
>>>>>      >
>>>>>      > And yes, we also wanted applications under test be deployed
>> ‘like
>>>>>     in a production’, e.g. distributed over a set of hosts. This arises
>>>>>     questions about provisioning and nodes affinity which I’ll cover in
>>>>>     detail later.
>>>>>      >
>>>>>      > So we decided to put a little effort to build a simple tool to
>>>>>     cover different integration and performance scenarios, and our QA
>>>>>     lab first attempt was PoC-Tester [7], currently open source for all
>>>>>     but for reporting web UI. It’s a quite simple to use 95% Java-based
>>>>>     tool targeted to be run on a pre-release QA stage.
>>>>>      >
>>>>>      > It covers production-like deployment and running a scenarios
>> over
>>>>>     a single database instance. PoC-Tester scenarios consists of a
>>>>>     sequence of tasks running sequentially or in parallel. After all
>>>>>     tasks complete, or at any time during test, user can run logs
>>>>>     collection task, logs are checked against exceptions and a summary
>>>>>     of found issues and task ops/latency statistics is generated at the
>>>>>     end of scenario. One of the main PoC-Tester features is its
>>>>>     fire-and-forget approach to task managing. That is, you can deploy
>> a
>>>>>     grid and left it running for weeks, periodically firing some tasks
>>>>>     onto it.
>>>>>      >
>>>>>      > During earliest stages of PoC-Tester development it becomes
>> quite
>>>>>     clear that Java application development is a tedious process and
>>>>>     architecture decisions you take during development are slow and
>> hard
>>>>>     to change.
>>>>>      > For example, scenarios like this
>>>>>      > - deploy two instances of GridGain with master-slave data
>>>>>     replication configured;
>>>>>      > - put a load on master;
>>>>>      > - perform checks on slave,
>>>>>      > or like this:
>>>>>      > - preload a 1Tb of data by using your favorite tool of choice to
>>>>>     an Apache Ignite of version X;
>>>>>      > - run a set of functional tests running Apache Ignite version Y
>>>>>     over preloaded data,
>>>>>      > do not fit well in the PoC-Tester workflow.
>>>>>      >
>>>>>      > So, this is why we decided to use Python as a generic scripting
>>>>>     language of choice.
>>>>>      >
>>>>>      > Pros:
>>>>>      > - quicker prototyping and development cycles
>>>>>      > - easier to find DevOps/QA engineer with Python skills than one
>>>>>     with Java skills
>>>>>      > - used extensively all over the world for DevOps/CI pipelines
>> and
>>>>>     thus has rich set of libraries for all possible integration uses
>> cases.
>>>>>      >
>>>>>      > Cons:
>>>>>      > - Nightmare with dependencies. Better stick to specific
>>>>>     language/libraries version.
>>>>>      >
>>>>>      > Comparing alternatives for Python-based testing framework we
>> have
>>>>>     considered following requirements, somewhat similar to what you’ve
>>>>>     mentioned for Confluent [8] previously:
>>>>>      > - should be able run locally or distributed (bare metal or in
>> the
>>>>>     cloud)
>>>>>      > - should have built-in deployment facilities for applications
>>>>>     under test
>>>>>      > - should separate test configuration and test code
>>>>>      > -- be able to easily reconfigure tests by simple configuration
>>>>>     changes
>>>>>      > -- be able to easily scale test environment by simple
>>>>>     configuration changes
>>>>>      > -- be able to perform regression testing by simple switching
>>>>>     artifacts under test via configuration
>>>>>      > -- be able to run tests with different JDK version by simple
>>>>>     configuration changes
>>>>>      > - should have human readable reports and/or reporting tools
>>>>>     integration
>>>>>      > - should allow simple test progress monitoring, one does not
>> want
>>>>>     to run 6-hours test to find out that application actually crashed
>>>>>     during first hour.
>>>>>      > - should allow parallel execution of test actions
>>>>>      > - should have clean API for test writers
>>>>>      > -- clean API for distributed remote commands execution
>>>>>      > -- clean API for deployed applications start / stop and other
>>>>>     operations
>>>>>      > -- clean API for performing check on results
>>>>>      > - should be open source or at least source code should allow
>> ease
>>>>>     change or extension
>>>>>      >
>>>>>      > Back at that time we found no better alternative than to write
>>>>>     our own framework, and here goes Tiden [9] as GridGain framework of
>>>>>     choice for functional integration and performance testing.
>>>>>      >
>>>>>      > Pros:
>>>>>      > - solves all the requirements above
>>>>>      > Cons (for Ignite):
>>>>>      > - (currently) closed GridGain source
>>>>>      >
>>>>>      > On top of Tiden we’ve built a set of test suites, some of which
>>>>>     you might have heard already.
>>>>>      >
>>>>>      > A Combinator suite allows to run set of operations concurrently
>>>>>     over given database instance. Proven to find at least 30+ race
>>>>>     conditions and NPE issues.
>>>>>      >
>>>>>      > A Consumption suite allows to run a set production-like actions
>>>>>     over given set of Ignite/GridGain versions and compare test metrics
>>>>>     across versions, like heap/disk/CPU consumption, time to perform
>>>>>     actions, like client PME, server PME, rebalancing time, data
>>>>>     replication time, etc.
>>>>>      >
>>>>>      > A Yardstick suite is a thin layer of Python glue code to run
>>>>>     Apache Ignite pre-release benchmarks set. Yardstick itself has a
>>>>>     mediocre deployment capabilities, Tiden solves this easily.
>>>>>      >
>>>>>      > A Stress suite that simulates hardware environment degradation
>>>>>     during testing.
>>>>>      >
>>>>>      > An Ultimate, DR and Compatibility suites that performs
>> functional
>>>>>     regression testing of GridGain Ultimate Edition features like
>>>>>     snapshots, security, data replication, rolling upgrades, etc.
>>>>>      >
>>>>>      > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
>>>>>     etc, etc, etc.
>>>>>      >
>>>>>      > Most of the suites above use another in-house developed Java
>> tool
>>>>>     – PiClient – to perform actual loading and miscellaneous operations
>>>>>     with Ignite under test. We use py4j Python-Java gateway library to
>>>>>     control PiClient instances from the tests.
>>>>>      >
>>>>>      > When we considered CI, we put TeamCity out of scope, because
>>>>>     distributed integration and performance tests tend to run for hours
>>>>>     and TeamCity agents are scarce and costly resource. So, bundled
>> with
>>>>>     Tiden there is jenkins-job-builder [10] based CI pipelines and
>>>>>     Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
>> test
>>>>>     run reports across versions and has built in visualization support
>>>>>     for Combinator suite.
>>>>>      >
>>>>>      > All of the above is currently closed source, but we plan to make
>>>>>     it public for community, and publishing Tiden core [9] is the first
>>>>>     step on that way. You can review some examples of using Tiden for
>>>>>     tests at my repository [11], for start.
>>>>>      >
>>>>>      > Now, let’s compare Ducktape PoC and Tiden.
>>>>>      >
>>>>>      > Criteria: Language
>>>>>      > Tiden: Python, 3.7
>>>>>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>>>>     compatible, but actually can’t work with Python 3.7 due to broken
>>>>>     Zmq dependency.
>>>>>      > Comment: Python 3.7 has a much better support for async-style
>>>>>     code which might be crucial for distributed application testing.
>>>>>      > Score: Tiden: 1, Ducktape: 0
>>>>>      >
>>>>>      > Criteria: Test writers API
>>>>>      > Supported integration test framework concepts are basically the
>> same:
>>>>>      > - a test controller (test runner)
>>>>>      > - a cluster
>>>>>      > - a node
>>>>>      > - an application (a service in Ducktape terms)
>>>>>      > - a test
>>>>>      > Score: Tiden: 5, Ducktape: 5
>>>>>      >
>>>>>      > Criteria: Tests selection and run
>>>>>      > Ducktape: suite-package-class-method level selection, internal
>>>>>     scheduler allows to run tests in suite in parallel.
>>>>>      > Tiden: also suite-package-class-method level selection,
>>>>>     additionally allows selecting subset of tests by attribute,
>> parallel
>>>>>     runs not built in, but allows merging test reports after different
>> runs.
>>>>>      > Score: Tiden: 2, Ducktape: 2
>>>>>      >
>>>>>      > Criteria: Test configuration
>>>>>      > Ducktape: single JSON string for all tests
>>>>>      > Tiden: any number of YaML config files, command line option for
>>>>>     fine-grained test configuration, ability to select/modify tests
>>>>>     behavior based on Ignite version.
>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Cluster control
>>>>>      > Ducktape: allow execute remote commands by node granularity
>>>>>      > Tiden: additionally can address cluster as a whole and execute
>>>>>     remote commands in parallel.
>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Logs control
>>>>>      > Both frameworks have similar builtin support for remote logs
>>>>>     collection and grepping. Tiden has built-in plugin that can zip,
>>>>>     collect arbitrary log files from arbitrary locations at
>>>>>     test/module/suite granularity and unzip if needed, also application
>>>>>     API to search / wait for messages in logs. Ducktape allows each
>>>>>     service declare its log files location (seemingly does not support
>>>>>     logs rollback), and a single entrypoint to collect service logs.
>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Test assertions
>>>>>      > Tiden: simple asserts, also few customized assertion helpers.
>>>>>      > Ducktape: simple asserts.
>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Test reporting
>>>>>      > Ducktape: limited to its own text/html format
>>>>>      > Tiden: provides text report, yaml report for reporting tools
>>>>>     integration, XML xUnit report for integration with
>> Jenkins/TeamCity.
>>>>>      > Score: Tiden: 3, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Provisioning and deployment
>>>>>      > Ducktape: can provision subset of hosts from cluster for test
>>>>>     needs. However, that means, that test can’t be scaled without test
>>>>>     code changes. Does not do any deploy, relies on external means,
>> e.g.
>>>>>     pre-packaged in docker image, as in PoC.
>>>>>      > Tiden: Given a set of hosts, Tiden uses all of them for the
>> test.
>>>>>     Provisioning should be done by external means. However, provides a
>>>>>     conventional automated deployment routines.
>>>>>      > Score: Tiden: 1, Ducktape: 1
>>>>>      >
>>>>>      > Criteria: Documentation and Extensibility
>>>>>      > Tiden: current API documentation is limited, should change as we
>>>>>     go open source. Tiden is easily extensible via hooks and plugins,
>>>>>     see example Maven plugin and Gatling application at [11].
>>>>>      > Ducktape: basic documentation at readthedocs.io
>>>>>     <http://readthedocs.io>. Codebase is rigid, framework core is
>>>>>     tightly coupled and hard to change. The only possible extension
>>>>>     mechanism is fork-and-rewrite.
>>>>>      > Score: Tiden: 2, Ducktape: 1
>>>>>      >
>>>>>      > I can continue more on this, but it should be enough for now:
>>>>>      > Overall score: Tiden: 22, Ducktape: 14.
>>>>>      >
>>>>>      > Time for discussion!
>>>>>      >
>>>>>      > ---
>>>>>      > [1] - https://www.testcontainers.org/
>>>>>      > [2] - http://arquillian.org/guides/getting_started/
>>>>>      > [3] - https://jmeter.apache.org/index.html
>>>>>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>>>>      > [5] - https://gatling.io/docs/current/
>>>>>      > [6] - https://github.com/gridgain/yardstick
>>>>>      > [7] - https://github.com/gridgain/poc-tester
>>>>>      > [8] -
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
>>>>>      > [9] - https://github.com/gridgain/tiden
>>>>>      > [10] - https://pypi.org/project/jenkins-job-builder/
>>>>>      > [11] - https://github.com/mshonichev/tiden_examples
>>>>>      >
>>>>>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>>>>      >> Hello,
>>>>>      >>
>>>>>      >> Branch with duck tape created -
>>>>>     https://github.com/apache/ignite/tree/ignite-ducktape
>>>>>      >>
>>>>>      >> Any who are willing to contribute to PoC are welcome.
>>>>>      >>
>>>>>      >>
>>>>>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
>>>>>     <nizhikov.dev@gmail.com <ma...@gmail.com>>
>> написал(а):
>>>>>      >>>
>>>>>      >>> Hello, Denis.
>>>>>      >>>
>>>>>      >>> There is no rush with these improvements.
>>>>>      >>> We can wait for Maxim proposal and compare two solutions :)
>>>>>      >>>
>>>>>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
>>>>>     <ma...@apache.org>> написал(а):
>>>>>      >>>>
>>>>>      >>>> Hi Nikolay,
>>>>>      >>>>
>>>>>      >>>> Thanks for kicking off this conversation and sharing your
>>>>>     findings with the
>>>>>      >>>> results. That's the right initiative. I do agree that Ignite
>>>>>     needs to have
>>>>>      >>>> an integration testing framework with capabilities listed by
>> you.
>>>>>      >>>>
>>>>>      >>>> As we discussed privately, I would only check if instead of
>>>>>      >>>> Confluent's Ducktape library, we can use an integration
>>>>>     testing framework
>>>>>      >>>> developed by GridGain for testing of Ignite/GridGain
>> clusters.
>>>>>     That
>>>>>      >>>> framework has been battle-tested and might be more
>> convenient for
>>>>>      >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>>>>      >>>> <mshonichev@gridgain.com <ma...@gridgain.com>>
>> who
>>>>>     promised to join this thread once he finishes
>>>>>      >>>> preparing the usage examples of the framework. To my
>>>>>     knowledge, Max has
>>>>>      >>>> already been working on that for several days.
>>>>>      >>>>
>>>>>      >>>> -
>>>>>      >>>> Denis
>>>>>      >>>>
>>>>>      >>>>
>>>>>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>>>>     <nizhikov@apache.org <ma...@apache.org>>
>>>>>      >>>> wrote:
>>>>>      >>>>
>>>>>      >>>>> Hello, Igniters.
>>>>>      >>>>>
>>>>>      >>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>>      >>>>>
>>>>>      >>>>> Let me briefly explain the gap I want to cover:
>>>>>      >>>>>
>>>>>      >>>>> 1. For now, we don’t have a solution for automated testing
>> of
>>>>>     Ignite on
>>>>>      >>>>> «real cluster».
>>>>>      >>>>> By «real cluster» I mean cluster «like a production»:
>>>>>      >>>>>       * client and server nodes deployed on different hosts.
>>>>>      >>>>>       * thin clients perform queries from some other hosts
>>>>>      >>>>>       * etc.
>>>>>      >>>>>
>>>>>      >>>>> 2. We don’t have a solution for automated benchmarks of some
>>>>>     internal
>>>>>      >>>>> Ignite process
>>>>>      >>>>>       * PME
>>>>>      >>>>>       * rebalance.
>>>>>      >>>>> This means we don’t know - Do we perform rebalance(or PME)
>> in
>>>>>     2.7.0 faster
>>>>>      >>>>> or slower than in 2.8.0 for the same cluster?
>>>>>      >>>>>
>>>>>      >>>>> 3. We don’t have a solution for automated testing of Ignite
>>>>>     integration in
>>>>>      >>>>> a real-world environment:
>>>>>      >>>>> Ignite-Spark integration can be taken as an example.
>>>>>      >>>>> I think some ML solutions also should be tested in
>> real-world
>>>>>     deployments.
>>>>>      >>>>>
>>>>>      >>>>> Solution:
>>>>>      >>>>>
>>>>>      >>>>> I propose to use duck tape library from confluent (apache
>> 2.0
>>>>>     license)
>>>>>      >>>>> I tested it both on the real cluster(Yandex Cloud) and on
>> the
>>>>>     local
>>>>>      >>>>> environment(docker) and it works just fine.
>>>>>      >>>>>
>>>>>      >>>>> PoC contains following services:
>>>>>      >>>>>
>>>>>      >>>>>       * Simple rebalance test:
>>>>>      >>>>>               Start 2 server nodes,
>>>>>      >>>>>               Create some data with Ignite client,
>>>>>      >>>>>               Start one more server node,
>>>>>      >>>>>               Wait for rebalance finish
>>>>>      >>>>>       * Simple Ignite-Spark integration test:
>>>>>      >>>>>               Start 1 Spark master, start 1 Spark worker,
>>>>>      >>>>>               Start 1 Ignite server node
>>>>>      >>>>>               Create some data with Ignite client,
>>>>>      >>>>>               Check data in application that queries it from
>>>>>     Spark.
>>>>>      >>>>>
>>>>>      >>>>> All tests are fully automated.
>>>>>      >>>>> Logs collection works just fine.
>>>>>      >>>>> You can see an example of the tests report - [4].
>>>>>      >>>>>
>>>>>      >>>>> Pros:
>>>>>      >>>>>
>>>>>      >>>>> * Ability to test local changes(no need to public changes to
>>>>>     some remote
>>>>>      >>>>> repository or similar).
>>>>>      >>>>> * Ability to parametrize test environment(run the same tests
>>>>>     on different
>>>>>      >>>>> JDK, JVM params, config, etc.)
>>>>>      >>>>> * Isolation by default so system tests are as reliable as
>>>>>     possible.
>>>>>      >>>>> * Utilities for pulling up and tearing down services easily
>>>>>     in clusters in
>>>>>      >>>>> different environments (e.g. local, custom cluster, Vagrant,
>>>>>     K8s, Mesos,
>>>>>      >>>>> Docker, cloud providers, etc.)
>>>>>      >>>>> * Easy to write unit tests for distributed systems
>>>>>      >>>>> * Adopted and successfully used by other distributed open
>>>>>     source project -
>>>>>      >>>>> Apache Kafka.
>>>>>      >>>>> * Collect results (e.g. logs, console output)
>>>>>      >>>>> * Report results (e.g. expected conditions met, performance
>>>>>     results, etc.)
>>>>>      >>>>>
>>>>>      >>>>> WDYT?
>>>>>      >>>>>
>>>>>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>>      >>>>> [2] https://github.com/confluentinc/ducktape
>>>>>      >>>>> [3]
>> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>>>
>>
>>
>>
> 

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Denis Magda <dm...@apache.org>.
Folks,

Please share the summary of that Slack conversation here for records once
you find common ground.

-
Denis


On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <ni...@apache.org> wrote:

> Igniters.
>
> All who are interested in integration testing framework discussion are
> welcome into slack channel -
> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
>
>
>
> > 2 июля 2020 г., в 13:06, Anton Vinogradov <av...@apache.org> написал(а):
> >
> > Max,
> > Thanks for joining us.
> >
> > > 1. tiden can deploy artifacts by itself, while ducktape relies on
> > > dependencies being deployed by external scripts.
> > No. It is important to distinguish development, deploy, and
> orchestration.
> > All-in-one solutions have extremely limited usability.
> > As to Ducktests:
> > Docker is responsible for deployments during development.
> > CI/CD is responsible for deployments during release and nightly checks.
> It's up to the team to choose AWS, VM, BareMetal, and even OS.
> > Ducktape is responsible for orchestration.
> >
> > > 2. tiden can execute actions over remote nodes in real parallel
> fashion,
> > >while ducktape internally does all actions sequentially.
> > No. Ducktape may start any service in parallel. See Pme-free benchmark
> [1] for details.
> >
> > > if we used ducktape solution we would have to instead prepare some
> > > deployment scripts to pre-initialize Sberbank hosts, for example, with
> > > Ansible or Chef.
> > Sure, because a way of deploy depends on infrastructure.
> > How can we be sure that OS we use and the restrictions we have will be
> compatible with Tiden?
> >
> > > You have solved this deficiency with docker by putting all dependencies
> > > into one uber-image ...
> > and
> > > I guess we all know about docker hyped ability to run over distributed
> > >virtual networks.
> > It is very important not to confuse the test's development (docker image
> you're talking about) and real deployment.
> >
> > > If we had stopped and started 5 nodes one-by-one, as ducktape does
> > All actions can be performed in parallel.
> > See how Ducktests [2] starts cluster in parallel for example.
> >
> > [1]
> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> > [2]
> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
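> > To make the point concrete, the idea is simply that the test code drives
> > the parallelism. A minimal sketch (plain concurrent.futures, hypothetical
> > node objects, not the exact helper from the PR):
> >
> > from concurrent.futures import ThreadPoolExecutor
> >
> > def in_parallel(action, nodes):
> >     # Apply the same action (node.start, node.stop, ...) to all nodes at
> >     # the same time instead of one-by-one.
> >     with ThreadPoolExecutor(max_workers=len(nodes) or 1) as pool:
> >         futures = [pool.submit(action, node) for node in nodes]
> >         for f in futures:
> >             f.result()  # re-raise any failure from a worker thread
> >
> > # usage inside a test, assuming `nodes` is a list of node handles:
> > # in_parallel(lambda n: n.stop(), nodes[:5])   # stop 5 nodes at once
> > # in_parallel(lambda n: n.start(), nodes[:5])  # start them back together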
> >
> > On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org>
> wrote:
> > Hello, Maxim.
> >
> > > 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts
> >
> > Why do you think that maintaining deploy scripts coupled with the
> testing framework is an advantage?
> > I thought we want to see and maintain deployment scripts separate from
> the testing framework.
> >
> > > 2. tiden can execute actions over remote nodes in real parallel
> fashion, while ducktape internally does all actions sequentially.
> >
> > Can you, please, clarify, what actions do you have in mind?
> > And why we want to execute them concurrently?
> > Ignite node start, Client application execution can be done concurrently
> with the ducktape approach.
> >
> > > If we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef
> >
> > We shouldn’t take some user approach as an argument in this discussion.
> Let’s discuss a general approach for all users of the Ignite. Anyway, what
> is wrong with the external deployment script approach?
> >
> > We, as a community, should provide several ways to run integration tests
> out-of-the-box AND the ability to customize deployment regarding the user
> landscape.
> >
> > > You have solved this deficiency with docker by putting all
> dependencies into one uber-image and that looks like simple and elegant
> solution however, that effectively limits you to single-host testing.
> >
> > Docker image should be used only by the Ignite developers to test
> something locally.
> > It’s not intended for some real-world testing.
> >
> > The main issue with Tiden that I see is that it has been tested and maintained as a
> closed-source solution.
> > This can lead to hard-to-solve problems when we start using and
> maintaining it as an open-source solution.
> > Like, how many developers have used Tiden? And how many of those developers were
> not authors of Tiden itself?
> >
> >
> > > 2 июля 2020 г., в 12:30, Max Shonichev <ms...@yandex.ru>
> написал(а):
> > >
> > > Anton, Nikolay,
> > >
> > > Let's agree on what we are arguing about: whether it is about "like or
> don't like" or about technical properties of suggested solutions.
> > >
> > > If it is about likes and dislikes, then the whole discussion is
> meaningless. However, I hope together we can analyse pros and cons
> carefully.
> > >
> > > As far as I can understand now, two main differences between ducktape
> and tiden are that:
> > >
> > > 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts.
> > >
> > > 2. tiden can execute actions over remote nodes in real parallel
> fashion, while ducktape internally does all actions sequentially.
> > >
> > > As for me, these are very important properties for distributed testing
> framework.
> > >
> > > The first property lets us easily reuse tiden in existing infrastructures,
> for example, during Zookeeper IEP testing at Sberbank site we used the same
> tiden scripts that we use in our lab, the only change was putting a list of
> hosts into config.
> > >
> > > If we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef.
> > >
> > >
> > > You have solved this deficiency with docker by putting all
> dependencies into one uber-image and that looks like simple and elegant
> solution,
> > > however, that effectively limits you to single-host testing.
> > >
> > > I guess we all know about docker hyped ability to run over distributed
> virtual networks. We used to go that way, but quickly found that it is more
> of the hype than real work. In real environments, there are problems with
> routing, DNS, multicast and broadcast traffic, and many others, that turn
> docker-based distributed solution into a fragile hard-to-maintain monster.
> > >
> > > Please, if you believe otherwise, perform a run of your PoC over at
> least two physical hosts and share results with us.
> > >
> > > If you consider that one physical docker host is enough, please, don't
> overlook that we want to run real scale scenarios, with 50-100 cache
> groups, persistence enabled and millions of keys loaded.
> > >
> > > Practical limit for such configurations is 4-6 nodes per single
> physical host. Otherwise, tests become flaky due to resource starvation.
> > >
> > > Please, if you believe otherwise, perform at least 10 runs of
> your PoC with other tests running at TC (we're targeting TeamCity, right?)
> and share results so we could check if the numbers are reproducible.
> > >
> > > I stress this once more: functional integration tests are OK to run in
> Docker and CI, but running benchmarks in Docker is a big NO GO.
> > >
> > >
> > > The second property lets us write tests that require real-parallel actions
> over hosts.
> > >
> > > For example, the agreed scenario for the PME benchmark during the "PME optimization
> stream" was as follows:
> > >
> > >  - 10 server nodes, preloaded with 1M of keys
> > >  - 4 client nodes perform transactional load  (client nodes physically
> separated from server nodes)
> > >  - during load:
> > >  -- 5 server nodes stopped in parallel
> > >  -- after 1 minute, all 5 nodes are started in parallel
> > >  - load stopped, logs are analysed for exchange times.
> > >
> > > If we had stopped and started 5 nodes one-by-one, as ducktape does,
> then partition map exchange merge would not happen and we could not have
> measured PME optimizations for that case.
> > >
> > >
> > > These are limitations of ducktape that we believe as a more important
> > > argument "against" than you provide "for".
> > >
> > >
> > >
> > >
> > > On 30.06.2020 14:58, Anton Vinogradov wrote:
> > >> Folks,
> > >> First, I've created PR [1] with ducktests improvements
> > >> PR contains the following changes
> > >> - Pme-free switch proof-benchmark (2.7.6 vs master)
> > >> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
> > >> - Global refactoring
> > >> -- benchmarks javacode simplification
> > >> -- services python and java classes code deduplication
> > >> -- fail-fast checks for java and python (eg. application should
> explicitly write it finished with success)
> > >> -- simple results extraction from tests and benchmarks
> > >> -- javacode now configurable from tests/benchmarks
> > >> -- proper SIGTERM handling at javacode (eg. it may finish last
> operation and log results)
> > >> -- docker volume now marked as delegated to increase execution speed
> for mac & win users
> > >> -- Ignite cluster now start in parallel (start speed-up)
> > >> -- Ignite can be configured at test/benchmark
> > >> - full and module assembly scripts added
> > > Great job done! But let me remind you of one of the Apache Ignite principles:
> > > a week of thinking saves months of development.
> > >
> > >
> > >> Second, I'd like to propose to accept ducktests [2] (ducktape
> integration) as a target "PoC check & real topology benchmarking tool".
> > >> Ducktape pros
> > >> - Developed for distributed system by distributed system developers.
> > > So does Tiden
> > >
> > >> - Developed since 2014, stable.
> > > Tiden is also pretty stable, and development start date is not a good
> argument, for example pytest is since 2004, pytest-xdist (plugin for
> distributed testing) is since 2010, but we don't see it as an alternative at
> all.
> > >
> > >> - Proven usability by usage at Kafka.
> > > Tiden is proven usable by usage at GridGain and Sberbank deployments.
> > > Core, storage, sql and tx teams use benchmark results provided by
> Tiden on a daily basis.
> > >
> > >> - Dozens of dozens tests and benchmarks at Kafka as a great example
> pack.
> > > We'll donate some of our suites to Ignite as I've mentioned in
> previous letter.
> > >
> > >> - Built-in Docker support for rapid development and checks.
> > > False, there's no specific 'docker support' in ducktape itself, you
> just wrap it in docker by yourself, because ducktape is lacking deployment
> abilities.
> > >
> > >> - Great for CI automation.
> > > False, there are no specific CI-enabled features in ducktape. Tiden, on
> the other hand, provides a generic xUnit reporting format, which is supported
> by both TeamCity and Jenkins. Also, instead of using private keys, Tiden
> can use SSH agent, which is also great for CI, because both
> > > TeamCity and Jenkins store keys in secret storage available only for
> ssh-agent and only for the time of the test.
> > >
> > >
> > >> > As an additional motivation, at least 3 teams
> > >> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix
> speed-up))
> > >> - Ignite SE Plugins team (to check plugin's features does not
> slow-down or broke AI features)
> > >> - Ignite SE QA team (to append already developed smoke/load/failover
> tests to AI codebase)
> > >
> > > Please, before recommending your tests to other teams, provide proofs
> > > that your tests are reproducible in real environment.
> > >
> > >
> > >> now, wait for ducktest merge to start checking cases they working on
> in AI way.
> > >> Thoughts?
> > > Let us together review both solutions, we'll try to run your tests in
> our lab, and you'll try to at least checkout tiden and see if same tests
> can be implemented with it?
> > >
> > >
> > >
> > >> [1] https://github.com/apache/ignite/pull/7967
> > >> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> > >> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org
> <ma...@apache.org>> wrote:
> > >>    Hello, Maxim.
> > >>    Thank you for so detailed explanation.
> > >>    Can we put the content of this discussion somewhere on the wiki?
> > >>    So It doesn’t get lost.
> > >>    I divide the answer in several parts. From the requirements to the
> > >>    implementation.
> > >>    So, if we agreed on the requirements we can proceed with the
> > >>    discussion of the implementation.
> > >>    1. Requirements:
> > >>    The main goal I want to achieve is *reproducibility* of the tests.
> > >>    I’m sick and tired of the zillions of flaky, rarely failing, and
> > >>    almost never failing tests in the Ignite codebase.
> > >>    We should start with the simplest scenarios that will be as
> reliable
> > >>    as steel :)
> > >>    I want to know for sure:
> > >>       - Does this PR make rebalance quicker or not?
> > >>       - Does this PR make PME quicker or not?
> > >>    So, your description of the complex test scenario looks as a next
> > >>    step to me.
> > >>    Anyway, It’s cool we already have one.
> > >>    The second goal is to have a strict test lifecycle as we have in
> > >>    JUnit and similar frameworks.
> > >>     > It covers production-like deployment and running a scenarios
> over
> > >>    a single database instance.
> > >>    Do you mean «single cluster» or «single host»?
> > >>    2. Existing tests:
> > >>     > A Combinator suite allows to run set of operations concurrently
> > >>    over given database instance.
> > >>     > A Consumption suite allows to run a set production-like actions
> > >>    over given set of Ignite/GridGain versions and compare test metrics
> > >>    across versions
> > >>     > A Yardstick suite
> > >>     > A Stress suite that simulates hardware environment degradation
> > >>     > An Ultimate, DR and Compatibility suites that performs
> functional
> > >>    regression testing
> > >>     > Regression
> > >>    Great news that we already have so many choices for testing!
> > >>    Mature test base is a big +1 for Tiden.
> > >>    3. Comparison:
> > >>     > Criteria: Test configuration
> > >>     > Ducktape: single JSON string for all tests
> > >>     > Tiden: any number of YaML config files, command line option for
> > >>    fine-grained test configuration, ability to select/modify tests
> > >>    behavior based on Ignite version.
> > >>    1. Many YAML files can be hard to maintain.
> > >>    2. In ducktape, you can set parameters via the «--parameters» option.
> > >>    Please, take a look at the doc [1]
> > >>     > Criteria: Cluster control
> > >>     > Tiden: additionally can address cluster as a whole and execute
> > >>    remote commands in parallel.
> > >>    It seems we implement this ability in the PoC, already.
> > >>     > Criteria: Test assertions
> > >>     > Tiden: simple asserts, also few customized assertion helpers.
> > >>     > Ducktape: simple asserts.
> > >>    Can you, please, be more specific.
> > >>    What helpers do you have in mind?
> > >>    Ducktape has asserts that wait for logfile messages or for some
> > >>    process to finish.
> > >>     > Criteria: Test reporting
> > >>     > Ducktape: limited to its own text/HTML format
> > >>    Ducktape has
> > >>    1. Text reporter
> > >>    2. Customizable HTML reporter
> > >>    3. JSON reporter.
> > >>    We can show JSON with the any template or tool.
> > >>     > Criteria: Provisioning and deployment
> > >>     > Ducktape: can provision subset of hosts from cluster for test
> > >>    needs. However, that means, that test can’t be scaled without test
> > >>    code changes. Does not do any deploy, relies on external means,
> e.g.
> > >>    pre-packaged in docker image, as in PoC.
> > >>    This is not true.
> > >>    1. We can set explicit test parameters(node number) via parameters.
> > >>    We can increase client count or cluster size without test code
> changes.
> > >>    2. We have many choices for the test environment. These choices are
> > >>    tested and used in other projects:
> > >>             * docker
> > >>             * vagrant
> > >>             * private cloud(ssh access)
> > >>             * ec2
> > >>    Please, take a look at Kafka documentation [2]
> > >>     > I can continue more on this, but it should be enough for now:
> > >>    We need to go deeper! :)
> > >>    [1]
> > >>
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> > >>    [2]
> https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> > >>     > 9 июня 2020 г., в 17:25, Max A. Shonichev <mshonich@yandex.ru
> > >>    <ma...@yandex.ru>> написал(а):
> > >>     >
> > >>     > Greetings, Nikolay,
> > >>     >
> > >>     > First of all, thank you for your great effort preparing a PoC of
> > >>    integration testing for the Ignite community.
> > >>     >
> > >>     > It’s a shame Ignite did not have at least some such tests yet,
> > >>    however, GridGain, as a major contributor to Apache Ignite had a
> > >>    profound collection of in-house tools to perform integration and
> > >>    performance testing for years already and while we slowly consider
> > >>    sharing our expertise with the community, your initiative makes us
> > >>    drive that process a bit faster, thanks a lot!
> > >>     >
> > >>     > I reviewed your PoC and want to share a little about what we do
> > >>    on our part, why and how, hope it would help community take proper
> > >>    course.
> > >>     >
> > >>     > First I’ll do a brief overview of what decisions we made and
> what
> > >>    we do have in our private code base, next I’ll describe what we
> have
> > >>    already donated to the public and what we plan public next, then
> > >>    I’ll compare both approaches highlighting deficiencies in order to
> > >>    spur public discussion on the matter.
> > >>     >
> > >>     > It might seem strange to use Python to run Bash to run Java
> > >>    applications because that introduces the IT industry’s ‘best of breed’ –
> > >>    the Python dependency hell – to the Java application code base. The
> > >>    only stranger decision one could make is to use Maven to run Docker
> > >>    to run Bash to run Python to run Bash to run Java, but desperate
> > >>    times call for desperate measures I guess.
> > >>     >
> > >>     > There are Java-based solutions for integration testing,
> > >>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go
> well
> > >>    for Ignite community CI pipelines by them selves. But we also
> wanted
> > >>    to run performance tests and benchmarks, like the dreaded PME
> > >>    benchmark, and this is solved by totally different set of tools in
> > >>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> > >>     >
> > >>     > Speaking specifically about benchmarking, Apache Ignite
> community
> > >>    already has Yardstick [6], and there’s nothing wrong with writing
> > >>    PME benchmark using Yardstick, but we also wanted to be able to run
> > >>    scenarios like this:
> > >>     > - put an X load to a Ignite database;
> > >>     > - perform an Y set of operations to check how Ignite copes with
> > >>    operations under load.
> > >>     >
> > >>     > And yes, we also wanted applications under test be deployed
> ‘like
> > >>    in a production’, e.g. distributed over a set of hosts. This arises
> > >>    questions about provisioning and nodes affinity which I’ll cover in
> > >>    detail later.
> > >>     >
> > >>     > So we decided to put a little effort to build a simple tool to
> > >>    cover different integration and performance scenarios, and our QA
> > >>    lab first attempt was PoC-Tester [7], currently open source for all
> > >>    but for reporting web UI. It’s a quite simple to use 95% Java-based
> > >>    tool targeted to be run on a pre-release QA stage.
> > >>     >
> > >>     > It covers production-like deployment and running a scenarios
> over
> > >>    a single database instance. PoC-Tester scenarios consists of a
> > >>    sequence of tasks running sequentially or in parallel. After all
> > >>    tasks complete, or at any time during test, user can run logs
> > >>    collection task, logs are checked against exceptions and a summary
> > >>    of found issues and task ops/latency statistics is generated at the
> > >>    end of scenario. One of the main PoC-Tester features is its
> > >>    fire-and-forget approach to task managing. That is, you can deploy
> a
> > >>    grid and left it running for weeks, periodically firing some tasks
> > >>    onto it.
> > >>     >
> > >>     > During earliest stages of PoC-Tester development it becomes
> quite
> > >>    clear that Java application development is a tedious process and
> > >>    architecture decisions you take during development are slow and
> hard
> > >>    to change.
> > >>     > For example, scenarios like this
> > >>     > - deploy two instances of GridGain with master-slave data
> > >>    replication configured;
> > >>     > - put a load on master;
> > >>     > - perform checks on slave,
> > >>     > or like this:
> > >>     > - preload a 1Tb of data by using your favorite tool of choice to
> > >>    an Apache Ignite of version X;
> > >>     > - run a set of functional tests running Apache Ignite version Y
> > >>    over preloaded data,
> > >>     > do not fit well in the PoC-Tester workflow.
> > >>     >
> > >>     > So, this is why we decided to use Python as a generic scripting
> > >>    language of choice.
> > >>     >
> > >>     > Pros:
> > >>     > - quicker prototyping and development cycles
> > >>     > - easier to find DevOps/QA engineer with Python skills than one
> > >>    with Java skills
> > >>     > - used extensively all over the world for DevOps/CI pipelines
> and
> > >>    thus has rich set of libraries for all possible integration uses
> cases.
> > >>     >
> > >>     > Cons:
> > >>     > - Nightmare with dependencies. Better stick to specific
> > >>    language/libraries version.
> > >>     >
> > >>     > Comparing alternatives for Python-based testing framework we
> have
> > >>    considered following requirements, somewhat similar to what you’ve
> > >>    mentioned for Confluent [8] previously:
> > >>     > - should be able run locally or distributed (bare metal or in
> the
> > >>    cloud)
> > >>     > - should have built-in deployment facilities for applications
> > >>    under test
> > >>     > - should separate test configuration and test code
> > >>     > -- be able to easily reconfigure tests by simple configuration
> > >>    changes
> > >>     > -- be able to easily scale test environment by simple
> > >>    configuration changes
> > >>     > -- be able to perform regression testing by simple switching
> > >>    artifacts under test via configuration
> > >>     > -- be able to run tests with different JDK version by simple
> > >>    configuration changes
> > >>     > - should have human readable reports and/or reporting tools
> > >>    integration
> > >>     > - should allow simple test progress monitoring, one does not
> want
> > >>    to run 6-hours test to find out that application actually crashed
> > >>    during first hour.
> > >>     > - should allow parallel execution of test actions
> > >>     > - should have clean API for test writers
> > >>     > -- clean API for distributed remote commands execution
> > >>     > -- clean API for deployed applications start / stop and other
> > >>    operations
> > >>     > -- clean API for performing check on results
> > >>     > - should be open source or at least source code should allow
> ease
> > >>    change or extension
> > >>     >
> > >>     > Back at that time we found no better alternative than to write
> > >>    our own framework, and here goes Tiden [9] as GridGain framework of
> > >>    choice for functional integration and performance testing.
> > >>     >
> > >>     > Pros:
> > >>     > - solves all the requirements above
> > >>     > Cons (for Ignite):
> > >>     > - (currently) closed GridGain source
> > >>     >
> > >>     > On top of Tiden we’ve built a set of test suites, some of which
> > >>    you might have heard already.
> > >>     >
> > >>     > A Combinator suite allows to run set of operations concurrently
> > >>    over given database instance. Proven to find at least 30+ race
> > >>    conditions and NPE issues.
> > >>     >
> > >>     > A Consumption suite allows to run a set production-like actions
> > >>    over given set of Ignite/GridGain versions and compare test metrics
> > >>    across versions, like heap/disk/CPU consumption, time to perform
> > >>    actions, like client PME, server PME, rebalancing time, data
> > >>    replication time, etc.
> > >>     >
> > >>     > A Yardstick suite is a thin layer of Python glue code to run
> > >>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
> > >>    mediocre deployment capabilities, Tiden solves this easily.
> > >>     >
> > >>     > A Stress suite that simulates hardware environment degradation
> > >>    during testing.
> > >>     >
> > >>     > An Ultimate, DR and Compatibility suites that performs
> functional
> > >>    regression testing of GridGain Ultimate Edition features like
> > >>    snapshots, security, data replication, rolling upgrades, etc.
> > >>     >
> > >>     > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
> > >>    etc, etc, etc.
> > >>     >
> > >>     > Most of the suites above use another in-house developed Java
> tool
> > >>    – PiClient – to perform actual loading and miscellaneous operations
> > >>    with Ignite under test. We use py4j Python-Java gateway library to
> > >>    control PiClient instances from the tests.
> > >>     >
> > >>     > When we considered CI, we put TeamCity out of scope, because
> > >>    distributed integration and performance tests tend to run for hours
> > >>    and TeamCity agents are scarce and costly resource. So, bundled
> with
> > >>    Tiden there is jenkins-job-builder [10] based CI pipelines and
> > >>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates
> test
> > >>    run reports across versions and has built in visualization support
> > >>    for Combinator suite.
> > >>     >
> > >>     > All of the above is currently closed source, but we plan to make
> > >>    it public for community, and publishing Tiden core [9] is the first
> > >>    step on that way. You can review some examples of using Tiden for
> > >>    tests at my repository [11], for start.
> > >>     >
> > >>     > Now, let’s compare Ducktape PoC and Tiden.
> > >>     >
> > >>     > Criteria: Language
> > >>     > Tiden: Python, 3.7
> > >>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> > >>    compatible, but actually can’t work with Python 3.7 due to broken
> > >>    Zmq dependency.
> > >>     > Comment: Python 3.7 has a much better support for async-style
> > >>    code which might be crucial for distributed application testing.
> > >>     > Score: Tiden: 1, Ducktape: 0
> > >>     >
> > >>     > Criteria: Test writers API
> > >>     > Supported integration test framework concepts are basically the
> same:
> > >>     > - a test controller (test runner)
> > >>     > - a cluster
> > >>     > - a node
> > >>     > - an application (a service in Ducktape terms)
> > >>     > - a test
> > >>     > Score: Tiden: 5, Ducktape: 5
> > >>     >
> > >>     > Criteria: Tests selection and run
> > >>     > Ducktape: suite-package-class-method level selection, internal
> > >>    scheduler allows to run tests in suite in parallel.
> > >>     > Tiden: also suite-package-class-method level selection,
> > >>    additionally allows selecting subset of tests by attribute,
> parallel
> > >>    runs not built in, but allows merging test reports after different
> runs.
> > >>     > Score: Tiden: 2, Ducktape: 2
> > >>     >
> > >>     > Criteria: Test configuration
> > >>     > Ducktape: single JSON string for all tests
> > >>     > Tiden: any number of YaML config files, command line option for
> > >>    fine-grained test configuration, ability to select/modify tests
> > >>    behavior based on Ignite version.
> > >>     > Score: Tiden: 3, Ducktape: 1
> > >>     >
> > >>     > Criteria: Cluster control
> > >>     > Ducktape: allow execute remote commands by node granularity
> > >>     > Tiden: additionally can address cluster as a whole and execute
> > >>    remote commands in parallel.
> > >>     > Score: Tiden: 2, Ducktape: 1
> > >>     >
> > >>     > Criteria: Logs control
> > >>     > Both frameworks have similar builtin support for remote logs
> > >>    collection and grepping. Tiden has built-in plugin that can zip,
> > >>    collect arbitrary log files from arbitrary locations at
> > >>    test/module/suite granularity and unzip if needed, also application
> > >>    API to search / wait for messages in logs. Ducktape allows each
> > >>    service declare its log files location (seemingly does not support
> > >>    logs rollback), and a single entrypoint to collect service logs.
> > >>     > Score: Tiden: 1, Ducktape: 1
> > >>     >
> > >>     > Criteria: Test assertions
> > >>     > Tiden: simple asserts, also few customized assertion helpers.
> > >>     > Ducktape: simple asserts.
> > >>     > Score: Tiden: 2, Ducktape: 1
> > >>     >
> > >>     > Criteria: Test reporting
> > >>     > Ducktape: limited to its own text/html format
> > >>     > Tiden: provides text report, yaml report for reporting tools
> > >>    integration, XML xUnit report for integration with
> Jenkins/TeamCity.
> > >>     > Score: Tiden: 3, Ducktape: 1
> > >>     >
> > >>     > Criteria: Provisioning and deployment
> > >>     > Ducktape: can provision subset of hosts from cluster for test
> > >>    needs. However, that means, that test can’t be scaled without test
> > >>    code changes. Does not do any deploy, relies on external means,
> e.g.
> > >>    pre-packaged in docker image, as in PoC.
> > >>     > Tiden: Given a set of hosts, Tiden uses all of them for the
> test.
> > >>    Provisioning should be done by external means. However, provides a
> > >>    conventional automated deployment routines.
> > >>     > Score: Tiden: 1, Ducktape: 1
> > >>     >
> > >>     > Criteria: Documentation and Extensibility
> > >>     > Tiden: current API documentation is limited, should change as we
> > >>    go open source. Tiden is easily extensible via hooks and plugins,
> > >>    see example Maven plugin and Gatling application at [11].
> > >>     > Ducktape: basic documentation at readthedocs.io
> > >>    <http://readthedocs.io>. Codebase is rigid, framework core is
> > >>    tightly coupled and hard to change. The only possible extension
> > >>    mechanism is fork-and-rewrite.
> > >>     > Score: Tiden: 2, Ducktape: 1
> > >>     >
> > >>     > I can continue more on this, but it should be enough for now:
> > >>     > Overall score: Tiden: 22, Ducktape: 14.
> > >>     >
> > >>     > Time for discussion!
> > >>     >
> > >>     > ---
> > >>     > [1] - https://www.testcontainers.org/
> > >>     > [2] - http://arquillian.org/guides/getting_started/
> > >>     > [3] - https://jmeter.apache.org/index.html
> > >>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> > >>     > [5] - https://gatling.io/docs/current/
> > >>     > [6] - https://github.com/gridgain/yardstick
> > >>     > [7] - https://github.com/gridgain/poc-tester
> > >>     > [8] -
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> > >>     > [9] - https://github.com/gridgain/tiden
> > >>     > [10] - https://pypi.org/project/jenkins-job-builder/
> > >>     > [11] - https://github.com/mshonichev/tiden_examples
> > >>     >
> > >>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> > >>     >> Hello,
> > >>     >>
> > >>     >> Branch with duck tape created -
> > >>    https://github.com/apache/ignite/tree/ignite-ducktape
> > >>     >>
> > >>     >> Any who are willing to contribute to PoC are welcome.
> > >>     >>
> > >>     >>
> > >>     >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
> > >>    <nizhikov.dev@gmail.com <ma...@gmail.com>>
> написал(а):
> > >>     >>>
> > >>     >>> Hello, Denis.
> > >>     >>>
> > >>     >>> There is no rush with these improvements.
> > >>     >>> We can wait for Maxim proposal and compare two solutions :)
> > >>     >>>
> > >>     >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
> > >>    <ma...@apache.org>> написал(а):
> > >>     >>>>
> > >>     >>>> Hi Nikolay,
> > >>     >>>>
> > >>     >>>> Thanks for kicking off this conversation and sharing your
> > >>    findings with the
> > >>     >>>> results. That's the right initiative. I do agree that Ignite
> > >>    needs to have
> > >>     >>>> an integration testing framework with capabilities listed by
> you.
> > >>     >>>>
> > >>     >>>> As we discussed privately, I would only check if instead of
> > >>     >>>> Confluent's Ducktape library, we can use an integration
> > >>    testing framework
> > >>     >>>> developed by GridGain for testing of Ignite/GridGain
> clusters.
> > >>    That
> > >>     >>>> framework has been battle-tested and might be more
> convenient for
> > >>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> > >>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>>
> who
> > >>    promised to join this thread once he finishes
> > >>     >>>> preparing the usage examples of the framework. To my
> > >>    knowledge, Max has
> > >>     >>>> already been working on that for several days.
> > >>     >>>>
> > >>     >>>> -
> > >>     >>>> Denis
> > >>     >>>>
> > >>     >>>>
> > >>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> > >>    <nizhikov@apache.org <ma...@apache.org>>
> > >>     >>>> wrote:
> > >>     >>>>
> > >>     >>>>> Hello, Igniters.
> > >>     >>>>>
> > >>     >>>>> I created a PoC [1] for the integration tests of Ignite.
> > >>     >>>>>
> > >>     >>>>> Let me briefly explain the gap I want to cover:
> > >>     >>>>>
> > >>     >>>>> 1. For now, we don’t have a solution for automated testing
> of
> > >>    Ignite on
> > >>     >>>>> «real cluster».
> > >>     >>>>> By «real cluster» I mean cluster «like a production»:
> > >>     >>>>>       * client and server nodes deployed on different hosts.
> > >>     >>>>>       * thin clients perform queries from some other hosts
> > >>     >>>>>       * etc.
> > >>     >>>>>
> > >>     >>>>> 2. We don’t have a solution for automated benchmarks of some
> > >>    internal
> > >>     >>>>> Ignite process
> > >>     >>>>>       * PME
> > >>     >>>>>       * rebalance.
> > >>     >>>>> This means we don’t know - Do we perform rebalance(or PME)
> in
> > >>    2.7.0 faster
> > >>     >>>>> or slower than in 2.8.0 for the same cluster?
> > >>     >>>>>
> > >>     >>>>> 3. We don’t have a solution for automated testing of Ignite
> > >>    integration in
> > >>     >>>>> a real-world environment:
> > >>     >>>>> Ignite-Spark integration can be taken as an example.
> > >>     >>>>> I think some ML solutions also should be tested in
> real-world
> > >>    deployments.
> > >>     >>>>>
> > >>     >>>>> Solution:
> > >>     >>>>>
> > >>     >>>>> I propose to use duck tape library from confluent (apache
> 2.0
> > >>    license)
> > >>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on
> the
> > >>    local
> > >>     >>>>> environment(docker) and it works just fine.
> > >>     >>>>>
> > >>     >>>>> PoC contains following services:
> > >>     >>>>>
> > >>     >>>>>       * Simple rebalance test:
> > >>     >>>>>               Start 2 server nodes,
> > >>     >>>>>               Create some data with Ignite client,
> > >>     >>>>>               Start one more server node,
> > >>     >>>>>               Wait for rebalance finish
> > >>     >>>>>       * Simple Ignite-Spark integration test:
> > >>     >>>>>               Start 1 Spark master, start 1 Spark worker,
> > >>     >>>>>               Start 1 Ignite server node
> > >>     >>>>>               Create some data with Ignite client,
> > >>     >>>>>               Check data in application that queries it from
> > >>    Spark.
> > >>     >>>>>
> > >>     >>>>> All tests are fully automated.
> > >>     >>>>> Logs collection works just fine.
> > >>     >>>>> You can see an example of the tests report - [4].
> > >>     >>>>>
> > >>     >>>>> Pros:
> > >>     >>>>>
> > >>     >>>>> * Ability to test local changes(no need to public changes to
> > >>    some remote
> > >>     >>>>> repository or similar).
> > >>     >>>>> * Ability to parametrize test environment(run the same tests
> > >>    on different
> > >>     >>>>> JDK, JVM params, config, etc.)
> > >>     >>>>> * Isolation by default so system tests are as reliable as
> > >>    possible.
> > >>     >>>>> * Utilities for pulling up and tearing down services easily
> > >>    in clusters in
> > >>     >>>>> different environments (e.g. local, custom cluster, Vagrant,
> > >>    K8s, Mesos,
> > >>     >>>>> Docker, cloud providers, etc.)
> > >>     >>>>> * Easy to write unit tests for distributed systems
> > >>     >>>>> * Adopted and successfully used by other distributed open
> > >>    source project -
> > >>     >>>>> Apache Kafka.
> > >>     >>>>> * Collect results (e.g. logs, console output)
> > >>     >>>>> * Report results (e.g. expected conditions met, performance
> > >>    results, etc.)
> > >>     >>>>>
> > >>     >>>>> WDYT?
> > >>     >>>>>
> > >>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> > >>     >>>>> [2] https://github.com/confluentinc/ducktape
> > >>     >>>>> [3]
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> > >>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >
>
>
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Igniters.

All who are interested in the integration testing framework discussion are welcome to join the Slack channel - https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2



> On July 2, 2020, at 13:06, Anton Vinogradov <av...@apache.org> wrote:
> 
> Max,
> Thanks for joining us.
> 
> > 1. tiden can deploy artifacts by itself, while ducktape relies on
> > dependencies being deployed by external scripts.
> No. It is important to distinguish development, deployment, and orchestration. 
> All-in-one solutions have extremely limited usability.
> As to Ducktests:
> Docker is responsible for deployments during development.
> CI/CD is responsible for deployments during release and nightly checks. It's up to the team to choose AWS, VMs, bare metal, and even the OS.
> Ducktape is responsible for orchestration.
> 
> > 2. tiden can execute actions over remote nodes in real parallel fashion,
> >while ducktape internally does all actions sequentially.
> No. Ducktape can start any service in parallel. See the PME-free benchmark [1] for details.
> 
> > if we used ducktape solution we would have to instead prepare some
> > deployment scripts to pre-initialize Sberbank hosts, for example, with
> > Ansible or Chef.
> Sure, because the deployment approach depends on the infrastructure.
> How can we be sure that the OS we use and the restrictions we have will be compatible with Tiden?
> 
> > You have solved this deficiency with docker by putting all dependencies
> > into one uber-image ...
> and
> > I guess we all know about docker hyped ability to run over distributed
> >virtual networks.
> It is very important not to confuse test development (the Docker image you're talking about) with real deployment.
> 
> > If we had stopped and started 5 nodes one-by-one, as ducktape does
> All actions can be performed in parallel. 
> See, for example, how Ducktests [2] starts the cluster in parallel.
> 
> [1] https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> [2] https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> 
> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org> wrote:
> Hello, Maxim.
> 
> > 1. tiden can deploy artifacts by itself, while ducktape relies on dependencies being deployed by external scripts
> 
> Why do you think that maintaining deployment scripts coupled with the testing framework is an advantage?
> I thought we wanted to see and maintain deployment scripts separately from the testing framework.
> 
> > 2. tiden can execute actions over remote nodes in real parallel fashion, while ducktape internally does all actions sequentially.
> 
> Can you please clarify what actions you have in mind?
> And why would we want to execute them concurrently?
> Ignite node start and client application execution can be done concurrently with the ducktape approach.
> 
> > If we used ducktape solution we would have to instead prepare some deployment scripts to pre-initialize Sberbank hosts, for example, with Ansible or Chef
> 
> We shouldn’t take a single user’s approach as an argument in this discussion. Let’s discuss a general approach for all users of Ignite. Anyway, what is wrong with the external deployment script approach?
> 
> We, as a community, should provide several ways to run integration tests out-of-the-box AND the ability to customize deployment to match the user's landscape.
> 
> > You have solved this deficiency with docker by putting all dependencies into one uber-image and that looks like simple and elegant solution however, that effectively limits you to single-host testing.
> 
> The Docker image should be used only by Ignite developers to test something locally.
> It’s not intended for real-world testing.
> 
> The main issue I see with Tiden is that it has been tested and maintained as a closed-source solution.
> This can lead to hard-to-solve problems when we start using and maintaining it as an open-source solution.
> For example, how many developers have used Tiden? And how many of those developers were not authors of Tiden itself?
> 
> 
> > On July 2, 2020, at 12:30, Max Shonichev <ms...@yandex.ru> wrote:
> > 
> > Anton, Nikolay,
> > 
> > Let's agree on what we are arguing about: whether it is about "like or don't like" or about the technical properties of the suggested solutions.
> > 
> > If it is about likes and dislikes, then the whole discussion is meaningless. However, I hope together we can analyse pros and cons carefully.
> > 
> > As far as I can understand now, the two main differences between ducktape and tiden are that:
> > 
> > 1. tiden can deploy artifacts by itself, while ducktape relies on dependencies being deployed by external scripts.
> > 
> > 2. tiden can execute actions over remote nodes in real parallel fashion, while ducktape internally does all actions sequentially.
> > 
> > As for me, these are very important properties for a distributed testing framework.
> > 
> > The first property lets us easily reuse tiden in existing infrastructures. For example, during ZooKeeper IEP testing at the Sberbank site we used the same tiden scripts that we use in our lab; the only change was putting a list of hosts into the config.
> > 
> > If we used the ducktape solution, we would instead have to prepare some deployment scripts to pre-initialize the Sberbank hosts, for example with Ansible or Chef.
> > 
> > 
> > You have solved this deficiency with Docker by putting all dependencies into one uber-image, and that looks like a simple and elegant solution;
> > however, it effectively limits you to single-host testing.
> > 
> > I guess we all know about Docker's hyped ability to run over distributed virtual networks. We went that way at first, but quickly found that it is more hype than real work. In real environments, there are problems with routing, DNS, multicast and broadcast traffic, and many others that turn a Docker-based distributed solution into a fragile, hard-to-maintain monster.
> > 
> > Please, if you believe otherwise, run your PoC over at least two physical hosts and share the results with us.
> > 
> > If you consider one physical Docker host to be enough, please don't overlook that we want to run real-scale scenarios, with 50-100 cache groups, persistence enabled, and millions of keys loaded.
> > 
> > The practical limit for such configurations is 4-6 nodes per physical host. Otherwise, tests become flaky due to resource starvation.
> > 
> > Please, if you believe otherwise, perform at least 10 runs of your PoC with other tests running on TC (we're targeting TeamCity, right?) and share the results so we can check whether the numbers are reproducible.
> > 
> > I stress this once more: functional integration tests are OK to run in Docker and CI, but running benchmarks in Docker is a big NO GO.
> > 
> > 
> > The second property lets us write tests that require truly parallel actions over hosts.
> > 
> > For example, the agreed scenario for the PME benchmark during the "PME optimization stream" was as follows:
> > 
> >  - 10 server nodes, preloaded with 1M of keys
> >  - 4 client nodes perform transactional load  (client nodes physically separated from server nodes)
> >  - during load:
> >  -- 5 server nodes stopped in parallel
> >  -- after 1 minute, all 5 nodes are started in parallel
> >  - load stopped, logs are analysed for exchange times.
> > 
> > If we had stopped and started 5 nodes one-by-one, as ducktape does, then the partition map exchange merge would not happen and we could not have measured the PME optimizations for that case.
> > 
> > 
> > These are limitations of ducktape that we believe are a more important
> > argument "against" than the arguments you provide "for".
> > 
> > 
> > 
> > 
> > On 30.06.2020 14:58, Anton Vinogradov wrote:
> >> Folks,
> >> First, I've created PR [1] with ducktests improvements
> >> PR contains the following changes
> >> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
> >> - Global refactoring
> >> -- benchmarks javacode simplification
> >> -- services python and java classes code deduplication
> >> -- fail-fast checks for java and python (eg. application should explicitly write it finished with success)
> >> -- simple results extraction from tests and benchmarks
> >> -- javacode now configurable from tests/benchmarks
> >> -- proper SIGTERM handling at javacode (eg. it may finish last operation and log results)
> >> -- docker volume now marked as delegated to increase execution speed for mac & win users
> >> -- Ignite cluster now start in parallel (start speed-up)
> >> -- Ignite can be configured at test/benchmark
> >> - full and module assembly scripts added
> > Great job! But let me remind you of one of the Apache Ignite principles:
> > a week of thinking saves months of development.
> > 
> > 
> >> Second, I'd like to propose to accept ducktests [2] (ducktape integration) as a target "PoC check & real topology benchmarking tool".
> >> Ducktape pros
> >> - Developed for distributed system by distributed system developers.
> > So does Tiden
> > 
> >> - Developed since 2014, stable.
> > Tiden is also pretty stable, and the development start date is not a good argument; for example, pytest has been around since 2004 and pytest-xdist (a plugin for distributed testing) since 2010, but we don't see it as an alternative at all.
> > 
> >> - Proven usability by usage at Kafka.
> > Tiden is proven usable by its use in GridGain and Sberbank deployments.
> > The core, storage, SQL, and tx teams use benchmark results provided by Tiden on a daily basis.
> > 
> >> - Dozens of dozens tests and benchmarks at Kafka as a great example pack.
> > We'll donate some of our suites to Ignite, as I mentioned in a previous letter.
> > 
> >> - Built-in Docker support for rapid development and checks.
> > False, there's no specific 'docker support' in ducktape itself; you just wrap it in Docker yourself, because ducktape lacks deployment abilities.
> > 
> >> - Great for CI automation.
> > False, there are no specific CI-enabled features in ducktape. Tiden, on the other hand, provides the generic xUnit reporting format, which is supported by both TeamCity and Jenkins. Also, instead of using private keys, Tiden can use the SSH agent, which is also great for CI, because both
> > TeamCity and Jenkins store keys in secret storage available only to the ssh-agent and only for the duration of the test.
> > 
> > 
> >> > As an additional motivation, at least 3 teams
> >> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix speed-up))
> >> - Ignite SE Plugins team (to check plugin's features does not slow-down or broke AI features)
> >> - Ignite SE QA team (to append already developed smoke/load/failover tests to AI codebase)
> > 
> > Please, before recommending your tests to other teams, provide proof
> > that your tests are reproducible in a real environment.
> > 
> > 
> >> now, wait for ducktest merge to start checking cases they working on in AI way.
> >> Thoughts?
> > Let us review both solutions together: we'll try to run your tests in our lab, and you'll try to at least check out tiden and see whether the same tests can be implemented with it?
> > 
> > 
> > 
> >> [1] https://github.com/apache/ignite/pull/7967
> >> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org <ma...@apache.org>> wrote:
> >>    Hello, Maxim.
> >>    Thank you for so detailed explanation.
> >>    Can we put the content of this discussion somewhere on the wiki?
> >>    So It doesn’t get lost.
> >>    I divide the answer in several parts. From the requirements to the
> >>    implementation.
> >>    So, if we agreed on the requirements we can proceed with the
> >>    discussion of the implementation.
> >>    1. Requirements:
> >>    The main goal I want to achieve is *reproducibility* of the tests.
> >>    I’m sick and tired with the zillions of flaky, rarely failed, and
> >>    almost never failed tests in Ignite codebase.
> >>    We should start with the simplest scenarios that will be as reliable
> >>    as steel :)
> >>    I want to know for sure:
> >>       - Is this PR makes rebalance quicker or not?
> >>       - Is this PR makes PME quicker or not?
> >>    So, your description of the complex test scenario looks as a next
> >>    step to me.
> >>    Anyway, It’s cool we already have one.
> >>    The second goal is to have a strict test lifecycle as we have in
> >>    JUnit and similar frameworks.
> >>     > It covers production-like deployment and running a scenarios over
> >>    a single database instance.
> >>    Do you mean «single cluster» or «single host»?
> >>    2. Existing tests:
> >>     > A Combinator suite allows to run set of operations concurrently
> >>    over given database instance.
> >>     > A Consumption suite allows to run a set production-like actions
> >>    over given set of Ignite/GridGain versions and compare test metrics
> >>    across versions
> >>     > A Yardstick suite
> >>     > A Stress suite that simulates hardware environment degradation
> >>     > An Ultimate, DR and Compatibility suites that performs functional
> >>    regression testing
> >>     > Regression
> >>    Great news that we already have so many choices for testing!
> >>    Mature test base is a big +1 for Tiden.
> >>    3. Comparison:
> >>     > Criteria: Test configuration
> >>     > Ducktape: single JSON string for all tests
> >>     > Tiden: any number of YaML config files, command line option for
> >>    fine-grained test configuration, ability to select/modify tests
> >>    behavior based on Ignite version.
> >>    1. Many YAML files can be hard to maintain.
> >>    2. In ducktape, you can set parameters via «—parameters» option.
> >>    Please, take a look at the doc [1]
> >>     > Criteria: Cluster control
> >>     > Tiden: additionally can address cluster as a whole and execute
> >>    remote commands in parallel.
> >>    It seems we implement this ability in the PoC, already.
> >>     > Criteria: Test assertions
> >>     > Tiden: simple asserts, also few customized assertion helpers.
> >>     > Ducktape: simple asserts.
> >>    Can you, please, be more specific.
> >>    What helpers do you have in mind?
> >>    Ducktape has an asserts that waits for logfile messages or some
> >>    process finish.
> >>     > Criteria: Test reporting
> >>     > Ducktape: limited to its own text/HTML format
> >>    Ducktape have
> >>    1. Text reporter
> >>    2. Customizable HTML reporter
> >>    3. JSON reporter.
> >>    We can show JSON with the any template or tool.
> >>     > Criteria: Provisioning and deployment
> >>     > Ducktape: can provision subset of hosts from cluster for test
> >>    needs. However, that means, that test can’t be scaled without test
> >>    code changes. Does not do any deploy, relies on external means, e.g.
> >>    pre-packaged in docker image, as in PoC.
> >>    This is not true.
> >>    1. We can set explicit test parameters(node number) via parameters.
> >>    We can increase client count of cluster size without test code changes.
> >>    2. We have many choices for the test environment. These choices are
> >>    tested and used in other projects:
> >>             * docker
> >>             * vagrant
> >>             * private cloud(ssh access)
> >>             * ec2
> >>    Please, take a look at Kafka documentation [2]
> >>     > I can continue more on this, but it should be enough for now:
> >>    We need to go deeper! :)
> >>    [1]
> >>    https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>    [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>     > 9 июня 2020 г., в 17:25, Max A. Shonichev <mshonich@yandex.ru
> >>    <ma...@yandex.ru>> написал(а):
> >>     >
> >>     > Greetings, Nikolay,
> >>     >
> >>     > First of all, thank you for you great effort preparing PoC of
> >>    integration testing to Ignite community.
> >>     >
> >>     > It’s a shame Ignite did not have at least some such tests yet,
> >>    however, GridGain, as a major contributor to Apache Ignite had a
> >>    profound collection of in-house tools to perform integration and
> >>    performance testing for years already and while we slowly consider
> >>    sharing our expertise with the community, your initiative makes us
> >>    drive that process a bit faster, thanks a lot!
> >>     >
> >>     > I reviewed your PoC and want to share a little about what we do
> >>    on our part, why and how, hope it would help community take proper
> >>    course.
> >>     >
> >>     > First I’ll do a brief overview of what decisions we made and what
> >>    we do have in our private code base, next I’ll describe what we have
> >>    already donated to the public and what we plan public next, then
> >>    I’ll compare both approaches highlighting deficiencies in order to
> >>    spur public discussion on the matter.
> >>     >
> >>     > It might seem strange to use Python to run Bash to run Java
> >>    applications because that introduces IT industry best of breed’ –
> >>    the Python dependency hell – to the Java application code base. The
> >>    only strangest decision one can made is to use Maven to run Docker
> >>    to run Bash to run Python to run Bash to run Java, but desperate
> >>    times call for desperate measures I guess.
> >>     >
> >>     > There are Java-based solutions for integration testing exists,
> >>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go well
> >>    for Ignite community CI pipelines by them selves. But we also wanted
> >>    to run performance tests and benchmarks, like the dreaded PME
> >>    benchmark, and this is solved by totally different set of tools in
> >>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> >>     >
> >>     > Speaking specifically about benchmarking, Apache Ignite community
> >>    already has Yardstick [6], and there’s nothing wrong with writing
> >>    PME benchmark using Yardstick, but we also wanted to be able to run
> >>    scenarios like this:
> >>     > - put an X load to a Ignite database;
> >>     > - perform an Y set of operations to check how Ignite copes with
> >>    operations under load.
> >>     >
> >>     > And yes, we also wanted applications under test be deployed ‘like
> >>    in a production’, e.g. distributed over a set of hosts. This arises
> >>    questions about provisioning and nodes affinity which I’ll cover in
> >>    detail later.
> >>     >
> >>     > So we decided to put a little effort to build a simple tool to
> >>    cover different integration and performance scenarios, and our QA
> >>    lab first attempt was PoC-Tester [7], currently open source for all
> >>    but for reporting web UI. It’s a quite simple to use 95% Java-based
> >>    tool targeted to be run on a pre-release QA stage.
> >>     >
> >>     > It covers production-like deployment and running a scenarios over
> >>    a single database instance. PoC-Tester scenarios consists of a
> >>    sequence of tasks running sequentially or in parallel. After all
> >>    tasks complete, or at any time during test, user can run logs
> >>    collection task, logs are checked against exceptions and a summary
> >>    of found issues and task ops/latency statistics is generated at the
> >>    end of scenario. One of the main PoC-Tester features is its
> >>    fire-and-forget approach to task managing. That is, you can deploy a
> >>    grid and left it running for weeks, periodically firing some tasks
> >>    onto it.
> >>     >
> >>     > During earliest stages of PoC-Tester development it becomes quite
> >>    clear that Java application development is a tedious process and
> >>    architecture decisions you take during development are slow and hard
> >>    to change.
> >>     > For example, scenarios like this
> >>     > - deploy two instances of GridGain with master-slave data
> >>    replication configured;
> >>     > - put a load on master;
> >>     > - perform checks on slave,
> >>     > or like this:
> >>     > - preload a 1Tb of data by using your favorite tool of choice to
> >>    an Apache Ignite of version X;
> >>     > - run a set of functional tests running Apache Ignite version Y
> >>    over preloaded data,
> >>     > do not fit well in the PoC-Tester workflow.
> >>     >
> >>     > So, this is why we decided to use Python as a generic scripting
> >>    language of choice.
> >>     >
> >>     > Pros:
> >>     > - quicker prototyping and development cycles
> >>     > - easier to find DevOps/QA engineer with Python skills than one
> >>    with Java skills
> >>     > - used extensively all over the world for DevOps/CI pipelines and
> >>    thus has rich set of libraries for all possible integration uses cases.
> >>     >
> >>     > Cons:
> >>     > - Nightmare with dependencies. Better stick to specific
> >>    language/libraries version.
> >>     >
> >>     > Comparing alternatives for Python-based testing framework we have
> >>    considered following requirements, somewhat similar to what you’ve
> >>    mentioned for Confluent [8] previously:
> >>     > - should be able run locally or distributed (bare metal or in the
> >>    cloud)
> >>     > - should have built-in deployment facilities for applications
> >>    under test
> >>     > - should separate test configuration and test code
> >>     > -- be able to easily reconfigure tests by simple configuration
> >>    changes
> >>     > -- be able to easily scale test environment by simple
> >>    configuration changes
> >>     > -- be able to perform regression testing by simple switching
> >>    artifacts under test via configuration
> >>     > -- be able to run tests with different JDK version by simple
> >>    configuration changes
> >>     > - should have human readable reports and/or reporting tools
> >>    integration
> >>     > - should allow simple test progress monitoring, one does not want
> >>    to run 6-hours test to find out that application actually crashed
> >>    during first hour.
> >>     > - should allow parallel execution of test actions
> >>     > - should have clean API for test writers
> >>     > -- clean API for distributed remote commands execution
> >>     > -- clean API for deployed applications start / stop and other
> >>    operations
> >>     > -- clean API for performing check on results
> >>     > - should be open source or at least source code should allow ease
> >>    change or extension
> >>     >
> >>     > Back at that time we found no better alternative than to write
> >>    our own framework, and here goes Tiden [9] as GridGain framework of
> >>    choice for functional integration and performance testing.
> >>     >
> >>     > Pros:
> >>     > - solves all the requirements above
> >>     > Cons (for Ignite):
> >>     > - (currently) closed GridGain source
> >>     >
> >>     > On top of Tiden we’ve built a set of test suites, some of which
> >>    you might have heard already.
> >>     >
> >>     > A Combinator suite allows to run set of operations concurrently
> >>    over given database instance. Proven to find at least 30+ race
> >>    conditions and NPE issues.
> >>     >
> >>     > A Consumption suite allows to run a set production-like actions
> >>    over given set of Ignite/GridGain versions and compare test metrics
> >>    across versions, like heap/disk/CPU consumption, time to perform
> >>    actions, like client PME, server PME, rebalancing time, data
> >>    replication time, etc.
> >>     >
> >>     > A Yardstick suite is a thin layer of Python glue code to run
> >>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
> >>    mediocre deployment capabilities, Tiden solves this easily.
> >>     >
> >>     > A Stress suite that simulates hardware environment degradation
> >>    during testing.
> >>     >
> >>     > An Ultimate, DR and Compatibility suites that performs functional
> >>    regression testing of GridGain Ultimate Edition features like
> >>    snapshots, security, data replication, rolling upgrades, etc.
> >>     >
> >>     > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
> >>    etc, etc, etc.
> >>     >
> >>     > Most of the suites above use another in-house developed Java tool
> >>    – PiClient – to perform actual loading and miscellaneous operations
> >>    with Ignite under test. We use py4j Python-Java gateway library to
> >>    control PiClient instances from the tests.
> >>     >
> >>     > When we considered CI, we put TeamCity out of scope, because
> >>    distributed integration and performance tests tend to run for hours
> >>    and TeamCity agents are scarce and costly resource. So, bundled with
> >>    Tiden there is jenkins-job-builder [10] based CI pipelines and
> >>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates test
> >>    run reports across versions and has built in visualization support
> >>    for Combinator suite.
> >>     >
> >>     > All of the above is currently closed source, but we plan to make
> >>    it public for community, and publishing Tiden core [9] is the first
> >>    step on that way. You can review some examples of using Tiden for
> >>    tests at my repository [11], for start.
> >>     >
> >>     > Now, let’s compare Ducktape PoC and Tiden.
> >>     >
> >>     > Criteria: Language
> >>     > Tiden: Python, 3.7
> >>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> >>    compatible, but actually can’t work with Python 3.7 due to broken
> >>    Zmq dependency.
> >>     > Comment: Python 3.7 has a much better support for async-style
> >>    code which might be crucial for distributed application testing.
> >>     > Score: Tiden: 1, Ducktape: 0
> >>     >
> >>     > Criteria: Test writers API
> >>     > Supported integration test framework concepts are basically the same:
> >>     > - a test controller (test runner)
> >>     > - a cluster
> >>     > - a node
> >>     > - an application (a service in Ducktape terms)
> >>     > - a test
> >>     > Score: Tiden: 5, Ducktape: 5
> >>     >
> >>     > Criteria: Tests selection and run
> >>     > Ducktape: suite-package-class-method level selection, internal
> >>    scheduler allows to run tests in suite in parallel.
> >>     > Tiden: also suite-package-class-method level selection,
> >>    additionally allows selecting subset of tests by attribute, parallel
> >>    runs not built in, but allows merging test reports after different runs.
> >>     > Score: Tiden: 2, Ducktape: 2
> >>     >
> >>     > Criteria: Test configuration
> >>     > Ducktape: single JSON string for all tests
> >>     > Tiden: any number of YaML config files, command line option for
> >>    fine-grained test configuration, ability to select/modify tests
> >>    behavior based on Ignite version.
> >>     > Score: Tiden: 3, Ducktape: 1
> >>     >
> >>     > Criteria: Cluster control
> >>     > Ducktape: allow execute remote commands by node granularity
> >>     > Tiden: additionally can address cluster as a whole and execute
> >>    remote commands in parallel.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > Criteria: Logs control
> >>     > Both frameworks have similar builtin support for remote logs
> >>    collection and grepping. Tiden has built-in plugin that can zip,
> >>    collect arbitrary log files from arbitrary locations at
> >>    test/module/suite granularity and unzip if needed, also application
> >>    API to search / wait for messages in logs. Ducktape allows each
> >>    service declare its log files location (seemingly does not support
> >>    logs rollback), and a single entrypoint to collect service logs.
> >>     > Score: Tiden: 1, Ducktape: 1
> >>     >
> >>     > Criteria: Test assertions
> >>     > Tiden: simple asserts, also few customized assertion helpers.
> >>     > Ducktape: simple asserts.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > Criteria: Test reporting
> >>     > Ducktape: limited to its own text/html format
> >>     > Tiden: provides text report, yaml report for reporting tools
> >>    integration, XML xUnit report for integration with Jenkins/TeamCity.
> >>     > Score: Tiden: 3, Ducktape: 1
> >>     >
> >>     > Criteria: Provisioning and deployment
> >>     > Ducktape: can provision subset of hosts from cluster for test
> >>    needs. However, that means, that test can’t be scaled without test
> >>    code changes. Does not do any deploy, relies on external means, e.g.
> >>    pre-packaged in docker image, as in PoC.
> >>     > Tiden: Given a set of hosts, Tiden uses all of them for the test.
> >>    Provisioning should be done by external means. However, provides a
> >>    conventional automated deployment routines.
> >>     > Score: Tiden: 1, Ducktape: 1
> >>     >
> >>     > Criteria: Documentation and Extensibility
> >>     > Tiden: current API documentation is limited, should change as we
> >>    go open source. Tiden is easily extensible via hooks and plugins,
> >>    see example Maven plugin and Gatling application at [11].
> >>     > Ducktape: basic documentation at readthedocs.io
> >>    <http://readthedocs.io>. Codebase is rigid, framework core is
> >>    tightly coupled and hard to change. The only possible extension
> >>    mechanism is fork-and-rewrite.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > I can continue more on this, but it should be enough for now:
> >>     > Overall score: Tiden: 22, Ducktape: 14.
> >>     >
> >>     > Time for discussion!
> >>     >
> >>     > ---
> >>     > [1] - https://www.testcontainers.org/
> >>     > [2] - http://arquillian.org/guides/getting_started/
> >>     > [3] - https://jmeter.apache.org/index.html
> >>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>     > [5] - https://gatling.io/docs/current/
> >>     > [6] - https://github.com/gridgain/yardstick
> >>     > [7] - https://github.com/gridgain/poc-tester
> >>     > [8] -
> >>    https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>     > [9] - https://github.com/gridgain/tiden
> >>     > [10] - https://pypi.org/project/jenkins-job-builder/
> >>     > [11] - https://github.com/mshonichev/tiden_examples
> >>     >
> >>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>     >> Hello,
> >>     >>
> >>     >> Branch with duck tape created -
> >>    https://github.com/apache/ignite/tree/ignite-ducktape
> >>     >>
> >>     >> Any who are willing to contribute to PoC are welcome.
> >>     >>
> >>     >>
> >>     >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
> >>    <nizhikov.dev@gmail.com <ma...@gmail.com>> написал(а):
> >>     >>>
> >>     >>> Hello, Denis.
> >>     >>>
> >>     >>> There is no rush with these improvements.
> >>     >>> We can wait for Maxim proposal and compare two solutions :)
> >>     >>>
> >>     >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
> >>    <ma...@apache.org>> написал(а):
> >>     >>>>
> >>     >>>> Hi Nikolay,
> >>     >>>>
> >>     >>>> Thanks for kicking off this conversation and sharing your
> >>    findings with the
> >>     >>>> results. That's the right initiative. I do agree that Ignite
> >>    needs to have
> >>     >>>> an integration testing framework with capabilities listed by you.
> >>     >>>>
> >>     >>>> As we discussed privately, I would only check if instead of
> >>     >>>> Confluent's Ducktape library, we can use an integration
> >>    testing framework
> >>     >>>> developed by GridGain for testing of Ignite/GridGain clusters.
> >>    That
> >>     >>>> framework has been battle-tested and might be more convenient for
> >>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>> who
> >>    promised to join this thread once he finishes
> >>     >>>> preparing the usage examples of the framework. To my
> >>    knowledge, Max has
> >>     >>>> already been working on that for several days.
> >>     >>>>
> >>     >>>> -
> >>     >>>> Denis
> >>     >>>>
> >>     >>>>
> >>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> >>    <nizhikov@apache.org <ma...@apache.org>>
> >>     >>>> wrote:
> >>     >>>>
> >>     >>>>> Hello, Igniters.
> >>     >>>>>
> >>     >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>     >>>>>
> >>     >>>>> Let me briefly explain the gap I want to cover:
> >>     >>>>>
> >>     >>>>> 1. For now, we don’t have a solution for automated testing of
> >>    Ignite on
> >>     >>>>> «real cluster».
> >>     >>>>> By «real cluster» I mean cluster «like a production»:
> >>     >>>>>       * client and server nodes deployed on different hosts.
> >>     >>>>>       * thin clients perform queries from some other hosts
> >>     >>>>>       * etc.
> >>     >>>>>
> >>     >>>>> 2. We don’t have a solution for automated benchmarks of some
> >>    internal
> >>     >>>>> Ignite process
> >>     >>>>>       * PME
> >>     >>>>>       * rebalance.
> >>     >>>>> This means we don’t know - Do we perform rebalance(or PME) in
> >>    2.7.0 faster
> >>     >>>>> or slower than in 2.8.0 for the same cluster?
> >>     >>>>>
> >>     >>>>> 3. We don’t have a solution for automated testing of Ignite
> >>    integration in
> >>     >>>>> a real-world environment:
> >>     >>>>> Ignite-Spark integration can be taken as an example.
> >>     >>>>> I think some ML solutions also should be tested in real-world
> >>    deployments.
> >>     >>>>>
> >>     >>>>> Solution:
> >>     >>>>>
> >>     >>>>> I propose to use duck tape library from confluent (apache 2.0
> >>    license)
> >>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on the
> >>    local
> >>     >>>>> environment(docker) and it works just fine.
> >>     >>>>>
> >>     >>>>> PoC contains following services:
> >>     >>>>>
> >>     >>>>>       * Simple rebalance test:
> >>     >>>>>               Start 2 server nodes,
> >>     >>>>>               Create some data with Ignite client,
> >>     >>>>>               Start one more server node,
> >>     >>>>>               Wait for rebalance finish
> >>     >>>>>       * Simple Ignite-Spark integration test:
> >>     >>>>>               Start 1 Spark master, start 1 Spark worker,
> >>     >>>>>               Start 1 Ignite server node
> >>     >>>>>               Create some data with Ignite client,
> >>     >>>>>               Check data in application that queries it from
> >>    Spark.
> >>     >>>>>
> >>     >>>>> All tests are fully automated.
> >>     >>>>> Logs collection works just fine.
> >>     >>>>> You can see an example of the tests report - [4].
> >>     >>>>>
> >>     >>>>> Pros:
> >>     >>>>>
> >>     >>>>> * Ability to test local changes(no need to public changes to
> >>    some remote
> >>     >>>>> repository or similar).
> >>     >>>>> * Ability to parametrize test environment(run the same tests
> >>    on different
> >>     >>>>> JDK, JVM params, config, etc.)
> >>     >>>>> * Isolation by default so system tests are as reliable as
> >>    possible.
> >>     >>>>> * Utilities for pulling up and tearing down services easily
> >>    in clusters in
> >>     >>>>> different environments (e.g. local, custom cluster, Vagrant,
> >>    K8s, Mesos,
> >>     >>>>> Docker, cloud providers, etc.)
> >>     >>>>> * Easy to write unit tests for distributed systems
> >>     >>>>> * Adopted and successfully used by other distributed open
> >>    source project -
> >>     >>>>> Apache Kafka.
> >>     >>>>> * Collect results (e.g. logs, console output)
> >>     >>>>> * Report results (e.g. expected conditions met, performance
> >>    results, etc.)
> >>     >>>>>
> >>     >>>>> WDYT?
> >>     >>>>>
> >>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>     >>>>> [2] https://github.com/confluentinc/ducktape
> >>     >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> 



Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
Max,
Thanks for joining us.

> 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts.
No. It is important to distinguish development, deployment, and orchestration.
All-in-one solutions have extremely limited usability.
As to Ducktests:
Docker is responsible for deployments during development.
CI/CD is responsible for deployments during release and nightly checks.
It's up to the team to choose AWS, VMs, bare metal, and even the OS.
Ducktape is responsible for orchestration.

> 2. tiden can execute actions over remote nodes in real parallel fashion,
>while ducktape internally does all actions sequentially.
No. Ducktape can start any service in parallel. See the PME-free benchmark [1]
for details.
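
For illustration, a minimal sketch of the pattern (not the actual PR code; "nodes", "action" and the "ignite" handle in the comment are stand-ins for a service's node list and its per-node start/stop routines):

    from concurrent.futures import ThreadPoolExecutor

    def run_in_parallel(nodes, action):
        # Apply action(node) to every node at once and wait for all of them,
        # instead of looping over the nodes one-by-one.
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            futures = [pool.submit(action, node) for node in nodes]
            for future in futures:
                future.result()  # re-raise the first per-node failure, if any

    # e.g. bring the whole cluster up at once:
    # run_in_parallel(ignite.nodes, ignite.start_node)

The same helper obviously covers stops as well.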

> if we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef.
Sure, because the deployment approach depends on the infrastructure.
How can we be sure that the OS we use and the restrictions we have will be
compatible with Tiden?
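
To show why we do not consider deployment a framework concern: an external deployment step can be as small as the sketch below (hosts, paths and the artifact name are made up for the example; an Ansible or Chef recipe would do the same job):

    import subprocess

    HOSTS = ["host-1", "host-2", "host-3"]      # provided by the infrastructure owner
    RELEASE = "apache-ignite-2.8.1-bin.zip"     # artifact built by CI beforehand

    for host in HOSTS:
        # Copy the build and unpack it; the test framework only needs the
        # result to be in place before it starts orchestrating nodes.
        subprocess.run(["scp", RELEASE, host + ":/opt/"], check=True)
        subprocess.run(["ssh", host, "cd /opt && unzip -o " + RELEASE], check=True)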

> You have solved this deficiency with docker by putting all dependencies
> into one uber-image ...
and
> I guess we all know about docker hyped ability to run over distributed
>virtual networks.
It is very important not to confuse test development (the Docker
image you're talking about) with real deployment.

> If we had stopped and started 5 nodes one-by-one, as ducktape does
All actions can be performed in parallel.
See, for example, how Ducktests [2] start the cluster in parallel.

[1]
https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
[2]
https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
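
To make the parallel-start point above concrete, here is a minimal sketch
(it is not the actual ducktests code from [2]) of how a test could fan out
node start-up with a thread pool; the host list and the remote start command
are placeholders.

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder hosts and start command; in ducktests the service layer owns
# these details, this sketch only illustrates the fan-out pattern.
HOSTS = ["ignite-1", "ignite-2", "ignite-3"]
START_CMD = "/opt/ignite/bin/ignite.sh /opt/ignite/config/node.xml"

def start_node(host):
    # Start a single Ignite node over ssh and fail fast on errors.
    subprocess.run(["ssh", host, START_CMD], check=True)

def start_cluster_in_parallel(hosts):
    # One worker per host, so all nodes start concurrently instead of one-by-one.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        # list() forces iteration, so any remote failure is re-raised here.
        list(pool.map(start_node, hosts))

if __name__ == "__main__":
    start_cluster_in_parallel(HOSTS)

The same pattern applies to stopping nodes, so a scenario that needs several
nodes stopped at once is expressible as well.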

On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <ni...@apache.org> wrote:

> Hello, Maxim.
>
> > 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts
>
> Why do you think that maintaining deploy scripts coupled with the testing
> framework is an advantage?
> I thought we want to see and maintain deployment scripts separate from the
> testing framework.
>
> > 2. tiden can execute actions over remote nodes in real parallel fashion,
> while ducktape internally does all actions sequentially.
>
> Can you, please, clarify, what actions do you have in mind?
> And why we want to execute them concurrently?
> Ignite node start, Client application execution can be done concurrently
> with the ducktape approach.
>
> > If we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef
>
> We shouldn’t take some user approach as an argument in this discussion.
> Let’s discuss a general approach for all users of the Ignite. Anyway, what
> is wrong with the external deployment script approach?
>
> We, as a community, should provide several ways to run integration tests
> out-of-the-box AND the ability to customize deployment regarding the user
> landscape.
>
> > You have solved this deficiency with docker by putting all dependencies
> into one uber-image and that looks like simple and elegant solution
> however, that effectively limits you to single-host testing.
>
> Docker image should be used only by the Ignite developers to test
> something locally.
> It’s not intended for some real-world testing.
>
> The main issue with the Tiden that I see, it tested and maintained as a
> closed source solution.
> This can lead to the hard to solve problems when we start using and
> maintaining it as an open-source solution.
> Like, how many developers used Tiden? And how many of developers were not
> authors of the Tiden itself?
>
>
> > On July 2, 2020, at 12:30, Max Shonichev <ms...@yandex.ru> wrote:
> >
> > Anton, Nikolay,
> >
> > Let's agree on what we are arguing about: whether it is about "like or
> don't like" or about technical properties of suggested solutions.
> >
> > If it is about likes and dislikes, then the whole discussion is
> meaningless. However, I hope together we can analyse pros and cons
> carefully.
> >
> > As far as I can understand now, two main differences between ducktape
> and tiden is that:
> >
> > 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts.
> >
> > 2. tiden can execute actions over remote nodes in real parallel fashion,
> while ducktape internally does all actions sequentially.
> >
> > As for me, these are very important properties for distributed testing
> framework.
> >
> > First property let us easily reuse tiden in existing infrastructures,
> for example, during Zookeeper IEP testing at Sberbank site we used the same
> tiden scripts that we use in our lab, the only change was putting a list of
> hosts into config.
> >
> > If we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef.
> >
> >
> > You have solved this deficiency with docker by putting all dependencies
> into one uber-image and that looks like simple and elegant solution,
> > however, that effectively limits you to single-host testing.
> >
> > I guess we all know about docker hyped ability to run over distributed
> virtual networks. We used to go that way, but quickly found that it is more
> of the hype than real work. In real environments, there are problems with
> routing, DNS, multicast and broadcast traffic, and many others, that turn
> docker-based distributed solution into a fragile hard-to-maintain monster.
> >
> > Please, if you believe otherwise, perform a run of your PoC over at
> least two physical hosts and share results with us.
> >
> > If you consider that one physical docker host is enough, please, don't
> overlook that we want to run real scale scenarios, with 50-100 cache
> groups, persistence enabled and a millions of keys loaded.
> >
> > Practical limit for such configurations is 4-6 nodes per single physical
> host. Otherwise, tests become flaky due to resource starvation.
> >
> > Please, if you believe otherwise, perform at least a 10 of runs of your
> PoC with other tests running at TC (we're targeting TeamCity, right?) and
> share results so we could check if the numbers are reproducible.
> >
> > I stress this once more: functional integration tests are OK to run in
> Docker and CI, but running benchmarks in Docker is a big NO GO.
> >
> >
> > Second property let us write tests that require real-parallel actions
> over hosts.
> >
> > For example, agreed scenario for PME benchmarkduring "PME optimization
> stream" was as follows:
> >
> >  - 10 server nodes, preloaded with 1M of keys
> >  - 4 client nodes perform transactional load  (client nodes physically
> separated from server nodes)
> >  - during load:
> >  -- 5 server nodes stopped in parallel
> >  -- after 1 minute, all 5 nodes are started in parallel
> >  - load stopped, logs are analysed for exchange times.
> >
> > If we had stopped and started 5 nodes one-by-one, as ducktape does, then
> partition map exchange merge would not happen and we could not have
> measured PME optimizations for that case.
> >
> >
> > These are limitations of ducktape that we believe as a more important
> > argument "against" than you provide "for".
> >
> >
> >
> >
> > On 30.06.2020 14:58, Anton Vinogradov wrote:
> >> Folks,
> >> First, I've created PR [1] with ducktests improvements
> >> PR contains the following changes
> >> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
> >> - Global refactoring
> >> -- benchmarks javacode simplification
> >> -- services python and java classes code deduplication
> >> -- fail-fast checks for java and python (eg. application should
> explicitly write it finished with success)
> >> -- simple results extraction from tests and benchmarks
> >> -- javacode now configurable from tests/benchmarks
> >> -- proper SIGTERM handling at javacode (eg. it may finish last
> operation and log results)
> >> -- docker volume now marked as delegated to increase execution speed
> for mac & win users
> >> -- Ignite cluster now start in parallel (start speed-up)
> >> -- Ignite can be configured at test/benchmark
> >> - full and module assembly scripts added
> > Great job done! But let me remind one of Apache Ignite principles:
> > week of thinking save months of development.
> >
> >
> >> Second, I'd like to propose to accept ducktests [2] (ducktape
> integration) as a target "PoC check & real topology benchmarking tool".
> >> Ducktape pros
> >> - Developed for distributed system by distributed system developers.
> > So does Tiden
> >
> >> - Developed since 2014, stable.
> > Tiden is also pretty stable, and development start date is not a good
> argument, for example pytest is since 2004, pytest-xdist (plugin for
> distributed testing) is since 2010, but we don't see it as a alternative at
> all.
> >
> >> - Proven usability by usage at Kafka.
> > Tiden is proven usable by usage at GridGain and Sberbank deployments.
> > Core, storage, sql and tx teams use benchmark results provided by Tiden
> on a daily basis.
> >
> >> - Dozens of dozens tests and benchmarks at Kafka as a great example
> pack.
> > We'll donate some of our suites to Ignite as I've mentioned in previous
> letter.
> >
> >> - Built-in Docker support for rapid development and checks.
> > False, there's no specific 'docker support' in ducktape itself, you just
> wrap it in docker by yourself, because ducktape is lacking deployment
> abilities.
> >
> >> - Great for CI automation.
> > False, there's no specific CI-enabled features in ducktape. Tiden, on
> the other hand, provide generic xUnit reporting format, which is supported
> by both TeamCity and Jenkins. Also, instead of using private keys, Tiden
> can use SSH agent, which is also great for CI, because both
> > TeamCity and Jenkins store keys in secret storage available only for
> ssh-agent and only for the time of the test.
> >
> >
> >> > As an additional motivation, at least 3 teams
> >> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix
> speed-up))
> >> - Ignite SE Plugins team (to check plugin's features does not slow-down
> or broke AI features)
> >> - Ignite SE QA team (to append already developed smoke/load/failover
> tests to AI codebase)
> >
> > Please, before recommending your tests to other teams, provide proofs
> > that your tests are reproducible in real environment.
> >
> >
> >> now, wait for ducktest merge to start checking cases they working on in
> AI way.
> >> Thoughts?
> > Let us together review both solutions, we'll try to run your tests in
> our lab, and you'll try to at least checkout tiden and see if same tests
> can be implemented with it?
> >
> >
> >
> >> [1] https://github.com/apache/ignite/pull/7967
> >> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org
> <ma...@apache.org>> wrote:
> >>    Hello, Maxim.
> >>    Thank you for so detailed explanation.
> >>    Can we put the content of this discussion somewhere on the wiki?
> >>    So It doesn’t get lost.
> >>    I divide the answer in several parts. From the requirements to the
> >>    implementation.
> >>    So, if we agreed on the requirements we can proceed with the
> >>    discussion of the implementation.
> >>    1. Requirements:
> >>    The main goal I want to achieve is *reproducibility* of the tests.
> >>    I’m sick and tired with the zillions of flaky, rarely failed, and
> >>    almost never failed tests in Ignite codebase.
> >>    We should start with the simplest scenarios that will be as reliable
> >>    as steel :)
> >>    I want to know for sure:
> >>       - Is this PR makes rebalance quicker or not?
> >>       - Is this PR makes PME quicker or not?
> >>    So, your description of the complex test scenario looks as a next
> >>    step to me.
> >>    Anyway, It’s cool we already have one.
> >>    The second goal is to have a strict test lifecycle as we have in
> >>    JUnit and similar frameworks.
> >>     > It covers production-like deployment and running a scenarios over
> >>    a single database instance.
> >>    Do you mean «single cluster» or «single host»?
> >>    2. Existing tests:
> >>     > A Combinator suite allows to run set of operations concurrently
> >>    over given database instance.
> >>     > A Consumption suite allows to run a set production-like actions
> >>    over given set of Ignite/GridGain versions and compare test metrics
> >>    across versions
> >>     > A Yardstick suite
> >>     > A Stress suite that simulates hardware environment degradation
> >>     > An Ultimate, DR and Compatibility suites that performs functional
> >>    regression testing
> >>     > Regression
> >>    Great news that we already have so many choices for testing!
> >>    Mature test base is a big +1 for Tiden.
> >>    3. Comparison:
> >>     > Criteria: Test configuration
> >>     > Ducktape: single JSON string for all tests
> >>     > Tiden: any number of YaML config files, command line option for
> >>    fine-grained test configuration, ability to select/modify tests
> >>    behavior based on Ignite version.
> >>    1. Many YAML files can be hard to maintain.
> >>    2. In ducktape, you can set parameters via «—parameters» option.
> >>    Please, take a look at the doc [1]
> >>     > Criteria: Cluster control
> >>     > Tiden: additionally can address cluster as a whole and execute
> >>    remote commands in parallel.
> >>    It seems we implement this ability in the PoC, already.
> >>     > Criteria: Test assertions
> >>     > Tiden: simple asserts, also few customized assertion helpers.
> >>     > Ducktape: simple asserts.
> >>    Can you, please, be more specific.
> >>    What helpers do you have in mind?
> >>    Ducktape has an asserts that waits for logfile messages or some
> >>    process finish.
> >>     > Criteria: Test reporting
> >>     > Ducktape: limited to its own text/HTML format
> >>    Ducktape have
> >>    1. Text reporter
> >>    2. Customizable HTML reporter
> >>    3. JSON reporter.
> >>    We can show JSON with the any template or tool.
> >>     > Criteria: Provisioning and deployment
> >>     > Ducktape: can provision subset of hosts from cluster for test
> >>    needs. However, that means, that test can’t be scaled without test
> >>    code changes. Does not do any deploy, relies on external means, e.g.
> >>    pre-packaged in docker image, as in PoC.
> >>    This is not true.
> >>    1. We can set explicit test parameters(node number) via parameters.
> >>    We can increase client count of cluster size without test code
> changes.
> >>    2. We have many choices for the test environment. These choices are
> >>    tested and used in other projects:
> >>             * docker
> >>             * vagrant
> >>             * private cloud(ssh access)
> >>             * ec2
> >>    Please, take a look at Kafka documentation [2]
> >>     > I can continue more on this, but it should be enough for now:
> >>    We need to go deeper! :)
> >>    [1]
> >>
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>    [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>     > On June 9, 2020, at 17:25, Max A. Shonichev <mshonich@yandex.ru
> >>    <ma...@yandex.ru>> wrote:
> >>     >
> >>     > Greetings, Nikolay,
> >>     >
> >>     > First of all, thank you for you great effort preparing PoC of
> >>    integration testing to Ignite community.
> >>     >
> >>     > It’s a shame Ignite did not have at least some such tests yet,
> >>    however, GridGain, as a major contributor to Apache Ignite had a
> >>    profound collection of in-house tools to perform integration and
> >>    performance testing for years already and while we slowly consider
> >>    sharing our expertise with the community, your initiative makes us
> >>    drive that process a bit faster, thanks a lot!
> >>     >
> >>     > I reviewed your PoC and want to share a little about what we do
> >>    on our part, why and how, hope it would help community take proper
> >>    course.
> >>     >
> >>     > First I’ll do a brief overview of what decisions we made and what
> >>    we do have in our private code base, next I’ll describe what we have
> >>    already donated to the public and what we plan public next, then
> >>    I’ll compare both approaches highlighting deficiencies in order to
> >>    spur public discussion on the matter.
> >>     >
> >>     > It might seem strange to use Python to run Bash to run Java
> >>    applications because that introduces IT industry best of breed’ –
> >>    the Python dependency hell – to the Java application code base. The
> >>    only strangest decision one can made is to use Maven to run Docker
> >>    to run Bash to run Python to run Bash to run Java, but desperate
> >>    times call for desperate measures I guess.
> >>     >
> >>     > There are Java-based solutions for integration testing exists,
> >>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go well
> >>    for Ignite community CI pipelines by them selves. But we also wanted
> >>    to run performance tests and benchmarks, like the dreaded PME
> >>    benchmark, and this is solved by totally different set of tools in
> >>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> >>     >
> >>     > Speaking specifically about benchmarking, Apache Ignite community
> >>    already has Yardstick [6], and there’s nothing wrong with writing
> >>    PME benchmark using Yardstick, but we also wanted to be able to run
> >>    scenarios like this:
> >>     > - put an X load to a Ignite database;
> >>     > - perform an Y set of operations to check how Ignite copes with
> >>    operations under load.
> >>     >
> >>     > And yes, we also wanted applications under test be deployed ‘like
> >>    in a production’, e.g. distributed over a set of hosts. This arises
> >>    questions about provisioning and nodes affinity which I’ll cover in
> >>    detail later.
> >>     >
> >>     > So we decided to put a little effort to build a simple tool to
> >>    cover different integration and performance scenarios, and our QA
> >>    lab first attempt was PoC-Tester [7], currently open source for all
> >>    but for reporting web UI. It’s a quite simple to use 95% Java-based
> >>    tool targeted to be run on a pre-release QA stage.
> >>     >
> >>     > It covers production-like deployment and running a scenarios over
> >>    a single database instance. PoC-Tester scenarios consists of a
> >>    sequence of tasks running sequentially or in parallel. After all
> >>    tasks complete, or at any time during test, user can run logs
> >>    collection task, logs are checked against exceptions and a summary
> >>    of found issues and task ops/latency statistics is generated at the
> >>    end of scenario. One of the main PoC-Tester features is its
> >>    fire-and-forget approach to task managing. That is, you can deploy a
> >>    grid and left it running for weeks, periodically firing some tasks
> >>    onto it.
> >>     >
> >>     > During earliest stages of PoC-Tester development it becomes quite
> >>    clear that Java application development is a tedious process and
> >>    architecture decisions you take during development are slow and hard
> >>    to change.
> >>     > For example, scenarios like this
> >>     > - deploy two instances of GridGain with master-slave data
> >>    replication configured;
> >>     > - put a load on master;
> >>     > - perform checks on slave,
> >>     > or like this:
> >>     > - preload a 1Tb of data by using your favorite tool of choice to
> >>    an Apache Ignite of version X;
> >>     > - run a set of functional tests running Apache Ignite version Y
> >>    over preloaded data,
> >>     > do not fit well in the PoC-Tester workflow.
> >>     >
> >>     > So, this is why we decided to use Python as a generic scripting
> >>    language of choice.
> >>     >
> >>     > Pros:
> >>     > - quicker prototyping and development cycles
> >>     > - easier to find DevOps/QA engineer with Python skills than one
> >>    with Java skills
> >>     > - used extensively all over the world for DevOps/CI pipelines and
> >>    thus has rich set of libraries for all possible integration uses
> cases.
> >>     >
> >>     > Cons:
> >>     > - Nightmare with dependencies. Better stick to specific
> >>    language/libraries version.
> >>     >
> >>     > Comparing alternatives for Python-based testing framework we have
> >>    considered following requirements, somewhat similar to what you’ve
> >>    mentioned for Confluent [8] previously:
> >>     > - should be able run locally or distributed (bare metal or in the
> >>    cloud)
> >>     > - should have built-in deployment facilities for applications
> >>    under test
> >>     > - should separate test configuration and test code
> >>     > -- be able to easily reconfigure tests by simple configuration
> >>    changes
> >>     > -- be able to easily scale test environment by simple
> >>    configuration changes
> >>     > -- be able to perform regression testing by simple switching
> >>    artifacts under test via configuration
> >>     > -- be able to run tests with different JDK version by simple
> >>    configuration changes
> >>     > - should have human readable reports and/or reporting tools
> >>    integration
> >>     > - should allow simple test progress monitoring, one does not want
> >>    to run 6-hours test to find out that application actually crashed
> >>    during first hour.
> >>     > - should allow parallel execution of test actions
> >>     > - should have clean API for test writers
> >>     > -- clean API for distributed remote commands execution
> >>     > -- clean API for deployed applications start / stop and other
> >>    operations
> >>     > -- clean API for performing check on results
> >>     > - should be open source or at least source code should allow ease
> >>    change or extension
> >>     >
> >>     > Back at that time we found no better alternative than to write
> >>    our own framework, and here goes Tiden [9] as GridGain framework of
> >>    choice for functional integration and performance testing.
> >>     >
> >>     > Pros:
> >>     > - solves all the requirements above
> >>     > Cons (for Ignite):
> >>     > - (currently) closed GridGain source
> >>     >
> >>     > On top of Tiden we’ve built a set of test suites, some of which
> >>    you might have heard already.
> >>     >
> >>     > A Combinator suite allows to run set of operations concurrently
> >>    over given database instance. Proven to find at least 30+ race
> >>    conditions and NPE issues.
> >>     >
> >>     > A Consumption suite allows to run a set production-like actions
> >>    over given set of Ignite/GridGain versions and compare test metrics
> >>    across versions, like heap/disk/CPU consumption, time to perform
> >>    actions, like client PME, server PME, rebalancing time, data
> >>    replication time, etc.
> >>     >
> >>     > A Yardstick suite is a thin layer of Python glue code to run
> >>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
> >>    mediocre deployment capabilities, Tiden solves this easily.
> >>     >
> >>     > A Stress suite that simulates hardware environment degradation
> >>    during testing.
> >>     >
> >>     > An Ultimate, DR and Compatibility suites that performs functional
> >>    regression testing of GridGain Ultimate Edition features like
> >>    snapshots, security, data replication, rolling upgrades, etc.
> >>     >
> >>     > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
> >>    etc, etc, etc.
> >>     >
> >>     > Most of the suites above use another in-house developed Java tool
> >>    – PiClient – to perform actual loading and miscellaneous operations
> >>    with Ignite under test. We use py4j Python-Java gateway library to
> >>    control PiClient instances from the tests.
> >>     >
> >>     > When we considered CI, we put TeamCity out of scope, because
> >>    distributed integration and performance tests tend to run for hours
> >>    and TeamCity agents are scarce and costly resource. So, bundled with
> >>    Tiden there is jenkins-job-builder [10] based CI pipelines and
> >>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates test
> >>    run reports across versions and has built in visualization support
> >>    for Combinator suite.
> >>     >
> >>     > All of the above is currently closed source, but we plan to make
> >>    it public for community, and publishing Tiden core [9] is the first
> >>    step on that way. You can review some examples of using Tiden for
> >>    tests at my repository [11], for start.
> >>     >
> >>     > Now, let’s compare Ducktape PoC and Tiden.
> >>     >
> >>     > Criteria: Language
> >>     > Tiden: Python, 3.7
> >>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
> >>    compatible, but actually can’t work with Python 3.7 due to broken
> >>    Zmq dependency.
> >>     > Comment: Python 3.7 has a much better support for async-style
> >>    code which might be crucial for distributed application testing.
> >>     > Score: Tiden: 1, Ducktape: 0
> >>     >
> >>     > Criteria: Test writers API
> >>     > Supported integration test framework concepts are basically the
> same:
> >>     > - a test controller (test runner)
> >>     > - a cluster
> >>     > - a node
> >>     > - an application (a service in Ducktape terms)
> >>     > - a test
> >>     > Score: Tiden: 5, Ducktape: 5
> >>     >
> >>     > Criteria: Tests selection and run
> >>     > Ducktape: suite-package-class-method level selection, internal
> >>    scheduler allows to run tests in suite in parallel.
> >>     > Tiden: also suite-package-class-method level selection,
> >>    additionally allows selecting subset of tests by attribute, parallel
> >>    runs not built in, but allows merging test reports after different
> runs.
> >>     > Score: Tiden: 2, Ducktape: 2
> >>     >
> >>     > Criteria: Test configuration
> >>     > Ducktape: single JSON string for all tests
> >>     > Tiden: any number of YaML config files, command line option for
> >>    fine-grained test configuration, ability to select/modify tests
> >>    behavior based on Ignite version.
> >>     > Score: Tiden: 3, Ducktape: 1
> >>     >
> >>     > Criteria: Cluster control
> >>     > Ducktape: allow execute remote commands by node granularity
> >>     > Tiden: additionally can address cluster as a whole and execute
> >>    remote commands in parallel.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > Criteria: Logs control
> >>     > Both frameworks have similar builtin support for remote logs
> >>    collection and grepping. Tiden has built-in plugin that can zip,
> >>    collect arbitrary log files from arbitrary locations at
> >>    test/module/suite granularity and unzip if needed, also application
> >>    API to search / wait for messages in logs. Ducktape allows each
> >>    service declare its log files location (seemingly does not support
> >>    logs rollback), and a single entrypoint to collect service logs.
> >>     > Score: Tiden: 1, Ducktape: 1
> >>     >
> >>     > Criteria: Test assertions
> >>     > Tiden: simple asserts, also few customized assertion helpers.
> >>     > Ducktape: simple asserts.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > Criteria: Test reporting
> >>     > Ducktape: limited to its own text/html format
> >>     > Tiden: provides text report, yaml report for reporting tools
> >>    integration, XML xUnit report for integration with Jenkins/TeamCity.
> >>     > Score: Tiden: 3, Ducktape: 1
> >>     >
> >>     > Criteria: Provisioning and deployment
> >>     > Ducktape: can provision subset of hosts from cluster for test
> >>    needs. However, that means, that test can’t be scaled without test
> >>    code changes. Does not do any deploy, relies on external means, e.g.
> >>    pre-packaged in docker image, as in PoC.
> >>     > Tiden: Given a set of hosts, Tiden uses all of them for the test.
> >>    Provisioning should be done by external means. However, provides a
> >>    conventional automated deployment routines.
> >>     > Score: Tiden: 1, Ducktape: 1
> >>     >
> >>     > Criteria: Documentation and Extensibility
> >>     > Tiden: current API documentation is limited, should change as we
> >>    go open source. Tiden is easily extensible via hooks and plugins,
> >>    see example Maven plugin and Gatling application at [11].
> >>     > Ducktape: basic documentation at readthedocs.io
> >>    <http://readthedocs.io>. Codebase is rigid, framework core is
> >>    tightly coupled and hard to change. The only possible extension
> >>    mechanism is fork-and-rewrite.
> >>     > Score: Tiden: 2, Ducktape: 1
> >>     >
> >>     > I can continue more on this, but it should be enough for now:
> >>     > Overall score: Tiden: 22, Ducktape: 14.
> >>     >
> >>     > Time for discussion!
> >>     >
> >>     > ---
> >>     > [1] - https://www.testcontainers.org/
> >>     > [2] - http://arquillian.org/guides/getting_started/
> >>     > [3] - https://jmeter.apache.org/index.html
> >>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>     > [5] - https://gatling.io/docs/current/
> >>     > [6] - https://github.com/gridgain/yardstick
> >>     > [7] - https://github.com/gridgain/poc-tester
> >>     > [8] -
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>     > [9] - https://github.com/gridgain/tiden
> >>     > [10] - https://pypi.org/project/jenkins-job-builder/
> >>     > [11] - https://github.com/mshonichev/tiden_examples
> >>     >
> >>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>     >> Hello,
> >>     >>
> >>     >> Branch with duck tape created -
> >>    https://github.com/apache/ignite/tree/ignite-ducktape
> >>     >>
> >>     >> Any who are willing to contribute to PoC are welcome.
> >>     >>
> >>     >>
> >>     >>> On May 21, 2020, at 22:33, Nikolay Izhikov
> >>    <nizhikov.dev@gmail.com <ma...@gmail.com>> wrote:
> >>     >>>
> >>     >>> Hello, Denis.
> >>     >>>
> >>     >>> There is no rush with these improvements.
> >>     >>> We can wait for Maxim proposal and compare two solutions :)
> >>     >>>
> >>     >>>> On May 21, 2020, at 22:24, Denis Magda <dmagda@apache.org
> >>    <ma...@apache.org>> wrote:
> >>     >>>>
> >>     >>>> Hi Nikolay,
> >>     >>>>
> >>     >>>> Thanks for kicking off this conversation and sharing your
> >>    findings with the
> >>     >>>> results. That's the right initiative. I do agree that Ignite
> >>    needs to have
> >>     >>>> an integration testing framework with capabilities listed by
> you.
> >>     >>>>
> >>     >>>> As we discussed privately, I would only check if instead of
> >>     >>>> Confluent's Ducktape library, we can use an integration
> >>    testing framework
> >>     >>>> developed by GridGain for testing of Ignite/GridGain clusters.
> >>    That
> >>     >>>> framework has been battle-tested and might be more convenient
> for
> >>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>> who
> >>    promised to join this thread once he finishes
> >>     >>>> preparing the usage examples of the framework. To my
> >>    knowledge, Max has
> >>     >>>> already been working on that for several days.
> >>     >>>>
> >>     >>>> -
> >>     >>>> Denis
> >>     >>>>
> >>     >>>>
> >>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
> >>    <nizhikov@apache.org <ma...@apache.org>>
> >>     >>>> wrote:
> >>     >>>>
> >>     >>>>> Hello, Igniters.
> >>     >>>>>
> >>     >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>     >>>>>
> >>     >>>>> Let me briefly explain the gap I want to cover:
> >>     >>>>>
> >>     >>>>> 1. For now, we don’t have a solution for automated testing of
> >>    Ignite on
> >>     >>>>> «real cluster».
> >>     >>>>> By «real cluster» I mean cluster «like a production»:
> >>     >>>>>       * client and server nodes deployed on different hosts.
> >>     >>>>>       * thin clients perform queries from some other hosts
> >>     >>>>>       * etc.
> >>     >>>>>
> >>     >>>>> 2. We don’t have a solution for automated benchmarks of some
> >>    internal
> >>     >>>>> Ignite process
> >>     >>>>>       * PME
> >>     >>>>>       * rebalance.
> >>     >>>>> This means we don’t know - Do we perform rebalance(or PME) in
> >>    2.7.0 faster
> >>     >>>>> or slower than in 2.8.0 for the same cluster?
> >>     >>>>>
> >>     >>>>> 3. We don’t have a solution for automated testing of Ignite
> >>    integration in
> >>     >>>>> a real-world environment:
> >>     >>>>> Ignite-Spark integration can be taken as an example.
> >>     >>>>> I think some ML solutions also should be tested in real-world
> >>    deployments.
> >>     >>>>>
> >>     >>>>> Solution:
> >>     >>>>>
> >>     >>>>> I propose to use duck tape library from confluent (apache 2.0
> >>    license)
> >>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on the
> >>    local
> >>     >>>>> environment(docker) and it works just fine.
> >>     >>>>>
> >>     >>>>> PoC contains following services:
> >>     >>>>>
> >>     >>>>>       * Simple rebalance test:
> >>     >>>>>               Start 2 server nodes,
> >>     >>>>>               Create some data with Ignite client,
> >>     >>>>>               Start one more server node,
> >>     >>>>>               Wait for rebalance finish
> >>     >>>>>       * Simple Ignite-Spark integration test:
> >>     >>>>>               Start 1 Spark master, start 1 Spark worker,
> >>     >>>>>               Start 1 Ignite server node
> >>     >>>>>               Create some data with Ignite client,
> >>     >>>>>               Check data in application that queries it from
> >>    Spark.
> >>     >>>>>
> >>     >>>>> All tests are fully automated.
> >>     >>>>> Logs collection works just fine.
> >>     >>>>> You can see an example of the tests report - [4].
> >>     >>>>>
> >>     >>>>> Pros:
> >>     >>>>>
> >>     >>>>> * Ability to test local changes(no need to public changes to
> >>    some remote
> >>     >>>>> repository or similar).
> >>     >>>>> * Ability to parametrize test environment(run the same tests
> >>    on different
> >>     >>>>> JDK, JVM params, config, etc.)
> >>     >>>>> * Isolation by default so system tests are as reliable as
> >>    possible.
> >>     >>>>> * Utilities for pulling up and tearing down services easily
> >>    in clusters in
> >>     >>>>> different environments (e.g. local, custom cluster, Vagrant,
> >>    K8s, Mesos,
> >>     >>>>> Docker, cloud providers, etc.)
> >>     >>>>> * Easy to write unit tests for distributed systems
> >>     >>>>> * Adopted and successfully used by other distributed open
> >>    source project -
> >>     >>>>> Apache Kafka.
> >>     >>>>> * Collect results (e.g. logs, console output)
> >>     >>>>> * Report results (e.g. expected conditions met, performance
> >>    results, etc.)
> >>     >>>>>
> >>     >>>>> WDYT?
> >>     >>>>>
> >>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>     >>>>> [2] https://github.com/confluentinc/ducktape
> >>     >>>>> [3]
> https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Maxim.

> 1. tiden can deploy artifacts by itself, while ducktape relies on dependencies being deployed by external scripts

Why do you think that maintaining deployment scripts coupled with the testing framework is an advantage?
I thought we wanted to keep and maintain deployment scripts separately from the testing framework.

> 2. tiden can execute actions over remote nodes in real parallel fashion, while ducktape internally does all actions sequentially.

Can you please clarify what actions you have in mind?
And why would we want to execute them concurrently?
Ignite node start and client application execution can be done concurrently with the ducktape approach.
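
As a minimal sketch of what that could look like (the Ignite service wrappers
and their module paths below are assumed for illustration; only
ducktape.tests.test.Test is the real base class):

from ducktape.tests.test import Test

# Assumed service wrappers, similar in spirit to the ones in the PoC branch;
# the class names and module paths below are illustrative, not the real ones.
from ignitetest.services.ignite import IgniteService
from ignitetest.services.ignite_app import IgniteApplicationService


class RebalanceUnderLoadTest(Test):
    def test_rebalance_with_concurrent_client(self):
        # Server nodes and the client application are separate services, so
        # their lifecycles can overlap: the client keeps loading data while
        # one more server node joins and rebalance starts.
        servers = IgniteService(self.test_context, num_nodes=2)
        servers.start()

        loader = IgniteApplicationService(self.test_context, num_nodes=1)
        loader.start()  # assumed to return as soon as the client process is up

        joining = IgniteService(self.test_context, num_nodes=1)
        joining.start()

        # Block only at the end, when the assertions need the results.
        loader.wait()
        joining.stop()
        servers.stop()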

> If we used ducktape solution we would have to instead prepare some deployment scripts to pre-initialize Sberbank hosts, for example, with Ansible or Chef

We shouldn’t take a single user's approach as an argument in this discussion. Let’s discuss a general approach for all users of Ignite. Anyway, what is wrong with the external deployment script approach?

We, as a community, should provide several ways to run integration tests out of the box AND the ability to customize deployment for the user's landscape.
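
To illustrate the external deployment step, here is a minimal sketch of a
plain script kept outside the testing framework (hosts, paths, and the
distribution name are placeholders); it can be swapped for Ansible, Chef, or
anything else without touching the tests themselves:

#!/usr/bin/env python3
# Minimal deployment sketch kept outside the testing framework: copy the
# Ignite distribution to every test host before the test run starts.
import subprocess

HOSTS = ["test-host-1", "test-host-2", "test-host-3"]   # placeholder hosts
DIST = "apache-ignite-2.8.0-bin.zip"                    # placeholder artifact
REMOTE_DIR = "/opt/ignite-test"                         # placeholder directory

def deploy(host):
    # Create the target directory, copy the distribution, and unpack it.
    subprocess.run(["ssh", host, "mkdir -p " + REMOTE_DIR], check=True)
    subprocess.run(["scp", DIST, host + ":" + REMOTE_DIR + "/"], check=True)
    subprocess.run(["ssh", host, "cd " + REMOTE_DIR + " && unzip -o " + DIST],
                   check=True)

if __name__ == "__main__":
    for host in HOSTS:
        deploy(host)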

> You have solved this deficiency with docker by putting all dependencies into one uber-image and that looks like simple and elegant solution however, that effectively limits you to single-host testing.

The Docker image should be used only by Ignite developers to test something locally.
It’s not intended for real-world testing.

The main issue I see with Tiden is that it has been tested and maintained as a closed-source solution.
This can lead to hard-to-solve problems when we start using and maintaining it as an open-source solution.
For example, how many developers have used Tiden? And how many of them were not authors of Tiden itself?


> On July 2, 2020, at 12:30, Max Shonichev <ms...@yandex.ru> wrote:
> 
> Anton, Nikolay,
> 
> Let's agree on what we are arguing about: whether it is about "like or don't like" or about technical properties of suggested solutions.
> 
> If it is about likes and dislikes, then the whole discussion is meaningless. However, I hope together we can analyse pros and cons carefully.
> 
> As far as I can understand now, two main differences between ducktape and tiden is that:
> 
> 1. tiden can deploy artifacts by itself, while ducktape relies on dependencies being deployed by external scripts.
> 
> 2. tiden can execute actions over remote nodes in real parallel fashion, while ducktape internally does all actions sequentially.
> 
> As for me, these are very important properties for distributed testing framework.
> 
> First property let us easily reuse tiden in existing infrastructures, for example, during Zookeeper IEP testing at Sberbank site we used the same tiden scripts that we use in our lab, the only change was putting a list of hosts into config.
> 
> If we used ducktape solution we would have to instead prepare some deployment scripts to pre-initialize Sberbank hosts, for example, with Ansible or Chef.
> 
> 
> You have solved this deficiency with docker by putting all dependencies into one uber-image and that looks like simple and elegant solution,
> however, that effectively limits you to single-host testing.
> 
> I guess we all know about docker hyped ability to run over distributed virtual networks. We used to go that way, but quickly found that it is more of the hype than real work. In real environments, there are problems with routing, DNS, multicast and broadcast traffic, and many others, that turn docker-based distributed solution into a fragile hard-to-maintain monster.
> 
> Please, if you believe otherwise, perform a run of your PoC over at least two physical hosts and share results with us.
> 
> If you consider that one physical docker host is enough, please, don't overlook that we want to run real scale scenarios, with 50-100 cache groups, persistence enabled and a millions of keys loaded.
> 
> Practical limit for such configurations is 4-6 nodes per single physical host. Otherwise, tests become flaky due to resource starvation.
> 
> Please, if you believe otherwise, perform at least a 10 of runs of your PoC with other tests running at TC (we're targeting TeamCity, right?) and share results so we could check if the numbers are reproducible.
> 
> I stress this once more: functional integration tests are OK to run in Docker and CI, but running benchmarks in Docker is a big NO GO.
> 
> 
> Second property let us write tests that require real-parallel actions over hosts.
> 
> For example, agreed scenario for PME benchmarkduring "PME optimization stream" was as follows:
> 
>  - 10 server nodes, preloaded with 1M of keys
>  - 4 client nodes perform transactional load  (client nodes physically separated from server nodes)
>  - during load:
>  -- 5 server nodes stopped in parallel
>  -- after 1 minute, all 5 nodes are started in parallel
>  - load stopped, logs are analysed for exchange times.
> 
> If we had stopped and started 5 nodes one-by-one, as ducktape does, then partition map exchange merge would not happen and we could not have measured PME optimizations for that case.
> 
> 
> These are limitations of ducktape that we believe as a more important
> argument "against" than you provide "for".
> 
> 
> 
> 
> On 30.06.2020 14:58, Anton Vinogradov wrote:
>> Folks,
>> First, I've created PR [1] with ducktests improvements
>> PR contains the following changes
>> - Pme-free switch proof-benchmark (2.7.6 vs master)
>> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
>> - Global refactoring
>> -- benchmarks javacode simplification
>> -- services python and java classes code deduplication
>> -- fail-fast checks for java and python (eg. application should explicitly write it finished with success)
>> -- simple results extraction from tests and benchmarks
>> -- javacode now configurable from tests/benchmarks
>> -- proper SIGTERM handling at javacode (eg. it may finish last operation and log results)
>> -- docker volume now marked as delegated to increase execution speed for mac & win users
>> -- Ignite cluster now start in parallel (start speed-up)
>> -- Ignite can be configured at test/benchmark
>> - full and module assembly scripts added
> Great job done! But let me remind one of Apache Ignite principles:
> week of thinking save months of development.
> 
> 
>> Second, I'd like to propose to accept ducktests [2] (ducktape integration) as a target "PoC check & real topology benchmarking tool".
>> Ducktape pros
>> - Developed for distributed system by distributed system developers.
> So does Tiden
> 
>> - Developed since 2014, stable.
> Tiden is also pretty stable, and development start date is not a good argument, for example pytest is since 2004, pytest-xdist (plugin for distributed testing) is since 2010, but we don't see it as a alternative at all.
> 
>> - Proven usability by usage at Kafka.
> Tiden is proven usable by usage at GridGain and Sberbank deployments.
> Core, storage, sql and tx teams use benchmark results provided by Tiden on a daily basis.
> 
>> - Dozens of dozens tests and benchmarks at Kafka as a great example pack.
> We'll donate some of our suites to Ignite as I've mentioned in previous letter.
> 
>> - Built-in Docker support for rapid development and checks.
> False, there's no specific 'docker support' in ducktape itself, you just wrap it in docker by yourself, because ducktape is lacking deployment abilities.
> 
>> - Great for CI automation.
> False, there's no specific CI-enabled features in ducktape. Tiden, on the other hand, provide generic xUnit reporting format, which is supported by both TeamCity and Jenkins. Also, instead of using private keys, Tiden can use SSH agent, which is also great for CI, because both
> TeamCity and Jenkins store keys in secret storage available only for ssh-agent and only for the time of the test.
> 
> 
>> > As an additional motivation, at least 3 teams
>> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix speed-up))
>> - Ignite SE Plugins team (to check plugin's features does not slow-down or broke AI features)
>> - Ignite SE QA team (to append already developed smoke/load/failover tests to AI codebase)
> 
> Please, before recommending your tests to other teams, provide proofs
> that your tests are reproducible in real environment.
> 
> 
>> now, wait for ducktest merge to start checking cases they working on in AI way.
>> Thoughts?
> Let us together review both solutions, we'll try to run your tests in our lab, and you'll try to at least checkout tiden and see if same tests can be implemented with it?
> 
> 
> 
>> [1] https://github.com/apache/ignite/pull/7967
>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org <ma...@apache.org>> wrote:
>>    Hello, Maxim.
>>    Thank you for so detailed explanation.
>>    Can we put the content of this discussion somewhere on the wiki?
>>    So It doesn’t get lost.
>>    I divide the answer in several parts. From the requirements to the
>>    implementation.
>>    So, if we agreed on the requirements we can proceed with the
>>    discussion of the implementation.
>>    1. Requirements:
>>    The main goal I want to achieve is *reproducibility* of the tests.
>>    I’m sick and tired with the zillions of flaky, rarely failed, and
>>    almost never failed tests in Ignite codebase.
>>    We should start with the simplest scenarios that will be as reliable
>>    as steel :)
>>    I want to know for sure:
>>       - Is this PR makes rebalance quicker or not?
>>       - Is this PR makes PME quicker or not?
>>    So, your description of the complex test scenario looks as a next
>>    step to me.
>>    Anyway, It’s cool we already have one.
>>    The second goal is to have a strict test lifecycle as we have in
>>    JUnit and similar frameworks.
>>     > It covers production-like deployment and running a scenarios over
>>    a single database instance.
>>    Do you mean «single cluster» or «single host»?
>>    2. Existing tests:
>>     > A Combinator suite allows to run set of operations concurrently
>>    over given database instance.
>>     > A Consumption suite allows to run a set production-like actions
>>    over given set of Ignite/GridGain versions and compare test metrics
>>    across versions
>>     > A Yardstick suite
>>     > A Stress suite that simulates hardware environment degradation
>>     > An Ultimate, DR and Compatibility suites that performs functional
>>    regression testing
>>     > Regression
>>    Great news that we already have so many choices for testing!
>>    Mature test base is a big +1 for Tiden.
>>    3. Comparison:
>>     > Criteria: Test configuration
>>     > Ducktape: single JSON string for all tests
>>     > Tiden: any number of YaML config files, command line option for
>>    fine-grained test configuration, ability to select/modify tests
>>    behavior based on Ignite version.
>>    1. Many YAML files can be hard to maintain.
>>    2. In ducktape, you can set parameters via «—parameters» option.
>>    Please, take a look at the doc [1]
>>     > Criteria: Cluster control
>>     > Tiden: additionally can address cluster as a whole and execute
>>    remote commands in parallel.
>>    It seems we implement this ability in the PoC, already.
>>     > Criteria: Test assertions
>>     > Tiden: simple asserts, also few customized assertion helpers.
>>     > Ducktape: simple asserts.
>>    Can you, please, be more specific.
>>    What helpers do you have in mind?
>>    Ducktape has an asserts that waits for logfile messages or some
>>    process finish.
>>     > Criteria: Test reporting
>>     > Ducktape: limited to its own text/HTML format
>>    Ducktape have
>>    1. Text reporter
>>    2. Customizable HTML reporter
>>    3. JSON reporter.
>>    We can show JSON with the any template or tool.
>>     > Criteria: Provisioning and deployment
>>     > Ducktape: can provision subset of hosts from cluster for test
>>    needs. However, that means, that test can’t be scaled without test
>>    code changes. Does not do any deploy, relies on external means, e.g.
>>    pre-packaged in docker image, as in PoC.
>>    This is not true.
>>    1. We can set explicit test parameters(node number) via parameters.
>>    We can increase client count of cluster size without test code changes.
>>    2. We have many choices for the test environment. These choices are
>>    tested and used in other projects:
>>             * docker
>>             * vagrant
>>             * private cloud(ssh access)
>>             * ec2
>>    Please, take a look at Kafka documentation [2]
>>     > I can continue more on this, but it should be enough for now:
>>    We need to go deeper! :)
>>    [1]
>>    https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>>    [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>>     > On June 9, 2020, at 17:25, Max A. Shonichev <mshonich@yandex.ru
>>    <ma...@yandex.ru>> wrote:
>>     >
>>     > Greetings, Nikolay,
>>     >
>>     > First of all, thank you for you great effort preparing PoC of
>>    integration testing to Ignite community.
>>     >
>>     > It’s a shame Ignite did not have at least some such tests yet,
>>    however, GridGain, as a major contributor to Apache Ignite had a
>>    profound collection of in-house tools to perform integration and
>>    performance testing for years already and while we slowly consider
>>    sharing our expertise with the community, your initiative makes us
>>    drive that process a bit faster, thanks a lot!
>>     >
>>     > I reviewed your PoC and want to share a little about what we do
>>    on our part, why and how, hope it would help community take proper
>>    course.
>>     >
>>     > First I’ll do a brief overview of what decisions we made and what
>>    we do have in our private code base, next I’ll describe what we have
>>    already donated to the public and what we plan public next, then
>>    I’ll compare both approaches highlighting deficiencies in order to
>>    spur public discussion on the matter.
>>     >
>>     > It might seem strange to use Python to run Bash to run Java
>>    applications because that introduces IT industry best of breed’ –
>>    the Python dependency hell – to the Java application code base. The
>>    only strangest decision one can made is to use Maven to run Docker
>>    to run Bash to run Python to run Bash to run Java, but desperate
>>    times call for desperate measures I guess.
>>     >
>>     > There are Java-based solutions for integration testing exists,
>>    e.g. Testcontainers [1], Arquillian [2], etc, and they might go well
>>    for Ignite community CI pipelines by them selves. But we also wanted
>>    to run performance tests and benchmarks, like the dreaded PME
>>    benchmark, and this is solved by totally different set of tools in
>>    Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>>     >
>>     > Speaking specifically about benchmarking, Apache Ignite community
>>    already has Yardstick [6], and there’s nothing wrong with writing
>>    PME benchmark using Yardstick, but we also wanted to be able to run
>>    scenarios like this:
>>     > - put an X load to a Ignite database;
>>     > - perform an Y set of operations to check how Ignite copes with
>>    operations under load.
>>     >
>>     > And yes, we also wanted applications under test be deployed ‘like
>>    in a production’, e.g. distributed over a set of hosts. This arises
>>    questions about provisioning and nodes affinity which I’ll cover in
>>    detail later.
>>     >
>>     > So we decided to put a little effort to build a simple tool to
>>    cover different integration and performance scenarios, and our QA
>>    lab first attempt was PoC-Tester [7], currently open source for all
>>    but for reporting web UI. It’s a quite simple to use 95% Java-based
>>    tool targeted to be run on a pre-release QA stage.
>>     >
>>     > It covers production-like deployment and running a scenarios over
>>    a single database instance. PoC-Tester scenarios consists of a
>>    sequence of tasks running sequentially or in parallel. After all
>>    tasks complete, or at any time during test, user can run logs
>>    collection task, logs are checked against exceptions and a summary
>>    of found issues and task ops/latency statistics is generated at the
>>    end of scenario. One of the main PoC-Tester features is its
>>    fire-and-forget approach to task managing. That is, you can deploy a
>>    grid and left it running for weeks, periodically firing some tasks
>>    onto it.
>>     >
>>     > During earliest stages of PoC-Tester development it becomes quite
>>    clear that Java application development is a tedious process and
>>    architecture decisions you take during development are slow and hard
>>    to change.
>>     > For example, scenarios like this
>>     > - deploy two instances of GridGain with master-slave data
>>    replication configured;
>>     > - put a load on master;
>>     > - perform checks on slave,
>>     > or like this:
>>     > - preload a 1Tb of data by using your favorite tool of choice to
>>    an Apache Ignite of version X;
>>     > - run a set of functional tests running Apache Ignite version Y
>>    over preloaded data,
>>     > do not fit well in the PoC-Tester workflow.
>>     >
>>     > So, this is why we decided to use Python as a generic scripting
>>    language of choice.
>>     >
>>     > Pros:
>>     > - quicker prototyping and development cycles
>>     > - easier to find DevOps/QA engineer with Python skills than one
>>    with Java skills
>>     > - used extensively all over the world for DevOps/CI pipelines and
>>    thus has rich set of libraries for all possible integration uses cases.
>>     >
>>     > Cons:
>>     > - Nightmare with dependencies. Better stick to specific
>>    language/libraries version.
>>     >
>>     > Comparing alternatives for Python-based testing framework we have
>>    considered following requirements, somewhat similar to what you’ve
>>    mentioned for Confluent [8] previously:
>>     > - should be able run locally or distributed (bare metal or in the
>>    cloud)
>>     > - should have built-in deployment facilities for applications
>>    under test
>>     > - should separate test configuration and test code
>>     > -- be able to easily reconfigure tests by simple configuration
>>    changes
>>     > -- be able to easily scale test environment by simple
>>    configuration changes
>>     > -- be able to perform regression testing by simple switching
>>    artifacts under test via configuration
>>     > -- be able to run tests with different JDK version by simple
>>    configuration changes
>>     > - should have human readable reports and/or reporting tools
>>    integration
>>     > - should allow simple test progress monitoring, one does not want
>>    to run 6-hours test to find out that application actually crashed
>>    during first hour.
>>     > - should allow parallel execution of test actions
>>     > - should have clean API for test writers
>>     > -- clean API for distributed remote commands execution
>>     > -- clean API for deployed applications start / stop and other
>>    operations
>>     > -- clean API for performing check on results
>>     > - should be open source or at least source code should allow ease
>>    change or extension
>>     >
>>     > Back at that time we found no better alternative than to write
>>    our own framework, and here goes Tiden [9] as GridGain framework of
>>    choice for functional integration and performance testing.
>>     >
>>     > Pros:
>>     > - solves all the requirements above
>>     > Cons (for Ignite):
>>     > - (currently) closed GridGain source
>>     >
>>     > On top of Tiden we’ve built a set of test suites, some of which
>>    you might have heard already.
>>     >
>>     > A Combinator suite allows to run set of operations concurrently
>>    over given database instance. Proven to find at least 30+ race
>>    conditions and NPE issues.
>>     >
>>     > A Consumption suite allows to run a set production-like actions
>>    over given set of Ignite/GridGain versions and compare test metrics
>>    across versions, like heap/disk/CPU consumption, time to perform
>>    actions, like client PME, server PME, rebalancing time, data
>>    replication time, etc.
>>     >
>>     > A Yardstick suite is a thin layer of Python glue code to run
>>    Apache Ignite pre-release benchmarks set. Yardstick itself has a
>>    mediocre deployment capabilities, Tiden solves this easily.
>>     >
>>     > A Stress suite that simulates hardware environment degradation
>>    during testing.
>>     >
>>     > An Ultimate, DR and Compatibility suites that performs functional
>>    regression testing of GridGain Ultimate Edition features like
>>    snapshots, security, data replication, rolling upgrades, etc.
>>     >
>>     > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
>>    etc, etc, etc.
>>     >
>>     > Most of the suites above use another in-house developed Java tool
>>    – PiClient – to perform actual loading and miscellaneous operations
>>    with Ignite under test. We use py4j Python-Java gateway library to
>>    control PiClient instances from the tests.
>>     >
>>     > When we considered CI, we put TeamCity out of scope, because
>>    distributed integration and performance tests tend to run for hours
>>    and TeamCity agents are scarce and costly resource. So, bundled with
>>    Tiden there is jenkins-job-builder [10] based CI pipelines and
>>    Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates test
>>    run reports across versions and has built in visualization support
>>    for Combinator suite.
>>     >
>>     > All of the above is currently closed source, but we plan to make
>>    it public for community, and publishing Tiden core [9] is the first
>>    step on that way. You can review some examples of using Tiden for
>>    tests at my repository [11], for start.
>>     >
>>     > Now, let’s compare Ducktape PoC and Tiden.
>>     >
>>     > Criteria: Language
>>     > Tiden: Python, 3.7
>>     > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>>    compatible, but actually can’t work with Python 3.7 due to broken
>>    Zmq dependency.
>>     > Comment: Python 3.7 has a much better support for async-style
>>    code which might be crucial for distributed application testing.
>>     > Score: Tiden: 1, Ducktape: 0
>>     >
>>     > Criteria: Test writers API
>>     > Supported integration test framework concepts are basically the same:
>>     > - a test controller (test runner)
>>     > - a cluster
>>     > - a node
>>     > - an application (a service in Ducktape terms)
>>     > - a test
>>     > Score: Tiden: 5, Ducktape: 5
>>     >
>>     > Criteria: Tests selection and run
>>     > Ducktape: suite-package-class-method level selection, internal
>>    scheduler allows to run tests in suite in parallel.
>>     > Tiden: also suite-package-class-method level selection,
>>    additionally allows selecting subset of tests by attribute, parallel
>>    runs not built in, but allows merging test reports after different runs.
>>     > Score: Tiden: 2, Ducktape: 2
>>     >
>>     > Criteria: Test configuration
>>     > Ducktape: single JSON string for all tests
>>     > Tiden: any number of YaML config files, command line option for
>>    fine-grained test configuration, ability to select/modify tests
>>    behavior based on Ignite version.
>>     > Score: Tiden: 3, Ducktape: 1
>>     >
>>     > Criteria: Cluster control
>>     > Ducktape: allow execute remote commands by node granularity
>>     > Tiden: additionally can address cluster as a whole and execute
>>    remote commands in parallel.
>>     > Score: Tiden: 2, Ducktape: 1
>>     >
>>     > Criteria: Logs control
>>     > Both frameworks have similar builtin support for remote logs
>>    collection and grepping. Tiden has built-in plugin that can zip,
>>    collect arbitrary log files from arbitrary locations at
>>    test/module/suite granularity and unzip if needed, also application
>>    API to search / wait for messages in logs. Ducktape allows each
>>    service declare its log files location (seemingly does not support
>>    logs rollback), and a single entrypoint to collect service logs.
>>     > Score: Tiden: 1, Ducktape: 1
>>     >
>>     > Criteria: Test assertions
>>     > Tiden: simple asserts, also few customized assertion helpers.
>>     > Ducktape: simple asserts.
>>     > Score: Tiden: 2, Ducktape: 1
>>     >
>>     > Criteria: Test reporting
>>     > Ducktape: limited to its own text/html format
>>     > Tiden: provides text report, yaml report for reporting tools
>>    integration, XML xUnit report for integration with Jenkins/TeamCity.
>>     > Score: Tiden: 3, Ducktape: 1
>>     >
>>     > Criteria: Provisioning and deployment
>>     > Ducktape: can provision subset of hosts from cluster for test
>>    needs. However, that means, that test can’t be scaled without test
>>    code changes. Does not do any deploy, relies on external means, e.g.
>>    pre-packaged in docker image, as in PoC.
>>     > Tiden: Given a set of hosts, Tiden uses all of them for the test.
>>    Provisioning should be done by external means. However, provides a
>>    conventional automated deployment routines.
>>     > Score: Tiden: 1, Ducktape: 1
>>     >
>>     > Criteria: Documentation and Extensibility
>>     > Tiden: current API documentation is limited, should change as we
>>    go open source. Tiden is easily extensible via hooks and plugins,
>>    see example Maven plugin and Gatling application at [11].
>>     > Ducktape: basic documentation at readthedocs.io
>>    <http://readthedocs.io>. Codebase is rigid, framework core is
>>    tightly coupled and hard to change. The only possible extension
>>    mechanism is fork-and-rewrite.
>>     > Score: Tiden: 2, Ducktape: 1
>>     >
>>     > I can continue more on this, but it should be enough for now:
>>     > Overall score: Tiden: 22, Ducktape: 14.
>>     >
>>     > Time for discussion!
>>     >
>>     > ---
>>     > [1] - https://www.testcontainers.org/
>>     > [2] - http://arquillian.org/guides/getting_started/
>>     > [3] - https://jmeter.apache.org/index.html
>>     > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>>     > [5] - https://gatling.io/docs/current/
>>     > [6] - https://github.com/gridgain/yardstick
>>     > [7] - https://github.com/gridgain/poc-tester
>>     > [8] -
>>    https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
>>     > [9] - https://github.com/gridgain/tiden
>>     > [10] - https://pypi.org/project/jenkins-job-builder/
>>     > [11] - https://github.com/mshonichev/tiden_examples
>>     >
>>     > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>>     >> Hello,
>>     >>
>>     >> Branch with duck tape created -
>>    https://github.com/apache/ignite/tree/ignite-ducktape
>>     >>
>>     >> Any who are willing to contribute to PoC are welcome.
>>     >>
>>     >>
>>     >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
>>    <nizhikov.dev@gmail.com <ma...@gmail.com>> написал(а):
>>     >>>
>>     >>> Hello, Denis.
>>     >>>
>>     >>> There is no rush with these improvements.
>>     >>> We can wait for Maxim proposal and compare two solutions :)
>>     >>>
>>     >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
>>    <ma...@apache.org>> написал(а):
>>     >>>>
>>     >>>> Hi Nikolay,
>>     >>>>
>>     >>>> Thanks for kicking off this conversation and sharing your
>>    findings with the
>>     >>>> results. That's the right initiative. I do agree that Ignite
>>    needs to have
>>     >>>> an integration testing framework with capabilities listed by you.
>>     >>>>
>>     >>>> As we discussed privately, I would only check if instead of
>>     >>>> Confluent's Ducktape library, we can use an integration
>>    testing framework
>>     >>>> developed by GridGain for testing of Ignite/GridGain clusters.
>>    That
>>     >>>> framework has been battle-tested and might be more convenient for
>>     >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>     >>>> <mshonichev@gridgain.com <ma...@gridgain.com>> who
>>    promised to join this thread once he finishes
>>     >>>> preparing the usage examples of the framework. To my
>>    knowledge, Max has
>>     >>>> already been working on that for several days.
>>     >>>>
>>     >>>> -
>>     >>>> Denis
>>     >>>>
>>     >>>>
>>     >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>>    <nizhikov@apache.org <ma...@apache.org>>
>>     >>>> wrote:
>>     >>>>
>>     >>>>> Hello, Igniters.
>>     >>>>>
>>     >>>>> I created a PoC [1] for the integration tests of Ignite.
>>     >>>>>
>>     >>>>> Let me briefly explain the gap I want to cover:
>>     >>>>>
>>     >>>>> 1. For now, we don’t have a solution for automated testing of
>>    Ignite on
>>     >>>>> «real cluster».
>>     >>>>> By «real cluster» I mean cluster «like a production»:
>>     >>>>>       * client and server nodes deployed on different hosts.
>>     >>>>>       * thin clients perform queries from some other hosts
>>     >>>>>       * etc.
>>     >>>>>
>>     >>>>> 2. We don’t have a solution for automated benchmarks of some
>>    internal
>>     >>>>> Ignite process
>>     >>>>>       * PME
>>     >>>>>       * rebalance.
>>     >>>>> This means we don’t know - Do we perform rebalance(or PME) in
>>    2.7.0 faster
>>     >>>>> or slower than in 2.8.0 for the same cluster?
>>     >>>>>
>>     >>>>> 3. We don’t have a solution for automated testing of Ignite
>>    integration in
>>     >>>>> a real-world environment:
>>     >>>>> Ignite-Spark integration can be taken as an example.
>>     >>>>> I think some ML solutions also should be tested in real-world
>>    deployments.
>>     >>>>>
>>     >>>>> Solution:
>>     >>>>>
>>     >>>>> I propose to use duck tape library from confluent (apache 2.0
>>    license)
>>     >>>>> I tested it both on the real cluster(Yandex Cloud) and on the
>>    local
>>     >>>>> environment(docker) and it works just fine.
>>     >>>>>
>>     >>>>> PoC contains following services:
>>     >>>>>
>>     >>>>>       * Simple rebalance test:
>>     >>>>>               Start 2 server nodes,
>>     >>>>>               Create some data with Ignite client,
>>     >>>>>               Start one more server node,
>>     >>>>>               Wait for rebalance finish
>>     >>>>>       * Simple Ignite-Spark integration test:
>>     >>>>>               Start 1 Spark master, start 1 Spark worker,
>>     >>>>>               Start 1 Ignite server node
>>     >>>>>               Create some data with Ignite client,
>>     >>>>>               Check data in application that queries it from
>>    Spark.
>>     >>>>>
>>     >>>>> All tests are fully automated.
>>     >>>>> Logs collection works just fine.
>>     >>>>> You can see an example of the tests report - [4].
>>     >>>>>
>>     >>>>> Pros:
>>     >>>>>
>>     >>>>> * Ability to test local changes(no need to public changes to
>>    some remote
>>     >>>>> repository or similar).
>>     >>>>> * Ability to parametrize test environment(run the same tests
>>    on different
>>     >>>>> JDK, JVM params, config, etc.)
>>     >>>>> * Isolation by default so system tests are as reliable as
>>    possible.
>>     >>>>> * Utilities for pulling up and tearing down services easily
>>    in clusters in
>>     >>>>> different environments (e.g. local, custom cluster, Vagrant,
>>    K8s, Mesos,
>>     >>>>> Docker, cloud providers, etc.)
>>     >>>>> * Easy to write unit tests for distributed systems
>>     >>>>> * Adopted and successfully used by other distributed open
>>    source project -
>>     >>>>> Apache Kafka.
>>     >>>>> * Collect results (e.g. logs, console output)
>>     >>>>> * Report results (e.g. expected conditions met, performance
>>    results, etc.)
>>     >>>>>
>>     >>>>> WDYT?
>>     >>>>>
>>     >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>     >>>>> [2] https://github.com/confluentinc/ducktape
>>     >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>     >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg


Re: [DISCUSSION] Ignite integration testing framework.

Posted by Max Shonichev <ms...@yandex.ru>.
Anton, Nikolay,

Let's agree on what we are arguing about: whether it is about "like or 
don't like" or about the technical properties of the suggested solutions.

If it is about likes and dislikes, then the whole discussion is 
meaningless. However, I hope together we can analyse the pros and cons 
carefully.

As far as I can understand now, the two main differences between 
ducktape and Tiden are that:

1. Tiden can deploy artifacts by itself, while ducktape relies on 
dependencies being deployed by external scripts.

2. Tiden can execute actions over remote nodes in a truly parallel fashion, 
while ducktape internally performs all actions sequentially.

As for me, these are very important properties for a distributed testing 
framework.

The first property lets us easily reuse Tiden in existing infrastructures. 
For example, during Zookeeper IEP testing at the Sberbank site we used the 
same Tiden scripts that we use in our lab; the only change was putting a 
list of hosts into the config.

With a ducktape-based solution we would instead have to prepare 
deployment scripts to pre-initialize the Sberbank hosts, for example with 
Ansible or Chef.


You have solved this deficiency with Docker by putting all dependencies 
into one uber-image, and that looks like a simple and elegant solution;
however, it effectively limits you to single-host testing.

I guess we all know about Docker's much-hyped ability to run over distributed 
virtual networks. We used to go that way, but quickly found that it is 
more hype than real work. In real environments there are 
problems with routing, DNS, multicast and broadcast traffic, and many 
others, that turn a Docker-based distributed setup into a fragile, 
hard-to-maintain monster.

Please, if you believe otherwise, perform a run of your PoC over at 
least two physical hosts and share the results with us.

If you consider one physical Docker host to be enough, please don't 
overlook that we want to run real-scale scenarios, with 50-100 cache 
groups, persistence enabled and millions of keys loaded.

The practical limit for such configurations is 4-6 nodes per physical 
host. Otherwise, tests become flaky due to resource starvation.

Please, if you believe otherwise, perform at least 10 runs of your 
PoC with other tests running at TC (we're targeting TeamCity, right?) 
and share the results so we can check whether the numbers are reproducible.

I stress this once more: functional integration tests are OK to run in 
Docker and CI, but running benchmarks in Docker is a big NO GO.


The second property lets us write tests that require truly parallel 
actions across hosts.

For example, the agreed scenario for the PME benchmark during the "PME 
optimization stream" was as follows:

  - 10 server nodes, preloaded with 1M of keys
  - 4 client nodes perform transactional load (client nodes physically 
separated from server nodes)
  - during load:
  -- 5 server nodes stopped in parallel
  -- after 1 minute, all 5 nodes are started in parallel
  - load stopped, logs are analysed for exchange times.

If we had stopped and started 5 nodes one-by-one, as ducktape does, then 
partition map exchange merge would not happen and we could not have 
measured PME optimizations for that case.
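
For illustration only, here is a minimal Python sketch of the kind of 
parallel fan-out this scenario needs. It is not tied to ducktape or Tiden 
APIs; stop_node/start_node are hypothetical helpers standing in for 
whatever remote-execution primitive the framework provides:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def stop_node(host):
        # hypothetical helper: e.g. kill the Ignite process on `host` over SSH
        ...

    def start_node(host):
        # hypothetical helper: e.g. run ignite.sh on `host` over SSH
        ...

    victims = ["server-6", "server-7", "server-8", "server-9", "server-10"]

    # Fire all stops at (almost) the same moment, not one by one,
    # so that the resulting partition map exchanges can merge.
    with ThreadPoolExecutor(max_workers=len(victims)) as pool:
        list(pool.map(stop_node, victims))

    time.sleep(60)  # keep the 5 nodes down for a minute, as in the scenario

    with ThreadPoolExecutor(max_workers=len(victims)) as pool:
        list(pool.map(start_node, victims))

A sequential loop over the same hosts would serialize the exchanges and 
hide exactly the merge behaviour the benchmark is trying to measure.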


We believe these limitations of ducktape are a more important argument 
"against" than the points you provide "for".




On 30.06.2020 14:58, Anton Vinogradov wrote:
> Folks,
> First, I've created PR [1] with ducktests improvements
> 
> PR contains the following changes
> - Pme-free switch proof-benchmark (2.7.6 vs master)
> - Ability to check (compare with) previous releases (eg. 2.7.6 & 2.8)
> - Global refactoring
> -- benchmarks javacode simplification
> -- services python and java classes code deduplication
> -- fail-fast checks for java and python (eg. application should 
> explicitly write it finished with success)
> -- simple results extraction from tests and benchmarks
> -- javacode now configurable from tests/benchmarks
> -- proper SIGTERM handling at javacode (eg. it may finish last operation 
> and log results)
> -- docker volume now marked as delegated to increase execution speed for 
> mac & win users
> -- Ignite cluster now start in parallel (start speed-up)
> -- Ignite can be configured at test/benchmark
> - full and module assembly scripts added
Great job done! But let me remind you of one of the Apache Ignite principles:
a week of thinking saves months of development.


> 
> Second, I'd like to propose to accept ducktests [2] (ducktape 
> integration) as a target "PoC check & real topology benchmarking tool".
> 
> Ducktape pros
> - Developed for distributed system by distributed system developers.
So does Tiden

> - Developed since 2014, stable.
Tiden is also pretty stable, and the development start date is not a good 
argument: for example, pytest has existed since 2004 and pytest-xdist (a 
plugin for distributed testing) since 2010, but we don't see them as an 
alternative at all.

> - Proven usability by usage at Kafka.
Tiden has proven its usability in GridGain and Sberbank deployments.
The core, storage, SQL and TX teams use benchmark results provided by Tiden 
on a daily basis.

> - Dozens of dozens tests and benchmarks at Kafka as a great example pack.
We'll donate some of our suites to Ignite, as I mentioned in the previous 
letter.

> - Built-in Docker support for rapid development and checks.
False: there is no specific 'docker support' in ducktape itself, you just 
wrap it in Docker yourself, because ducktape lacks deployment 
abilities.

> - Great for CI automation.
False: there are no CI-specific features in ducktape. Tiden, on 
the other hand, provides a generic xUnit reporting format, which is 
supported by both TeamCity and Jenkins. Also, instead of using private 
keys, Tiden can use an SSH agent, which is also great for CI, because both
TeamCity and Jenkins keep keys in secret storage that is available only to 
the ssh-agent and only for the duration of the test.
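
As a reference point for readers unfamiliar with the format, below is a 
minimal Python sketch of the kind of xUnit (JUnit-style) XML such a 
reporter can emit and that both Jenkins and TeamCity parse out of the box. 
It is only an illustration, not Tiden's actual reporter:

    import xml.etree.ElementTree as ET

    def write_xunit_report(results, path="report.xml"):
        # results: list of dicts like {"name": ..., "time": ..., "error": ... or None}
        failed = sum(1 for r in results if r["error"])
        suite = ET.Element("testsuite", name="ignite-integration",
                           tests=str(len(results)), failures=str(failed))
        for r in results:
            case = ET.SubElement(suite, "testcase",
                                 name=r["name"], time=str(r["time"]))
            if r["error"]:
                ET.SubElement(case, "failure", message=r["error"]).text = r["error"]
        ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)

    write_xunit_report([
        {"name": "test_pme_benchmark", "time": 420.0, "error": None},
        {"name": "test_rebalance", "time": 310.5, "error": "rebalance timed out"},
    ])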


>  > As an additional motivation, at least 3 teams
> - IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix 
> speed-up))
> - Ignite SE Plugins team (to check plugin's features does not slow-down 
> or broke AI features)
> - Ignite SE QA team (to append already developed smoke/load/failover 
> tests to AI codebase)

Please, before recommending your tests to other teams, provide proof
that your tests are reproducible in a real environment.


> now, wait for ducktest merge to start checking cases they working on in 
> AI way.
> 
> Thoughts?
> 
Let us review both solutions together: we'll try to run your tests in 
our lab, and you'll try to at least check out Tiden and see whether the 
same tests can be implemented with it.



> [1] https://github.com/apache/ignite/pull/7967
> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> 
> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhikov@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Hello, Maxim.
> 
>     Thank you for so detailed explanation.
> 
>     Can we put the content of this discussion somewhere on the wiki?
>     So It doesn’t get lost.
> 
>     I divide the answer in several parts. From the requirements to the
>     implementation.
>     So, if we agreed on the requirements we can proceed with the
>     discussion of the implementation.
> 
>     1. Requirements:
> 
>     The main goal I want to achieve is *reproducibility* of the tests.
>     I’m sick and tired with the zillions of flaky, rarely failed, and
>     almost never failed tests in Ignite codebase.
>     We should start with the simplest scenarios that will be as reliable
>     as steel :)
> 
>     I want to know for sure:
>        - Is this PR makes rebalance quicker or not?
>        - Is this PR makes PME quicker or not?
> 
>     So, your description of the complex test scenario looks as a next
>     step to me.
> 
>     Anyway, It’s cool we already have one.
> 
>     The second goal is to have a strict test lifecycle as we have in
>     JUnit and similar frameworks.
> 
>      > It covers production-like deployment and running a scenarios over
>     a single database instance.
> 
>     Do you mean «single cluster» or «single host»?
> 
>     2. Existing tests:
> 
>      > A Combinator suite allows to run set of operations concurrently
>     over given database instance.
>      > A Consumption suite allows to run a set production-like actions
>     over given set of Ignite/GridGain versions and compare test metrics
>     across versions
>      > A Yardstick suite
>      > A Stress suite that simulates hardware environment degradation
>      > An Ultimate, DR and Compatibility suites that performs functional
>     regression testing
>      > Regression
> 
>     Great news that we already have so many choices for testing!
>     Mature test base is a big +1 for Tiden.
> 
>     3. Comparison:
> 
>      > Criteria: Test configuration
>      > Ducktape: single JSON string for all tests
>      > Tiden: any number of YaML config files, command line option for
>     fine-grained test configuration, ability to select/modify tests
>     behavior based on Ignite version.
> 
>     1. Many YAML files can be hard to maintain.
>     2. In ducktape, you can set parameters via «—parameters» option.
>     Please, take a look at the doc [1]
> 
>      > Criteria: Cluster control
>      > Tiden: additionally can address cluster as a whole and execute
>     remote commands in parallel.
> 
>     It seems we implement this ability in the PoC, already.
> 
>      > Criteria: Test assertions
>      > Tiden: simple asserts, also few customized assertion helpers.
>      > Ducktape: simple asserts.
> 
>     Can you, please, be more specific.
>     What helpers do you have in mind?
>     Ducktape has an asserts that waits for logfile messages or some
>     process finish.
> 
>      > Criteria: Test reporting
>      > Ducktape: limited to its own text/HTML format
> 
>     Ducktape have
>     1. Text reporter
>     2. Customizable HTML reporter
>     3. JSON reporter.
> 
>     We can show JSON with the any template or tool.
> 
>      > Criteria: Provisioning and deployment
>      > Ducktape: can provision subset of hosts from cluster for test
>     needs. However, that means, that test can’t be scaled without test
>     code changes. Does not do any deploy, relies on external means, e.g.
>     pre-packaged in docker image, as in PoC.
> 
>     This is not true.
> 
>     1. We can set explicit test parameters(node number) via parameters.
>     We can increase client count of cluster size without test code changes.
> 
>     2. We have many choices for the test environment. These choices are
>     tested and used in other projects:
>              * docker
>              * vagrant
>              * private cloud(ssh access)
>              * ec2
>     Please, take a look at Kafka documentation [2]
> 
>      > I can continue more on this, but it should be enough for now:
> 
>     We need to go deeper! :)
> 
>     [1]
>     https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
>     [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> 
>      > 9 июня 2020 г., в 17:25, Max A. Shonichev <mshonich@yandex.ru
>     <ma...@yandex.ru>> написал(а):
>      >
>      > Greetings, Nikolay,
>      >
>      > First of all, thank you for you great effort preparing PoC of
>     integration testing to Ignite community.
>      >
>      > It’s a shame Ignite did not have at least some such tests yet,
>     however, GridGain, as a major contributor to Apache Ignite had a
>     profound collection of in-house tools to perform integration and
>     performance testing for years already and while we slowly consider
>     sharing our expertise with the community, your initiative makes us
>     drive that process a bit faster, thanks a lot!
>      >
>      > I reviewed your PoC and want to share a little about what we do
>     on our part, why and how, hope it would help community take proper
>     course.
>      >
>      > First I’ll do a brief overview of what decisions we made and what
>     we do have in our private code base, next I’ll describe what we have
>     already donated to the public and what we plan public next, then
>     I’ll compare both approaches highlighting deficiencies in order to
>     spur public discussion on the matter.
>      >
>      > It might seem strange to use Python to run Bash to run Java
>     applications because that introduces IT industry best of breed’ –
>     the Python dependency hell – to the Java application code base. The
>     only strangest decision one can made is to use Maven to run Docker
>     to run Bash to run Python to run Bash to run Java, but desperate
>     times call for desperate measures I guess.
>      >
>      > There are Java-based solutions for integration testing exists,
>     e.g. Testcontainers [1], Arquillian [2], etc, and they might go well
>     for Ignite community CI pipelines by them selves. But we also wanted
>     to run performance tests and benchmarks, like the dreaded PME
>     benchmark, and this is solved by totally different set of tools in
>     Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
>      >
>      > Speaking specifically about benchmarking, Apache Ignite community
>     already has Yardstick [6], and there’s nothing wrong with writing
>     PME benchmark using Yardstick, but we also wanted to be able to run
>     scenarios like this:
>      > - put an X load to a Ignite database;
>      > - perform an Y set of operations to check how Ignite copes with
>     operations under load.
>      >
>      > And yes, we also wanted applications under test be deployed ‘like
>     in a production’, e.g. distributed over a set of hosts. This arises
>     questions about provisioning and nodes affinity which I’ll cover in
>     detail later.
>      >
>      > So we decided to put a little effort to build a simple tool to
>     cover different integration and performance scenarios, and our QA
>     lab first attempt was PoC-Tester [7], currently open source for all
>     but for reporting web UI. It’s a quite simple to use 95% Java-based
>     tool targeted to be run on a pre-release QA stage.
>      >
>      > It covers production-like deployment and running a scenarios over
>     a single database instance. PoC-Tester scenarios consists of a
>     sequence of tasks running sequentially or in parallel. After all
>     tasks complete, or at any time during test, user can run logs
>     collection task, logs are checked against exceptions and a summary
>     of found issues and task ops/latency statistics is generated at the
>     end of scenario. One of the main PoC-Tester features is its
>     fire-and-forget approach to task managing. That is, you can deploy a
>     grid and left it running for weeks, periodically firing some tasks
>     onto it.
>      >
>      > During earliest stages of PoC-Tester development it becomes quite
>     clear that Java application development is a tedious process and
>     architecture decisions you take during development are slow and hard
>     to change.
>      > For example, scenarios like this
>      > - deploy two instances of GridGain with master-slave data
>     replication configured;
>      > - put a load on master;
>      > - perform checks on slave,
>      > or like this:
>      > - preload a 1Tb of data by using your favorite tool of choice to
>     an Apache Ignite of version X;
>      > - run a set of functional tests running Apache Ignite version Y
>     over preloaded data,
>      > do not fit well in the PoC-Tester workflow.
>      >
>      > So, this is why we decided to use Python as a generic scripting
>     language of choice.
>      >
>      > Pros:
>      > - quicker prototyping and development cycles
>      > - easier to find DevOps/QA engineer with Python skills than one
>     with Java skills
>      > - used extensively all over the world for DevOps/CI pipelines and
>     thus has rich set of libraries for all possible integration uses cases.
>      >
>      > Cons:
>      > - Nightmare with dependencies. Better stick to specific
>     language/libraries version.
>      >
>      > Comparing alternatives for Python-based testing framework we have
>     considered following requirements, somewhat similar to what you’ve
>     mentioned for Confluent [8] previously:
>      > - should be able run locally or distributed (bare metal or in the
>     cloud)
>      > - should have built-in deployment facilities for applications
>     under test
>      > - should separate test configuration and test code
>      > -- be able to easily reconfigure tests by simple configuration
>     changes
>      > -- be able to easily scale test environment by simple
>     configuration changes
>      > -- be able to perform regression testing by simple switching
>     artifacts under test via configuration
>      > -- be able to run tests with different JDK version by simple
>     configuration changes
>      > - should have human readable reports and/or reporting tools
>     integration
>      > - should allow simple test progress monitoring, one does not want
>     to run 6-hours test to find out that application actually crashed
>     during first hour.
>      > - should allow parallel execution of test actions
>      > - should have clean API for test writers
>      > -- clean API for distributed remote commands execution
>      > -- clean API for deployed applications start / stop and other
>     operations
>      > -- clean API for performing check on results
>      > - should be open source or at least source code should allow ease
>     change or extension
>      >
>      > Back at that time we found no better alternative than to write
>     our own framework, and here goes Tiden [9] as GridGain framework of
>     choice for functional integration and performance testing.
>      >
>      > Pros:
>      > - solves all the requirements above
>      > Cons (for Ignite):
>      > - (currently) closed GridGain source
>      >
>      > On top of Tiden we’ve built a set of test suites, some of which
>     you might have heard already.
>      >
>      > A Combinator suite allows to run set of operations concurrently
>     over given database instance. Proven to find at least 30+ race
>     conditions and NPE issues.
>      >
>      > A Consumption suite allows to run a set production-like actions
>     over given set of Ignite/GridGain versions and compare test metrics
>     across versions, like heap/disk/CPU consumption, time to perform
>     actions, like client PME, server PME, rebalancing time, data
>     replication time, etc.
>      >
>      > A Yardstick suite is a thin layer of Python glue code to run
>     Apache Ignite pre-release benchmarks set. Yardstick itself has a
>     mediocre deployment capabilities, Tiden solves this easily.
>      >
>      > A Stress suite that simulates hardware environment degradation
>     during testing.
>      >
>      > An Ultimate, DR and Compatibility suites that performs functional
>     regression testing of GridGain Ultimate Edition features like
>     snapshots, security, data replication, rolling upgrades, etc.
>      >
>      > A Regression and some IEPs testing suites, like IEP-14, IEP-15,
>     etc, etc, etc.
>      >
>      > Most of the suites above use another in-house developed Java tool
>     – PiClient – to perform actual loading and miscellaneous operations
>     with Ignite under test. We use py4j Python-Java gateway library to
>     control PiClient instances from the tests.
>      >
>      > When we considered CI, we put TeamCity out of scope, because
>     distributed integration and performance tests tend to run for hours
>     and TeamCity agents are scarce and costly resource. So, bundled with
>     Tiden there is jenkins-job-builder [10] based CI pipelines and
>     Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates test
>     run reports across versions and has built in visualization support
>     for Combinator suite.
>      >
>      > All of the above is currently closed source, but we plan to make
>     it public for community, and publishing Tiden core [9] is the first
>     step on that way. You can review some examples of using Tiden for
>     tests at my repository [11], for start.
>      >
>      > Now, let’s compare Ducktape PoC and Tiden.
>      >
>      > Criteria: Language
>      > Tiden: Python, 3.7
>      > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7
>     compatible, but actually can’t work with Python 3.7 due to broken
>     Zmq dependency.
>      > Comment: Python 3.7 has a much better support for async-style
>     code which might be crucial for distributed application testing.
>      > Score: Tiden: 1, Ducktape: 0
>      >
>      > Criteria: Test writers API
>      > Supported integration test framework concepts are basically the same:
>      > - a test controller (test runner)
>      > - a cluster
>      > - a node
>      > - an application (a service in Ducktape terms)
>      > - a test
>      > Score: Tiden: 5, Ducktape: 5
>      >
>      > Criteria: Tests selection and run
>      > Ducktape: suite-package-class-method level selection, internal
>     scheduler allows to run tests in suite in parallel.
>      > Tiden: also suite-package-class-method level selection,
>     additionally allows selecting subset of tests by attribute, parallel
>     runs not built in, but allows merging test reports after different runs.
>      > Score: Tiden: 2, Ducktape: 2
>      >
>      > Criteria: Test configuration
>      > Ducktape: single JSON string for all tests
>      > Tiden: any number of YaML config files, command line option for
>     fine-grained test configuration, ability to select/modify tests
>     behavior based on Ignite version.
>      > Score: Tiden: 3, Ducktape: 1
>      >
>      > Criteria: Cluster control
>      > Ducktape: allow execute remote commands by node granularity
>      > Tiden: additionally can address cluster as a whole and execute
>     remote commands in parallel.
>      > Score: Tiden: 2, Ducktape: 1
>      >
>      > Criteria: Logs control
>      > Both frameworks have similar builtin support for remote logs
>     collection and grepping. Tiden has built-in plugin that can zip,
>     collect arbitrary log files from arbitrary locations at
>     test/module/suite granularity and unzip if needed, also application
>     API to search / wait for messages in logs. Ducktape allows each
>     service declare its log files location (seemingly does not support
>     logs rollback), and a single entrypoint to collect service logs.
>      > Score: Tiden: 1, Ducktape: 1
>      >
>      > Criteria: Test assertions
>      > Tiden: simple asserts, also few customized assertion helpers.
>      > Ducktape: simple asserts.
>      > Score: Tiden: 2, Ducktape: 1
>      >
>      > Criteria: Test reporting
>      > Ducktape: limited to its own text/html format
>      > Tiden: provides text report, yaml report for reporting tools
>     integration, XML xUnit report for integration with Jenkins/TeamCity.
>      > Score: Tiden: 3, Ducktape: 1
>      >
>      > Criteria: Provisioning and deployment
>      > Ducktape: can provision subset of hosts from cluster for test
>     needs. However, that means, that test can’t be scaled without test
>     code changes. Does not do any deploy, relies on external means, e.g.
>     pre-packaged in docker image, as in PoC.
>      > Tiden: Given a set of hosts, Tiden uses all of them for the test.
>     Provisioning should be done by external means. However, provides a
>     conventional automated deployment routines.
>      > Score: Tiden: 1, Ducktape: 1
>      >
>      > Criteria: Documentation and Extensibility
>      > Tiden: current API documentation is limited, should change as we
>     go open source. Tiden is easily extensible via hooks and plugins,
>     see example Maven plugin and Gatling application at [11].
>      > Ducktape: basic documentation at readthedocs.io
>     <http://readthedocs.io>. Codebase is rigid, framework core is
>     tightly coupled and hard to change. The only possible extension
>     mechanism is fork-and-rewrite.
>      > Score: Tiden: 2, Ducktape: 1
>      >
>      > I can continue more on this, but it should be enough for now:
>      > Overall score: Tiden: 22, Ducktape: 14.
>      >
>      > Time for discussion!
>      >
>      > ---
>      > [1] - https://www.testcontainers.org/
>      > [2] - http://arquillian.org/guides/getting_started/
>      > [3] - https://jmeter.apache.org/index.html
>      > [4] - https://openjdk.java.net/projects/code-tools/jmh/
>      > [5] - https://gatling.io/docs/current/
>      > [6] - https://github.com/gridgain/yardstick
>      > [7] - https://github.com/gridgain/poc-tester
>      > [8] -
>     https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
>      > [9] - https://github.com/gridgain/tiden
>      > [10] - https://pypi.org/project/jenkins-job-builder/
>      > [11] - https://github.com/mshonichev/tiden_examples
>      >
>      > On 25.05.2020 11:09, Nikolay Izhikov wrote:
>      >> Hello,
>      >>
>      >> Branch with duck tape created -
>     https://github.com/apache/ignite/tree/ignite-ducktape
>      >>
>      >> Any who are willing to contribute to PoC are welcome.
>      >>
>      >>
>      >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov
>     <nizhikov.dev@gmail.com <ma...@gmail.com>> написал(а):
>      >>>
>      >>> Hello, Denis.
>      >>>
>      >>> There is no rush with these improvements.
>      >>> We can wait for Maxim proposal and compare two solutions :)
>      >>>
>      >>>> 21 мая 2020 г., в 22:24, Denis Magda <dmagda@apache.org
>     <ma...@apache.org>> написал(а):
>      >>>>
>      >>>> Hi Nikolay,
>      >>>>
>      >>>> Thanks for kicking off this conversation and sharing your
>     findings with the
>      >>>> results. That's the right initiative. I do agree that Ignite
>     needs to have
>      >>>> an integration testing framework with capabilities listed by you.
>      >>>>
>      >>>> As we discussed privately, I would only check if instead of
>      >>>> Confluent's Ducktape library, we can use an integration
>     testing framework
>      >>>> developed by GridGain for testing of Ignite/GridGain clusters.
>     That
>      >>>> framework has been battle-tested and might be more convenient for
>      >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>      >>>> <mshonichev@gridgain.com <ma...@gridgain.com>> who
>     promised to join this thread once he finishes
>      >>>> preparing the usage examples of the framework. To my
>     knowledge, Max has
>      >>>> already been working on that for several days.
>      >>>>
>      >>>> -
>      >>>> Denis
>      >>>>
>      >>>>
>      >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov
>     <nizhikov@apache.org <ma...@apache.org>>
>      >>>> wrote:
>      >>>>
>      >>>>> Hello, Igniters.
>      >>>>>
>      >>>>> I created a PoC [1] for the integration tests of Ignite.
>      >>>>>
>      >>>>> Let me briefly explain the gap I want to cover:
>      >>>>>
>      >>>>> 1. For now, we don’t have a solution for automated testing of
>     Ignite on
>      >>>>> «real cluster».
>      >>>>> By «real cluster» I mean cluster «like a production»:
>      >>>>>       * client and server nodes deployed on different hosts.
>      >>>>>       * thin clients perform queries from some other hosts
>      >>>>>       * etc.
>      >>>>>
>      >>>>> 2. We don’t have a solution for automated benchmarks of some
>     internal
>      >>>>> Ignite process
>      >>>>>       * PME
>      >>>>>       * rebalance.
>      >>>>> This means we don’t know - Do we perform rebalance(or PME) in
>     2.7.0 faster
>      >>>>> or slower than in 2.8.0 for the same cluster?
>      >>>>>
>      >>>>> 3. We don’t have a solution for automated testing of Ignite
>     integration in
>      >>>>> a real-world environment:
>      >>>>> Ignite-Spark integration can be taken as an example.
>      >>>>> I think some ML solutions also should be tested in real-world
>     deployments.
>      >>>>>
>      >>>>> Solution:
>      >>>>>
>      >>>>> I propose to use duck tape library from confluent (apache 2.0
>     license)
>      >>>>> I tested it both on the real cluster(Yandex Cloud) and on the
>     local
>      >>>>> environment(docker) and it works just fine.
>      >>>>>
>      >>>>> PoC contains following services:
>      >>>>>
>      >>>>>       * Simple rebalance test:
>      >>>>>               Start 2 server nodes,
>      >>>>>               Create some data with Ignite client,
>      >>>>>               Start one more server node,
>      >>>>>               Wait for rebalance finish
>      >>>>>       * Simple Ignite-Spark integration test:
>      >>>>>               Start 1 Spark master, start 1 Spark worker,
>      >>>>>               Start 1 Ignite server node
>      >>>>>               Create some data with Ignite client,
>      >>>>>               Check data in application that queries it from
>     Spark.
>      >>>>>
>      >>>>> All tests are fully automated.
>      >>>>> Logs collection works just fine.
>      >>>>> You can see an example of the tests report - [4].
>      >>>>>
>      >>>>> Pros:
>      >>>>>
>      >>>>> * Ability to test local changes(no need to public changes to
>     some remote
>      >>>>> repository or similar).
>      >>>>> * Ability to parametrize test environment(run the same tests
>     on different
>      >>>>> JDK, JVM params, config, etc.)
>      >>>>> * Isolation by default so system tests are as reliable as
>     possible.
>      >>>>> * Utilities for pulling up and tearing down services easily
>     in clusters in
>      >>>>> different environments (e.g. local, custom cluster, Vagrant,
>     K8s, Mesos,
>      >>>>> Docker, cloud providers, etc.)
>      >>>>> * Easy to write unit tests for distributed systems
>      >>>>> * Adopted and successfully used by other distributed open
>     source project -
>      >>>>> Apache Kafka.
>      >>>>> * Collect results (e.g. logs, console output)
>      >>>>> * Report results (e.g. expected conditions met, performance
>     results, etc.)
>      >>>>>
>      >>>>> WDYT?
>      >>>>>
>      >>>>> [1] https://github.com/nizhikov/ignite/pull/15
>      >>>>> [2] https://github.com/confluentinc/ducktape
>      >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>      >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> 

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
Folks,
First, I've created PR [1] with ducktests improvements

PR contains the following changes:
- PME-free switch proof-benchmark (2.7.6 vs master)
- Ability to check against (compare with) previous releases (e.g. 2.7.6 & 2.8);
see the sketch after this list
- Global refactoring
-- benchmark Java code simplification
-- deduplication of the services' Python and Java classes
-- fail-fast checks for Java and Python (e.g. an application should explicitly
report that it finished successfully)
-- simple results extraction from tests and benchmarks
-- Java code is now configurable from tests/benchmarks
-- proper SIGTERM handling in Java code (e.g. it may finish the last operation
and log results)
-- Docker volume now marked as delegated to increase execution speed for
Mac & Windows users
-- Ignite cluster now starts in parallel (start speed-up)
-- Ignite can be configured per test/benchmark
- full and module assembly scripts added
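
To make the version-comparison item above more concrete, here is a minimal 
sketch of what a version-parametrized ducktape test can look like. The Test 
base class, the parametrize decorator, the Service lifecycle hooks and 
node.account.ssh are ducktape APIs; the toy IgniteService below is only a 
stand-in (it merely echoes the requested version), not the service from the 
ignite-ducktape branch:

    from ducktape.mark import parametrize
    from ducktape.services.service import Service
    from ducktape.tests.test import Test

    class IgniteService(Service):
        """Toy stand-in; a real service would unpack the requested
        Ignite version and run ignite.sh on each node."""

        def __init__(self, context, num_nodes, version):
            super(IgniteService, self).__init__(context, num_nodes)
            self.version = version

        def start_node(self, node):
            node.account.ssh("echo starting Ignite %s" % self.version)

        def stop_node(self, node):
            node.account.ssh("echo stopping Ignite %s" % self.version)

        def clean_node(self, node):
            node.account.ssh("echo cleaning Ignite work dir")

    class RebalanceBenchmark(Test):
        @parametrize(version="2.7.6")
        @parametrize(version="dev")  # locally built master
        def test_rebalance_time(self, version):
            ignite = IgniteService(self.test_context, num_nodes=2, version=version)
            ignite.start()
            # ... load data, start one more node, measure rebalance time ...
            ignite.stop()

Running the same test body against 2.7.6 and master is what allows release 
comparison without duplicating test code.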

Second, I'd like to propose to accept ducktests [2] (ducktape integration)
as a target "PoC check & real topology benchmarking tool".

Ducktape pros
- Developed for distributed system by distributed system developers.
- Developed since 2014, stable.
- Proven usability by usage at Kafka.
- Dozens of dozens tests and benchmarks at Kafka as a great example pack.
- Built-in Docker support for rapid development and checks.
- Great for CI automation.

As an additional motivation, at least 3 teams
- the IEP-45 team (to check crash-recovery speed-up (discovery and Zabbix
speed-up))
- the Ignite SE Plugins team (to check that the plugin's features do not slow
down or break AI features)
- the Ignite SE QA team (to append already developed smoke/load/failover tests
to the AI codebase)
are now waiting for the ducktest merge to start checking the cases they are
working on in the AI way.

Thoughts?

[1] https://github.com/apache/ignite/pull/7967
[2] https://github.com/apache/ignite/tree/ignite-ducktape

On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Maxim.
>
> Thank you for so detailed explanation.
>
> Can we put the content of this discussion somewhere on the wiki?
> So It doesn’t get lost.
>
> I divide the answer in several parts. From the requirements to the
> implementation.
> So, if we agreed on the requirements we can proceed with the discussion of
> the implementation.
>
> 1. Requirements:
>
> The main goal I want to achieve is *reproducibility* of the tests.
> I’m sick and tired with the zillions of flaky, rarely failed, and almost
> never failed tests in Ignite codebase.
> We should start with the simplest scenarios that will be as reliable as
> steel :)
>
> I want to know for sure:
>   - Is this PR makes rebalance quicker or not?
>   - Is this PR makes PME quicker or not?
>
> So, your description of the complex test scenario looks as a next step to
> me.
>
> Anyway, It’s cool we already have one.
>
> The second goal is to have a strict test lifecycle as we have in JUnit and
> similar frameworks.
>
> > It covers production-like deployment and running a scenarios over a
> single database instance.
>
> Do you mean «single cluster» or «single host»?
>
> 2. Existing tests:
>
> > A Combinator suite allows to run set of operations concurrently over
> given database instance.
> > A Consumption suite allows to run a set production-like actions over
> given set of Ignite/GridGain versions and compare test metrics across
> versions
> > A Yardstick suite
> > A Stress suite that simulates hardware environment degradation
> > An Ultimate, DR and Compatibility suites that performs functional
> regression testing
> > Regression
>
> Great news that we already have so many choices for testing!
> Mature test base is a big +1 for Tiden.
>
> 3. Comparison:
>
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests
> > Tiden: any number of YaML config files, command line option for
> fine-grained test configuration, ability to select/modify tests behavior
> based on Ignite version.
>
> 1. Many YAML files can be hard to maintain.
> 2. In ducktape, you can set parameters via «—parameters» option. Please,
> take a look at the doc [1]
>
> > Criteria: Cluster control
> > Tiden: additionally can address cluster as a whole and execute remote
> commands in parallel.
>
> It seems we implement this ability in the PoC, already.
>
> > Criteria: Test assertions
> > Tiden: simple asserts, also few customized assertion helpers.
> > Ducktape: simple asserts.
>
> Can you, please, be more specific.
> What helpers do you have in mind?
> Ducktape has an asserts that waits for logfile messages or some process
> finish.
>
> > Criteria: Test reporting
> > Ducktape: limited to its own text/HTML format
>
> Ducktape have
> 1. Text reporter
> 2. Customizable HTML reporter
> 3. JSON reporter.
>
> We can show JSON with the any template or tool.
>
> > Criteria: Provisioning and deployment
> > Ducktape: can provision subset of hosts from cluster for test needs.
> However, that means, that test can’t be scaled without test code changes.
> Does not do any deploy, relies on external means, e.g. pre-packaged in
> docker image, as in PoC.
>
> This is not true.
>
> 1. We can set explicit test parameters(node number) via parameters.
> We can increase client count of cluster size without test code changes.
>
> 2. We have many choices for the test environment. These choices are tested
> and used in other projects:
>         * docker
>         * vagrant
>         * private cloud(ssh access)
>         * ec2
> Please, take a look at Kafka documentation [2]
>
> > I can continue more on this, but it should be enough for now:
>
> We need to go deeper! :)
>
> [1]  https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
>
> > 9 июня 2020 г., в 17:25, Max A. Shonichev <ms...@yandex.ru>
> написал(а):
> >
> > Greetings, Nikolay,
> >
> > First of all, thank you for you great effort preparing PoC of
> integration testing to Ignite community.
> >
> > It’s a shame Ignite did not have at least some such tests yet, however,
> GridGain, as a major contributor to Apache Ignite had a profound collection
> of in-house tools to perform integration and performance testing for years
> already and while we slowly consider sharing our expertise with the
> community, your initiative makes us drive that process a bit faster, thanks
> a lot!
> >
> > I reviewed your PoC and want to share a little about what we do on our
> part, why and how, hope it would help community take proper course.
> >
> > First I’ll do a brief overview of what decisions we made and what we do
> have in our private code base, next I’ll describe what we have already
> donated to the public and what we plan public next, then I’ll compare both
> approaches highlighting deficiencies in order to spur public discussion on
> the matter.
> >
> > It might seem strange to use Python to run Bash to run Java applications
> because that introduces IT industry best of breed’ – the Python dependency
> hell – to the Java application code base. The only strangest decision one
> can made is to use Maven to run Docker to run Bash to run Python to run
> Bash to run Java, but desperate times call for desperate measures I guess.
> >
> > There are Java-based solutions for integration testing exists, e.g.
> Testcontainers [1], Arquillian [2], etc, and they might go well for Ignite
> community CI pipelines by them selves. But we also wanted to run
> performance tests and benchmarks, like the dreaded PME benchmark, and this
> is solved by totally different set of tools in Java world, e.g. Jmeter [3],
> OpenJMH [4], Gatling [5], etc.
> >
> > Speaking specifically about benchmarking, Apache Ignite community
> already has Yardstick [6], and there’s nothing wrong with writing PME
> benchmark using Yardstick, but we also wanted to be able to run scenarios
> like this:
> > - put an X load to a Ignite database;
> > - perform an Y set of operations to check how Ignite copes with
> operations under load.
> >
> > And yes, we also wanted applications under test be deployed ‘like in a
> production’, e.g. distributed over a set of hosts. This arises questions
> about provisioning and nodes affinity which I’ll cover in detail later.
> >
> > So we decided to put a little effort to build a simple tool to cover
> different integration and performance scenarios, and our QA lab first
> attempt was PoC-Tester [7], currently open source for all but for reporting
> web UI. It’s a quite simple to use 95% Java-based tool targeted to be run
> on a pre-release QA stage.
> >
> > It covers production-like deployment and running a scenarios over a
> single database instance. PoC-Tester scenarios consists of a sequence of
> tasks running sequentially or in parallel. After all tasks complete, or at
> any time during test, user can run logs collection task, logs are checked
> against exceptions and a summary of found issues and task ops/latency
> statistics is generated at the end of scenario. One of the main PoC-Tester
> features is its fire-and-forget approach to task managing. That is, you can
> deploy a grid and left it running for weeks, periodically firing some tasks
> onto it.
> >
> > During earliest stages of PoC-Tester development it becomes quite clear
> that Java application development is a tedious process and architecture
> decisions you take during development are slow and hard to change.
> > For example, scenarios like this
> > - deploy two instances of GridGain with master-slave data replication
> configured;
> > - put a load on master;
> > - perform checks on slave,
> > or like this:
> > - preload a 1Tb of data by using your favorite tool of choice to an
> Apache Ignite of version X;
> > - run a set of functional tests running Apache Ignite version Y over
> preloaded data,
> > do not fit well in the PoC-Tester workflow.
> >
> > So, this is why we decided to use Python as a generic scripting language
> of choice.
> >
> > Pros:
> > - quicker prototyping and development cycles
> > - easier to find DevOps/QA engineer with Python skills than one with
> Java skills
> > - used extensively all over the world for DevOps/CI pipelines and thus
> has rich set of libraries for all possible integration uses cases.
> >
> > Cons:
> > - Nightmare with dependencies. Better stick to specific
> language/libraries version.
> >
> > Comparing alternatives for Python-based testing framework we have
> considered following requirements, somewhat similar to what you’ve
> mentioned for Confluent [8] previously:
> > - should be able run locally or distributed (bare metal or in the cloud)
> > - should have built-in deployment facilities for applications under test
> > - should separate test configuration and test code
> > -- be able to easily reconfigure tests by simple configuration changes
> > -- be able to easily scale test environment by simple configuration
> changes
> > -- be able to perform regression testing by simple switching artifacts
> under test via configuration
> > -- be able to run tests with different JDK version by simple
> configuration changes
> > - should have human readable reports and/or reporting tools integration
> > - should allow simple test progress monitoring, one does not want to run
> 6-hours test to find out that application actually crashed during first
> hour.
> > - should allow parallel execution of test actions
> > - should have clean API for test writers
> > -- clean API for distributed remote commands execution
> > -- clean API for deployed applications start / stop and other operations
> > -- clean API for performing check on results
> > - should be open source or at least source code should allow ease change
> or extension
> >
> > Back at that time we found no better alternative than to write our own
> framework, and here goes Tiden [9] as GridGain framework of choice for
> functional integration and performance testing.
> >
> > Pros:
> > - solves all the requirements above
> > Cons (for Ignite):
> > - (currently) closed GridGain source
> >
> > On top of Tiden we’ve built a set of test suites, some of which you
> might have heard already.
> >
> > A Combinator suite allows to run set of operations concurrently over
> given database instance. Proven to find at least 30+ race conditions and
> NPE issues.
> >
> > A Consumption suite allows to run a set production-like actions over
> given set of Ignite/GridGain versions and compare test metrics across
> versions, like heap/disk/CPU consumption, time to perform actions, like
> client PME, server PME, rebalancing time, data replication time, etc.
> >
> > A Yardstick suite is a thin layer of Python glue code to run Apache
> Ignite pre-release benchmarks set. Yardstick itself has a mediocre
> deployment capabilities, Tiden solves this easily.
> >
> > A Stress suite that simulates hardware environment degradation during
> testing.
> >
> > An Ultimate, DR and Compatibility suites that performs functional
> regression testing of GridGain Ultimate Edition features like snapshots,
> security, data replication, rolling upgrades, etc.
> >
> > A Regression and some IEPs testing suites, like IEP-14, IEP-15, etc,
> etc, etc.
> >
> > Most of the suites above use another in-house developed Java tool –
> PiClient – to perform actual loading and miscellaneous operations with
> Ignite under test. We use py4j Python-Java gateway library to control
> PiClient instances from the tests.
> >
> > When we considered CI, we put TeamCity out of scope, because distributed
> integration and performance tests tend to run for hours and TeamCity agents
> are scarce and costly resource. So, bundled with Tiden there is
> jenkins-job-builder [10] based CI pipelines and Jenkins xUnit reporting.
> Also, rich web UI tool Ward aggregates test run reports across versions and
> has built in visualization support for Combinator suite.
> >
> > All of the above is currently closed source, but we plan to make it
> public for community, and publishing Tiden core [9] is the first step on
> that way. You can review some examples of using Tiden for tests at my
> repository [11], for start.
> >
> > Now, let’s compare Ducktape PoC and Tiden.
> >
> > Criteria: Language
> > Tiden: Python, 3.7
> > Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7 compatible,
> but actually can’t work with Python 3.7 due to broken Zmq dependency.
> > Comment: Python 3.7 has a much better support for async-style code which
> might be crucial for distributed application testing.
> > Score: Tiden: 1, Ducktape: 0
> >
> > Criteria: Test writers API
> > Supported integration test framework concepts are basically the same:
> > - a test controller (test runner)
> > - a cluster
> > - a node
> > - an application (a service in Ducktape terms)
> > - a test
> > Score: Tiden: 5, Ducktape: 5
> >
> > Criteria: Tests selection and run
> > Ducktape: suite-package-class-method level selection, internal scheduler
> allows to run tests in suite in parallel.
> > Tiden: also suite-package-class-method level selection, additionally
> allows selecting subset of tests by attribute, parallel runs not built in,
> but allows merging test reports after different runs.
> > Score: Tiden: 2, Ducktape: 2
> >
> > Criteria: Test configuration
> > Ducktape: single JSON string for all tests
> > Tiden: any number of YaML config files, command line option for
> fine-grained test configuration, ability to select/modify tests behavior
> based on Ignite version.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Cluster control
> > Ducktape: allow execute remote commands by node granularity
> > Tiden: additionally can address cluster as a whole and execute remote
> commands in parallel.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Logs control
> > Both frameworks have similar builtin support for remote logs collection
> and grepping. Tiden has built-in plugin that can zip, collect arbitrary log
> files from arbitrary locations at test/module/suite granularity and unzip
> if needed, also application API to search / wait for messages in logs.
> Ducktape allows each service declare its log files location (seemingly does
> not support logs rollback), and a single entrypoint to collect service logs.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Test assertions
> > Tiden: simple asserts, also few customized assertion helpers.
> > Ducktape: simple asserts.
> > Score: Tiden: 2, Ducktape: 1
> >
> > Criteria: Test reporting
> > Ducktape: limited to its own text/html format
> > Tiden: provides text report, yaml report for reporting tools
> integration, XML xUnit report for integration with Jenkins/TeamCity.
> > Score: Tiden: 3, Ducktape: 1
> >
> > Criteria: Provisioning and deployment
> > Ducktape: can provision subset of hosts from cluster for test needs.
> However, that means, that test can’t be scaled without test code changes.
> Does not do any deploy, relies on external means, e.g. pre-packaged in
> docker image, as in PoC.
> > Tiden: Given a set of hosts, Tiden uses all of them for the test.
> Provisioning should be done by external means. However, provides a
> conventional automated deployment routines.
> > Score: Tiden: 1, Ducktape: 1
> >
> > Criteria: Documentation and Extensibility
> > Tiden: current API documentation is limited, should change as we go open
> source. Tiden is easily extensible via hooks and plugins, see example Maven
> plugin and Gatling application at [11].
> > Ducktape: basic documentation at readthedocs.io. Codebase is rigid,
> framework core is tightly coupled and hard to change. The only possible
> extension mechanism is fork-and-rewrite.
> > Score: Tiden: 2, Ducktape: 1
> >
> > I can continue more on this, but it should be enough for now:
> > Overall score: Tiden: 22, Ducktape: 14.
> >
> > Time for discussion!
> >
> > ---
> > [1] - https://www.testcontainers.org/
> > [2] - http://arquillian.org/guides/getting_started/
> > [3] - https://jmeter.apache.org/index.html
> > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> > [5] - https://gatling.io/docs/current/
> > [6] - https://github.com/gridgain/yardstick
> > [7] - https://github.com/gridgain/poc-tester
> > [8] -
> https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> > [9] - https://github.com/gridgain/tiden
> > [10] - https://pypi.org/project/jenkins-job-builder/
> > [11] - https://github.com/mshonichev/tiden_examples
> >
> > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >> Hello,
> >>
> >> Branch with duck tape created -
> https://github.com/apache/ignite/tree/ignite-ducktape
> >>
> >> Any who are willing to contribute to PoC are welcome.
> >>
> >>
> >>> 21 мая 2020 г., в 22:33, Nikolay Izhikov <ni...@gmail.com>
> написал(а):
> >>>
> >>> Hello, Denis.
> >>>
> >>> There is no rush with these improvements.
> >>> We can wait for Maxim proposal and compare two solutions :)
> >>>
> >>>> 21 мая 2020 г., в 22:24, Denis Magda <dm...@apache.org> написал(а):
> >>>>
> >>>> Hi Nikolay,
> >>>>
> >>>> Thanks for kicking off this conversation and sharing your findings
> with the
> >>>> results. That's the right initiative. I do agree that Ignite needs to
> have
> >>>> an integration testing framework with capabilities listed by you.
> >>>>
> >>>> As we discussed privately, I would only check if instead of
> >>>> Confluent's Ducktape library, we can use an integration testing
> framework
> >>>> developed by GridGain for testing of Ignite/GridGain clusters. That
> >>>> framework has been battle-tested and might be more convenient for
> >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>> <ms...@gridgain.com> who promised to join this thread once he
> finishes
> >>>> preparing the usage examples of the framework. To my knowledge, Max
> has
> >>>> already been working on that for several days.
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhikov@apache.org
> >
> >>>> wrote:
> >>>>
> >>>>> Hello, Igniters.
> >>>>>
> >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>
> >>>>> Let me briefly explain the gap I want to cover:
> >>>>>
> >>>>> 1. For now, we don’t have a solution for automated testing of Ignite
> on
> >>>>> «real cluster».
> >>>>> By «real cluster» I mean cluster «like a production»:
> >>>>>       * client and server nodes deployed on different hosts.
> >>>>>       * thin clients perform queries from some other hosts
> >>>>>       * etc.
> >>>>>
> >>>>> 2. We don’t have a solution for automated benchmarks of some internal
> >>>>> Ignite process
> >>>>>       * PME
> >>>>>       * rebalance.
> >>>>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0
> faster
> >>>>> or slower than in 2.8.0 for the same cluster?
> >>>>>
> >>>>> 3. We don’t have a solution for automated testing of Ignite
> integration in
> >>>>> a real-world environment:
> >>>>> Ignite-Spark integration can be taken as an example.
> >>>>> I think some ML solutions also should be tested in real-world
> deployments.
> >>>>>
> >>>>> Solution:
> >>>>>
> >>>>> I propose to use duck tape library from confluent (apache 2.0
> license)
> >>>>> I tested it both on the real cluster(Yandex Cloud) and on the local
> >>>>> environment(docker) and it works just fine.
> >>>>>
> >>>>> PoC contains following services:
> >>>>>
> >>>>>       * Simple rebalance test:
> >>>>>               Start 2 server nodes,
> >>>>>               Create some data with Ignite client,
> >>>>>               Start one more server node,
> >>>>>               Wait for rebalance finish
> >>>>>       * Simple Ignite-Spark integration test:
> >>>>>               Start 1 Spark master, start 1 Spark worker,
> >>>>>               Start 1 Ignite server node
> >>>>>               Create some data with Ignite client,
> >>>>>               Check data in application that queries it from Spark.
> >>>>>
> >>>>> All tests are fully automated.
> >>>>> Logs collection works just fine.
> >>>>> You can see an example of the tests report - [4].
> >>>>>
> >>>>> Pros:
> >>>>>
> >>>>> * Ability to test local changes(no need to public changes to some
> remote
> >>>>> repository or similar).
> >>>>> * Ability to parametrize test environment(run the same tests on
> different
> >>>>> JDK, JVM params, config, etc.)
> >>>>> * Isolation by default so system tests are as reliable as possible.
> >>>>> * Utilities for pulling up and tearing down services easily in
> clusters in
> >>>>> different environments (e.g. local, custom cluster, Vagrant, K8s,
> Mesos,
> >>>>> Docker, cloud providers, etc.)
> >>>>> * Easy to write unit tests for distributed systems
> >>>>> * Adopted and successfully used by other distributed open source
> project -
> >>>>> Apache Kafka.
> >>>>> * Collect results (e.g. logs, console output)
> >>>>> * Report results (e.g. expected conditions met, performance results,
> etc.)
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
>
>

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Maxim.

Thank you for such a detailed explanation.

Can we put the content of this discussion somewhere on the wiki?
That way it won’t get lost.

I’ve divided the answer into several parts, from the requirements to the implementation.
Once we agree on the requirements, we can proceed to discuss the implementation.

1. Requirements:

The main goal I want to achieve is *reproducibility* of the tests.
I’m sick and tired of the zillions of flaky, rarely failing, and almost-never-failing tests in the Ignite codebase.
We should start with the simplest scenarios that will be as reliable as steel :)

I want to know for sure:
  - Does this PR make rebalance quicker or not?
  - Does this PR make PME quicker or not?

So, your description of the complex test scenario looks like a next step to me.

Anyway, it’s cool that we already have one.

The second goal is to have a strict test lifecycle as we have in JUnit and similar frameworks. 

> It covers production-like deployment and running a scenarios over a single database instance.

Do you mean «single cluster» or «single host»?

2. Existing tests:

> A Combinator suite allows to run set of operations concurrently over given database instance.
> A Consumption suite allows to run a set production-like actions over given set of Ignite/GridGain versions and compare test metrics across versions
> A Yardstick suite
> A Stress suite that simulates hardware environment degradation
> An Ultimate, DR and Compatibility suites that performs functional regression testing
> Regression

It’s great news that we already have so many choices for testing!
A mature test base is a big +1 for Tiden.

3. Comparison:

> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YaML config files, command line option for fine-grained test configuration, ability to select/modify tests behavior based on Ignite version.

1. Many YAML files can be hard to maintain.
2. In ducktape, you can set parameters via the «--parameters» option. Please, take a look at the docs [1].
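
For illustration, a run could look roughly like this (a minimal sketch; the test path and parameter names are hypothetical, not taken from the PoC):

    # run one test with an explicit topology/environment configuration
    ducktape ignitetest/tests/rebalance_test.py::RebalanceTest.test_rebalance \
        --parameters '{"num_servers": 3, "jdk": "/opt/jdk11"}'

The JSON passed via --parameters is injected into the test, so the same test code can be driven with different values.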

> Criteria: Cluster control
> Tiden: additionally can address cluster as a whole and execute remote commands in parallel.

It seems we already implement this ability in the PoC.

> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion helpers.
> Ducktape: simple asserts.

Could you please be more specific?
What helpers do you have in mind?
Ducktape has assertion helpers that wait for log file messages or for a process to finish.
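
For example, a condition-based assertion can be built on ducktape’s wait_until helper. A minimal sketch (grep_log and the log message are hypothetical, not the PoC code):

    from ducktape.utils.util import wait_until

    # fail the test if the node does not report rebalance completion in time
    wait_until(
        lambda: service.grep_log(node, "Completed rebalance"),  # hypothetical helper
        timeout_sec=120,
        backoff_sec=5,
        err_msg="Rebalance did not finish within 120 seconds")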

> Criteria: Test reporting
> Ducktape: limited to its own text/HTML format

Ducktape has:
1. A text reporter
2. A customizable HTML reporter
3. A JSON reporter

We can render the JSON report with any template or tool.
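
For example, a custom summary can be built directly from that JSON (a tiny sketch; the exact results path and layout are assumptions, check your ducktape version):

    import json
    import pprint

    # ducktape writes report.json under the per-session results directory
    with open("results/latest/report.json") as f:   # path is an assumption
        pprint.pprint(json.load(f))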

> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for test needs. However, that means, that test can’t be scaled without test code changes. Does not do any deploy, relies on external means, e.g. pre-packaged in docker image, as in PoC. 

This is not true.

1. We can set explicit test parameters (e.g. the number of nodes) via parameters.
We can increase the client count or the cluster size without test code changes (see the sketch after this list).

2. We have many choices for the test environment. These choices are tested and used in other projects:
	* docker
	* vagrant
	* private cloud (ssh access)
	* ec2
Please, take a look at the Kafka documentation [2].
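
As an illustration of point 1, the node counts can come from injected parameters rather than being hard-coded. A sketch with hypothetical names (not the actual PoC test):

    from ducktape.tests.test import Test

    class RebalanceTest(Test):
        def __init__(self, test_context):
            super(RebalanceTest, self).__init__(test_context)
            # values come from --parameters (or defaults), not from the test body
            args = test_context.injected_args or {}
            self.num_servers = args.get("num_servers", 2)

        def test_rebalance(self):
            # start self.num_servers server nodes, load data, add one more node, ...
            pass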

> I can continue more on this, but it should be enough for now:

We need to go deeper! :)

[1]  https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
[2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart

> 9 июня 2020 г., в 17:25, Max A. Shonichev <ms...@yandex.ru> написал(а):
> 
> Greetings, Nikolay,
> 
> First of all, thank you for you great effort preparing PoC of integration testing to Ignite community.
> 
> It’s a shame Ignite did not have at least some such tests yet, however, GridGain, as a major contributor to Apache Ignite had a profound collection of in-house tools to perform integration and performance testing for years already and while we slowly consider sharing our expertise with the community, your initiative makes us drive that process a bit faster, thanks a lot!
> 
> I reviewed your PoC and want to share a little about what we do on our part, why and how, hope it would help community take proper course.
> 
> First I’ll do a brief overview of what decisions we made and what we do have in our private code base, next I’ll describe what we have already donated to the public and what we plan public next, then I’ll compare both approaches highlighting deficiencies in order to spur public discussion on the matter.
> 
> It might seem strange to use Python to run Bash to run Java applications because that introduces IT industry best of breed’ – the Python dependency hell – to the Java application code base. The only strangest decision one can made is to use Maven to run Docker to run Bash to run Python to run Bash to run Java, but desperate times call for desperate measures I guess.
> 
> There are Java-based solutions for integration testing exists, e.g. Testcontainers [1], Arquillian [2], etc, and they might go well for Ignite community CI pipelines by them selves. But we also wanted to run performance tests and benchmarks, like the dreaded PME benchmark, and this is solved by totally different set of tools in Java world, e.g. Jmeter [3], OpenJMH [4], Gatling [5], etc.
> 
> Speaking specifically about benchmarking, Apache Ignite community already has Yardstick [6], and there’s nothing wrong with writing PME benchmark using Yardstick, but we also wanted to be able to run scenarios like this:
> - put an X load to a Ignite database;
> - perform an Y set of operations to check how Ignite copes with operations under load.
> 
> And yes, we also wanted applications under test be deployed ‘like in a production’, e.g. distributed over a set of hosts. This arises questions about provisioning and nodes affinity which I’ll cover in detail later.
> 
> So we decided to put a little effort to build a simple tool to cover different integration and performance scenarios, and our QA lab first attempt was PoC-Tester [7], currently open source for all but for reporting web UI. It’s a quite simple to use 95% Java-based tool targeted to be run on a pre-release QA stage.
> 
> It covers production-like deployment and running a scenarios over a single database instance. PoC-Tester scenarios consists of a sequence of tasks running sequentially or in parallel. After all tasks complete, or at any time during test, user can run logs collection task, logs are checked against exceptions and a summary of found issues and task ops/latency statistics is generated at the end of scenario. One of the main PoC-Tester features is its fire-and-forget approach to task managing. That is, you can deploy a grid and left it running for weeks, periodically firing some tasks onto it.
> 
> During earliest stages of PoC-Tester development it becomes quite clear that Java application development is a tedious process and architecture decisions you take during development are slow and hard to change.
> For example, scenarios like this
> - deploy two instances of GridGain with master-slave data replication configured;
> - put a load on master;
> - perform checks on slave,
> or like this:
> - preload a 1Tb of data by using your favorite tool of choice to an Apache Ignite of version X;
> - run a set of functional tests running Apache Ignite version Y over preloaded data,
> do not fit well in the PoC-Tester workflow.
> 
> So, this is why we decided to use Python as a generic scripting language of choice.
> 
> Pros:
> - quicker prototyping and development cycles
> - easier to find DevOps/QA engineer with Python skills than one with Java skills
> - used extensively all over the world for DevOps/CI pipelines and thus has rich set of libraries for all possible integration uses cases.
> 
> Cons:
> - Nightmare with dependencies. Better stick to specific language/libraries version.
> 
> Comparing alternatives for Python-based testing framework we have considered following requirements, somewhat similar to what you’ve mentioned for Confluent [8] previously:
> - should be able run locally or distributed (bare metal or in the cloud)
> - should have built-in deployment facilities for applications under test
> - should separate test configuration and test code
> -- be able to easily reconfigure tests by simple configuration changes
> -- be able to easily scale test environment by simple configuration changes
> -- be able to perform regression testing by simple switching artifacts under test via configuration
> -- be able to run tests with different JDK version by simple configuration changes
> - should have human readable reports and/or reporting tools integration
> - should allow simple test progress monitoring, one does not want to run 6-hours test to find out that application actually crashed during first hour.
> - should allow parallel execution of test actions
> - should have clean API for test writers
> -- clean API for distributed remote commands execution
> -- clean API for deployed applications start / stop and other operations
> -- clean API for performing check on results
> - should be open source or at least source code should allow ease change or extension
> 
> Back at that time we found no better alternative than to write our own framework, and here goes Tiden [9] as GridGain framework of choice for functional integration and performance testing.
> 
> Pros:
> - solves all the requirements above
> Cons (for Ignite):
> - (currently) closed GridGain source
> 
> On top of Tiden we’ve built a set of test suites, some of which you might have heard already.
> 
> A Combinator suite allows to run set of operations concurrently over given database instance. Proven to find at least 30+ race conditions and NPE issues.
> 
> A Consumption suite allows to run a set production-like actions over given set of Ignite/GridGain versions and compare test metrics across versions, like heap/disk/CPU consumption, time to perform actions, like client PME, server PME, rebalancing time, data replication time, etc.
> 
> A Yardstick suite is a thin layer of Python glue code to run Apache Ignite pre-release benchmarks set. Yardstick itself has a mediocre deployment capabilities, Tiden solves this easily.
> 
> A Stress suite that simulates hardware environment degradation during testing.
> 
> An Ultimate, DR and Compatibility suites that performs functional regression testing of GridGain Ultimate Edition features like snapshots, security, data replication, rolling upgrades, etc.
> 
> A Regression and some IEPs testing suites, like IEP-14, IEP-15, etc, etc, etc.
> 
> Most of the suites above use another in-house developed Java tool – PiClient – to perform actual loading and miscellaneous operations with Ignite under test. We use py4j Python-Java gateway library to control PiClient instances from the tests.
> 
> When we considered CI, we put TeamCity out of scope, because distributed integration and performance tests tend to run for hours and TeamCity agents are scarce and costly resource. So, bundled with Tiden there is jenkins-job-builder [10] based CI pipelines and Jenkins xUnit reporting. Also, rich web UI tool Ward aggregates test run reports across versions and has built in visualization support for Combinator suite.
> 
> All of the above is currently closed source, but we plan to make it public for community, and publishing Tiden core [9] is the first step on that way. You can review some examples of using Tiden for tests at my repository [11], for start.
> 
> Now, let’s compare Ducktape PoC and Tiden.
> 
> Criteria: Language
> Tiden: Python, 3.7
> Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7 compatible, but actually can’t work with Python 3.7 due to broken Zmq dependency.
> Comment: Python 3.7 has a much better support for async-style code which might be crucial for distributed application testing.
> Score: Tiden: 1, Ducktape: 0
> 
> Criteria: Test writers API
> Supported integration test framework concepts are basically the same:
> - a test controller (test runner)
> - a cluster
> - a node
> - an application (a service in Ducktape terms)
> - a test
> Score: Tiden: 5, Ducktape: 5
> 
> Criteria: Tests selection and run
> Ducktape: suite-package-class-method level selection, internal scheduler allows to run tests in suite in parallel.
> Tiden: also suite-package-class-method level selection, additionally allows selecting subset of tests by attribute, parallel runs not built in, but allows merging test reports after different runs.
> Score: Tiden: 2, Ducktape: 2
> 
> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YaML config files, command line option for fine-grained test configuration, ability to select/modify tests behavior based on Ignite version.
> Score: Tiden: 3, Ducktape: 1
> 
> Criteria: Cluster control
> Ducktape: allow execute remote commands by node granularity
> Tiden: additionally can address cluster as a whole and execute remote commands in parallel.
> Score: Tiden: 2, Ducktape: 1
> 
> Criteria: Logs control
> Both frameworks have similar builtin support for remote logs collection and grepping. Tiden has built-in plugin that can zip, collect arbitrary log files from arbitrary locations at test/module/suite granularity and unzip if needed, also application API to search / wait for messages in logs. Ducktape allows each service declare its log files location (seemingly does not support logs rollback), and a single entrypoint to collect service logs.
> Score: Tiden: 1, Ducktape: 1
> 
> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion helpers.
> Ducktape: simple asserts.
> Score: Tiden: 2, Ducktape: 1
> 
> Criteria: Test reporting
> Ducktape: limited to its own text/html format
> Tiden: provides text report, yaml report for reporting tools integration, XML xUnit report for integration with Jenkins/TeamCity.
> Score: Tiden: 3, Ducktape: 1
> 
> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for test needs. However, that means, that test can’t be scaled without test code changes. Does not do any deploy, relies on external means, e.g. pre-packaged in docker image, as in PoC.
> Tiden: Given a set of hosts, Tiden uses all of them for the test. Provisioning should be done by external means. However, provides a conventional automated deployment routines.
> Score: Tiden: 1, Ducktape: 1
> 
> Criteria: Documentation and Extensibility
> Tiden: current API documentation is limited, should change as we go open source. Tiden is easily extensible via hooks and plugins, see example Maven plugin and Gatling application at [11].
> Ducktape: basic documentation at readthedocs.io. Codebase is rigid, framework core is tightly coupled and hard to change. The only possible extension mechanism is fork-and-rewrite.
> Score: Tiden: 2, Ducktape: 1
> 
> I can continue more on this, but it should be enough for now:
> Overall score: Tiden: 22, Ducktape: 14.
> 
> Time for discussion!
> 
> ---
> [1] - https://www.testcontainers.org/
> [2] - http://arquillian.org/guides/getting_started/
> [3] - https://jmeter.apache.org/index.html
> [4] - https://openjdk.java.net/projects/code-tools/jmh/
> [5] - https://gatling.io/docs/current/
> [6] - https://github.com/gridgain/yardstick
> [7] - https://github.com/gridgain/poc-tester
> [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> [9] - https://github.com/gridgain/tiden
> [10] - https://pypi.org/project/jenkins-job-builder/
> [11] - https://github.com/mshonichev/tiden_examples
> 
> On 25.05.2020 11:09, Nikolay Izhikov wrote:
>> Hello,
>> 
>> Branch with duck tape created - https://github.com/apache/ignite/tree/ignite-ducktape
>> 
>> Any who are willing to contribute to PoC are welcome.
>> 
>> 
>>> 21 мая 2020 г., в 22:33, Nikolay Izhikov <ni...@gmail.com> написал(а):
>>> 
>>> Hello, Denis.
>>> 
>>> There is no rush with these improvements.
>>> We can wait for Maxim proposal and compare two solutions :)
>>> 
>>>> 21 мая 2020 г., в 22:24, Denis Magda <dm...@apache.org> написал(а):
>>>> 
>>>> Hi Nikolay,
>>>> 
>>>> Thanks for kicking off this conversation and sharing your findings with the
>>>> results. That's the right initiative. I do agree that Ignite needs to have
>>>> an integration testing framework with capabilities listed by you.
>>>> 
>>>> As we discussed privately, I would only check if instead of
>>>> Confluent's Ducktape library, we can use an integration testing framework
>>>> developed by GridGain for testing of Ignite/GridGain clusters. That
>>>> framework has been battle-tested and might be more convenient for
>>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>>> <ms...@gridgain.com> who promised to join this thread once he finishes
>>>> preparing the usage examples of the framework. To my knowledge, Max has
>>>> already been working on that for several days.
>>>> 
>>>> -
>>>> Denis
>>>> 
>>>> 
>>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <ni...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hello, Igniters.
>>>>> 
>>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>> 
>>>>> Let me briefly explain the gap I want to cover:
>>>>> 
>>>>> 1. For now, we don’t have a solution for automated testing of Ignite on
>>>>> «real cluster».
>>>>> By «real cluster» I mean cluster «like a production»:
>>>>>       * client and server nodes deployed on different hosts.
>>>>>       * thin clients perform queries from some other hosts
>>>>>       * etc.
>>>>> 
>>>>> 2. We don’t have a solution for automated benchmarks of some internal
>>>>> Ignite process
>>>>>       * PME
>>>>>       * rebalance.
>>>>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
>>>>> or slower than in 2.8.0 for the same cluster?
>>>>> 
>>>>> 3. We don’t have a solution for automated testing of Ignite integration in
>>>>> a real-world environment:
>>>>> Ignite-Spark integration can be taken as an example.
>>>>> I think some ML solutions also should be tested in real-world deployments.
>>>>> 
>>>>> Solution:
>>>>> 
>>>>> I propose to use duck tape library from confluent (apache 2.0 license)
>>>>> I tested it both on the real cluster(Yandex Cloud) and on the local
>>>>> environment(docker) and it works just fine.
>>>>> 
>>>>> PoC contains following services:
>>>>> 
>>>>>       * Simple rebalance test:
>>>>>               Start 2 server nodes,
>>>>>               Create some data with Ignite client,
>>>>>               Start one more server node,
>>>>>               Wait for rebalance finish
>>>>>       * Simple Ignite-Spark integration test:
>>>>>               Start 1 Spark master, start 1 Spark worker,
>>>>>               Start 1 Ignite server node
>>>>>               Create some data with Ignite client,
>>>>>               Check data in application that queries it from Spark.
>>>>> 
>>>>> All tests are fully automated.
>>>>> Logs collection works just fine.
>>>>> You can see an example of the tests report - [4].
>>>>> 
>>>>> Pros:
>>>>> 
>>>>> * Ability to test local changes(no need to public changes to some remote
>>>>> repository or similar).
>>>>> * Ability to parametrize test environment(run the same tests on different
>>>>> JDK, JVM params, config, etc.)
>>>>> * Isolation by default so system tests are as reliable as possible.
>>>>> * Utilities for pulling up and tearing down services easily in clusters in
>>>>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>>>>> Docker, cloud providers, etc.)
>>>>> * Easy to write unit tests for distributed systems
>>>>> * Adopted and successfully used by other distributed open source project -
>>>>> Apache Kafka.
>>>>> * Collect results (e.g. logs, console output)
>>>>> * Report results (e.g. expected conditions met, performance results, etc.)
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>> [2] https://github.com/confluentinc/ducktape
>>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg


Re: [DISCUSSION] Ignite integration testing framework.

Posted by "Max A. Shonichev" <ms...@yandex.ru>.
Greetings, Nikolay,

First of all, thank you for your great effort preparing a PoC of
integration testing for the Ignite community.

It’s a shame Ignite did not have at least some such tests yet. However,
GridGain, as a major contributor to Apache Ignite, has had a profound
collection of in-house tools for integration and performance testing for
years already, and while we have been slowly considering sharing our
expertise with the community, your initiative makes us drive that
process a bit faster, thanks a lot!

I reviewed your PoC and want to share a little about what we do on our
part, why and how; I hope it will help the community take the proper course.

First I’ll give a brief overview of what decisions we made and what we
have in our private code base, next I’ll describe what we have already
donated to the public and what we plan to publish next, then I’ll compare
both approaches, highlighting deficiencies, in order to spur public
discussion on the matter.

It might seem strange to use Python to run Bash to run Java applications,
because that introduces the IT industry’s ‘best of breed’ – Python
dependency hell – into the Java application code base. The only stranger
decision one could make is to use Maven to run Docker to run Bash to run
Python to run Bash to run Java, but desperate times call for desperate
measures, I guess.

There are Java-based solutions for integration testing, e.g.
Testcontainers [1], Arquillian [2], etc., and they might work well for
Ignite community CI pipelines by themselves. But we also wanted to run
performance tests and benchmarks, like the dreaded PME benchmark, and
that is solved by a totally different set of tools in the Java world, e.g.
JMeter [3], OpenJDK JMH [4], Gatling [5], etc.

Speaking specifically about benchmarking, the Apache Ignite community
already has Yardstick [6], and there’s nothing wrong with writing a PME
benchmark using Yardstick, but we also wanted to be able to run
scenarios like this:
- put load X on an Ignite database;
- perform a set of operations Y to check how Ignite copes with
operations under load.

And yes, we also wanted applications under test to be deployed ‘like in
production’, e.g. distributed over a set of hosts. This raises questions
about provisioning and node affinity, which I’ll cover in detail later.

So we decided to put in a little effort to build a simple tool covering
different integration and performance scenarios, and our QA lab’s first
attempt was PoC-Tester [7], currently open source except for the
reporting web UI. It’s a quite simple-to-use, 95% Java-based tool
targeted at the pre-release QA stage.

It covers production-like deployment and running scenarios over a
single database instance. PoC-Tester scenarios consist of a sequence of
tasks running sequentially or in parallel. After all tasks complete, or
at any time during a test, the user can run a log collection task; the
logs are checked for exceptions, and a summary of found issues and task
ops/latency statistics is generated at the end of the scenario. One of the
main PoC-Tester features is its fire-and-forget approach to task
management. That is, you can deploy a grid and leave it running for weeks,
periodically firing some tasks onto it.

During the earliest stages of PoC-Tester development it became quite clear
that Java application development is a tedious process and the architecture
decisions you take during development are slow and hard to change.
For example, scenarios like this
- deploy two instances of GridGain with master-slave data replication
configured;
- put a load on the master;
- perform checks on the slave,
or like this:
- preload 1 TB of data, using your favorite tool of choice, into an
Apache Ignite of version X;
- run a set of functional tests with Apache Ignite version Y over the
preloaded data,
do not fit well into the PoC-Tester workflow.

So, this is why we decided to use Python as our generic scripting language
of choice.

Pros:
- quicker prototyping and development cycles
- easier to find a DevOps/QA engineer with Python skills than one with
Java skills
- used extensively all over the world for DevOps/CI pipelines and thus
has a rich set of libraries for all possible integration use cases.

Cons:
- Nightmare with dependencies. Better to stick to specific
language/library versions.

Comparing alternatives for a Python-based testing framework, we
considered the following requirements, somewhat similar to what you’ve
mentioned for Confluent [8] previously:
- should be able to run locally or distributed (bare metal or in the cloud)
- should have built-in deployment facilities for applications under test
- should separate test configuration and test code
-- be able to easily reconfigure tests by simple configuration changes
-- be able to easily scale the test environment by simple configuration changes
-- be able to perform regression testing by simply switching the artifacts
under test via configuration
-- be able to run tests with different JDK versions by simple
configuration changes
- should have human-readable reports and/or reporting tools integration
- should allow simple test progress monitoring; one does not want to run a
6-hour test only to find out that the application actually crashed during
the first hour
- should allow parallel execution of test actions
- should have a clean API for test writers
-- a clean API for distributed remote command execution
-- a clean API for deployed application start / stop and other operations
-- a clean API for performing checks on results
- should be open source, or at least the source code should allow easy
change or extension

Back at that time we found no better alternative than to write our own
framework, and so Tiden [9] became GridGain’s framework of choice for
functional integration and performance testing.

Pros:
- satisfies all the requirements above
Cons (for Ignite):
- (currently) closed GridGain source

On top of Tiden we’ve built a set of test suites, some of which you
might have heard of already.

A Combinator suite allows running a set of operations concurrently over a
given database instance. It has proven to find 30+ race conditions and
NPE issues.

A Consumption suite allows running a set of production-like actions over a
given set of Ignite/GridGain versions and comparing test metrics across
versions, like heap/disk/CPU consumption and the time to perform actions
such as client PME, server PME, rebalancing, data replication, etc.

A Yardstick suite is a thin layer of Python glue code to run the Apache
Ignite pre-release benchmark set. Yardstick itself has mediocre
deployment capabilities; Tiden solves this easily.

A Stress suite simulates hardware environment degradation during
testing.

The Ultimate, DR and Compatibility suites perform functional
regression testing of GridGain Ultimate Edition features like snapshots,
security, data replication, rolling upgrades, etc.

A Regression suite and some IEP testing suites, like IEP-14, IEP-15,
and so on.

Most of the suites above use another in-house developed Java tool –
PiClient – to perform the actual loading and miscellaneous operations on
the Ignite under test. We use the py4j Python-Java gateway library to
control PiClient instances from the tests.
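
For those unfamiliar with py4j, the general pattern is roughly the
following (PiClient itself is closed source, so the entry point and
method names here are purely illustrative):

    from py4j.java_gateway import JavaGateway

    # connect to a JVM that runs a py4j GatewayServer next to the Java tool
    gateway = JavaGateway()
    client = gateway.entry_point          # the Java object exposed by the gateway

    client.startLoad("cache1", 100000)    # illustrative method names only
    client.stopLoad()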

When we considered CI, we put TeamCity out of scope, because distributed
integration and performance tests tend to run for hours and TeamCity
agents are a scarce and costly resource. So, bundled with Tiden there are
jenkins-job-builder [10] based CI pipelines and Jenkins xUnit reporting.
Also, a rich web UI tool, Ward, aggregates test run reports across versions
and has built-in visualization support for the Combinator suite.

All of the above is currently closed source, but we plan to make it
public for the community, and publishing the Tiden core [9] is the first
step along that way. You can review some examples of using Tiden for tests
in my repository [11], for a start.

Now, let’s compare the Ducktape PoC and Tiden.

Criteria: Language
Tiden: Python, 3.7
Ducktape: Python, presents itself as Python 2.7, 3.6, 3.7 compatible,
but actually can’t work with Python 3.7 due to a broken ZMQ dependency.
Comment: Python 3.7 has much better support for async-style code, which
might be crucial for distributed application testing.
Score: Tiden: 1, Ducktape: 0

Criteria: Test writers API
Supported integration test framework concepts are basically the same:
- a test controller (test runner)
- a cluster
- a node
- an application (a service in Ducktape terms)
- a test
Score: Tiden: 5, Ducktape: 5

Criteria: Tests selection and run
Ducktape: suite-package-class-method level selection; an internal scheduler
allows running the tests in a suite in parallel.
Tiden: also suite-package-class-method level selection; additionally
allows selecting a subset of tests by attribute; parallel runs are not
built in, but merging test reports from different runs is supported.
Score: Tiden: 2, Ducktape: 2

Criteria: Test configuration
Ducktape: single JSON string for all tests
Tiden: any number of YAML config files, command-line options for
fine-grained test configuration, ability to select/modify test behavior
based on Ignite version.
Score: Tiden: 3, Ducktape: 1

Criteria: Cluster control
Ducktape: allows executing remote commands at node granularity.
Tiden: additionally can address the cluster as a whole and execute remote
commands in parallel.
Score: Tiden: 2, Ducktape: 1

Criteria: Logs control
Both frameworks have similar built-in support for remote log collection
and grepping. Tiden has a built-in plugin that can zip, collect arbitrary
log files from arbitrary locations at test/module/suite granularity and
unzip them if needed, plus an application API to search / wait for messages
in the logs. Ducktape lets each service declare its log file locations
(seemingly does not support log rotation), and provides a single entry
point to collect service logs.
Score: Tiden: 1, Ducktape: 1

Criteria: Test assertions
Tiden: simple asserts, plus a few customized assertion helpers.
Ducktape: simple asserts.
Score: Tiden: 2, Ducktape: 1

Criteria: Test reporting
Ducktape: limited to its own text/HTML format.
Tiden: provides a text report, a YAML report for reporting tools
integration, and an XML xUnit report for integration with Jenkins/TeamCity.
Score: Tiden: 3, Ducktape: 1

Criteria: Provisioning and deployment
Ducktape: can provision a subset of hosts from the cluster for test needs.
However, that means a test can’t be scaled without test code
changes. Does not do any deployment; relies on external means, e.g.
pre-packaging in a docker image, as in the PoC.
Tiden: given a set of hosts, Tiden uses all of them for the test.
Provisioning should be done by external means. However, it provides
conventional automated deployment routines.
Score: Tiden: 1, Ducktape: 1

Criteria: Documentation and Extensibility
Tiden: current API documentation is limited; that should change as we go
open source. Tiden is easily extensible via hooks and plugins, see the
example Maven plugin and Gatling application at [11].
Ducktape: basic documentation at readthedocs.io. The codebase is rigid,
the framework core is tightly coupled and hard to change. The only
possible extension mechanism is fork-and-rewrite.
Score: Tiden: 2, Ducktape: 1

I could continue further on this, but it should be enough for now:
Overall score: Tiden: 22, Ducktape: 14.

Time for discussion!

---
[1] - https://www.testcontainers.org/
[2] - http://arquillian.org/guides/getting_started/
[3] - https://jmeter.apache.org/index.html
[4] - https://openjdk.java.net/projects/code-tools/jmh/
[5] - https://gatling.io/docs/current/
[6] - https://github.com/gridgain/yardstick
[7] - https://github.com/gridgain/poc-tester
[8] - 
https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
[9] - https://github.com/gridgain/tiden
[10] - https://pypi.org/project/jenkins-job-builder/
[11] - https://github.com/mshonichev/tiden_examples

On 25.05.2020 11:09, Nikolay Izhikov wrote:
> Hello,
>
> Branch with duck tape created - https://github.com/apache/ignite/tree/ignite-ducktape
>
> Any who are willing to contribute to PoC are welcome.
>
>
>> 21 мая 2020 г., в 22:33, Nikolay Izhikov <ni...@gmail.com> написал(а):
>>
>> Hello, Denis.
>>
>> There is no rush with these improvements.
>> We can wait for Maxim proposal and compare two solutions :)
>>
>>> 21 мая 2020 г., в 22:24, Denis Magda <dm...@apache.org> написал(а):
>>>
>>> Hi Nikolay,
>>>
>>> Thanks for kicking off this conversation and sharing your findings with the
>>> results. That's the right initiative. I do agree that Ignite needs to have
>>> an integration testing framework with capabilities listed by you.
>>>
>>> As we discussed privately, I would only check if instead of
>>> Confluent's Ducktape library, we can use an integration testing framework
>>> developed by GridGain for testing of Ignite/GridGain clusters. That
>>> framework has been battle-tested and might be more convenient for
>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>> <ms...@gridgain.com> who promised to join this thread once he finishes
>>> preparing the usage examples of the framework. To my knowledge, Max has
>>> already been working on that for several days.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <ni...@apache.org>
>>> wrote:
>>>
>>>> Hello, Igniters.
>>>>
>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>
>>>> Let me briefly explain the gap I want to cover:
>>>>
>>>> 1. For now, we don’t have a solution for automated testing of Ignite on
>>>> «real cluster».
>>>> By «real cluster» I mean cluster «like a production»:
>>>>        * client and server nodes deployed on different hosts.
>>>>        * thin clients perform queries from some other hosts
>>>>        * etc.
>>>>
>>>> 2. We don’t have a solution for automated benchmarks of some internal
>>>> Ignite process
>>>>        * PME
>>>>        * rebalance.
>>>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
>>>> or slower than in 2.8.0 for the same cluster?
>>>>
>>>> 3. We don’t have a solution for automated testing of Ignite integration in
>>>> a real-world environment:
>>>> Ignite-Spark integration can be taken as an example.
>>>> I think some ML solutions also should be tested in real-world deployments.
>>>>
>>>> Solution:
>>>>
>>>> I propose to use duck tape library from confluent (apache 2.0 license)
>>>> I tested it both on the real cluster(Yandex Cloud) and on the local
>>>> environment(docker) and it works just fine.
>>>>
>>>> PoC contains following services:
>>>>
>>>>        * Simple rebalance test:
>>>>                Start 2 server nodes,
>>>>                Create some data with Ignite client,
>>>>                Start one more server node,
>>>>                Wait for rebalance finish
>>>>        * Simple Ignite-Spark integration test:
>>>>                Start 1 Spark master, start 1 Spark worker,
>>>>                Start 1 Ignite server node
>>>>                Create some data with Ignite client,
>>>>                Check data in application that queries it from Spark.
>>>>
>>>> All tests are fully automated.
>>>> Logs collection works just fine.
>>>> You can see an example of the tests report - [4].
>>>>
>>>> Pros:
>>>>
>>>> * Ability to test local changes(no need to public changes to some remote
>>>> repository or similar).
>>>> * Ability to parametrize test environment(run the same tests on different
>>>> JDK, JVM params, config, etc.)
>>>> * Isolation by default so system tests are as reliable as possible.
>>>> * Utilities for pulling up and tearing down services easily in clusters in
>>>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>>>> Docker, cloud providers, etc.)
>>>> * Easy to write unit tests for distributed systems
>>>> * Adopted and successfully used by other distributed open source project -
>>>> Apache Kafka.
>>>> * Collect results (e.g. logs, console output)
>>>> * Report results (e.g. expected conditions met, performance results, etc.)
>>>>
>>>> WDYT?
>>>>
>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>> [2] https://github.com/confluentinc/ducktape
>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, 

A branch for the duck tape tests has been created - https://github.com/apache/ignite/tree/ignite-ducktape

Anyone who is willing to contribute to the PoC is welcome.


> 21 мая 2020 г., в 22:33, Nikolay Izhikov <ni...@gmail.com> написал(а):
> 
> Hello, Denis.
> 
> There is no rush with these improvements.
> We can wait for Maxim proposal and compare two solutions :)
> 
>> 21 мая 2020 г., в 22:24, Denis Magda <dm...@apache.org> написал(а):
>> 
>> Hi Nikolay,
>> 
>> Thanks for kicking off this conversation and sharing your findings with the
>> results. That's the right initiative. I do agree that Ignite needs to have
>> an integration testing framework with capabilities listed by you.
>> 
>> As we discussed privately, I would only check if instead of
>> Confluent's Ducktape library, we can use an integration testing framework
>> developed by GridGain for testing of Ignite/GridGain clusters. That
>> framework has been battle-tested and might be more convenient for
>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>> <ms...@gridgain.com> who promised to join this thread once he finishes
>> preparing the usage examples of the framework. To my knowledge, Max has
>> already been working on that for several days.
>> 
>> -
>> Denis
>> 
>> 
>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <ni...@apache.org>
>> wrote:
>> 
>>> Hello, Igniters.
>>> 
>>> I created a PoC [1] for the integration tests of Ignite.
>>> 
>>> Let me briefly explain the gap I want to cover:
>>> 
>>> 1. For now, we don’t have a solution for automated testing of Ignite on
>>> «real cluster».
>>> By «real cluster» I mean cluster «like a production»:
>>>       * client and server nodes deployed on different hosts.
>>>       * thin clients perform queries from some other hosts
>>>       * etc.
>>> 
>>> 2. We don’t have a solution for automated benchmarks of some internal
>>> Ignite process
>>>       * PME
>>>       * rebalance.
>>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
>>> or slower than in 2.8.0 for the same cluster?
>>> 
>>> 3. We don’t have a solution for automated testing of Ignite integration in
>>> a real-world environment:
>>> Ignite-Spark integration can be taken as an example.
>>> I think some ML solutions also should be tested in real-world deployments.
>>> 
>>> Solution:
>>> 
>>> I propose to use duck tape library from confluent (apache 2.0 license)
>>> I tested it both on the real cluster(Yandex Cloud) and on the local
>>> environment(docker) and it works just fine.
>>> 
>>> PoC contains following services:
>>> 
>>>       * Simple rebalance test:
>>>               Start 2 server nodes,
>>>               Create some data with Ignite client,
>>>               Start one more server node,
>>>               Wait for rebalance finish
>>>       * Simple Ignite-Spark integration test:
>>>               Start 1 Spark master, start 1 Spark worker,
>>>               Start 1 Ignite server node
>>>               Create some data with Ignite client,
>>>               Check data in application that queries it from Spark.
>>> 
>>> All tests are fully automated.
>>> Logs collection works just fine.
>>> You can see an example of the tests report - [4].
>>> 
>>> Pros:
>>> 
>>> * Ability to test local changes(no need to public changes to some remote
>>> repository or similar).
>>> * Ability to parametrize test environment(run the same tests on different
>>> JDK, JVM params, config, etc.)
>>> * Isolation by default so system tests are as reliable as possible.
>>> * Utilities for pulling up and tearing down services easily in clusters in
>>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>>> Docker, cloud providers, etc.)
>>> * Easy to write unit tests for distributed systems
>>> * Adopted and successfully used by other distributed open source project -
>>> Apache Kafka.
>>> * Collect results (e.g. logs, console output)
>>> * Report results (e.g. expected conditions met, performance results, etc.)
>>> 
>>> WDYT?
>>> 
>>> [1] https://github.com/nizhikov/ignite/pull/15
>>> [2] https://github.com/confluentinc/ducktape
>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> 


Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Denis.

There is no rush with these improvements.
We can wait for Maxim’s proposal and compare the two solutions :)

> 21 мая 2020 г., в 22:24, Denis Magda <dm...@apache.org> написал(а):
> 
> Hi Nikolay,
> 
> Thanks for kicking off this conversation and sharing your findings with the
> results. That's the right initiative. I do agree that Ignite needs to have
> an integration testing framework with capabilities listed by you.
> 
> As we discussed privately, I would only check if instead of
> Confluent's Ducktape library, we can use an integration testing framework
> developed by GridGain for testing of Ignite/GridGain clusters. That
> framework has been battle-tested and might be more convenient for
> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> <ms...@gridgain.com> who promised to join this thread once he finishes
> preparing the usage examples of the framework. To my knowledge, Max has
> already been working on that for several days.
> 
> -
> Denis
> 
> 
> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <ni...@apache.org>
> wrote:
> 
>> Hello, Igniters.
>> 
>> I created a PoC [1] for the integration tests of Ignite.
>> 
>> Let me briefly explain the gap I want to cover:
>> 
>> 1. For now, we don’t have a solution for automated testing of Ignite on
>> «real cluster».
>> By «real cluster» I mean cluster «like a production»:
>>        * client and server nodes deployed on different hosts.
>>        * thin clients perform queries from some other hosts
>>        * etc.
>> 
>> 2. We don’t have a solution for automated benchmarks of some internal
>> Ignite process
>>        * PME
>>        * rebalance.
>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
>> or slower than in 2.8.0 for the same cluster?
>> 
>> 3. We don’t have a solution for automated testing of Ignite integration in
>> a real-world environment:
>> Ignite-Spark integration can be taken as an example.
>> I think some ML solutions also should be tested in real-world deployments.
>> 
>> Solution:
>> 
>> I propose to use duck tape library from confluent (apache 2.0 license)
>> I tested it both on the real cluster(Yandex Cloud) and on the local
>> environment(docker) and it works just fine.
>> 
>> PoC contains following services:
>> 
>>        * Simple rebalance test:
>>                Start 2 server nodes,
>>                Create some data with Ignite client,
>>                Start one more server node,
>>                Wait for rebalance finish
>>        * Simple Ignite-Spark integration test:
>>                Start 1 Spark master, start 1 Spark worker,
>>                Start 1 Ignite server node
>>                Create some data with Ignite client,
>>                Check data in application that queries it from Spark.
>> 
>> All tests are fully automated.
>> Logs collection works just fine.
>> You can see an example of the tests report - [4].
>> 
>> Pros:
>> 
>> * Ability to test local changes(no need to public changes to some remote
>> repository or similar).
>> * Ability to parametrize test environment(run the same tests on different
>> JDK, JVM params, config, etc.)
>> * Isolation by default so system tests are as reliable as possible.
>> * Utilities for pulling up and tearing down services easily in clusters in
>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>> Docker, cloud providers, etc.)
>> * Easy to write unit tests for distributed systems
>> * Adopted and successfully used by other distributed open source project -
>> Apache Kafka.
>> * Collect results (e.g. logs, console output)
>> * Report results (e.g. expected conditions met, performance results, etc.)
>> 
>> WDYT?
>> 
>> [1] https://github.com/nizhikov/ignite/pull/15
>> [2] https://github.com/confluentinc/ducktape
>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>> [4] https://yadi.sk/d/JC8ciJZjrkdndg


Re: [DISCUSSION] Ignite integration testing framework.

Posted by Denis Magda <dm...@apache.org>.
Hi Nikolay,

Thanks for kicking off this conversation and sharing your findings and
results. That's the right initiative. I do agree that Ignite needs to have
an integration testing framework with the capabilities listed by you.

As we discussed privately, I would only check if instead of
Confluent's Ducktape library, we can use an integration testing framework
developed by GridGain for testing of Ignite/GridGain clusters. That
framework has been battle-tested and might be more convenient for
Ignite-specific workloads. Let's wait for @Maksim Shonichev
<ms...@gridgain.com> who promised to join this thread once he finishes
preparing the usage examples of the framework. To my knowledge, Max has
already been working on that for several days.

-
Denis


On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Igniters.
>
> I created a PoC [1] for the integration tests of Ignite.
>
> Let me briefly explain the gap I want to cover:
>
> 1. For now, we don’t have a solution for automated testing of Ignite on
> «real cluster».
> By «real cluster» I mean cluster «like a production»:
>         * client and server nodes deployed on different hosts.
>         * thin clients perform queries from some other hosts
>         * etc.
>
> 2. We don’t have a solution for automated benchmarks of some internal
> Ignite process
>         * PME
>         * rebalance.
> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
> or slower than in 2.8.0 for the same cluster?
>
> 3. We don’t have a solution for automated testing of Ignite integration in
> a real-world environment:
> Ignite-Spark integration can be taken as an example.
> I think some ML solutions also should be tested in real-world deployments.
>
> Solution:
>
> I propose to use duck tape library from confluent (apache 2.0 license)
> I tested it both on the real cluster(Yandex Cloud) and on the local
> environment(docker) and it works just fine.
>
> PoC contains following services:
>
>         * Simple rebalance test:
>                 Start 2 server nodes,
>                 Create some data with Ignite client,
>                 Start one more server node,
>                 Wait for rebalance finish
>         * Simple Ignite-Spark integration test:
>                 Start 1 Spark master, start 1 Spark worker,
>                 Start 1 Ignite server node
>                 Create some data with Ignite client,
>                 Check data in application that queries it from Spark.
>
> All tests are fully automated.
> Logs collection works just fine.
> You can see an example of the tests report - [4].
>
> Pros:
>
> * Ability to test local changes(no need to public changes to some remote
> repository or similar).
> * Ability to parametrize test environment(run the same tests on different
> JDK, JVM params, config, etc.)
> * Isolation by default so system tests are as reliable as possible.
> * Utilities for pulling up and tearing down services easily in clusters in
> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
> Docker, cloud providers, etc.)
> * Easy to write unit tests for distributed systems
> * Adopted and successfully used by other distributed open source project -
> Apache Kafka.
> * Collect results (e.g. logs, console output)
> * Report results (e.g. expected conditions met, performance results, etc.)
>
> WDYT?
>
> [1] https://github.com/nizhikov/ignite/pull/15
> [2] https://github.com/confluentinc/ducktape
> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> [4] https://yadi.sk/d/JC8ciJZjrkdndg

Re: [DISCUSSION] Ignite integration testing framework.

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Anton.

> Could you please explain how to perform testing on the different environments?

Different JDK:

1. We should prepare our environment and put the desired JDKs in known locations, e.g. [/opt/jdk1.8, /opt/jdk11].
2. We should parametrize the test and just set up the Java environment variables as usual (see ignite.py [1], for example):
	export JAVA_HOME,
	export PATH

Different JVM parameters - we need to support them in the desired `service` or pass them in the command from the test itself.
A different version of Ignite (or any other `service`) - we can run the same test on different Ignite versions.
	All we need is to change the root path for the commands (see the sketch below).
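
For illustration, the node start command could be assembled from such parameters. A simplified sketch, not the exact PoC code (the parameter names are hypothetical):

    def start_cmd(jdk_path, ignite_home, jvm_opts, config_file):
        """Build an Ignite node start command from test parameters (illustrative only)."""
        env = "export JAVA_HOME=%s; export PATH=$JAVA_HOME/bin:$PATH; " % jdk_path
        # the Ignite version is selected simply by pointing to a different root path
        return env + "JVM_OPTS='%s' %s/bin/ignite.sh %s" % (jvm_opts, ignite_home, config_file)

    print(start_cmd("/opt/jdk11", "/opt/ignite-2.8.0", "-Xmx2g", "/opt/configs/server.xml"))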

Makes sense?

Is it possible to perform this test with TDE enabled?

Yes. All you need is to parametrize the Ignite config and enable the required features.

Is it possible to perform it on custom OS (eg. Windows)?

No, this is not supported out of the box.
I guess we could investigate the features of PowerShell to make it possible.

Is it possible to perform it on bare metal?

Yes.
All you need is an SSH connection.
As I said before, you can run the same tests in the dev environment (local sources + docker) with one command, and right after development run them on bare metal.
The only change you need is one config file (cluster.json, to be precise), as shown below.
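
For a bare-metal run, the ducktape cluster file describes the hosts to reach over SSH; from memory it looks roughly like this (the exact field names are an assumption, check the ducktape docs, and this is not the PoC file):

    {
      "nodes": [
        {"ssh_config": {"host": "host-1", "hostname": "10.0.0.1", "user": "ignite", "port": 22, "identityfile": "~/.ssh/id_rsa"}},
        {"ssh_config": {"host": "host-2", "hostname": "10.0.0.2", "user": "ignite", "port": 22, "identityfile": "~/.ssh/id_rsa"}}
      ]
    }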

[1] https://github.com/nizhikov/ignite/pull/15/files#diff-463ad0ed102bd795ac78a50a868b5425R69

> 21 мая 2020 г., в 10:46, Anton Vinogradov <av...@apache.org> написал(а):
> 
> Nikolay,
> 
> Great proposal!
> 
> Could you please explain how to perform testing on the different
> environments?
> For Example, you provided a rebalance test.
> 
> Is it possible to perform this test with TDE enabled?
> Is it possible to perform it on custom OS (eg. Windows)?
> Is it possible to perform it on bare metal?
> 
> I hope you'll answer yes to all, and the question is mostly about details.
> 
> On Thu, May 21, 2020 at 10:27 AM Nikolay Izhikov <ni...@apache.org>
> wrote:
> 
>> Hello, Igniters.
>> 
>> I created a PoC [1] for the integration tests of Ignite.
>> 
>> Let me briefly explain the gap I want to cover:
>> 
>> 1. For now, we don’t have a solution for automated testing of Ignite on
>> «real cluster».
>> By «real cluster» I mean cluster «like a production»:
>>        * client and server nodes deployed on different hosts.
>>        * thin clients perform queries from some other hosts
>>        * etc.
>> 
>> 2. We don’t have a solution for automated benchmarks of some internal
>> Ignite process
>>        * PME
>>        * rebalance.
>> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
>> or slower than in 2.8.0 for the same cluster?
>> 
>> 3. We don’t have a solution for automated testing of Ignite integration in
>> a real-world environment:
>> Ignite-Spark integration can be taken as an example.
>> I think some ML solutions also should be tested in real-world deployments.
>> 
>> Solution:
>> 
>> I propose to use duck tape library from confluent (apache 2.0 license)
>> I tested it both on the real cluster(Yandex Cloud) and on the local
>> environment(docker) and it works just fine.
>> 
>> PoC contains following services:
>> 
>>        * Simple rebalance test:
>>                Start 2 server nodes,
>>                Create some data with Ignite client,
>>                Start one more server node,
>>                Wait for rebalance finish
>>        * Simple Ignite-Spark integration test:
>>                Start 1 Spark master, start 1 Spark worker,
>>                Start 1 Ignite server node
>>                Create some data with Ignite client,
>>                Check data in application that queries it from Spark.
>> 
>> All tests are fully automated.
>> Logs collection works just fine.
>> You can see an example of the tests report - [4].
>> 
>> Pros:
>> 
>> * Ability to test local changes(no need to public changes to some remote
>> repository or similar).
>> * Ability to parametrize test environment(run the same tests on different
>> JDK, JVM params, config, etc.)
>> * Isolation by default so system tests are as reliable as possible.
>> * Utilities for pulling up and tearing down services easily in clusters in
>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>> Docker, cloud providers, etc.)
>> * Easy to write unit tests for distributed systems
>> * Adopted and successfully used by other distributed open source project -
>> Apache Kafka.
>> * Collect results (e.g. logs, console output)
>> * Report results (e.g. expected conditions met, performance results, etc.)
>> 
>> WDYT?
>> 
>> [1] https://github.com/nizhikov/ignite/pull/15
>> [2] https://github.com/confluentinc/ducktape
>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>> [4] https://yadi.sk/d/JC8ciJZjrkdndg


Re: [DISCUSSION] Ignite integration testing framework.

Posted by Anton Vinogradov <av...@apache.org>.
Nikolay,

Great proposal!

Could you please explain how to perform testing on the different
environments?
For Example, you provided a rebalance test.

Is it possible to perform this test with TDE enabled?
Is it possible to perform it on custom OS (eg. Windows)?
Is it possible to perform it on bare metal?

I hope you'll answer yes to all, and the question is mostly about details.

On Thu, May 21, 2020 at 10:27 AM Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Igniters.
>
> I created a PoC [1] for the integration tests of Ignite.
>
> Let me briefly explain the gap I want to cover:
>
> 1. For now, we don’t have a solution for automated testing of Ignite on
> «real cluster».
> By «real cluster» I mean cluster «like a production»:
>         * client and server nodes deployed on different hosts.
>         * thin clients perform queries from some other hosts
>         * etc.
>
> 2. We don’t have a solution for automated benchmarks of some internal
> Ignite process
>         * PME
>         * rebalance.
> This means we don’t know - Do we perform rebalance(or PME) in 2.7.0 faster
> or slower than in 2.8.0 for the same cluster?
>
> 3. We don’t have a solution for automated testing of Ignite integration in
> a real-world environment:
> Ignite-Spark integration can be taken as an example.
> I think some ML solutions also should be tested in real-world deployments.
>
> Solution:
>
> I propose to use duck tape library from confluent (apache 2.0 license)
> I tested it both on the real cluster(Yandex Cloud) and on the local
> environment(docker) and it works just fine.
>
> PoC contains following services:
>
>         * Simple rebalance test:
>                 Start 2 server nodes,
>                 Create some data with Ignite client,
>                 Start one more server node,
>                 Wait for rebalance finish
>         * Simple Ignite-Spark integration test:
>                 Start 1 Spark master, start 1 Spark worker,
>                 Start 1 Ignite server node
>                 Create some data with Ignite client,
>                 Check data in application that queries it from Spark.
>
> All tests are fully automated.
> Logs collection works just fine.
> You can see an example of the tests report - [4].
>
> Pros:
>
> * Ability to test local changes(no need to public changes to some remote
> repository or similar).
> * Ability to parametrize test environment(run the same tests on different
> JDK, JVM params, config, etc.)
> * Isolation by default so system tests are as reliable as possible.
> * Utilities for pulling up and tearing down services easily in clusters in
> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
> Docker, cloud providers, etc.)
> * Easy to write unit tests for distributed systems
> * Adopted and successfully used by other distributed open source project -
> Apache Kafka.
> * Collect results (e.g. logs, console output)
> * Report results (e.g. expected conditions met, performance results, etc.)
>
> WDYT?
>
> [1] https://github.com/nizhikov/ignite/pull/15
> [2] https://github.com/confluentinc/ducktape
> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> [4] https://yadi.sk/d/JC8ciJZjrkdndg