Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/16 23:02:00 UTC

[jira] [Commented] (IMPALA-8661) Create randomized tests for stressing the event processor

    [ https://issues.apache.org/jira/browse/IMPALA-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909464#comment-16909464 ] 

ASF subversion and git services commented on IMPALA-8661:
---------------------------------------------------------

Commit 4ee31de3a104b4d64008a087925b4830e95ea826 in impala's branch refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4ee31de ]

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. The
test randomly executes Hive queries to generate a large number of
events. The event processor is invoked at random intervals so that a
variable-sized batch of events is processed every time. After each
batch is processed, the test checks the status of the event processor.
By default, on CDH builds the test is configured to run with 4
concurrent Hive clients, each of which runs 50 random Hive queries.
These defaults can be overridden by passing system properties through
the maven command arguments "-DnumClients" and "-DnumQueriesPerClients".
Additionally, the test creates Impala clients which keep issuing
refresh table commands on the test databases to make sure that the
event processor is doing real work rather than invalidating/refreshing
tables which are already incomplete.
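
The client setup described above can be sketched roughly as follows.
This is a minimal illustration and not the code from this commit: the
class StressTestSketch, the QueryRunner interface, runClients() and the
seed parameter are hypothetical names. Only the two property names and
their defaults (4 and 50) come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class StressTestSketch {
  /** Hypothetical stand-in for "issue one random Hive query". */
  public interface QueryRunner {
    void run(int clientId, int queryIdx, Random rnd) throws Exception;
  }

  // Read the overrides passed as -DnumClients / -DnumQueriesPerClients,
  // falling back to the defaults named in the commit message.
  public static int numClients() {
    return Integer.getInteger("numClients", 4);
  }

  public static int numQueriesPerClient() {
    return Integer.getInteger("numQueriesPerClients", 50);
  }

  /** Runs the given number of concurrent clients; returns the total queries issued. */
  public static int runClients(int clients, int queriesPerClient, long seed,
      QueryRunner runner) {
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    AtomicInteger executed = new AtomicInteger();
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (int c = 0; c < clients; c++) {
        final int clientId = c;
        Callable<Void> task = () -> {
          // One Random per client, so a failing run can be replayed from its seed.
          Random rnd = new Random(seed + clientId);
          for (int q = 0; q < queriesPerClient; q++) {
            runner.run(clientId, q, rnd);
            executed.incrementAndGet();
          }
          return null;
        };
        futures.add(pool.submit(task));
      }
      // Wait for every client and surface the first failure, if any.
      for (Future<Void> f : futures) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("client failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
    return executed.get();
  }
}
```

A caller would invoke something like runClients(numClients(),
numQueriesPerClient(), seed, runner), so that passing for example
-DnumClients=2 on the maven command line shrinks the run without a
code change.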

This test is added as a JUnit test and uses Hive JDBC to issue the SQL
statements. This is much faster than the end-to-end Python test, which
issues each Hive query in a separate beeline session and therefore
re-establishes the connection every time.
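
The benefit of connection reuse can be illustrated with a small sketch
built on the standard java.sql interfaces. The class HiveJdbcSketch and
the method runAll() are hypothetical names chosen for illustration; the
actual test code may be structured differently.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class HiveJdbcSketch {
  /**
   * Executes every statement over the same already-open connection and
   * returns how many were executed. No reconnect happens between queries.
   */
  public static int runAll(Connection conn, List<String> sqls) throws SQLException {
    int executed = 0;
    try (Statement stmt = conn.createStatement()) {
      for (String sql : sqls) {
        stmt.execute(sql);
        executed++;
      }
    }
    return executed;
  }
}
```

The Connection would typically come from DriverManager.getConnection()
with a jdbc:hive2:// URL; the host and port depend on the minicluster
setup and are not specified here.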

Notes:
1. Ran the test with defaults. It generates about 500 events
and runs for close to 4.5 min. The defaults can be lowered if we
see a significant increase in the test job runtimes.
2. On CDP builds the concurrent Hive queries run very slowly due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. When running against Hive-3, the test runs in
single-threaded mode with an increased number of queries.

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Reviewed-on: http://gerrit.cloudera.org:8080/13932
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>


> Create randomized tests for stressing the event processor
> ---------------------------------------------------------
>
>                 Key: IMPALA-8661
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8661
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>              Labels: catalog-v2
>
> We should create pseudo-randomized batches of events to stress the event processor so that we can flush out any bugs. The test could be a JUnit test which generates a random-sized batch of the supported event types. Once the random batch of events is processed, we should validate that the table matches what is present in HMS.


