
Posted to issues@kudu.apache.org by "Adar Dembo (JIRA)" <ji...@apache.org> on 2017/05/06 00:42:04 UTC

[jira] [Resolved] (KUDU-1970) Integration test for data scalability

     [ https://issues.apache.org/jira/browse/KUDU-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo resolved KUDU-1970.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Fixed in commit 04e8ea244d7d8793d876082381b5797625b16b44.

> Integration test for data scalability
> -------------------------------------
>
>                 Key: KUDU-1970
>                 URL: https://issues.apache.org/jira/browse/KUDU-1970
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: master, tserver
>    Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>             Fix For: 1.4.0
>
>
> To help test data scalability fixes, we need a way to easily produce an environment that exhibits our current scalability issues. I'm sure one of our long-running workloads would be up to the task, but aside from taking a long time, it'd also fill up the disk, which makes it unusable on most developer machines. Ultimately, data isn't really the root cause of our scalability woes; it's the metadata necessary to maintain the data that hurts us. So an idealized environment would be heavy on the metadata. Here's a not-so-exhaustive list:
> * Many tablets.
> * Many columns per tablet.
> * Many rowsets per tablet.
> * Many data blocks.
> * Many tables (tservers don't care about this, but maybe the master does?)
> Let's write an integration test that swamps the machine with the above. It should use an external mini cluster to simplify isolating master and tserver performance characteristics, but it needn't have more than one instance of each.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)