
Posted to issues@madlib.apache.org by "Xiaocheng Tang (JIRA)" <ji...@apache.org> on 2016/08/22 18:00:24 UTC

[jira] [Commented] (MADLIB-974) Path - performance testing

    [ https://issues.apache.org/jira/browse/MADLIB-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431304#comment-15431304 ] 

Xiaocheng Tang commented on MADLIB-974:
---------------------------------------

* Testing environment
A Pivotal Data Computing Appliance (DCA) half-rack for GPDB 4.2.7.1 and a DCA half-rack for HAWQ 2.0.0, with 8 nodes and 6 segments per node.
* Results ([attached|https://issues.apache.org/jira/secure/attachment/12824900/Benchmarking%20Param%20Design%20Doc%20-%20PATH.pdf])

> Path - performance testing
> --------------------------
>
>                 Key: MADLIB-974
>                 URL: https://issues.apache.org/jira/browse/MADLIB-974
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Xiaocheng Tang
>             Fix For: v1.9.1
>
>         Attachments: Benchmarking Param Design Doc - PATH.pdf, Ecommerce data set for path test 3.csv
>
>
> Story
> As a developer, I want to do performance testing on the Path algorithm so that I can understand and communicate scale effects to users.
> The proposed matrix for the 1st set of tests is:
> 1) overall data size, i.e., number of rows in data sets = 1M, 10M, 100M
> 2) number of partitions = 1k, 10k, 100k
> 3) number of matches per partition = 1k, 10k, 100k
> The proposed matrix for the 2nd set of tests is:
> 4) match "thickness", i.e., number of rows in a match = 1, 1k, 10k
> 5) number of symbols = 5, 15, 25
> Acceptance
> 1) Please plot performance curves. To keep the size of the test matrix reasonable, it is not necessary to run all permutations.
> For example, when plotting the effect of the number of partitions (#2 above), the data size can be fixed at 10M and the number of matches per partition at 1k.
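
The acceptance criterion above amounts to a one-factor-at-a-time plan: sweep one dimension of the first matrix while holding the others at baseline values. A minimal Python sketch of such a plan follows. The baseline partition count (1k) used for the other sweeps, the helper names, and the feasibility rule (matches do not overlap and each spans at least one row) are assumptions for illustration, not part of the ticket.

```python
# Hypothetical sketch: one-factor-at-a-time plan for the first Path test matrix.
# Values come from the ticket; BASELINE["partitions"] = 1k is an assumption.

MATRIX = {
    "rows": [1_000_000, 10_000_000, 100_000_000],
    "partitions": [1_000, 10_000, 100_000],
    "matches_per_partition": [1_000, 10_000, 100_000],
}

BASELINE = {"rows": 10_000_000, "partitions": 1_000, "matches_per_partition": 1_000}


def one_factor_at_a_time(matrix, baseline):
    """Return (varied_factor, config) pairs; each sweep changes only one factor."""
    plan = []
    for factor, values in matrix.items():
        for value in values:
            config = dict(baseline)  # start from the baseline configuration
            config[factor] = value   # vary only the factor being swept
            plan.append((factor, config))
    return plan


def is_feasible(config):
    """Assumption: matches are non-overlapping and each covers at least one row,
    so a partition needs at least as many rows as requested matches."""
    rows_per_partition = config["rows"] // config["partitions"]
    return rows_per_partition >= config["matches_per_partition"]


if __name__ == "__main__":
    for factor, config in one_factor_at_a_time(MATRIX, BASELINE):
        status = "ok" if is_feasible(config) else "infeasible"
        print(factor, config, status)
```

Under these assumptions, two of the nine configurations (100k partitions at 10M rows, and 100k matches per partition with 1k partitions) cannot hold the requested number of matches, which is worth catching before fabricating data.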
> Other
> 1) The attached data set can be used as a baseline for duplication/fabrication.
> 2) Another useful data set is at http://csr.lanl.gov/data/auth/
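
One way to use the attached CSV as a baseline for duplication is to replicate its rows and shift the partition key in each copy so that the copies form disjoint partitions. This grows the row count and the partition count together while leaving the contents of each partition, and therefore the matches per partition, unchanged. The Python sketch below assumes an integer partition column; the column names `session_id`, `event_time` and `symbol`, the helper `scale_dataset`, and the toy rows are hypothetical placeholders, not the schema of the attached file.

```python
import csv
import io

# Hypothetical toy baseline; the real attached CSV has its own schema.
BASELINE_CSV = """session_id,event_time,symbol
1,2016-08-01 10:00,A
1,2016-08-01 10:01,B
2,2016-08-01 11:00,A
"""


def scale_dataset(rows, partition_col, copies):
    """Replicate baseline rows `copies` times with disjoint integer partition ids.

    Each copy k shifts ids by k * span, where span covers the baseline id range,
    so no two copies share a partition id.
    """
    ids = [int(r[partition_col]) for r in rows]
    span = max(ids) - min(ids) + 1
    out = []
    for k in range(copies):
        for r in rows:
            new_row = dict(r)
            new_row[partition_col] = int(r[partition_col]) + k * span
            out.append(new_row)
    return out


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(BASELINE_CSV)))
    scaled = scale_dataset(rows, "session_id", copies=3)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(scaled)
    print(buf.getvalue())
```

Varying the number of partitions at a fixed row count, or the match "thickness", would need a different generator, since pure replication keeps the per-partition content fixed.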


