Posted to dev@giraph.apache.org by "Jianlong Zhong (JIRA)" <ji...@apache.org> on 2017/09/28 17:49:00 UTC

[jira] [Created] (GIRAPH-1161) implement random sampling for input splits

Jianlong Zhong created GIRAPH-1161:
--------------------------------------

             Summary: implement random sampling for input splits
                 Key: GIRAPH-1161
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1161
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jianlong Zhong
            Priority: Minor


Currently, if we are reading vertex/edge data from multiple tables and only want to read a fraction of the data (using the giraph.inputSplitSamplePercent conf option), we always get the first inputSplitSamplePercent of the input splits. We should instead use a random sample of the input splits, so that testing on a sample of the data looks closer to an actual full-data run.
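
A minimal sketch of the idea, not the actual Giraph change: shuffle the list of splits with a seeded random generator, then keep the first inputSplitSamplePercent of the shuffled list. The class name SplitSampler, the method sampleSplits, and the seed parameter are hypothetical and exist only for illustration; the rounding rule (truncation) is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Giraph: picks a random subset of splits. */
final class SplitSampler {
  private SplitSampler() { }

  /**
   * Returns a random sample of the given splits.
   *
   * @param splits        all input splits (left unmodified)
   * @param samplePercent percentage of splits to keep, in (0, 100]
   * @param seed          seed for the shuffle, so the sample is reproducible
   */
  static <T> List<T> sampleSplits(List<T> splits, float samplePercent,
      long seed) {
    if (samplePercent <= 0f || samplePercent > 100f) {
      throw new IllegalArgumentException(
          "samplePercent must be in (0, 100], got " + samplePercent);
    }
    if (samplePercent == 100f) {
      // Nothing to sample: keep every split in its original order.
      return new ArrayList<>(splits);
    }
    // Shuffle a copy so the caller's list keeps its order.
    List<T> shuffled = new ArrayList<>(splits);
    Collections.shuffle(shuffled, new Random(seed));
    // Assumption: truncate toward zero when the count is fractional.
    int keep = (int) (samplePercent * splits.size() / 100f);
    // Taking a prefix of the shuffled list yields a uniform random subset.
    return new ArrayList<>(shuffled.subList(0, keep));
  }
}
```

With 100 splits and a sample percent of 10, this returns 10 splits drawn from across all tables rather than the first 10. A fixed seed keeps the sample identical between runs, which makes test runs comparable.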



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)