Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2018/09/14 08:28:00 UTC

[jira] [Created] (CRUNCH-673) Sort fails when using more reducers than records

Gabriel Reid created CRUNCH-673:
-----------------------------------

             Summary: Sort fails when using more reducers than records
                 Key: CRUNCH-673
                 URL: https://issues.apache.org/jira/browse/CRUNCH-673
             Project: Crunch
          Issue Type: Bug
            Reporter: Gabriel Reid


We've run into an issue where Sort fails when it is run with more reducers than there are records to sort.

This happens when a large PCollection is filtered down to almost nothing (say, 10 records) and the filtered PCollection is then passed to Sort. Sort does not know that the input has been filtered so aggressively, so it still configures n reducers for the small PCollection, for example 20. Reservoir sampling is used to build the partition definitions for the TotalOrderPartitioner, but because the filtered PCollection contains only 10 records, only 10 partitions are defined. A precondition in TotalOrderPartitioner then fails, because the number of partitions in the partitions file does not match the number of configured reducers.
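
To make the mismatch concrete, here is a minimal, self-contained Java sketch of the situation described above. It is not Crunch or Hadoop code: the class and method names (SortPartitionSketch, sampleSplitPoints, partitionCountMatches, clampReducers) are hypothetical, the sampling is a deterministic stand-in for reservoir sampling, and the check assumes that a total-order partitioner with R reducers expects R - 1 split points. The clamping step is only one possible mitigation, not necessarily the fix that should be applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortPartitionSketch {

    // Hypothetical stand-in for sampling: asks for (numReducers - 1) split
    // points, but can never return more split points than there are records.
    static List<Integer> sampleSplitPoints(List<Integer> records, int numReducers) {
        List<Integer> sorted = new ArrayList<>(records);
        Collections.sort(sorted);
        int wanted = Math.max(0, numReducers - 1);
        int available = Math.min(wanted, sorted.size());
        List<Integer> splits = new ArrayList<>();
        for (int i = 0; i < available; i++) {
            // Spread the chosen split points evenly over the sorted records.
            splits.add(sorted.get(i * sorted.size() / available));
        }
        return splits;
    }

    // Simplified model of the precondition (an assumption): R reducers
    // require exactly R - 1 split points in the partitions file.
    static boolean partitionCountMatches(List<Integer> splits, int numReducers) {
        return splits.size() == numReducers - 1;
    }

    // One possible mitigation (an assumption, not the actual fix): shrink
    // the reducer count to what the sampled split points can support.
    static int clampReducers(List<Integer> splits, int numReducers) {
        return Math.min(numReducers, splits.size() + 1);
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(5, 3, 9, 1, 7, 2, 8, 4, 6, 0);
        int configuredReducers = 20;

        List<Integer> splits = sampleSplitPoints(records, configuredReducers);
        System.out.println("split points: " + splits.size());
        System.out.println("check passes with configured reducers: "
                + partitionCountMatches(splits, configuredReducers));

        int clamped = clampReducers(splits, configuredReducers);
        System.out.println("check passes with clamped reducers (" + clamped + "): "
                + partitionCountMatches(sampleSplitPoints(records, clamped), clamped));
    }
}
```

In this sketch, 10 records cannot supply the split points that 20 reducers need, so the check fails until the reducer count is reduced to match what the sample can provide.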


