Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/06/12 03:18:02 UTC

[jira] [Closed] (SPARK-1672) Support separate partitioners (and numbers of partitions) for users and products

     [ https://issues.apache.org/jira/browse/SPARK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng closed SPARK-1672.
--------------------------------

    Resolution: Implemented
      Assignee: Tor Myklebust

PR: https://github.com/apache/spark/pull/1014

> Support separate partitioners (and numbers of partitions) for users and products
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1672
>                 URL: https://issues.apache.org/jira/browse/SPARK-1672
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Tor Myklebust
>            Assignee: Tor Myklebust
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> Users should be able to specify a partitioning of their data if they know a good one. Separate partitioners for users and products are convenient because no awkward mapping step is needed.
> It may also be reasonable to partition the users and products into different numbers of partitions (for instance, to balance memory requirements) if the dataset is tall, thin, and very sparse.
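
The idea in the description can be illustrated with a minimal sketch. It is plain Python rather than the Spark MLlib API, and it is not taken from the linked PR; the names `hash_partitioner` and `block_ratings` and the toy data are hypothetical. It only shows how independent partitioners with different partition counts for users and products might assign ratings to blocks.

```python
from collections import defaultdict


def hash_partitioner(num_partitions):
    """Return a function mapping an integer ID to a partition index.

    Hypothetical helper: a simple modulo partitioner for illustration.
    """
    def partition(key):
        return key % num_partitions
    return partition


def block_ratings(ratings, user_partitioner, product_partitioner):
    """Group (user, product, rating) triples by (user block, product block)."""
    blocks = defaultdict(list)
    for user, product, rating in ratings:
        key = (user_partitioner(user), product_partitioner(product))
        blocks[key].append((user, product, rating))
    return dict(blocks)


# Toy "tall, thin" dataset (assumed for illustration): 12 users, 3 products.
ratings = [(u, u % 3, 1.0) for u in range(12)]

# Independent partitioners: more blocks on the user side, fewer on the
# product side, so neither side is forced to share the other's layout.
user_part = hash_partitioner(4)
product_part = hash_partitioner(2)

blocks = block_ratings(ratings, user_part, product_part)
for key in sorted(blocks):
    print(key, len(blocks[key]))
```

Because the two partitioners are chosen independently, a caller who knows a good layout for one side can supply it without remapping IDs on the other side.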
