Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2015/06/03 02:20:50 UTC

[jira] [Commented] (MAHOUT-1641) Add conversion from a RDD[(String, String)] to a Drm[Int]

    [ https://issues.apache.org/jira/browse/MAHOUT-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570044#comment-14570044 ] 

Pat Ferrel commented on MAHOUT-1641:
------------------------------------

Hmm, didn't see this earlier. There is now a secondary "apply" constructor in the companion object for IndexedDatasetSpark that takes an RDD[(String, String)].

See here: https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/sparkbindings/indexeddataset/IndexedDatasetSpark.scala
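A rough idea of what such a conversion has to do: assign every distinct row string and column string a dense Int index, then emit the (row, column) pairs as nonzero entries of a sparse matrix. The sketch below is hypothetical. It uses a local Seq in place of an RDD, and the names StringPairIndexing, buildDictionary and index are illustrative only, not Mahout's implementation or API. With Spark and Mahout 0.10.x on the classpath, the real call would presumably look like IndexedDatasetSpark(pairs) with an implicit SparkContext in scope; check the linked source for the exact parameter list.

```scala
// Hypothetical sketch: a local Seq stands in for RDD[(String, String)].
// This is NOT Mahout's implementation, only the string-to-Int indexing idea.
object StringPairIndexing {

  // (row dictionary, column dictionary, nonzero (row, col) entries)
  type Indexed = (Map[String, Int], Map[String, Int], Set[(Int, Int)])

  // Assign each distinct string a dense Int id in first-seen order.
  def buildDictionary(keys: Seq[String]): Map[String, Int] =
    keys.distinct.zipWithIndex.toMap

  // Convert (rowKey, columnKey) pairs into Int-indexed entries of a sparse
  // boolean matrix, keeping both dictionaries so results can be mapped back.
  def index(pairs: Seq[(String, String)]): Indexed = {
    val rowIds = buildDictionary(pairs.map(_._1))
    val colIds = buildDictionary(pairs.map(_._2))
    val entries = pairs.map { case (r, c) => (rowIds(r), colIds(c)) }.toSet
    (rowIds, colIds, entries)
  }

  def main(args: Array[String]): Unit = {
    val interactions = Seq(
      ("u1", "iphone"), ("u1", "ipad"),
      ("u2", "nexus"), ("u2", "iphone"),
      ("u1", "iphone") // duplicate pair collapses to a single entry
    )
    val (rowIds, colIds, entries) = index(interactions)
    println(s"rows=$rowIds")
    println(s"cols=$colIds")
    println(s"nonzeros=${entries.toSeq.sorted}")
  }
}
```

On a cluster the dictionaries would instead be built from the RDD itself (for example with distinct and zipWithIndex) and shared with the workers, which is what lets data already in memory, such as rows read from Cassandra, skip the round trip through disk.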

> Add conversion from a RDD[(String, String)] to a Drm[Int]
> ---------------------------------------------------------
>
>                 Key: MAHOUT-1641
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1641
>             Project: Mahout
>          Issue Type: Question
>          Components: spark
>    Affects Versions: 0.9
>            Reporter: Erlend Hamnaberg
>            Assignee: Dmitriy Lyubimov
>              Labels: DSL, scala, spark
>             Fix For: 0.11.0
>
>
> Hi.
> We are using the cooccurrence part of Mahout as a library. We get our data from other sources, for instance Cassandra. We don't want to write that data to disk and read it back, since we already have the data on each slave.
> I have created some conversion functions based on one of the IndexedDatasetSpark readers; I can't remember which one at the moment.
> Is there interest in the community for this kind of feature? I can probably clean it up and add this as a github pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)