Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2015/02/14 00:28:12 UTC

[jira] [Commented] (MAHOUT-1641) Add conversion from a RDD[(String, String)] to a Drm[Int]

    [ https://issues.apache.org/jira/browse/MAHOUT-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320967#comment-14320967 ] 

Dmitriy Lyubimov commented on MAHOUT-1641:
------------------------------------------

Not sure if I understand the problem correctly.

There's indeed a general need to convert arbitrary row keys into *sequential, ordinal* Int row keys (specifically, to enable certain types of transposition).

E.g. if you have a string-labeled set of rows 

A -> x_1
B -> x_2
...
Z -> x_26

then we may want to replace keys with 

0 -> x_1
... 
25 -> x_26

and thus enable more interesting things.
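
A minimal sketch of that re-keying, using plain Scala collections rather than an RDD so it stays self-contained. The names `KeyRemap` and `toOrdinalKeys` are hypothetical and are not the signature of any existing or pending Mahout patch. On a real RDD the ordinals would more likely come from something like `zipWithIndex`, which yields Long indices that would still need narrowing to Int.

```scala
object KeyRemap {
  /** Replaces arbitrary row keys with consecutive Int keys 0..n-1.
    * Returns the re-keyed rows together with the old-key -> new-key mapping.
    * Hypothetical helper for illustration only. */
  def toOrdinalKeys[K, V](rows: Seq[(K, V)]): (Seq[(Int, V)], Map[K, Int]) = {
    // Each distinct key gets the ordinal of its first appearance.
    val keyToInt: Map[K, Int] = rows.map(_._1).distinct.zipWithIndex.toMap
    // Swap the original key for its ordinal, leaving the row payload untouched.
    val rekeyed: Seq[(Int, V)] = rows.map { case (k, v) => (keyToInt(k), v) }
    (rekeyed, keyToInt)
  }
}
```

Calling `KeyRemap.toOrdinalKeys(Seq("A" -> "x_1", "B" -> "x_2"))` gives rows keyed 0 and 1, plus a map from "A" and "B" to those ordinals, which can later be used to translate results back to the original labels.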

Incidentally, I already have a patch for this, written for the same reason. It is coming soon as part of a larger update. (Unfortunately, I am still wrangling approval of these patches through the corporate food chain here.) So if you are willing to wait a little, it is coming in.

It also optionally computes the mapping between the old keys and the new Int keys.

But please feel free to add your patch (I will probably ask you to modify its signature and name to match mine, as I already have dependencies on that).



> Add conversion from a RDD[(String, String)] to a Drm[Int]
> ---------------------------------------------------------
>
>                 Key: MAHOUT-1641
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1641
>             Project: Mahout
>          Issue Type: Question
>          Components: spark
>    Affects Versions: 1.0
>            Reporter: Erlend Hamnaberg
>
> Hi.
> We are using the cooccurrence part of Mahout as a library. We get our data from other sources, for instance Cassandra. We don't want to write that data to disk and read it back, since we already have the data on each slave.
> I have created some conversion functions based on one of the IndexedDatasetSpark readers; I can't remember which one at the moment.
> Is there interest in the community for this kind of feature? I can probably clean it up and submit it as a GitHub pull request.


