Posted to user@mahout.apache.org by Rohit Jain <ro...@gmail.com> on 2016/05/10 09:43:47 UTC

Read output of sparkrowsimilairty in scala

Hello,
I am writing Scala code to pull data from a database and run a row-similarity
analysis. After running spark-rowsimilarity, I want to read the data returned by
the function and write it directly back to a MySQL database. But I don't know how
to read the data from the IndexedDataset returned by
val data = SimilarityAnalysis.rowSimilarityIDS(myIDs)
In the debugger, its type is shown as IndexedDataset, which contains
(matrix, rowIDs, columnIDs).

Thanks.
--
Thanks & Regards,

*Rohit Jain*
Web developer | Consultant
Mob +91 8097283931

Re: Read output of sparkrowsimilairty in scala

Posted by Pat Ferrel <pa...@occamsmachete.com>.

There are several ways to do this. The design is meant to be extended by a trait that does the actual read/write. Check out TDIndexedDatasetReader; you can create a similar trait called, say, MySQLIndexedDatasetReader. That file has other examples of reading and writing. Also check the driver to see how they are used.
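To make the mix-in idea concrete, here is a minimal sketch in plain Scala. Every name in it (Similarity, SimilarityWriter, InMemorySimilarityWriter, SimilarityStore) is hypothetical, and it does not reproduce Mahout's actual reader/writer signatures. It only shows the shape: a generic entry point, with the storage-specific work supplied by a trait that is mixed in. The in-memory trait stands in for a MySQL one so that the example runs without a database.

```scala
import scala.collection.mutable

// All names here are hypothetical; this shows the mix-in pattern only and
// does not reproduce Mahout's reader/writer signatures.
final case class Similarity(rowId: String, similarId: String, strength: Double)

/** Generic entry point: callers use writeTo, a mixed-in trait supplies writer. */
trait SimilarityWriter {
  protected def writer(rows: Seq[Similarity], dest: String): Unit
  def writeTo(rows: Seq[Similarity], dest: String): Unit = writer(rows, dest)
}

/** Storage-specific trait. A MySQL variant would run batched inserts here. */
trait InMemorySimilarityWriter extends SimilarityWriter {
  // Destination name -> rows written so far.
  val tables: mutable.Map[String, Vector[Similarity]] = mutable.Map.empty

  protected def writer(rows: Seq[Similarity], dest: String): Unit =
    tables(dest) = tables.getOrElse(dest, Vector.empty[Similarity]) ++ rows
}

/** Concrete class assembled by mixing in the storage-specific trait. */
class SimilarityStore extends InMemorySimilarityWriter
```

Calling code only ever sees writeTo, so swapping the in-memory trait for a database-backed one would not change the callers.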

If you don’t like any of those, you can use them as a guide for writing your own code. The IndexedDataset includes an RDD-based DRM with int keys for rows and columns. It also includes BiMaps, called BiDictionary, that translate back and forth between the ints and the original string row and column IDs.
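The translation step described above can be sketched with plain Scala collections. LocalIndexedDataset below is a hypothetical in-memory stand-in, not Mahout's class: in Mahout the matrix is a distributed DRM and the dictionaries are BiDictionary instances, so the real code would do the same lookups inside a map over the DRM's rows. The sketch assumes the dictionaries map original string IDs to int keys and turns each non-zero cell into a (row ID, similar ID, strength) record that could be inserted into a MySQL table.

```scala
// Stand-in for the parts of an IndexedDataset described above. These are plain
// in-memory collections, not Mahout's classes.
final case class LocalIndexedDataset(
    matrix: Map[Int, Map[Int, Double]], // int row key -> (int column key -> strength)
    rowIDs: Map[String, Int],           // original row ID -> int row key
    columnIDs: Map[String, Int]         // original column ID -> int column key
)

/** One output record: two original string IDs and their similarity strength. */
final case class SimilarityRow(rowId: String, similarId: String, strength: Double)

object SimilarityRows {
  /** Translates int keys back to string IDs, strongest similarity first per row. */
  def toRows(ds: LocalIndexedDataset): Seq[SimilarityRow] = {
    // Invert the dictionaries once so lookups go from int key to string ID.
    val rowNames = ds.rowIDs.map(_.swap)
    val columnNames = ds.columnIDs.map(_.swap)
    for {
      (rowKey, columns) <- ds.matrix.toSeq.sortBy(_._1)
      // Order cells by descending strength, breaking ties by column key.
      (columnKey, strength) <- columns.toSeq.sortWith { (a, b) =>
        a._2 > b._2 || (a._2 == b._2 && a._1 < b._1)
      }
      rowId <- rowNames.get(rowKey).toSeq           // skip keys with no dictionary entry
      similarId <- columnNames.get(columnKey).toSeq
    } yield SimilarityRow(rowId, similarId, strength)
  }
}
```

For row similarity both dictionaries describe the same set of IDs, since the result compares rows with rows. The sketch keeps them separate only to match the (matrix, rowIDs, columnIDs) layout.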
