Posted to user@mahout.apache.org by Nantia Makrynioti <na...@gmail.com> on 2017/04/04 09:57:16 UTC

Loading data from files - Samsara

Hello,

Is there a way to load data from a file, e.g. a CSV file, into an in-core
vector or matrix?

Thanks a lot,
Nantia

Re: Loading data from files - Samsara

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout-Samsara has a couple of CLI drivers, but these are mostly meant as examples. They read from CSV files but may not do what you want.

Mahout can also run in a Spark shell or as a library in your app, which gives you all the data loading functions of Spark or Scala. For instance, I use SimilarityAnalysis.cooccurrence, which takes the Mahout data type IndexedDataset. This has a conversion helper that takes a Spark RDD[(String, String)]. Spark can read in an RDD[(String, String)] in many ways.

In short, you have all the ways of Java, HDFS, and Spark to draw from. These are not implemented in Mahout, so all you need to do is convert the data into something Mahout works with, like a DRM (DistributedRowMatrix) or an IndexedDataset (which wraps a DRM), depending on what you want to do with it.
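
For the in-core case asked about, a minimal sketch in plain Scala might look like the following. It is not a Mahout loader: CsvToRows, parseCsv and loadCsv are hypothetical names, and it assumes a purely numeric CSV with no header row, no quoted fields and a comma separator. The resulting rows can then be handed to Mahout's in-core types, as noted in the trailing comment (where "data.csv" is a placeholder path).

```scala
import scala.io.Source

// Hypothetical helper, not part of Mahout: reads numeric CSV into row arrays.
object CsvToRows {
  // Parse numeric CSV lines into a row-major array; blank lines are skipped.
  // Assumes no header row, no quoted fields, and a plain comma separator.
  def parseCsv(lines: Iterator[String]): Array[Array[Double]] = {
    val rows = lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split(",").map(_.trim.toDouble))
      .toArray
    // A matrix needs rectangular data, so reject ragged input early.
    require(rows.map(_.length).distinct.length <= 1,
      "all rows must have the same number of columns")
    rows
  }

  // Read a file from the local filesystem and parse it eagerly,
  // so the file can be closed before the rows are used.
  def loadCsv(path: String): Array[Array[Double]] = {
    val src = Source.fromFile(path)
    try parseCsv(src.getLines()) finally src.close()
  }
}

// With mahout-math on the classpath (an assumption about your build),
// the rows can be wrapped in the in-core types, e.g.:
//   val m = new org.apache.mahout.math.DenseMatrix(CsvToRows.loadCsv("data.csv"))
//   val v = new org.apache.mahout.math.DenseVector(CsvToRows.loadCsv("data.csv").head)
```

Parsing is kept separate from file access so the same function also works on lines obtained another way, for example from Spark or HDFS. For data too large to hold in memory, the distributed route described above (a DRM or IndexedDataset built from a Spark RDD) is the better fit.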

