Posted to issues@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2014/07/10 15:42:04 UTC
[jira] [Resolved] (FLINK-295) Libsvm InputFormat (supervised learning) + mahout vector support
[ https://issues.apache.org/jira/browse/FLINK-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ufuk Celebi resolved FLINK-295.
-------------------------------
Resolution: Done
> Libsvm InputFormat (supervised learning) + mahout vector support
> ----------------------------------------------------------------
>
> Key: FLINK-295
> URL: https://issues.apache.org/jira/browse/FLINK-295
> Project: Flink
> Issue Type: Bug
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> I have a working InputFormat + test case for the libsvm format (initial version is from Ufuk). Does it make sense to add this to stratosphere-addon ([#199|https://github.com/stratosphere/stratosphere/issues/199] | [FLINK-199|https://issues.apache.org/jira/browse/FLINK-199])?
> Or will we have a separate ML-specific repo?
> Libsvm is a common file format for machine learning (mostly supervised), and there is a huge [library|http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/] of datasets in libsvm format available.
> The input format reads a file in the libsvm format (either multi-label or single-label) and emits a mahout vector. The vector is wrapped in a PactVector class (implements Value), which is a copy of VectorWritable from mahout (I needed to create a copy and change the function names, since we don't support Writable yet).
> Code:
> [LibsvmInputFormat|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/io/LibsvmInputFormat.java]
> [Test case|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/io/LibsvmInputFormat.java]
> [Pact vector|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/types/PactVector.java]
> This has some implications:
> * we need to add a dependency on mahout to stratosphere-addons (which version?)
> * we need to make a copy of some version of VectorWritable and add this to our code. (Is this allowed? Which version?)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/295
> Created by: [andrehacker|https://github.com/andrehacker]
> Labels:
> Created at: Tue Nov 26 14:06:24 CET 2013
> State: open
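
For readers unfamiliar with the format discussed in the description: a libsvm line is typically "label index:value index:value ...", and multi-label data sets usually separate several labels with commas in the first token. The following is only a minimal sketch of parsing one such line, not the LibsvmInputFormat linked above. The class name LibsvmLine and its fields are hypothetical, and a plain sorted map stands in for the mahout vector / PactVector so that the example needs no external dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the LibsvmInputFormat from the issue.
final class LibsvmLine {
    // One entry for single-label data, several for multi-label data.
    final List<Double> labels;
    // Sparse features keyed by the index exactly as written in the file.
    // Assumption: a TreeMap replaces the mahout vector used in the real code.
    final Map<Integer, Double> features;

    private LibsvmLine(List<Double> labels, Map<Integer, Double> features) {
        this.labels = labels;
        this.features = features;
    }

    // Parses one line such as "+1 1:0.5 3:2.0" or "1,3 2:1.5".
    static LibsvmLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }

        List<Double> labels = new ArrayList<>();
        Map<Integer, Double> features = new TreeMap<>();

        int firstFeature = 0;
        // A first token without ':' is the label field (comma-separated if multi-label).
        if (!tokens[0].contains(":")) {
            for (String label : tokens[0].split(",")) {
                labels.add(Double.parseDouble(label));
            }
            firstFeature = 1;
        }

        // Every remaining token is an "index:value" pair.
        for (int i = firstFeature; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected index:value but got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new LibsvmLine(labels, features);
    }
}
```

Calling LibsvmLine.parse("1,3 2:1.5") yields two labels and one feature at index 2. An input format along the lines described above would run this kind of parsing once per line and copy the result into whatever vector type it emits.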
--
This message was sent by Atlassian JIRA
(v6.2#6252)