Posted to issues@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2014/07/10 15:42:04 UTC

[jira] [Resolved] (FLINK-295) Libsvm InputFormat (supervised learning) + mahout vector support

     [ https://issues.apache.org/jira/browse/FLINK-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-295.
-------------------------------

    Resolution: Done

> Libsvm InputFormat (supervised learning) + mahout vector support
> ----------------------------------------------------------------
>
>                 Key: FLINK-295
>                 URL: https://issues.apache.org/jira/browse/FLINK-295
>             Project: Flink
>          Issue Type: Bug
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> I have a working InputFormat + testcase for the libsvm format (initial version is from Ufuk). Does it make sense to add this to stratosphere-addon ([#199|https://github.com/stratosphere/stratosphere/issues/199] | [FLINK-199|https://issues.apache.org/jira/browse/FLINK-199])?
> Or will we have a separate ML-specific repo?
> Libsvm is a common file format for machine learning (mostly supervised), and there is a huge [library|http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/] of datasets in libsvm format available.
> The input format reads a file in the libsvm format (either multi-label or single-label) and emits a mahout vector. The vector is wrapped in a PactVector class (implements Value), which is a copy of VectorWritable from mahout (we needed to create a copy and change the function names, since we don't support Writable yet).
> Code:
> [LibsvmInputFormat|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/io/LibsvmInputFormat.java]
> [Test case|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/io/LibsvmInputFormat.java]
> [Pact vector|https://github.com/andrehacker/logreg/blob/master/logreg-pact/src/main/java/de/tuberlin/dima/ml/pact/types/PactVector.java]
> This has some implications:
> * we need to add a dependency on mahout to stratosphere-addons (which version?)
> * we need to make a copy of some version of VectorWritable and add it to our code. (Is this allowed? Which version?)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/295
> Created by: [andrehacker|https://github.com/andrehacker]
> Labels: 
> Created at: Tue Nov 26 14:06:24 CET 2013
> State: open
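
For readers unfamiliar with the format mentioned in the description, here is a minimal sketch of parsing one libsvm line. It assumes the commonly used layout "label[,label...] index:value index:value ...", where a single-label line has one label and a multi-label line has several comma-separated labels. The class LibsvmLineParser and its Example type are hypothetical names made up for this illustration; this is not the linked LibsvmInputFormat, and it stores features in a plain sorted map instead of a mahout vector or PactVector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not the linked LibsvmInputFormat. */
public class LibsvmLineParser {

    /** One parsed example: its labels and its sparse features (index -> value). */
    public static final class Example {
        public final List<Double> labels;
        public final Map<Integer, Double> features;

        Example(List<Double> labels, Map<Integer, Double> features) {
            this.labels = labels;
            this.features = features;
        }
    }

    /** Parses a line of the assumed form "label[,label...] index:value ...". */
    public static Example parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("Empty line");
        }

        List<Double> labels = new ArrayList<>();
        int firstFeature = 0;
        // A first token without ':' is treated as the label field;
        // otherwise the line is assumed to carry no labels.
        if (!tokens[0].contains(":")) {
            for (String label : tokens[0].split(",")) {
                labels.add(Double.parseDouble(label));
            }
            firstFeature = 1;
        }

        // Only the non-zero features are listed, so keep them in a sparse map.
        Map<Integer, Double> features = new TreeMap<>();
        for (int i = firstFeature; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed feature: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new Example(labels, features);
    }

    public static void main(String[] args) {
        Example single = parse("+1 3:0.5 10:2.0");
        Example multi = parse("1,3 2:0.5 7:1.25");
        System.out.println(single.labels + " " + single.features);
        System.out.println(multi.labels + " " + multi.features);
    }
}
```

An input format along the lines described in the issue would presumably do this per record and copy the map entries into a sparse vector; the sorted map here only stands in for that vector type.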



--
This message was sent by Atlassian JIRA
(v6.2#6252)