Posted to issues@flink.apache.org by "Theodore Vasiloudis (JIRA)" <ji...@apache.org> on 2015/06/08 16:53:00 UTC

[jira] [Created] (FLINK-2186) Rework CSV import to support very wide files

Theodore Vasiloudis created FLINK-2186:
------------------------------------------

             Summary: Rework CSV import to support very wide files
                 Key: FLINK-2186
                 URL: https://issues.apache.org/jira/browse/FLINK-2186
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library, Scala API
            Reporter: Theodore Vasiloudis


In the current readCsvFile implementation, importing CSV files with many columns ranges from cumbersome to impossible.

For example, to import an 11-column file we need to write:

{code}
val cancer = env.readCsvFile[(String, String, String, String, String, String, String, String, String, String, String)]("/path/to/breast-cancer-wisconsin.data")
{code}

For many use cases in Machine Learning we might have CSV files with thousands or millions of columns that we want to import as vectors.
In that case, using the current readCsvFile method becomes impossible: every column has to be spelled out as a tuple field, and Scala tuples stop at 22 fields.

We therefore need to rework the current function, or create a new one, so that we can import CSV files with an arbitrary number of columns.
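
As a rough illustration of the direction this could take (a sketch only, not a proposed API), each line could be read as plain text and split into an array of doubles, so that the column count never appears in a type signature. The names WideCsv and parseLine are hypothetical and not part of Flink. The sketch assumes every field is numeric and does not handle missing values or quoted fields.

```scala
// Hypothetical helper, not part of Flink: turns one CSV line into a dense
// array of doubles, independent of how many columns the line has.
object WideCsv {

  // Assumes every field parses as a Double; a non-numeric field
  // (for example a "?" placeholder) throws NumberFormatException.
  def parseLine(line: String, delimiter: Char = ','): Array[Double] =
    line.split(delimiter).map(_.trim.toDouble)

  def main(args: Array[String]): Unit = {
    // Two small in-memory rows standing in for lines of a very wide file.
    val rows = Seq("1.0,2.5,3.0", "4.0, 5.0, 6.5").map(parseLine(_))
    rows.foreach(r => println(r.mkString("[", ", ", "]")))
  }
}
```

In a Flink job, something along these lines could presumably be applied to the DataSet[String] returned by env.readTextFile, for example env.readTextFile(path).map(line => WideCsv.parseLine(line)), with the resulting arrays wrapped in whatever vector type the Machine Learning Library expects.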


