Posted to issues@spark.apache.org by "Stephen Carman (JIRA)" <ji...@apache.org> on 2015/06/18 21:20:00 UTC

[jira] [Commented] (SPARK-8449) HDF5 read/write support for Spark MLlib

    [ https://issues.apache.org/jira/browse/SPARK-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592364#comment-14592364 ] 

Stephen Carman commented on SPARK-8449:
---------------------------------------

As per your suggestion in https://github.com/apache/spark/pull/1290#issuecomment-113262365, after looking at this more I agree with not making it a compilation step. So I suppose we're going to have to require that the library be built separately. A shame, though; I found out a couple of things looking at this further...

1. The Java library is just a wrapper around the C library via JNI, so it seems the compiled versions will have to be platform-specific.
2. These artifacts don't exist in the main Maven repo, so we're going to have to discuss either building them with Spark or some other method of making these libraries available. I'm unsure what the best path is here; I figured they would publish the artifacts, but that isn't the case. I'm hesitant to add a C-related build step to building Spark, as I think that'd be beaten and killed with fire by anyone reading the idea.
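To make point 1 concrete: any JNI wrapper has to load a native binary whose file name and layout differ per OS and architecture, which is why a single published jar doesn't cut it. A minimal sketch of resolving the right binary name per platform (the base name "jhdf5" and the suffix scheme here are my own illustration, not the HDF Group's actual artifact names):

```java
// Sketch: mapping OS name and architecture to a conventional shared-library
// file name for a JNI wrapper. Base name "jhdf5" is hypothetical.
public class NativeLibResolver {

    /** Return the platform-conventional file name for a native library. */
    static String platformLibName(String base, String osName, String arch) {
        String os = osName.toLowerCase();
        if (os.contains("win")) {
            return base + "-" + arch + ".dll";           // Windows: no "lib" prefix
        } else if (os.contains("mac")) {
            return "lib" + base + "-" + arch + ".dylib"; // macOS
        } else {
            return "lib" + base + "-" + arch + ".so";    // Linux and other Unixes
        }
    }

    public static void main(String[] args) {
        // Resolve for whatever platform the current JVM is running on.
        System.out.println(platformLibName("jhdf5",
                System.getProperty("os.name"),
                System.getProperty("os.arch")));
    }
}
```

So shipping this with Spark would mean either building one such binary per supported platform (e.g. as Maven classifier artifacts) or requiring users to build and install the native library themselves.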

What do you think, Alex? Any other ideas for proceeding with this? In the meantime, I'm going to research how I can better integrate the dependencies here.

> HDF5 read/write support for Spark MLlib
> ---------------------------------------
>
>                 Key: SPARK-8449
>                 URL: https://issues.apache.org/jira/browse/SPARK-8449
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Alexander Ulanov
>             Fix For: 1.4.1
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Add support for reading and writing HDF5 file format to/from LabeledPoint. HDFS and local file system have to be supported. Other Spark formats to be discussed. 
> Interface proposal:
> /* path - directory path in any Hadoop-supported file system URI */
> MLUtils.saveAsHDF5(sc: SparkContext, path: String, data: RDD[LabeledPoint]): Unit
> /* path - file or directory path in any Hadoop-supported file system URI */
> MLUtils.loadHDF5(sc: SparkContext, path: String): RDD[LabeledPoint]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org