Posted to issues@spark.apache.org by "Peter Aberline (JIRA)" <ji...@apache.org> on 2015/06/21 15:44:00 UTC
[jira] [Created] (SPARK-8510) Store and read NumPy arrays and matrices as values in sequence files
Peter Aberline created SPARK-8510:
-------------------------------------
Summary: Store and read NumPy arrays and matrices as values in sequence files
Key: SPARK-8510
URL: https://issues.apache.org/jira/browse/SPARK-8510
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: Peter Aberline
Priority: Minor
I have extended the provided DoubleArrayWritable example code to store NumPy arrays and matrices of type double as arrays of doubles and nested arrays of doubles.
Pandas DataFrames can be easily converted to NumPy matrices, so I've also added the ability to store the schema-less data from DataFrames and Series that contain double data.
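As a rough illustration of the representation described above (not the code from the forthcoming PR), the Python-side conversion could be sketched as follows. The helper names to_writable_payload and from_writable_payload are hypothetical, and the Java-side Writable handling is not shown. A DataFrame's values, obtained for example with DataFrame.to_numpy(), could be passed through the same helper once the labels are dropped.

```python
import numpy as np


def to_writable_payload(arr):
    """Hypothetical helper: turn a 1-D or 2-D array of doubles into
    a list of floats or a nested list of floats."""
    a = np.asarray(arr, dtype=np.float64)
    if a.ndim not in (1, 2):
        raise ValueError("this sketch only handles 1-D arrays and 2-D matrices")
    # tolist() yields plain Python floats, nested one level per dimension
    return a.tolist()


def from_writable_payload(payload):
    """Hypothetical helper: rebuild a NumPy array of doubles from the lists."""
    return np.array(payload, dtype=np.float64)


vector = np.array([1.0, 2.5, 3.0])
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])

vector_payload = to_writable_payload(vector)   # flat list of floats
matrix_payload = to_writable_payload(matrix)   # list of rows, each a list of floats
restored = from_writable_payload(matrix_payload)
```

Only the numeric values survive the round trip in this sketch; row and column labels are not stored, which matches the schema-less storage mentioned above.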
Beyond my own use, there seems to be demand for this functionality:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=Q3n-01iOQ_pkWE1g-c39XiMCo3KhqngQg@mail.gmail.com%3E
I'll be issuing a PR for this shortly.