Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/05/06 11:50:04 UTC
[jira] [Resolved] (SPARK-20353) Implement Tensorflow TFRecords file format
[ https://issues.apache.org/jira/browse/SPARK-20353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-20353.
-------------------------------
Resolution: Won't Fix
> Implement Tensorflow TFRecords file format
> ------------------------------------------
>
> Key: SPARK-20353
> URL: https://issues.apache.org/jira/browse/SPARK-20353
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output, SQL
> Affects Versions: 2.1.0
> Reporter: Mathew Wicks
> Priority: Minor
>
> Spark is a very good preprocessing engine for tools like Tensorflow. However, it lacks native support for Tensorflow's core file format, TFRecords.
> There is a project that implements this functionality as an external JAR, but it is not user-friendly or robust enough for production use.
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
> Here is some discussion around the above.
> https://github.com/tensorflow/ecosystem/issues/32
> If we were to implement "tfrecords" as a DataFrame readable/writable format, we would have to account for the various data types that can be present in Spark columns, and decide which of them are actually useful in Tensorflow.
> Note: The `spark-tensorflow-connector` described above does not properly support the vector data type.
> Further discussion of whether this is within the scope of Spark SQL is strongly welcomed.
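The data type question raised in the description can be illustrated with a small sketch. It assumes only that a TFRecord feature holds one of three value kinds (a list of 64-bit integers, a list of 32-bit floats, or a list of byte strings). The table `FEATURE_KIND` and the function `feature_kind` are hypothetical names, the type-name strings are illustrative labels rather than a Spark API, and none of this is taken from Spark or from `spark-tensorflow-connector`.

```python
from typing import Dict

# Hypothetical mapping from column type labels to the three value kinds
# a TFRecord feature can hold. The labels are illustrative only.
FEATURE_KIND: Dict[str, str] = {
    "boolean": "int64_list",
    "tinyint": "int64_list",
    "smallint": "int64_list",
    "int": "int64_list",
    "bigint": "int64_list",
    "float": "float_list",
    "double": "float_list",   # float_list stores 32-bit floats, so precision is lost
    "string": "bytes_list",
    "binary": "bytes_list",
}


def feature_kind(spark_type: str) -> str:
    """Return the feature kind for a column type label, or raise ValueError."""
    t = spark_type.strip().lower()
    # A flat array of a supported element type maps to the same list kind.
    if t.startswith("array<") and t.endswith(">"):
        t = t[len("array<"):-1].strip()
    try:
        return FEATURE_KIND[t]
    except KeyError:
        # Nested arrays, maps, structs, and vectors have no direct counterpart
        # in this sketch and would need an explicit design decision.
        raise ValueError(
            f"no obvious TFRecord feature kind for column type {spark_type!r}"
        )
```

In this sketch a column labelled `array<double>` maps to `float_list`, while a label such as `vector` raises `ValueError`, which is the kind of gap the note about the vector data type points at.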