Posted to issues@spark.apache.org by "Frank Kemmer (JIRA)" <ji...@apache.org> on 2018/09/05 18:06:00 UTC
[jira] [Created] (SPARK-25343) Extend CSV parsing to Dataset[List[String]]
Frank Kemmer created SPARK-25343:
------------------------------------
Summary: Extend CSV parsing to Dataset[List[String]]
Key: SPARK-25343
URL: https://issues.apache.org/jira/browse/SPARK-25343
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.1
Reporter: Frank Kemmer
With the csv() method it is currently possible to create a DataFrame from a Dataset[String], where each string contains comma-separated values. This is really great.
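For reference, a minimal sketch of that existing behaviour, assuming a local SparkSession and an illustrative two-column schema (the object name, column names, and sample lines are assumptions, not part of the report). It uses DataFrameReader.csv(Dataset[String]) with the "DROPMALFORMED" mode so that lines which do not fit the schema are sorted out.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvFromDataset {
  // Illustrative target schema; the column names are assumptions.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("age", IntegerType)))

  // Parses comma-separated lines against the schema and
  // drops lines whose values do not fit it.
  def parse(spark: SparkSession, lines: Dataset[String]): DataFrame =
    spark.read
      .schema(schema)
      .option("mode", "DROPMALFORMED")
      .csv(lines)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-from-dataset")
      .getOrCreate()
    import spark.implicits._

    // The second line has a non-numeric age and should not survive parsing.
    val lines = Seq("alice,30", "bob,not-a-number").toDS()
    parse(spark, lines).show()

    spark.stop()
  }
}
```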
But very often we have to parse files where each line must be split using very specific value separators and regular expressions. The result is a Dataset[List[String]], where each list corresponds to what you would get after splitting the values of a CSV string.
It would be great if the csv() method also accepted such a Dataset as input, especially given a target schema. The CSV parser usually casts the separated values against the schema and can sort out lines where the column values do not fit the schema.
This is the functionality I am looking for and I think it is already implemented in the CSV parser.
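Until something like this exists, the requested behaviour can be approximated outside the CSV parser. The following is only a sketch under stated assumptions: fromSplitValues is a hypothetical helper, not a Spark API, and it relies on the non-ANSI cast behaviour (as in 2.3.1) where a failed cast yields null instead of an error. It keeps a row only when the list has one value per schema field and every non-null value casts to the field's type.

```scala
import org.apache.spark.sql.{Column, DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, size}
import org.apache.spark.sql.types.StructType

object SplitValuesToDataFrame {
  // Hypothetical helper (not part of Spark): turns already-split values into
  // a typed DataFrame and drops rows that do not fit the schema.
  // Assumes a non-empty schema and non-ANSI cast semantics (bad cast => null).
  def fromSplitValues(values: Dataset[List[String]], schema: StructType): DataFrame = {
    // Single array<string> column holding the split values of each line.
    val raw = values.toDF("values")

    // One typed output column per schema field, taken by position.
    val typed: Seq[Column] = schema.fields.toSeq.zipWithIndex.map { case (field, i) =>
      col("values").getItem(i).cast(field.dataType).as(field.name)
    }

    // A row fits when every value is either null or survives the cast.
    val castOk: Column = schema.fields.toSeq.zipWithIndex.map { case (field, i) =>
      val src = col("values").getItem(i)
      src.isNull || src.cast(field.dataType).isNotNull
    }.reduce(_ && _)

    raw
      .filter(size(col("values")) === schema.length && castOk)
      .select(typed: _*)
  }
}
```

Unlike the real CSV parser, this sketch does not honour options such as nullValue or custom date formats; it only illustrates the cast-and-filter step.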
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org