Posted to issues@spark.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2018/04/12 16:35:00 UTC
[jira] [Resolved] (SPARK-23554) Hive's textinputformat.record.delimiter equivalent in Spark
[ https://issues.apache.org/jira/browse/SPARK-23554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruslan Dautkhanov resolved SPARK-23554.
---------------------------------------
Resolution: Duplicate
> Hive's textinputformat.record.delimiter equivalent in Spark
> -----------------------------------------------------------
>
> Key: SPARK-23554
> URL: https://issues.apache.org/jira/browse/SPARK-23554
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
> Labels: csv, csvparser
>
> It would be great if Spark supported an option similar to Hive's {{textinputformat.record.delimiter}} in the spark-csv reader.
> We currently have to create Hive tables to work around this functionality missing natively in Spark.
> {{textinputformat.record.delimiter}} was introduced back in 2011, in the MapReduce era;
> see MAPREDUCE-2254.
> As an example, one of our most common use cases for {{textinputformat.record.delimiter}} is reading multiple lines of text that make up one "record". The number of lines per "record" varies, so {{textinputformat.record.delimiter}} is a great way for us to process these files natively in Hadoop/Spark: a custom .map() function then does the actual processing of those records, and we convert the result to a DataFrame.
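As a side note, a commonly cited workaround at the RDD level (a sketch, not verified against a specific Spark version) is SparkContext.newAPIHadoopFile with a Hadoop Configuration that sets textinputformat.record.delimiter. The splitting semantics that setting provides — records separated by an arbitrary delimiter, with no empty record produced by a trailing delimiter — can be sketched in plain Python; split_records and the sample data below are hypothetical illustrations, not Spark APIs:

```python
def split_records(text, delimiter):
    """Split raw text into records on a custom delimiter,
    mimicking textinputformat.record.delimiter: the delimiter
    is consumed, and a trailing delimiter does not yield an
    empty final record."""
    records = text.split(delimiter)
    # Drop the empty record left behind by a trailing delimiter
    if records and records[-1] == "":
        records.pop()
    return records

# Multi-line "records" separated by a blank line, i.e. delimiter "\n\n"
raw = "name: a\nage: 1\n\nname: b\nage: 2\n\n"
print(split_records(raw, "\n\n"))
# → ['name: a\nage: 1', 'name: b\nage: 2']
```

Each element of the result corresponds to one multi-line record, which a custom .map() function could then parse before conversion to a DataFrame.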
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org