Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/01/05 15:29:00 UTC
[jira] [Resolved] (SPARK-26280) Spark will read entire CSV file even when limit is used
[ https://issues.apache.org/jira/browse/SPARK-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26280.
----------------------------------
Resolution: Duplicate
> Spark will read entire CSV file even when limit is used
> -------------------------------------------------------
>
> Key: SPARK-26280
> URL: https://issues.apache.org/jira/browse/SPARK-26280
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Amir Bar-Or
> Priority: Major
>
> When you read a CSV as below, the parser still wastes time and reads the entire file:
> var lineDF1 = spark.read
>   .format("com.databricks.spark.csv")
>   .option("header", "true") // reading the headers
>   .option("mode", "DROPMALFORMED")
>   .option("delimiter", ",")
>   .option("inferSchema", "false")
>   .schema(line_schema)
>   .load(i_lineitem)
>   .limit(10)
>
> Even though a LocalLimit is created, this does not stop the FileScan and the parser from parsing the entire file. Is it possible to push the limit down and stop the parsing?
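A possible workaround, not from the original thread: since Spark 2.2, DataFrameReader.csv can parse a Dataset[String], so the raw lines can be limited before the CSV parser ever runs. The sketch below assumes a single input file (so the header row is the first line returned by textFile) and uses a hypothetical path and SparkSession setup.

    import org.apache.spark.sql.SparkSession

    object CsvLimitWorkaround {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-limit-workaround")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical input; substitute the real lineitem CSV path.
        val path = "/tmp/lineitem.csv"

        // Read the file as plain text and keep only the header plus 10
        // data rows; limit() lets the scan stop early, so the rest of
        // the file is never materialized.
        val firstLines = spark.read.textFile(path).limit(11)

        // DataFrameReader.csv(Dataset[String]) (Spark 2.2+) parses only
        // the lines it is given, so the CSV parser never sees the full
        // file.
        val lineDF1 = spark.read
          .option("header", "true")
          .option("mode", "DROPMALFORMED")
          .option("delimiter", ",")
          .csv(firstLines)

        lineDF1.show()
        spark.stop()
      }
    }

Compared with .load(i_lineitem).limit(10), this moves the limit before parsing instead of after it; a user-supplied schema such as line_schema can still be passed via .schema(...) before the .csv(firstLines) call.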
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org