Posted to issues@spark.apache.org by "Franklyn Dsouza (JIRA)" <ji...@apache.org> on 2017/01/27 22:09:24 UTC

[jira] [Closed] (SPARK-19388) Reading an empty folder as parquet causes an Analysis Exception

     [ https://issues.apache.org/jira/browse/SPARK-19388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklyn Dsouza closed SPARK-19388.
-----------------------------------
    Resolution: Fixed

> Reading an empty folder as parquet causes an Analysis Exception
> ---------------------------------------------------------------
>
>                 Key: SPARK-19388
>                 URL: https://issues.apache.org/jira/browse/SPARK-19388
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: Franklyn Dsouza
>            Priority: Minor
>
> Reading an empty folder as parquet used to return an empty dataframe up until 2.0.
> Now it raises an AnalysisException, like so:
> {code}
> In [1]: df = sqlCtx.read.parquet("empty_dir/")
> ---------------------------------------------------------------------------
> AnalysisException                         Traceback (most recent call last)
> ----> 1 df = sqlCtx.read.parquet("empty_dir/")
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/pyspark/sql/readwriter.pyc in parquet(self, *paths)
>     272         [('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')]
>     273         """
> --> 274         return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
>     275
>     276     @ignore_unicode_prefix
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1131         answer = self.gateway_client.send_command(command)
>    1132         return_value = get_return_value(
> -> 1133             answer, self.gateway_client, self.target_id, self.name)
>    1134
>    1135         for temp_arg in temp_args:
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>      67                                              e.java_exception.getStackTrace()))
>      68             if s.startswith('org.apache.spark.sql.AnalysisException: '):
> ---> 69                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
>      70             if s.startswith('org.apache.spark.sql.catalyst.analysis'):
>      71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org