Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2015/06/02 19:15:49 UTC

[jira] [Resolved] (SPARK-8037) Ignores files whose name starts with "." while enumerating files in HadoopFsRelation

     [ https://issues.apache.org/jira/browse/SPARK-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-8037.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

> Ignores files whose name starts with "." while enumerating files in HadoopFsRelation
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-8037
>                 URL: https://issues.apache.org/jira/browse/SPARK-8037
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Hidden files such as {{.DS_Store}}, generated by the Mac OS X Finder, can break partition discovery. A directory with the following layout
> {noformat}
> > find parquet_partitioned
> parquet_partitioned
> parquet_partitioned/._common_metadata.crc
> parquet_partitioned/._metadata.crc
> parquet_partitioned/._SUCCESS.crc
> parquet_partitioned/_common_metadata
> parquet_partitioned/_metadata
> parquet_partitioned/_SUCCESS
> parquet_partitioned/year=2014/.DS_Store
> parquet_partitioned/year=2014/month=9
> parquet_partitioned/year=2014/month=9/.DS_Store
> parquet_partitioned/year=2014/month=9/day=1/.DS_Store
> parquet_partitioned/year=2014/month=9/day=1/.part-r-00008.gz.parquet.crc
> parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet
> parquet_partitioned/year=2015
> parquet_partitioned/year=2015/month=10
> parquet_partitioned/year=2015/month=10/day=25
> parquet_partitioned/year=2015/month=10/day=25/.part-r-00002.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=25/.part-r-00004.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet
> parquet_partitioned/year=2015/month=10/day=25/part-r-00004.gz.parquet
> parquet_partitioned/year=2015/month=10/day=26
> parquet_partitioned/year=2015/month=10/day=26/.part-r-00005.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=26/part-r-00005.gz.parquet
> parquet_partitioned/year=2015/month=9
> parquet_partitioned/year=2015/month=9/day=1
> parquet_partitioned/year=2015/month=9/day=1/.part-r-00007.gz.parquet.crc
> parquet_partitioned/year=2015/month=9/day=1/part-r-00007.gz.parquet
> {noformat}
> causes an exception like this:
> {noformat}
> scala> val df = sqlContext.read.parquet("parquet_partitioned")
> java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
>     ArrayBuffer(year, month)
> ArrayBuffer(year)
> ArrayBuffer(year, month, day)
>     at scala.Predef$.assert(Predef.scala:179)
>     at org.apache.spark.sql.sources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:189)
>     at org.apache.spark.sql.sources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:87)
>     at org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$discoverPartitions(interfaces.scala:492)
>     at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:449)
>     at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:448)
> {noformat}
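> This is because {{.DS_Store}} files are treated as data files, so directories such as {{year=2014}} and {{year=2014/month=9}} look like leaf partition directories and yield the conflicting column sets shown above.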
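
The effect described above can be reproduced with a minimal sketch that works on plain path strings, using a subset of the paths from the listing. The names HiddenFileFilter, isHidden, and partitionColumns are hypothetical and chosen for illustration only; this is not the code of the actual fix, and it does not use Spark or Hadoop APIs.

```scala
// Illustrative sketch only: hypothetical helper names, not Spark's implementation.
object HiddenFileFilter {
  // Last path component, e.g. "a/b/.DS_Store" -> ".DS_Store".
  def baseName(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  // True for names such as ".DS_Store" or ".part-r-00008.gz.parquet.crc".
  def isHidden(path: String): Boolean =
    baseName(path).startsWith(".")

  // Partition column names encoded in the parent directories of a file,
  // e.g. "t/year=2014/month=9/f.parquet" -> List("year", "month").
  def partitionColumns(file: String): List[String] =
    file.split('/').dropRight(1).toList
      .filter(_.contains("="))
      .map(_.takeWhile(_ != '='))

  def main(args: Array[String]): Unit = {
    // A subset of the paths from the directory listing in the report.
    val files = List(
      "parquet_partitioned/year=2014/.DS_Store",
      "parquet_partitioned/year=2014/month=9/.DS_Store",
      "parquet_partitioned/year=2014/month=9/day=1/.DS_Store",
      "parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet",
      "parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet")

    // Treating every file as data yields several distinct column sets.
    println(files.map(partitionColumns).distinct)

    // Dropping names that start with "." leaves a single column set.
    println(files.filterNot(isHidden).map(partitionColumns).distinct)
  }
}
```

With the hidden files included, the sketch sees three distinct column sets, which mirrors the three conflicting entries in the assertion message; after filtering, only the (year, month, day) set remains.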



