Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/19 06:36:37 UTC

[GitHub] [spark] FatalLin edited a comment on pull request #32202: [SPARK-28098][SQL]allow reader could read files from subdirectory for non-partitioned table when configuration is enable

FatalLin edited a comment on pull request #32202:
URL: https://github.com/apache/spark/pull/32202#issuecomment-822210112


   About the configurations "mapred.input.dir.recursive" and "hive.mapred.supports.subdirectories", I found a brief description in the Hive documentation:
   ```
   hive.mapred.supports.subdirectories
   Default Value: false
   Added In: Hive 0.10.0 with HIVE-3276
   Whether the version of Hadoop which is running supports sub-directories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports sub-directories for tables/partitions. This support was added by MAPREDUCE-1501.
   ```
   (https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties)
   It looks like "mapred.input.dir.recursive" allows MapReduce to read files from sub-directories, while "hive.mapred.supports.subdirectories" allows Hive to apply sub-directory-related optimizations. My first thought was that since Hive and MapReduce are separate projects, it makes sense for each to have its own configuration for this. But in Spark the operation happens only in Spark SQL, so I had only checked the Hive-side configuration "hive.mapred.supports.subdirectories" earlier. What do you think? @attilapiros
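   For reference, a minimal sketch of how the two switches can be toggled from a Spark session with Hive support. The table name `my_table` is a placeholder, and whether Spark actually honors both switches for non-partitioned tables is exactly what this PR is about:
   ```scala
   import org.apache.spark.sql.SparkSession

   object SubdirReadSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("subdir-read-sketch")
         .enableHiveSupport()
         .getOrCreate()

       // MapReduce-side switch: let Hadoop input formats recurse into sub-directories.
       spark.sparkContext.hadoopConfiguration
         .setBoolean("mapred.input.dir.recursive", true)

       // Hive-side switch: declare that sub-directories under a table location are supported.
       spark.sql("SET hive.mapred.supports.subdirectories=true")

       // Read a non-partitioned Hive table whose data files sit in sub-directories
       // of the table location ("my_table" is a placeholder name).
       spark.sql("SELECT COUNT(*) FROM my_table").show()

       spark.stop()
     }
   }
   ```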
   

