Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/04 05:25:13 UTC

[GitHub] [spark] GabeChurch commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

GabeChurch commented on pull request #32518:
URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240


    @dongjoon-hyun I'm curious, have you run any benchmarks of the S3A magic committer with ORC?
    I've been testing with Spark 3.2 and a 3.3 fork on Kubernetes (writes of a couple of TB) for a while now, and I'm seeing worse performance when the magic committer is enabled. It's probably worth noting that I'm partitioning, bucketing (1 column), and sorting on write.
    
    Is the magic committer simply a bad option for those of us using ORC? Or maybe I'm missing something.
   
   Property | Value
   -- | --
   spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled | true
   spark.hadoop.fs.s3a.committer.magic.enabled | true
   spark.hadoop.fs.s3a.committer.name | magic
   spark.hadoop.fs.s3a.experimental.input.fadvise | random
   spark.hadoop.fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
   spark.hadoop.fs.s3a.readahead.range | 157810688
   spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version | 2
   spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a | org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
   spark.sql.hive.metastorePartitionPruning | True
   spark.sql.orc.filterPushdown | True
   spark.sql.parquet.output.committer.class | org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
   spark.sql.sources.commitProtocolClass | org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
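
For anyone trying to reproduce this, the committer-related rows of the table above could be passed to spark-submit as `--conf` flags. The following is only a minimal sketch: the property names and values are copied from the table, while the dictionary name `MAGIC_COMMITTER_CONF` and the helper `to_conf_args` are hypothetical names introduced for illustration, not part of Spark or Hadoop.

```python
# Sketch: render the committer settings from the table as spark-submit flags.
# Keys and values are copied from the table above; MAGIC_COMMITTER_CONF and
# to_conf_args are hypothetical names used only for this illustration.
MAGIC_COMMITTER_CONF = {
    "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.name": "magic",
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
}


def to_conf_args(conf):
    """Return a flat list of ["--conf", "key=value", ...] in sorted key order."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags on one line so they can be pasted after `spark-submit`.
    print(" ".join(to_conf_args(MAGIC_COMMITTER_CONF)))
```

Keeping the committer settings in one place like this makes it easier to run the same write job with and without them when comparing timings.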

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


