Posted to user@spark.apache.org by Xiaomeng Wan <sh...@gmail.com> on 2016/11/11 13:08:10 UTC

load large number of files from s3

Hi,
We have 30 million small files (100k each) on S3. I want to know how bad it
is to load them directly from S3 (e.g. driver memory, I/O, executor memory,
S3 reliability) before merging them or copying them with distcp. Does anybody
have experience with this? Thanks in advance!
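
For scale, here is a rough back-of-envelope sketch in Python. It assumes
"100k" means about 100 KB per file and that the files are listed through an
S3 call that returns at most 1,000 keys per request. The function name and
the constants are only for illustration, not measurements from our cluster.

```python
# Back-of-envelope sizing; every constant below is an assumption.
NUM_FILES = 30_000_000         # file count from the post
FILE_SIZE_BYTES = 100 * 1000   # assumes "100k" means roughly 100 KB
KEYS_PER_LIST_CALL = 1000      # S3 list calls return at most 1,000 keys each


def estimate(num_files, file_size_bytes, keys_per_call):
    """Return (total bytes, minimum number of LIST requests)."""
    total_bytes = num_files * file_size_bytes
    list_calls = -(-num_files // keys_per_call)  # ceiling division
    return total_bytes, list_calls


total_bytes, list_calls = estimate(NUM_FILES, FILE_SIZE_BYTES, KEYS_PER_LIST_CALL)
print(f"total data: {total_bytes / 1e12:.1f} TB")
print(f"minimum LIST requests: {list_calls}")
print(f"minimum GET requests: {NUM_FILES}")  # at least one per file
```

So the data volume itself is modest; the open question is the cost of the
tens of millions of per-file requests and the listing on the driver.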

Regards,
Shawn