Posted to user@spark.apache.org by Edmon Begoli <eb...@gmail.com> on 2015/03/13 23:06:38 UTC

Spark on HDFS vs. Lustre vs. other file systems - formal research and performance evaluation

All,

Does anyone have a reference to a publication, or to informal sources
(blogs, notes), showing the performance of Spark on HDFS vs. other shared
file systems (Lustre, etc.) or network file systems (NFS)?

I need this for formal performance research.

We are currently researching this on a very specific, boutique
machine, and we are seeing some controversial results.

For the purposes of a literature survey and general comparison, I would
like to see the findings that others have had. I know that conventional
wisdom holds that Spark works best with HDFS because of data locality
awareness.

Thank you,
*Edmon Begoli, PhD*
Chief Data Officer
Joint Institute for Computational Sciences (JICS)
ebegoli@tennessee.edu
https://www.linkedin.com/in/ebegoli