Posted to user@spark.apache.org by alvarobrandon <al...@gmail.com> on 2016/02/26 14:59:49 UTC

Task Output size in Spark WEB UI not the same as in HDFS

I'm puzzled by the following results I got from executing an application that
just generates data and writes it to HDFS. The 16 tasks that ran for that
app look like this:

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26345/Screen_Shot_2016-02-26_at_14.png> 

So 4 tasks wrote 2.9 GB to the output. But if I check in HDFS, I actually have
16 files of 2.9 GB each. How come?
abrandon@granduc-13:/opt/hadoop/bin$ ./hdfs dfs -ls -h kMeans
16/02/26 13:44:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 17 items
-rw-r--r--   3 abrandon supergroup          0 2016-02-26 13:40 kMeans/_SUCCESS
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00000
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:38 kMeans/part-00001
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:38 kMeans/part-00002
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:38 kMeans/part-00003
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00004
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00005
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00006
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00007
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00008
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00009
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00010
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00011
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00012
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00013
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00014
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:40 kMeans/part-00015
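
To compare the HDFS side with the per-task figures in the UI, the listing can be totalled with a short script. This is only a sketch in Python, not something from the application itself: the names `UNITS` and `total_listing` are hypothetical, and it assumes that every file entry starts with `-` and carries its size in the fifth column, optionally followed by a K/M/G/T unit as printed by `-ls -h`. Since `-h` rounds the sizes, the total is approximate.

```python
# Sketch: total the sizes printed by `hdfs dfs -ls -h`.
# Assumption: binary multipliers (1 K = 1024 bytes) for the -h units.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def total_listing(listing):
    """Return (number of non-empty files, approximate total size in bytes)."""
    files = 0
    total = 0.0
    for line in listing.splitlines():
        fields = line.split()
        # File entries start with '-' (e.g. -rw-r--r--); skip everything else.
        if len(fields) < 6 or not fields[0].startswith("-"):
            continue
        try:
            size = float(fields[4])
        except ValueError:
            continue  # not a size column, so not a listing entry
        # With -h the unit is a separate column; without it the size is in bytes.
        if fields[5] in UNITS:
            size *= UNITS[fields[5]]
        if size > 0:
            files += 1
        total += size
    return files, total


sample = """\
-rw-r--r--   3 abrandon supergroup          0 2016-02-26 13:40 kMeans/_SUCCESS
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:39 kMeans/part-00000
-rw-r--r--   3 abrandon supergroup      2.9 G 2016-02-26 13:38 kMeans/part-00001
"""

files, total = total_listing(sample)
print(files, "non-empty files, about", round(total / UNITS["G"], 1), "G in total")
```

Applied to the full listing above, this counts 16 non-empty part files, i.e. roughly 16 x 2.9 G = 46.4 G in HDFS before replication.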