Posted to issues@spark.apache.org by "Christian Chua (JIRA)" <ji...@apache.org> on 2014/09/03 22:48:51 UTC
[jira] [Created] (SPARK-3387) Misleading stage description on the driver UI
Christian Chua created SPARK-3387:
-------------------------------------
Summary: Misleading stage description on the driver UI
Key: SPARK-3387
URL: https://issues.apache.org/jira/browse/SPARK-3387
Project: Spark
Issue Type: Bug
Components: Web UI
Affects Versions: 1.0.2
Environment: Java 1.6, OSX Mountain Lion
Reporter: Christian Chua
Steps to reproduce: compile and run this modified version of the 1.0.2 PageRank example:
public static void main(String[] args) throws Exception {
    JavaSparkContext sc = new JavaSparkContext("local[8]", "Sample");
    JavaRDD<String> inputRDD = sc.textFile(INPUT_FILE, 1);
    // Turn each input line into a (first token, second token) pair, split with SPACES.
    JavaPairRDD<String, String> a = inputRDD.mapToPair(new PairFunction<String, String, String>() {
        @Override
        public Tuple2<String, String> call(String s) throws Exception {
            String[] parts = SPACES.split(s);
            return new Tuple2<String, String>(parts[0], parts[1]);
        }
    });
    JavaPairRDD<String, String> b = a.distinct();
    JavaPairRDD<String, Iterable<String>> c = b.groupByKey(11);
    System.out.println(c.toDebugString());
    System.out.println(c.collect());
    // Blocks until the dialog is dismissed, so the driver UI stays available for inspection.
    JOptionPane.showMessageDialog(null, "Last Line");
    sc.stop();
}
The debug string appears as:
MappedValuesRDD[11] at groupByKey at Sample.java:45 (11 partitions)
MappedValuesRDD[10] at groupByKey at Sample.java:45 (11 partitions)
MapPartitionsRDD[9] at groupByKey at Sample.java:45 (11 partitions)
ShuffledRDD[8] at groupByKey at Sample.java:45 (11 partitions)
MappedRDD[7] at distinct at Sample.java:41 (1 partitions)
MapPartitionsRDD[6] at distinct at Sample.java:41 (1 partitions)
ShuffledRDD[5] at distinct at Sample.java:41 (1 partitions)
MapPartitionsRDD[4] at distinct at Sample.java:41 (1 partitions)
MappedRDD[3] at distinct at Sample.java:41 (1 partitions)
MappedRDD[2] at mapToPair at Sample.java:30 (1 partitions)
MappedRDD[1] at textFile at Sample.java:28 (1 partitions)
HadoopRDD[0] at textFile at Sample.java:28 (1 partitions)
The problem is that the list of stages in the UI (localhost:4040) does not mention "groupByKey" at all.
In fact, it lists "distinct" twice:
stage 0 : collect
stage 1 : distinct
stage 2 : distinct
This misleading information can significantly confuse someone who is learning Spark.
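One possible reading of the stage list, offered as an assumption rather than confirmed Spark behavior: if stages are cut at each ShuffledRDD and each stage is labeled by the call site of its topmost RDD, the debug string above yields the labels groupByKey, distinct, distinct. The UI would then differ only in showing the action ("collect") for the final stage. The sketch below is plain Java and does not call Spark; the class and method names (StageSketch, topOperationPerGroup) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StageSketch {
    // Matches lines such as "ShuffledRDD[8] at groupByKey at Sample.java:45 (11 partitions)".
    // Group 1 is the RDD class name, group 2 is the operation at the call site.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\w+)\\[\\d+\\] at (\\w+) at .*$");

    /**
     * Walks the debug string top-down and starts a new group after every ShuffledRDD.
     * Returns the operation name of the first (topmost) RDD in each group.
     * Assumption: this grouping approximates how stages are split and named.
     */
    static List<String> topOperationPerGroup(List<String> debugLines) {
        List<String> labels = new ArrayList<>();
        boolean startOfGroup = true;
        for (String line : debugLines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // skip anything that is not an RDD line
            }
            if (startOfGroup) {
                labels.add(m.group(2));
                startOfGroup = false;
            }
            if (m.group(1).equals("ShuffledRDD")) {
                startOfGroup = true; // the next RDD belongs to an upstream group
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> debugLines = Arrays.asList(
                "MappedValuesRDD[11] at groupByKey at Sample.java:45 (11 partitions)",
                "MappedValuesRDD[10] at groupByKey at Sample.java:45 (11 partitions)",
                "MapPartitionsRDD[9] at groupByKey at Sample.java:45 (11 partitions)",
                "ShuffledRDD[8] at groupByKey at Sample.java:45 (11 partitions)",
                "MappedRDD[7] at distinct at Sample.java:41 (1 partitions)",
                "MapPartitionsRDD[6] at distinct at Sample.java:41 (1 partitions)",
                "ShuffledRDD[5] at distinct at Sample.java:41 (1 partitions)",
                "MapPartitionsRDD[4] at distinct at Sample.java:41 (1 partitions)",
                "MappedRDD[3] at distinct at Sample.java:41 (1 partitions)",
                "MappedRDD[2] at mapToPair at Sample.java:30 (1 partitions)",
                "MappedRDD[1] at textFile at Sample.java:28 (1 partitions)",
                "HadoopRDD[0] at textFile at Sample.java:28 (1 partitions)");
        // Prints [groupByKey, distinct, distinct]
        System.out.println(topOperationPerGroup(debugLines));
    }
}
```

If this reading is right, the second "distinct" entry is the map side of the groupByKey shuffle, which would explain why groupByKey never appears in the list.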