Posted to dev@mrql.apache.org by "Leonidas Fegaras (JIRA)" <ji...@apache.org> on 2014/10/19 18:06:33 UTC
[jira] [Commented] (MRQL-55) Add support for Hadoop Sequence input format in flink mode
[ https://issues.apache.org/jira/browse/MRQL-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176330#comment-14176330 ]
Leonidas Fegaras commented on MRQL-55:
--------------------------------------
Here are some performance results (in seconds) on a small YARN cluster with 12 nodes (48 cores). The benchmarks, numbered as in the table rows below, are:
# PageRank (6 steps) 1M nodes, 10M edges
# K-means clustering (5 steps) 10M points
# DBLP XML PageRank (12 steps) 1.5GB
# matrix multiplication 500x500
{noformat}
     Map-Reduce    Spark    Flink
-----------------------------------
1         591.8    145.1    145.3
2        1068.1    184.0    516.4
3         994.2    149.4    181.6
4          78.7     83.2     94.9
{noformat}
K-means is slower in Flink mode than in Spark mode (516.4 vs. 184.0 seconds) because MRQL doesn't use Flink iterations for k-means (but it does use Flink iterations for PageRank).
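To make the numbers easier to compare, here is a minimal Python sketch that computes each mode's speedup relative to Map-Reduce. The timings are copied from the table above; the names `timings` and `speedups` are hypothetical helpers for illustration only and are not part of MRQL.

```python
# Timings in seconds, copied from the results table in this comment.
timings = {
    "PageRank (6 steps)": {"Map-Reduce": 591.8, "Spark": 145.1, "Flink": 145.3},
    "K-means (5 steps)": {"Map-Reduce": 1068.1, "Spark": 184.0, "Flink": 516.4},
    "DBLP XML PageRank (12 steps)": {"Map-Reduce": 994.2, "Spark": 149.4, "Flink": 181.6},
    "Matrix multiplication 500x500": {"Map-Reduce": 78.7, "Spark": 83.2, "Flink": 94.9},
}


def speedups(row, baseline="Map-Reduce"):
    """Return baseline_time / mode_time for every non-baseline mode.

    A value above 1 means the mode is faster than the baseline;
    a value below 1 means it is slower.
    """
    base = row[baseline]
    return {mode: base / secs for mode, secs in row.items() if mode != baseline}


if __name__ == "__main__":
    for name, row in timings.items():
        s = speedups(row)
        print(f"{name}: Spark {s['Spark']:.2f}x, Flink {s['Flink']:.2f}x")
```

A ratio below 1 (as for the matrix multiplication row) means the mode was slower than Map-Reduce on that benchmark.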
> Add support for Hadoop Sequence input format in flink mode
> ----------------------------------------------------------
>
> Key: MRQL-55
> URL: https://issues.apache.org/jira/browse/MRQL-55
> Project: MRQL
> Issue Type: Improvement
> Components: Run-Time/Flink
> Affects Versions: 0.9.4
> Reporter: Leonidas Fegaras
> Assignee: Leonidas Fegaras
> Priority: Minor
> Attachments: MRQL-55.patch
>
>
> The following patch adds support for the Hadoop Sequence input format in Flink mode. Before this, we used the Flink binary input format to read/write binary files, which was not compatible with the other MRQL evaluation modes. The patch also fixes the mrql.flink script to get the Flink job manager from conf/.yarn-properties instead of conf/.yarn-jobmanager.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)