Posted to user@flink.apache.org by Ziyad Muhammed <mm...@gmail.com> on 2017/07/08 02:27:26 UTC
FlinkML ALS is taking too long to run
Dear all
I'm trying to run FlinkML ALS on the Yahoo R2 data set [1] stored on HDFS.
The program runs without reporting any errors, but it never finishes. The
operators that keep running indefinitely are:
CoGroup (CoGroup at
org.apache.flink.ml.recommendation.ALS$.updateFactors(ALS.scala:606))(11/240)
Join(Join at
org.apache.flink.ml.recommendation.ALS$.updateFactors(ALS.scala:576))(15/240)
I ran the job with the following parameters:
val als = ALS().setIterations(10).setNumFactors(10).setBlocks(100)
I did not set the HDFS temporary path. Can someone tell me which
parameters to set to run ALS on a data set this large? Why do these
operators run indefinitely?
[1] https://webscope.sandbox.yahoo.com/catalog.php?datatype=r
Best
Ziyad
Re: FlinkML ALS is taking too long to run
Posted by Sebastian Schelter <ss...@googlemail.com>.
I don't think you need to employ a distributed system for working with this
dataset. An SGD implementation on a single machine should easily handle the
job.
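For illustration only, a minimal sketch of what single-machine SGD matrix factorization could look like in plain Scala. This is not code from this thread or from FlinkML: the names (Rating, SgdMf, train, rmse) and all hyperparameter defaults are hypothetical, and no measurement on the Yahoo R2 data is implied.

```scala
import scala.util.Random

// One observed rating: (user index, item index, rating value).
final case class Rating(user: Int, item: Int, value: Double)

// Hypothetical single-machine matrix factorization trained with plain SGD.
object SgdMf {

  // Dot product of a user factor vector and an item factor vector.
  def predict(u: Array[Double], v: Array[Double]): Double = {
    var sum = 0.0
    var k = 0
    while (k < u.length) { sum += u(k) * v(k); k += 1 }
    sum
  }

  // Minimizes squared error with L2 regularization; returns (userFactors, itemFactors).
  def train(
      ratings: Seq[Rating],
      numUsers: Int,
      numItems: Int,
      numFactors: Int = 10,
      epochs: Int = 200,
      learningRate: Double = 0.05,
      lambda: Double = 0.01,
      seed: Long = 42L
  ): (Array[Array[Double]], Array[Array[Double]]) = {
    val rnd = new Random(seed)
    // Small random initial factors.
    val users = Array.fill(numUsers, numFactors)(rnd.nextDouble() * 0.1)
    val items = Array.fill(numItems, numFactors)(rnd.nextDouble() * 0.1)
    // Visit the ratings in a fresh random order in every epoch.
    for (_ <- 0 until epochs; r <- rnd.shuffle(ratings)) {
      val u = users(r.user)
      val v = items(r.item)
      val err = r.value - predict(u, v)
      var k = 0
      while (k < numFactors) {
        val uk = u(k) // keep the old value so both updates use the same state
        u(k) += learningRate * (err * v(k) - lambda * uk)
        v(k) += learningRate * (err * uk - lambda * v(k))
        k += 1
      }
    }
    (users, items)
  }

  // Root mean squared error of the model on the given ratings.
  def rmse(
      ratings: Seq[Rating],
      users: Array[Array[Double]],
      items: Array[Array[Double]]
  ): Double =
    math.sqrt(ratings.map { r =>
      val e = r.value - predict(users(r.user), items(r.item))
      e * e
    }.sum / ratings.size)
}
```

The factor matrices are ordinary in-memory arrays, so nothing is shuffled or partitioned between workers.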
Best,
Sebastian
2017-07-12 9:26 GMT+02:00 Andrea Spina <an...@radicalbit.io>:
> Dear Ziyad,
>
> Yes, I ran into the same very long runtimes with ALS at the time, and I
> saw improvements by increasing the number of blocks and decreasing the
> number of task slots per TaskManager (#TSs/TM), as you described.
>
> Cheers,
>
> Andrea
Re: FlinkML ALS is taking too long to run
Posted by Andrea Spina <an...@radicalbit.io>.
Dear Ziyad,
Yes, I ran into the same very long runtimes with ALS at the time, and I
saw improvements by increasing the number of blocks and decreasing the
number of task slots per TaskManager (#TSs/TM), as you described.
Cheers,
Andrea
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkML-ALS-is-taking-too-long-to-run-tp14154p14192.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.
Re: FlinkML ALS is taking too long to run
Posted by Ziyad Muhammed <mm...@gmail.com>.
Dear Andrea
Thank you for your reply.
The job was stuck at the two operators I mentioned for more than 17 hours.
See the screenshot.
I could solve the problem by:
1. Reducing the number of task slots in the cluster from the number of
cores to half the number of cores.
2. Tuning the 'blocks' hyperparameter. I set it to twice the job
parallelism.
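A minimal sketch of these two rules in plain Scala, for readers who want to derive the values for their own cluster. The names AlsSettings and AlsTuning.suggest are hypothetical helpers, not part of Flink, and the core count used in the note below is an assumed example value.

```scala
// Hypothetical holder for the two derived settings; not a Flink class.
final case class AlsSettings(taskSlotsPerTaskManager: Int, blocks: Int)

object AlsTuning {
  // Applies the two rules described above:
  //   task slots = half the number of cores (at least 1)
  //   blocks     = twice the job parallelism
  def suggest(coresPerTaskManager: Int, parallelism: Int): AlsSettings = {
    require(coresPerTaskManager > 0, "coresPerTaskManager must be positive")
    require(parallelism > 0, "parallelism must be positive")
    AlsSettings(
      taskSlotsPerTaskManager = math.max(1, coresPerTaskManager / 2),
      blocks = 2 * parallelism
    )
  }
}
```

For example, assuming 16 cores per machine and the parallelism of 240 visible in the operator names, AlsTuning.suggest(16, 240) gives 8 task slots and 480 blocks. The slot count would go into flink-conf.yaml as taskmanager.numberOfTaskSlots, and the block count into the original call as ALS().setIterations(10).setNumFactors(10).setBlocks(480).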
Best
Ziyad
On Tue, Jul 11, 2017 at 5:53 PM, Andrea Spina <an...@radicalbit.io>
wrote:
> Dear Ziyad,
> could you kindly share some additional information about your environment
> (local or cluster, number of nodes, machine configuration)?
> What exactly do you mean by "indefinitely"? How long has the job been
> hanging?
>
> Hope to help you, then.
>
> Cheers,
>
> Andrea
Re: FlinkML ALS is taking too long to run
Posted by Andrea Spina <an...@radicalbit.io>.
Dear Ziyad,
could you kindly share some additional information about your environment
(local or cluster, number of nodes, machine configuration)?
What exactly do you mean by "indefinitely"? How long has the job been
hanging?
Hope to help you, then.
Cheers,
Andrea
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkML-ALS-is-taking-too-long-to-run-tp14154p14186.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.