Posted to reviews@spark.apache.org by aremirata <gi...@git.apache.org> on 2016/01/06 17:42:40 UTC

[GitHub] spark pull request: [SPARK-5955][MLLIB] add checkpointInterval to ...

Github user aremirata commented on the pull request:

    https://github.com/apache/spark/pull/5076#issuecomment-169383377
  
    Hi guys,
    
    First of all, thank you for developing Spark and making it open source for all of us to use. I'm Alger Remirata, a researcher from the Philippines. I'm new to Spark and Scala, and I'm working on a project involving matrix factorization in Spark.

    I have a problem running ALS in Spark: it fails with a stack overflow which, according to comments I found online, is caused by a long lineage chain. One suggestion is to use setCheckpointInterval so that the RDDs are checkpointed every 10-20 iterations, which should prevent the error. I would like to ask for details on how to do checkpointing with ALS. I am using spark-kernel, developed by IBM (https://github.com/ibm-et/spark-kernel), instead of spark-shell.
    
    Here are my specific questions about checkpointing:
    
    1. The checkpoint directory set through SparkContext.setCheckpointDir() needs to be a Hadoop-compatible directory. Can we use any available HDFS-compatible directory?
    2. What does this comment in the ALS checkpointing code mean:
    If the checkpoint directory is not set in [[org.apache.spark.SparkContext]],
      * this setting is ignored.
    3. Is calling setCheckpointInterval the only code I need to add for checkpointing to work with ALS?
    4. I am getting this error: Name: java.lang.IllegalArgumentException, Message: Wrong FS: expected file :///. How can I solve this? What is the proper way to use checkpointing?
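
    For context, the setup that questions 1 to 4 refer to might look roughly like the sketch below. It is only an illustration under stated assumptions: the HDFS URI, the application name, the sample ratings and the ALS parameters are hypothetical placeholders, and in spark-kernel the SparkContext (sc) would already exist rather than being created by hand. The sketch sets the checkpoint directory before running ALS, since the code comment quoted in question 2 says the interval is ignored otherwise.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    object AlsCheckpointSketch {
      def main(args: Array[String]): Unit = {
        // In spark-kernel or spark-shell, `sc` is already provided; creating it
        // here only keeps the sketch self-contained.
        val conf = new SparkConf().setAppName("als-checkpoint-sketch") // hypothetical name
        val sc = new SparkContext(conf)

        // Hypothetical, fully qualified checkpoint location. Replace the host,
        // port and path with a directory the cluster can actually write to.
        sc.setCheckpointDir("hdfs://namenode:8020/tmp/spark-checkpoints")

        // Tiny made-up ratings, only so the example has input data.
        val ratings = sc.parallelize(Seq(
          Rating(1, 10, 4.0),
          Rating(1, 20, 1.0),
          Rating(2, 10, 5.0),
          Rating(2, 30, 3.0)
        ))

        // Checkpoint every 10 iterations (an assumed value) to truncate the lineage.
        val model = new ALS()
          .setRank(5)
          .setIterations(30)
          .setLambda(0.01)
          .setCheckpointInterval(10)
          .run(ratings)

        println(s"Trained a model with rank ${model.rank}")
        sc.stop()
      }
    }
    ```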
    
    Thanks a lot!

