Posted to reviews@spark.apache.org by JerryLead <gi...@git.apache.org> on 2014/12/02 07:31:22 UTC

[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

GitHub user JerryLead opened a pull request:

    https://github.com/apache/spark/pull/3549

    [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage

    
    The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
    
    Iterative GraphX applications build up long lineages, and calling checkpoint() on an EdgeRDD or VertexRDD itself does not shorten them. In contrast, calling checkpoint() on their PartitionsRDD cuts the long lineage off. Moreover, existing operations in this code such as cache() are already performed on the PartitionsRDD, so checkpoint() should behave the same way. More details and explanation can be found in the JIRA.
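
A toy sketch may help picture the difference described above. It does not use Spark: `ToyRDD`, `ToyWrapperRDD`, `reachableLineage` and `checkpointPartitions` are hypothetical names, checkpointing is modelled simply as forgetting the parent link, and the sketch assumes the wrapper keeps a direct reference to its inner RDD, so the chain hanging off that inner RDD is what matters.

```scala
object LineageSketch {
  // Toy stand-in for an RDD: a node that remembers at most one parent.
  // Not Spark's RDD class; every name here is hypothetical.
  class ToyRDD(var parent: Option[ToyRDD]) {
    // Length of the parent chain starting at this node (inclusive).
    def lineageLength: Int = {
      var n = 1
      var cur = parent
      while (cur.isDefined) { n += 1; cur = cur.get.parent }
      n
    }
    // Model checkpointing as forgetting the parent link.
    def checkpoint(): Unit = { parent = None }
  }

  // Toy stand-in for VertexRDD/EdgeRDD: a thin wrapper around partitionsRDD.
  class ToyWrapperRDD(val partitionsRDD: ToyRDD) extends ToyRDD(Some(partitionsRDD)) {
    // Chain still reachable through the partitionsRDD field,
    // regardless of the wrapper's own parent link.
    def reachableLineage: Int = 1 + partitionsRDD.lineageLength
    // Delegate to the inner RDD, as the description says cache() already does.
    def checkpointPartitions(): Unit = partitionsRDD.checkpoint()
  }

  // Build a wrapper on top of a chain of `iterations` toy RDDs.
  def build(iterations: Int): ToyWrapperRDD = {
    var rdd = new ToyRDD(None)
    for (_ <- 1 until iterations) rdd = new ToyRDD(Some(rdd))
    new ToyWrapperRDD(rdd)
  }

  def main(args: Array[String]): Unit = {
    val a = build(100)
    a.checkpoint()               // checkpoint the wrapper itself
    println(a.reachableLineage)  // the inner chain is still fully reachable

    val b = build(100)
    b.checkpointPartitions()     // checkpoint the inner RDD instead
    println(b.reachableLineage)  // the chain is cut right below the wrapper
  }
}
```

In this model, checkpointing the wrapper only clears the wrapper's own link, while delegating to `partitionsRDD` truncates the chain that the wrapper actually holds on to.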

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JerryLead/spark my_graphX_checkpoint

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3549
    
----
commit 52799e3ea2b22f4bcaec3d9cd4c8891e212be09e
Author: Lijie Xu <cs...@gmail.com>
Date:   2014-12-01T08:54:37Z

    Merge pull request #1 from apache/master
    
    update

commit c0169da181660281b3bd82678ae89a73f5926370
Author: JerryLead <je...@163.com>
Date:   2014-12-02T03:19:31Z

    Merge branch 'master' of https://github.com/apache/spark
    
    update to the latest version

commit ff08ed4a963127119d335a67d7977eaab0e4e437
Author: JerryLead <je...@163.com>
Date:   2014-12-02T04:42:43Z

    Merge branch 'master' of https://github.com/apache/spark

commit d1aa8d88fd9af0d78066c9023ec7b30cd8341a3b
Author: JerryLead <je...@163.com>
Date:   2014-12-02T06:18:14Z

    Perform checkpoint() on PartitionsRDD not VertexRDD and EdgeRDD themselves

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by jason-dai <gi...@git.apache.org>.
Github user jason-dai commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65343799
  
    Maybe we can try something like:
    
        class ZippedPartitionsRDD2 (sc, f, …) {
          // Wrap f in a cleaned closure that takes the two partitions and the task context
          val cleanF = sc.clean((part1: Partition, part2: Partition, ctx: TaskContext) =>
            f(rdd1.iterator(part1, ctx), rdd2.iterator(part2, ctx)))
        
          override def compute(s: Partition, context: TaskContext): Iterator[V] = {
            …
            cleanF(partitions(0), partitions(1), context)
          }
          …
        }





[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65304304
  
    [Test build #24056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24056/consoleFull) for PR 3549 at commit [`d1aa8d8`](https://github.com/apache/spark/commit/d1aa8d88fd9af0d78066c9023ec7b30cd8341a3b).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65316895
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24056/




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65303586
  
    ok to test




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3549




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65188935
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65316886
  
    [Test build #24056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24056/consoleFull) for PR 3549 at commit [`d1aa8d8`](https://github.com/apache/spark/commit/d1aa8d88fd9af0d78066c9023ec7b30cd8341a3b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4672][GraphX]Perform checkpoint() on Pa...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/3549#issuecomment-65336297
  
    Thanks, merged into master and branch-1.2.

