Posted to reviews@spark.apache.org by larryxiao <gi...@git.apache.org> on 2014/09/18 15:59:58 UTC

[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

GitHub user larryxiao opened a pull request:

    https://github.com/apache/spark/pull/2446

    [SPARK-1987] EdgePartitionBuilder: More memory-efficient graph construction

    https://issues.apache.org/jira/browse/SPARK-1987
    To avoid the overhead of Edge objects, split the array of Edge objects into three arrays (srcId, dstId and data), which are later passed to EdgePartition(srcIdsTrim, dstIdsTrim, dataTrim, index, vertices).
    To sort arrays directly, I use [ParallelSorter](http://cglib.sourceforge.net/apidocs/net/sf/cglib/ParallelSorter.html) from mockito.cglib 
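
A minimal sketch of the columnar layout described above. The class `ColumnarEdgeBuffer` and its method names are hypothetical and chosen for illustration only; this is not the `EdgePartitionBuilder` code from the pull request. It stores each edge as one slot in three parallel arrays, so no per-edge object is allocated.

```scala
import scala.reflect.ClassTag

// Hypothetical illustration: three growable parallel columns instead of
// one collection of Edge objects. Position i in each column describes edge i.
class ColumnarEdgeBuffer[ED: ClassTag](initialCapacity: Int = 16) {
  require(initialCapacity >= 0, "initialCapacity must be non-negative")

  private var srcIds = new Array[Long](initialCapacity)
  private var dstIds = new Array[Long](initialCapacity)
  private var attrs = new Array[ED](initialCapacity)
  private var n = 0

  /** Appends one edge by writing one value into each column. */
  def add(src: Long, dst: Long, attr: ED): Unit = {
    if (n == srcIds.length) grow()
    srcIds(n) = src
    dstIds(n) = dst
    attrs(n) = attr
    n += 1
  }

  /** Doubles the capacity of all three columns (at least one slot). */
  private def grow(): Unit = {
    val newCapacity = math.max(1, srcIds.length * 2)
    srcIds = java.util.Arrays.copyOf(srcIds, newCapacity)
    dstIds = java.util.Arrays.copyOf(dstIds, newCapacity)
    val grown = new Array[ED](newCapacity)
    System.arraycopy(attrs, 0, grown, 0, n)
    attrs = grown
  }

  def size: Int = n

  /** Returns copies of the columns trimmed to the number of edges added. */
  def toArrays: (Array[Long], Array[Long], Array[ED]) = {
    val trimmedAttrs = new Array[ED](n)
    System.arraycopy(attrs, 0, trimmedAttrs, 0, n)
    (java.util.Arrays.copyOf(srcIds, n), java.util.Arrays.copyOf(dstIds, n), trimmedAttrs)
  }
}
```

The two id columns are primitive `long[]` arrays on the JVM, which is where the saving over an array of edge objects would come from.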
    
    It doesn't compile at the moment, and I don't really know how to fix it.
    
    ```
    [info] Compiling 8 Scala sources to /home/xd/Developer/spark/graphx/target/scala-2.10/classes...
    [error] /home/xd/Developer/spark/graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartitionBuilder.scala:49: type mismatch;
    [error]  found   : Array[Array[_ >: ED with Long]]
    [error]  required: Array[Object]
    [error] Note: Array[_ >: ED with Long] <: Object, but class Array is invariant in type T.
    [error] You may wish to investigate a wildcard type such as `_ <: Object`. (SLS 3.2.10)
    [error]     val sorter = ParallelSorter.create(edgeArray)
    [error]                                        ^
    [error] one error found
    [error] (graphx/compile:compile) Compilation failed
    [error] Total time: 5 s, completed Sep 18, 2014 9:01:14 PM
    ```
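
The type mismatch in the log above comes from `Array` being invariant in Scala: an array whose elements are arrays is not an `Array[Object]`, even though each element is an `Object`. The sketch below shows one way to build a genuine `Object[]` from several columns by giving the element type explicitly. The method `acceptsObjectArray` is a hypothetical stand-in for a Java method declared with an `Object[]` parameter, and this is not necessarily the fix that was applied to the pull request.

```scala
object InvarianceSketch {
  // Hypothetical stand-in for a Java method declared as `create(Object[] arrays)`.
  def acceptsObjectArray(arrays: Array[AnyRef]): Int = arrays.length

  def demo(): Int = {
    val srcIds = Array(3L, 1L, 2L)
    val dstIds = Array(30L, 10L, 20L)
    val data = Array("c", "a", "b")

    // Writing Array(srcIds, dstIds, data) lets the compiler infer an array of
    // arrays, which does not conform to Array[AnyRef] because Array is invariant.
    // Each column is itself an AnyRef, so naming the element type works.
    val columns = Array[AnyRef](srcIds, dstIds, data)
    acceptsObjectArray(columns)
  }
}
```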

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/larryxiao/spark 1987

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2446
    
----
commit eabc9c26948f12cc2874a553bf668a38b08b3301
Author: Larry Xiao <xi...@sjtu.edu.cn>
Date:   2014-08-25T07:29:56Z

    [SPARK-1987] EdgePartitionBuilder: More memory-efficient graph construction
    
    use ParallelSorter from mockito.cglib

commit c1d7e1171f30da7b26cdc80f0345ff2a69ec5649
Author: Larry Xiao <xi...@sjtu.edu.cn>
Date:   2014-09-18T13:10:53Z

    add dependency in build.sbt
    
    can't compile yet

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56087113
  
    Awesome, thanks! I didn't know about ParallelSorter.
    
    I submitted a PR to fix the compile error (larryxiao/spark#2). It also has to sort by (srcId, dstId) rather than just srcId because groupEdges relies on that. Finally, it works around a small [bug](http://mydailyjava.blogspot.com/2013/11/cglib-missing-manual.html) in ParallelSorter that requires you to supply explicit sorting ranges to avoid a ClassCastException.
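
To illustrate the ordering requirement mentioned above, here is a sketch that sorts three parallel arrays by (srcId, dstId). It deliberately uses a plain index permutation from the Scala standard library rather than cglib's ParallelSorter, so it allocates new arrays instead of sorting in place. The object and method names are hypothetical.

```scala
import scala.reflect.ClassTag

object ParallelArraySort {
  /** Returns reordered copies of the three columns, sorted by (srcId, dstId). */
  def sortBySrcThenDst[ED: ClassTag](
      srcIds: Array[Long],
      dstIds: Array[Long],
      data: Array[ED]): (Array[Long], Array[Long], Array[ED]) = {
    require(srcIds.length == dstIds.length && dstIds.length == data.length,
      "all three arrays must have the same length")
    // Sort the positions, comparing lexicographically on (srcId, dstId).
    val order = srcIds.indices.toArray.sortBy(i => (srcIds(i), dstIds(i)))
    // Apply the same permutation to every column so rows stay aligned.
    (order.map(i => srcIds(i)), order.map(i => dstIds(i)), order.map(i => data(i)))
  }
}
```

Sorting on srcId alone would leave edges with the same source in arbitrary dstId order, so duplicate (srcId, dstId) pairs would not be guaranteed to sit next to each other.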




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56207128
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20571/consoleFull) for PR 2446 at commit [`e1a8f04`](https://github.com/apache/spark/commit/e1a8f04ba923935e26bc8a78c3e0aff03751aae4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `sealed trait Matrix extends Serializable `
      * `class SparseMatrix(`
      * `sealed trait Vector extends Serializable `





[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2446




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by larryxiao <gi...@git.apache.org>.
Github user larryxiao commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56123977
  
    Thanks Ankur! You are really efficient!




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56249996
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20599/consoleFull) for PR 2446 at commit [`f7bfa8b`](https://github.com/apache/spark/commit/f7bfa8b2d66eaf8d1e6648b90aa257d08e014cc5).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by larryxiao <gi...@git.apache.org>.
Github user larryxiao commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-76647871
  
    @nchammas 
    The QA test result is no longer available; I'll run the tests again tomorrow.




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56207089
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20571/consoleFull) for PR 2446 at commit [`e1a8f04`](https://github.com/apache/spark/commit/e1a8f04ba923935e26bc8a78c3e0aff03751aae4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56151121
  
    Jenkins, this is ok to test.




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-96770208
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56258447
  
    Is graphx/build.sbt necessary? I thought modifying graphx/pom.xml would be sufficient.




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-76636221
  
    @ankurdave @larryxiao This PR has gone stale.
    
    Do we want to update it, or close and revisit it later?




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56042212
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56137274
  
    Thanks! ok to test




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56249949
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20599/consoleFull) for PR 2446 at commit [`f7bfa8b`](https://github.com/apache/spark/commit/f7bfa8b2d66eaf8d1e6648b90aa257d08e014cc5).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-168112372
  
    I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!




[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

Posted by larryxiao <gi...@git.apache.org>.
Github user larryxiao commented on the pull request:

    https://github.com/apache/spark/pull/2446#issuecomment-56265541
  
    Sorry, I don't know much about the build system. I thought pom.xml was for Maven and build.sbt was for sbt.
    But sbt assembly only works for me with build.sbt.

