Posted to reviews@spark.apache.org by lukovnikov <gi...@git.apache.org> on 2015/02/17 15:48:13 UTC

[GitHub] spark pull request: RDF Loader added + documentation

GitHub user lukovnikov opened a pull request:

    https://github.com/apache/spark/pull/4650

    RDF Loader added + documentation

    Have been testing it with DBpedia dumps, works well so far.
    Any help with custom partitioning and optimization is welcome.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lukovnikov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4650
    
----
commit 10436d252ad4876d28c91c77036e3d993050438a
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 595aed098fb423514b73263f96dfcaf1edbc72f5
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T21:41:00Z

    dictionary builder done

commit c2399023825e804476527f7e159b182a1b5c91c8
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T21:44:07Z

    [SPARK 5280]

commit f14e4835cf365fcbe5dd0979e61464b7cecb8774
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T22:50:06Z

    done dictionary version

commit 43cc53ab6d99a4a96a0764cc306f38fdce3a7e00
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T23:25:07Z

    [SPARK 5280] rdfloader using hashes as VertexIds

commit 2e1220d0938aee7d190439253e3b9bb1e73c77e8
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:04:48Z

    cleaned up + fixed style
    TODO: test + comment

commit 54e2c6eb24dade70753320a3ab2b3a64fef7a6d4
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:26:30Z

    made custom 64bit hash

commit b454560508c9d50c60e067d7e67405ca1e13c165
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:32:57Z

    proper

commit 45a9f57695e76c09c20fa99a1010168f63ef1da8
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 6ee9a2b675d06675b5b591f16e8d52e63d2dc049
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T21:41:00Z

    dictionary builder done

commit 45c22160c52111066109f57a0d773aca211c2068
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T21:44:07Z

    [SPARK 5280]

commit fa5c0da9ea4f6ca662406b380432901022d6de55
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T22:50:06Z

    done dictionary version

commit c036f98476e96ac03124f758ed7f17c4a464cf86
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T23:25:07Z

    [SPARK 5280] rdfloader using hashes as VertexIds

commit 57553797f7404e686674b0bfb39d80bb24d6520c
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:04:48Z

    cleaned up + fixed style
    TODO: test + comment

commit e00123eae4a84108af2c84cf253b1f4fb1fb69f1
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:26:30Z

    made custom 64bit hash

commit 6af9a7ad6198174597ae7d86ec5c15fc8467a082
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T00:32:57Z

    proper

commit 1ee34c9474bcf4500edecb08a848d15f3549055d
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T03:31:05Z

    Merge branch 'master' of github.com:lukovnikov/spark into rdfloaderhash

commit 9000a4713d286d5078c16f62b5fadf480941bc82
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T03:31:18Z

    Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash

commit 70eb725a102ae711a59c6d45794d191c18778c4b
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:02:48Z

    RDF Loader with hash, tested on small RDF dumps (more tests in progress)

commit 4398d93712777442ba0f2e8920423fcdd7b67d1f
Author: Denis <lu...@users.noreply.github.com>
Date:   2015-02-04T23:27:01Z

    added documentation for RDFLoader

commit 273a1b30dee1630333e0f7e683378b6dbb13c3a5
Author: Denis <lu...@users.noreply.github.com>
Date:   2015-02-04T23:29:05Z

    small update to RDFLoader description

commit 202ccf86901c3d2435564e544f90d6a49cda66fb
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:31:10Z

    sdf

commit 2d990cec1d48f62f4f1d9f9cf8082308a4eaf9e4
Author: lukovnikov <lu...@denis>
Date:   2015-02-03T19:41:58Z

    fast forward from upstream

commit 4a9b6222176749bee4a14e4b6d035b665c6ac7ea
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:43:31Z

    Merge branch 'master' of github.com:lukovnikov/spark

commit 062996c45d0443836c1b4b2bb714d8f459ea6980
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:43:52Z

    Merge branch 'rdfloaderhash'

commit 121bf14140573349424e7888da13ee2e8ea4f6f0
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:45:48Z

    [SPARK 5280]

commit 67ada514b98292ff647d8354545d37cc111499ba
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:47:21Z

    Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash

commit e5fcf758c0e4b54a38b2a01709681e11bbb6eae8
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:47:45Z

    Merge branch 'rdfloaderhash'

commit c5960af7b14d65b1d290c3af11d722075a54ad2d
Author: lukovnikov <lu...@denis>
Date:   2015-02-04T23:54:37Z

    Merge remote-tracking branch 'upstream/master'

commit 91361f3f760dbc78467f8e2b87a1d77061aa59de
Author: lukovnikov <lu...@denis>
Date:   2015-02-05T00:01:33Z

    undone unnecessary changes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4650




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387131
  
    Added test + a test file (small excerpt from DBpedia 3.9) + small fix in RDFLoader




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75388325
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27823/
    Test FAILed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75533258
  
    @maropu tests are added and build tests passed. Is it ready for merging now?




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75389096
  
      [Test build #27825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27825/consoleFull) for   PR 4650 at commit [`3db73ab`](https://github.com/apache/spark/commit/3db73ab6d98c1f1a7c2f92835d288a0724a4e58f).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75388315
  
      [Test build #27823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27823/consoleFull) for   PR 4650 at commit [`b658c55`](https://github.com/apache/spark/commit/b658c55105b6733ff68a6aad4c792cee4bb594b9).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387365
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27822/
    Test FAILed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74860223
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27680/
    Test PASSed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75393198
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27826/
    Test PASSed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75388324
  
      [Test build #27823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27823/consoleFull) for   PR 4650 at commit [`b658c55`](https://github.com/apache/spark/commit/b658c55105b6733ff68a6aad4c792cee4bb594b9).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74851018
  
    style errors fixed




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74738782
  
      [Test build #27643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for   PR 4650 at commit [`80d9b72`](https://github.com/apache/spark/commit/80d9b722c82e0cd07bef0e747d5aeecf6de798e5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387364
  
      [Test build #27822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27822/consoleFull) for   PR 4650 at commit [`1bec795`](https://github.com/apache/spark/commit/1bec795703901fe6c75776ce512b23e2563cf925).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387401
  
      [Test build #27817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27817/consoleFull) for   PR 4650 at commit [`04df47a`](https://github.com/apache/spark/commit/04df47acd209655b82c3dfc0997364d677eb9b61).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75383518
  
      [Test build #27817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27817/consoleFull) for   PR 4650 at commit [`04df47a`](https://github.com/apache/spark/commit/04df47acd209655b82c3dfc0997364d677eb9b61).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74679144
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74738784
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27643/
    Test FAILed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387356
  
      [Test build #27822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27822/consoleFull) for   PR 4650 at commit [`1bec795`](https://github.com/apache/spark/commit/1bec795703901fe6c75776ce512b23e2563cf925).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75387405
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27817/
    Test PASSed.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75389092
  
      [Test build #27825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27825/consoleFull) for   PR 4650 at commit [`3db73ab`](https://github.com/apache/spark/commit/3db73ab6d98c1f1a7c2f92835d288a0724a4e58f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74851295
  
      [Test build #27680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull) for   PR 4650 at commit [`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75389380
  
      [Test build #27826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27826/consoleFull) for   PR 4650 at commit [`4daa6e9`](https://github.com/apache/spark/commit/4daa6e9043480bfc69b8b64a0a4550b25132c661).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by set0gut1 <gi...@git.apache.org>.
Github user set0gut1 commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-154509562
  
    ```scala
    def gethash(in:String):Long = {
      var h = 1125899906842597L
      for (x <- in) {
        h = 31 * h + x;
      }
      return h
    }
    ```
    
    This hash function seems to be weak.
    I tried this with subject URIs of DBpedia's `labels_en.nt` ([sample](http://downloads.dbpedia.org/preview.php?file=2015-04_sl_core-i18n_sl_en_sl_labels_en.nt.bz2)).
    There are 11,519,154 unique URIs, and the hash values of 20,741 URIs collided (0.18%).
    
    For example, the following URIs all hash to the same value (-3127496886112549146):
    
    * `http://dbpedia.org/resource/Dms`
    * `http://dbpedia.org/resource/EOT`
    * `http://dbpedia.org/resource/F15`
    * `http://dbpedia.org/resource/EP5`
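
The collision reported above can be reproduced with a small sketch. It copies the quoted `gethash` and shows why the four URIs collide: they share a prefix, and the polynomial hash with multiplier 31 reduces their last three characters to the same sum `961*c1 + 31*c2 + c3`. The names `HashCollisionDemo`, `tail3`, and `md5Hash` are hypothetical and not part of the pull request; `md5Hash` is only one possible alternative (the first 8 bytes of an MD5 digest), not something proposed in this thread.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashCollisionDemo {
  // The hash quoted in the comment above, with unchanged behaviour.
  def gethash(in: String): Long = {
    var h = 1125899906842597L
    for (x <- in) h = 31 * h + x
    h
  }

  // Contribution of a three-character suffix to the polynomial hash.
  // Strings with an equal prefix and an equal tail3 value must collide.
  def tail3(s: String): Int = 961 * s(0) + 31 * s(1) + s(2)

  // Hypothetical alternative (an assumption, not from the PR):
  // take the first 8 bytes of the MD5 digest as a 64-bit id.
  def md5Hash(in: String): Long = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(in.getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(digest).getLong
  }

  val suffixes: Seq[String] = Seq("Dms", "EOT", "F15", "EP5")
  val uris: Seq[String] = suffixes.map("http://dbpedia.org/resource/" + _)

  def main(args: Array[String]): Unit = {
    suffixes.foreach(s => println(s"$s tail3=${tail3(s)}"))
    uris.foreach(u => println(s"$u gethash=${gethash(u)} md5Hash=${md5Hash(u)}"))
  }
}
```

Any 64-bit hash can still collide, so a loader that uses hashes as vertex ids would probably also need a way to detect or tolerate collisions; the digest-based variant only makes them far less likely for short, similar strings.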




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-96769608
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-165304860
  
    +1 to making this a Spark package. I would recommend that we close this PR since it's gone stale.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74879414
  
    Please add tests for your RDF loader; see my code for an example:
    https://github.com/maropu/spark/commit/cc5ac0b08ca39c3c339fdca905779bb3b037f8fa
    
    BTW, I think it would be better to separate the interface from the implementations
    of GraphLoader, because we will probably add more GraphLoader types
    for different formats in the future.
    
    For example:
    - an interface
    o.a.spark.graphx.GraphLoader:
    abstract class GraphLoader {
      def edgeListFile(...)
    }
    
    - the implementations
    o.a.spark.graphx.impl.loader.LineLoader
    class LineLoader extends GraphLoader {
      def edgeListFile() = {the current implementation of GraphLoader#edgeListFile}
    }
    
    o.a.spark.graphx.impl.loader.RDFLoader
    class RDFLoader extends GraphLoader {
      def edgeListFile() = {your code}
    }
    
    
    Thoughts?
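
The interface/implementation split suggested above might look roughly like the following Spark-free sketch. The class names come from the comment, but the signatures are assumptions made so the example runs on its own: the real `GraphLoader.edgeListFile` takes a `SparkContext` and a path and returns a `Graph`, whereas this version works on plain line iterators and returns vertex-id pairs. The `toId` parameter is hypothetical, and the RDF parsing assumes simple `<s> <p> <o> .` lines whose objects are URIs without spaces.

```scala
// Hypothetical interface: parse input lines into (source, destination) id pairs.
abstract class GraphLoader {
  def edgeListFile(lines: Iterator[String]): Seq[(Long, Long)]
}

// Whitespace-separated "src dst" pairs, one edge per line; '#' starts a comment.
class LineLoader extends GraphLoader {
  def edgeListFile(lines: Iterator[String]): Seq[(Long, Long)] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map { l =>
        val parts = l.split("\\s+")
        (parts(0).toLong, parts(1).toLong)
      }
      .toVector
}

// N-Triples-like lines; subject and object are mapped to vertex ids by a
// caller-supplied function (for example a dictionary lookup or a hash).
class RDFLoader(toId: String => Long) extends GraphLoader {
  def edgeListFile(lines: Iterator[String]): Seq[(Long, Long)] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map { l =>
        val parts = l.split("\\s+")
        (toId(parts(0)), toId(parts(2)))
      }
      .toVector
}
```

With this shape, code that consumes a `GraphLoader` does not need to know which file format produced the edges, which is the point of the proposed split.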
    





[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74737920
  
    ok to test




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74860219
  
      [Test build #27680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull) for   PR 4650 at commit [`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-74738600
  
      [Test build #27643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for   PR 4650 at commit [`80d9b72`](https://github.com/apache/spark/commit/80d9b722c82e0cd07bef0e747d5aeecf6de798e5).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75393189
  
      [Test build #27826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27826/consoleFull) for   PR 4650 at commit [`4daa6e9`](https://github.com/apache/spark/commit/4daa6e9043480bfc69b8b64a0a4550b25132c661).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-156036419
  
    @lukovnikov if there is still interest in this, the best approach would be to first release something in spark-packages.org as a set of utilities to create Graphs. Using existing 3rd party Hadoop formats makes the most sense as per @rvesse.
    
    Could you close this PR please?




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75103544
  
    Will add tests soon.
    
    I was also thinking about making one interface for different loaders (with a load() method instead of edgeListFile()) and maybe a facade combining all loaders.
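
A minimal sketch of that idea, assuming one common trait with a load() method and a facade that dispatches by format name. Every name here (SimpleEdge, SimpleGraphLoader, EdgeListLoader, TripleLoader, GraphLoaders) is hypothetical and not taken from this PR or from GraphX; plain Scala collections stand in for RDDs and Graph so the example runs without Spark.

```scala
// Hypothetical sketch: none of these names exist in Spark or in this PR.
// Plain Scala collections stand in for GraphX types.

// A directed edge between two vertex ids, labelled with an attribute.
final case class SimpleEdge(src: Long, dst: Long, attr: String)

// Common interface: every loader exposes a single load() method.
trait SimpleGraphLoader {
  // Returns true if this loader handles the given format name.
  def supports(format: String): Boolean
  // Parses the input lines into edges.
  def load(lines: Seq[String]): Seq[SimpleEdge]
}

// Loader for whitespace-separated "src dst" lines; '#' starts a comment line.
object EdgeListLoader extends SimpleGraphLoader {
  def supports(format: String): Boolean = format == "edgelist"
  def load(lines: Seq[String]): Seq[SimpleEdge] =
    lines.map(_.trim).filter(l => l.nonEmpty && !l.startsWith("#")).map { l =>
      val parts = l.split("\\s+")
      SimpleEdge(parts(0).toLong, parts(1).toLong, "")
    }
}

// Loader for simple triple lines of the form "<s> <p> <o> .".
// Vertex ids come from String.hashCode purely for illustration;
// this is an assumption, not the hashing scheme used in the PR.
object TripleLoader extends SimpleGraphLoader {
  def supports(format: String): Boolean = format == "triples"
  private def id(term: String): Long = term.hashCode.toLong
  def load(lines: Seq[String]): Seq[SimpleEdge] =
    lines.map(_.trim).filter(_.nonEmpty).map { l =>
      val parts = l.stripSuffix(".").trim.split("\\s+", 3)
      SimpleEdge(id(parts(0)), id(parts(2)), parts(1))
    }
}

// Facade: delegates to the first registered loader that supports the format.
object GraphLoaders {
  private val loaders: Seq[SimpleGraphLoader] = Seq(EdgeListLoader, TripleLoader)
  def load(format: String, lines: Seq[String]): Seq[SimpleEdge] =
    loaders.find(_.supports(format)) match {
      case Some(loader) => loader.load(lines)
      case None =>
        throw new IllegalArgumentException(s"No loader for format: $format")
    }
}
```

With this shape, callers would only touch the facade, for example GraphLoaders.load("triples", lines), and a new format would mean adding one more loader object to the list.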




[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4650#issuecomment-75389098
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27825/

