Posted to reviews@spark.apache.org by lukovnikov <gi...@git.apache.org> on 2015/02/17 15:48:13 UTC
[GitHub] spark pull request: RDF Loader added + documentation
GitHub user lukovnikov opened a pull request:
https://github.com/apache/spark/pull/4650
RDF Loader added + documentation
Have been testing it with DBpedia dumps, works well so far.
Any help with custom partitioning and optimization is welcome.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lukovnikov/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4650.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4650
----
commit 10436d252ad4876d28c91c77036e3d993050438a
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 595aed098fb423514b73263f96dfcaf1edbc72f5
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T21:41:00Z
dictionary builder done
commit c2399023825e804476527f7e159b182a1b5c91c8
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T21:44:07Z
[SPARK 5280]
commit f14e4835cf365fcbe5dd0979e61464b7cecb8774
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T22:50:06Z
done dictionary version
commit 43cc53ab6d99a4a96a0764cc306f38fdce3a7e00
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T23:25:07Z
[SPARK 5280] rdfloader using hashes as VertexIds
commit 2e1220d0938aee7d190439253e3b9bb1e73c77e8
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:04:48Z
cleaned up + fixed style
TODO: test + comment
commit 54e2c6eb24dade70753320a3ab2b3a64fef7a6d4
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:26:30Z
made custom 64bit hash
commit b454560508c9d50c60e067d7e67405ca1e13c165
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:32:57Z
proper
commit 45a9f57695e76c09c20fa99a1010168f63ef1da8
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 6ee9a2b675d06675b5b591f16e8d52e63d2dc049
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T21:41:00Z
dictionary builder done
commit 45c22160c52111066109f57a0d773aca211c2068
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T21:44:07Z
[SPARK 5280]
commit fa5c0da9ea4f6ca662406b380432901022d6de55
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T22:50:06Z
done dictionary version
commit c036f98476e96ac03124f758ed7f17c4a464cf86
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T23:25:07Z
[SPARK 5280] rdfloader using hashes as VertexIds
commit 57553797f7404e686674b0bfb39d80bb24d6520c
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:04:48Z
cleaned up + fixed style
TODO: test + comment
commit e00123eae4a84108af2c84cf253b1f4fb1fb69f1
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:26:30Z
made custom 64bit hash
commit 6af9a7ad6198174597ae7d86ec5c15fc8467a082
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T00:32:57Z
proper
commit 1ee34c9474bcf4500edecb08a848d15f3549055d
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T03:31:05Z
Merge branch 'master' of github.com:lukovnikov/spark into rdfloaderhash
commit 9000a4713d286d5078c16f62b5fadf480941bc82
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T03:31:18Z
Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash
commit 70eb725a102ae711a59c6d45794d191c18778c4b
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:02:48Z
RDF Loader with hash, tested on small RDF dumps (more tests in progress)
commit 4398d93712777442ba0f2e8920423fcdd7b67d1f
Author: Denis <lu...@users.noreply.github.com>
Date: 2015-02-04T23:27:01Z
added documentation for RDFLoader
commit 273a1b30dee1630333e0f7e683378b6dbb13c3a5
Author: Denis <lu...@users.noreply.github.com>
Date: 2015-02-04T23:29:05Z
small update to RDFLoader description
commit 202ccf86901c3d2435564e544f90d6a49cda66fb
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:31:10Z
sdf
commit 2d990cec1d48f62f4f1d9f9cf8082308a4eaf9e4
Author: lukovnikov <lu...@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 4a9b6222176749bee4a14e4b6d035b665c6ac7ea
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:43:31Z
Merge branch 'master' of github.com:lukovnikov/spark
commit 062996c45d0443836c1b4b2bb714d8f459ea6980
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:43:52Z
Merge branch 'rdfloaderhash'
commit 121bf14140573349424e7888da13ee2e8ea4f6f0
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:45:48Z
[SPARK 5280]
commit 67ada514b98292ff647d8354545d37cc111499ba
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:47:21Z
Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash
commit e5fcf758c0e4b54a38b2a01709681e11bbb6eae8
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:47:45Z
Merge branch 'rdfloaderhash'
commit c5960af7b14d65b1d290c3af11d722075a54ad2d
Author: lukovnikov <lu...@denis>
Date: 2015-02-04T23:54:37Z
Merge remote-tracking branch 'upstream/master'
commit 91361f3f760dbc78467f8e2b87a1d77061aa59de
Author: lukovnikov <lu...@denis>
Date: 2015-02-05T00:01:33Z
undone unnecessary changes
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4650
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387131
Added test + a test file (small excerpt from DBpedia 3.9) + small fix in RDFLoader
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75388325
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27823/
Test FAILed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75533258
@maropu tests are added and build tests passed. Is it ready for merging now?
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75389096
[Test build #27825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27825/consoleFull) for PR 4650 at commit [`3db73ab`](https://github.com/apache/spark/commit/3db73ab6d98c1f1a7c2f92835d288a0724a4e58f).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75388315
[Test build #27823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27823/consoleFull) for PR 4650 at commit [`b658c55`](https://github.com/apache/spark/commit/b658c55105b6733ff68a6aad4c792cee4bb594b9).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387365
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27822/
Test FAILed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74860223
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27680/
Test PASSed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75393198
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27826/
Test PASSed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75388324
[Test build #27823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27823/consoleFull) for PR 4650 at commit [`b658c55`](https://github.com/apache/spark/commit/b658c55105b6733ff68a6aad4c792cee4bb594b9).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74851018
style errors fixed
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74738782
[Test build #27643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for PR 4650 at commit [`80d9b72`](https://github.com/apache/spark/commit/80d9b722c82e0cd07bef0e747d5aeecf6de798e5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387364
[Test build #27822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27822/consoleFull) for PR 4650 at commit [`1bec795`](https://github.com/apache/spark/commit/1bec795703901fe6c75776ce512b23e2563cf925).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387401
[Test build #27817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27817/consoleFull) for PR 4650 at commit [`04df47a`](https://github.com/apache/spark/commit/04df47acd209655b82c3dfc0997364d677eb9b61).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75383518
[Test build #27817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27817/consoleFull) for PR 4650 at commit [`04df47a`](https://github.com/apache/spark/commit/04df47acd209655b82c3dfc0997364d677eb9b61).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74679144
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74738784
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27643/
Test FAILed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387356
[Test build #27822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27822/consoleFull) for PR 4650 at commit [`1bec795`](https://github.com/apache/spark/commit/1bec795703901fe6c75776ce512b23e2563cf925).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75387405
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27817/
Test PASSed.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75389092
[Test build #27825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27825/consoleFull) for PR 4650 at commit [`3db73ab`](https://github.com/apache/spark/commit/3db73ab6d98c1f1a7c2f92835d288a0724a4e58f).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74851295
[Test build #27680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull) for PR 4650 at commit [`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75389380
[Test build #27826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27826/consoleFull) for PR 4650 at commit [`4daa6e9`](https://github.com/apache/spark/commit/4daa6e9043480bfc69b8b64a0a4550b25132c661).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by set0gut1 <gi...@git.apache.org>.
Github user set0gut1 commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-154509562
```scala
def gethash(in: String): Long = {
  var h = 1125899906842597L
  for (x <- in) {
    h = 31 * h + x;
  }
  return h
}
```
This hash function seems to be weak.
I tried this with subject URIs of DBpedia's `labels_en.nt` ([sample](http://downloads.dbpedia.org/preview.php?file=2015-04_sl_core-i18n_sl_en_sl_labels_en.nt.bz2)).
There are 11,519,154 unique URIs, and the hash values of 20,741 URIs collided (0.18%).
For example, the hash values of these URIs are the same (-3127496886112549146):
* `http://dbpedia.org/resource/Dms`
* `http://dbpedia.org/resource/EOT`
* `http://dbpedia.org/resource/F15`
* `http://dbpedia.org/resource/EP5`
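The collision among these four URIs can be checked by hand. They share a prefix and have the same length, so the hash differs only through `31*31*c0 + 31*c1 + c2` over the last three characters, and that sum is 68842 for `Dms`, `EOT`, `F15`, and `EP5` alike. The sketch below reproduces the check. `gethash` is the function quoted above; `HashCollisionDemo`, `md5Hash64`, and `uris` are hypothetical names introduced only for illustration, and taking the first 8 bytes of an MD5 digest is just one possible alternative, not something proposed in the PR.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashCollisionDemo {
  // The hash quoted above, with unchanged behavior.
  def gethash(in: String): Long = {
    var h = 1125899906842597L
    for (x <- in) {
      h = 31 * h + x
    }
    h
  }

  // Hypothetical alternative (an assumption, not from the PR):
  // interpret the first 8 bytes of the MD5 digest as a Long.
  def md5Hash64(in: String): Long = {
    val digest = MessageDigest
      .getInstance("MD5")
      .digest(in.getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(digest).getLong
  }

  // The four colliding URIs listed above.
  val uris: Seq[String] =
    Seq("Dms", "EOT", "F15", "EP5").map("http://dbpedia.org/resource/" + _)

  def main(args: Array[String]): Unit = {
    // Number of distinct hash values produced for the four URIs.
    println("gethash:   " + uris.map(gethash).distinct.size)
    println("md5Hash64: " + uris.map(md5Hash64).distinct.size)
  }
}
```

Any 64-bit hash can still collide in principle, so a loader that uses hashes as `VertexId`s would probably also need a way to detect or resolve collisions.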
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-96769608
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-165304860
+1 to making this a Spark package. I would recommend that we close this PR since it's gone stale.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74879414
Please add tests for your RDF loader; see my code as an example:
https://github.com/maropu/spark/commit/cc5ac0b08ca39c3c339fdca905779bb3b037f8fa
BTW, I think it would be better to separate the interface from the implementations
of GraphLoader, because we will probably add more types of GraphLoader
for different formats in the future.
e.g.:
- an interface
  o.a.spark.graphx.GraphLoader:
    abstract class GraphLoader {
      def edgeListFile(...)
    }
- the implementations
  o.a.spark.graphx.impl.loader.LineLoader
    class LineLoader extends GraphLoader {
      def edgeListFile() = {the current implementation of GraphLoader#edgeListFile}
    }
  o.a.spark.graphx.impl.loader.RDFLoader
    class RDFLoader extends GraphLoader {
      def edgeListFile() = {your code}
    }
Thoughts?
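A minimal, Spark-free sketch of this split might look as follows. Everything in it is an assumption made for illustration: `LoaderSketch`, `Edge`, `parseLine`, and `edgeList` are hypothetical names, edges are plain `(Long, Long)` pairs rather than GraphX types, and the RDF parser only accepts N-Triples-style lines whose subject and object are both URIs. It is not the API of GraphX or of this PR.

```scala
object LoaderSketch {
  // Hypothetical stand-in for a graph edge: (source id, destination id).
  type Edge = (Long, Long)

  // The interface: each format only supplies its own line parser.
  abstract class GraphLoader {
    def parseLine(line: String): Option[Edge]

    // Shared driver: keep every line that the parser accepts.
    def edgeList(lines: Iterator[String]): List[Edge] =
      lines.flatMap(l => parseLine(l).toList).toList
  }

  // Whitespace-separated "src dst" lines; blank lines and '#' comments are skipped.
  class LineLoader extends GraphLoader {
    def parseLine(line: String): Option[Edge] = {
      val t = line.trim
      if (t.isEmpty || t.startsWith("#")) None
      else {
        val fields = t.split("\\s+")
        if (fields.length < 2) None
        else Some((fields(0).toLong, fields(1).toLong))
      }
    }
  }

  // Lines of the form "<s> <p> <o> ."; vertex ids come from a caller-supplied hash.
  class RDFLoader(hash: String => Long) extends GraphLoader {
    private val Triple = """^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$""".r

    def parseLine(line: String): Option[Edge] = line.trim match {
      case Triple(s, _, o) => Some((hash(s), hash(o)))
      case _               => None // literal objects and malformed lines are dropped
    }
  }
}
```

For example, `new LoaderSketch.LineLoader().edgeList(Iterator("1 2", "3 4"))` yields the two edges, and an `RDFLoader` can be constructed with whatever string-to-`Long` hash the loader settles on.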
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74737920
ok to test
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74860219
[Test build #27680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull) for PR 4650 at commit [`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-74738600
[Test build #27643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for PR 4650 at commit [`80d9b72`](https://github.com/apache/spark/commit/80d9b722c82e0cd07bef0e747d5aeecf6de798e5).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75393189
[Test build #27826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27826/consoleFull) for PR 4650 at commit [`4daa6e9`](https://github.com/apache/spark/commit/4daa6e9043480bfc69b8b64a0a4550b25132c661).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-156036419
@lukovnikov, if there is still interest in this, the best approach would be to first release it on spark-packages.org as a set of utilities for creating graphs. Using existing third-party Hadoop formats makes the most sense, as per @rvesse.
Could you close this PR please?
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by lukovnikov <gi...@git.apache.org>.
Github user lukovnikov commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75103544
Will add tests soon.
I was also thinking about making one interface for different loaders (with a load() method instead of edgeListFile()) and maybe a facade combining all loaders.
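A rough sketch of what a single `load()` interface plus a facade could look like follows. It is only an illustration under stated assumptions: the `Loader` trait, the two loader objects, the `GraphLoaders` facade, and the choice to dispatch on file extension are all hypothetical, and edges are plain string pairs read from an iterator of lines so the example runs without Spark.

```scala
// Hypothetical sketch only: none of these names exist in GraphX or in this PR.
object FacadeSketch {
  type EdgePair = (String, String)

  // One interface for every format, with load() instead of edgeListFile().
  trait Loader {
    def load(lines: Iterator[String]): Vector[EdgePair]
  }

  // Whitespace-separated "src dst" pairs; '#' starts a comment line.
  object EdgeListLoader extends Loader {
    def load(lines: Iterator[String]): Vector[EdgePair] =
      lines.map(_.trim)
        .filter(l => l.nonEmpty && !l.startsWith("#"))
        .map(_.split("\\s+"))
        .collect { case parts if parts.length >= 2 => (parts(0), parts(1)) }
        .toVector
  }

  // Rough N-Triples handling: keeps subject and object of URI-only triples.
  object NTriplesLoader extends Loader {
    private val Triple = """^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$""".r

    def load(lines: Iterator[String]): Vector[EdgePair] =
      lines.map(_.trim).collect { case Triple(s, _, o) => (s, o) }.toVector
  }

  // The facade: one entry point that picks a loader by file extension (an assumption).
  object GraphLoaders {
    private val byExtension: Map[String, Loader] =
      Map("txt" -> EdgeListLoader, "nt" -> NTriplesLoader)

    def forPath(path: String): Option[Loader] = {
      val dot = path.lastIndexOf('.')
      if (dot < 0) None else byExtension.get(path.substring(dot + 1).toLowerCase)
    }

    // Left carries an error message when no loader matches the path.
    def load(path: String, lines: Iterator[String]): Either[String, Vector[EdgePair]] =
      forPath(path).map(_.load(lines)).toRight(s"no loader registered for: $path")
  }

  def main(args: Array[String]): Unit = {
    println(GraphLoaders.load("edges.txt", Iterator("1 2", "2 3")))
    println(GraphLoaders.load("data.nt", Iterator("<http://a> <http://knows> <http://b> .")))
  }
}
```

The facade keeps callers unaware of which loader runs; registering a new format would only touch the `byExtension` map.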
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4650#issuecomment-75389098
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27825/
Test FAILed.