Posted to user@spark.apache.org by th0rsten <th...@online.de> on 2014/12/11 17:57:00 UTC

Different Vertex Ids in Graph and Edges

Hello all,

I'm using GraphX (1.1.0) to process RDF data. I want to build a graph out
of the data from the Berlin SPARQL Benchmark (BSBM,
<http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/>).
The steps I use to load the data into a graph are:

*1.* Split the RDF triples.
*2.* Get all nodes: take the union of subjects and objects and remove
duplicates with distinct (/NodesRDD/).
*3.* Zip the nodes (/NodesRDD/) with "zipWithUniqueId" -> /ZippedNodesRDD/.
*4.* Join the subjects and objects with /ZippedNodesRDD/ to get the
corresponding node ids, and build the edges from these ids and the predicate.
*5.* Build the graph nodes out of /ZippedNodesRDD/ and create the Java node
attribute.
*6.* Build the GraphX graph.
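The steps above can be sketched as a plain-Python simulation (not Spark or GraphX code), which may help when reasoning about where the ids come from. Partitions are modelled as lists, the round-robin `partition` helper is a hypothetical stand-in for Spark's partitioner, and `zip_with_unique_id` mimics the id scheme documented for `RDD.zipWithUniqueId`: the i-th item of partition k, out of n partitions, gets id i * n + k. The triple format is simplified to whitespace-separated terms with no spaces inside a term.

```python
def split_triple(line):
    """Step 1: split a simplified 's p o .' line into (subject, predicate, object)."""
    subject, predicate, obj = line.split()[:3]
    return subject, predicate, obj


def partition(items, n):
    """Hypothetical stand-in for Spark's partitioner: deal items round-robin."""
    parts = [[] for _ in range(n)]
    for j, item in enumerate(items):
        parts[j % n].append(item)
    return parts


def zip_with_unique_id(parts):
    """Step 3: the i-th item of partition k (of n partitions) gets id i * n + k."""
    n = len(parts)
    return [(item, i * n + k)
            for k, part in enumerate(parts)
            for i, item in enumerate(part)]


def build_graph(lines, num_partitions=2):
    triples = [split_triple(line) for line in lines]                  # step 1
    # Step 2: union of subjects and objects, made distinct. Sorted only to keep
    # this simulation deterministic; Spark's distinct() guarantees no order.
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    zipped = zip_with_unique_id(partition(nodes, num_partitions))     # step 3
    node_id = dict(zipped)  # node -> id lookup, standing in for the join
    edges = [(node_id[s], node_id[o], p) for s, p, o in triples]      # step 4
    vertices = [(vid, node) for node, vid in zipped]                  # step 5
    return vertices, edges  # step 6 would pass these to GraphX's Graph(...)


vertices, edges = build_graph(["a p b .", "b q c .", "a r c ."])
print(vertices)  # [(0, 'a'), (2, 'c'), (1, 'b')]
print(edges)     # [(0, 1, 'p'), (1, 2, 'q'), (0, 2, 'r')]
```

In this simulation the vertices and the edges are both derived from one materialized `zipped` list, so their ids always agree.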

My problem is that the nodes in the graph (/graph.vertices/) have different
ids from the nodes in /ZippedNodesRDD/, which I use to build the edges. I
don't understand why, because I build the final nodes from the same RDD that
I use for the join, and this RDD is cached.

*For example:*
graph.vertices says:  ID: 35255, Attribute: bsbm-inst:dataFromVendor33/Offer62164/
ZippedNodesRDD says:  ID: 35254, Attribute: bsbm-inst:dataFromVendor33/Offer62164/

I have no idea why this happens: the join itself is correct, only the ids
are wrong.
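One possible explanation, offered as an assumption rather than a confirmed diagnosis: `cache()` is lazy and cached partitions can be evicted, so /ZippedNodesRDD/ may be computed more than once. The Spark API documentation for `zipWithUniqueId` (in later releases) warns that when an RDD does not guarantee the order of elements within a partition, the assigned ids may change if the RDD is reevaluated, and advises sorting the RDD first if fixed ids are required. The plain-Python simulation below (not Spark code; the partition contents are made up) shows how a different element order inside one partition changes the ids. Sorting each partition before zipping is a simplification of the documentation's advice to sort the RDD.

```python
def zip_with_unique_id(parts):
    """Mimics RDD.zipWithUniqueId: i-th item of partition k (of n) gets i * n + k."""
    n = len(parts)
    return {item: i * n + k
            for k, part in enumerate(parts)
            for i, item in enumerate(part)}


# First evaluation, e.g. the one used to build the edges.
first = zip_with_unique_id([["x", "y"], ["z"]])
# Reevaluation with the same partitions, but partition 0 came out in a
# different order (as can happen after a shuffle such as distinct()).
second = zip_with_unique_id([["y", "x"], ["z"]])
print(first)   # {'x': 0, 'y': 2, 'z': 1}
print(second)  # {'y': 0, 'x': 2, 'z': 1}

# Sorting each partition before zipping makes both evaluations agree.
stable_first = zip_with_unique_id([sorted(p) for p in [["x", "y"], ["z"]]])
stable_second = zip_with_unique_id([sorted(p) for p in [["y", "x"], ["z"]]])
print(stable_first == stable_second)  # True
```

The same set of ids is produced each time; only the mapping from node to id moves, which matches the symptom of a correct join with wrong ids.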


Thanks in advance



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Different-Vertex-Ids-in-Graph-and-Edges-tp20632.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
