Posted to user@spark.apache.org by spr <sp...@yarcdata.com> on 2014/12/05 01:26:50 UTC

representing RDF literals as vertex properties

@ankurdave's concise code at
https://gist.github.com/ankurdave/587eac4d08655d0eebf9, responding to an
earlier thread
(http://apache-spark-user-list.1001560.n3.nabble.com/How-to-construct-graph-in-graphx-tt16335.html#a16355)
shows how to build a graph with multiple edge-types ("predicates" in
RDF-speak).

I'm also looking at how to represent literals as vertex properties. It
seems one way to do this is via positional convention in an Array/Tuple/List
that is the VD; i.e., to represent height, weight, and eyeColor, the VD
could be a Tuple3(Double, Double, String). If any of the properties can be
present or not, then it seems the code needs to be precise about which
elements of the Array/Tuple/List are present and which are not. E.g., to
assign only weight, it could be Tuple3(Option(Double), 123.4,
Option(String)). Given that vertices can have many many properties, it
seems memory consumption for the properties should be as parsimonious as
possible. Will any of Array/Tuple/List support sparse usage? Is Option the
way to get there?
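
For concreteness, here is roughly what the positional idea looks like as valid
Scala. This is only a sketch: the alias PersonProps and the value names are
made up for illustration, and every slot is wrapped in Option so that an
absent property is None.

```scala
object PositionalProps {
  // Hypothetical VD type: (height, weight, eyeColor), each slot optional.
  type PersonProps = (Option[Double], Option[Double], Option[String])

  // Only weight is known; the other two slots still exist and hold None.
  val onlyWeight: PersonProps = (None, Some(123.4), None)

  // All three properties present.
  val full: PersonProps = (Some(180.0), Some(75.5), Some("brown"))

  // Reading a property means remembering which position it lives in.
  def weightOf(p: PersonProps): Option[Double] = p._2
}
```

Note that every vertex carries all three slots whether or not they are filled,
which is the memory concern raised here.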

Is this a reasonable approach for representing vertex properties, or is
there a better way?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/representing-RDF-literals-as-vertex-properties-tp20404.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: representing RDF literals as vertex properties

Posted by Ankur Dave <an...@gmail.com>.

At 2014-12-08 12:12:16 -0800, spr <sp...@yarcdata.com> wrote:
> OK, have waded into implementing this and have gotten pretty far, but am now
> hitting something I don't understand, a NoSuchMethodError.
> [...]
> The (short) traceback looks like
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
> [...]
> Is the method that's not found (".../StorageLevel") something I need to
> initialize?  Using this same code on a toy problem works fine.  

This is a binary compatibility error and shouldn't happen as long as you're compiling and running with the same Spark assembly jar. Is it possible there's a version mismatch between compiling and running?
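
The missing symbol, apply$default$4, is the compiler-generated default value for the fourth parameter of Graph.apply, which suggests the Graph.apply seen at compile time has a different parameter list from the one on the runtime classpath. One way to rule that out, sketched here on the assumption that the project is built with sbt (the thread does not say), is to pin the Spark dependencies to the installed version and mark them "provided" so the application jar does not bundle its own copy. The version string below is a placeholder, not a known value.

```scala
// build.sbt (sketch; sbt itself is an assumption about the build setup)

// Placeholder: set this to the exact version of the Spark installation
// that spark-submit runs against.
val sparkVersion = "1.0.0"

libraryDependencies ++= Seq(
  // "provided" keeps these jars out of the application jar, so the classes
  // used at runtime are the ones shipped with the Spark installation.
  "org.apache.spark" %% "spark-core"   % sparkVersion % "provided",
  "org.apache.spark" %% "spark-graphx" % sparkVersion % "provided"
)
```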

Ankur


Re: representing RDF literals as vertex properties

Posted by spr <sp...@yarcdata.com>.

OK, have waded into implementing this and have gotten pretty far, but am now
hitting something I don't understand, a NoSuchMethodError.

The code looks like

      [...]
    val conf = new SparkConf().setAppName(appName)
    //conf.set("fs.default.name", "file://");
    val sc = new SparkContext(conf)
   
    val lines = sc.textFile(inFileArg)
    val foo = lines.count()
    val edgeTmp = lines.map( line => line.split(" ").slice(0,3)).
              // following filters omit comments, so no need to specifically
              // filter for comments ("#....")
              filter(x => x(0).startsWith("<") &&  x(0).endsWith(">") &&
                          x(2).startsWith("<") &&  x(2).endsWith(">")).
              map(x => Edge(hashToVId(x(0)),hashToVId(x(2)),x(1)))
    edgeTmp.foreach( edge => print(edge+"\n"))
    val edges: RDD[Edge[String]] = edgeTmp
    println("edges.count="+edges.count)

    val properties: RDD[(VertexId, Map[String, Any])] =
        lines.map( line => line.split(" ").slice(0,3)).
              filter(x => !x(0).startsWith("#")).       // omit RDF comments
              filter(x => !x(2).startsWith("<") || !x(2).endsWith(">")).
              map(x => { val m: Tuple2[VertexId, Map[String, Any]] =
                           (hashToVId(x(0)), Map((x(1).toString,x(2)))); m })
    properties.foreach( prop => print(prop+"\n"))

    val G = Graph(properties, edges)    /// <======== this is line 114
    println(G)

The (short) traceback looks like

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
	at com.cray.examples.spark.graphx.lubm.query9$.main(query9.scala:114)
	at com.cray.examples.spark.graphx.lubm.query9.main(query9.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is the method that's not found (".../StorageLevel") something I need to
initialize?  Using this same code on a toy problem works fine.  

BTW, this is Spark 1.0, running locally on my laptop.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/representing-RDF-literals-as-vertex-properties-tp20404p20582.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: representing RDF literals as vertex properties

Posted by Ankur Dave <an...@gmail.com>.

At 2014-12-04 16:26:50 -0800, spr <sp...@yarcdata.com> wrote:
> I'm also looking at how to represent literals as vertex properties. It seems
> one way to do this is via positional convention in an Array/Tuple/List that is
> the VD; i.e., to represent height, weight, and eyeColor, the VD could be a
> Tuple3(Double, Double, String).
> [...]
> Given that vertices can have many many properties, it seems memory consumption
> for the properties should be as parsimonious as possible. Will any of
> Array/Tuple/List support sparse usage? Is Option the way to get there?

Storing vertex properties positionally with Array[Option[Any]] or any of the other sequence types will provide a dense representation. For a sparse representation, the right data type is a Map[String, Any], which will let you access properties by name and will only store the nonempty properties.
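
As a rough illustration of the sparse form, using the property names from the question (the alias Props and the helper lookup are hypothetical, introduced only for this sketch):

```scala
object SparseProps {
  // Hypothetical alias for the vertex attribute type suggested above.
  type Props = Map[String, Any]

  // A vertex with only a weight stores exactly one entry.
  val onlyWeight: Props = Map("weight" -> 123.4)

  // A vertex with all three properties from the question.
  val full: Props =
    Map("height" -> 180.0, "weight" -> 75.5, "eyeColor" -> "brown")

  // Access by name; a property that was never set simply yields None.
  def lookup(p: Props, name: String): Option[Any] = p.get(name)
}
```

Here SparseProps.onlyWeight holds a single entry, and looking up "height" on it gives None rather than an empty slot.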

Since the value type in the map has to be Any, or more precisely the least upper bound of the property types, this sacrifices type safety and you'll have to downcast when retrieving properties. If there are particular subsets of the properties that frequently go together, you could instead use a class hierarchy. For example, if the vertices are either people or products, you could use the following:

    sealed trait VertexProperty extends Serializable
    case class Person(name: String, weight: Int) extends VertexProperty
    case class Product(name: String, price: Int) extends VertexProperty

Then you could pattern match against the hierarchy instead of downcasting:

    List(Person("Bob", 180), Product("chair", 800), Product("desk", 200)).flatMap {
      case Person(name, weight) => Array.empty[Int]
      case Product(name, price) => Array(price)
    }.sum
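
For comparison, the downcasting that the Map[String, Any] approach requires might look like the sketch below. The object and the helper intProp are hypothetical names for illustration; matching on the runtime type is used in place of an explicit asInstanceOf so that a missing or mistyped property gives None instead of throwing.

```scala
object DowncastSketch {
  // A product vertex stored as an untyped property map.
  val props: Map[String, Any] = Map("name" -> "chair", "price" -> 800)

  // Recover an Int property by name. The type test happens at runtime,
  // so the compiler cannot check that "price" really holds an Int.
  def intProp(p: Map[String, Any], key: String): Option[Int] =
    p.get(key).collect { case i: Int => i }
}
```

With the class hierarchy, the compiler knows that price is an Int; with the map, that knowledge lives only in the calling code.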

Ankur
