
Posted to user@pig.apache.org by Paolo Castagna <ca...@googlemail.com> on 2010/03/26 11:47:35 UTC

Re: [jena-dev] SPARQL: Transformation of SPARQL

Hi Alexander

Alexander Schätzle wrote:
> I'm working on a translation of SPARQL queries into PigLatin Scripts 
> (language used by the Pig System for Hadoop developed by Yahoo). 

Translating the SPARQL algebra into PigLatin scripts is, IMHO, a nice
idea. With a small(?) amount of glue code you join two big communities,
delivering value to both. On the one hand, there is the need for
scalable/parallel processing systems; on the other hand, there is the
aim to support as many data formats as possible.

I give full credit for this idea to Peter Mika and Ben Reed, who kindly
shared an unpublished paper [1] on this topic. You should write to them
and ask for a copy of the paper.

Here is a summary of their mapping from the SPARQL algebra operators
and solution modifiers to Pig Latin syntax:

   SPARQL algebra                     Pig Latin syntax
   --------------------------------   ----------------------------------

   BGP operator                       A set of FILTER operations,
                                      followed by a number of JOINs
                                      equal to the number of triple
                                      patterns and a single FOREACH
                                      statement.

   Filter operator                    FILTER (with the limitation
                                      that not all expressions are
                                      directly supported by Pig Latin).
                                      However, Pig can be extended via
                                      user-defined functions (UDFs) to
                                      have a semantically equivalent
                                      filter behaviour.

   Join operator                      A series of JOINs (which is an
                                      inner join in Pig) followed by a
                                      FOREACH (a projection) to remove
                                      the duplicated columns.

   LeftJoin operator                  A series of outer JOINs plus a
                                      custom filter operator.

   Union operator                     UNION

   Graph operator                     ?

   OrderBy modifier                   ORDER

   Project modifier                   Achieved by FOREACH.

   Distinct modifier                  DISTINCT

   Reduced modifier                   Implemented using DISTINCT.

   Slice modifier                     Implemented using a custom filter.

   ToList modifier                    ?

They conclude:

   "In summary, we have shown a complete translation procedure from
    SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
    interpreter. (This interpreter is not complete yet, but sufficient to
    cover the most commonly used queries, including the ones discussed in
    this paper.) Note that the query plans generated by our interpreter
    are unlikely to be optimal: optimization is left as a task
    for Pig." [1]
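As a rough illustration of the BGP and Join rows in the table, here is a
minimal Python sketch that generates Pig Latin statements for a BGP. It
assumes the triples are already loaded into a relation named `triples`
with fields (s, p, o) holding N-Triples terms as strings, and that SPARQL
variable names are valid Pig identifiers; `bgp_to_pig` and the aliases it
emits (`Filt0`, `Pat0`, `Join1`, `Acc1`, ...) are hypothetical. It is not
the Mika and Reed translation: it uses at most one FILTER per triple
pattern, binary JOINs (or CROSS when no variable is shared), and a FOREACH
after each step to rename columns, so it emits more FOREACH statements
than the single one described in the table.

```python
from typing import Dict, List, Optional, Tuple

# A triple pattern: "?name" is a variable, anything else is a constant term,
# e.g. "<http://xmlns.com/foaf/0.1/knows>".
Pattern = Tuple[str, str, str]


def pig_string(term: str) -> str:
    """Quote a constant as a Pig Latin string literal."""
    return "'" + term.replace("\\", "\\\\").replace("'", "\\'") + "'"


def bgp_to_pig(patterns: List[Pattern],
               source: str = "triples") -> Tuple[List[str], str, List[str]]:
    """Translate a BGP into Pig Latin statements (hypothetical helper).

    Returns (statements, alias of the final relation, its variable columns).
    Assumes `source` is an already-loaded relation with fields (s, p, o).
    """
    if not patterns:
        raise ValueError("empty BGP")
    lines: List[str] = []
    acc: Optional[str] = None      # relation holding the bindings so far
    acc_vars: List[str] = []
    for i, pattern in enumerate(patterns):
        conds: List[str] = []
        cols: Dict[str, str] = {}  # variable -> field that binds it
        for field, term in zip(("s", "p", "o"), pattern):
            if not term.startswith("?"):
                conds.append(f"{field} == {pig_string(term)}")
            elif term[1:] in cols:
                # Same variable twice in one pattern: the fields must match.
                conds.append(f"{field} == {cols[term[1:]]}")
            else:
                cols[term[1:]] = field
        if not cols:
            raise ValueError("patterns without variables are not handled here")
        filtered = source
        if conds:
            filtered = f"Filt{i}"
            lines.append(f"{filtered} = FILTER {source} BY {' AND '.join(conds)};")
        pat = f"Pat{i}"
        gen = ", ".join(f"{field} AS {var}" for var, field in cols.items())
        lines.append(f"{pat} = FOREACH {filtered} GENERATE {gen};")
        pat_vars = list(cols)
        if acc is None:
            acc, acc_vars = pat, pat_vars
            continue
        shared = [v for v in pat_vars if v in acc_vars]
        joined = f"Join{i}"
        if shared:
            key = shared[0] if len(shared) == 1 else "(" + ", ".join(shared) + ")"
            lines.append(f"{joined} = JOIN {acc} BY {key}, {pat} BY {key};")
        else:
            # No shared variable: the two patterns form a cross product.
            lines.append(f"{joined} = CROSS {acc}, {pat};")
        extra = [v for v in pat_vars if v not in acc_vars]
        keep = [f"{acc}::{v} AS {v}" for v in acc_vars]
        keep += [f"{pat}::{v} AS {v}" for v in extra]
        out = f"Acc{i}"
        # Project away the duplicated join columns (cf. the Join row).
        lines.append(f"{out} = FOREACH {joined} GENERATE {', '.join(keep)};")
        acc, acc_vars = out, acc_vars + extra
    return lines, acc, acc_vars


stmts, alias, out_vars = bgp_to_pig([
    ("?x", "<http://xmlns.com/foaf/0.1/knows>", "?y"),
    ("?y", "<http://xmlns.com/foaf/0.1/name>", "?n"),
])
print("\n".join(stmts))
```

For this two-pattern example the script filters `triples` twice, joins on
`y`, and keeps `x`, `y` and `n`; the last alias (`Acc1`) is what an
enclosing operator would consume.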

I suggest you use N-Triples and/or N-Quads (parsers are available in
TDB) as the input/output format for Pig.
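One reason N-Triples suits Pig and Hadoop is that every triple sits on its
own line, so files can be split line by line. Splitting a line naively on
spaces does not work, though, because literals may contain spaces. Below
is a simplified, regex-based splitter in Python; the function name
`split_ntriples_line` is hypothetical, and this is not a full N-Triples
parser. For real data the TDB parsers mentioned above are the safer choice.

```python
import re
from typing import Optional, Tuple

# Simplified term syntax: IRIs, blank nodes, and literals with an optional
# language tag or datatype. Not a complete N-Triples grammar.
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$")


def split_ntriples_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one N-Triples line into raw (subject, predicate, object) terms.

    Returns None for blank lines and comments; raises ValueError if the
    line does not match the simplified grammar above.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    match = TRIPLE.match(text)
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


# The literal contains a space, so line.split(" ") would break it apart.
print(split_ntriples_line(
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice Smith"@en .'))
```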

Which version of Pig are you planning to use?

My 2 cents,
Paolo

  [1] Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
      store using MapReduce", August 29, 2008 - unpublished

PS:
For the benefit of Pig users/developers (in CC to this email):

SPARQL stands for SPARQL Protocol and RDF Query Language, and it is the
RDF query language recommended by the W3C.

More specifically, the SPARQL Algebra defines six operators and six
solution modifiers:

  Operators                Solution Modifiers
  -----------------------  ---------------------
  BGP                      ToList
  Join                     OrderBy
  LeftJoin                 Project
  Filter                   Distinct
  Union                    Reduced
  Graph                    Slice

The SPARQL specification clearly defines the evaluation semantics of
each operator and solution modifier.
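To make the evaluation semantics concrete, here is a minimal Python sketch
of the specification's definitions of compatible solution mappings, Join
and LeftJoin, with dicts standing in for solution mappings (all names are
hypothetical, and the filter is treated as a plain boolean predicate,
ignoring the spec's error handling). Two mappings are compatible when they
agree on every variable bound in both, so a variable left unbound by an
OPTIONAL does not block a join. An equi-join behaves differently: as far
as I know, Pig's inner JOIN, like SQL's, never matches null keys, which is
one reason a direct JOIN translation needs care.

```python
from typing import Callable, Dict, List

Mapping = Dict[str, str]    # variable name -> RDF term
Solutions = List[Mapping]   # a multiset of solution mappings


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """True if m1 and m2 agree on every variable bound in both."""
    return all(m2[v] == term for v, term in m1.items() if v in m2)


def join(o1: Solutions, o2: Solutions) -> Solutions:
    """Join(O1, O2): merge every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def left_join(o1: Solutions, o2: Solutions,
              expr: Callable[[Mapping], bool] = lambda m: True) -> Solutions:
    """LeftJoin(O1, O2, expr) = Filter(expr, Join(O1, O2)) union Diff(O1, O2, expr)."""
    result: Solutions = []
    for m1 in o1:
        kept = [{**m1, **m2} for m2 in o2
                if compatible(m1, m2) and expr({**m1, **m2})]
        # Diff part: keep m1 on its own if no compatible m2 passes expr.
        result.extend(kept if kept else [m1])
    return result


# ?x is shared; the second mapping of o1 has no partner in o2.
o1 = [{"x": "a"}, {"x": "b"}]
o2 = [{"x": "a", "y": "1"}]
print(join(o1, o2))
print(left_join(o1, o2))
```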

ARQ is an open source query engine for Jena that supports the SPARQL RDF
Query language. ARQ provides a fully compliant parser for the SPARQL
syntax and, internally, represents a SPARQL query using an abstract
syntax tree with operators and solution modifiers as elements.
The well-known visitor design pattern is used to manipulate and
transform the internal representation of a SPARQL query.
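To traverse the algebra bottom-up, a visitor can translate the children of
each node first and then emit the Pig Latin statement for the node itself,
returning the alias of the relation it produced. Here is a minimal Python
sketch of that idea. The node classes (`Scan`, `Join`, `Union`, `Project`,
`Distinct`) and `PigEmitter` are hypothetical stand-ins, not ARQ's API;
`Scan` represents a relation that already holds some BGP's bindings. In
ARQ itself the Java counterparts would be an implementation of its
`OpVisitor` interface driven by `OpWalker`, which, as far as I recall,
visits sub-operators before the current one; check the ARQ Javadoc for the
exact signatures.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


# Hypothetical algebra node classes, standing in for an Op tree.
@dataclass
class Scan:          # leaf: relation already holding some BGP's bindings
    alias: str
    vars: List[str]

@dataclass
class Join:
    left: object
    right: object

@dataclass
class Union:
    left: object
    right: object

@dataclass
class Project:
    vars: List[str]
    sub: object

@dataclass
class Distinct:
    sub: object


class PigEmitter:
    """Visitor that translates children before their parent (post-order)."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self._ids = count(1)

    def fresh(self) -> str:
        return f"R{next(self._ids)}"

    def visit(self, node) -> Tuple[str, List[str]]:
        # Dispatch on the node's class name, e.g. Join -> visit_Join.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Scan(self, n: Scan):
        return n.alias, list(n.vars)

    def visit_Join(self, n: Join):
        la, lv = self.visit(n.left)
        ra, rv = self.visit(n.right)
        shared = [v for v in lv if v in rv]
        joined, out = self.fresh(), self.fresh()
        if shared:
            key = shared[0] if len(shared) == 1 else "(" + ", ".join(shared) + ")"
            self.lines.append(f"{joined} = JOIN {la} BY {key}, {ra} BY {key};")
        else:
            self.lines.append(f"{joined} = CROSS {la}, {ra};")
        extra = [v for v in rv if v not in lv]
        keep = [f"{la}::{v} AS {v}" for v in lv] + [f"{ra}::{v} AS {v}" for v in extra]
        self.lines.append(f"{out} = FOREACH {joined} GENERATE {', '.join(keep)};")
        return out, lv + extra

    def visit_Union(self, n: Union):
        la, lv = self.visit(n.left)
        ra, rv = self.visit(n.right)
        if set(lv) != set(rv):
            # SPARQL UNION allows different variables; Pig UNION is positional.
            raise NotImplementedError("branches with different variables")
        if rv != lv:
            # Reorder the right branch's columns to match the left branch.
            reordered = self.fresh()
            self.lines.append(f"{reordered} = FOREACH {ra} GENERATE {', '.join(lv)};")
            ra = reordered
        out = self.fresh()
        self.lines.append(f"{out} = UNION {la}, {ra};")
        return out, lv

    def visit_Project(self, n: Project):
        a, _ = self.visit(n.sub)
        out = self.fresh()
        self.lines.append(f"{out} = FOREACH {a} GENERATE {', '.join(n.vars)};")
        return out, list(n.vars)

    def visit_Distinct(self, n: Distinct):
        a, v = self.visit(n.sub)
        out = self.fresh()
        self.lines.append(f"{out} = DISTINCT {a};")
        return out, v


tree = Distinct(Project(["n"], Join(Scan("knows", ["x", "y"]),
                                    Scan("names", ["y", "n"]))))
emitter = PigEmitter()
final_alias, final_vars = emitter.visit(tree)
print("\n".join(emitter.lines))
```

Because each visit method returns the alias it produced, the parent never
needs to know how a child was translated, which is what makes the
bottom-up composition straightforward.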

  - http://www.w3.org/TR/rdf-sparql-query/
  - http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra
  - http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebraEval
  - http://openjena.org/ARQ/

Re: [jena-dev] SPARQL: Transformation of SPARQL

Posted by Andy Seaborne <an...@talis.com>.

You can inspect the algebra for a query by using

arq.qparse --print=op --query=QueryFile
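The printed algebra uses ARQ's SSE (S-expression) notation. As a hedged
sketch, assuming output shaped like the hand-written `example` string
below (an illustration, not captured qparse output), a few lines of Python
can parse it into nested lists and list the operators in bottom-up order.
The names `tokenize`, `parse`, `bottom_up` and the `OPERATORS` set are
hypothetical, and the set is only an assumed subset of the SSE operator
names.

```python
import re
from typing import List

# One token: a parenthesis, an <IRI>, a quoted literal (optionally with
# @lang or ^^datatype), or a bare atom such as ?x, bgp, or triple.
TOKEN = re.compile(
    r'\s*(\(|\)|<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^(?:<[^>]*>|[^\s()]+))?|[^\s()]+)')

# Assumed subset of SSE operator names for the SPARQL algebra.
OPERATORS = {"bgp", "join", "leftjoin", "filter", "union", "graph",
             "order", "project", "distinct", "reduced", "slice"}


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: List[str]) -> list:
    """Build nested lists from tokens; expects one top-level expression."""
    stack: List[list] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level expression")
    return stack[0][0]


def bottom_up(expr, out=None) -> List[str]:
    """Post-order walk: operators nested inside an expression come first."""
    if out is None:
        out = []
    if isinstance(expr, list) and expr:
        for child in expr[1:]:
            bottom_up(child, out)
        if isinstance(expr[0], str) and expr[0] in OPERATORS:
            out.append(expr[0])
    return out


example = """
(project (?n)
  (leftjoin
    (bgp (triple ?x <http://xmlns.com/foaf/0.1/knows> ?y))
    (bgp (triple ?y <http://xmlns.com/foaf/0.1/name> ?n))))
"""
tree = parse(tokenize(example))
print(bottom_up(tree))
```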

	Andy

On 26/03/2010 8:41 AM, Alexander Schätzle wrote:
 >
 >
 > Hi all,
 > I'm working on a translation of SPARQL queries into PigLatin Scripts
 > (language used by the Pig System for Hadoop developed by Yahoo). In
 > order to avoid writing a parser for SPARQL on my own (because I'm not
 > experienced with writing parsers and compilers) I intended to use ARQ
 > instead because it already has a parser for SPARQL and also a
 > translation into the SPARQL Algebra. For the translation into PigLatin I
 > developed a translation for every Algebra expression of SPARQL (BGP,
 > LeftJoin and so on) into PigLatin commands.
 > Now the question:
 > When parsing a SPARQL query into a corresponding Algebra expression
 > using ARQ is it possible to traverse the resulting Abstract Query Tree
 > (with elements of the Algebra expression, not the Abstract Syntax Tree)
 > in order to output the corresponding PigLatin commands? Unfortunately
 > the ARQ API documentation which I found online is not very helpful,
 > because there are very few comments and not all packages are described.
 > If I got it right I will get an object of class
 > com.hp.hpl.jena.sparql.algebra.Op after parsing the query into an
 > Algebra expression. Is this the "root element" of the Algebra Tree? How
 > would it be possible to traverse the Algebra expression bottom-up
 > (starting with the BGPs) and output the corresponding PigLatin commands?
 > If someone more experienced with this kind of stuff could help me, I
 > would be very, very grateful!
 > Is there any complete documentation about ARQ where the packages and
 > classes are described in detail and where it is explained how to use it
 > (perhaps with more code examples)?
 > Thanks very much in advance,
 > Alexander Schätzle
 > ----------------------------------
 > B.Sc. University of Freiburg
 > email: mail@alexander-schaetzle.de <ma...@alexander-schaetzle.de>

Re: [jena-dev] SPARQL: Transformation of SPARQL

Posted by Paolo Castagna <ca...@googlemail.com>.

Ashutosh Chauhan wrote:
> This seems to be a nice and useful contrib project for Pig. Is anyone
> actively working on it?

I do not know of anyone working on this in the open (i.e. within an open
source project). I'd like to see this idea implemented; if I can help,
I will.

> Ashutosh
> 
> PS: Since I am not on the jena-dev list, it might not make it there.
> Paolo, can you post it there in case it doesn't appear?

Done.

Re: [jena-dev] SPARQL: Transformation of SPARQL

Posted by Ashutosh Chauhan <as...@gmail.com>.

This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?

Ashutosh

PS: Since I am not on the jena-dev list, it might not make it there.
Paolo, can you post it there in case it doesn't appear?
