Posted to user@pig.apache.org by Paolo Castagna <ca...@googlemail.com> on 2010/03/26 11:47:35 UTC
Re: [jena-dev] SPARQL: Transformation of SPARQL
Hi Alexander
Alexander Schätzle wrote:
> I'm working on a translation of SPARQL queries into PigLatin Scripts
> (language used by the Pig System for Hadoop developed by Yahoo).
Translating the SPARQL algebra into PigLatin scripts is IMHO a nice
idea. With a small(?) amount of glue code you join two big communities
delivering value to both. On one hand, there is the need for
scalable/parallel processing systems. On the other hand, there is
the aim to support as many data formats as possible.
I give full credit for this idea to Peter Mika and Ben Reed, who kindly
shared an unpublished paper [1] on this topic. You should write to them
and ask for a copy of the paper.
Here follows a summary of their mapping between the SPARQL algebra
operators and solution modifiers and the Pig Latin syntax:
SPARQL algebra         Pig Latin syntax
---------------------  ----------------------------------

BGP operator           A set of FILTER operations,
                       followed by a number of JOINs
                       equal to the number of triple
                       patterns and a single FOREACH
                       statement.

Filter operator        FILTER (with the limitation
                       that not all expressions are
                       directly supported by Pig Latin).
                       However, Pig can be extended via
                       user-defined functions (UDFs) to
                       have a semantically equivalent
                       filter behaviour.

Join operator          A series of JOINs (which is an
                       inner join in Pig) followed by a
                       FOREACH (a projection) to remove
                       the duplicated columns.

LeftJoin operator      A series of outer JOINs plus a
                       custom filter operator.

Union operator         UNION

Graph operator         ?

OrderBy modifier       ORDER

Project modifier       Achieved by FOREACH.

Distinct modifier      DISTINCT

Reduced modifier       Implemented using DISTINCT.

Slice modifier         Implemented using a custom filter.

ToList modifier        ?
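To make the BGP row more concrete, here is a minimal Python sketch that prints the kind of Pig Latin the mapping describes. It is not taken from the paper. It assumes a relation named `triples` with columns `s`, `p` and `o` has already been loaded, it only handles "star-shaped" BGPs in which every triple pattern shares the same subject variable, and it uses one multi-way JOIN rather than a chain of JOINs. The function name `star_bgp_to_pig` and the aliases `t0`, `joined` and `result` are made up for illustration. Quoting of constants and repeated object variables are not handled.

```python
def star_bgp_to_pig(subject_var, patterns, source="triples"):
    """Emit Pig Latin for a star-shaped BGP (hypothetical helper).

    subject_var: the shared subject variable, e.g. "?person"
    patterns:    list of (predicate, object) pairs; an object starting
                 with "?" is a variable, anything else is a constant.
    source:      assumed name of a relation with columns s, p, o.
    """
    if not patterns:
        raise ValueError("a BGP needs at least one triple pattern")

    joined = len(patterns) > 1
    statements = []
    columns = []
    for i, (predicate, obj) in enumerate(patterns):
        alias = f"t{i}"
        # After a JOIN, Pig disambiguates columns as alias::column.
        prefix = f"{alias}::" if joined else ""
        condition = f"p == '{predicate}'"
        if i == 0:
            columns.append(f"{prefix}s AS {subject_var.lstrip('?')}")
        if obj.startswith("?"):
            columns.append(f"{prefix}o AS {obj.lstrip('?')}")
        else:
            condition += f" AND o == '{obj}'"
        # One FILTER per triple pattern.
        statements.append(f"{alias} = FILTER {source} BY {condition};")

    last = "t0"
    if joined:
        # Join all filtered relations on the shared subject column.
        keys = ", ".join(f"t{i} BY s" for i in range(len(patterns)))
        statements.append(f"joined = JOIN {keys};")
        last = "joined"

    # A single FOREACH projects the variables and drops duplicate columns.
    statements.append(f"result = FOREACH {last} GENERATE {', '.join(columns)};")
    return "\n".join(statements)


print(star_bgp_to_pig("?person", [("foaf:name", "?name"),
                                  ("foaf:mbox", "?mbox")]))
```

A general translator would also need joins on arbitrary shared variables and a fallback when two patterns share none. This sketch leaves both out.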
They conclude:
"In summary, we have shown a complete translation procedure from
SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
interpreter. (This interpreter is not complete yet, but sufficient to
cover the most commonly used queries, including the ones discussed in
this paper.) Note that the query plans generated by our interpreter
are unlikely to be optimal: optimization is left as a task
for Pig." [1]
I suggest you use N-Triples and/or N-Quads (parsers are available in
TDB) as input/output with Pig.
Which version of Pig are you planning to use?
My 2 cents,
Paolo
[1] Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
store using MapReduce", August 29, 2008 - unpublished
PS:
For the benefit of Pig users/developers (in CC to this email):
SPARQL stands for SPARQL Protocol and RDF Query Language and it is the
recommended RDF query language defined by W3C.
More specifically, the SPARQL Algebra defines six operators and six
solution modifiers:
Operators                Solution Modifiers
-----------------------  ---------------------
BGP                      ToList
Join                     OrderBy
LeftJoin                 Project
Filter                   Distinct
Union                    Reduced
Graph                    Slice
For each operator and solution modifier an evaluation semantics is
clearly defined by the SPARQL specification.
ARQ is an open source query engine for Jena that supports the SPARQL RDF
query language. ARQ provides a fully compliant parser for the SPARQL
syntax and, internally, represents a SPARQL query as an abstract
syntax tree with operators and solution modifiers as elements.
The well-known visitor design pattern is used to manipulate and
transform the internal representation of a SPARQL query.
- http://www.w3.org/TR/rdf-sparql-query/
- http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra
- http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebraEval
- http://openjena.org/ARQ/
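
For readers unfamiliar with the visitor pattern mentioned above, the following Python sketch shows the general idea of walking an algebra tree bottom-up and recording one Pig-style step per node. It does not use ARQ. The node classes (`BGP`, `Join`, `Filter`, `Project`) and the `PigPlanVisitor` class are toy stand-ins invented for this example, and each step only records which Pig keyword the mapping table suggests, not a full Pig Latin statement.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Toy algebra nodes (hypothetical; these are NOT ARQ's classes).
@dataclass
class BGP:
    triples: List[Tuple[str, str, str]]


@dataclass
class Join:
    left: object
    right: object


@dataclass
class Filter:
    expr: str
    sub: object


@dataclass
class Project:
    variables: List[str]
    sub: object


class PigPlanVisitor:
    """Visits children before parents, so leaves (BGPs) come first."""

    def __init__(self):
        # Each step is (alias, keyword, input_aliases).
        self.steps = []

    def _emit(self, keyword, inputs):
        alias = f"r{len(self.steps)}"
        self.steps.append((alias, keyword, inputs))
        return alias

    def visit(self, op):
        # Dispatch on the node's class name, e.g. Join -> visit_Join.
        return getattr(self, "visit_" + type(op).__name__)(op)

    def visit_BGP(self, op):
        return self._emit("BGP", [])

    def visit_Join(self, op):
        left = self.visit(op.left)
        right = self.visit(op.right)
        return self._emit("JOIN", [left, right])

    def visit_Filter(self, op):
        return self._emit("FILTER", [self.visit(op.sub)])

    def visit_Project(self, op):
        return self._emit("FOREACH", [self.visit(op.sub)])


tree = Project(
    ["?name"],
    Filter(
        "?age > 18",
        Join(
            BGP([("?p", "foaf:name", "?name")]),
            BGP([("?p", "foaf:age", "?age")]),
        ),
    ),
)
visitor = PigPlanVisitor()
root_alias = visitor.visit(tree)
for step in visitor.steps:
    print(step)
```

Because each `visit_*` method recurses into its children before emitting its own step, the two BGP leaves are recorded first and the outermost Project last. A real translator would follow the same order.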
Re: [jena-dev] SPARQL: Transformation of SPARQL
Posted by Andy Seaborne <an...@talis.com>.
You can inspect the algebra for a query by using

    arq.qparse --print=op --query=QueryFile

Andy
On 26/03/2010 8:41 AM, Alexander Schätzle wrote:
>
>
> Hi all,
> I'm working on a translation of SPARQL queries into PigLatin Scripts
> (language used by the Pig System for Hadoop developed by Yahoo). In
> order to avoid writing a parser for SPARQL on my own (because I'm not
> experienced with writing parsers and compilers) I intended to use ARQ
> instead because it already has a parser for SPARQL and also a
> translation into the SPARQL Algebra. For the translation into PigLatin I
> developed a translation for every Algebra expression of SPARQL (BGP,
> LeftJoin and so on) into PigLatin commands.
> Now the question:
> When parsing a SPARQL query into a corresponding Algebra expression
> using ARQ is it possible to traverse the resulting Abstract Query Tree
> (with elements of the Algebra expression, not the Abstract Syntax Tree)
> in order to output the corresponding PigLatin commands? Unfortunately,
> the ARQ API documentation I found online is not very helpful because it
> has very few comments and not all packages are described.
> If I got it right I will get an object of class
> com.hp.hpl.jena.sparql.algebra.Op after parsing the query into an
> Algebra expression. Is this the "root element" of the Algebra Tree? How
> would it be possible to traverse the Algebra expression bottom-up
> (starting with the BGPs) and output the corresponding PigLatin commands?
> If someone could help me who is more experienced with this kind of
> stuff, I would be very very grateful!
> Is there any complete documentation about ARQ where the packages and
> classes are described in detail and where it is explained how to use it
> (perhaps with more code examples)?
> Thanks very much in advance,
> Alexander Schätzle
> ----------------------------------
> B.Sc. University of Freiburg
> email: mail@alexander-schaetzle.de <ma...@alexander-schaetzle.de>
Re: [jena-dev] SPARQL: Transformation of SPARQL
Posted by Paolo Castagna <ca...@googlemail.com>.
Ashutosh Chauhan wrote:
> This seems to be a nice and useful contrib project for Pig. Is anyone
> actively working on it?
I do not know of anyone working on this in the open (i.e. within an open
source project). I'd like to see this idea implemented; if I can help,
I will.
> Ashutosh
>
> PS: Since I am not on the jena-dev list, it might not make it there. Paolo,
> can you post it there in case it doesn't appear?
Done.
Re: [jena-dev] SPARQL: Transformation of SPARQL
Posted by Ashutosh Chauhan <as...@gmail.com>.
This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?
Ashutosh
PS: Since I am not on the jena-dev list, it might not make it there. Paolo,
can you post it there in case it doesn't appear?