Posted to users@jena.apache.org by "Nouwt, B. (Barry)" <ba...@tno.nl.INVALID> on 2022/04/05 11:21:30 UTC

ARQ variables with dashes

Hi everyone,

We are using ARQ's SPARQL parser to parse graph patterns and noticed that it allows dashes in variable names if these variables occur in the *object* position of a triple pattern. If a variable name in the *subject* position of a triple pattern contains dashes, parsing fails with a ParseException. As far as we could tell, the SPARQL specification does not allow dashes in variable names at all (https://www.w3.org/TR/sparql11-query/#rVARNAME). Both pattern1 and pattern2 below should therefore fail, but the first one does not fail and the second does.

String pattern1 = "<test> https://www.tno.nl/example/b ?community-ID .";
ARQParser parser1 = new ARQParser(new StringReader(pattern1));
parser1.GroupGraphPatternSub();

String pattern2 = "?community-ID https://www.tno.nl/example/b <test> .";
ARQParser parser2 = new ARQParser(new StringReader(pattern2));
parser2.GroupGraphPatternSub();
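
As a rough illustration of the rule cited above, the sketch below checks a name against an ASCII-only approximation of the VARNAME production (letters, digits and underscore). The class and method names are made up for this example, and the real production also admits a range of non-ASCII characters, so this is a simplification and not the actual grammar.

```java
import java.util.regex.Pattern;

final class VarNameCheck {
    // Assumption: ASCII-only subset of VARNAME (letters, digits, underscore).
    // The SPARQL grammar additionally allows various non-ASCII characters.
    private static final Pattern ASCII_VARNAME = Pattern.compile("[A-Za-z0-9_]+");

    /** Returns true if the name (without the leading '?') fits the ASCII subset. */
    static boolean isAsciiVarName(String name) {
        return ASCII_VARNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAsciiVarName("community-ID")); // false: '-' is not allowed
        System.out.println(isAsciiVarName("community_ID")); // true
    }
}
```

Under this approximation "community-ID" is rejected as a whole, while "community" on its own is a perfectly good variable name.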

Is this a bug?

Best regards,

Barry
This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. TNO accepts no liability for the content of this e-mail, for the manner in which you use it and for damage of any kind resulting from the risks inherent to the electronic transmission of messages.

RE: ARQ variables with dashes

Posted by "Nouwt, B. (Barry)" <ba...@tno.nl.INVALID>.
Thanks Andy, I now no longer use the ARQParser directly, but do as you suggested. I wrap the graph pattern in `SELECT * { ... }` and parse it as a regular Query and retrieve the ElementPathBlock using getQueryPattern(). Thanks!


Re: ARQ variables with dashes

Posted by Andy Seaborne <an...@apache.org>.

On 08/04/2022 14:00, Nouwt, B. (Barry) wrote:
> Hi Andy and Lorenz, thanks for your quick replies. I am not trying to parse full SPARQL, but actually only the Basic Graph Pattern part of a query. 

The most robust way is to wrap the pattern in enough query text, 
"SELECT * {" + pattern + "}" (where pattern is your graph pattern string), 
and pull out the WHERE clause with getQueryPattern().

> Is the org.apache.jena.sparql.lang.arq.ARQParser class not parsing SPARQL 1.1?

Yes, it is.  ARQ is a superset of SPARQL 1.1

> Based on Andy's directions it seems like doing the following additional check after parsing the string works for detecting graph patterns with dashes in their variable name at the object location.
> 
> if (parser1.token.next.kind == ARQParser.EOF) {
> 	// found valid graph pattern
> 	System.out.println("Graph pattern parse successful!");
> } else {
> 	// stream not empty, so not a valid graph pattern
> 	System.out.println("Graph pattern parse failed!");
> }

That assumes the parser did not peek ahead.

token.next may be null, in which case you need to call getNextToken() 
(going by the parser code). Or maybe check parser.token.endColumn / 
parser.token.endLine.

Best is to use a parser rule - either full query or modify the parser.


RE: ARQ variables with dashes

Posted by "Nouwt, B. (Barry)" <ba...@tno.nl.INVALID>.
Hi Andy and Lorenz, thanks for your quick replies. I am not trying to parse full SPARQL, but actually only the Basic Graph Pattern part of a query. Is the org.apache.jena.sparql.lang.arq.ARQParser class not parsing SPARQL 1.1?

Based on Andy's directions it seems like doing the following additional check after parsing the string works for detecting graph patterns with dashes in their variable name at the object location.

if (parser1.token.next.kind == ARQParser.EOF) {
	// found valid graph pattern
	System.out.println("Graph pattern parse successful!");
} else {
	// stream not empty, so not a valid graph pattern
	System.out.println("Graph pattern parse failed!");
}

Thanks for your help!

Best regards,

Barry


Re: ARQ variables with dashes

Posted by Andy Seaborne <an...@apache.org>.
Inline.

Summary: it didn't consume the whole input, only up to the end of the 
legal part.

On 05/04/2022 12:43, Lorenz Buehmann wrote:
> Hi Barry,
> 
> 
> Did you try SPARQL1.1 parser instead? Afaik, ARQ was always beyond 
> SPARQL 1.1 or better said, already before SPARQL 1.1 with some extensions.
> 
> Indeed, Andy will correct me soon :D
> 
> The grammar files for JavaCC are here:
> 
> https://github.com/apache/jena/tree/main/jena-arq/Grammar
> 
> You can check arq.jj and sparql_11.jj
> 
> 
> Or just wait for Andy's response ...
> 
> 
> Cheers,
> 
> Lorenz
> 
> 
> 
> On 05.04.22 13:21, Nouwt, B. (Barry) wrote:
>> Hi everyone,
>>
>> We are using ARQ's SPARQL parser to parse graph patterns and noticed 
>> that it allows dashes in variable names if these variables occur as 
>> the *object* location of a triple pattern. If the variable names at 
>> the *subject* location of a triple pattern contains dashes, it fails 
>> with a ParseException. As far as we could tell the SPARQL 
>> specification does not allow dashes in variable names at all 
>> (https://www.w3.org/TR/sparql11-query/#rVARNAME). The pattern1 and 
>> pattern2 below should both fail, but the first one does not fail and 
>> the second does fail.
>>
>> String pattern1 = "<test> https://www.tno.nl/example/b ?community-ID .";
>> ARQParser parser1 = new ARQParser(new StringReader(pattern1));
>> parser1.GroupGraphPatternSub();

Calling into the middle of the parse doesn't work so easily.

It has parsed up to the end of the legal triple pattern:

"<test> https://www.tno.nl/example/b ?community"

When it sees the "-", the variable name has ended and (because the "." is 
not required) what it has read so far is a legal GroupGraphPatternSub.

The "-ID ." is left in the token input stream.

You have to test whether end-of-input has been reached.
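
The behaviour described above can be imitated without Jena. The sketch below is not ARQ's parser: it uses a regular expression as a stand-in for a triple-pattern rule, with hypothetical class and method names and <p> in place of the predicate, to show how a prefix match can succeed while input is left over, and why an explicit end-of-input test is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixParseDemo {
    // Simplified term: an <iri> or a ?variable with ASCII letters, digits, underscore.
    private static final String TERM = "(?:<[^>]*>|\\?[A-Za-z0-9_]+)";
    // Three terms separated by whitespace; the trailing "." is optional.
    private static final Pattern TRIPLE =
        Pattern.compile(TERM + "\\s+" + TERM + "\\s+" + TERM + "(?:\\s*\\.)?");

    /** Returns the unconsumed remainder, or null if not even a prefix matches. */
    static String leftoverAfterPrefix(String input) {
        Matcher m = TRIPLE.matcher(input);
        // lookingAt() anchors at the start but does not require the whole input to match.
        return m.lookingAt() ? input.substring(m.end()) : null;
    }

    /** A pattern is only accepted if nothing but whitespace is left over. */
    static boolean isCompleteTriple(String input) {
        String rest = leftoverAfterPrefix(input);
        return rest != null && rest.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Object position: the prefix "<test> <p> ?community" matches, "-ID ." remains.
        System.out.println(leftoverAfterPrefix("<test> <p> ?community-ID ."));
        // Subject position: "-" appears where whitespace is required, so nothing matches.
        System.out.println(leftoverAfterPrefix("?community-ID <p> <test> ."));
        System.out.println(isCompleteTriple("<test> <p> ?community-ID ."));
    }
}
```

The first call corresponds to pattern1 (silent success with leftover input) and the second to pattern2 (immediate failure); only the end-of-input check treats both as invalid.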


Try:

qparse 'SELECT * { <test> <p> ?o-1 }'

This gives a parse error because "-1", the next token, is not legal at 
that point. (Tokenizing runs one token ahead of where the parser grammar 
is; that lookahead is the 1 in LL(1).)
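
To make the tokenizing step concrete, here is a small stand-alone sketch. It is not Jena's token manager; the class name, the token labels and the tiny token set are assumptions chosen for illustration. It shows how "?o-1" splits into a variable token followed by a negative-integer token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeDemo {
    // Hypothetical, heavily reduced token set: IRIs, variables, negative integers, punctuation.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<IRI><[^>]*>)|(?<VAR>\\?[A-Za-z0-9_]+)|(?<NEGINT>-[0-9]+)|(?<PUNCT>[{}.]))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // Trailing whitespace is fine; anything else is outside this toy token set.
                if (input.substring(pos).trim().isEmpty()) break;
                throw new IllegalArgumentException("Unrecognized input at offset " + pos);
            }
            String kind = m.group("IRI") != null ? "IRI"
                        : m.group("VAR") != null ? "VAR"
                        : m.group("NEGINT") != null ? "INTEGER_NEGATIVE"
                        : "PUNCT";
            tokens.add(kind + ":" + m.group().trim());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The variable token ends at "-", and "-1" becomes a token of its own.
        System.out.println(tokenize("<test> <p> ?o-1 }"));
    }
}
```

With the tokens separated like this, a grammar rule that expects "}" or "." after the object sees a negative integer instead, which is where the error is reported.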

This one is illegal because there is a check for end of input:

qparse 'SELECT * { <test> <p> ?o } XXX'

The top level entry point is

void QueryUnit(): { }
{
   ByteOrderMark()
   Query()
   <EOF>
}

so the parser must see <EOF> to be valid and exit without error.

     Andy

>>
>> String pattern2 = "?community-ID https://www.tno.nl/example/b <test> .";
>> ARQParser parser2 = new ARQParser(new StringReader(pattern2));
>> parser2.GroupGraphPatternSub();
>>
>> Is this a bug?
>>
>> Best regards,
>>
>> Barry
>>

Re: ARQ variables with dashes

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Hi Barry,


Did you try the SPARQL 1.1 parser instead? As far as I know, ARQ has always 
gone beyond SPARQL 1.1, or better said, it already had some extensions 
before SPARQL 1.1.

Indeed, Andy will correct me soon :D

The grammar files for JavaCC are here:

https://github.com/apache/jena/tree/main/jena-arq/Grammar

You can check arq.jj and sparql_11.jj


Or just wait for Andy's response ...


Cheers,

Lorenz


