Posted to users@jena.apache.org by Claude Warren <cl...@xenei.com> on 2012/04/01 10:01:34 UTC

Re: Should data errors from federated calls be ignored when silent?

>
> On 30/03/12 16:52, Claude Warren wrote:
> > I have a case where I am using multiple federated calls, where each
> > call is of the form
> > Service silent<uri> {
> > --snip--
> > }
> >
> > One of the endpoints is returning bad data: the XML does not parse,
> > so the XML parser throws an exception and my entire query dies.
> >
> > Now the best answer would be to get the data corrected, but I don't
> > own that data and have no idea if they will fix it.
> >
> > What I want to know is: shouldn't the "Silent" keyword on the Service
> > call indicate that if the remote fails it should be ignored?
> >
> > From http://www.w3.org/2009/sparql/docs/fed/service#serviceFailure it
> > appears that a single solution with no bindings should be returned. If
> > this is a correct interpretation I am willing to report a bug and
> > implement a bug fix. The issue that I see is that the error is not
> > detected until hasNext() is called on the iterator. This means that the
> > service could have returned some data before the error was detected. I
> > would propose that the solution be to have the iterator return "false"
> > at that point and move forward with the partial data that was already
> > returned.
> >
> > Does anyone have a different interpretation of the specification or see
> > an issue with the possible solution?
> >
> > Many thanks,
> > Claude
> Hi Claude,
> Hmm - tricky :-)
> The key sentence is:
> [[
> The SILENT keyword indicates that errors encountered while accessing a
> remote SPARQL endpoint should be ignored while processing the query.
> ]]
> but HTTP has a bit of an issue here.
> Suppose the request is made and "200 OK" is received. That's a contract
> that the results are going to be sent and be valid. Bad syntax of
> results isn't considered nor are breaks in communications.
> The only way to address this is for the service operations (class
> QueryIterService) to consume and buffer all the results. I've just
> added this in QueryIterService.
> An effect of this is that you will not get any valid earlier results;
> which is what you propose and quite sensible.
> There needs to be a QueryIterator implementation that reads another
> QueryIterator until some error occurs and signals end at that point.
> That would be worthwhile - please do contribute such a thing.
> There's a QueryIteratorWrapper that can be used to intercept
> .hasNext/.next calls so you can add try-catch.
> Do you know which SPARQL implementation is generating bad results?
> Andy

Andy,

I am not certain which SPARQL endpoint is generating the bad results
-- though I do intend to find out.

I will look at implementing a QueryIterator as you noted above.  I
then need to plug it into the Fuseki query engine chain.
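
As a rough illustration of the iterator idea discussed above, here is a
minimal sketch in plain Java. It uses java.util.Iterator as a stand-in
and is not ARQ's QueryIterator or QueryIteratorWrapper API. The names
TruncateOnErrorIterator, drain and failingSource are hypothetical, and
it assumes the parse failure surfaces as a RuntimeException from
hasNext() or next() on the wrapped iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Wraps another iterator and treats a runtime failure in the source as
 * "end of results", keeping whatever was delivered before the failure.
 * Hypothetical stand-in; not part of ARQ.
 */
final class TruncateOnErrorIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T lookahead;
    private boolean hasLookahead = false;
    private boolean finished = false;

    TruncateOnErrorIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (hasLookahead) return true;
        if (finished) return false;
        try {
            // Fetch eagerly so that a failure in next() is also caught here.
            if (source.hasNext()) {
                lookahead = source.next();
                hasLookahead = true;
                return true;
            }
        } catch (RuntimeException e) {
            // Assumption: a malformed result shows up as a RuntimeException.
            // Fall through and report the end of the results.
        }
        finished = true;
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasLookahead = false;
        T value = lookahead;
        lookahead = null;
        return value;
    }

    /** Drains a source through the wrapper into a list (demo helper). */
    static <T> List<T> drain(Iterator<T> source) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new TruncateOnErrorIterator<>(source);
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    /** Yields "a" and "b", then fails, simulating a bad row in the stream. */
    static Iterator<String> failingSource() {
        final Iterator<String> good = Arrays.asList("a", "b").iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                if (!good.hasNext()) throw new IllegalStateException("malformed result");
                return true;
            }

            @Override
            public String next() {
                return good.next();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(drain(failingSource())); // prints [a, b]
    }
}
```

The wrapper keeps the partial results, which matches the proposal of
returning "false" from hasNext() once the error is hit. A buffering
variant would instead read everything first and discard it all on error.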

Since I am querying multiple SPARQL endpoints and performing unions on
their results, and since some results seem to take quite a long time,
I was considering implementing a Union query for service calls that
would effectively poll each service endpoint in turn, looking for the
next one that has a query result available. A polling query iterator,
if you will. The hope is to parallelize the queries as much as
possible.

But I should probably open another discussion for that topic.

Claude

-- 
I like: Like Like - The likeliest place on the web<http://like-like.xenei.com>
Identity: https://www.identify.nu/user.php?claude@xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Should data errors from federated calls be ignored when silent?

Posted by Andy Seaborne <an...@apache.org>.
On 02/04/12 11:54, Andy Seaborne wrote:
> On 02/04/12 11:41, Claude Warren wrote:
>> The query that returns the error is:
>>
>> select * where {
>> SERVICE<http://s4.semanticscience.org:12027/sparql> {
>> ?coFactor<http://bio2rdf.org/ns/bio2rdf#synonym> ?syn
>> }
>> }
>
> OpenLink Virtuoso.
>
> This has been known to generate illegal XML (let alone SPARQL results
> format). In this case: line 809:
>
> <result>
> <binding name="coFactor"><uri>http://bio2rdf.org/mgi:88600</uri></binding>
> <binding name="syn"><literal>16&#7;lphaoh-a</literal></binding>
> </result>
>
> <literal>16&#7;lphaoh-a</literal> is illegal XML.
>
> Must be two numbers after &#. This crashes the XML parser in Java7
> (which is derived from Xerces).

My mistake - it's because in XML 1.0, not all Unicode characters are legal.

[2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

so no 7.
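
The Char production quoted above translates directly into a range
check. The following is a small sketch in plain Java; the class name
Xml10Chars and both method names are hypothetical and not part of Jena
or any XML library.

```java
/** Checks code points against the XML 1.0 Char production [2]. */
final class Xml10Chars {
    /** True if the code point is allowed in an XML 1.0 document. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first illegal character in s, or -1 if all are legal. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // The literal from the failing result, with &#7; decoded to U+0007.
        String literal = "16" + (char) 7 + "lphaoh-a";
        System.out.println(firstIllegalIndex(literal)); // prints 2
    }
}
```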

The SPARQL results output is passing through bad data.

Actually, looking at the results (I grabbed a copy via the web UI), 
there are other data quality issues.

<binding name="syn"><literal>113.G6-3&amp;amp;#39;</literal></binding>

	Andy



Re: Should data errors from federated calls be ignored when silent?

Posted by Andy Seaborne <an...@apache.org>.
On 02/04/12 11:41, Claude Warren wrote:
> The query that returns the error is:
>
> select * where {
> SERVICE<http://s4.semanticscience.org:12027/sparql>  {
> ?coFactor<http://bio2rdf.org/ns/bio2rdf#synonym>  ?syn
> }
> }

OpenLink Virtuoso.

This has been known to generate illegal XML (let alone SPARQL results 
format). In this case: line 809:

  <result>
    <binding name="coFactor"><uri>http://bio2rdf.org/mgi:88600</uri></binding>
    <binding name="syn"><literal>16&#7;lphaoh-a</literal></binding>
  </result>

<literal>16&#7;lphaoh-a</literal> is illegal XML.

Must be two numbers after &#.  This crashes the XML parser in Java7 
(which is derived from Xerces).

	Andy



Re: Should data errors from federated calls be ignored when silent?

Posted by Claude Warren <cl...@xenei.com>.
The query that returns the error is:

select * where {
SERVICE <http://s4.semanticscience.org:12027/sparql> {
?coFactor <http://bio2rdf.org/ns/bio2rdf#synonym> ?syn
}
}






Re: Should data errors from federated calls be ignored when silent?

Posted by Andy Seaborne <an...@apache.org>.
On 01/04/12 09:01, Claude Warren wrote:
> Andy,
>
> I am not certain which SPARQL endpoint is generating the bad results
> -- though I do intend to find out.
>
> I will look at implementing a QueryIterator as you noted above.  I
> then need to plug it into the Fuseki query engine chain.

The place to plug it in is QueryIterService in ARQ (Fuseki is the
protocol engine; ARQ ships with Fuseki).

> Since I am querying multiple SPARQL endpoints and performing unions on
> their results, and since some results seem to take quite a long time,
> I was considering implementing a Union query for service calls that
> would effectively poll each service endpoint in turn, looking for the
> next one that has a query result available. A polling query iterator,
> if you will. The hope is to parallelize the queries as much as
> possible.

There was a discussion of this recently on this list.

ARQ is rather prone to serial execution (parallelism in Fuseki is used 
to execute multiple concurrent requests, not to give all system 
resources for one query).  There's nothing fundamental about ARQ's 
serial execution of UNIONs - a different implementation of execution or 
a different operator could make parallel SERVICE calls.
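
To make the idea of a parallel union concrete, here is a rough sketch
using only java.util.concurrent. It is not ARQ code and does not show
how such an operator would be registered with the query engine. The
class ParallelUnionSketch and the method unionAll are hypothetical, each
branch is modelled as a Callable returning a list of rows, and a failed
branch is simply skipped, which is an assumption about how SILENT might
be treated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Runs the branches of a union concurrently; hypothetical, not ARQ. */
final class ParallelUnionSketch {
    /**
     * Submits every branch to its own thread and appends each branch's rows
     * as soon as that branch completes. Failed branches contribute nothing.
     */
    static <T> List<T> unionAll(List<Callable<List<T>>> branches) {
        List<T> out = new ArrayList<>();
        if (branches.isEmpty()) return out;
        ExecutorService pool = Executors.newFixedThreadPool(branches.size());
        try {
            CompletionService<List<T>> done = new ExecutorCompletionService<>(pool);
            for (Callable<List<T>> branch : branches) done.submit(branch);
            for (int i = 0; i < branches.size(); i++) {
                try {
                    // take() blocks until the next branch has finished.
                    out.addAll(done.take().get());
                } catch (ExecutionException e) {
                    // The branch threw; skip it (SILENT-like behaviour).
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // return what has been collected so far
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return out;
    }

    public static void main(String[] args) {
        Callable<List<String>> fast = () -> Arrays.asList("f1", "f2");
        Callable<List<String>> slow = () -> {
            Thread.sleep(200); // simulate a slow endpoint
            return Arrays.asList("s1");
        };
        Callable<List<String>> broken = () -> {
            throw new IllegalStateException("endpoint returned bad data");
        };
        // Row order depends on which branch finishes first.
        System.out.println(unionAll(Arrays.asList(slow, fast, broken)));
    }
}
```

This sketch waits for a whole branch before using its rows. A true
polling iterator would interleave rows from branches that are still
streaming, at the cost of more bookkeeping.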

	Andy

>
> But I should probably open another discussion for that topic.
>
> Claude
>