Posted to users@jena.apache.org by Rose Beck <ro...@gmail.com> on 2013/12/26 21:51:49 UTC

Incorrect output: Request guidance

I created a data file (try.nq) containing the following data:

<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549  <a> .
<a> <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>  <id_1> .
_:header16125770191335188966549 <http://www.w3.org/2006/http#responseCode> "200"^^<http://www.w3.org/2001/XMLSchema#integer>  <c> .
<c> <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>  <id_3> .
_:header16125770191335188966549 <http://www.w3.org/2006/http#date> "Mon, 23 Apr 2012 13:49:27 GMT"  <d> .
<d> <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>  <id_5> .

After that I entered the following command for loading data into Jena-TDB:
root@server:/home/apache-jena-2.10.0/bin# ./tdbloader --loc=/home/Jena/try -v /home/try.nq

Then I fired the following SPARQL command:
root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time --loc=/home/Jena/try "select ?a?b?c where{ graph ?j1{?a ?b  <http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"

I got the following output:
-----------------
| a   | b   | c |
=================
| <a> | <b> |   |
| <c> | <b> |   |
| <d> | <b> |   |
-----------------
Time: 0.104 sec

After this I tried another SPARQL query (given below), for which I obtained
what looks like incorrect output:
SPARQL query:
root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time --loc=/home/Jena/try "select ?a?b?c where{ graph ?j1{?a <b>  <http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"
Output:
-------------
| a | b | c |
=============
-------------
Time: 0.095 sec

This output seems to be incorrect. Can someone please tell me where
I am going wrong?



Cheers,
Rose

Re: Incorrect output: Request guidance

Posted by Andy Seaborne <an...@apache.org>.
On 27/12/13 10:38, Andy Seaborne wrote:
> On 27/12/13 01:45, Rose Beck wrote:
>> I am afraid I had data in this format and by mistake now I have loaded
>> the
>> data in this format...and it took TDB 10 days to load it on a server with
>> 64GB RAM. Is there some way by which I may still query it on relative
>> IRIs?
>> (I am asking because I have to report the results today and my boss will
>> get hyper angry on me).
>>
>> If there is some way out then please let me know?
>
> I managed to force ARQ to parse without a base by bypassing the
> QueryFactory and calling the parser directly --
>
> public static void main(String... argv) throws Exception {
>      String x = FileUtils.readWholeFileAsUTF8("/home/afs/tmp/Q.rq") ;
>
>      // Create empty query
>      Query q = new Query() ;
>
>      // Create a parser
>      SPARQLParser parser = SPARQLParser.createParser(Syntax.syntaxSPARQL_11) ;
>
>      // Call directly
>      parser.parse(q, x) ;
>
>      // To show they are relative URIs:
>
>      Op op = Algebra.compile(q) ;
>      System.out.println(op) ;
> }
>
>      Andy

Or, better, you can set the IRI resolver on the query directly and then
call QueryFactory.parse (x is the query string, as in the code above).

import com.hp.hpl.jena.n3.IRIResolver ;

     Query query = new Query() ;

     // Resolver that returns every IRI unchanged, so relative IRIs stay relative
     IRIResolver resolver = new IRIResolver() {
         @Override public String resolve(String uri) { return uri ; }
     } ;

     query.setResolver(resolver) ;
     QueryFactory.parse(query, x, null, Syntax.defaultQuerySyntax) ;

Beware: there are two IRIResolver classes, the old one used here (which is
unfortunately tied to the public API of Query) and the one in RIOT.

	Andy




Re: Incorrect output: Request guidance

Posted by Andy Seaborne <an...@apache.org>.
On 27/12/13 01:45, Rose Beck wrote:
> I am afraid I had data in this format and by mistake now I have loaded the
> data in this format...and it took TDB 10 days to load it on a server with
> 64GB RAM. Is there some way by which I may still query it on relative IRIs?
> (I am asking because I have to report the results today and my boss will
> get hyper angry on me).
>
> If there is some way out then please let me know?

I managed to force ARQ to parse without a base by bypassing the 
QueryFactory and calling the parser directly --

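import com.hp.hpl.jena.query.Query ;
import com.hp.hpl.jena.query.Syntax ;
import com.hp.hpl.jena.sparql.algebra.Algebra ;
import com.hp.hpl.jena.sparql.algebra.Op ;
import com.hp.hpl.jena.sparql.lang.SPARQLParser ;
import com.hp.hpl.jena.util.FileUtils ;

public static void main(String... argv) throws Exception {
     // Read the query text from a file
     String x = FileUtils.readWholeFileAsUTF8("/home/afs/tmp/Q.rq") ;

     // Create empty query
     Query q = new Query() ;

     // Create a parser
     SPARQLParser parser = SPARQLParser.createParser(Syntax.syntaxSPARQL_11) ;

     // Call directly: no base IRI is set, so relative IRIs are not resolved
     parser.parse(q, x) ;

     // To show they are relative URIs:
     Op op = Algebra.compile(q) ;
     System.out.println(op) ;
}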

	Andy




>
>
> On Fri, Dec 27, 2013 at 6:35 AM, Damian Steer <d....@bris.ac.uk> wrote:
>
>>
>> On 26 Dec 2013, at 20:51, Rose Beck <ro...@gmail.com> wrote:
>>
>>> I created my data file containing the following data(try.nq):
>>>
>>> <http://dbpedia.org/data/Plasmodium_hegneri.xml> <
>>> http://code.google.com/p/ldspider/ns#headerInfo>
>>> _:header16125770191335188966549  <a> .
>>
>> Ah, here is the issue.
>>
>> N-Quads _doesn't_ permit relative URIs / IRIs. [1] TDB is being kind /
>> unhelpful and loading the data as requested, full of relative IRIs.
>> This is, strictly, broken RDF. The behaviour when you work on it is as a
>> consequence undefined.
>>
>> (If you run the data through validation this issue is apparent:
>>
>> $ riot --validate try.nq
>> ERROR [line: 1, col: 133] Relative IRI: a
>> ...)
>>
>>> Then I fired the following SPARQL command:
>>> root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time
>>
>> ...
>>
>>> I got the following output:
>>> -----------------
>>> | a   | b   | c |
>>> =================
>>> | <a> | <b> |
>>
>> The answer is a) correct but b) very unhelpful. These are relative IRIs
>> but no base is given.

In this case, that is because the relative URIs are in the data.

ARQ will also print relative URIs in text output when making URIs
relative to the base of the query.

If you want to check details, it might be easier to look at the JSON output.

	Andy



Re: Incorrect output: Request guidance

Posted by Rose Beck <ro...@gmail.com>.
I am afraid my data was in this format and I have now loaded it as it is
by mistake... and it took TDB 10 days to load it on a server with
64GB RAM. Is there some way I can still query it using relative IRIs?
(I am asking because I have to report the results today and my boss will
get very angry with me.)

If there is some way out, please let me know.



Re: Incorrect output: Request guidance

Posted by Damian Steer <d....@bris.ac.uk>.
On 26 Dec 2013, at 20:51, Rose Beck <ro...@gmail.com> wrote:

> I created my data file containing the following data(try.nq):
> 
> <http://dbpedia.org/data/Plasmodium_hegneri.xml> <
> http://code.google.com/p/ldspider/ns#headerInfo>
> _:header16125770191335188966549  <a> .

Ah, here is the issue.

N-Quads _doesn't_ permit relative URIs / IRIs. [1] TDB is being kind / unhelpful and loading the data as requested, full of relative IRIs.
This is, strictly, broken RDF. The behaviour when you work on it is as a consequence undefined.

(If you run the data through validation this issue is apparent:

$ riot --validate try.nq 
ERROR [line: 1, col: 133] Relative IRI: a
...)
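
To illustrate the kind of check involved, here is a rough sketch in plain Java. It is not how RIOT validates; it just treats anything between angle brackets as an IRI and flags those that do not start with a scheme such as "http:". The class name RelativeIriCheck and the regular expressions are my own assumptions, and the angle-bracket match is only an approximation (it would also pick up "<...>" inside a literal).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeIriCheck {
    // Rough approximation of an IRI reference: anything between < and > without spaces.
    private static final Pattern IRI_REF = Pattern.compile("<([^<>\\s]*)>");
    // An absolute IRI starts with a scheme followed by a colon (RFC 3986, section 3.1).
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** Returns the IRIs on one N-Quads line that have no scheme, i.e. are relative. */
    public static List<String> relativeIris(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = IRI_REF.matcher(line);
        while (m.find()) {
            String iri = m.group(1);
            if (!SCHEME.matcher(iri).find()) {
                found.add(iri);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String line1 = "<http://dbpedia.org/data/Plasmodium_hegneri.xml> "
                + "<http://code.google.com/p/ldspider/ns#headerInfo> "
                + "_:header16125770191335188966549  <a> .";
        String line2 = "<a> <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>  <id_1> .";
        System.out.println(relativeIris(line1)); // prints [a]
        System.out.println(relativeIris(line2)); // prints [a, b, id_1]
    }
}
```

Every graph name in the sample file, and the subject and predicate of every second line, would be flagged this way.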

> Then I fired the following SPARQL command:
> root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time

...

> I got the following output:
> -----------------
> | a   | b   | c |
> =================
> | <a> | <b> |  

The answer is a) correct but b) very unhelpful. These are relative IRIs but no base is given. 

> After this I tried another SPARQL query(given below) for which I obtained
> an incorrect output:
> SPARQL query:
> root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time
> --loc=/home/Jena/try "select ?a?b?c where{ graph ?j1{?a <b>  <
> http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"
> Output:
> -------------
> | a | b | c |
> =============
> -------------
> Time: 0.095 sec

SPARQL, unlike N-Quads, allows relative IRIs. [2] In the absence of a BASE directive the IRIs are resolved relative to the query itself. In this case the current directory is used as the base, so <b> is understood as <CURRENT_DIR/b>. You can see this if you add --explain:

$ tdbquery --explain --loc=try "select ?a?b?c where{ graph ?j1{?a <b>  <http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"
00:58:24 INFO  exec                 :: QUERY
  SELECT  ?a ?b ?c
  WHERE
    { GRAPH ?j1
        { ?a <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml> }
    }
00:58:24 INFO  exec                 :: ALGEBRA
  (project (?a ?b ?c)
    (quadpattern (quad ?j1 ?a <file:///private/tmp/b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>)))
00:58:24 INFO  exec                 :: Execute ::   (?j1 ?a <file:///private/tmp/b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>)
-------------
| a | b | c |
=============
-------------

<b> is resolved to <file:///private/tmp/b> (I ran this in the temp dir). Thus no results.
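
To make the resolution step concrete, here is a minimal sketch in plain Java. It uses java.net.URI rather than Jena's own resolver, so it only approximates what ARQ does when it parses the query. The base http://example.org/work/ is a made-up stand-in for the directory the query was run from, and the class name RelativeIriDemo is hypothetical.

```java
import java.net.URI;

public class RelativeIriDemo {
    /** Resolves a possibly relative IRI reference against a base (RFC 3986, section 5). */
    public static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // Hypothetical base; tdbquery used the current directory instead.
        String base = "http://example.org/work/";

        // The relative <b> from the query becomes an absolute IRI.
        System.out.println(resolve(base, "b"));
        // prints http://example.org/work/b

        // An IRI that is already absolute is left unchanged.
        System.out.println(resolve(base, "http://dbpedia.org/data/Plasmodium_hegneri.xml"));
        // prints http://dbpedia.org/data/Plasmodium_hegneri.xml

        // The stored <b> has no scheme, so it can never equal the resolved form.
        System.out.println(URI.create("b").isAbsolute());
        // prints false
    }
}
```

The pattern in the query ends up asking for the resolved, absolute IRI, while the store holds the bare relative one, so the two never match.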

So the short answer is that the input data is broken.

Damian

[1] <http://www.w3.org/TR/n-quads/#sec-iri>
[2] <http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#relIRIs>