Posted to users@jena.apache.org by Glenn Proctor <gl...@eaglegenomics.com> on 2012/02/27 14:23:18 UTC
400 Unknown error from Fuseki when trying to load nq and n3 files
Hi
I am trying to load a large (500 MB) file of n-quads into an instance
of Fuseki. The command I am using is
s-put http://localhost:3030/dataset/data hgnc ~/Desktop/hgnc.nq
This fails with the following error:
400 Unknown: text/n-quads;charset=ascii
I'm assuming the 400 here is the HTTP status code for "bad request".
I get the same error whether I use a memory-backed or TDB-backed
Fuseki instance. I've tried a concatenated version of the file in
question and I get the same behaviour.
The file in question is the uncompressed version of
http://download.bio2rdf.org/data/hgnc/hgnc.nq.gz
Another file, in n3 format, gives a very similar error. I'm sure
there's something simple I'm missing, and I'd be grateful for any
pointers.
Regards
Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Andy Seaborne <an...@apache.org>.
> Glenn Proctor wrote:
> s-put http://localhost:3030/dataset/data hgnc ~/Desktop/hgnc.nq
>
> This fails with the following error:
> 400 Unknown: text/n-quads;charset=ascii
You're PUTing a file to a named graph "hgnc" in the dataset "dataset".
But you are PUTing N-Quads, which is multi-graph (even if all the data is
for the default graph, i.e. triples, the system does not know that when it's
deciding which parser to use).
You can't send quads into a graph.
If it's truly N-Triples, then use the file extension ".nt", or use curl/wget
and set the content type to "text/plain" or (IMHO better)
"application/n-triples". Or "text/turtle".
If you want to load quads into a dataset, you may be better off running the
bulk loader offline and then publishing the database.
Andy
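Andy's advice comes down to sending a Content-Type that names a single-graph syntax. The shell function below is a minimal sketch of picking that type from the file extension; the function name rdf_media_type is hypothetical, and the mapping covers only types named in this thread. It deliberately refuses .nq, because N-Quads cannot be PUT into one graph, and it leaves .n3 out because the right type for N3 is still an open question elsewhere in this thread.

```shell
# Hypothetical helper: map a file extension to an RDF media type suitable
# for a Graph Store Protocol PUT into a single graph.
rdf_media_type() {
  case "$1" in
    *.nt)  echo "application/n-triples" ;;  # Andy also mentions "text/plain"
    *.ttl) echo "text/turtle" ;;
    *.rdf) echo "application/rdf+xml" ;;
    *.nq)
      # N-Quads is a multi-graph format: it cannot go into one graph.
      echo "refusing $1: N-Quads cannot be PUT into a single graph" >&2
      return 1 ;;
    *)
      echo "no media type known for $1" >&2
      return 1 ;;
  esac
}
```

For example: curl -X PUT -H "Content-Type: $(rdf_media_type hgnc-100.nt)" --data-binary @hgnc-100.nt 'http://localhost:3030/dataset/data?default'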
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Glenn Proctor <gl...@eaglegenomics.com>.
Thanks Paolo - I know that tdbloader2 can handle n3 files; however, the
particular files I was using had some issues with malformed URIs, as
well as the ------------------- lines you spotted, so the conversion
step helped clean these up. This step wouldn't have been necessary if
the files had been properly formatted in the first place ...
Regards
Glenn.
On Tue, Feb 28, 2012 at 4:01 PM, Paolo Castagna
<ca...@googlemail.com> wrote:
> Glenn Proctor wrote:
>> Hi folks
>>
>> Thanks for the helpful replies. In the end I used rapper to convert
>> the n3/nq files to rdf/xml, and then tdbloader2 to bulk load the
>> resulting files into TDB. As Andy suggested this was much quicker than
>> doing everything via Fuseki.
>
> You can load N-Triples or N-Quads directly with tdbloader or tdbloader2,
> which should be even faster.
>
> Paolo
>
>>
>> I've now started a Fuseki server on top of the TDB I created and it's
>> working very well.
>>
>> Thanks for the help
>>
>> Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Paolo Castagna <ca...@googlemail.com>.
Glenn Proctor wrote:
> Hi folks
>
> Thanks for the helpful replies. In the end I used rapper to convert
> the n3/nq files to rdf/xml, and then tdbloader2 to bulk load the
> resulting files into TDB. As Andy suggested this was much quicker than
> doing everything via Fuseki.
You can load N-Triples or N-Quads directly with tdbloader or tdbloader2,
which should be even faster.
Paolo
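Paolo's suggestion, loading the N-Quads directly into TDB and then publishing the database as Andy proposed, could be scripted roughly as follows. This is a sketch that assumes the Jena tdbloader and fuseki-server scripts are on the PATH; the run and bulk_load_and_serve names are hypothetical, and setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Run a command, or just print it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: $1 = TDB directory, $2 = data file (N-Quads or
# N-Triples), $3 = service name under which Fuseki publishes the dataset.
bulk_load_and_serve() {
  # Offline bulk load straight from the data file, no RDF/XML conversion.
  run tdbloader "--loc=$1" "$2" || return 1
  # Serve the TDB database that was just built.
  run fuseki-server "--loc=$1" "$3"
}
```

For example, DRY_RUN=1 bulk_load_and_serve DB hgnc.nq /dataset shows the two commands; without DRY_RUN they are executed. Adding --update to the fuseki-server line, as in Glenn's in-memory test, would allow updates on the published dataset.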
>
> I've now started a Fuseki server on top of the TDB I created and it's
> working very well.
>
> Thanks for the help
>
> Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Glenn Proctor <gl...@eaglegenomics.com>.
Hi folks
Thanks for the helpful replies. In the end I used rapper to convert
the n3/nq files to rdf/xml, and then tdbloader2 to bulk load the
resulting files into TDB. As Andy suggested this was much quicker than
doing everything via Fuseki.
I've now started a Fuseki server on top of the TDB I created and it's
working very well.
Thanks for the help
Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Paolo Castagna <ca...@googlemail.com>.
Paolo Castagna wrote:
> Next step (mine or yours) is to check in the Fuseki source code whether
> PUT handles other RDF serializations (and if not, this could be a good
> candidate for a new feature request).
>
> I found the parseBody method in Fuseki, but I'll look at it in detail later.
> Here it is, in case another pair of eyes is faster than mine:
> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/src/main/java/org/apache/jena/fuseki/servlets/SPARQL_REST.java
After having seen Andy's reply... oh, yes!
No problem in Fuseki, this also works:
curl -X PUT -H "Content-Type: application/n-triples" -d@/tmp/hgnc-100.nt \
  'http://localhost:3030/dataset/data?default'
Andy, do we have a problem in soh [1], line 47?
$fileMediaTypes['n3'] = 'text/rdf+n3application/rdf+n3'
I am not sure which one is the correct one.
Paolo
[1] http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/soh
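Whichever of the two types is correct, the quoted soh value is two media types run together, and that combined string is exactly what Fuseki echoes back in the 400 error. A quick check like the one below would flag such a value before it is sent. It is a hypothetical helper using a simplified type/subtype pattern, not the full media-type registration grammar.

```shell
# Hypothetical check: succeed only if $1 is one "type/subtype" media type,
# optionally followed by ";parameters". Simplified pattern, not full RFC syntax.
is_single_media_type() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9.+-]+/[A-Za-z0-9.+-]+[[:space:]]*(;.*)?$'
}
```

With this check, 'text/turtle' and 'text/n-quads;charset=ascii' pass, while 'text/rdf+n3application/rdf+n3' fails because of its second slash.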
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Paolo Castagna <ca...@googlemail.com>.
Glenn Proctor wrote:
> Hi Paolo
>
> Thanks for looking into this for me. I've tried using the n3 file from
> the same source, filtering out any --------- lines, and including only
> 100 lines. The test file I'm using is
>
> http://dl.dropbox.com/u/23033/hgnc-100.n3
Hi Glenn,
ok... this file is correct.
You are using the SPARQL 1.1 Graph Store HTTP Protocol spec to upload
your data.
All the examples in that spec use RDF/XML (i.e. application/rdf+xml),
but this is not a good reason not to support other serializations.
I've also tried using curl instead of s-put (just to take another
variable off the table), and I get the same problem.
No problem with:
curl -X PUT -H "Content-Type: application/rdf+xml" -d@/tmp/hgnc-100.rdf \
  'http://localhost:3030/dataset/data?default'
Next step (mine or yours) is to check in the Fuseki source code whether
PUT handles other RDF serializations (and if not, this could be a good
candidate for a new feature request).
I found the parseBody method in Fuseki, but I'll look at it in detail later.
Here it is, in case another pair of eyes is faster than mine:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/src/main/java/org/apache/jena/fuseki/servlets/SPARQL_REST.java
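A note on the curl commands in this message: with -d@file, curl strips carriage returns and newlines from the file before sending it, whereas --data-binary sends the bytes unchanged. RDF/XML does not depend on line breaks, but N-Triples separates statements with them, so --data-binary is the safer choice for line-based formats. The sketch below wraps such a PUT and prints the HTTP status code; gsp_url and gsp_put are hypothetical names, and a graph IRI passed in is assumed to be URL-safe already (no '#', '&' or spaces).

```shell
# Hypothetical helper: build a Graph Store Protocol URL.
# $1 = graph store endpoint, $2 = "default" or a (URL-safe) graph IRI.
gsp_url() {
  if [ "$2" = "default" ]; then
    printf '%s?default\n' "$1"
  else
    printf '%s?graph=%s\n' "$1" "$2"
  fi
}

# Hypothetical helper: PUT file $4 with Content-Type $3 into graph $2 of
# endpoint $1, sending the body byte for byte, and print the HTTP status.
gsp_put() {
  curl -s -o /dev/null -w '%{http_code}\n' -X PUT \
    -H "Content-Type: $3" \
    --data-binary "@$4" \
    "$(gsp_url "$1" "$2")"
}
```

For example, gsp_put http://localhost:3030/dataset/data default application/n-triples /tmp/hgnc-100.nt should print a 2xx code if the server accepts the data, or 400 if it cannot parse the body.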
>
> There are no unusual lines (as far as I can see) and no ^^ prefixes.
> The file linked above validates using
> http://www.rdfabout.com/demo/validator/ and also on the command line
> using the rapper utility from the Raptor library (89 triples in
> total).
>
> However when I try to start a simple Fuseki instance using
>
> fuseki-server --update --mem /dataset
>
> and load in the n3 file using
>
> s-put http://localhost:3030/dataset/data default ~/Dropbox/Public/hgnc-100.n3
>
> I get
>
> 400 Unknown: text/rdf+n3application/rdf+n3
I think this value, "text/rdf+n3application/rdf+n3", is also a problem in
the s-put script. Should there be a comma? Or just one of the two types?
Paolo
> http://localhost:3030/dataset/data?default
>
> I can't see what it is about the file that Fuseki doesn't like.
>
> Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Glenn Proctor <gl...@eaglegenomics.com>.
Hi Paolo
Thanks for looking into this for me. I've tried using the n3 file from
the same source, filtering out any --------- lines, and including only
100 lines. The test file I'm using is
http://dl.dropbox.com/u/23033/hgnc-100.n3
There are no unusual lines (as far as I can see) and no ^^ prefixes.
The file linked above validates using
http://www.rdfabout.com/demo/validator/ and also on the command line
using the rapper utility from the Raptor library (89 triples in
total).
However when I try to start a simple Fuseki instance using
fuseki-server --update --mem /dataset
and load in the n3 file using
s-put http://localhost:3030/dataset/data default ~/Dropbox/Public/hgnc-100.n3
I get
400 Unknown: text/rdf+n3application/rdf+n3
http://localhost:3030/dataset/data?default
I can't see what it is about the file that Fuseki doesn't like.
Glenn.
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Paolo Castagna <ca...@googlemail.com>.
Glenn Proctor wrote:
> The file in question is the uncompressed version of
> http://download.bio2rdf.org/data/hgnc/hgnc.nq.gz
Hi Glenn,
maybe this is not the problem (or maybe it is).
I've just noticed that the hgnc.nq.gz file above starts/ends
with "--------------------" (i.e. it's not a valid N-Quads
file).
Paolo
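If the separator lines were the only problem, they could be dropped with a small filter before loading. A minimal sketch, assuming the separators are lines made up solely of dashes (possibly with trailing whitespace); the function name is hypothetical.

```shell
# Hypothetical filter: copy stdin to stdout, dropping lines made only of
# dashes such as the "--------------------" separators in hgnc.nq.
strip_separator_lines() {
  grep -Ev '^-+[[:space:]]*$'
}
```

For example: gunzip -c hgnc.nq.gz | strip_separator_lines > hgnc-clean.nq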
Re: 400 Unknown error from Fuseki when trying to load nq and n3 files
Posted by Paolo Castagna <ca...@googlemail.com>.
Glenn Proctor wrote:
> The file in question is the uncompressed version of
> http://download.bio2rdf.org/data/hgnc/hgnc.nq.gz
>
> Another file, in n3 format, gives a very similar error. I'm sure
> there's something simple I'm missing, and I'd be grateful for any
> pointers.
Even after filtering out the lines with "-------...", there are lines
such as:
<http://bio2rdf.org/hugo:A1BG> <http://bio2rdf.org/hugo_resource:approvedSymbol> "A1BG"^^xsd:string <http://bio2rdf.org/hgnc_record:5> .
I don't think ^^xsd:string is correct: N-Quads files do not have any
notion of prefixes, so the datatype has to be written as a full IRI,
i.e. ^^<http://www.w3.org/2001/XMLSchema#string>.
Paolo
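Since N-Quads has no prefixes, the prefixed datatype has to be expanded to the full XML Schema IRI. A minimal sed-based sketch, assuming xsd: is the only prefix in use and always follows a closing quote and ^^; the function name is hypothetical, and any other prefixed names in the data would still need separate handling.

```shell
# Hypothetical filter: rewrite "..."^^xsd:NAME as
# "..."^^<http://www.w3.org/2001/XMLSchema#NAME>. Basic regex: the ^
# characters are literal here because they do not start the pattern.
expand_xsd_prefix() {
  sed 's|"^^xsd:\([A-Za-z]*\)|"^^<http://www.w3.org/2001/XMLSchema#\1>|g'
}
```

For example: expand_xsd_prefix < hgnc.nq > hgnc-fixed.nq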