
Posted to users@jena.apache.org by Vincent Ventresque <vi...@ens-lyon.fr> on 2019/01/28 11:04:03 UTC

Export named graph from TDB to several ntriples files

Hello,

I want to export a named graph that is stored in a TDB dataset, and I
want to store the output in several files, because the named graph
contains about 9.5 M triples.

My idea is to use the "split" command to cut the output of the export
into pieces. However, this approach requires N-Triples or N-Quads (one
triple per line), so that no file is cut in the middle of a statement.
One triple per line is also more practical if I want to transform the
data with perl or sed.
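Because each N-Triples line is a complete triple, cutting the stream on line boundaries is always safe. Here is a Ruby sketch of what `split -l 500000` does with such a stream. The method name `split_lines` and the naming scheme (prefix, three-digit counter, `.nt`) are hypothetical. They differ from split's own `aa`, `ab`, ... suffixes.

```ruby
# Hypothetical helper: copy `lines_per_file` lines of `input` into each
# new file named <prefix><NNN>.nt, like `split -l N`. Safe for N-Triples
# and N-Quads because every line is one complete statement.
def split_lines(input, lines_per_file, prefix)
  paths = []
  out = nil
  input.each_line.with_index do |line, i|
    if (i % lines_per_file).zero?
      out&.close                                   # finish the previous chunk
      path = format("%s%03d.nt", prefix, paths.size)
      paths << path
      out = File.open(path, "w")
    end
    out.write(line)
  end
  out&.close
  paths                                            # names of the files written
end
```

For example, `split_lines($stdin, 500_000, "BnfTextTitres-")` would write `BnfTextTitres-000.nt`, `BnfTextTitres-001.nt`, and so on.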

I found a solution with s-query, but I had to edit the Ruby s-query
script to get N-Triples (see below).

There are other possible command-line utilities for the export: "s-get"
and "tdbdump". If I understand correctly, "tdbdump" outputs N-Quads,
but it cannot export only part of the data: everything is exported at
once. "s-get" lets me select a named graph in the dataset, but I
couldn't change the output format.

Are there better solutions to get an export in several files?

Thanks in advance,

VV.



~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~

1.1) Edit s-query ruby script (add nt)

-- l. 572 : when  "json","xml","text","csv","tsv","nt"
-- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
-- l. 515 : opts.on('--output=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
-- l. 519 : opts.on('--accept=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
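The edit above works because Ruby's `OptionParser` accepts a list of allowed values for an option and passes the matching symbol to the block. Below is a minimal sketch of that mechanism. It is not the s-query code itself. The `ACCEPT_FOR` table and the `parse_output` helper are hypothetical names, and the standard media type for each format is an assumption.

```ruby
require "optparse"

# Hypothetical mapping from --output symbols to Accept media types.
ACCEPT_FOR = {
  json: "application/sparql-results+json",
  xml:  "application/sparql-results+xml",
  text: "text/plain",
  csv:  "text/csv",
  tsv:  "text/tab-separated-values",
  nt:   "application/n-triples" # the value added for CONSTRUCT output
}.freeze

# Parse --output=TYPE. OptionParser only accepts the listed symbols and
# yields the matching one (e.g. "nt" -> :nt).
def parse_output(argv)
  output = nil
  OptionParser.new do |opts|
    opts.on("--output=TYPE", ACCEPT_FOR.keys, "Result format") { |t| output = t }
  end.parse!(argv)
  output
end
```

If an option was never declared, `OptionParser` raises `OptionParser::InvalidOption`. That is the error s-get reports for `--output=text` in solution 3. If the option is declared but given a value outside the list, it raises `OptionParser::InvalidArgument` instead.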

1.2) Command

/my/path/to/fuseki/bin/s-query \
  --service=http://localhost:3030/BnF_text_v2/ \
  "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" \
  --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
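The same export can be written as a plain SPARQL-protocol request: the CONSTRUCT query goes in the `query` parameter, and the `Accept` header requests N-Triples. Below is a Ruby sketch using the standard `net/http` library that builds the request without sending it. The helper name `construct_request` is hypothetical, and this is not necessarily how s-query itself issues the request.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: a SPARQL-protocol GET for a CONSTRUCT over one
# named graph, asking for N-Triples (one triple per line, easy to split).
def construct_request(service, graph_iri)
  query = "CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <#{graph_iri}> { ?s ?p ?o } }"
  uri = URI(service)
  uri.query = URI.encode_www_form(query: query) # URL-encodes the query text
  req = Net::HTTP::Get.new(uri)
  req["Accept"] = "application/n-triples"
  req
end

req = construct_request("http://localhost:3030/BnF_text_v2/", "http://bnf_titres")
# To run it against a live server, stream the body to stdout:
# Net::HTTP.start(req.uri.host, req.uri.port) do |http|
#   http.request(req) { |res| res.read_body { |chunk| $stdout.write(chunk) } }
# end
```

Streaming the body with `read_body` avoids holding all ~9.5 M triples in memory before they reach `split`.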

~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~

/my/path/to/jena/bin/tdbdump \
  --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 \
  --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-

=> Unknown argument: graph
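tdbdump has no `--graph` argument. One workaround is to dump the whole dataset, keep only the quads whose graph label is `<http://bnf_titres>`, and drop that label to obtain N-Triples. Below is a rough Ruby sketch. It assumes one statement per line with whitespace before the final `.`. The term pattern is simplified (it does not validate IRIs or escapes), and the names `nquad_terms` and `triples_in_graph` are hypothetical.

```ruby
# Simplified N-Quads term: IRI, blank node, or literal with an optional
# language tag or datatype. Not a complete N-Quads parser.
TERM = /\A\s*(<[^>]*>|_:[^\s<"]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)/

# Split one line into its terms: 3 for a default-graph triple, 4 for a
# quad. Returns nil for blank lines and comments.
def nquad_terms(line)
  rest = line.strip
  return nil if rest.empty? || rest.start_with?("#")
  terms = []
  while (m = TERM.match(rest))
    terms << m[1]
    rest = m.post_match
  end
  terms
end

# Yield an N-Triples line for every quad whose graph label is graph_iri;
# default-graph triples (only three terms) are skipped.
def triples_in_graph(io, graph_iri)
  label = "<#{graph_iri}>"
  io.each_line do |line|
    terms = nquad_terms(line)
    next unless terms && terms.size == 4 && terms[3] == label
    yield "#{terms[0..2].join(' ')} .\n"
  end
end
```

The filter could sit between tdbdump and split in a pipe, for instance in a hypothetical script that calls `triples_in_graph($stdin, "http://bnf_titres") { |t| $stdout.write(t) }`.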

~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~

/my/path/to/fuseki/bin/s-get http://localhost:3030/BnF_text_v2/data \
  http://bnf_titres --output=text | split -l 500000 - \
  --additional-suffix=.nt BnfTextTitres-

=> /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
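s-get talks to the dataset's Graph Store Protocol endpoint (`/data` in the command above). The server picks the syntax from the `Accept` header, so asking for N-Triples there avoids the Turtle output. Below is a Ruby sketch of such a request using `net/http`. It builds the request but does not send it, and the helper `graph_request` is hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: GET one named graph from a Graph Store Protocol
# endpoint, asking for N-Triples instead of the script's default Turtle.
def graph_request(endpoint, graph_iri, accept = "application/n-triples")
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(graph: graph_iri) # ?graph=<encoded IRI>
  req = Net::HTTP::Get.new(uri)
  req["Accept"] = accept
  req
end

req = graph_request("http://localhost:3030/BnF_text_v2/data", "http://bnf_titres")
# Against a running server, stream the body (e.g. into `split -l 500000`):
# Net::HTTP.start(req.uri.host, req.uri.port) do |http|
#   http.request(req) { |res| res.read_body { |chunk| $stdout.write(chunk) } }
# end
```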

--

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Ok, I'll do that. I'll try to create the pull request within a few days.

Have a nice day

Vincent

On 31/01/2019 16:38, Andy Seaborne wrote:
>
>
> On 31/01/2019 15:01, vincent ventresque wrote:
>> Many thanks for the detailed explanation, all the more so as I
>> struggled for a long time with a similar problem (see my question on
>> Stack Overflow:
>> https://stackoverflow.com/questions/52221250/php-easyrdf-unable-to-get-graph-from-construct-query
>> ).
>>
>> About the pull request: should I remove the 'plain JSON' option?
>
> IMHO yes: it is a legacy, non-standard format. Because of the
> overlap with JSON-LD, there is a good chance of confusion. For a
> convenience tool like SOH/s-get, :json => JSON-LD.
>
> There is always curl!
>
>> Here is the code of the new function I added:
>>
>> ~~~~~~~~~~~~~~~~~~~~~~
>>
>> def set_output_format(type)
>>
>>    case type
>>    when :nt
>>      #print "here is ntriples!\n"
>>      $accept_rdf = $mtNTriples
>>    when :xml
>>      #print "here is XML!\n"
>>      $accept_rdf = $mtAppXML
>>    when :jsonLD
>>      #print "here is JSONLD!\n"
>>      $accept_rdf = $mtJSONLD
>>    when :json
>>      #print "here is plain JSON"
>>      $accept_rdf = $mtAppJSON
>>    end
>>
>> end
>>
>> ~~~~~~~~~~~~~~~~~~~~
>>
>>
>> On 31/01/2019 15:21, Andy Seaborne wrote:
>>> Jena has its own content negotiation mechanism. I couldn't find an
>>> existing one at the time, and it has turned out to be "quite
>>> complicated" for linked data, because control of the defaults, and
>>> of the choice when there is no exact match, is important.
>>>
>>> So we have control of the corner cases and defaults.
>>>
>>> Internally in Fuseki:
>>>
>>> "application/json" isn't registered for graphs or datasets.
>>>
>>> There are two related registrations:
>>>
>>> "application/rdf+json"
>>> "application/ld+json"
>>>
>>> Fuseki doesn't do any "+" processing.
>>>
>>> Fuseki could default "application/json" to "application/ld+json".
>>>
>>>
>>> curl without a header sends:
>>>
>>> "Accept: */*"
>>>
>>> Fuseki chooses the first in its internal list of choices.
>>>
>>> That is not the same as sending no "Accept" header, in which case
>>> Fuseki chooses a default, although both no header and */* give the
>>> server a free choice of what to return.
>>>
>>> curl -v -g 'http://localhost:3030/ds?query=ASK{}'
>>> curl -v -g --header 'Accept:' 'http://localhost:3030/ds?query=ASK{}'
>>>
>>> Content negotiation is quite sensitive to client setup and, well,
>>> some HTTP clients hate it, don't set conneg headers, and then can't
>>> handle the results.
>>>
>>> Some servers don't have their content types set up.
>>>
>>> On the client side, Jena pokes about in the file name to use the 
>>> extension if all else fails.
>>>
>>>     Andy
>>>
>>> On 31/01/2019 13:27, vincent ventresque wrote:
>>>> Sorry, let me sum up the previous messages:
>>>>
>>>> 1) I wanted to export a named graph from TDB to N-Triples
>>>>
>>>> 2) Andy advised me to modify s-get, which I did
>>>>
>>>> 3) when modifying s-get, I noticed two wrong content types:
>>>> application/json and application/n-quads; both give RDF/XML output
>>>>
>>>> 4) Andy suggested it came from the s-get settings
>>>>
>>>> 5) I showed that commenting out the settings in s-get has no effect
>>>> AND that the problem is the same with curl.
>>>>
>>>> 6) my purpose is also to understand how all this stuff works!
>>>>
>>>>
>>>> On 31/01/2019 14:22, Martynas Jusevičius wrote:
>>>>> Vincent,
>>>>>
>>>>> can you start by explaining what you are trying to do and why, rather
>>>>> than describing how you're doing it?
>>>>>
>>>>> On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
>>>>> <vi...@ens-lyon.fr> wrote:
>>>>>> Sorry, I should have explained more clearly: the previous messages
>>>>>> were about default settings in s-get, and when creating a new function
>>>>>> to handle the --output option, I noticed there was a wrong content type
>>>>>> in s-get for plain JSON (see my s-get file here:
>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download).
>>>>>>
>>>>>> My purpose was to show that the problem isn't specific to s-get,
>>>>>> since it's the same with curl. I also noticed the same problem with
>>>>>> N-Quads.
>>>>>>
>>>>>> curl --header 'Accept: application/n-quads' \
>>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>> <rdf:RDF
>>>>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>>       xmlns:j.0="http://example.org/" >
>>>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>>>       <j.0:tata>coucou</j.0:tata>
>>>>>>     </rdf:Description>
>>>>>> </rdf:RDF>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 31/01/2019 14:12, ajs6f wrote:
>>>>>>> I'm not sure what you expect to get back from Fuseki with an
>>>>>>> "application/json" mimetype? There is no W3C-spec plain-JSON RDF
>>>>>>> serialization that I know of. I suppose there's the old Talis idea:
>>>>>>>
>>>>>>> https://jena.apache.org/documentation/io/rdf-json.html
>>>>>>>
>>>>>>> but I can't imagine that's what you're looking for.
>>>>>>>
>>>>>>> ajs6f
>>>>>>>
>>>>>>>> On Jan 31, 2019, at 8:09 AM, vincent ventresque 
>>>>>>>> <vi...@ens-lyon.fr> wrote:
>>>>>>>>
>>>>>>>> It seems that the problem is completely independent of s-get
>>>>>>>> (see these results with curl below). So I think there's a
>>>>>>>> default setting somewhere in Fuseki itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> #~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>
>>>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/json' \
>>>>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>>>> <rdf:RDF
>>>>>>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>>>>       xmlns:j.0="http://example.org/" >
>>>>>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>>>>>       <j.0:tata>coucou</j.0:tata>
>>>>>>>>     </rdf:Description>
>>>>>>>> </rdf:RDF>
>>>>>>>>
>>>>>>>>
>>>>>>>> #~~~~~~~  --header 'Accept: application/rdf+json' ~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>
>>>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' \
>>>>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>>>> {
>>>>>>>>     "http://example.org/titi" : {
>>>>>>>>       "http://example.org/tata" : [ {
>>>>>>>>         "type" : "literal" ,
>>>>>>>>         "value" : "coucou"
>>>>>>>>       }
>>>>>>>>        ]
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 31/01/2019 12:58, vincent ventresque wrote:
>>>>>>>>> Thanks for your quick reply!
>>>>>>>>>
>>>>>>>>>> $mtAppJSON isn't used.
>>>>>>>>> I think my previous message wasn't clear: I meant raw JSON and
>>>>>>>>> not JSON-LD (my code now works for both, and I use $mtAppJSON;
>>>>>>>>> but I had to replace 'application/json' with
>>>>>>>>> 'application/rdf+json' in order to get JSON instead of XML;
>>>>>>>>> see the file here:
>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The settings are: ...
>>>>>>>>> I made a little test: comment out these lines and the "names"
>>>>>>>>> part, and you'll get XML!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 31/01/2019 12:48, Andy Seaborne wrote:
>>>>>>>>>> On 31/01/2019 11:26, vincent ventresque wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I found the origin of the problem for JSON: $mtAppJSON
>>>>>>>>>>> had the value
>>>>>>>>>>>
>>>>>>>>>>> 'application/json'
>>>>>>>>>> $mtAppJSON isn't used.
>>>>>>>>>>
>>>>>>>>>> "application/rdf+json"
>>>>>>>>>> isn't JSON-LD (it's the old Talis format).
>>>>>>>>>>
>>>>>>>>>> There is:
>>>>>>>>>>
>>>>>>>>>> $mtJSONLD           = 'application/ld+json'
>>>>>>>>>>
>>>>>>>>>>> it has to be replaced with
>>>>>>>>>>>
>>>>>>>>>>> 'application/rdf+json'
>>>>>>>>>>>
>>>>>>>>>>> I've updated the file here:
>>>>>>>>>>>
>>>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
>>>>>>>>>>>
>>>>>>>>>>> Maybe I'll submit a pull request as Andy suggested,
>>>>>>>>>>> but I'd like to understand why 'application/json' returns
>>>>>>>>>>> XML. The same thing happens for N-Quads: I tried to
>>>>>>>>>>> replace
>>>>>>>>>>>
>>>>>>>>>>> $mtNQuads = 'application/n-quads'
>>>>>>>>>>>
>>>>>>>>>>> with
>>>>>>>>>>>
>>>>>>>>>>> $mtNQuads = 'application/x-trig'
>>>>>>>>>>>
>>>>>>>>>>> but I still get XML...
>>>>>>>>>> The settings are:
>>>>>>>>>>
>>>>>>>>>> # Default for GET
>>>>>>>>>> # At least allow anything (and hope!)
>>>>>>>>>> $accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
>>>>>>>>>> # Datasets
>>>>>>>>>> $accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
>>>>>>>>>> # For SPARQL query
>>>>>>>>>> $accept_results="#{$mtSparqlResultsJ} , #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"
>>>>>>>>>>
>>>>>>>>>> # Accept any in case of trouble.
>>>>>>>>>> $accept_rdf="#{$accept_rdf} , */*;q=0.1"
>>>>>>>>>> $accept_results="#{$accept_results} , */*;q=0.1"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Is there some kind of default setting somewhere (if the content
>>>>>>>>>>> type isn't recognized by Fuseki, the response is XML)?
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>> RDF/XML for graphs, N-Quads for datasets.
>>>>>>>>>>
>>>>>>>>>> Run Fuseki/full with "-v" and it should print the content 
>>>>>>>>>> negotiation details.
>>>>>>>>>>
>>>>>>>>>>       Andy
>>>>>>>>>>
>>>>>>>>>>> Thanks in advance
>>>>>>>>>>>
>>>>>>>>>>> VV
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 29/01/2019 17:11, vincent ventresque wrote:
>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again for your suggestion to modify the s-get script; it
>>>>>>>>>>>> helped me understand the Ruby utilities and HTTP requests (I
>>>>>>>>>>>> often use the Ruby scripts but had never really looked inside).
>>>>>>>>>>>>
>>>>>>>>>>>> I don't know how to submit a pull request, and I'm not a Ruby
>>>>>>>>>>>> expert! So I've put a small test file here:
>>>>>>>>>>>>
>>>>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
>>>>>>>>>>>>
>>>>>>>>>>>> -- added "--output" to the options + created a new function
>>>>>>>>>>>> (set_output_format)
>>>>>>>>>>>>
>>>>>>>>>>>> -- it works for N-Triples, XML and JSON-LD
>>>>>>>>>>>>
>>>>>>>>>>>> -- it doesn't work for JSON (returns XML...)
>>>>>>>>>>>>
>>>>>>>>>>>> N.B.: in this test file, I've removed large parts of the
>>>>>>>>>>>> original code to improve readability.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 28/01/2019 15:28, Vincent Ventresque wrote:
>>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many thanks for these ideas, I'm going to try the curl & 
>>>>>>>>>>>>> riot solutions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Modify the s-get script to handle --output and set the 
>>>>>>>>>>>>>> "Accept:" header then please submit a pull request for 
>>>>>>>>>>>>>> the changes
>>>>>>>>>>>>> I had tried to modify the s-get script in the same way as
>>>>>>>>>>>>> s-query, but it didn't work; if I have a moment I'll try to
>>>>>>>>>>>>> understand how the options are handled.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 28/01/2019 14:19, Andy Seaborne wrote:
>>>>>>>>>>>>>> Ways I can think of:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1/ Modify the s-get script to handle --output and set the 
>>>>>>>>>>>>>> "Accept:" header then please submit a pull request for 
>>>>>>>>>>>>>> the changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2/ Use curl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> curl --header 'Accept: application/n-triples' \
>>>>>>>>>>>>>> 'http://localhost:3030/ds?graph=http://bnf_titres'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3/ Parse the s-get output:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> s-get ... | riot --syntax TTL
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       Andy
>>>>>>>>>>>>>>>
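The content negotiation discussed in this thread can be illustrated with a deliberately simplified sketch. It parses the `Accept` header with its q-values and picks the best media type the server offers. `*/*` selects the first offered type, and when nothing offered matches it falls back to a default (RDF/XML for graphs, per Andy's reply). The offered list and its order are assumptions for illustration, not Fuseki's actual registration table. Real negotiation, Jena's included, also handles cases such as `type/*` ranges.

```ruby
# Assumed, illustrative media types a server offers for a graph, in its
# order of preference (not Fuseki's real registration table).
GRAPH_TYPES = %w[
  text/turtle application/n-triples application/rdf+xml
  application/ld+json application/rdf+json
].freeze
GRAPH_DEFAULT = "application/rdf+xml" # fallback for graphs, per the thread

# Parse "a/b;q=0.9 , c/d" into [[media_range, q], ...]; q defaults to 1.0.
def parse_accept(header)
  header.split(",").map do |part|
    range, *params = part.split(";").map(&:strip)
    q = params.map { |param| param[/\Aq=([\d.]+)\z/, 1] }.compact.first
    [range, q ? q.to_f : 1.0]
  end
end

# Pick the offered type with the highest q (earliest wins on ties);
# "*/*" stands for the first offered type; no match gives the default.
def negotiate(accept_header, offered = GRAPH_TYPES, default = GRAPH_DEFAULT)
  return default if accept_header.nil? || accept_header.strip.empty?
  best = nil
  parse_accept(accept_header).each do |range, q|
    next if q <= 0
    type = range == "*/*" ? offered.first : (offered.include?(range) ? range : nil)
    best = [type, q] if type && (best.nil? || q > best[1])
  end
  best ? best[0] : default
end
```

With this toy table, `application/json` and `application/n-quads` both fall back to `application/rdf+xml`. This mirrors the curl results for a graph in the quoted messages. By contrast, `application/rdf+json` matches directly.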

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by Andy Seaborne <an...@apache.org>.


On 31/01/2019 15:01, vincent ventresque wrote:
> Many thanks for the detailed explanation, all the more so as I
> struggled for a long time with a similar problem (see my question on
> Stack Overflow:
> https://stackoverflow.com/questions/52221250/php-easyrdf-unable-to-get-graph-from-construct-query
> ).
>
> About the pull request: should I remove the 'plain JSON' option?

IMHO yes: it is a legacy, non-standard format. Because of the overlap
with JSON-LD, there is a good chance of confusion. For a convenience
tool like SOH/s-get, :json => JSON-LD.

There is always curl!
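Following Andy's advice that `:json` should mean JSON-LD in a convenience tool like s-get, the output-format function could shrink to a lookup table. This is only a sketch. The symbols, the hypothetical helper name `accept_for_output`, and the literal media types (used instead of s-get's `$mt...` globals) are all assumptions, not the code of the eventual pull request.

```ruby
# Sketch: --output symbol => Accept header value for a graph GET.
# :json is mapped to JSON-LD; the legacy RDF/JSON format is not offered.
OUTPUT_TYPES = {
  nt:     "application/n-triples",
  ttl:    "text/turtle",
  xml:    "application/rdf+xml",
  jsonld: "application/ld+json",
  json:   "application/ld+json"
}.freeze

# Hypothetical helper: return the Accept value for an output type,
# raising ArgumentError for unknown types.
def accept_for_output(type)
  OUTPUT_TYPES.fetch(type) do
    raise ArgumentError, "unsupported output type: #{type.inspect}"
  end
end
```

In s-get this would be used as `$accept_rdf = accept_for_output(type)`. Raising on an unknown type makes a typo visible instead of silently keeping the default `Accept` header.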

> Here 
> is the code of the new function I added :
> 
> ~~~~~~~~~~~~~~~~~~~~~~
> 
> def set_ouput_format(type)
> 
>    case type
>    when :nt
>      #print "here is ntriples!\n"
>      $accept_rdf = $mtNTriples
>    when :xml
>      #print "here is XML!\n"
>      $accept_rdf = $mtAppXML
>    when :jsonLD
>      #print "here is JSONLD!\n"
>      $accept_rdf = $mtJSONLD
>    when :json
>      #print "here is plain JSON"
>      $accept_rdf = $mtAppJSON
>    end
> 
> end
> 
> ~~~~~~~~~~~~~~~~~~~~
> 
> 
> Le 31/01/2019 à 15:21, Andy Seaborne a écrit :
>> Jena has it's own content negotiation mechanism - I couldn't find an 
>> existing one at the time and it has turned out to be "quite 
>> complicated" for linked data as control of the defaults and choices 
>> when not an exact match is important.
>>
>> So we have control of the corner cases and defaults.
>>
>> Internally in Fuseki:
>>
>> "application/json" isn't registered for graphs or datasets.
>>
>> There are two related registrations:
>>
>> "application/rdf+json"
>> "application/ld+json"
>>
>> Fuskei doesn't do an "+" processing.
>>
>> Fuseki could default "application/json" to "application/ld+json".
>>
>>
>> curl without a header sends:
>>
>> "Accept: */*"
>>
>> Fuseki chooses the first it is internal list of choices.
>>
>> It is not the same as sending no "Accept" when Fuseki chooses a 
>> default although none and */* give the server free choice of return.
>>
>> curl -v -g 'http://localhost:3030/ds?query=ASK{}'
>> curl -v -g --header 'Accept:' 'http://localhost:3030/ds?query=ASK{}'
>>
>> Content negotiation is quite sensitive to client setup and, well, some 
>> HTTP clients hate and don't set conneg then can't handle the results.
>>
>> Some servers don't have content type setup.
>>
>> On the client side, Jena pokes about in the file name to use the 
>> extension if all else fails.
>>
>>     Andy
>>
>> On 31/01/2019 13:27, vincent ventresque wrote:
>>> Sorry, let me sum up the previous messages :
>>>
>>> 1) I wanted to export a named graph from tdb to ntriples
>>>
>>> 2) Andy advised to modify s-get, which I did
>>>
>>> 3) when modifying s-get, I noticed there were 2 wrong content-types : 
>>> application/json & application/n-quads ; both give rdf-xml output
>>>
>>> 4) Andy suggested it came from s-get settings
>>>
>>> 5) I showed that commenting the settings in s-get have no effect AND 
>>> that the problem is the same with curl.
>>>
>>> 6) my purpose is also to understand how all this stuff works!
>>>
>>>
>>> Le 31/01/2019 à 14:22, Martynas Jusevičius a écrit :
>>>> Vincent,
>>>>
>>>> can you start by explaining what you are trying to do and why, rather
>>>> describing how you're doing it?
>>>>
>>>> On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
>>>> <vi...@ens-lyon.fr> wrote:
>>>>> Sorry, I should have explained more clearly : the previous messages
>>>>> where about default settings in s-get, and when creating a new 
>>>>> function
>>>>> to handle --output option, I noticed there was a wrong content-type in
>>>>> s-get for plain json (see my s-get file here :
>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download). 
>>>>>
>>>>>
>>>>>
>>>>> My purpose was to demonstrate that the problem isn't linked to s-get,
>>>>> since it's the same with curl. Besides, I noticed the same problem 
>>>>> with
>>>>> n-quads.
>>>>>
>>>>> curl --header 'Accept: application/n-quads'
>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>> <rdf:RDF
>>>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>       xmlns:j.0="http://example.org/" >
>>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>>       <j.0:tata>coucou</j.0:tata>
>>>>>     </rdf:Description>
>>>>> </rdf:RDF>
>>>>>
>>>>>
>>>>>
>>>>> Le 31/01/2019 à 14:12, ajs6f a écrit :
>>>>>> I'm not sure what you expect to get back from Fuseki with an 
>>>>>> "application/json" mimetype? There is no W3C-spec plain-JSON RDF 
>>>>>> serialization that I know of. I suppose there's the old Tallis idea:
>>>>>>
>>>>>> https://jena.apache.org/documentation/io/rdf-json.html
>>>>>>
>>>>>> but I can't imagine that's what you're looking for.
>>>>>>
>>>>>> ajs6f
>>>>>>
>>>>>>> On Jan 31, 2019, at 8:09 AM, vincent ventresque 
>>>>>>> <vi...@ens-lyon.fr> wrote:
>>>>>>>
>>>>>>> It seems that the problem is completely independent from s-get 
>>>>>>> (see these results with curl below). So I think there's a default 
>>>>>>> setting somewhere in Fuseki itself.
>>>>>>>
>>>>>>>
>>>>>>> #~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>
>>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/json' 
>>>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>>> <rdf:RDF
>>>>>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>>>       xmlns:j.0="http://example.org/" >
>>>>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>>>>       <j.0:tata>coucou</j.0:tata>
>>>>>>>     </rdf:Description>
>>>>>>> </rdf:RDF>
>>>>>>>
>>>>>>>
>>>>>>> #~~~~~~~  --header 'Accept: application/rdf+json' 
>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>
>>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: 
>>>>>>> application/rdf+json' 
>>>>>>> 'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>>> {
>>>>>>>     "http://example.org/titi" : {
>>>>>>>       "http://example.org/tata" : [ {
>>>>>>>         "type" : "literal" ,
>>>>>>>         "value" : "coucou"
>>>>>>>       }
>>>>>>>        ]
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Le 31/01/2019 à 12:58, vincent ventresque a écrit :
>>>>>>>> Thanks for your quick reply!
>>>>>>>>
>>>>>>>>> $mtAppJSON isn't used.
>>>>>>>> I think my previous msg wasn't clear : I meant raw json and not 
>>>>>>>> json-ld (my code works now for both, and I use  $mtAppJSON ; but 
>>>>>>>> I had to replace 'application/json' with 'application/rdf+json' 
>>>>>>>> in order to get json instead of XML ; see the file here 
>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download) 
>>>>>>>>
>>>>>>>>
>>>>>>>>> The settings are: ...
>>>>>>>> I made a little test : comment these lines and the "names" part, 
>>>>>>>> and you'll get XML!
>>>>>>>>
>>>>>>>>
>>>>>>>> Le 31/01/2019 à 12:48, Andy Seaborne a écrit :
>>>>>>>>> On 31/01/2019 11:26, vincent ventresque wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I found the origin of the problem for json : the $mtAppJSON 
>>>>>>>>>> had the value
>>>>>>>>>>
>>>>>>>>>> 'application/json'
>>>>>>>>> $mtAppJSON isn't used.
>>>>>>>>>
>>>>>>>>> "application/rdf+json"
>>>>>>>>> isn't JSON-LD (it's the old Talis format).
>>>>>>>>>
>>>>>>>>> There is:
>>>>>>>>>
>>>>>>>>> $mtJSONLD           = 'application/ld+json'
>>>>>>>>>
>>>>>>>>>> it has to be replaced with
>>>>>>>>>>
>>>>>>>>>> 'application/rdf+json'
>>>>>>>>>>
>>>>>>>>>> I've updated the file here :
>>>>>>>>>>
>>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Maybe I'm going to submit a pull request as Andy suggested, 
>>>>>>>>>> but I'd like to understand why 'application/json' returns xml. 
>>>>>>>>>> Besides, it's the same thing for nquads : I tried to replace
>>>>>>>>>>
>>>>>>>>>> $mtNQuads = 'application/n-quads'
>>>>>>>>>>
>>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> $mtNQuads = 'application/x-trig'
>>>>>>>>>>
>>>>>>>>>> but still have xml...
>>>>>>>>> The settings are:
>>>>>>>>>
>>>>>>>>> # Default for GET
>>>>>>>>> # At least allow anything (and hope!)
>>>>>>>>> $accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , 
>>>>>>>>> #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
>>>>>>>>> # Datasets
>>>>>>>>> $accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
>>>>>>>>> # For SPARQL query
>>>>>>>>> $accept_results="#{$mtSparqlResultsJ} , 
>>>>>>>>> #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"
>>>>>>>>>
>>>>>>>>> # Accept any in case of trouble.
>>>>>>>>> $accept_rdf="#{$accept_rdf} , */*;q=0.1"
>>>>>>>>> $accept_results="#{$accept_results} , */*;q=0.1"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Is there a kind of default setting somewhere (if content-type 
>>>>>>>>>> isn't recognized in Fuseki, the response is xml) ?
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> RDF/XML for graphs, N-Quads for datasets.
>>>>>>>>>
>>>>>>>>> Run Fuseki/full with "-v" and it should print the content 
>>>>>>>>> negotiation details.
>>>>>>>>>
>>>>>>>>>       Andy
>>>>>>>>>
>>>>>>>>>> Thanks in advance
>>>>>>>>>>
>>>>>>>>>> VV
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ok, maybe I'm going to submit a pull request, but I'd
>>>>>>>>>>
>>>>>>>>>> Le 29/01/2019 à 17:11, vincent ventresque a écrit :
>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>
>>>>>>>>>>> Thanks again for your idea to modify the s-get script, it 
>>>>>>>>>>> helped me understand ruby utilities and http requests (I 
>>>>>>>>>>> often use the ruby scripts but never really looked inside).
>>>>>>>>>>>
>>>>>>>>>>> Don't know how to submit a pull request, and I'm not a ruby 
>>>>>>>>>>> expert! Therefore I've put a small test file here :
>>>>>>>>>>>
>>>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- added "--output" in options + created a new function 
>>>>>>>>>>> (set_output_format)
>>>>>>>>>>>
>>>>>>>>>>> -- it works for ntriples, xml, Json-LD,
>>>>>>>>>>>
>>>>>>>>>>> -- doesn't work for json (returns xml...)
>>>>>>>>>>>
>>>>>>>>>>> N.B. : in this test file, I've removed large parts of the 
>>>>>>>>>>> original code in order to improve readability
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Le 28/01/2019 à 15:28, Vincent Ventresque a écrit :
>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>
>>>>>>>>>>>> Many thanks for these ideas, I'm going to try the curl & 
>>>>>>>>>>>> riot solutions.
>>>>>>>>>>>>
>>>>>>>>>>>>> Modify the s-get script to handle --output and set the 
>>>>>>>>>>>>> "Accept:" header then please submit a pull request for the 
>>>>>>>>>>>>> changes
>>>>>>>>>>>> I had made an attempt to modify the s-get script in the same 
>>>>>>>>>>>> way as for s-query but it didn't work : if I have a moment 
>>>>>>>>>>>> I'll try to understand how the options are handled.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Le 28/01/2019 à 14:19, Andy Seaborne a écrit :
>>>>>>>>>>>>> On 28/01/2019 11:04, Vincent Ventresque wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I want to export a named graph which is stored in a TDB 
>>>>>>>>>>>>>> dataset, and I want to store the output in several files 
>>>>>>>>>>>>>> (for the named graph contains +/- 9.5 M triples).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My idea is to use "split" command in order to cut the 
>>>>>>>>>>>>>> output of the export into pieces. However, this solution 
>>>>>>>>>>>>>> with "split" requires ntriples or nquads (one triple per 
>>>>>>>>>>>>>> line, so that the files are not cut in the middle of an 
>>>>>>>>>>>>>> assertion ; besides, it's also more practical to have a 
>>>>>>>>>>>>>> triple per line if I want to transform the data with perl 
>>>>>>>>>>>>>> or sed).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found a solution with s-query but had to edit the ruby 
>>>>>>>>>>>>>> s-query script to get ntriples (see below).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are other possible solutions for an export via 
>>>>>>>>>>>>>> command-line utilities : "s-get" and "tdbdump". If I 
>>>>>>>>>>>>>> understand well, "tdbdump" gives nquads as output, but one 
>>>>>>>>>>>>>> can't export only a part of the data, everything is 
>>>>>>>>>>>>>> exported at once. The "s-get" solution allows to select a 
>>>>>>>>>>>>>> named graph in the dataset, but I couldn't change the 
>>>>>>>>>>>>>> output format.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are there better solutions to get an export in several files?
>>>>>>>>>>>>> Ways I can think of:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1/ Modify the s-get script to handle --output and set the 
>>>>>>>>>>>>> "Accept:" header then please submit a pull request for the 
>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2/ Use curl
>>>>>>>>>>>>>
>>>>>>>>>>>>> curl --header 'Accept: application/n-triples' \
>>>>>>>>>>>>> 'http://localhost:3030/ds?graph=http://bnf_titres'
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3/ Parse the s-get output:
>>>>>>>>>>>>>
>>>>>>>>>>>>> s-get ... | riot --syntax TTL
>>>>>>>>>>>>>
>>>>>>>>>>>>>       Andy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> VV.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.1) Edit s-query ruby script (add nt)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- l. 572 : when "json","xml","text","csv","tsv","nt"
>>>>>>>>>>>>>> -- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
>>>>>>>>>>>>>> -- l. 515 : opts.on('--output=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>>>>>>>>>>>>>> -- l. 519 : opts.on('--accept=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.2) Command
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /my/path/to/fuseki/bin/s-query \
>>>>>>>>>>>>>>   --service=http://localhost:3030/BnF_text_v2/ \
>>>>>>>>>>>>>>   "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" \
>>>>>>>>>>>>>>   --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /my/path/to/jena/bin/tdbdump \
>>>>>>>>>>>>>>   --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 \
>>>>>>>>>>>>>>   --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> => Unknown argument: graph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /my/path/to/fuseki/bin/s-get \
>>>>>>>>>>>>>>   http://localhost:3030/BnF_text_v2/data http://bnf_titres \
>>>>>>>>>>>>>>   --output=text | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> => /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
>>>>>>>>>>>>>> from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
>>>>>>>>>>>>>>
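
Andy's option 2 (curl with an N-Triples Accept header) and the `split` step can also be combined in one small script. The Ruby sketch below is an illustration under assumptions, not part of s-get: it fetches one named graph through the dataset URL with `?graph=...` and `Accept: application/n-triples`, stops if the server answers with a different content type (later in this thread Fuseki is shown silently falling back to a default syntax), and then cuts the file into pieces of a fixed number of lines. The helper names `graph_url`, `download_graph` and `split_ntriples`, and the output file naming, are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Build the URL for one named graph, e.g.
# http://localhost:3030/BnF_text_v2?graph=http%3A%2F%2Fbnf_titres
def graph_url(dataset_url, graph_iri)
  uri = URI(dataset_url)
  uri.query = URI.encode_www_form('graph' => graph_iri)
  uri
end

# Stream the named graph as N-Triples into a local file.
def download_graph(dataset_url, graph_iri, path)
  uri = graph_url(dataset_url, graph_iri)
  Net::HTTP.start(uri.host, uri.port) do |http|
    req = Net::HTTP::Get.new(uri)
    req['Accept'] = 'application/n-triples'
    http.request(req) do |res|
      raise "HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)
      ctype = res['Content-Type'].to_s
      # Guard against the server falling back to another syntax.
      raise "unexpected Content-Type: #{ctype}" unless ctype.start_with?('application/n-triples')
      File.open(path, 'wb') { |f| res.read_body { |chunk| f.write(chunk) } }
    end
  end
end

# Copy an N-Triples stream (one triple per line) into files of at most
# lines_per_file lines, named prefix-000.nt, prefix-001.nt, ...
# Returns the list of file names written.
def split_ntriples(io, lines_per_file, prefix, dir = '.')
  files = []
  out = nil
  count = 0
  io.each_line do |line|
    if out.nil? || count == lines_per_file
      out&.close
      name = File.join(dir, format('%s-%03d.nt', prefix, files.size))
      out = File.open(name, 'w')
      files << name
      count = 0
    end
    out.write(line)
    count += 1
  end
  out&.close
  files
end
```

Possible usage with the names from the original post: `download_graph('http://localhost:3030/BnF_text_v2', 'http://bnf_titres', 'titres.nt')`, then `File.open('titres.nt') { |f| split_ntriples(f, 500_000, 'BnfTextTitres') }`. Unlike `split`, which produces `BnfTextTitres-aa.nt`, `BnfTextTitres-ab.nt`, ..., this numbers the pieces; writing to disk first avoids holding roughly 9.5 M triples in memory.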

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Many thanks for the detailed explanation, all the more so as I struggled 
for a long time with a similar problem (see my question on Stack Overflow: 
https://stackoverflow.com/questions/52221250/php-easyrdf-unable-to-get-graph-from-construct-query 
).

About the pull request: should I remove the 'plain JSON' option? Here 
is the code of the new function I added:

~~~~~~~~~~~~~~~~~~~~~~

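# Map the new --output option to the Accept header that s-get sends.
# $mtNTriples, $mtAppXML, $mtJSONLD and $mtAppJSON are MIME-type
# variables defined in the s-get script.
def set_output_format(type)

   case type
   when :nt
     $accept_rdf = $mtNTriples
   when :xml
     $accept_rdf = $mtAppXML
   when :jsonLD
     $accept_rdf = $mtJSONLD
   when :json
     # In the modified s-get, $mtAppJSON was changed to
     # 'application/rdf+json': with 'application/json' Fuseki
     # falls back to RDF/XML.
     $accept_rdf = $mtAppJSON
   end

end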

~~~~~~~~~~~~~~~~~~~~
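
One possible answer to the 'plain JSON' question, sketched under assumptions: since (as explained further on in this thread) Fuseki registers no graph serialization for plain `application/json`, the option could map only to registered RDF media types and reject anything else, instead of silently keeping the default Accept header. The table, its keys and `accept_for` are hypothetical names, not part of s-get; the media types are those mentioned in this thread plus the standard Turtle type. s-get already parses its options with OptionParser (see the `OptionParser::InvalidOption` error in the original post), so the sketch wires the option the same way.

```ruby
require 'optparse'

# Hypothetical mapping from --output values to RDF media types.
# There is deliberately no plain :json entry.
OUTPUT_TYPES = {
  nt:      'application/n-triples',
  ttl:     'text/turtle',
  xml:     'application/rdf+xml',
  jsonld:  'application/ld+json',
  rdfjson: 'application/rdf+json' # old Talis RDF/JSON, not JSON-LD
}.freeze

# Return the Accept value for an --output type, or fail loudly.
def accept_for(type)
  OUTPUT_TYPES.fetch(type.to_sym) do
    raise ArgumentError,
          "unsupported output type #{type}; use one of #{OUTPUT_TYPES.keys.join(', ')}"
  end
end

# OptionParser restricts the argument to the listed keys.
$accept_rdf = nil
parser = OptionParser.new do |opts|
  opts.on('--output=TYPE', OUTPUT_TYPES.keys, 'RDF syntax to request') do |t|
    $accept_rdf = accept_for(t)
  end
end
```

With this, `parser.parse(['--output=nt'])` sets `$accept_rdf` to `application/n-triples`, while an unknown value is rejected by OptionParser before any request is sent.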


On 31/01/2019 15:21, Andy Seaborne wrote:
> Jena has its own content negotiation mechanism: I couldn't find an 
> existing one at the time, and it has turned out to be "quite 
> complicated" for linked data, because control of the defaults and of 
> the choice made when there is no exact match is important.
>
> So we have control of the corner cases and defaults.
>
> Internally in Fuseki:
>
> "application/json" isn't registered for graphs or datasets.
>
> There are two related registrations:
>
> "application/rdf+json"
> "application/ld+json"
>
> Fuseki doesn't do any "+" processing (e.g. it doesn't treat 
> "application/ld+json" as a variant of "application/json").
>
> Fuseki could default "application/json" to "application/ld+json".
>
>
> curl without a header sends:
>
> "Accept: */*"
>
> Fuseki chooses the first in its internal list of choices.
>
> That is not the same as sending no "Accept" header, in which case 
> Fuseki chooses a default, although both no header and */* give the 
> server a free choice of what to return.
>
> curl -v -g 'http://localhost:3030/ds?query=ASK{}'
> curl -v -g --header 'Accept:' 'http://localhost:3030/ds?query=ASK{}'
>
> Content negotiation is quite sensitive to client setup and, well, some 
> HTTP clients don't set conneg headers and then can't handle the results.
>
> Some servers don't have content types set up.
>
> On the client side, Jena pokes about in the file name to use the 
> extension if all else fails.
>
>     Andy
>
> On 31/01/2019 13:27, vincent ventresque wrote:
>> Sorry, let me sum up the previous messages :
>>
>> 1) I wanted to export a named graph from tdb to ntriples
>>
>> 2) Andy advised to modify s-get, which I did
>>
>> 3) when modifying s-get, I noticed there were 2 wrong content-types: 
>> application/json & application/n-quads; both give RDF/XML output
>>
>> 4) Andy suggested it came from s-get settings
>>
>> 5) I showed that commenting out the settings in s-get has no effect AND 
>> that the problem is the same with curl.
>>
>> 6) my purpose is also to understand how all this stuff works!
>>
>>
>> On 31/01/2019 14:22, Martynas Jusevičius wrote:
>>> Vincent,
>>>
>>> can you start by explaining what you are trying to do and why, rather
>>> than describing how you're doing it?
>>>
>>> On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
>>> <vi...@ens-lyon.fr> wrote:
>>>> Sorry, I should have explained more clearly: the previous messages
>>>> were about default settings in s-get, and when creating a new function
>>>> to handle the --output option, I noticed there was a wrong content-type in
>>>> s-get for plain json (see my s-get file here:
>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download). 
>>>>
>>>>
>>>>
>>>> My purpose was to demonstrate that the problem isn't linked to s-get,
>>>> since it's the same with curl. Besides, I noticed the same problem with
>>>> n-quads.
>>>>
>>>> curl --header 'Accept: application/n-quads' \
>>>>   'http://localhost:3030/test_tdb2?graph=http://test'
>>>> <rdf:RDF
>>>>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>       xmlns:j.0="http://example.org/" >
>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>       <j.0:tata>coucou</j.0:tata>
>>>>     </rdf:Description>
>>>> </rdf:RDF>
>>>>
>>>>
>>>>
>>> On 31/01/2019 14:12, ajs6f wrote:
>>>>> I'm not sure what you expect to get back from Fuseki with an 
>>>>> "application/json" MIME type? There is no W3C-spec plain-JSON RDF 
>>>>> serialization that I know of. I suppose there's the old Talis idea:
>>>>>
>>>>> https://jena.apache.org/documentation/io/rdf-json.html
>>>>>
>>>>> but I can't imagine that's what you're looking for.
>>>>>
>>>>> ajs6f
>>>>>
>>>>>> On Jan 31, 2019, at 8:09 AM, vincent ventresque 
>>>>>> <vi...@ens-lyon.fr> wrote:
>>>>>>
>>>>>> It seems that the problem is completely independent of s-get 
>>>>>> (see these results with curl below). So I think there's a default 
>>>>>> setting somewhere in Fuseki itself.
>>>>>>
>>>>>>
>>>>>> #~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
>>>>>>
>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/json' \
>>>>>>     'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>> <rdf:RDF
>>>>>>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>>       xmlns:j.0="http://example.org/" >
>>>>>>     <rdf:Description rdf:about="http://example.org/titi">
>>>>>>       <j.0:tata>coucou</j.0:tata>
>>>>>>     </rdf:Description>
>>>>>> </rdf:RDF>
>>>>>>
>>>>>>
>>>>>> #~~~~~~~  --header 'Accept: application/rdf+json' ~~~~~~~~~~~~~~~~~~~~~~
>>>>>>
>>>>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' \
>>>>>>     'http://localhost:3030/test_tdb2?graph=http://test'
>>>>>> {
>>>>>>     "http://example.org/titi" : {
>>>>>>       "http://example.org/tata" : [ {
>>>>>>         "type" : "literal" ,
>>>>>>         "value" : "coucou"
>>>>>>       }
>>>>>>        ]
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 31/01/2019 12:58, vincent ventresque wrote:
>>>>>>> Thanks for your quick reply!
>>>>>>>
>>>>>>>> $mtAppJSON isn't used.
>>>>>>> I think my previous msg wasn't clear: I meant raw json and not 
>>>>>>> json-ld (my code works now for both, and I use $mtAppJSON; but 
>>>>>>> I had to replace 'application/json' with 'application/rdf+json' 
>>>>>>> in order to get json instead of XML; see the file here 
>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download) 
>>>>>>>
>>>>>>>
>>>>>>>> The settings are: ...
>>>>>>> I made a little test: comment out these lines and the "names" part, 
>>>>>>> and you'll get XML!
>>>>>>>
>>>>>>>
>>>>>>> On 31/01/2019 12:48, Andy Seaborne wrote:
>>>>>>>> On 31/01/2019 11:26, vincent ventresque wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I found the origin of the problem for json : the $mtAppJSON 
>>>>>>>>> had the value
>>>>>>>>>
>>>>>>>>> 'application/json'
>>>>>>>> $mtAppJSON isn't used.
>>>>>>>>
>>>>>>>> "application/rdf+json"
>>>>>>>> isn't JSON-LD (it's the old Talis format).
>>>>>>>>
>>>>>>>> There is:
>>>>>>>>
>>>>>>>> $mtJSONLD           = 'application/ld+json'
>>>>>>>>
>>>>>>>>> it has to be replaced with
>>>>>>>>>
>>>>>>>>> 'application/rdf+json'
>>>>>>>>>
>>>>>>>>> I've updated the file here:
>>>>>>>>>
>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Maybe I'm going to submit a pull request as Andy suggested, 
>>>>>>>>> but I'd like to understand why 'application/json' returns xml. 
>>>>>>>>> Besides, it's the same thing for nquads: I tried to replace
>>>>>>>>>
>>>>>>>>> $mtNQuads = 'application/n-quads'
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>>
>>>>>>>>> $mtNQuads = 'application/x-trig'
>>>>>>>>>
>>>>>>>>> but I still get xml...
>>>>>>>> The settings are:
>>>>>>>>
>>>>>>>> # Default for GET
>>>>>>>> # At least allow anything (and hope!)
>>>>>>>> $accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
>>>>>>>> # Datasets
>>>>>>>> $accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
>>>>>>>> # For SPARQL query
>>>>>>>> $accept_results="#{$mtSparqlResultsJ} , #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"
>>>>>>>>
>>>>>>>> # Accept any in case of trouble.
>>>>>>>> $accept_rdf="#{$accept_rdf} , */*;q=0.1"
>>>>>>>> $accept_results="#{$accept_results} , */*;q=0.1"
>>>>>>>>
>>>>>>>>
>>>>>>>>> Is there a kind of default setting somewhere (if the content type 
>>>>>>>>> isn't recognized by Fuseki, the response is xml)?
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> RDF/XML for graphs, N-Quads for datasets.
>>>>>>>>
>>>>>>>> Run Fuseki/full with "-v" and it should print the content 
>>>>>>>> negotiation details.
>>>>>>>>
>>>>>>>>       Andy
>>>>>>>>
>>>>>>>>> Thanks in advance
>>>>>>>>>
>>>>>>>>> VV
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29/01/2019 17:11, vincent ventresque wrote:
>>>>>>>>>> Hi Andy,
>>>>>>>>>>
>>>>>>>>>> Thanks again for your idea to modify the s-get script; it 
>>>>>>>>>> helped me understand the Ruby utilities and HTTP requests (I 
>>>>>>>>>> often use the Ruby scripts but never really looked inside).
>>>>>>>>>>
>>>>>>>>>> I don't know how to submit a pull request, and I'm not a Ruby 
>>>>>>>>>> expert! Therefore I've put a small test file here:
>>>>>>>>>>
>>>>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- added "--output" in options + created a new function 
>>>>>>>>>> (set_output_format)
>>>>>>>>>>
>>>>>>>>>> -- it works for ntriples, xml, JSON-LD
>>>>>>>>>>
>>>>>>>>>> -- doesn't work for json (returns xml...)
>>>>>>>>>>
>>>>>>>>>> N.B.: in this test file, I've removed large parts of the 
>>>>>>>>>> original code in order to improve readability
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28/01/2019 15:28, Vincent Ventresque wrote:
>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>
>>>>>>>>>>> Many thanks for these ideas, I'm going to try the curl & 
>>>>>>>>>>> riot solutions.
>>>>>>>>>>>
>>>>>>>>>>>> Modify the s-get script to handle --output and set the 
>>>>>>>>>>>> "Accept:" header then please submit a pull request for the 
>>>>>>>>>>>> changes
>>>>>>>>>>> I had made an attempt to modify the s-get script in the same 
>>>>>>>>>>> way as for s-query, but it didn't work; if I have a moment 
>>>>>>>>>>> I'll try to understand how the options are handled.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 28/01/2019 14:19, Andy Seaborne wrote:
>>>>>>>>>>>> On 28/01/2019 11:04, Vincent Ventresque wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to export a named graph which is stored in a TDB 
>>>>>>>>>>>>> dataset, and I want to store the output in several files 
>>>>>>>>>>>>> (for the named graph contains +/- 9.5 M triples).
>>>>>>>>>>>>>
>>>>>>>>>>>>> My idea is to use "split" command in order to cut the 
>>>>>>>>>>>>> output of the export into pieces. However, this solution 
>>>>>>>>>>>>> with "split" requires ntriples or nquads (one triple per 
>>>>>>>>>>>>> line, so that the files are not cut in the middle of an 
>>>>>>>>>>>>> assertion; besides, it's also more practical to have a 
>>>>>>>>>>>>> triple per line if I want to transform the data with perl 
>>>>>>>>>>>>> or sed).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I found a solution with s-query but had to edit the Ruby 
>>>>>>>>>>>>> s-query script to get ntriples (see below).
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are other possible solutions for an export via 
>>>>>>>>>>>>> command-line utilities: "s-get" and "tdbdump". If I 
>>>>>>>>>>>>> understand correctly, "tdbdump" gives nquads as output, but 
>>>>>>>>>>>>> one can't export only part of the data; everything is 
>>>>>>>>>>>>> exported at once. The "s-get" solution allows selecting a 
>>>>>>>>>>>>> named graph in the dataset, but I couldn't change the 
>>>>>>>>>>>>> output format.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are there better solutions to get an export in several files?
>>>>>>>>>>>> Ways I can think of:
>>>>>>>>>>>>
>>>>>>>>>>>> 1/ Modify the s-get script to handle --output and set the 
>>>>>>>>>>>> "Accept:" header, then please submit a pull request for the 
>>>>>>>>>>>> changes.
>>>>>>>>>>>>
>>>>>>>>>>>> 2/ Use curl
>>>>>>>>>>>>
>>>>>>>>>>>> curl --header 'Accept: application/n-triples' \
>>>>>>>>>>>> 'http://localhost:3030/ds?graph=http://bnf_titres'
>>>>>>>>>>>>
>>>>>>>>>>>> 3/ Parse the s-get output:
>>>>>>>>>>>>
>>>>>>>>>>>> s-get ... | riot --syntax TTL
>>>>>>>>>>>>
>>>>>>>>>>>>       Andy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> VV.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.1) Edit s-query ruby script (add nt)
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- l. 572 : when "json","xml","text","csv","tsv","nt"
>>>>>>>>>>>>> -- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
>>>>>>>>>>>>> -- l. 515 : opts.on('--output=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>>>>>>>>>>>>> -- l. 519 : opts.on('--accept=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.2) Command
>>>>>>>>>>>>>
>>>>>>>>>>>>> /my/path/to/fuseki/bin/s-query \
>>>>>>>>>>>>>   --service=http://localhost:3030/BnF_text_v2/ \
>>>>>>>>>>>>>   "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" \
>>>>>>>>>>>>>   --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>
>>>>>>>>>>>>> /my/path/to/jena/bin/tdbdump \
>>>>>>>>>>>>>   --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 \
>>>>>>>>>>>>>   --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>
>>>>>>>>>>>>> => Unknown argument: graph
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>
>>>>>>>>>>>>> /my/path/to/fuseki/bin/s-get \
>>>>>>>>>>>>>   http://localhost:3030/BnF_text_v2/data http://bnf_titres \
>>>>>>>>>>>>>   --output=text | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>>>
>>>>>>>>>>>>> => /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
>>>>>>>>>>>>> from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
>>>>>>>>>>>>>

Posted by Andy Seaborne <an...@apache.org>.

Jena has its own content negotiation mechanism: I couldn't find an 
existing one at the time, and it has turned out to be "quite complicated" 
for linked data, because control of the defaults and of the choice made 
when there is no exact match is important.

So we have control of the corner cases and defaults.

Internally in Fuseki:

"application/json" isn't registered for graphs or datasets.

There are two related registrations:

"application/rdf+json"
"application/ld+json"

Fuseki doesn't do any "+" processing (e.g. it doesn't treat 
"application/ld+json" as a variant of "application/json").

Fuseki could default "application/json" to "application/ld+json".


curl without a header sends:

"Accept: */*"

Fuseki chooses the first in its internal list of choices.

That is not the same as sending no "Accept" header, in which case Fuseki 
chooses a default, although both no header and */* give the server a free 
choice of what to return.

curl -v -g 'http://localhost:3030/ds?query=ASK{}'
curl -v -g --header 'Accept:' 'http://localhost:3030/ds?query=ASK{}'

Content negotiation is quite sensitive to client setup and, well, some 
HTTP clients don't set conneg headers and then can't handle the results.

Some servers don't have content types set up.

On the client side, Jena pokes about in the file name to use the 
extension if all else fails.

     Andy
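
The two curl commands above can be reproduced from Ruby, the language of the s-* scripts, to see the difference between `Accept: */*` and no Accept header at all. A minimal sketch, assuming a dataset at the hypothetical URL `http://localhost:3030/ds`: Ruby's Net::HTTP, like curl, fills in `Accept: */*` on a new request, so the header has to be deleted explicitly to get the "no Accept" case. curl needs `-g` so that it does not treat the braces in `ASK{}` as a URL glob; here the query is simply percent-encoded.

```ruby
require 'net/http'
require 'uri'

# SPARQL query endpoint of a hypothetical local dataset.
uri = URI('http://localhost:3030/ds')
uri.query = URI.encode_www_form('query' => 'ASK {}')

# Like curl without --header: the request gets "Accept: */*".
star = Net::HTTP::Get.new(uri)

# Like curl --header 'Accept:': remove the header completely,
# leaving the server to apply its own default.
no_accept = Net::HTTP::Get.new(uri)
no_accept.delete('Accept')

# Explicit negotiation: ask for SPARQL results in JSON.
explicit = Net::HTTP::Get.new(uri)
explicit['Accept'] = 'application/sparql-results+json'

# To compare what the server returns (requires a running Fuseki):
# Net::HTTP.start(uri.host, uri.port) do |http|
#   [star, no_accept, explicit].each { |r| puts http.request(r)['Content-Type'] }
# end
```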

>>>>>>>>>>>>

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Hello

Thanks a lot for the explanations!

 > And for application/n-quads, that is a dataset serialization, not a graph serialization

So do you know why the name $mtNQuads is defined in s-get? (line 50)

 > Per application/json, several people have already commented that there is no serialization directly linked to application/json.

So why does "s-query" return JSON by default?

./s-query --service=http://localhost:3030/test_tdb2 'select * where { ?s ?p ?o } limit 5'
{
   "head": {
     "vars": [ "s" , "p" , "o" ]
   } ,
   "results": {
     "bindings": [
       {
         "s": { "type": "uri" , "value": "http://example.org/titi" } ,
         "p": { "type": "uri" , "value": "http://example.org/toto" } ,
         "o": { "type": "literal" , "value": "coucou" }
       }
     ]
   }
}
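
A likely explanation for "JSON by default" is the Accept header the script builds, quoted earlier in this thread: `$accept_results` lists `$mtSparqlResultsJ` first with the implicit q=1.0, so a SELECT query gets SPARQL JSON results. The sketch below rebuilds that string and sorts it by preference. It assumes s-query uses the same settings block Andy quoted from s-get and that the `$mt...` variables hold the standard media types shown; the parser is a simplified illustration (only the `q` parameter is read), not code from the scripts. If those assumptions hold, Turtle also has q=1.0, which would explain why a CONSTRUCT query came back as Turtle before `nt` was added.

```ruby
# Assumed values of the $mt... variables in the s-* scripts.
mt_turtle    = 'text/turtle'
mt_ntriples  = 'application/n-triples'
mt_rdf       = 'application/rdf+xml'
mt_jsonld    = 'application/ld+json'
mt_results_j = 'application/sparql-results+json'
mt_results_x = 'application/sparql-results+xml'

# Rebuilt in the same order as the settings quoted in this thread.
accept_rdf = "#{mt_turtle} , #{mt_ntriples};q=0.9 , #{mt_rdf};q=0.8 , #{mt_jsonld};q=0.5"
accept_results = "#{mt_results_j} , #{mt_results_x};q=0.9 , #{accept_rdf} , */*;q=0.1"

# Split an Accept header into [media_type, q] pairs, most preferred first.
# Entries with equal q keep their order in the header.
def parse_accept(header)
  entries = header.split(',').each_with_index.map do |part, i|
    type, *params = part.split(';').map(&:strip)
    q = 1.0
    params.each do |param|
      name, value = param.split('=', 2).map(&:strip)
      q = value.to_f if name == 'q'
    end
    [type, q, i]
  end
  entries.sort_by { |_, q, i| [-q, i] }.map { |type, q, _| [type, q] }
end

parse_accept(accept_results).each { |type, q| puts "#{q}  #{type}" }
```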



On 31/01/2019 15:16, Rob Vesse wrote:
> No, the content types aren't wrong; you're just using them for the wrong things.
>
> Per application/json, several people have already commented that there is no serialization directly linked to application/json.  There are specific MIME types for specific variants of JSON, e.g. application/ld+json for JSON-LD
>
> And for application/n-quads, that is a dataset serialization, not a graph serialization
>
> If you ask for a content type that isn't supported by the server you are talking to, then the server can choose to do one of two things.  NB: this is standard HTTP spec stuff, nothing specific to Fuseki.
>
> 1. Reject the request and send back a 406 Not Acceptable, i.e. give you an error
> 2. Fall back to its preferred default serialization
>
> So in both cases, since you are asking for content types that have no associated graph serialization, Fuseki falls back to using its default
>
> Rob
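
Rob's two options can be made concrete with a toy negotiator. This is a simplified sketch, not Fuseki's algorithm (Andy explains above that Jena has its own conneg code with carefully controlled defaults): it walks the client's preferences by q-value, serves the first type it can produce, answers `*/*` with its own first choice, and otherwise either returns 406 or falls back to a default. The RDF/XML default for graphs follows what this thread reports; the order of the offered list, `negotiate` and the constants are hypothetical, and partial wildcards such as `text/*` are ignored.

```ruby
# Graph media types a server might offer, in its own preference order.
GRAPH_TYPES = %w[
  text/turtle
  application/n-triples
  application/rdf+xml
  application/ld+json
  application/rdf+json
].freeze
DEFAULT_GRAPH_TYPE = 'application/rdf+xml'

# Returns [status, content_type]. With strict: true an unsupported
# request is refused (406); otherwise the default is served (200).
def negotiate(accept, offered, default, strict: false)
  return [200, default] if accept.nil? || accept.strip.empty?

  prefs = accept.split(',').each_with_index.map do |part, i|
    type, *params = part.split(';').map(&:strip)
    q_param = params.find { |p| p.start_with?('q=') }
    [type, q_param ? q_param.split('=', 2).last.to_f : 1.0, i]
  end

  prefs.sort_by { |_, q, i| [-q, i] }.each do |type, q, _|
    next if q <= 0 # q=0 means "not acceptable"
    return [200, offered.first] if type == '*/*'
    return [200, type] if offered.include?(type)
  end
  strict ? [406, nil] : [200, default]
end

p negotiate('application/json', GRAPH_TYPES, DEFAULT_GRAPH_TYPE)
# => [200, "application/rdf+xml"]
p negotiate('application/n-quads', GRAPH_TYPES, DEFAULT_GRAPH_TYPE, strict: true)
# => [406, nil]
```

The first call reproduces what Vincent observed with curl (plain JSON requested, RDF/XML returned); the second shows the stricter alternative Rob describes.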
>
> On 31/01/2019, 13:34, "vincent ventresque" <vi...@ens-lyon.fr> wrote:
>
>      Sorry, let me sum up the previous messages :
>      
>      1) I wanted to export a named graph from tdb to ntriples
>      
>      2) Andy advised to modify s-get, which I did
>      
>      3) when modifying s-get, I noticed there were 2 wrong content-types:
>      application/json & application/n-quads; both give RDF/XML output
>
>      4) Andy suggested it came from s-get settings
>      
>      5) I showed that commenting out the settings in s-get has no effect AND
>      that the problem is the same with curl.
>      
>      6) my purpose is also to understand how all this stuff works!
>      
>      
>      On 31/01/2019 14:22, Martynas Jusevičius wrote:
>      > Vincent,
>      >
>      > can you start by explaining what you are trying to do and why, rather
>      > than describing how you're doing it?
>      >
>      > On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
>      > <vi...@ens-lyon.fr> wrote:
>      >> Sorry, I should have explained more clearly: the previous messages
>      >> were about default settings in s-get, and when creating a new function
>      >> to handle the --output option, I noticed there was a wrong content-type in
>      >> s-get for plain json (see my s-get file here:
>      >> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download).
>      >>
>      >>
>      >> My purpose was to demonstrate that the problem isn't linked to s-get,
>      >> since it's the same with curl. I also noticed the same problem with
>      >> N-Quads.
>      >>
>      >> curl --header 'Accept: application/n-quads' \
>      >> 'http://localhost:3030/test_tdb2?graph=http://test'
>      >> <rdf:RDF
>      >>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>      >>       xmlns:j.0="http://example.org/" >
>      >>     <rdf:Description rdf:about="http://example.org/titi">
>      >>       <j.0:tata>coucou</j.0:tata>
>      >>     </rdf:Description>
>      >> </rdf:RDF>
>      >>
>      >>
>      >>
>      >> On 31/01/2019 at 14:12, ajs6f wrote:
>      >>> I'm not sure what you expect to get back from Fuseki with an "application/json" MIME type. There is no W3C-spec plain-JSON RDF serialization that I know of. I suppose there's the old Talis idea:
>      >>>
>      >>> https://jena.apache.org/documentation/io/rdf-json.html
>      >>>
>      >>> but I can't imagine that's what you're looking for.
>      >>>
>      >>> ajs6f
>      >>>
>      >>>> On Jan 31, 2019, at 8:09 AM, vincent ventresque <vi...@ens-lyon.fr> wrote:
>      >>>>
>      >>>> It seems that the problem is completely independent of s-get (see these results with curl below). So I think there's a default setting somewhere in Fuseki itself.
>      >>>>
>      >>>>
>      >>>> #~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
>      >>>>
>      >>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/json' 'http://localhost:3030/test_tdb2?graph=http://test'
>      >>>> <rdf:RDF
>      >>>>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>      >>>>       xmlns:j.0="http://example.org/" >
>      >>>>     <rdf:Description rdf:about="http://example.org/titi">
>      >>>>       <j.0:tata>coucou</j.0:tata>
>      >>>>     </rdf:Description>
>      >>>> </rdf:RDF>
>      >>>>
>      >>>>
>      >>>> #~~~~~~~  --header 'Accept: application/rdf+json' ~~~~~~~~~~~~~~~~~~~~~~
>      >>>>
>      >>>> :~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' 'http://localhost:3030/test_tdb2?graph=http://test'
>      >>>> {
>      >>>>     "http://example.org/titi" : {
>      >>>>       "http://example.org/tata" : [ {
>      >>>>         "type" : "literal" ,
>      >>>>         "value" : "coucou"
>      >>>>       }
>      >>>>        ]
>      >>>>     }
>      >>>> }
>      >>>>
>      >>>>
>      >>>>
>      >>>> On 31/01/2019 at 12:58, vincent ventresque wrote:
>      >>>>> Thanks for your quick reply!
>      >>>>>
>      >>>>>> $mtAppJSON isn't used.
>      >>>>> I think my previous message wasn't clear: I meant raw JSON, not JSON-LD (my code now works for both, and I use $mtAppJSON; but I had to replace 'application/json' with 'application/rdf+json' to get JSON instead of XML; see the file here https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download)
>      >>>>>
>      >>>>>> The settings are: ...
>      >>>>> I made a small test: comment out these lines and the "names" part, and you'll get XML!
>      >>>>>
>      >>>>>
>      >>>>> On 31/01/2019 at 12:48, Andy Seaborne wrote:
>      >>>>>> On 31/01/2019 11:26, vincent ventresque wrote:
>      >>>>>>> Hello,
>      >>>>>>>
>      >>>>>>> I found the origin of the problem for JSON: $mtAppJSON had the value
>      >>>>>>>
>      >>>>>>> 'application/json'
>      >>>>>> $mtAppJSON isn't used.
>      >>>>>>
>      >>>>>> "application/rdf+json"
>      >>>>>> isn't JSON-LD (it's the old Talis format).
>      >>>>>>
>      >>>>>> There is:
>      >>>>>>
>      >>>>>> $mtJSONLD           = 'application/ld+json'
>      >>>>>>
>      >>>>>>> it has to be replaced with
>      >>>>>>>
>      >>>>>>> 'application/rdf+json'
>      >>>>>>>
>      >>>>>>> I've updated the file here :
>      >>>>>>>
>      >>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
>      >>>>>>>
>      >>>>>>> I may submit a pull request as Andy suggested, but I'd like to understand why 'application/json' returns XML. The same thing happens with N-Quads: I tried to replace
>      >>>>>>>
>      >>>>>>> $mtNQuads = 'application/n-quads'
>      >>>>>>>
>      >>>>>>> with
>      >>>>>>>
>      >>>>>>> $mtNQuads = 'application/x-trig'
>      >>>>>>>
>      >>>>>>> but I still get XML...
>      >>>>>> The settings are:
>      >>>>>>
>      >>>>>> # Default for GET
>      >>>>>> # At least allow anything (and hope!)
>      >>>>>> $accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
>      >>>>>> # Datasets
>      >>>>>> $accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
>      >>>>>> # For SPARQL query
>      >>>>>> $accept_results="#{$mtSparqlResultsJ} , #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"
>      >>>>>>
>      >>>>>> # Accept any in case of trouble.
>      >>>>>> $accept_rdf="#{$accept_rdf} , */*;q=0.1"
>      >>>>>> $accept_results="#{$accept_results} , */*;q=0.1"
>      >>>>>>
>      >>>>>>
>      >>>>>>> Is there some kind of default setting somewhere (if the content type isn't recognized by Fuseki, the response is XML)?
>      >>>>>> Yes.
>      >>>>>>
>      >>>>>> RDF/XML for graphs, N-Quads for datasets.
>      >>>>>>
>      >>>>>> Run Fuseki/full with "-v" and it should print the content negotiation details.
>      >>>>>>
>      >>>>>>       Andy
>      >>>>>>
>      >>>>>>> Thanks in advance
>      >>>>>>>
>      >>>>>>> VV
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> Ok, maybe I'm going to submit a pull request, but I'd
>      >>>>>>>
>      >>>>>>> On 29/01/2019 at 17:11, vincent ventresque wrote:
>      >>>>>>>> Hi Andy,
>      >>>>>>>>
>      >>>>>>>> Thanks again for your suggestion to modify the s-get script; it helped me understand the Ruby utilities and HTTP requests (I often use the Ruby scripts but had never really looked inside).
>      >>>>>>>>
>      >>>>>>>> I don't know how to submit a pull request, and I'm not a Ruby expert! So I've put a small test file here:
>      >>>>>>>>
>      >>>>>>>> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download
>      >>>>>>>>
>      >>>>>>>> -- added "--output" in options + created a new function (set_output_format)
>      >>>>>>>>
>      >>>>>>>> -- it works for N-Triples, XML and JSON-LD
>      >>>>>>>>
>      >>>>>>>> -- doesn't work for JSON (returns XML...)
>      >>>>>>>>
>      >>>>>>>> N.B.: in this test file, I've removed large parts of the original code to improve readability
>      >>>>>>>>
>      >>>>>>>>
>      >>>>>>>> On 28/01/2019 at 15:28, Vincent Ventresque wrote:
>      >>>>>>>>> Hi Andy,
>      >>>>>>>>>
>      >>>>>>>>> Many thanks for these ideas, I'm going to try the curl & riot solutions.
>      >>>>>>>>>
>      >>>>>>>>>> Modify the s-get script to handle --output and set the "Accept:" header then please submit a pull request for the changes
>      >>>>>>>>> I had tried to modify the s-get script in the same way as s-query, but it didn't work; if I have a moment I'll try to understand how the options are handled.
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>> On 28/01/2019 at 14:19, Andy Seaborne wrote:
>      >>>>>>>>>> On 28/01/2019 11:04, Vincent Ventresque wrote:
>      >>>>>>>>>>> Hello,
>      >>>>>>>>>>>
>      >>>>>>>>>>> I want to export a named graph which is stored in a TDB dataset, and I want to store the output in several files (the named graph contains about 9.5M triples).
>      >>>>>>>>>>>
>      >>>>>>>>>>> My idea is to use the "split" command to cut the output of the export into pieces. However, this requires N-Triples or N-Quads (one triple per line, so that the files are not cut in the middle of a statement; it's also more practical to have one triple per line if I want to transform the data with perl or sed).
>      >>>>>>>>>>>
>      >>>>>>>>>>> I found a solution with s-query but had to edit the Ruby s-query script to get N-Triples (see below).
>      >>>>>>>>>>>
>      >>>>>>>>>>> There are other possible solutions for an export via command-line utilities: "s-get" and "tdbdump". If I understand correctly, "tdbdump" outputs N-Quads, but one can't export only part of the data: everything is exported at once. The "s-get" solution allows selecting a named graph in the dataset, but I couldn't change the output format.
>      >>>>>>>>>>>
>      >>>>>>>>>>> Are there better solutions to get an export in several files?
>      >>>>>>>>>> Ways I can think of:
>      >>>>>>>>>>
>      >>>>>>>>>> 1/ Modify the s-get script to handle --output and set the "Accept:" header then please submit a pull request for the changes.
>      >>>>>>>>>>
>      >>>>>>>>>> 2/ Use curl
>      >>>>>>>>>>
>      >>>>>>>>>> curl --header 'Accept: application/n-triples' \
>      >>>>>>>>>>      'http://localhost:3030/ds?graph=http://bnf_titres'
>      >>>>>>>>>>
>      >>>>>>>>>> 3/ Parse the s-get output:
>      >>>>>>>>>>
>      >>>>>>>>>> s-get ... | riot --syntax TTL
>      >>>>>>>>>>
>      >>>>>>>>>>       Andy
>      >>>>>>>>>>
>      >>>>>>>>>>
>      >>>>>>>>>>> Thanks in advance,
>      >>>>>>>>>>>
>      >>>>>>>>>>> VV.
>      >>>>>>>>>>>
>      >>>>>>>>>>>
>      >>>>>>>>>>>
>      >>>>>>>>>>> ~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~
>      >>>>>>>>>>>
>      >>>>>>>>>>> 1.1) Edit s-query ruby script (add nt)
>      >>>>>>>>>>>
>      >>>>>>>>>>> -- l. 572 : when  "json","xml","text","csv","tsv","nt"
>      >>>>>>>>>>> -- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
>      >>>>>>>>>>> -- l. 515 : opts.on('--output=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>      >>>>>>>>>>> -- l. 519 : opts.on('--accept=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
>      >>>>>>>>>>>
>      >>>>>>>>>>> 1.2) Command
>      >>>>>>>>>>>
>      >>>>>>>>>>> /my/path/to/fuseki/bin/s-query --service=http://localhost:3030/BnF_text_v2/ "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>      >>>>>>>>>>>
>      >>>>>>>>>>> ~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
>      >>>>>>>>>>>
>      >>>>>>>>>>> /my/path/to/jena/bin/tdbdump --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>      >>>>>>>>>>>
>      >>>>>>>>>>> => Unknown argument: graph
>      >>>>>>>>>>>
>      >>>>>>>>>>> ~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
>      >>>>>>>>>>>
>      >>>>>>>>>>> /my/path/to/fuseki/bin/s-get http://localhost:3030/BnF_text_v2/data http://bnf_titres --output=text | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>      >>>>>>>>>>>
>      >>>>>>>>>>> => /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
>      >>>>>>>>>>> from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
>      >>>>>>>>>>>
>      
>
>
>
>
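The s-get settings quoted above build an Accept header in which each media type carries an optional q weight, and the server is expected to prefer the highest-weighted type it can produce. A minimal Ruby sketch of how such a header ranks: `ACCEPT_RDF` is an assumed expansion of `$accept_rdf` (the thread only shows the value of `$mtJSONLD`; the types used for `$mtTurtle`, `$mtNTriples` and `$mtRDF` are assumptions), and `rank_accept` is a hypothetical helper, not part of s-get.

```ruby
# Sketch: rank the media types of an HTTP Accept header by q-value.
# ACCEPT_RDF is an assumed expansion of s-get's $accept_rdf setting; the
# media types standing in for $mtTurtle, $mtNTriples and $mtRDF are assumptions.
ACCEPT_RDF = "text/turtle , application/n-triples;q=0.9 , " \
             "application/rdf+xml;q=0.8 , application/ld+json;q=0.5 , */*;q=0.1"

# Returns [[media_type, q], ...] from most to least preferred.
# A type without an explicit q parameter gets q=1.0.
def rank_accept(header)
  entries = header.split(",").each_with_index.map do |part, index|
    type, *params = part.split(";").map(&:strip)
    q = 1.0
    params.each do |param|
      key, value = param.split("=", 2).map(&:strip)
      q = value.to_f if key == "q" && value
    end
    [type, q, index]
  end
  # Highest q first; equal q values keep their order from the header.
  entries.sort_by { |_type, q, index| [-q, index] }.map { |type, q, _index| [type, q] }
end

rank_accept(ACCEPT_RDF).each { |type, q| puts format("%-22s q=%.1f", type, q) }
```

The trailing `*/*;q=0.1` is s-get's "accept any in case of trouble" entry: it makes any response acceptable as a last resort, so a server that cannot honour the preferred types may legitimately answer with its own default.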

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by Rob Vesse <rv...@dotnetrdf.org>.

No, the content types aren't wrong; you're just using them for the wrong things.

Regarding application/json, several people have already commented that there is no RDF serialization directly linked to application/json. There are specific MIME types for specific variants of JSON, e.g. application/ld+json for JSON-LD.

As for application/n-quads, that is a dataset serialization, not a graph serialization.
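The graph/dataset distinction can be pictured as a lookup table. The Ruby sketch below is illustrative only: the table lists common RDF media types and is an assumption based on their usual registrations, not Fuseki's internal registry, and the helper names are hypothetical. `application/json` is deliberately absent because it names no RDF syntax.

```ruby
# Illustrative table (not Fuseki's own registry): which RDF media types
# can carry a single graph and which carry a whole dataset (quads).
RDF_SYNTAXES = {
  "text/turtle"           => [:graph],
  "application/n-triples" => [:graph],
  "application/rdf+xml"   => [:graph],
  "application/rdf+json"  => [:graph],           # the old Talis RDF/JSON format
  "application/ld+json"   => [:graph, :dataset], # JSON-LD can also hold named graphs
  "application/trig"      => [:dataset],
  "application/n-quads"   => [:dataset]
}.freeze

# Can this media type be used for a single-graph response (GET ...?graph=...)?
def graph_syntax?(media_type)
  RDF_SYNTAXES.fetch(media_type, []).include?(:graph)
end

# Can this media type be used to return a whole dataset?
def dataset_syntax?(media_type)
  RDF_SYNTAXES.fetch(media_type, []).include?(:dataset)
end

%w[application/n-triples application/n-quads application/json].each do |mt|
  puts "#{mt}: graph=#{graph_syntax?(mt)} dataset=#{dataset_syntax?(mt)}"
end
```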

If you ask for a content type that isn't supported by the server you are talking to, the server can choose to do one of two things. NB: this is standard HTTP behaviour, nothing specific to Fuseki.

1. Reject the request and send back a 406 Not Acceptable, i.e. give you an error
2. Fall back to its preferred default serialization

So for both of your content types, since neither has an associated graph serialization, Fuseki falls back to its default.
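A minimal Ruby sketch of those two behaviours for a single-graph request. It is not Fuseki's implementation: `negotiate_graph`, `GRAPH_TYPES` and the `strict:` flag are hypothetical, the RDF/XML default follows Andy's description of Fuseki in the quoted messages above ("RDF/XML for graphs"), and wildcards other than `*/*` (such as `text/*`) are ignored for brevity.

```ruby
# Hypothetical graph endpoint: media types it can produce for one graph,
# and its preferred default (RDF/XML, as reported for Fuseki in this thread).
GRAPH_TYPES = %w[text/turtle application/n-triples application/rdf+xml
                 application/ld+json application/rdf+json].freeze
DEFAULT_GRAPH_TYPE = "application/rdf+xml"

# Parses "type;q=0.5, other" into [[type, q], ...]; q defaults to 1.0.
def parse_accept(header)
  header.split(",").map do |part|
    type, *params = part.split(";").map(&:strip)
    q_param = params.find { |param| param.start_with?("q=") }
    [type, q_param ? q_param.delete_prefix("q=").to_f : 1.0]
  end
end

# Returns [status, content_type] for a single-graph GET.
# strict: true  -> option 1, reject with 406 Not Acceptable
# strict: false -> option 2, fall back to the default serialization
def negotiate_graph(accept_header, strict: false)
  # Drop q=0 entries ("not acceptable"), then sort by descending q,
  # keeping header order for ties.
  ranked = parse_accept(accept_header)
           .each_with_index
           .reject { |(_type, q), _i| q <= 0 }
           .sort_by { |(_type, q), i| [-q, i] }
           .map(&:first)
  ranked.each do |type, _q|
    # */* accepts anything, so the server's preferred default is fine.
    return [200, DEFAULT_GRAPH_TYPE] if type == "*/*"
    return [200, type] if GRAPH_TYPES.include?(type)
  end
  strict ? [406, nil] : [200, DEFAULT_GRAPH_TYPE]
end

p negotiate_graph("application/n-triples")             # [200, "application/n-triples"]
p negotiate_graph("application/json")                  # [200, "application/rdf+xml"]
p negotiate_graph("application/n-quads", strict: true) # [406, nil]
```

With `Accept: application/json` the non-strict path returns RDF/XML, which matches what the curl experiments in this thread observed.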

Rob

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Sorry, let me sum up the previous messages:

1) I wanted to export a named graph from TDB to N-Triples

2) Andy advised modifying s-get, which I did

3) while modifying s-get, I noticed there were 2 wrong content types:
application/json and application/n-quads; both give RDF/XML output

4) Andy suggested it came from the s-get settings

5) I showed that commenting out the settings in s-get has no effect AND
that the problem is the same with curl.

6) my purpose is also to understand how all this stuff works!
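Since the goal throughout was one triple per line cut into fixed-size pieces, here is a Ruby sketch of the same chunking that `split -l 500000 - --additional-suffix=.nt BnfTextTitres-` performs, leaving room for per-line edits of the kind otherwise done with perl or sed. It is a sketch under assumptions: `split_ntriples` is a hypothetical helper, and it presumes the input is already N-Triples (for example the output of a curl request with `Accept: application/n-triples`).

```ruby
# Sketch: copy lines from an IO into files of at most `lines_per_file`
# lines each, similar to `split -l N --additional-suffix=.nt PREFIX`.
# Assumes line-based input (N-Triples or N-Quads), so every statement
# lands whole in exactly one file. Files are numbered PREFIX0000.nt,
# PREFIX0001.nt, ... rather than split's aa/ab suffixes.
def split_ntriples(input, prefix:, lines_per_file: 500_000, dir: ".")
  paths = []
  out = nil
  input.each_line.with_index do |line, i|
    if (i % lines_per_file).zero?
      out&.close
      path = File.join(dir, format("%s%04d.nt", prefix, paths.size))
      out = File.open(path, "w")
      paths << path
    end
    # Per-line transformations (the perl/sed step) could be applied here.
    out.write(line)
  end
  paths
ensure
  out&.close
end
```

For instance, a script containing this method and the call `split_ntriples($stdin, prefix: "BnfTextTitres-")` could sit at the end of the curl pipeline in place of `split`.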


>>>>>>>>>>> 1.2) Command
>>>>>>>>>>>
>>>>>>>>>>> /my/path/to/fuseki/bin/s-query --service=http://localhost:3030/BnF_text_v2/ "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>
>>>>>>>>>>> ~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>> /my/path/to/jena/bin/tdbdump --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 --graph=http://bnf_titres | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>
>>>>>>>>>>> => Unknown argument: graph
>>>>>>>>>>>
>>>>>>>>>>> ~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>> /my/path/to/fuseki/bin/s-get http://localhost:3030/BnF_text_v2/data http://bnf_titres --output=text | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
>>>>>>>>>>>
>>>>>>>>>>> => /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option: --output=text (OptionParser::InvalidOption)
>>>>>>>>>>> from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'
>>>>>>>>>>>
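Andy's answer above explains the fallback: if the requested type is not recognised, Fuseki returns RDF/XML for graphs and N-Quads for datasets. A toy negotiation function can illustrate this. The Ruby sketch below is hypothetical and is not Fuseki's actual code. The `GRAPH_TYPES` list, the `GRAPH_DEFAULT` fallback and the treatment of `*/*` are assumptions, chosen to mirror the behaviour reported in this thread.

```ruby
# Assumed graph syntaxes the server can produce (illustrative, not Fuseki's list).
GRAPH_TYPES = %w[
  text/turtle
  application/n-triples
  application/rdf+xml
  application/ld+json
  application/rdf+json
].freeze

# Assumed fallback when nothing in the Accept header matches.
GRAPH_DEFAULT = 'application/rdf+xml'.freeze

# Split an Accept header into [media_type, q] pairs, in header order.
def parse_accept(header)
  header.split(',').map do |part|
    type, *params = part.split(';').map(&:strip)
    q = 1.0
    params.each do |param|
      key, value = param.split('=', 2)
      q = value.to_f if key == 'q' && value
    end
    [type, q]
  end
end

# Return the acceptable type with the highest q (ties keep header order).
# "*/*" is taken to mean "the server's first supported type"; unrecognised
# types are skipped, and if nothing matches, the default is returned.
def negotiate(header, supported = GRAPH_TYPES, default = GRAPH_DEFAULT)
  ranked = parse_accept(header)
           .each_with_index.map { |(type, q), i| [type, q, i] }
           .select { |_, q, _| q > 0 }
           .sort_by { |_, q, i| [-q, i] }
  ranked.each do |type, _, _|
    return supported.first if type == '*/*'
    return type if supported.include?(type)
  end
  default
end
```

In this sketch, `negotiate('application/json')` and `negotiate('application/n-quads')` both fall back to `application/rdf+xml`. That matches what the curl tests in this thread showed for a single-graph request.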

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by Martynas Jusevičius <ma...@atomgraph.com>.

Vincent,

can you start by explaining what you are trying to do and why, rather
than describing how you're doing it?

On Thu, Jan 31, 2019 at 2:20 PM vincent ventresque
<vi...@ens-lyon.fr> wrote:
>
> Sorry, I should have explained more clearly: the previous messages
> were about default settings in s-get. When creating a new function
> to handle the --output option, I noticed there was a wrong content-type in
> s-get for plain json (see my s-get file here:
> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download).
>
>
> My purpose was to demonstrate that the problem isn't linked to s-get,
> since it's the same with curl. Besides, I noticed the same problem with
> n-quads.
>
> curl --header 'Accept: application/n-quads' \
>      'http://localhost:3030/test_tdb2?graph=http://test'
> <rdf:RDF
>      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>      xmlns:j.0="http://example.org/" >
>    <rdf:Description rdf:about="http://example.org/titi">
>      <j.0:tata>coucou</j.0:tata>
>    </rdf:Description>
> </rdf:RDF>

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Sorry, I should have explained more clearly: the previous messages
were about default settings in s-get. When creating a new function
to handle the --output option, I noticed there was a wrong content-type in
s-get for plain json (see my s-get file here:
https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download).
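The modified s-get itself is only available at the link above. The sketch below is a hypothetical illustration of the idea: an `--output` option that sets the `Accept:` header, written with Ruby's standard OptionParser. The option values, the `OUTPUT_TYPES` table and `parse_output_option` are assumptions, not s-get's real code. Plain `json` is deliberately left out, because `application/json` is not an RDF syntax Fuseki negotiates for graphs. In the tests reported in this thread, it fell back to RDF/XML.

```ruby
require 'optparse'

# Hypothetical mapping from a short --output value to the Accept: MIME type.
OUTPUT_TYPES = {
  'ttl'     => 'text/turtle',
  'nt'      => 'application/n-triples',
  'xml'     => 'application/rdf+xml',
  'jsonld'  => 'application/ld+json',
  'rdfjson' => 'application/rdf+json' # old Talis RDF/JSON, not JSON-LD
}.freeze

# Parse argv; return [accept_header_or_nil, remaining_non_option_args].
# An unknown value (e.g. --output=text) raises OptionParser::InvalidArgument.
def parse_output_option(argv)
  accept = nil
  parser = OptionParser.new do |opts|
    opts.on('--output=TYPE', OUTPUT_TYPES.keys,
            "Output format (#{OUTPUT_TYPES.keys.join(', ')})") do |type|
      accept = OUTPUT_TYPES.fetch(type)
    end
  end
  rest = parser.parse(argv) # non-destructive; argv is left unchanged
  [accept, rest]
end
```

When `accept` is nil, a caller would keep its existing default Accept header.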


My purpose was to demonstrate that the problem isn't linked to s-get, 
since it's the same with curl. Besides, I noticed the same problem with 
n-quads.

curl --header 'Accept: application/n-quads' \
     'http://localhost:3030/test_tdb2?graph=http://test'
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:j.0="http://example.org/" >
   <rdf:Description rdf:about="http://example.org/titi">
     <j.0:tata>coucou</j.0:tata>
   </rdf:Description>
</rdf:RDF>
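For a single named graph, N-Quads is a dataset syntax and is presumably not offered, which would explain the RDF/XML fallback. Asking explicitly for N-Triples, as in Andy's earlier curl suggestion, gives one triple per line. Below is a minimal standard-library Ruby sketch of that request. It assumes the dataset URL from this test (`http://localhost:3030/test_tdb2`). The helper names `graph_get_request` and `stream_graph` are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Build (without sending) a Graph Store Protocol GET for one named graph,
# e.g. http://localhost:3030/test_tdb2?graph=http%3A%2F%2Ftest
def graph_get_request(dataset_url, graph_uri, accept = 'application/n-triples')
  uri = URI(dataset_url)
  uri.query = URI.encode_www_form('graph' => graph_uri)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = accept # ask explicitly for a one-triple-per-line syntax
  request
end

# Send the request and copy the body to `io` chunk by chunk, so a large graph
# is never held in memory in one piece.
def stream_graph(request, io)
  Net::HTTP.start(request.uri.host, request.uri.port) do |http|
    http.request(request) do |response|
      unless response.is_a?(Net::HTTPSuccess)
        raise "GET #{request.uri} failed: HTTP #{response.code}"
      end
      response.read_body { |chunk| io.write(chunk) }
    end
  end
end
```

For example, suppose the script ends with the hypothetical line `stream_graph(graph_get_request('http://localhost:3030/test_tdb2', 'http://test'), $stdout)` and is saved as `get_graph.rb`. It could then be piped into the same command used earlier in the thread: `ruby get_graph.rb | split -l 500000 - --additional-suffix=.nt BnfTextTitres-`.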



On 31/01/2019 at 14:12, ajs6f wrote:
> I'm not sure what you expect to get back from Fuseki with an "application/json" mimetype? There is no W3C-spec plain-JSON RDF serialization that I know of. I suppose there's the old Talis idea:
>
> https://jena.apache.org/documentation/io/rdf-json.html
>
> but I can't imagine that's what you're looking for.
>
> ajs6f

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by ajs6f <aj...@apache.org>.

I'm not sure what you expect to get back from Fuseki with an "application/json" mimetype? There is no W3C-spec plain-JSON RDF serialization that I know of. I suppose there's the old Talis idea:

https://jena.apache.org/documentation/io/rdf-json.html

but I can't imagine that's what you're looking for.

ajs6f

> On Jan 31, 2019, at 8:09 AM, vincent ventresque <vi...@ens-lyon.fr> wrote:
> 
> It seems that the problem is completely independent from s-get (see these results with curl below). So I think there's a default setting somewhere in Fuseki itself.
> 
> 
> #~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~
> 
> :~/Documents/fuseki/bin$ curl --header 'Accept: application/json' 'http://localhost:3030/test_tdb2?graph=http://test'
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:j.0="http://example.org/" >
>   <rdf:Description rdf:about="http://example.org/titi">
>     <j.0:tata>coucou</j.0:tata>
>   </rdf:Description>
> </rdf:RDF>
> 
> 
> #~~~~~~~  --header 'Accept: application/rdf+json' ~~~~~~~~~~~~~~~~~~~~~~
> 
> :~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' 'http://localhost:3030/test_tdb2?graph=http://test'
> {
>   "http://example.org/titi" : {
>     "http://example.org/tata" : [ {
>       "type" : "literal" ,
>       "value" : "coucou"
>     }
>      ]
>   }
> }

Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

It seems that the problem is completely independent of s-get (see the
curl results below), so I think there is a default setting somewhere in
Fuseki itself.


#~~~~~~~  --header 'Accept: application/json' ~~~~~~~~~~~~~~~~~~~~~

:~/Documents/fuseki/bin$ curl --header 'Accept: application/json' 'http://localhost:3030/test_tdb2?graph=http://test'
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:j.0="http://example.org/" >
   <rdf:Description rdf:about="http://example.org/titi">
     <j.0:tata>coucou</j.0:tata>
   </rdf:Description>
</rdf:RDF>


#~~~~~~~  --header 'Accept: application/rdf+json' ~~~~~~~~~~~~~~~~~~~~~~

:~/Documents/fuseki/bin$ curl --header 'Accept: application/rdf+json' 'http://localhost:3030/test_tdb2?graph=http://test'
{
   "http://example.org/titi" : {
     "http://example.org/tata" : [ {
       "type" : "literal" ,
       "value" : "coucou"
     }
      ]
   }
}
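The same comparison can be scripted for several Accept values in one go. This is a sketch only. The names conneg_probe.rb, graph_url and negotiated_type are hypothetical, and the URL layout is the one used in the curl calls above. Each request downloads the whole graph, so the script is meant for a small test graph such as http://test.

```ruby
require "net/http"
require "uri"

# Accept values to compare (the ones tried in this thread).
PROBE_TYPES = ["application/json", "application/rdf+json", "application/ld+json",
               "application/n-triples", "application/n-quads"].freeze

# Build DATASET?graph=GRAPH, as in the curl examples above.
def graph_url(dataset_url, graph_iri)
  uri = URI(dataset_url)
  uri.query = URI.encode_www_form(graph: graph_iri)
  uri
end

# GET the graph with one Accept header; return the Content-Type the server chose.
def negotiated_type(uri, accept)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = accept
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response["Content-Type"]
end

if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  uri = graph_url(ARGV[0], ARGV[1])
  PROBE_TYPES.each do |accept|
    puts "#{accept.ljust(24)} -> #{negotiated_type(uri, accept)}"
  end
end
```

Possible usage: `ruby conneg_probe.rb http://localhost:3030/test_tdb2 http://test`. With this output, any Accept value that falls back to another syntax shows up at a glance.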




Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Thanks for your quick reply!

> $mtAppJSON isn't used.

I think my previous message wasn't clear: I meant plain JSON, not
JSON-LD. My code now works for both, and I do use $mtAppJSON, but I had
to replace 'application/json' with 'application/rdf+json' to get JSON
instead of XML; see the file here:
https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download

> The settings are: ...

I ran a small test: comment out these lines and the "names" part, and
you'll get XML!



Re: wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by Andy Seaborne <an...@apache.org>.


On 31/01/2019 11:26, vincent ventresque wrote:
> Hello,
> 
> I found the origin of the problem for json : the $mtAppJSON had the value
> 
> 'application/json'

$mtAppJSON isn't used.

"application/rdf+json"
isn't JSON-LD (it's the old Talis format).

There is:

$mtJSONLD           = 'application/ld+json'

> 
> it has to be replaced with
> 
> 'application/rdf+json'
> 
> I've updated the file here :
> 
> https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download 
> 
> 
> Maybe I'm going to submit a pull request as Andy suggested, but I'd like 
> to understand why 'application/json' returns xml. Besides, it's the same 
> thing for nquads : I tried to replace
> 
> $mtNQuads = 'application/n-quads'
> 
> with
> 
> $mtNQuads = 'application/x-trig'
> 
> but still have xml...

The settings are:

# Default for GET
# At least allow anything (and hope!)
$accept_rdf="#{$mtTurtle} , #{$mtNTriples};q=0.9 , #{$mtRDF};q=0.8 , #{$mtJSONLD};q=0.5"
# Datasets
$accept_ds="#{$mtTrig} , #{$mtNQuads};q=0.9 , #{$mtJSONLD};q=0.5"
# For SPARQL query
$accept_results="#{$mtSparqlResultsJ} , #{$mtSparqlResultsX};q=0.9 , #{$accept_rdf}"

# Accept any in case of trouble.
$accept_rdf="#{$accept_rdf} , */*;q=0.1"
$accept_results="#{$accept_results} , */*;q=0.1"


> 
> Is there a kind of default setting somewhere (if content-type isn't 
> recognized in Fuseki, the response is xml) ?

Yes.

RDF/XML for graphs, N-Quads for datasets.

Run Fuseki/full with "-v" and it should print the content negotiation 
details.

     Andy
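The $accept_* strings in the settings above are ordinary HTTP Accept headers. Each media range may carry a q weight between 0 and 1 (1 when omitted), and the server chooses the most preferred type it can produce. Below is a short Ruby sketch that ranks such a header. ranked_accept is a hypothetical helper. Apart from application/ld+json, which is quoted above, the expansions of the $mt* variables are assumed to be the standard media types.

```ruby
# Rank the media ranges of an HTTP Accept header by q-value
# (a missing q counts as 1.0; ties keep their original order).
def ranked_accept(header)
  header.split(",").each_with_index.map { |part, i|
    type, *params = part.split(";").map(&:strip)
    q_param = params.find { |p| p.start_with?("q=") }
    q = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [type, q, i]
  }.sort_by { |_type, q, i| [-q, i] }.map { |type, q, _i| [type, q] }
end

# $accept_rdf with the $mt* variables expanded (assumed standard values)
# and the "*/*;q=0.1" fallback appended, as in the settings above.
ACCEPT_RDF = "text/turtle , application/n-triples;q=0.9 , " \
             "application/rdf+xml;q=0.8 , application/ld+json;q=0.5 , */*;q=0.1"

if __FILE__ == $PROGRAM_NAME
  ranked_accept(ACCEPT_RDF).each { |type, q| puts format("%-22s q=%.1f", type, q) }
end
```

Because `*/*;q=0.1` is appended, the server may answer with any type. A client that sends only a type with no RDF syntax behind it, such as plain application/json for a graph, gets the server default instead. Per the reply above, that default is RDF/XML for graphs and N-Quads for datasets.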


wrong content-types in s-get | Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Hello,

I found the origin of the JSON problem: $mtAppJSON had the value

'application/json'

and it has to be replaced with

'application/rdf+json'

I've updated the file here:

https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download


I may submit a pull request as Andy suggested, but first I'd like to
understand why 'application/json' returns XML. The same thing happens
for N-Quads: I tried to replace

$mtNQuads = 'application/n-quads'

with

$mtNQuads = 'application/x-trig'

but I still get XML...

Is there some kind of default setting somewhere (i.e. if Fuseki doesn't
recognize the content type, it responds with XML)?

Thanks in advance

VV




Re: Export named graph from TDB to several ntriples files

Posted by vincent ventresque <vi...@ens-lyon.fr>.

Hi Andy,

Thanks again for your idea of modifying the s-get script; it helped me
understand the Ruby utilities and HTTP requests (I often use the Ruby
scripts but had never really looked inside).

I don't know how to submit a pull request, and I'm not a Ruby expert,
so I've put a small test file here:

https://sourceforge.net/projects/ffl-misc/files/fuseki_scripts_custom-ruby/s-get/download

-- added "--output" to the options and created a new function (set_output_format)

-- it works for N-Triples, XML and JSON-LD

-- it doesn't work for JSON (returns XML...)

N.B.: in this test file, I've removed large parts of the original code
to improve readability.
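Below is a minimal sketch of such an --output option using Ruby's standard OptionParser, in the same style as the `opts.on('--output=TYPE', [...])` lines of s-query. OUTPUT_TYPES and accept_from_args are hypothetical and not part of the official s-get. The media types are the registered ones for each syntax. Note that application/rdf+json is RDF/JSON (the old Talis format), not plain JSON or JSON-LD.

```ruby
require "optparse"

# Hypothetical format names for an s-get style --output option, mapped to
# the registered media type of each RDF syntax.
OUTPUT_TYPES = {
  "nt"      => "application/n-triples",
  "ttl"     => "text/turtle",
  "xml"     => "application/rdf+xml",
  "jsonld"  => "application/ld+json",
  "rdfjson" => "application/rdf+json" # RDF/JSON (Talis), not JSON-LD
}.freeze

# Parse --output=TYPE out of argv and return [accept_header, remaining_args].
# Without --output, the default Accept header is used.
def accept_from_args(argv, default: "text/turtle")
  accept = default
  parser = OptionParser.new do |opts|
    opts.on("--output=TYPE", OUTPUT_TYPES.keys, "RDF syntax to request") do |type|
      accept = OUTPUT_TYPES.fetch(type)
    end
  end
  rest = parser.parse(argv) # non-destructive: argv itself is left unchanged
  [accept, rest]
end
```

The returned string would then be sent as the request's "Accept:" header. OptionParser rejects values outside the list with a parse error, so unsupported formats fail early instead of silently falling back to the server default.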



Re: Export named graph from TDB to several ntriples files

Posted by Vincent Ventresque <vi...@ens-lyon.fr>.

Hi Andy,

Many thanks for these ideas, I'm going to try the curl and riot solutions.

> Modify the s-get script to handle --output and set the "Accept:"
> header then please submit a pull request for the changes

I tried to modify the s-get script the same way as s-query, but it
didn't work; when I have a moment I'll try to understand how the
options are handled.
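For reference, Andy's curl suggestion piped into split would be `curl --header 'Accept: application/n-triples' 'http://localhost:3030/ds?graph=http://bnf_titres' | split -l 500000 - --additional-suffix=.nt BnfTextTitres-`. The Ruby sketch below does the same without external tools: it streams the response and writes files of at most N lines, never cutting a triple in two. Some assumptions:

- LineSplitter and export_graph are hypothetical names.
- Files are numbered (BnfTextTitres-00.nt, BnfTextTitres-01.nt, ...) instead of using split's aa/ab suffixes.
- The Content-Type check reports a silent fallback to another syntax (for instance RDF/XML) instead of splitting it line by line.

```ruby
require "net/http"
require "uri"

# Writes text to PREFIX00.nt, PREFIX01.nt, ... with at most lines_per_file
# lines per file, never cutting a line in two (like `split -l`).
class LineSplitter
  def initialize(prefix, lines_per_file)
    @prefix = prefix
    @max = lines_per_file
    @pending = String.new # binary buffer for an incomplete last line
    @count = 0
    @index = -1
    @file = nil
  end

  # Append a chunk; every complete line in the buffer is written out.
  def <<(text)
    @pending << text
    return self unless @pending.include?("\n")
    complete, _newline, rest = @pending.rpartition("\n")
    (complete << "\n").each_line { |line| write_line(line) }
    @pending = rest
    self
  end

  # Flush a final line without a trailing newline and close the file.
  def close
    write_line(@pending + "\n") unless @pending.empty?
    @pending = String.new
    @file&.close
    @file = nil
  end

  private

  def write_line(line)
    if @file.nil? || @count == @max
      @file&.close
      @index += 1
      @file = File.open(format("%s%02d.nt", @prefix, @index), "wb")
      @count = 0
    end
    @file.write(line)
    @count += 1
  end
end

# Hypothetical helper: fetch one named graph as N-Triples and split it.
def export_graph(dataset_url, graph_iri, prefix, lines_per_file = 500_000)
  uri = URI(dataset_url)
  uri.query = URI.encode_www_form(graph: graph_iri)
  splitter = LineSplitter.new(prefix, lines_per_file)
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri)
    request["Accept"] = "application/n-triples"
    http.request(request) do |response|
      raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
      type = response["Content-Type"].to_s
      raise "unexpected Content-Type: #{type}" unless type.start_with?("application/n-triples")
      response.read_body { |chunk| splitter << chunk }
    end
  end
ensure
  splitter&.close
end
```

Possible usage, with the dataset from the original post: `export_graph("http://localhost:3030/BnF_text_v2", "http://bnf_titres", "BnfTextTitres-")`.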


  

On 28/01/2019 14:19, Andy Seaborne wrote:
> Ways I can think of:
>
> 1/ Modify the s-get script to handle --output and set the "Accept:" 
> header then please submit a pull request for the changes.
>
> 2/ Use curl
>
> curl --header 'Accept: application/n-triples' \
>    'http://localhost:3030/ds?graph=http://bnf_titres'
>
> 3/ Parse the s-get output:
>
> s-get ... | riot --syntax TTL
>
>     Andy

Re: Export named graph from TDB to several ntriples files

Posted by Andy Seaborne <an...@apache.org>.


Ways I can think of:

1/ Modify the s-get script to handle --output and set the "Accept:" 
header, then please submit a pull request with the changes.

2/ Use curl

curl --header 'Accept: application/n-triples' \
    'http://localhost:3030/ds?graph=http://bnf_titres'

3/ Parse the s-get output:

s-get ... | riot --syntax TTL
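Suggestion 2 can be combined with split in one pipeline. This is a sketch. It assumes the dataset is published as `BnF_text_v2` on `localhost:3030`, with its Graph Store Protocol endpoint at `/data`; the function names are hypothetical. `-G` with `--data-urlencode` makes curl send the graph URI percent-encoded in the query string of a GET request. `--additional-suffix` is a GNU split option.

```shell
#!/bin/sh
# Sketch: export one named graph from Fuseki as N-Triples and cut it into
# chunks. Host, port, dataset name and the /data endpoint are assumptions.

# export_graph GRAPH_URI : write the named graph as N-Triples to stdout.
export_graph() {
    curl --silent --show-error --fail -G \
        --header 'Accept: application/n-triples' \
        --data-urlencode "graph=$1" \
        'http://localhost:3030/BnF_text_v2/data'
}

# split_ntriples PREFIX [LINES] : split N-Triples read from stdin into
# PREFIXaa.nt, PREFIXab.nt, ... holding at most LINES triples each
# (default 500000). Requires GNU split for --additional-suffix.
split_ntriples() {
    split -l "${2:-500000}" --additional-suffix=.nt - "$1"
}

# Usage (needs a running Fuseki server):
#   export_graph 'http://bnf_titres' | split_ntriples BnfTextTitres-
```

Because N-Triples puts exactly one triple per line, `split -l` never cuts a triple in half, which is the property the original question relies on.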

     Andy

