Posted to users@jena.apache.org by Andrew U Frank <fr...@geoinfo.tuwien.ac.at> on 2017/03/26 10:52:46 UTC

fuseki silently ignores insert data requests with a BOM character

i use fuseki with the SPARQL update "INSERT DATA {...}" command, sent as
an HTTP POST to a fuseki server.
this works very well except when a literal in a triple contains a BOM
(65279, U+FEFF) character. Then the confirmation is still positive (204) but the
triples are NOT inserted.

the issue is not that the request with the BOM is ignored - this is 
probably a good thing, but that a 204 confirmation is produced; some 
information pointing to a syntax error in the SPARQL request or similar 
is necessary.

i cannot see if the request arrives at the fuseki server ok - is there a 
flag i can set when starting the fuseki server to show the request as it 
is received? i can only see that the server is receiving the POST.

here the protocol of the sender:

callHTTP5 :
     request POST http://xxxxt:3030/march25/update HTTP/1.1
Accept: */*
Content-Length: 586
Content-Type: application/sparql-update


     requestbody INSERT DATA { GRAPH <http://gerastree.at/g12> 
{<http://gerastree.at/waterhouse-kw#> 
<http://gerastree.at/lit_2014#titel> "\ufeff the BOM "@xx  .
....
} }
callHTTP5 result is is Right HTTP/1.1 204 No Content
Date: Sun, 26 Mar 2017 10:32:08 GMT
Fuseki-Request-ID: 39
Connection: close

the literal is  "\65279 the BOM "  - if i remove the BOM mark, the 
contents are stored, but the response from the server is exactly the same!

please produce an appropriate error message!

andrew

-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
                                  +43 1 58801 12710 direct
Geoinformation, TU Wien          +43 1 58801 12700 office
Gusshausstr. 27-29               +43 1 55801 12799 fax
1040 Wien Austria                +43 676 419 25 72 mobil


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
thank you for the hints - i use haskell and assume that the strings
which i see are converted before they are sent 'on the wire'. i am
not familiar with your comment about the difference between utf8
encoding and utf8 on the wire. in the material that you pointed to i do
not see such a conversion mentioned. can you give me another pointer?

i will read more about what haskell does in encoding utf8. what i
understand is that a umlaut (U+00E4) is encoded in two bytes...

i assume you will fix the differences in the decoders to ensure that the
return code and the store action correspond.

thank you for the help!

andrew



On 03/28/2017 10:57 PM, Andy Seaborne wrote:
>
>
> On 28/03/17 21:35, Andrew U Frank wrote:
>> the problem/bug is not related to the BOM character but seemingly to
>> many UTF-8 characters.
>>
>> i get (consistently) a return code of 204 when the fuseki server is
>> running without -v and 500 when running with -v if any of the literals
>> contains a "strange" (non-ASCII?) UTF-8 character. the current problem is the
>> character ä (code point 228 - character a with diaeresis, german umlaut).
>> if i remove the character, the triples (all of the request) are stored,
>> if it is in the literal, none is stored.
>
> (can we stick to hex please?)
>
> 228 = U+00E4
>
> I suspect that codepoints are not being encoded into UTF-8 correctly.
> That is what the java-based decoder that you hit via "-v" is saying.
>
> For example, U+00E4 is 2 bytes : c3 a4 : in UTF-8 on the wire.
>
> What is definitely wrong is sending the codepoint as a byte directly :
> xE4 or two bytes 00 E4.
>
>>
>> i understand that a request encoded as application/sparql-update must be
>> coded as UTF8 which my literal is - or is there some special encoding
>> necessary for the german a umlaut? i do not think that the triples
>> should be encoded as latin1 or similar??
>
> Can you confirm that on the wire it is c3 a4?
>
>>
>> i tried to POST with curl or wget, but did not succeed (i have not much
>> experience with these outside of simplest case).
>>
>> in any case, it is likely a bug when the response with or without -v in
>> the fuseki start makes a difference?
>
> Hitting different decoders.
>
> Strictly, it is an error and it should be 500.  javacc
> bytes-to-character seems to be too lax.
>
>>
>> thank you for the help!
>>
>> andrew
>>
>>


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.

On 28/03/17 21:35, Andrew U Frank wrote:
> the problem/bug is not related to the BOM character but seemingly to
> many UTF-8 characters.
>
> i get (consistently) a return code of 204 when the fuseki server is
> running without -v and 500 when running with -v if any of the literals
> contains a "strange" (non-ASCII?) UTF-8 character. the current problem is the
> character ä (code point 228 - character a with diaeresis, german umlaut).
> if i remove the character, the triples (all of the request) are stored,
> if it is in the literal, none is stored.

(can we stick to hex please?)

228 = U+00E4

I suspect that codepoints are not being encoded into UTF-8 correctly. 
That is what the java-based decoder that you hit via "-v" is saying.

For example, U+00E4 is 2 bytes : c3 a4 : in UTF-8 on the wire.

What is definitely wrong is sending the codepoint as a byte directly : 
xE4 or two bytes 00 E4.
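
A quick way to check these byte sequences on the client side is to encode the character yourself. The sketch below uses Python purely for illustration (the client in this thread is Haskell), and the helper names are made up for this example. Note that U+00E4 itself encodes to the two bytes c3 a4; a trailing 0a in shell output is only the newline that echo appends.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded form of text as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


def is_valid_utf8(raw: bytes) -> bool:
    """Strictly decode raw as UTF-8; False if the bytes are malformed."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


a_umlaut = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

utf8_form = hex_bytes(a_umlaut, "utf-8")      # what should be on the wire
latin1_form = hex_bytes(a_umlaut, "latin-1")  # the code point sent as one raw byte

print("utf-8  :", utf8_form)
print("latin-1:", latin1_form)
# A lone e4 byte followed by ASCII is rejected by a strict UTF-8 decoder.
print("raw e4 is valid utf-8:", is_valid_utf8(b"M\xe4rz"))
```

A strict decoder rejecting the single byte e4 is the same kind of failure that the MalformedInputException in the Fuseki log reports.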

>
> i understand that a request encoded as application/sparql-update must be
> coded as UTF8 which my literal is - or is there some special encoding
> necessary for the german a umlaut? i do not think that the triples
> should be encoded as latin1 or similar??

Can you confirm that on the wire it is c3 a4?

>
> i tried to POST with curl or wget, but did not succeed (i have not much
> experience with these outside of simplest case).
>
> in any case, it is likely a bug when the response with or without -v in
> the fuseki start makes a difference?

Hitting different decoders.

Strictly, it is an error and it should be 500.  javacc 
bytes-to-character seems to be too lax.

>
> thank you for the help!
>
> andrew
>
>

Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
the problem/bug is not related to the BOM character but seemingly to
many UTF-8 characters.

i get (consistently) a return code of 204 when the fuseki server is
running without -v and 500 when running with -v if any of the literals
contains a "strange" (non-ASCII?) UTF-8 character. the current problem is the
character ä (code point 228 - character a with diaeresis, german umlaut).
if i remove the character, the triples (all of the request) are stored,
if it is in the literal, none is stored.

i understand that a request encoded as application/sparql-update must be
coded as UTF8 which my literal is - or is there some special encoding
necessary for the german a umlaut? i do not think that the triples
should be encoded as latin1 or similar??

i tried to POST with curl or wget, but did not succeed (i have not much
experience with these outside of simplest case).
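
For anyone who wants to try the same POST from a script instead of curl or wget, here is a minimal sketch using Python's standard urllib. The endpoint is the one that appears in the logs in this thread; the example.org IRIs, the sample literal and the function names are placeholders invented for this example. The important step is the explicit conversion from a string to UTF-8 bytes before the body is handed to the HTTP layer.

```python
import urllib.request

# Endpoint as it appears in the logs of this thread; adjust for your server.
ENDPOINT = "http://127.0.0.1:3030/memDB/update"


def build_update_request(update: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Update POST whose body is UTF-8 bytes."""
    body = update.encode("utf-8")  # explicit encoding step: str -> bytes
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder data: the IRIs and the literal are illustrative only.
update = (
    "INSERT DATA { GRAPH <http://example.org/g> { "
    '<http://example.org/s> <http://example.org/p> "M\u00e4rz"@de . } }'
)
request = build_update_request(update)
```

Calling send(request) against a running server returns the status code; inspecting request.data beforehand shows exactly which bytes will be sent.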

in any case, it is likely a bug when the response with or without -v in
the fuseki start makes a difference?

thank you for the help!

andrew



On 03/28/2017 03:35 PM, Andy Seaborne wrote:
> What storage is the Fuseki server using?  I can't reproduce the
> restart effect.
>
> The BOM is not 65279 (bytes xFE xFF) in a SPARQL Update request, it's
> bytes xEF xBB xBF.
>
> We are talking about what is on-the-wire which means UTF-8 encoded
> unicode and codepoint 65279, U+FEFF is 3 bytes in UTF-8 xEF xBB xBF
>
> http://unicode.org/faq/utf_bom.html#bom4
>
> The bytes xFE xFF are illegal as UTF-8 hence the message you see.
>
> $ echo -n $'\uFEFF' | od -t x1
> ==>
> 0000000 ef bb bf
> 0000003
>
> $ echo -n $'\xFE\xFF' | od -t x1
> ==>
> 0000000 fe ff
> 0000002
>
> The fact that the 500 does not say where the error in the input stream
> occurs is an unfortunate effect of efficient decoding by java and by
> javacc.  It processes large blocks of bytes and does not say where in
> the block the error occurred.  This is a nuisance.
>
> What is legal is to put the unicode encoding  "\uFEFF" into the SPARQL
> Update.
>
>     Andy
>
>
>
> On 28/03/17 12:07, Andrew U Frank wrote:
>> thank you for your information. starting fuseki with -v gives indeed
>> more information. in this case i get
>>
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST
>> http://127.0.0.1:3030/memDB/update
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Connection:         
>> close
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => User-Agent:
>> haskell-HTTP/4000.3.5
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Host:
>> 127.0.0.1:3030
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Accept:             
>> */*
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Length:     
>> 1062
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Type:
>> application/sparql-update
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST /memDB :: 'update' ::
>> [application/sparql-update] ?
>> [2017-03-28 12:45:07] Fuseki     WARN  [49] Runtime IO Exception (client
>> left?) RC = 500 : java.nio.charset.MalformedInputException: Input
>> length = 1
>> org.apache.jena.atlas.RuntimeIOException:
>> java.nio.charset.MalformedInputException: Input length = 1
>>     at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
>>     at
>> org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
>>
>>     at
>> org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
>>
>>     at
>> org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
>>
>>     at
>> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>>
>>     at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
>>
>>     at
>> org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
>>
>>     at
>> org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
>>
>>     at
>> org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
>>
>>     at
>> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:285)
>>
>>     at
>> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:248)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>>
>>     at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>
>>     at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>
>>     at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>>
>>     at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1156)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>>
>>     at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>
>>     at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1088)
>>
>>     at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>
>>     at
>> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:374)
>>
>>     at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
>>
>>     at org.eclipse.jetty.server.Server.handle(Server.java:517)
>>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
>>     at
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
>>
>>     at
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
>>
>>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>>     at
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
>>
>>     at
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
>>
>>     at
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
>>
>>     at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>>
>>     at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>>
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>>     at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>>     at java.io.Reader.read(Reader.java:140)
>>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:316)
>>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:298)
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:182)
>>
>>     ... 47 more
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] 500
>> java.nio.charset.MalformedInputException: Input length = 1 (4 ms)
>>
>> the version is recently downloaded (2.5.0 - is there a better one?).
>>
>> the transfer is using a http protocol (i think haskell ghc uses the
>> libcurl) and the request is now:
>>
>> callHTTP5 :
>> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
>> Accept: */*
>> Content-Length: 1062
>> Content-Type: application/sparql-update
>>
>>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2b>
>> {<http://gerastree.at/waterhouse-kw#>
>> <http://gerastree.at/lit_2014#titel> "(Krieg und Welt)"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#hl1> "\ufeffwith bomUnsere Namen werden
>> lebendig"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P000>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
>> eigen?\n"@de  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#P004>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P003>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
>> } }
>> callHTTP5 result is is Right HTTP/1.1 500
>> java.nio.charset.MalformedInputException: Input length = 1
>> Date: Tue, 28 Mar 2017 10:45:07 GMT
>> Fuseki-Request-ID: 49
>> Content-Type: text/plain;charset=utf-8
>> Cache-Control: must-revalidate,no-cache,no-store
>> Pragma: no-cache
>> Content-Length: 134
>> Connection: close
>>
>> which is a "not ok response" and corresponds to the fact that nothing is
>> stored.
>>
>> i thought this could be closed and assumed i had some other problem. but
>> then i restarted fuseki (exactly the same configuration as before,
>> --mem, but without the -v)
>> and get a different response for the same request (the program producing
>> it was not changed) - this time with a 204 answer (and no triples stored,
>> as for the 500 response), which is clearly not to be expected.
>>
>> callHTTP5 :
>> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
>> Accept: */*
>> Content-Length: 1062
>> Content-Type: application/sparql-update
>>
>>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2d>
>> {<http://gerastree.at/waterhouse-kw#>
>> <http://gerastree.at/lit_2014#titel> "\ufeffwith bom(Krieg und Welt)"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#hl1> "Unsere Namen werden lebendig"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P000>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
>> eigen?\n"@de  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#P004>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P003>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
>> } }
>> callHTTP5 result is is Right HTTP/1.1 204 No Content
>> Date: Tue, 28 Mar 2017 10:58:33 GMT
>> Fuseki-Request-ID: 28
>> Connection: close
>>
>> i hope this is enough information that you can identify a fix to allow
>> the 500 response to pass through.
>>
>> to reproduce the problem it seems to be enough to have a BOM  "\65279"
>> character in a triple with a literal (perhaps at the front position, but
>> seemingly any triple in the request triggers the error response).
>>
>> thank you for your effort - i like fuseki a lot!
>>
>> andrew
>>
>>
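
The two points from the quoted reply above (U+FEFF is ef bb bf on the wire, and the escape "\uFEFF" is legal inside a SPARQL Update) can be sketched as follows. This is an illustration in Python, not the Haskell client from the thread, and escape_non_ascii is a hypothetical helper written for this example. It only rewrites non-ASCII characters, so quotes and backslashes in a literal would still need their own escaping.

```python
BOM = "\ufeff"  # U+FEFF, decimal 65279


def escape_non_ascii(literal: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in literal:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)               # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")   # four hex digits
        else:
            out.append(f"\\U{cp:08X}")   # eight hex digits
    return "".join(out)


def is_valid_utf8(raw: bytes) -> bool:
    """Strictly decode raw as UTF-8; False if the bytes are malformed."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


bom_utf8 = BOM.encode("utf-8")                  # the three bytes ef bb bf
fe_ff_is_valid = is_valid_utf8(b"\xfe\xff")     # raw fe ff is never valid UTF-8
escaped = escape_non_ascii(BOM + "with bom")    # pure ASCII request text

print(bom_utf8.hex(" "), fe_ff_is_valid, escaped)
```

With the escaped form the request body contains only ASCII bytes, so a byte-level encoding mistake in the client cannot corrupt the literal.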


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
dear andy

thank you again for your lasting help. i changed some aspects of the
encoding and the sending of bytes and i had the impression that this
cleaned up my problem - unfortunately, i cannot test at the moment (for
some other reasons). if this is not enough, then i will use wireshark,
which i have never used but is probably a good thing to learn...

i can follow why fuseki cannot produce better error messages, so
other tools must be learned (the problem is often that the additional
tools contribute new ways of making errors...)

i hope i learned something with your help!

andrew



On 03/29/2017 01:46 PM, Andy Seaborne wrote:
> Probably your code puts a x00 into the bytes.  x00 is illegal in
> unicode (but not java strings!).
>
> Fuseki is logging what it receives. To print it needs to be a string,
> not bytes, so it creates a string .. and goes bang.  All I can do is
> change the decoder setup to put an "illegal char" marker in the log.
> As I said, the exact error point is not available to Java in the
> check-fail case.
>
> URL-encoding is not related - this is an HTTP POST with the data in
> the HTTP body.
>
> Try a tool that allows you to look at the on-the-wire action (e.g.
> wireshark). Capturing inside Jetty-Fuseki has had too many places
> where the bytes have been touched. Capturing in the client or wire is
> reliable.
>
> Sorry - don't know Haskell network code.
>
>     Andy
>
> On 29/03/17 08:36, Andrew U Frank wrote:
>> thank you - i am aware of this, but still wonder where the encoding on
>> my end goes wrong. it would be very helpful if the fuseki server would
>> log the input it receives for errors in the 'insert data' case. it does
>> show my input (as it is received) if i url-encode it (which is an error
>> with message and produces a copy of what is received). it does not show
>> this in the case of incorrect utf8 characters but this would be very
>> helpful to understand where in the stack the problem lies. i will
>> experiment more.
>>
>> do you have a suggestion for a simple "web sink to log"? could ntop be
>> used to capture the request and identify what is sent from my end? any
>> suggestions on details how to?
>>
>> thank you a lot for your help!
>>
>> andrew
>>


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.
Probably your code puts a x00 into the bytes.  x00 is illegal in unicode 
(but not java strings!).

Fuseki is logging what it receives. To print it needs to be a string,
not bytes, so it creates a string .. and goes bang.  All I can do is
change the decoder setup to put an "illegal char" marker in the log.  As
I said, the exact error point is not available to Java in the check-fail
case.

URL-encoding is not related - this is an HTTP POST with the data in the 
HTTP body.

Try a tool that allows you to look at the on-the-wire action (e.g. 
wireshark). Capturing inside Jetty-Fuseki has had too many places where 
the bytes have been touched. Capturing in the client or wire is reliable.
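
If a packet capture tool is more than is needed, a small stand-in server that hex-dumps whatever the client POSTs is another way to see the bytes as sent. The sketch below is an assumption-laden illustration using Python's standard http.server; the names hex_dump, SinkHandler and run and the port 3031 are invented for this example. It expects a Content-Length header and does not handle chunked bodies.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def hex_dump(raw: bytes, width: int = 16) -> str:
    """Format raw bytes as lines of an offset followed by space-separated hex."""
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        lines.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


class SinkHandler(BaseHTTPRequestHandler):
    """Accept any POST, print its body as hex, and answer 204."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)   # raw bytes, never decoded to a string
        print(self.requestline)
        print(hex_dump(body))
        self.send_response(204)
        self.end_headers()


def run(port: int = 3031) -> None:
    """Serve on localhost until interrupted; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), SinkHandler).serve_forever()
```

Start it with run() and point the client at http://127.0.0.1:3031/ instead of the Fuseki update endpoint. A correctly encoded BOM shows up as ef bb bf and an a-umlaut as c3 a4.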

Sorry - don't know Haskell network code.

     Andy

On 29/03/17 08:36, Andrew U Frank wrote:
> thank you - i am aware of this, but still wonder where the encoding on
> my end goes wrong. it would be very helpful if the fuseki server would
> log the input it receives for errors in the 'insert data' case. it does
> show my input (as it is received) if i url-encode it (which is an error
> with message and produces a copy of what is received). it does not show
> this in the case of incorrect utf8 characters but this would be very
> helpful to understand where in the stack the problem lies. i will
> experiment more.
>
> do you have a suggestion for a simple "web sink to log"? could ntop be
> used to capture the request and identify what is sent from my end? any
> suggestions on details how to?
>
> thank you a lot for your help!
>
> andrew
>

Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
thank you - i am aware of this, but still wonder where the encoding on
my end goes wrong. it would be very helpful if the fuseki server would
log the input it receives for errors in the 'insert data' case. it does
show my input (as it is received) if i url-encode it (which is an error
with message and produces a copy of what is received). it does not show
this in the case of incorrect utf8 characters but this would be very
helpful to understand where in the stack the problem lies. i will
experiment more.

do you have a suggestion for a simple "web sink to log"? could ntop be
used to capture the request and identify what is sent from my end? any
suggestions on details how to?

thank you a lot for your help!

andrew


On 03/28/2017 11:48 PM, Andy Seaborne wrote:
>
>
> On 28/03/17 22:05, Andrew U Frank wrote:
>> i found that encoding the literals in the requests as latin1 i do not
>> see errors and the triples are stored.
>>
>> is this intended behaviour? for now, i have a work around.
>>
>> i look forward to your analysis of the code. when i look at the java
>> error message, i sense that there is an encoding selected - is it UTF8 or
>> latin1?
>>
>> thank you for maintaining fuseki!
>>
>> andrew
>>
>>
> For some (not all) iso-8859-1/latin1 characters, sending and reading
> back as if it were UTF-8 works.  It's the wrong character in the
> database but the reverse undoes the damage. This is not true of
> all of latin-1.
>
>     Andy


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.

On 28/03/17 22:05, Andrew U Frank wrote:
> i found that encoding the literals in the requests as latin1 i do not
> see errors and the triples are stored.
>
> is this intended behaviour? for now, i have a work around.
>
> i look forward to your analysis of the code. when i look at the java
> error message, i sense that there is an encoding selected - is it UTF8 or
> latin1?
>
> thank you for maintaining fuseki!
>
> andrew
>
>
For some (not all) iso-8859-1/latin1 characters, sending and reading
back as if it were UTF-8 works.  It's the wrong character in the database
but the reverse undoes the damage. This is not true of all of latin-1.

     Andy
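
Why such a round trip can work for some bytes and not for others depends on whether the decoder keeps enough information to reverse the mistake. The sketch below illustrates the idea in Python. The surrogateescape error handler is used only as a stand-in for a lax decoder and is an assumption of this example; it is not a description of what javacc or Fuseki does internally.

```python
original = "M\u00e4rz"
latin1_bytes = original.encode("latin-1")  # b"M\xe4rz", which is not valid UTF-8

# Lossy misreading: every malformed byte collapses to U+FFFD,
# so the original character cannot be recovered afterwards.
replaced = latin1_bytes.decode("utf-8", errors="replace")

# Lossless misreading: each malformed byte is kept as a distinct
# placeholder code point, so the mistake can be reversed later.
escaped = latin1_bytes.decode("utf-8", errors="surrogateescape")
recovered = escaped.encode("utf-8", errors="surrogateescape").decode("latin-1")

print(repr(replaced), repr(escaped), repr(recovered))
```

In both cases the stored text is wrong. Only the lossless variant can be undone on the way back, which is why relying on Latin-1 as a workaround is fragile.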

Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
i found that encoding the literals in the requests as latin1 i do not
see errors and the triples are stored.

is this intended behaviour? for now, i have a work around.

i look forward to your analysis of the code. when i look at the java
error message, i sense that there is an encoding selected - is it UTF8 or
latin1?

thank you for maintaining fuseki!

andrew



On 03/28/2017 03:35 PM, Andy Seaborne wrote:
> What storage is the Fuseki server using?  I can't reproduce the
> restart effect.
>
> The BOM is not 65279 (bytes xFE xFF) in a SPARQL Update request, it's
> bytes xEF xBB xBF.
>
> We are talking about what is on-the-wire which means UTF-8 encoded
> unicode and codepoint 65279, U+FEFF is 3 bytes in UTF-8 xEF xBB xBF
>
> http://unicode.org/faq/utf_bom.html#bom4
>
> The bytes xFE xFF are illegal as UTF-8 hence the message you see.
>
> $ echo -n $'\uFEFF' | od -t x1
> ==>
> 0000000 ef bb bf
> 0000003
>
> $ echo -n $'\xFE\xFF' | od -t x1
> ==>
> 0000000 fe ff
> 0000002
>
> The fact that the 500 does not say where the error in the input stream
> occurs is an unfortunate effect of efficient decoding by java and by
> javacc.  It processes large blocks of bytes and does not say where in
> the block the error occurred.  This is a nuisance.
>
> What is legal is to put the unicode encoding  "\uFEFF" into the SPARQL
> Update.
>
>     Andy
>
>
>
> On 28/03/17 12:07, Andrew U Frank wrote:
>> thank you for your information. starting fuseki with -v gives indeed
>> more information. in this case i get
>>
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST
>> http://127.0.0.1:3030/memDB/update
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Connection:         
>> close
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => User-Agent:
>> haskell-HTTP/4000.3.5
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Host:
>> 127.0.0.1:3030
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Accept:             
>> */*
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Length:     
>> 1062
>> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Type:
>> application/sparql-update
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST /memDB :: 'update' ::
>> [application/sparql-update] ?
>> [2017-03-28 12:45:07] Fuseki     WARN  [49] Runtime IO Exception (client
>> left?) RC = 500 : java.nio.charset.MalformedInputException: Input
>> length = 1
>> org.apache.jena.atlas.RuntimeIOException:
>> java.nio.charset.MalformedInputException: Input length = 1
>>     at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
>>
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
>>
>>     at
>> org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
>>     at
>> org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
>>
>>     at
>> org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
>>
>>     at
>> org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
>>
>>     at
>> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>>
>>     at
>> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
>>
>>     at
>> org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
>>
>>     at
>> org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
>>
>>     at
>> org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
>>
>>     at
>> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
>>
>>     at
>> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:285)
>>
>>     at
>> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:248)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>>
>>     at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>
>>     at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>
>>     at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>>
>>     at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1156)
>>
>>     at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>>
>>     at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>
>>     at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1088)
>>
>>     at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>
>>     at
>> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:374)
>>
>>     at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
>>
>>     at org.eclipse.jetty.server.Server.handle(Server.java:517)
>>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
>>     at
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
>>
>>     at
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
>>
>>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>>     at
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
>>
>>     at
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
>>
>>     at
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
>>
>>     at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>>
>>     at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>>
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>>     at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>>     at java.io.Reader.read(Reader.java:140)
>>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:316)
>>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:298)
>>     at
>> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:182)
>>
>>     ... 47 more
>> [2017-03-28 12:45:07] Fuseki     INFO  [49] 500
>> java.nio.charset.MalformedInputException: Input length = 1 (4 ms)
>>
>> the version is recently downloaded (2.5.0 - is there a better one?).
>>
>> the transfer is using a http protocol (i think haskell ghc uses the
>> libcurl) and the request is now:
>>
>> callHTTP5 :
>> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
>> Accept: */*
>> Content-Length: 1062
>> Content-Type: application/sparql-update
>>
>>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2b>
>> {<http://gerastree.at/waterhouse-kw#>
>> <http://gerastree.at/lit_2014#titel> "(Krieg und Welt)"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#hl1> "\ufeffwith bomUnsere Namen werden
>> lebendig"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P000>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
>> eigen?\n"@de  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#P004>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P003>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
>> } }
>> callHTTP5 result is is Right HTTP/1.1 500
>> java.nio.charset.MalformedInputException: Input length = 1
>> Date: Tue, 28 Mar 2017 10:45:07 GMT
>> Fuseki-Request-ID: 49
>> Content-Type: text/plain;charset=utf-8
>> Cache-Control: must-revalidate,no-cache,no-store
>> Pragma: no-cache
>> Content-Length: 134
>> Connection: close
>>
>> which is a "not ok response" and corresponds to the fact that nothing is
>> stored.
>>
>> i thought this could be closed and assumed i had some other problem. but
>> then i restarted fuseki (exactly the same configuration as before
>> (--mem), but without the -v)
>> and got a different response for the same request (the program producing
>> it was not changed) - this time a 204 answer (and no triples stored,
>> as for the 500 response), which is clearly not to be expected.
>>
>> callHTTP5 :
>> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
>> Accept: */*
>> Content-Length: 1062
>> Content-Type: application/sparql-update
>>
>>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2d>
>> {<http://gerastree.at/waterhouse-kw#>
>> <http://gerastree.at/lit_2014#titel> "\ufeffwith bom(Krieg und Welt)"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#hl1> "Unsere Namen werden lebendig"@de  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P000>  .
>> <http://gerastree.at/waterhouse-kw#P003>
>> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
>> eigen?\n"@de  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inBuch>
>> <http://gerastree.at/waterhouse-kw#P004>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#inPart>
>> <http://gerastree.at/lit_2014#P003>  .
>> <http://gerastree.at/waterhouse-kw#P004>
>> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
>> } }
>> callHTTP5 result is is Right HTTP/1.1 204 No Content
>> Date: Tue, 28 Mar 2017 10:58:33 GMT
>> Fuseki-Request-ID: 28
>> Connection: close
>>
>> i hope this is enough information that you can identify a fix to allow
>> the 500 response to pass through.
>>
>> to reproduce the problem it seems to be enough to have a BOM  "\65279"
>> character in a triple with a literal (perhaps at the front position, but
>> seemingly any triple in the request triggers the error response).
>>
>> thank you for your effort - i like fuseki a lot!
>>
>> andrew
>>
>>


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.
Recorded as JENA-1312, with a test case.

The difference between -v and no -v is that the verbose logging code 
uses Java to convert bytes to chars, while the non-verbose path uses JavaCC.
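
A small sketch of how two decoding paths can disagree about the same bytes. This is an analogy in Python, not Jena or Fuseki code: the request body, the example IRIs and both function names are made up for illustration, and it is only an assumption that a lenient path behaves like errors="replace" here.

```python
# Illustrative analogy only: this is NOT how Jena, Java or JavaCC are implemented.
# It shows that a strict and a lenient decoder can treat the same bytes differently.

# Hypothetical request body with the stray bytes FE FF inside a literal.
body = b'INSERT DATA { <http://example/s> <http://example/p> "\xfe\xff the BOM " }'


def decode_strict(data: bytes) -> str:
    """Decode as UTF-8 and raise on malformed input."""
    return data.decode("utf-8", errors="strict")


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8, replacing each malformed byte with U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(body)
    strict_failed = False
except UnicodeDecodeError as err:
    strict_failed = True
    print("strict decoder rejected the input:", err.reason)

lenient_text = decode_lenient(body)
print("lenient decoder accepted the input:", repr(lenient_text))
```

The strict path fails loudly, which is comparable to the MalformedInputException in the -v log; the lenient path carries on without complaint, so any later failure is easy to lose.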

     Andy


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
the server was started with

exec /home/frank/jena/apache-jena-fuseki-2.5.0/fuseki-server -v --update
--loc=/home/frank/march19 /marchDB

and then with

exec /home/frank/jena/apache-jena-fuseki-2.5.0/fuseki-server  --update
--mem /memDB

(first with -v and got the error message, then i removed the -v and got
204). the 204 return and no insertion happens in both memory cases.


i do think it would be sufficient to inform the sender that the request
is not ok and was rejected (if the position of the error can be indicated,
all the better).

the "restart effect" is not produced by the restart, just by the difference
between starting the fuseki server with -v or not. with -v, the error 500 is
returned; without -v an ok response is returned.

my problem is only that the error message 500 (which is internally
produced) is not sent back when -v is not present. (i am, at least at
the moment, not interested in sending a BOM character; it is rather an
annoying problem caused by a file i probably received from somewhere, and
which i now routinely filter out. nevertheless, thank you for the
information you pointed me to. i understand now that sending a BOM
character as-is in a literal is not legal.) however, i am ONLY concerned when
i see a 204 return and the triples are not inserted.

if you cannot reproduce the effect, i could try to reproduce it using
curl/wget - at the moment i have only tested with a program which
inserts the triples.
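
A minimal sketch of how such a reproduction could be set up from Python instead of curl/wget. The endpoint and Content-Type are taken from the request logs in this thread; the function names build_update_request and send are hypothetical, and the malformed variant assumes, following the diagnosis given elsewhere in this thread, that the two bytes FE FF end up inside the literal.

```python
import urllib.request

# Endpoint as it appears in the logged requests; adjust for another dataset.
ENDPOINT = "http://127.0.0.1:3030/memDB/update"


def build_update_request(body: bytes, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Update POST carrying exactly these bytes."""
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status.

    Only call this with a server running; urlopen raises HTTPError for a 500.
    """
    with urllib.request.urlopen(request) as response:
        return response.status


prefix = (b'INSERT DATA { GRAPH <http://gerastree.at/fn2d> '
          b'{ <http://gerastree.at/waterhouse-kw#> '
          b'<http://gerastree.at/lit_2014#titel> "')
suffix = b'with bom"@de . } }'

# Assumed malformed variant: raw bytes FE FF, which are not valid UTF-8.
malformed = build_update_request(prefix + b"\xfe\xff" + suffix)
# Well-formed variant: U+FEFF encoded as UTF-8 (EF BB BF).
wellformed = build_update_request(prefix + "\ufeff".encode("utf-8") + suffix)

print(malformed.get_method(), malformed.full_url, len(malformed.data))
print(wellformed.get_method(), wellformed.full_url, len(wellformed.data))
```

Sending both variants to a server started once with -v and once without, and then querying the graph, would show whether a 204 is returned while nothing is stored.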

does your system insert triples where an (illegal) BOM character is sent
unencoded in a literal (and not running with the -v flag)?

thank you for your effort and time!

andrew



-- 
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
                                 +43 1 58801 12710 direct
Geoinformation, TU Wien          +43 1 58801 12700 office
Gusshausstr. 27-29               +43 1 55801 12799 fax
1040 Wien Austria                +43 676 419 25 72 mobil 
 



Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.
What storage is the Fuseki server using?  I can't reproduce the restart 
effect.

The BOM is not 65279 (bytes xFE xFF) in a SPARQL Update request, it's 
bytes xEF xBB xBF.

We are talking about what is on the wire, which means UTF-8 encoded 
Unicode, and codepoint 65279, U+FEFF, is 3 bytes in UTF-8: xEF xBB xBF

http://unicode.org/faq/utf_bom.html#bom4

The bytes xFE xFF are illegal as UTF-8 hence the message you see.

$ echo -n $'\uFEFF' | od -t x1
==>
0000000 ef bb bf
0000003

$ echo -n $'\xFE\xFF' | od -t x1
==>
0000000 fe ff
0000002

The fact that the 500 does not say where the error in the input stream 
occurs is an unfortunate effect of efficient decoding by Java and by 
JavaCC.  They process large blocks of bytes and do not say where in 
the block the error occurred.  This is a nuisance.

What is legal is to put the Unicode escape "\uFEFF" into the SPARQL 
Update.
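
A minimal client-side sketch of that idea, assuming the literal text is available as a string before the update is assembled. The function names escape_bom and strip_bom are hypothetical, and the sketch handles only U+FEFF; it does not escape quotes, backslashes or newlines in the literal.

```python
def escape_bom(text: str) -> str:
    """Replace each raw U+FEFF with its six-character SPARQL codepoint escape."""
    return text.replace("\ufeff", "\\uFEFF")


def strip_bom(text: str) -> str:
    """Alternative: drop U+FEFF entirely if it carries no meaning in the data."""
    return text.replace("\ufeff", "")


raw_literal = "\ufeffwith bom(Krieg und Welt)"
escaped = escape_bom(raw_literal)

# Graph and property IRIs are the ones from the requests in this thread.
update = (
    "INSERT DATA { GRAPH <http://gerastree.at/fn2d> { "
    "<http://gerastree.at/waterhouse-kw#> "
    '<http://gerastree.at/lit_2014#titel> "%s"@de . } }' % escaped
)
body = update.encode("utf-8")
print(body.decode("ascii"))
```

With the escape in place the request body is plain ASCII, so the result no longer depends on how the client library encodes U+FEFF on the wire.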

     Andy



On 28/03/17 12:07, Andrew U Frank wrote:
> thank you for your information. starting fuseki with -v gives indeed
> more information. in this case i get
>
> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST
> http://127.0.0.1:3030/memDB/update
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Connection:          close
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => User-Agent:
> haskell-HTTP/4000.3.5
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Host:
> 127.0.0.1:3030
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Accept:              */*
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Length:      1062
> [2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Type:
> application/sparql-update
> [2017-03-28 12:45:07] Fuseki     INFO  [49] POST /memDB :: 'update' ::
> [application/sparql-update] ?
> [2017-03-28 12:45:07] Fuseki     WARN  [49] Runtime IO Exception (client
> left?) RC = 500 : java.nio.charset.MalformedInputException: Input length = 1
> org.apache.jena.atlas.RuntimeIOException:
> java.nio.charset.MalformedInputException: Input length = 1
>     at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
>     at
> org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
>     at
> org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
>     at
> org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
>     at
> org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
>     at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>     at
> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
>     at
> org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
>     at
> org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
>     at
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>     at
> org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
>     at
> org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
>     at
> org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
>     at
> org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
>     at
> org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
>     at
> org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
>     at
> org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
>     at
> org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
>     at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>     at
> org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:285)
>     at
> org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:248)
>     at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
>     at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>     at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1156)
>     at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>     at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1088)
>     at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:374)
>     at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
>     at org.eclipse.jetty.server.Server.handle(Server.java:517)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
>     at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
>     at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>     at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
>     at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
>     at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
>     at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>     at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>     at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.Reader.read(Reader.java:140)
>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:316)
>     at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:298)
>     at
> org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:182)
>     ... 47 more
> [2017-03-28 12:45:07] Fuseki     INFO  [49] 500
> java.nio.charset.MalformedInputException: Input length = 1 (4 ms)
>
> the version is recently downloaded (2.5.0 - is there a better one?).
>
> the transfer is using a http protocol (i think haskell ghc uses the
> libcurl) and the request is now:
>
> callHTTP5 :
> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
> Accept: */*
> Content-Length: 1062
> Content-Type: application/sparql-update
>
>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2b>
> {<http://gerastree.at/waterhouse-kw#>
> <http://gerastree.at/lit_2014#titel> "(Krieg und Welt)"@de  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#hl1> "\ufeffwith bomUnsere Namen werden
> lebendig"@de  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#inBuch>
> <http://gerastree.at/waterhouse-kw#>  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P000>  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
> eigen?\n"@de  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#inBuch>
> <http://gerastree.at/waterhouse-kw#P004>  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P003>  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
> } }
> callHTTP5 result is is Right HTTP/1.1 500
> java.nio.charset.MalformedInputException: Input length = 1
> Date: Tue, 28 Mar 2017 10:45:07 GMT
> Fuseki-Request-ID: 49
> Content-Type: text/plain;charset=utf-8
> Cache-Control: must-revalidate,no-cache,no-store
> Pragma: no-cache
> Content-Length: 134
> Connection: close
>
> which is a "not ok repsonse"  and coresponds to the fact that nothing is
> stored .
>
> i thought this could be closed and assumed i had some other problem. but
> then i restarted fuseki (exactly the same configuration as before
> (--mem) but without the -v
> and get a different response for the same request (the program producing
> was not changed) - this time with a 204 answer (and no triples stored,
> as for the 500 response), which is clearly not to be expected.
>
> callHTTP5 :
> request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
> Accept: */*
> Content-Length: 1062
> Content-Type: application/sparql-update
>
>  requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2d>
> {<http://gerastree.at/waterhouse-kw#>
> <http://gerastree.at/lit_2014#titel> "\ufeffwith bom(Krieg und Welt)"@de  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#hl1> "Unsere Namen werden lebendig"@de  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#inBuch>
> <http://gerastree.at/waterhouse-kw#>  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P000>  .
> <http://gerastree.at/waterhouse-kw#P003>
> <http://gerastree.at/lit_2014#aufSeite> "L009"  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
> eigen?\n"@de  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#inBuch>
> <http://gerastree.at/waterhouse-kw#P004>  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P003>  .
> <http://gerastree.at/waterhouse-kw#P004>
> <http://gerastree.at/lit_2014#aufSeite> "L011"  .
> } }
> callHTTP5 result is is Right HTTP/1.1 204 No Content
> Date: Tue, 28 Mar 2017 10:58:33 GMT
> Fuseki-Request-ID: 28
> Connection: close
>
> i hope this is enough information that you can identify a fix to allow
> the 500 response to pass through.
>
> to reproduce the problem it seems to be enough to have a BOM  "\65279"
> character in a triple with a literal (perhaps at the front position, but
> seemingly any triple in the request triggers the error response).
>
> thank you for your effort - i like fuseki a lot!
>
> andrew
>
>

Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andrew U Frank <fr...@geoinfo.tuwien.ac.at>.
thank you for your information. starting fuseki with -v indeed gives
more information. in this case i get

[2017-03-28 12:45:07] Fuseki     INFO  [49] POST http://127.0.0.1:3030/memDB/update
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => Connection:          close
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => User-Agent:          haskell-HTTP/4000.3.5
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => Host:                127.0.0.1:3030
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => Accept:              */*
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Length:      1062
[2017-03-28 12:45:07] Fuseki     INFO  [49]   => Content-Type:        application/sparql-update
[2017-03-28 12:45:07] Fuseki     INFO  [49] POST /memDB :: 'update' :: [application/sparql-update] ?
[2017-03-28 12:45:07] Fuseki     WARN  [49] Runtime IO Exception (client left?) RC = 500 : java.nio.charset.MalformedInputException: Input length = 1
org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.jena.atlas.io.IO.exception(IO.java:233)
    at org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:183)
    at org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:108)
    at org.apache.jena.fuseki.servlets.ActionSPARQL.executeLifecycle(ActionSPARQL.java:134)
    at org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeRequest(SPARQL_UberServlet.java:356)
    at org.apache.jena.fuseki.servlets.SPARQL_UberServlet.serviceDispatch(SPARQL_UberServlet.java:317)
    at org.apache.jena.fuseki.servlets.SPARQL_UberServlet.executeAction(SPARQL_UberServlet.java:272)
    at org.apache.jena.fuseki.servlets.ActionSPARQL.execCommonWorker(ActionSPARQL.java:85)
    at org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:81)
    at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
    at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
    at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
    at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
    at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
    at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
    at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:285)
    at org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:248)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1156)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:374)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
    at org.eclipse.jetty.server.Server.handle(Server.java:517)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:140)
    at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:316)
    at org.apache.jena.atlas.io.IO.readWholeFileAsUTF8(IO.java:298)
    at org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:182)
    ... 47 more
[2017-03-28 12:45:07] Fuseki     INFO  [49] 500 java.nio.charset.MalformedInputException: Input length = 1 (4 ms)
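
One possible reading of this trace, offered as an assumption and not
confirmed anywhere in this thread: the exception is raised while the
request body is decoded as UTF-8 (IO.readWholeFileAsUTF8), which would
happen if the client put the BOM on the wire as a single raw byte
instead of its three-byte UTF-8 form. The Python sketch below uses
Python's strict decoder as a stand-in for the Java one.
truncate_to_bytes is a hypothetical model of a lossy client, not the
documented behaviour of any particular HTTP library.

```python
# Sketch only: shows why a BOM that is narrowed to one byte cannot be
# decoded as UTF-8, while a properly encoded BOM can.

BOM = "\ufeff"  # code point 65279


def truncate_to_bytes(text: str) -> bytes:
    """Hypothetical lossy encoding: keep the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode under a strict UTF-8 decoder."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


literal = BOM + "with bom"
proper = literal.encode("utf-8")    # BOM becomes the bytes EF BB BF
lossy = truncate_to_bytes(literal)  # BOM becomes the single byte FF

print(proper[:3].hex(), is_valid_utf8(proper))
print(lossy[:1].hex(), is_valid_utf8(lossy))
```

If the body really contains a lone FF byte, rejecting it is the correct
outcome for a strict decoder; what remains unexplained is the 204 seen
in the other run.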

the version is recently downloaded (2.5.0 - is there a better one?).

the transfer uses the http protocol (i think haskell ghc uses
libcurl) and the request is now:

callHTTP5 : 
request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
Accept: */*
Content-Length: 1062
Content-Type: application/sparql-update

 requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2b>
{<http://gerastree.at/waterhouse-kw#>
<http://gerastree.at/lit_2014#titel> "(Krieg und Welt)"@de  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#hl1> "\ufeffwith bomUnsere Namen werden
lebendig"@de  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#inBuch>
<http://gerastree.at/waterhouse-kw#>  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P000>  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#aufSeite> "L009"  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
eigen?\n"@de  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#inBuch>
<http://gerastree.at/waterhouse-kw#P004>  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P003>  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#aufSeite> "L011"  .
} }
callHTTP5 result is is Right HTTP/1.1 500
java.nio.charset.MalformedInputException: Input length = 1
Date: Tue, 28 Mar 2017 10:45:07 GMT
Fuseki-Request-ID: 49
Content-Type: text/plain;charset=utf-8
Cache-Control: must-revalidate,no-cache,no-store
Pragma: no-cache
Content-Length: 134
Connection: close

which is a "not ok" response and corresponds to the fact that nothing is
stored.

i thought this could be closed and assumed i had some other problem. but
then i restarted fuseki with exactly the same configuration as before
(--mem), only without the -v, and got a different response for the same
request (the program producing it was not changed): this time a 204
answer, with no triples stored, just as for the 500 response. this is
clearly not to be expected.

callHTTP5 : 
request POST http://127.0.0.1:3030/memDB/update HTTP/1.1
Accept: */*
Content-Length: 1062
Content-Type: application/sparql-update

 requestbody  INSERT DATA { GRAPH <http://gerastree.at/fn2d>
{<http://gerastree.at/waterhouse-kw#>
<http://gerastree.at/lit_2014#titel> "\ufeffwith bom(Krieg und Welt)"@de  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#hl1> "Unsere Namen werden lebendig"@de  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#inBuch>
<http://gerastree.at/waterhouse-kw#>  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P000>  .
<http://gerastree.at/waterhouse-kw#P003>
<http://gerastree.at/lit_2014#aufSeite> "L009"  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#paragraph> "Was ist ihm fremd und was sein
eigen?\n"@de  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#inBuch>
<http://gerastree.at/waterhouse-kw#P004>  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#inPart> <http://gerastree.at/lit_2014#P003>  .
<http://gerastree.at/waterhouse-kw#P004>
<http://gerastree.at/lit_2014#aufSeite> "L011"  .
} }
callHTTP5 result is is Right HTTP/1.1 204 No Content
Date: Tue, 28 Mar 2017 10:58:33 GMT
Fuseki-Request-ID: 28
Connection: close

i hope this is enough information for you to identify a fix that lets
the 500 response pass through.

to reproduce the problem it seems to be enough to have a BOM ("\65279")
character in a literal of a triple (perhaps at the front position, but
seemingly a BOM in any triple of the request triggers the error response).
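
For anyone who wants to try the reproduction without the Haskell
client, here is a minimal Python sketch that builds the same kind of
request with the body explicitly encoded as UTF-8. The endpoint and the
triple are taken from the requests shown in this thread;
build_update_request is a made-up helper name, and the request is only
constructed here, not sent.

```python
import urllib.request


def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build a SPARQL Update POST whose body is encoded as UTF-8."""
    body = update.encode("utf-8")  # a BOM in a literal becomes EF BB BF
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


update = (
    "INSERT DATA { GRAPH <http://gerastree.at/g12> { "
    "<http://gerastree.at/waterhouse-kw#> "
    "<http://gerastree.at/lit_2014#titel> "
    '"\ufeff the BOM "@xx . } }'
)

req = build_update_request("http://127.0.0.1:3030/memDB/update", update)
print(req.get_method(), req.full_url)
print(len(req.data), "bytes in body")
```

To actually send it, pass req to urllib.request.urlopen and look at the
status code of the response; comparing that status with what ends up in
the graph is the check described in this message.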

thank you for your effort - i like fuseki a lot!

andrew



On 03/27/2017 08:48 PM, Andy Seaborne wrote:
> andrew,
>
> Which version of Fuseki is this?
>
> You can launch with "-v" to get more runtime info.
>
> Also - how are you sending the request to Fuseki?
>
> If you are parsing the string and then sending the parsed form, the
> BOM it might be that the BOM is lost because of handling (by java) of
> BOM in the middle of text:
>
> http://unicode.org/faq/utf_bom.html#bom6
>
> What exactly ends up in the Fuseki server?
>
>     Andy


Re: fuseki silently ignores insert data requests with a BOM character

Posted by Andy Seaborne <an...@apache.org>.
andrew,

Which version of Fuseki is this?

You can launch with "-v" to get more runtime info.

Also - how are you sending the request to Fuseki?

If you are parsing the string and then sending the parsed form, it
might be that the BOM is lost because of the handling (by Java) of a
BOM in the middle of text:

http://unicode.org/faq/utf_bom.html#bom6

What exactly ends up in the Fuseki server?

     Andy
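
One way to answer the question of what ends up at the server is to
capture the raw request body (for example from a network trace) and
inspect its bytes. The following is a small Python sketch under that
assumption; both helper names are invented for illustration and are not
part of Fuseki or Jena.

```python
from typing import List, Optional

UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def first_invalid_utf8_offset(body: bytes) -> Optional[int]:
    """Return the offset of the first undecodable byte, or None if valid."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def bom_offsets(body: bytes) -> List[int]:
    """Return every offset at which a UTF-8 encoded BOM starts."""
    offsets = []
    i = body.find(UTF8_BOM)
    while i != -1:
        offsets.append(i)
        i = body.find(UTF8_BOM, i + 1)
    return offsets


# Two hand-made bodies: one with a properly encoded BOM inside a literal,
# one where the BOM arrived as the single byte FF.
good = 'x "\ufeffy"'.encode("utf-8")
bad = b'x "\xffy"'

print(bom_offsets(good), first_invalid_utf8_offset(good))
print(bom_offsets(bad), first_invalid_utf8_offset(bad))
```

A body that reports a BOM offset and no invalid offset was encoded
correctly by the client, so any loss of the BOM would have to happen on
the server side.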