Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/04/15 06:34:28 UTC

[GitHub] [jena] LorenzBuehmann opened a new issue, #1259: regression when query is sent as POST

LorenzBuehmann opened a new issue, #1259:
URL: https://github.com/apache/jena/issues/1259

   When a query string is longer than the `GET` request threshold, so that `POST` send mode is used, the body content isn't marked as UTF-8 encoded:
   
   ### Example query:
   ```sparql
   PREFIX wd: <http://www.wikidata.org/entity/>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX geo: <http://www.opengis.net/ont/geosparql#>
   PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX coy: <https://schema.coypu.org/#>
   PREFIX data:      <https://data.coypu.org/country/>
   
   PREFIX wikibase: <http://wikiba.se/ontology#>
   PREFIX bd: <http://www.bigdata.com/rdf#>
   PREFIX mwapi: <https://www.mediawiki.org/ontology#API/>
   PREFIX wdt: <http://www.wikidata.org/prop/direct/>
   
   SELECT * {
   
   
   BIND("Curaçao" AS ?str)
     SERVICE <https://query.wikidata.org/sparql> {
         SELECT ?item ?itemLabel ?typeLabel ?str {
         SERVICE wikibase:mwapi {
         bd:serviceParam wikibase:endpoint "www.wikidata.org";
           wikibase:api "EntitySearch";
           mwapi:search ?str ;
           mwapi:language "en";
           wikibase:limit 5 .
         ?item wikibase:apiOutputItem mwapi:item.
           ?num wikibase:apiOrdinal true.
         }
         ?item (wdt:P279|wdt:P31) ?type 
           FILTER(?type not in (wd:Q4167410, wd:Q13442814, wd:Q13433827))
            FILTER (EXISTS {?type wdt:P279* wd:Q618123} || EXISTS {?type wdt:P279* wd:Q1048835 })
         SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
         }
     }
   }
   ```
   Ignore the meaning of the query; it just does an entity lookup in Wikidata via a `SERVICE` clause. The important thing is the `BIND` with the string `Curaçao`, which contains a non-ASCII character.
   
   The result of this query is empty with (at least) Jena 4.4.0 and 4.5.0 SNAPSHOT; it works with Jena 4.1.0, for example. It also works if we remove one of the `FILTER`s in the query, which shortens it enough that a simple `GET` request is used.
   
   I remember that the HTTP API was switched to the one built into Java 11; that might be the point where the behavior changed.
   
   -----
   Note: I know that according to the [standard](https://www.w3.org/TR/sparql11-protocol/#query-via-post-direct) the body should always be treated as UTF-8. At least, it states:
   
   > Note that UTF-8 is the only valid charset here. 
   
   So it looks more like a Blazegraph issue in the end.
   
   ----
   Nevertheless, the UTF-8 encoding was probably stated explicitly in the old HTTP API implementation.
   
   I tried a quick fix in the method `QueryExecHTTP::executeQueryPostBody`
   ```java
   // Use SPARQL query body and MIME type.
       private HttpRequest.Builder executeQueryPostBody(Params thisParams, String acceptHeader) {
           // Use thisParams (for default-graph-uri etc)
           String requestURL = requestURL(service, thisParams.httpString());
           HttpRequest.Builder builder = HttpLib.requestBuilder(requestURL, httpHeaders, readTimeout, readTimeoutUnit);
           contentTypeHeader(builder, WebContent.contentTypeSPARQLQuery + "; charset=UTF-8"); // this line has been changed
           acceptHeader(builder, acceptHeader);
           return builder.POST(BodyPublishers.ofString(queryString));
       }
   ```
   
   This solved the issue. I don't know whether this is intended, but I doubt it's harmful to mention the encoding.
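
The change above can be illustrated outside Jena with the plain Java 11 `java.net.http` API. This is a minimal sketch, not Jena's implementation: the class name `SparqlPostBody`, the helper `buildRequest`, and the `Accept` value are assumptions made for the example. It only builds the request, so nothing is sent over the network.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

public class SparqlPostBody {
    // Media type for a SPARQL query sent directly as the POST body.
    static final String SPARQL_QUERY_TYPE = "application/sparql-query";

    // Hypothetical helper: build (but do not send) a POST request whose
    // Content-Type header names the charset explicitly.
    static HttpRequest buildRequest(String endpoint, String queryString) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", SPARQL_QUERY_TYPE + "; charset=UTF-8")
                .header("Accept", "application/sparql-results+json")
                // Encode the body as UTF-8 so the header and the bytes agree.
                .POST(BodyPublishers.ofString(queryString, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // \u00E7 is the "c with cedilla" from the example query.
        HttpRequest req = buildRequest("https://query.wikidata.org/sparql",
                "SELECT ?x { BIND('Cura\u00E7ao' AS ?x) }");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

Note that `BodyPublishers.ofString(String)` already encodes the body as UTF-8, so in this sketch the only thing the extra parameter changes is what the server is told about the bytes, not the bytes themselves.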
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] afs commented on issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1259:
URL: https://github.com/apache/jena/issues/1259#issuecomment-1105681624

   Separate issue and PR please!
   
   With error handling.
   




[GitHub] [jena] afs closed issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
afs closed issue #1259: regression when query is sent as POST
URL: https://github.com/apache/jena/issues/1259




[GitHub] [jena] afs commented on issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1259:
URL: https://github.com/apache/jena/issues/1259#issuecomment-1105683615

   We might as well put in the "charset=utf8" hack.
   
   I noticed another problem: GET and POST+form are not %-encoding characters outside printable ASCII.
   Everything works, including Wikidata, but strictly it is wrong.
   
   A fix is quite easy; a PR is in progress.
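
To illustrate what %-encoding of non-ASCII characters looks like, here is a small sketch using the JDK's `URLEncoder`. It is not the fix made in Jena; the class name `PercentEncodeDemo` and the helper `formEncode` are made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodeDemo {
    // Encode a value for application/x-www-form-urlencoded content:
    // non-ASCII characters become %XX escapes of their UTF-8 bytes.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00E7 (c with cedilla) is the two UTF-8 bytes C3 A7.
        System.out.println(formEncode("Cura\u00E7ao")); // Cura%C3%A7ao
    }
}
```

`URLEncoder` follows the HTML form rules, so a space becomes `+` rather than `%20`; that suits a POST form body, while other parts of a URL may need a different encoder.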




[GitHub] [jena] LorenzBuehmann commented on issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
LorenzBuehmann commented on issue #1259:
URL: https://github.com/apache/jena/issues/1259#issuecomment-1103643711

   Hi @afs 
   
   Yes, as I expected, it is a limitation of the Wikidata backend, or at least of their server setup. I was just confused by the different behaviour of Jena 4.1.0 vs the latest versions, and then I remembered that you changed the HTTP API that is used. 
   
   See https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java#L921
   
   ```java
    static private String getQueryString(final HttpServletRequest req)
               throws IOException {
           if (RESTServlet.hasMimeType(req, MIME_SPARQL_QUERY)) {
               // return the body of the POST, see trac 711
               return readFully(req.getReader());
           }
           return req.getParameter(ATTR_QUERY) != null ? req
                   .getParameter(ATTR_QUERY) : (String) req
                   .getAttribute(ATTR_QUERY);
       }
   ```
   
   Unfortunately they rely on the old HTTP API, and the `HttpServletRequest` sticks to `ISO-8859-` by default if no encoding is specified in the HTTP request, and you can't change the default encoding afaik. The only fix would be to set the encoding on the request object, i.e.  
   ```java
   req.setCharacterEncoding("UTF-8");
   ```
   
   So I'm not sure how to continue. We'll raise an issue on Blazegraph, but I don't think that fix will even make it to the Wikidata setup, as they would have to rebuild and redeploy Blazegraph.
   
   
   Regarding POST Form, via `curl` it works:
   
   ```bash
   curl -X POST --data "query=SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql
   ```
   
   
   For Jena I guess we can close this issue here and at least have it for reference and documentation as a known limitation. Might affect other users as well.




[GitHub] [jena] afs commented on issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1259:
URL: https://github.com/apache/jena/issues/1259#issuecomment-1100607544

   Hi @LorenzBuehmann ,
   
   I vaguely (it was a very long time ago!) recall this coming up before. A difference now is that the only site this affects is likely to be wikidata (and then, only for now).
   
   Here is an MVCE:
   ```java
       public static void main(String...args) {
           // U00E7
           String qs = "SELECT ?x { BIND('Curaçao' As ?x) }";
           String qsx = "SELECT ?x { BIND('Cura\\u00E7ao' As ?x) }";
   
           RowSet rowSet = QueryExecHTTP
                   .service("https://query.wikidata.org/sparql")
                   //.sendMode(QuerySendMode.asPostForm)
                   //.sendMode(QuerySendMode.asPost)
                   .sendMode(QuerySendMode.asGetAlways)
                   .queryString(qs)
                   .select();
           RowSetOps.out(rowSet);
       }
   ```
   After checking, the corruption happens on the receiving side of the request, and `qsx` works in all three cases.
   
   The three different sendModes give three different results.
   
   * `asGetAlways` works
   * `asPost` is corrupted in a way that looks like UTF-8 read as ISO-8859-?
   * `asPostForm` is a different corruption, not sure what and that might be Jena.
   
   I don't know why ISO-8859 is being used if their servers are Linux (system default). That hints that it is a choice in the Blazegraph code.
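
The kind of corruption described for `asPost` can be reproduced locally without any server. This is a minimal sketch that assumes the receiver decodes with ISO-8859-1 specifically (the thread only says "ISO-8859-?"); the class name `CharsetMismatchDemo` and the helper `misdecode` are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Simulate a sender that writes UTF-8 bytes and a receiver that
    // decodes those same bytes as ISO-8859-1 (assumed charset).
    static String misdecode(String sent) {
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Cura\u00E7ao";      // 7 characters
        String received = misdecode(original); // 8 characters: C3 and A7 decoded separately
        System.out.println(original.length() + " -> " + received.length());
        System.out.println(received.equals(original)); // false
        // An ASCII-only string is unaffected, which is consistent with
        // the escaped form `qsx` working in every send mode.
        System.out.println(misdecode("Curacao").equals("Curacao")); // true
    }
}
```

Under that assumption the server sees `CuraÃ§ao` instead of `Curaçao`, so the entity search finds nothing and the query result is empty.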
   




[GitHub] [jena] LorenzBuehmann commented on issue #1259: regression when query is sent as POST

Posted by GitBox <gi...@apache.org>.
LorenzBuehmann commented on issue #1259:
URL: https://github.com/apache/jena/issues/1259#issuecomment-1103846006

   @afs a follow-up issue/question (we could also open another issue for better reference):
   
   The Wikidata people suggested using POST form because it works ...
   
   We tried to set the `SERVICE` request mode via Fuseki assembler config:
   
   ```
   ja:context [ ja:cxtName "arq:httpServiceSendMode" ;  ja:cxtValue "asGetWithLimitForm" ] ;
   ```
   
   This fails, because `Context::get`, called in the `Service::chooseQuerySendMode` method, tries to return an object of the expected type, which in that case is `QuerySendMode`, and casting a `String` to this type fails.
   
   A quick fix would work around the limitation and handle at least the two different types of the context value, i.e. i) a `String` coming from an assembler config or ii) a `QuerySendMode` coming from some Java API setup:
   
   ```java
   private static QuerySendMode chooseQuerySendMode(String serviceURL, Context context, QuerySendMode dftValue) {
           if ( context == null )
               return dftValue;
           Object querySendMode = context.<Object>get(httpServiceSendMode, dftValue);
           if (querySendMode instanceof String) { // handle string type from assembler config
               return QuerySendMode.valueOf((String) querySendMode);
           } else if (querySendMode instanceof QuerySendMode) { // handle enum type from Java API
               return (QuerySendMode) querySendMode;
           }
           // handle null value and other non-supported types
           return context.get(httpServiceSendMode, dftValue);
       }
   ```
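
The same idea can be tried in isolation, with explicit errors for bad input. This is a sketch under stated assumptions, not the change made in Jena: the enum `SendMode` is a hypothetical stand-in for `QuerySendMode` that lists only the names mentioned in this thread, and `coerce` is an invented helper.

```java
public class SendModeCoercion {
    // Hypothetical stand-in for Jena's QuerySendMode (names taken from this thread).
    enum SendMode { asGetAlways, asGetWithLimitForm, asPostForm, asPost }

    // Accept either the enum itself (Java API setup) or its name as a
    // String (assembler config); fall back to the default when unset.
    static SendMode coerce(Object value, SendMode dftValue) {
        if (value == null)
            return dftValue;
        if (value instanceof SendMode)
            return (SendMode) value;
        if (value instanceof String) {
            try {
                return SendMode.valueOf(((String) value).trim());
            } catch (IllegalArgumentException ex) {
                // Report the offending value instead of a bare enum lookup error.
                throw new IllegalArgumentException("Unknown send mode: '" + value + "'", ex);
            }
        }
        throw new IllegalArgumentException(
                "Unsupported send mode type: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(coerce("asGetWithLimitForm", SendMode.asPost)); // asGetWithLimitForm
        System.out.println(coerce(SendMode.asPostForm, SendMode.asPost));  // asPostForm
        System.out.println(coerce(null, SendMode.asPost));                 // asPost
    }
}
```

Rejecting unsupported types explicitly avoids the `ClassCastException` that a final typed `get` would raise for a value that is neither a `String` nor the enum.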
   
   

