Posted to users@jena.apache.org by Wolfgang Fahl <wf...@bitplan.com> on 2020/08/19 12:18:23 UTC

Fuseki Bad Requests response details

Dear Apache Jena Users,

You'll also find this mail as a Stack Overflow question:
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err

In the last few weeks I tried out some graph databases in the Python
environment, namely:

- Weaviate, see http://wiki.bitplan.com/index.php/Weaviate

- Dgraph, see http://wiki.bitplan.com/index.php/Dgraph

- ruruki, see https://pypi.org/project/ruruki/

and created a test project, documented at
http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest and available as open source at:
https://github.com/WolfgangFahl/DgraphAndWeaviateTest

After some ups and downs in the evaluation process, I decided to try
Apache Jena / Fuseki / SPARQL as an alternative and added:

https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py

to allow a "round trip" between a Python list of dicts and
Jena/SPARQL-based storage.

The approach performs very well for my use case, and after using it
for a while I am now running into details that need to be addressed.

The Stack Overflow question
https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki/63440396#63440396
addresses the initial issues, and issues 2-5 at
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed
show some detailed problems that have already been fixed.

Now I am working with some 180,000 records that I'd like to import from six
different data sources, and each data source seems to contain new exotic
records that make the approach fail.

E.g. one batch of records gives me the following log:

read 45601 events in   0.6 s
storing 45601 events to sparql
  batch for         1 -      2000 of     45601 cr:Event in    0.6 s ->    0.6 s
  batch for      2001 -      4000 of     45601 cr:Event in    0.5 s ->    1.1 s
  batch for      4001 -      6000 of     45601 cr:Event in    0.5 s ->    1.6 s
  batch for      6001 -      8000 of     45601 cr:Event in    0.5 s ->    2.1 s
  batch for      8001 -     10000 of     45601 cr:Event in    0.5 s ->    2.6 s
  batch for     10001 -     12000 of     45601 cr:Event in    0.7 s ->    3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.

Response:
b'Error 400: Bad Request\n'

Since I don't get any details on what the problem is, I am using a
binary search. From the error above I only know that the problem is
with a record with a batchIndex between 12000 and 14000, so I am
setting the limit to 14000 and batchSize to 100 to get closer.
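The manual bisection described in this message can be scripted. The following is a minimal sketch, not code from the project: `find_first_bad` and `store_ok` are hypothetical names, where `store_ok` tries the given records (by inserting them, or preferably by a side-effect-free syntax check) and returns True on success. It assumes that a prefix of the records fails exactly when it contains a bad record, which is how the limit-based search here behaves.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad(records: Sequence[T], store_ok: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the first record that makes store_ok fail, or -1.

    Bisects over prefixes records[:n], mirroring the "limit" search done by
    hand. store_ok is a hypothetical callable supplied by the caller.
    """
    if store_ok(records):
        return -1
    # Invariant: records[:lo] succeeds, records[:hi] fails.
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if store_ok(records[:mid]):
            lo = mid
        else:
            hi = mid
    # records[:hi - 1] succeeds and records[:hi] fails, so the culprit is hi - 1.
    return hi - 1
```

Because only prefixes are tested, the number of `store_ok` calls grows logarithmically (about 15 for 14,000 records) instead of one call per record.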

  batch for     13301 -     13400 of     14000 cr:Event in    0.0 s ->    4.3 s

is now the last successful batch. Continuing the binary search: 13450
fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok, 13421 ok.
So record 13422 is the culprit, and I switch on debug mode to see the
INSERT DATA created for the record:

  cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
  cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
  cr:Event__102140gtm20003 cr:Event_source "crossref".
  cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
  cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
  cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
  cr:Event__102140gtm20003 cr:Event_year 1999.
  cr:Event__102140gtm20003 cr:Event_month 9.
  cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.

So the umlaut encoding "\\u" in the location "Münster" is the culprit
here. I will work around this issue. The real question is:

*How can I get the Fuseki API, via SPARQLWrapper, to properly report a
detailed error message, e.g. something like "error in line #:
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
not a valid triple"?*


Yours

   Wolfgang

-- 

BITPlan - smart solutions
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
BITPlan GmbH, Willich - HRB 6820 Krefeld, Steuer-Nr.: 10258040548, Geschäftsführer: Wolfgang Fahl 


Re: Fuseki Bad Requests response details

Posted by Andy Seaborne <an...@apache.org>.
JENA-1946 may improve the situation, but the HTTP response is not the
ideal place to put user feedback.

The Fuseki log has the details, and you can parse the query before sending it.
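Parsing the generated text with Jena locally is the authoritative check. As a rough Python-only complement, the hypothetical helper below (not part of Jena, SPARQLWrapper or the project) scans double-quoted literals line by line and reports a line number plus problem, the kind of message asked for in this thread. It is a heuristic under stated assumptions: it ignores single-quoted and triple-quoted literals and comments, and it relies on short SPARQL string literals not being allowed to contain a raw line break.

```python
from typing import List, Tuple

# Characters allowed after a backslash in a SPARQL string literal:
# ECHAR (\t \b \n \r \f \" \' \\) plus the \u / \U code point escapes.
VALID_ESCAPES = set("tbnrf\"'\\uU")


def lint_string_literals(sparql: str) -> List[Tuple[int, str]]:
    """Heuristically report (line number, problem) pairs for "..." literals.

    Not a SPARQL parser: single-quoted and triple-quoted literals and
    comments are not handled specially.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(sparql.splitlines(), start=1):
        in_string = False
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string and ch == "\\":
                nxt = line[i + 1] if i + 1 < len(line) else ""
                if nxt not in VALID_ESCAPES:
                    problems.append((lineno, "invalid escape \\" + nxt))
                i += 2  # skip the backslash and the escaped character
                continue
            if ch == '"':
                in_string = not in_string
            i += 1
        if in_string:
            problems.append((lineno, "unterminated string literal"))
    return problems
```

Run on the error.data content from this thread, it would flag both `\S` sequences in the title line; on the Münster record it reports an unterminated literal, because `\\` is a complete escape and the following `"` closes the string early.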

This:

> 'title': '“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018',

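is wrong at \S: inside a SPARQL string literal, a backslash must start one
of the escapes \t \b \n \r \f \" \' \\ (or a \u / \U code point escape),
and \S is none of them.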

You can't just dump any old string into SPARQL or Turtle by putting
'...' around it. It needs escaping on the Python side.
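A minimal sketch of that Python-side escaping, assuming plain double-quoted literals are generated; `sparql_string_literal` is a hypothetical name, not a function from the project or SPARQLWrapper. A short double-quoted SPARQL (and Turtle) literal may not contain a raw `"`, backslash, line feed or carriage return, so those are replaced by backslash escapes; tab, backspace and form feed are escaped too, which is optional but valid.

```python
# The backslash must be escaped as well, otherwise input such as
# "Structure\Systems" produces the invalid escape \S in the generated SPARQL.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL/Turtle string literal."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


title = r"“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018"
print(sparql_string_literal(title))
```

This prints the title with each backslash doubled (`Structure\\Systems`, `BGRS\\SB`), which parses as the original text. If rdflib is already a dependency, `rdflib.Literal(value).n3()` can serve the same purpose.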

https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/

shows the title is corrupted.

     Andy


On 25/08/2020 14:24, Wolfgang Fahl wrote:
> The response body of the 400 HttpError doesn't seem to have more data
> and i would not know how to get extra information via the curl request.

Re: Fuseki Bad Requests response details

Posted by Wolfgang Fahl <wf...@bitplan.com>.
Andy, thanks for the response.

I tried:

curl -X POST -H "Content-Type:application/sparql-update" -d @error.data localhost:3030/cr/update

Error 400: Bad Request

and

curl -X POST -H "Content-Type:application/sparql-update" -d @insert.data localhost:3030/cr/update

cat insert.data 
PREFIX cr: <http://cr.bitplan.com/>
        INSERT DATA { 
          cr:version cr:author "Wolfgang Fahl". 
        }

cat error.data

PREFIX cr: <http://cr.bitplan.com/Event/0.1/>

INSERT DATA {

  cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_title "“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018".

  cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_url "https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/".

}

and, following the hint of Stanislav Kralin at
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
I added a new test:

    def testSPARQLErrorMessage(self):
        '''
        test error handling
        see https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
        '''
        # raw string (r'...'): the backslashes in the title are part of the data
        listOfDicts=[{
            'title': r'“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018',
            'url': 'https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/'}]
        entityType="cr:Event"
        primaryKey='title'
        prefixes="PREFIX cr: <http://cr.bitplan.com/Event/0.1/>"
        jena=self.getJena(mode='update',typedLiterals=False,debug=True)
        errors=jena.insertListOfDicts(listOfDicts,entityType,primaryKey,prefixes)
        self.checkErrors(errors,1)
        error=errors[0]
        self.assertIn("probably the sparql query is bad formed", error)

which gives:

Response:

b'Error 400: Bad Request\n'

ERRORS:

QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed. 

Response:

b'Error 400: Bad Request\n' for record 0

The response body of the 400 HTTPError doesn't seem to contain any more data,
and I don't know how to get extra information via the curl request either.
IMHO the question is still unsolved, and I am not sure whether or how
SPARQLWrapper could do better...
Cheers

  Wolfgang

On 19.08.20 at 16:15, Andy Seaborne wrote:
> Look in the response body because, for Fuseki, it has the details of
> the error in plain text.
>
> You can also print the query out in Python and parse it with Jena
> locally. Or send it with curl which prints the body.


Re: Fuseki Bad Requests response details

Posted by Andy Seaborne <an...@apache.org>.
"""
How can i get the Fuseki API via SPARQLWrapper to properly report a 
detailed error message e.g. with something like "error in line # 
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is 
not a valid triple?
"""

This is a Q about SPARQLWrapper, not Fuseki.

Look in the response body because, for Fuseki, it has the details of the 
error in plain text.

You can also print the query out in Python and parse it with Jena 
locally. Or send it with curl which prints the body.
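A minimal sketch of reading that response body without SPARQLWrapper, using only the Python standard library; `post_update` is a hypothetical helper name. With urllib, an HTTP 400 raises `urllib.error.HTTPError`, which can itself be read like a response, so the body is not lost. Whether the body contains a detailed parser message depends on the Fuseki version; as reported in this thread it may only say "Error 400: Bad Request", in which case the Fuseki log is the place to look.

```python
import urllib.error
import urllib.request
from typing import Tuple


def post_update(endpoint: str, update: str, timeout: float = 30.0) -> Tuple[int, str]:
    """POST a SPARQL Update and return (HTTP status, response body as text)."""
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # HTTPError doubles as a response object: read() returns the error body
        # that a wrapper may only summarize as "Bad Request".
        return error.code, error.read().decode("utf-8", "replace")
```

For example, calling `post_update("http://localhost:3030/cr/update", text)` with the contents of error.data returns the status code together with whatever text the server sent back, so it can be logged next to the offending record.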


     Andy
