Posted to users@jena.apache.org by Wolfgang Fahl <wf...@bitplan.com> on 2020/08/19 12:18:23 UTC
Fuseki Bad Requests response details
Dear Apache Jena Users,
You'll find this mail also at
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
In the last few weeks I tried out some graph databases in the Python
environment, namely:
- Weaviate, see http://wiki.bitplan.com/index.php/Weaviate
- Dgraph, see http://wiki.bitplan.com/index.php/Dgraph
- ruruki, see https://pypi.org/project/ruruki/
I created a test project, documented at
http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest and available as open source at:
https://github.com/WolfgangFahl/DgraphAndWeaviateTest
After some ups and downs in the evaluation process I decided to try out
Apache Jena / Fuseki / SPARQL as an alternative and added:
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py
to allow for a "round trip" operation between a Python list of dicts and
Jena/SPARQL-based storage.
The approach performs very well for my use case, and after trying it out
for a while I am now running into details that need to be addressed.
The Stack Overflow question
https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki/63440396#63440396
addresses the initial issues, and issues 2-5 at
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed
show some detail problems that have already been fixed.
Now I am working with some 180000 records I'd like to import from 6
different data sources, and each data source seems to have new exotic records
that make the approach fail.
E.g. one batch of records gives me the following log:
read 45601 events in 0.6 s
storing 45601 events to sparql
batch for 1 - 2000 of 45601 cr:Event in 0.6 s
-> 0.6 s
batch for 2001 - 4000 of 45601 cr:Event in 0.5 s
-> 1.1 s
batch for 4001 - 6000 of 45601 cr:Event in 0.5 s
-> 1.6 s
batch for 6001 - 8000 of 45601 cr:Event in 0.5 s
-> 2.1 s
batch for 8001 - 10000 of 45601 cr:Event in 0.5 s
-> 2.6 s
batch for 10001 - 12000 of 45601 cr:Event in 0.7 s
-> 3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py",
line 1073, in _query
response = urlopener(request)
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 222, in urlopen
return opener.open(url, data, timeout)
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 531, in open
response = meth(req, response)
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 640, in http_response
response = self.parent.error(
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 569, in error
return self._call_chain(*args)
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 502, in _call_chain
result = func(*args)
File
"/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py",
line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad
request has been sent to the endpoint, probably the sparql query is bad
formed.
Response:
b'Error 400: Bad Request\n'
Since I don't get any details on what the problem is, I am resorting to
a binary search. From the error above I only know that the problem
is with a record with a batchIndex between 12000 and 14000, so I am
setting the limit to 14000 and the batchSize to 100 to get closer.
batch for 13301 - 13400 of 14000 cr:Event in 0.0 s -> 4.3 s
is now the last successful batch. So I am using a binary search: 13450
fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok, 13421 ok.
So record 13422 is the culprit, and I switch on debug mode to see the
INSERT DATA created for the record:
cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
cr:Event__102140gtm20003 cr:Event_source "crossref".
cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
cr:Event__102140gtm20003 cr:Event_year 1999.
cr:Event__102140gtm20003 cr:Event_month 9.
cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
So the umlaut encoding in the location (M\\"unster for "Münster") is the
culprit here: the embedded double quote ends the string literal early.
I will work around this issue. The real question is:
*How can I get the Fuseki API via SPARQLWrapper to properly report a
detailed error message, e.g. something like "error in line #:
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
not a valid triple"?*
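The manual narrowing described above (batches of 2000, then 100, then single records) could be automated. The following is only a minimal sketch under stated assumptions: find_first_bad_record and the batch_ok callback are hypothetical names that are not part of the test project, and batch_ok is assumed to send one batch as a single INSERT DATA request and return False when the endpoint answers with HTTP 400. It also assumes a batch fails exactly when it contains at least one bad record.

```python
from typing import Callable, Optional, Sequence


def find_first_bad_record(
    records: Sequence[dict],
    batch_ok: Callable[[Sequence[dict]], bool],
) -> Optional[int]:
    """Return the index of the first record that makes a batch fail, or None.

    batch_ok is a hypothetical callback: it sends the given records as one
    batch and returns True on success, False on failure (e.g. HTTP 400).
    """
    if batch_ok(records):
        return None  # the whole batch is accepted, nothing to look for
    lo, hi = 0, len(records)  # the first bad record lies in records[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if batch_ok(records[lo:mid]):
            lo = mid  # left half is clean, so look in the right half
        else:
            hi = mid  # left half already fails, so look there
    return lo
```

Each step sends only the left half of the remaining range, so a batch of 2000 records needs about 11 extra requests instead of a manual search.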
Yours
Wolfgang
--
BITPlan - smart solutions
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
BITPlan GmbH, Willich - HRB 6820 Krefeld, Steuer-Nr.: 10258040548, Geschäftsführer: Wolfgang Fahl
Re: Fuseki Bad Requests response details
Posted by Andy Seaborne <an...@apache.org>.
JENA-1946 may improve the situation, but the HTTP response is not the
ideal place to put user feedback.
The Fuseki log has details, and you can parse the query before sending it.
This:
> 'title': '“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018',
is wrong at \S: a backslash followed by S is not a valid escape sequence
in a SPARQL string.
You can't just dump any old string into SPARQL or Turtle by putting
'...' around it. It needs escaping on the Python side.
https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/
shows the title is corrupted.
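The escaping mentioned above could look roughly like this on the Python side. This is a sketch, not the code used in the project: sparql_string_literal is a hypothetical helper name. It relies on the SPARQL grammar rule that a double-quoted string may not contain a raw backslash, double quote, line feed or carriage return, so those characters (and tab, for readability) are replaced by their escape sequences.

```python
# Characters that need an escape sequence inside a double-quoted
# SPARQL/Turtle string literal (tab is optional but harmless to escape).
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted literal that is safe to embed."""
    # Working character by character avoids escaping a backslash twice.
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'
```

With this helper, a title containing `\S` is sent as `\\S`, so the stored value keeps its backslash instead of producing an invalid escape.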
Andy
On 25/08/2020 14:24, Wolfgang Fahl wrote:
Re: Fuseki Bad Requests response details
Posted by Wolfgang Fahl <wf...@bitplan.com>.
Andy, thanks for the response.
I tried:
curl -X POST -H "Content-Type:application/sparql-update" -d @error.data localhost:3030/cr/update
Error 400: Bad Request
and
curl -X POST -H "Content-Type:application/sparql-update" -d @insert.data localhost:3030/cr/update
cat insert.data
PREFIX cr: <http://cr.bitplan.com/>
INSERT DATA {
cr:version cr:author "Wolfgang Fahl".
}
cat error.data
PREFIX cr: <http://cr.bitplan.com/Event/0.1/>
INSERT DATA {
cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_title "“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018".
cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_url "https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/".
}
I also followed the hint of Stanislav Kralin at
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
and added a new test:
def testSPARQLErrorMessage(self):
    '''
    test error handling
    see https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
    '''
    listOfDicts=[{
        'title': '“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018',
        'url': 'https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/'}]
    entityType="cr:Event"
    primaryKey='title'
    prefixes="PREFIX cr: <http://cr.bitplan.com/Event/0.1/>"
    jena=self.getJena(mode='update',typedLiterals=False,debug=True)
    errors=jena.insertListOfDicts(listOfDicts,entityType,primaryKey,prefixes)
    self.checkErrors(errors,1)
    error=errors[0]
    self.assertTrue("probably the sparql query is bad formed" in error)
which gives:
Response:
b'Error 400: Bad Request\n'
ERRORS:
QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Error 400: Bad Request\n' for record 0
The response body of the 400 HTTPError doesn't seem to contain any more data,
and I don't know how to get extra information via the curl request.
The question is IMHO still unsolved, and I am not sure whether
SPARQLWrapper could do better, or how.
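One way to get a more specific message without relying on the server is a client-side check of each string value before it goes into the INSERT DATA request. The sketch below is an illustration only: invalid_escapes is a hypothetical helper, and it assumes the SPARQL rules for string literals, which accept \t, \b, \n, \r, \f, \", \' and \\ as well as the codepoint escapes \uXXXX and \UXXXXXXXX. It looks only at backslashes; an unescaped double quote inside the value is a separate problem.

```python
import re

# Backslash sequences SPARQL accepts inside a string literal:
# \t \b \n \r \f \" \' \\ plus the codepoint escapes \uXXXX and \UXXXXXXXX.
_VALID_ESCAPE = re.compile(
    r'\\(?:[tbnrf"\'\\]|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})'
)


def invalid_escapes(literal_body: str) -> list:
    """Return (offset, snippet) for each backslash that starts no valid escape.

    literal_body is the text between the quotes exactly as it would be
    written into the SPARQL update.
    """
    problems = []
    pos = 0
    while True:
        pos = literal_body.find("\\", pos)
        if pos == -1:
            return problems
        match = _VALID_ESCAPE.match(literal_body, pos)
        if match:
            pos = match.end()  # skip the whole valid escape sequence
        else:
            problems.append((pos, literal_body[pos:pos + 2]))
            pos += 1
```

Run against the title from the test above, this reports both occurrences of `\S`, which gives the record-level detail that the bare "Error 400: Bad Request" response lacks.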
Cheers
Wolfgang
On 19.08.20 at 16:15, Andy Seaborne wrote:
Re: Fuseki Bad Requests response details
Posted by Andy Seaborne <an...@apache.org>.
"""
How can i get the Fuseki API via SPARQLWrapper to properly report a
detailed error message e.g. with something like "error in line #
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
not a valid triple?
"""
This is a question about SPARQLWrapper, not Fuseki.
Look in the response body because, for Fuseki, it has the details of the
error in plain text.
You can also print the query out in Python and parse it with Jena
locally, or send it with curl, which prints the body.
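For reading the response body directly from Python, independent of SPARQLWrapper, something along these lines should work with the standard library alone. It is a sketch under assumptions: send_update and error_body are hypothetical names, and the endpoint URL would be the update endpoint of the dataset (for example http://localhost:3030/cr/update as used elsewhere in this thread). How much detail the body contains depends on the server and its version.

```python
import urllib.error
import urllib.request


def error_body(err: urllib.error.HTTPError) -> str:
    """Return the text the server sent along with an HTTP error status."""
    # HTTPError doubles as a response object, so its body can be read.
    return err.read().decode("utf-8", errors="replace")


def send_update(endpoint: str, update: str, timeout: float = 30.0):
    """POST a SPARQL update and return (status, body_text)."""
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # A 4xx/5xx answer raises, but the body is still available.
        return err.code, error_body(err)
```

Printing the second element of the returned tuple shows whatever the server put into the body, which is the same text curl prints.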
Andy