Posted to users@jena.apache.org by Stefano Cossu <sc...@artic.edu> on 2017/12/24 22:10:04 UTC

Python bindings?

Hello,
I am writing an LDP server using Python's RDFLib and Fuseki/TDB as a back 
end store.

Right now my application is very slow, I suspect due to the HTTP 
overhead: profiling shows a large chunk of time waiting for sockets.

Is there a reliable way to write Python code against the Fuseki Java 
API? I understand that Fuseki is written in Java and there are no native 
Python bindings. I have looked at options such as Jython, JPype and 
PyJnius but I am wondering how reliable these options are. Any suggestions?

Thanks,
Stefano

Re: Python bindings?

Posted by Andy Seaborne <an...@apache.org>.

On 28/12/17 04:43, Stefano Cossu wrote:
> Hi Andy,
> 
> By doing a straight POST on Fuseki/TDB/Jetty, on a quad-core i7 laptop 
> with 12 GB RAM:
> 
> time curl -i --data-binary 'WITH <info:graph/__root__> DELETE {} 
> INSERT{<info:s#1> <info:p#1> <info:o#1> . } WHERE {}' 
> -H'Content-Type:application/sparql-update' 
> 'http://localhost:3030/lakesuperior-dev/update'
> HTTP/1.1 204 No Content
> Date: Thu, 28 Dec 2017 04:34:56 GMT
> Fuseki-Request-ID: 12
> 
> 
> real    0m0.144s
> user    0m0.010s
> sys    0m0.002s

Repeated calls over a kept-open connection will be faster: when curl is 
invoked like that, a new connection is created each time, which is the 
maximum HTTP overhead.
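
A minimal Python sketch of the point about connection reuse, using only the standard library. It sends the same update several times over a single connection so that connection setup is paid once. The function name, the host, port and path arguments, and the INSERT DATA text with an explicit GRAPH clause are illustrative assumptions, not something taken from the thread or from a Fuseki client library.

```python
import http.client
import time

# Assumed expansion of the update from the thread, written as INSERT DATA.
UPDATE = (
    "INSERT DATA { GRAPH <info:graph/__root__> "
    "{ <info:s#1> <info:p#1> <info:o#1> } }"
)


def time_updates(host, port, path, update, n=10):
    """POST the same SPARQL Update n times over ONE connection.

    Returns the wall-clock time of each request in seconds.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    headers = {"Content-Type": "application/sparql-update"}
    timings = []
    try:
        for _ in range(n):
            start = time.perf_counter()
            conn.request("POST", path, body=update.encode("utf-8"), headers=headers)
            response = conn.getresponse()
            response.read()  # drain the response so the socket can be reused
            if response.status >= 300:
                raise RuntimeError(f"update failed: {response.status} {response.reason}")
            timings.append(time.perf_counter() - start)
    finally:
        conn.close()
    return timings
```

Against the setup in this thread the call would look like time_updates("localhost", 3030, "/lakesuperior-dev/update", UPDATE). The first timing includes opening the connection; if the later ones are clearly lower, connection setup was a real part of the cost.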

That operation is the same as doing:

INSERT DATA {}

> 
> 
> Fuseki log:
> 
> [2017-12-27 22:34:56] Fuseki     INFO  [12] POST 
> http://localhost:3030/lakesuperior-dev/update
> [2017-12-27 22:34:56] Fuseki     INFO  [12] POST /lakesuperior-dev :: 
> 'update' :: [application/sparql-update] ?
> [2017-12-27 22:34:56] Fuseki     INFO  [12] 204 No Content (125 ms)
> 
> 
> There is some HTTP overhead indeed but as you suggest it seems to be 
> mostly Fuseki doing work. The time for sending the same request ranges 
> from 120 to 189 ms. Would you consider this normal and should I settle 
> for it?

When sending from Python?

Sorry - I have no experience using Python, with or without RDFLib, where 
this type of performance matters.

> 
> This is important for me because so far I have bundled more complex 
> requests in one SPARQL update or query request to avoid the HTTP tax, 
> but if that were less severe than having Fuseki parse one complex query 
> I could rethink my application code.

A sequence of SPARQL Update requests can be sent in one request by using 
";" between them.
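
As a rough illustration of the ";" separator, the helper below joins several update operations into one request body. The function name and the sample operations are made up for this sketch; only the use of ";" between operations comes from the reply above and from SPARQL 1.1 Update.

```python
def bundle_updates(operations):
    """Join SPARQL Update operations into one request body.

    Each operation is trimmed and any trailing ";" is dropped first,
    so the result has exactly one ";" between consecutive operations.
    """
    cleaned = [op.strip().rstrip(";").strip() for op in operations]
    return " ;\n".join(op for op in cleaned if op)


body = bundle_updates([
    "INSERT DATA { <info:s#1> <info:p#1> <info:o#1> }",
    "DELETE WHERE { <info:s#2> ?p ?o } ;",
])
```

The resulting string can be sent as a single application/sparql-update POST, so several small operations cost one HTTP round trip.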




Re: Python bindings?

Posted by Stefano Cossu <sc...@artic.edu>.
Hi Andy,

By doing a straight POST on Fuseki/TDB/Jetty, on a quad-core i7 laptop 
with 12 GB RAM:

time curl -i \
  --data-binary 'WITH <info:graph/__root__> DELETE {} INSERT{<info:s#1> <info:p#1> <info:o#1> . } WHERE {}' \
  -H 'Content-Type: application/sparql-update' \
  'http://localhost:3030/lakesuperior-dev/update'
HTTP/1.1 204 No Content
Date: Thu, 28 Dec 2017 04:34:56 GMT
Fuseki-Request-ID: 12


real	0m0.144s
user	0m0.010s
sys	0m0.002s


Fuseki log:

[2017-12-27 22:34:56] Fuseki     INFO  [12] POST http://localhost:3030/lakesuperior-dev/update
[2017-12-27 22:34:56] Fuseki     INFO  [12] POST /lakesuperior-dev :: 'update' :: [application/sparql-update] ?
[2017-12-27 22:34:56] Fuseki     INFO  [12] 204 No Content (125 ms)


There is some HTTP overhead indeed but as you suggest it seems to be 
mostly Fuseki doing work. The time for sending the same request ranges 
from 120 to 189 ms. Would you consider this normal and should I settle 
for it?

This is important for me because so far I have bundled more complex 
requests in one SPARQL update or query request to avoid the HTTP tax, 
but if that were less severe than having Fuseki parse one complex query 
I could rethink my application code.

Thanks,
Stefano




-- 
Stefano Cossu
Director of Application Services, Collections

The Art Institute of Chicago
116 S. Michigan Ave.
Chicago, IL 60603
312-499-4026


Re: Python bindings?

Posted by Andy Seaborne <an...@apache.org>.
> I suspect due to the HTTP overhead: profiling shows a large chunk of time waiting for sockets. 

If it is waiting, then either it is because Fuseki is doing work (see the 
log file, which has entries at the start and end of an operation), or the 
client is waiting (maybe connection management issues?).

Fuseki does keep the connection open (connection caching). If the log 
looks correct, how long is the client waiting?

	Andy
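
One way to make that comparison concrete is to read the elapsed time from the closing log entry and subtract it from the time measured on the client. The sketch below assumes the closing entry ends in "(N ms)", as in the log lines posted elsewhere in this thread; other formats are not handled, and the function names are invented for illustration.

```python
import re

# Matches the closing log entry, e.g. "... INFO  [12] 204 No Content (125 ms)"
_END_OF_REQUEST = re.compile(r"\[(\d+)\]\s+(\d{3})\s.*\((\d+)\s*ms\)\s*$")


def server_millis(log_line):
    """Return the elapsed milliseconds reported in a log line, or None."""
    match = _END_OF_REQUEST.search(log_line)
    return int(match.group(3)) if match else None


def client_side_overhead_ms(client_ms, log_line):
    """Time the client waited beyond what the server reports as work."""
    server_ms = server_millis(log_line)
    if server_ms is None:
        raise ValueError("no elapsed time found in log line")
    return client_ms - server_ms
```

With the figures posted in the thread (0.144 s measured by curl, 125 ms in the log) this gives 19 ms spent outside Fuseki for that one request.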


Re: Python bindings?

Posted by Stefano Cossu <sc...@artic.edu>.
Dick,
I am interested in hearing the reasons behind your developers dropping 
RDFLib, which I find very convenient for de/serializing RDF but I feel 
like it is somewhat brittle and quite obscure in the back end connection 
part. I think that your approach to using straight HTTP calls for that 
may be a better choice.

Also, thanks for the tip on Thrift. I am not familiar with it but I 
would be interested in knowing how your team is building Python bindings 
for the Jena API if it is meant to become a public project at some point.

Best,
Stefano





Re: Python bindings?

Posted by dandh988 <da...@gmail.com>.
We use Python against Jena/Fuseki/CustomHTTP and find direct SPARQL against the endpoint to be "fast". The Python devs dropped RDFLib.
We also have a Thrift connection in development which is proving useful for low-level Jena API access.

Dick