Posted to users@jena.apache.org by "Dimov, Stefan" <st...@sap.com> on 2018/01/07 00:57:39 UTC

Fast way to load InMem DB

Hi all,

Is there a quick way to upload multiple TriG/Turtle files into an in-mem DB?
(something like tdbloader2)

Regards,
Stefan

Re: Fast way to load InMem DB

Posted by ajs6f <aj...@apache.org>.
Please show us your Fuseki config and how you are using s-post. When you write "the slower the upload becomes", what does that mean? How slow? Does it continue to get slower the more you upload? How much slower?
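
One way to answer the "how slow" questions with numbers is to time every upload and compare the last few against the first few. The following is a minimal sketch, not anything SOH or Fuseki provides: `time_uploads` and `slowdown_ratio` are hypothetical helper names, and `upload` stands for whatever actually sends one file (for example a wrapper that runs s-post).

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_uploads(paths: Iterable[str],
                 upload: Callable[[str], None],
                 clock: Callable[[], float] = time.perf_counter
                 ) -> List[Tuple[str, float]]:
    """Call upload(path) for each path and record how many seconds each call took."""
    timings = []
    for path in paths:
        start = clock()
        upload(path)  # hypothetical: the caller decides how a file is sent
        timings.append((path, clock() - start))
    return timings


def slowdown_ratio(timings: List[Tuple[str, float]], window: int = 10) -> float:
    """Mean duration of the last `window` uploads divided by the mean of the first `window`."""
    if not timings:
        raise ValueError("no timings recorded")
    durations = [seconds for _, seconds in timings]
    window = max(1, min(window, len(durations)))
    first = sum(durations[:window]) / window
    last = sum(durations[-window:]) / window
    return last / first
```

A ratio close to 1 would mean the per-file time stays flat; a ratio that keeps growing as more files are loaded would confirm the slowdown described in the quoted message.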

Heap size should not be very relevant for loading a streamable format, but it will matter to the actual in-memory database.


Adam Soroka

> On Jan 8, 2018, at 4:02 PM, Dimov, Stefan <st...@sap.com> wrote:
> 
> Alright, here’s some context:
> 
> I have a Jena/Fuseki instance with an InMem DB that is empty at the start, and about 600 N-Triples files. Each file contains 100,000 triples, so about 60 million triples (60MT) overall.
> 
> I’m uploading those files with s-post. At first, each file takes about 6-7 seconds to upload, but the more data is already in the DB, the slower the upload becomes.
> 
> The question is: how can I speed up the upload? Would it help to concatenate (some of) the files before uploading? What would be a proper heap size for 60MT?
> 
> Would any other strategy help?
> 
> 
> Regards,
> Stefan
> 
> 
> 


Re: Fast way to load InMem DB

Posted by "Dimov, Stefan" <st...@sap.com>.
Alright, here’s some context:

I have a Jena/Fuseki instance with an InMem DB that is empty at the start, and about 600 N-Triples files. Each file contains 100,000 triples, so about 60 million triples (60MT) overall.

I’m uploading those files with s-post. At first, each file takes about 6-7 seconds to upload, but the more data is already in the DB, the slower the upload becomes.
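
As a rough check of these figures, the arithmetic can be written out. This is only a sketch built from the numbers stated above (600 files, 100,000 triples each, 6-7 seconds per file at the start). The function names are made up for illustration, and the time estimate is a lower bound because it assumes the upload never slows down.

```python
FILES = 600                 # "about 600 N-Triples files"
TRIPLES_PER_FILE = 100_000  # "100,000 triples" per file
SECONDS_PER_FILE = (6, 7)   # "about 6-7 seconds" per file at the start


def total_triples(files: int = FILES, per_file: int = TRIPLES_PER_FILE) -> int:
    """Total number of triples across all files."""
    return files * per_file


def best_case_minutes(files: int = FILES, seconds_per_file=SECONDS_PER_FILE):
    """Total load time in minutes if every file took as long as the first ones."""
    low, high = seconds_per_file
    return files * low / 60, files * high / 60


def initial_throughput(per_file: int = TRIPLES_PER_FILE,
                       seconds_per_file=SECONDS_PER_FILE):
    """Triples per second at the initial rate, as (slowest, fastest)."""
    low, high = seconds_per_file
    return per_file / high, per_file / low
```

With the stated numbers this gives 60,000,000 triples, a best case of 60 to 70 minutes for the whole load, and an initial rate of roughly 14,000 to 17,000 triples per second.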

The question is: how can I speed up the upload? Would it help to concatenate (some of) the files before uploading? What would be a proper heap size for 60MT?

Would any other strategy help?
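
For the concatenation idea, a minimal sketch follows. N-Triples is line-based, so joining files byte for byte yields a valid N-Triples file as long as each one ends with a newline. Two caveats: blank node labels are scoped to a document, so identical labels in different files would become the same node after merging, and nothing in this thread establishes that fewer, larger uploads are actually faster. The names `batch_paths` and `concatenate` are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def batch_paths(paths: Sequence[Path], batch_size: int) -> List[List[Path]]:
    """Split the list of files into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(paths[i:i + batch_size])
            for i in range(0, len(paths), batch_size)]


def concatenate(paths: Sequence[Path], target: Path) -> None:
    """Write all files into target, one after another.

    A newline is added after any file that does not end with one,
    so the last triple of one file never runs into the first of the next.
    """
    with target.open("wb") as out:
        for path in paths:
            data = path.read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
```

With 600 input files, `batch_paths(files, 50)` would produce 12 groups, each of which could be passed to `concatenate` and then uploaded as a single file.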


Regards,
Stefan



On 1/7/18, 1:14 PM, "ajs6f" <aj...@apache.org> wrote:

    You can use SOH to upload files to particular graphs:
    
    https://jena.apache.org/documentation/fuseki2/soh.html#soh-sparql-http
    
    and like any *nix CLI tool, you can loop it or use xargs or some equivalent. Any action against Fuseki's APIs is going to traverse HTTP and some network and that is often far more important than anything else. Otherwise, I have no idea what you mean by "fast". You've given no context nor told us what you've already tried.
    
    ajs6f
    


Re: Fast way to load InMem DB

Posted by ajs6f <aj...@apache.org>.
You can use SOH to upload files to particular graphs:

https://jena.apache.org/documentation/fuseki2/soh.html#soh-sparql-http

and like any *nix CLI tool, you can loop it or use xargs or some equivalent. Any action against Fuseki's APIs is going to traverse HTTP and some network and that is often far more important than anything else. Otherwise, I have no idea what you mean by "fast". You've given no context nor told us what you've already tried.
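
The looping idea can be sketched in Python as well, posting each file to the dataset's graph store endpoint over HTTP. Everything specific here is an assumption: the endpoint URL `http://localhost:3030/ds/data` is a placeholder, the function names are hypothetical, and only N-Triples and Turtle files are handled. It is not a replacement for s-post, only an illustration of one request per file.

```python
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional

# Media types for the two syntaxes this sketch handles.
CONTENT_TYPES = {
    ".nt": "application/n-triples",
    ".ttl": "text/turtle",
}


def graph_store_url(endpoint: str, graph: Optional[str] = None) -> str:
    """Address the default graph (?default) or a named graph (?graph=<uri>)."""
    if graph is None:
        return endpoint + "?default"
    return endpoint + "?" + urllib.parse.urlencode({"graph": graph})


def build_upload_request(endpoint: str, path: Path,
                         graph: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request that adds the file's triples to the target graph."""
    return urllib.request.Request(
        graph_store_url(endpoint, graph),
        data=path.read_bytes(),
        headers={"Content-Type": CONTENT_TYPES[path.suffix.lower()]},
        method="POST",
    )


def upload_all(endpoint: str, directory: Path, graph: Optional[str] = None) -> None:
    """Upload every supported file in the directory, one request per file."""
    for path in sorted(directory.iterdir()):
        if path.suffix.lower() not in CONTENT_TYPES:
            continue
        request = build_upload_request(endpoint, path, graph)
        with urllib.request.urlopen(request) as response:
            print(path.name, response.status)
```

Calling `upload_all("http://localhost:3030/ds/data", Path("files"))` against a running server would send the files one by one; every request still pays the HTTP and network cost mentioned above.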

ajs6f

> On Jan 7, 2018, at 4:02 PM, Dimov, Stefan <st...@sap.com> wrote:
> 
> Thanks Adam,
> 
> Yes, apparently I didn’t mean loading into an “off-line in-mem DB”, which doesn’t make sense, as you explained.
> 
> That’s why I said: “something like” tdbloader2. Apparently it (if there is such a thing) will have to work with an on-line DB (very likely with Fuseki).
> 
> The point is – I need it to be fast …
> 
> Regards,
> Stefan  
> 


Re: Fast way to load InMem DB

Posted by "Dimov, Stefan" <st...@sap.com>.
Thanks Adam,

Yes, apparently I didn’t mean loading into an “off-line in-mem DB”, which doesn’t make sense, as you explained.

That’s why I said: “something like” tdbloader2. Apparently it (if there is such a thing) will have to work with an on-line DB (very likely with Fuseki).

The point is – I need it to be fast …

Regards,
Stefan  

On 1/7/18, 5:48 AM, "ajs6f" <aj...@apache.org> wrote:

    What would be left after a command-line utility ran? If it set up an in-mem dataset, then loaded into it, then finished, the in-mem dataset would go away.
    
    Maybe you want to load into an in-memory dataset in Fuseki?
    
    Adam Soroka
    


Re: Fast way to load InMem DB

Posted by ajs6f <aj...@apache.org>.
What would be left after a command-line utility ran? If it set up an in-mem dataset, then loaded into it, then finished, the in-mem dataset would go away.

Maybe you want to load into an in-memory dataset in Fuseki?

Adam Soroka

> On Jan 6, 2018, at 7:57 PM, Dimov, Stefan <st...@sap.com> wrote:
> 
> Hi all,
> 
> Is there a quick way to upload multiple TriG/Turtle files into an in-mem DB?
> (something like tdbloader2)
> 
> Regards,
> Stefan