Posted to dev@jena.apache.org by Osma Suominen <os...@helsinki.fi> on 2017/11/16 15:22:47 UTC

bad Fuseki Content-Length value Re: TDB2 testing

Hi Andy!

Andy Seaborne wrote on 16.11.2017 at 15:54:
> I am weeks behind on email.

No problem with that. I just noticed that other TDB2 issues and comments 
were discussed on JIRA and on users@ while this one wasn't. I figured it 
might have fallen through the cracks.

>> 4. Should there be a JIRA issue about the bad Content-Length values
>> reported by Fuseki?
> 
> I don't see any connection to TDB2.
> 
> Please separate this out into another email - what is the problem and 
> does it apply to the current codebase?

Sorry I wasn't clear. This is something you mentioned yourself in 
another e-mail on 2017-10-06 about how to load large files into Fuseki 
with TDB2:

> This seems to work:
> 
> wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: application/n-triples' http://localhost:3030/data
> 
> 200M BSBM (49Gbytes) loaded at 42K triples/s.
> 
> The content length in the Fuseki log is reported wrongly (1002691465 ... int/long error) but the triple count is right.

The only connection to TDB2 is that with TDB1 transaction sizes were 
limited, so I guess that the overflow situation never happened. With 
TDB2 you can now push very large files into Fuseki (yay!), but this 
exposes the problem. It's a very minor issue at least to me. I'm more 
interested in the other questions - especially if it's possible to 
maintain a Fuseki endpoint with a TDB2 store, occasionally pushing new 
data but not filling the disk doing so.

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: bad Fuseki Content-Length value Re: TDB2 testing

Posted by Andy Seaborne <an...@apache.org>.

On 16/11/17 15:22, Osma Suominen wrote:
...

>> Please separate this out into another email - what is the problem and 
>> does it apply to the current codebase?
> 
> Sorry I wasn't clear. This is something you mentioned yourself in 
> another e-mail on 2017-10-06 about how to load large files into Fuseki 
> with TDB2:
> 
>> This seems to work:
>>
>> wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 
>> 'Content-type: application/n-triples' http://localhost:3030/data
>>
>> 200M BSBM (49Gbytes) loaded at 42K triples/s.
>>
>> The content length in the Fuseki log is reported wrongly (1002691465
>> ... int/long error) but the triple count is right.

Now fixed.

> 
> The only connection to TDB2 is that with TDB1 transaction sizes were 
> limited, so I guess that the overflow situation never happened. With 
> TDB2 you can now push very large files into Fuseki (yay!), but this 
> exposes the problem. It's a very minor issue at least to me. I'm more 
> interested in the other questions - especially if it's possible to 
> maintain a Fuseki endpoint with a TDB2 store, occasionally pushing new 
> data but not filling the disk doing so.

It was an int/long bug so it happens at 2G.
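To illustrate the kind of bug meant here, below is a minimal sketch of what happens when a byte count held in a long is narrowed to an int. It is not the actual Fuseki code; the class and method names are hypothetical and only demonstrate the 2G boundary.

```java
// Hypothetical demo class, not taken from Jena or Fuseki.
class ContentLengthOverflow {
    // Mimics code that stores a byte count in an int:
    // the narrowing cast silently keeps only the low 32 bits.
    public static int asInt(long contentLength) {
        return (int) contentLength;
    }

    // Safer variant: Math.toIntExact throws ArithmeticException
    // instead of wrapping when the value does not fit in an int.
    public static int asIntChecked(long contentLength) {
        return Math.toIntExact(contentLength);
    }

    public static void main(String[] args) {
        long justUnder = Integer.MAX_VALUE;      // 2^31 - 1, still fits
        long justOver = Integer.MAX_VALUE + 1L;  // 2^31, first value that wraps

        System.out.println(asInt(justUnder));    // prints 2147483647
        System.out.println(asInt(justOver));     // prints -2147483648

        try {
            asIntChecked(justOver);
        } catch (ArithmeticException e) {
            System.out.println("does not fit in an int: " + justOver);
        }
    }
}
```

Keeping the count in a long from end to end avoids the problem; the checked variant is only useful where an int is unavoidable.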

(There is an equivalent problem in Apache Commons FileUpload, but
fortunately in a place that Jena does not use when called from the UI.
It's unfixable in FileUpload: if you call "getString()" then the string
is limited to 2G characters because Java strings are char[]s.)
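The string limit can be sketched as a simple guard. This assumes nothing about FileUpload itself: the class and method names are hypothetical, and the only fact relied on is that Java array and String lengths are ints.

```java
// Hypothetical helper, not part of Commons FileUpload or Jena.
class StringSizeGuard {
    // A Java String or array is indexed by int, so its length can
    // never exceed Integer.MAX_VALUE (2^31 - 1). Real JVMs may cap
    // arrays slightly lower, so treat this as an upper bound only.
    public static boolean fitsInString(long charCount) {
        return charCount >= 0 && charCount <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInString(1_000_000L));      // prints true
        System.out.println(fitsInString(3_000_000_000L));  // prints false
    }
}
```

Content larger than this has to be processed as a stream rather than materialised as one string.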

Loading a 2G file into TDB1 takes a few G of RAM at most, and isn't
near the size limits for Fuseki. Fuseki uses TDB cautiously and further
restricts the delayed work queue.

    Andy

> 
> -Osma
>