Posted to users@jena.apache.org by Rafiq <ak...@univ-lyon1.fr> on 2013/12/16 11:38:00 UTC

Looking for suggestions if you have

To Andy,

I need suggestions on two issues:

(i) I ran the TDB loader to load 159.(something) million triples, and it
took 357636.22 seconds to complete (!!!).
To me that is an enormous amount of time, because I have 1.3 billion
triples to load in total. In addition to processing time, loading time
is critical for me. I chose TDB because I was impressed by the benchmark
you (Andy) published at http://www.w3.org/wiki/LargeTripleStores. I am
sure there is a way out that I don't know about but you do.
I would be really grateful if you could help me.

(*Machine* info: a 64-bit Linux machine with 8 GB of RAM and a quad-core
Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz.)

(ii) I am using the LUBM dataset and running the LUBM queries.
Unfortunately, when I launch a query, the system returns a *400 error*,
which is essentially a bad request error. As I understand it, the server
has found a syntax problem in the request, but I have no idea why. I am
using the Fuseki SPARQL interface.

Thanking you in advance for your suggestions.

Regards,
Rafiq


Re: Looking for suggestions if you have

Posted by Andy Seaborne <an...@apache.org>.
On 16/12/13 10:38, Rafiq wrote:
> To Andy,
>
> I need suggestions on two issues:
>
> (i) I ran the TDB loader to load 159.(something) million triples, and it
> took 357636.22 seconds to complete (!!!).
> To me that is an enormous amount of time, because I have 1.3 billion
> triples to load in total. In addition to processing time, loading time
> is critical for me. I chose TDB because I was impressed by the benchmark
> you (Andy) published at http://www.w3.org/wiki/LargeTripleStores. I am
> sure there is a way out that I don't know about but you do.
> I would be really grateful if you could help me.
>
> (*Machine* info: a 64-bit Linux machine with 8 GB of RAM and a quad-core
> Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz.)

It's probably because of the 8G of RAM.  If you can borrow a larger
machine just for the load, then it should go faster.  I recently loaded
410e6 triples overnight on a 30G machine so, depending on your data
profile, 16G-24G would be better.  And use a small Java heap - 2G? - so
that most of the RAM is left for the OS file cache that TDB's
memory-mapped files rely on.

It does sound like the load has been pushed over the edge. Because disk
is involved, once the loader starts hitting disk too much, performance
drops sharply.
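
For a sense of scale, the figures quoted above can be turned into a load
rate. This is only a rough sketch: "159.(something) million" is rounded
to 159e6, and the projection assumes the rate stays constant for the full
1.3 billion triples, which is unlikely once the loader is disk-bound.

```python
# Figures quoted in the thread; 159e6 is a rounding of "159.(something) million".
LOADED_TRIPLES = 159_000_000
LOAD_SECONDS = 357_636.22
TOTAL_TRIPLES = 1_300_000_000
SECONDS_PER_DAY = 86_400


def load_rate(triples, seconds):
    """Average throughput in triples per second."""
    return triples / seconds


def projected_seconds(total_triples, rate):
    """Time to load total_triples, assuming the rate stays constant."""
    return total_triples / rate


rate = load_rate(LOADED_TRIPLES, LOAD_SECONDS)
elapsed_days = LOAD_SECONDS / SECONDS_PER_DAY
projected_days = projected_seconds(TOTAL_TRIPLES, rate) / SECONDS_PER_DAY

print(f"observed rate: {rate:.0f} triples/s")
print(f"observed load time: {elapsed_days:.1f} days")
print(f"1.3e9 triples at that rate: {projected_days:.0f} days")
```

That is roughly 445 triples per second, just over four days for the
partial load, and about 34 days for the full dataset if nothing changes.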

tdbloader2 *may* help but it's not certain to.

> (ii) I am using the LUBM dataset and running the LUBM queries.
> Unfortunately, when I launch a query, the system returns a *400 error*,
> which is essentially a bad request error. As I understand it, the server
> has found a syntax problem in the request, but I have no idea why. I am
> using the Fuseki SPARQL interface.

Try using the online query validator, as it prints the parse error.  The
same message is also in the response body of the 400.
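
One way to see that response body is to send the query from a small
script and print whatever the server returns with the error. The sketch
below uses only the Python standard library and the standard SPARQL
protocol form POST (a `query=` parameter). The endpoint URL in the usage
comment is a hypothetical example, not taken from this thread, so
substitute your own dataset's query URL.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL protocol POST with the query as a form parameter."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(endpoint, data=data, headers=headers)


def describe_failure(err):
    """Return the status code plus the body the server sent with the error."""
    body = err.read().decode("utf-8", errors="replace")
    return f"HTTP {err.code}: {body}"


def run_query(endpoint, query, timeout=30):
    """Return the result text, or the error body if the server rejects the query."""
    request = build_request(endpoint, query)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # For a 400, the body should carry the parse error described above.
        return describe_failure(err)


# Usage (hypothetical endpoint URL; adjust to your server and dataset):
# print(run_query("http://localhost:3030/ds/query",
#                 "SELECT * WHERE { ?s ?p ?o } LIMIT 10"))
```

Printing the body rather than just the status code is the point here: the
400 on its own only says the request was rejected, not which line of the
query failed to parse.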

>
> Thanking you in advance for your suggestions.
>
> Regards,
> Rafiq

	Andy