Posted to users@jena.apache.org by ro...@tiscali.it on 2022/01/03 11:58:34 UTC

Use command tdbquery

  Hi,

I am using a Fuseki server and need to run a query that returns
a lot of results. Using the HTTP call
(http://localhost:3030/ds/query=myQuery) is very slow and inefficient.
I thought about using the tdbquery command, but I don't want to stop
Fuseki. Is there any way to do this?
  




Re: Use command tdbquery

Posted by ro...@tiscali.it.
  Thanks for the reply. It can be one of the alternatives.

On 03.01.2022 19:09, Andy Seaborne wrote:

> On 03/01/2022 17:44, robert.barry@tiscali.it wrote:
>
>> Hi, you are right, I was not clear in the request. I will try to
>> explain myself better. I have a knowledge base of over a billion
>> triples. I am testing a query that returns about 2 million results
>> (in the future I will have many queries that return a lot of data).
>> On the client side I have to allow the results to be downloaded in
>> CSV format (on asynchronous request, not through batch).
>
> How long does it take?
>
>> But, with these volumes of data, we can have 2 types of errors:
>> - OutOfMemory on the Result (I can increase the heap size....)
>
> How are you making the query? (what software?)
>
> Fuseki will stream results back and, with the Jena client code, can
> provide an end-to-end streaming solution.
>
> The fastest results format is the binary Thrift encoding.
>
> RDFConnectionFuseki will use this.
>
> Some queries don't stream.


Re: Use command tdbquery

Posted by ro...@tiscali.it.
  Thanks for the reply.

On 03.01.2022 20:49, A. Soroka wrote:

> Is it possible for you to make a copy of the database to query
> offline? That can be expensive in storage, but it's really the
> simplest thing to do in many ways.
>
> Adam


Re: Use command tdbquery

Posted by "A. Soroka" <so...@gmail.com>.
Is it possible for you to make a copy of the database to query offline?
That can be expensive in storage, but it's really the simplest thing to do
in many ways.

Adam

On Mon, Jan 3, 2022, 1:09 PM Andy Seaborne <an...@apache.org> wrote:

>
>
> On 03/01/2022 17:44, robert.barry@tiscali.it wrote:
> >
> >
> > Hi,
> >
> > you are right, I was not clear in the request. I will try to
> > explain myself better.
> > I have a knowledge base of over a billion triples.
> > I am testing a query that returns about 2 million results (in
> > the future I will have many queries that return a lot of data).
> > On the client side I have to allow the results to be downloaded
> > in CSV format (on asynchronous request, not through batch).
>
> How long does it take?
>
> > But, with these volumes of data, we can have 2 types of errors:
> > - OutOfMemory on the Result (I can increase the heap size....)
>
> How are you making the query? (what software?)
>
> Fuseki will stream results back and, with the Jena client code, can
> provide an end-to-end streaming solution.
>
> The fastest results format is the binary Thrift encoding.
>
> RDFConnectionFuseki will use this.
>
> Some queries don't stream.
>
> > - Connection timeout on Fuseki (can I increase the timeout in the
> > configuration?)
>
> What is causing the timeout? Some intermediary?
>
> Fuseki by default does not have timeouts. Your configuration may set
> them but the default is unbounded.
>
> If you have set timeouts, you can create another service to the same
> database with different settings. It shares the TDB database safely.
>
> > For this reason I was thinking of using the tdbquery command (the
> > query takes 3 minutes to run with tdbquery). But I can't stop
> > Fuseki to perform the download operation. Fuseki must remain
> > active at all times to answer all other queries.
>
> You can't use tdbquery this way.
>
> It should cause an error saying "already in use" or some such message.
> There is locking on the file system to detect dual use.
>
> With virtualized setups it may be possible not to get the error because
> file systems are weird, but all that has happened is that the locking
> is not seeing the duplicate use; it does not mean that dual use is
> possible.
>
> You will corrupt the database.
>
> Corrupt = permanently damage, not recoverable.
>
>      Andy

Re: Use command tdbquery

Posted by Andy Seaborne <an...@apache.org>.

On 03/01/2022 17:44, robert.barry@tiscali.it wrote:
> Hi,
>
> you are right, I was not clear in the request. I will try to
> explain myself better.
> I have a knowledge base of over a billion triples.
> I am testing a query that returns about 2 million results (in
> the future I will have many queries that return a lot of data).
> On the client side I have to allow the results to be downloaded in
> CSV format (on asynchronous request, not through batch).

How long does it take?

> But, with these volumes of data, we can have 2 types of errors:
> - OutOfMemory on the Result (I can increase the heap size....)

How are you making the query? (what software?)

Fuseki will stream results back and, with the Jena client code, can
provide an end-to-end streaming solution.

The fastest results format is the binary Thrift encoding.

RDFConnectionFuseki will use this.

Some queries don't stream.
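
A minimal sketch of consuming a streamed result over HTTP without holding it in memory, written in Python with only the standard library. It is not the Jena client code or the Thrift encoding mentioned above; it asks for CSV because that is the format the download needs. The endpoint http://localhost:3030/ds/query is an assumption based on the dataset name in the original post, and the names build_request, stream_csv and the opener parameter are made up for illustration.

```python
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL Protocol POST request that asks for CSV results."""
    # The query is sent as a URL-encoded form field named "query".
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def stream_csv(endpoint, query, out_path,
               opener=urllib.request.urlopen, chunk_size=1 << 16):
    """Copy the response body to out_path chunk by chunk.

    Only one chunk is held in memory at a time, so the client-side
    memory use does not grow with the number of result rows.
    Returns the number of bytes written.
    """
    request = build_request(endpoint, query)
    written = 0
    with opener(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Calling stream_csv("http://localhost:3030/ds/query", "SELECT ...", "results.csv") against a running server would write the rows to results.csv as they arrive. Whether the server can start sending rows early still depends on the query, since some queries don't stream.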

> - Connection timeout on Fuseki (can I increase the timeout in the
> configuration?)

What is causing the timeout? Some intermediary?

Fuseki by default does not have timeouts. Your configuration may set 
them but the default is unbounded.

If you have set timeouts, you can create another service to the same 
database with different settings. It shares the TDB database safely.

> For this reason I was thinking of using the tdbquery command (the
> query takes 3 minutes to run with tdbquery). But I can't stop Fuseki
> to perform the download operation. Fuseki must remain active at all
> times to answer all other queries.

You can't use tdbquery this way.

It should cause an error saying "already in use" or some such message.
There is locking on the file system to detect dual use.

With virtualized setups it may be possible not to get the error because
file systems are weird, but all that has happened is that the locking
is not seeing the duplicate use; it does not mean that dual use is
possible.

You will corrupt the database.

Corrupt = permanently damage, not recoverable.

     Andy


RE: Use command tdbquery

Posted by ro...@tiscali.it.
  

Hi,

you are right, I was not clear in the request. I will try to
explain myself better.
I have a knowledge base of over a billion triples.
I am testing a query that returns about 2 million results (in
the future I will have many queries that return a lot of data).
On the client side I have to allow the results to be downloaded in CSV
format (on asynchronous request, not through batch).
But, with these volumes of data, we can have 2 types of errors:
- OutOfMemory on the Result (I can increase the heap size....)
- Connection timeout on Fuseki (can I increase the timeout in the
  configuration?)

For this reason I was thinking of using the tdbquery command (the query
takes 3 minutes to run with tdbquery). But I can't stop Fuseki to
perform the download operation. Fuseki must remain active at all times
to answer all other queries.

On 03.01.2022 17:25, Rinor Sefa wrote:

> I think if you describe your use case in more detail, it would be
> easier to get help.
>
> For example, can you clarify
> - a query? What kind of query
> - "many results", any number?
> - What do you consider slow and inefficient, and what would you
>   consider ideal?
>
> Also, why do you think that the HTTP call is the bottleneck? I think
> that this is a wrong assumption. Try to run a simple query and you
> will see that the HTTP call is not the bottleneck.


RE: Use command tdbquery

Posted by Rinor Sefa <ri...@uzh.ch>.
I think if you describe your use case in more detail, it would be easier to get help. 

For example, can you clarify
- a query? What kind of query
- "many results", any number?
- What do you consider slow and inefficient, and what would you consider ideal?

Also, why do you think that the HTTP call is the bottleneck? I think that this is a wrong assumption. Try to run a simple query and you will see that the HTTP call is not the bottleneck.
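
One way to check this is to time the HTTP request itself. The following is a rough sketch in Python (standard library only) that records the time until the first bytes arrive and the total time to read the whole response. The endpoint is assumed to be http://localhost:3030/ds/query with the query passed as a "query" URL parameter; time_query, opener and clock are illustrative names, not part of Fuseki or Jena.

```python
import time
import urllib.parse
import urllib.request


def time_query(endpoint, query, opener=urllib.request.urlopen,
               clock=time.perf_counter, chunk_size=1 << 16):
    """Run a query over HTTP GET and report simple timings.

    Returns a dict with the seconds until the first chunk was read,
    the seconds until the response was fully read, and the byte count.
    The body is discarded, so memory use stays flat.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "text/csv"})
    start = clock()
    first_byte = None
    size = 0
    with opener(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            if first_byte is None:
                # Time from sending the request to the first data read.
                first_byte = clock() - start
            size += len(chunk)
    total = clock() - start
    return {"first_byte_s": first_byte, "total_s": total, "bytes": size}
```

If a small query (for example one with a LIMIT) comes back quickly while the large one does not, the time is probably going into evaluating the query or into handling the results on the client rather than into the HTTP call.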

-----Original Message-----
From: robert.barry@tiscali.it <ro...@tiscali.it> 
Sent: Monday, 3 January 2022 12:59
To: users@jena.apache.org
Subject: Use command tdbquery

  Hi,

I am using a Fuseki server and need to run a query that returns a lot of results. Using the HTTP call (http://localhost:3030/ds/query=myQuery) is very slow and inefficient. I thought about using the tdbquery command, but I don't want to stop Fuseki. Is there any way to do this?