
Posted to users@jena.apache.org by George News <ge...@gmx.net> on 2017/11/15 11:02:48 UTC

List named models is taking a lot

Hi,

I would like to know if there is any way of speeding up
Dataset.listNames(). Currently, running it on a TDB dataset with 200
graphs takes about 10 seconds.

That seems like a lot, as I would expect some index to point to the graphs.

Could you please tell me how I can speed it up? I'm currently thinking
of loading all the graph names into memory when my service starts and
then using the names from memory.

Regards,
Jorge
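
The workaround described above, loading the graph names once at startup and serving them from memory, might look roughly like the sketch below. It is only an illustration: `GraphNameCache`, `GraphNameCacheDemo` and the `Supplier` loader are hypothetical names, and the loader stands in for a call such as `dataset.listNames()` made inside a read transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/**
 * Caches graph names so the slow listing runs once instead of per request.
 * The loader is an assumption: it stands in for the real listing call.
 */
final class GraphNameCache {
    private final Supplier<Iterator<String>> loader;
    private volatile List<String> names; // null until the first load

    GraphNameCache(Supplier<Iterator<String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached names, loading them on first use. */
    List<String> names() {
        List<String> current = names;
        if (current == null) {
            current = refresh();
        }
        return current;
    }

    /** Re-runs the slow listing; call this after graphs are added or removed. */
    synchronized List<String> refresh() {
        List<String> loaded = new ArrayList<>();
        Iterator<String> it = loader.get();
        while (it.hasNext()) {
            loaded.add(it.next());
        }
        names = Collections.unmodifiableList(loaded);
        return names;
    }
}

class GraphNameCacheDemo {
    public static void main(String[] args) {
        // Made-up graph names standing in for a real dataset listing.
        List<String> fake = List.of("urn:example:g1", "urn:example:g2");
        GraphNameCache cache = new GraphNameCache(fake::iterator);
        System.out.println(cache.names());
    }
}
```

The cache is only correct as long as `refresh()` is called whenever the set of named graphs changes, so this trades listing time for some bookkeeping on writes.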

Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

This is the code I use to measure:

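// Assumes: import org.apache.jena.system.Txn; "dataset" is an open TDB Dataset.
long startTime = System.nanoTime();
Txn.executeRead(dataset, () -> {
   // Walk the whole iterator so the full listing cost is measured.
   Iterator<String> list = dataset.listNames();
   while (list.hasNext()) {
      list.next();
   }
});
long endTime = System.nanoTime();

// Convert nanoseconds to milliseconds before logging.
log.info("List execution time: " + (endTime - startTime) / 1_000_000 + " ms");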


Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

On 2017-11-16 14:42, Rob Vesse wrote:
> TDB does not store a separate list of graphs as it is a quad store i.e. it stores raw quads.
> 
> Therefore, in order to list the available graphs it has to iterate over all the quads and track the unique values for the graph field.

I thought that internally you were also indexing information per graph,
much like a SQL database in which each graph is stored in its own
table.

> So what you are doing yourself is probably the best approach

I'm on it :(


Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

On 2017-11-16 15:38, ajs6f wrote:
>>> How much data? SSD vs
>>> disk? How much free RAM for file cache? What else is on the machine?
>>> Cold start or warm database?
>>
>> Machine data - Intel Xeon E312xx (Sandy Bridge) and 32 GB RAM
> 
> How much heap have you allocated to Jena's JVM? What else (as Andy asked) is going on on the machine? There is an important difference between heap and free (free is available to the OS for file caching, heap is not).

I have two different setups on similar machines:

- WildFly shared with other apps:

JAVA_OPTS="-Xms64m -Xmx8192m -XX:MetaspaceSize=96M
-XX:MaxMetaspaceSize=1024m -Djava.net.preferIPv4Stack=true"

- WildFly just for the semantic app

JAVA_OPTS="-Xms64m -Xmx512m -XX:MetaspaceSize=96M
-XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true"

The delay is quite similar on both.

Maybe my problem is in the setup configuration.



Re: List named models is taking a lot

Posted by ajs6f <aj...@apache.org>.

>> How much data? SSD vs
>> disk? How much free RAM for file cache? What else is on the machine?
>> Cold start or warm database?
> 
> Machine data - Intel Xeon E312xx (Sandy Bridge) and 32 GB RAM

How much heap have you allocated to Jena's JVM? What else (as Andy asked) is going on on the machine? There is an important difference between heap and free (free is available to the OS for file caching, heap is not).

ajs6f
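
One way to answer the heap question from inside the running service is to ask the JVM itself. The sketch below uses only standard `java.lang.Runtime` calls; `HeapReport` is a hypothetical class name, not something from Jena. It reports heap only, so the memory left to the OS for file caching still has to be checked with OS tools.

```java
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Upper bound on the Java heap (the effective -Xmx value), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently reserved by the JVM, in MiB. */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:       " + maxHeapMiB() + " MiB");
        System.out.println("Committed heap: " + committedHeapMiB() + " MiB");
    }
}
```

Whatever the machine has beyond the maximum heap (and the memory used by other processes) is what remains for the OS file cache.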


Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

On 2017-11-16 16:13, Andy Seaborne wrote:
> 
> 
> On 16/11/17 14:34, George News wrote:
>> Database is over 20 GB now and has 200 named graphs, so unable to post
>> it ;)
> 
> Triple counts?

A bit more than 30 million



Re: List named models is taking a lot

Posted by Andy Seaborne <an...@apache.org>.


On 16/11/17 14:34, George News wrote:
> Database is over 20 GB now and has 200 named graphs, so unable to post it ;)

Triple counts?


Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

On 2017-11-16 14:51, Andy Seaborne wrote:
> 
> 
> On 16/11/17 13:42, Rob Vesse wrote:
>> TDB does not store a separate list of graphs as it is a quad store
>> i.e. it stores raw quads.
>>
>> Therefore, in order to list the available graphs it has to iterate
>> over all the quads and track the unique values for the graph field.
>>
>> So what you are doing yourself is probably the best approach
> 
> I have recently been doing some related queries (COUNT) and don't see
> this issue but who knows? Details are missing.  

When I query a specific graph it works well and quickly. The problem
is that listing the graphs takes too long.

> How much data? SSD vs
> disk? How much free RAM for file cache? What else is on the machine?
> Cold start or warm database?

Machine data - Intel Xeon E312xx (Sandy Bridge) and 32 GB RAM

> An example database to try out would help.

Database is over 20 GB now and has 200 named graphs, so unable to post it ;)

>>
>> Rob
>>
>> On 16/11/2017, 12:03, "George News" <ge...@gmx.net> wrote:
>>
>>      Sorry for insisting on this matter,
> 
> That's not how this works.

I know that you answer whenever you have time, and I will never be able
to thank you enough, but to be polite I think it's better to apologize
for insisting, since this is like bumping a post, which I really
don't like.


Re: List named models is taking a lot

Posted by Andy Seaborne <an...@apache.org>.


On 16/11/17 13:42, Rob Vesse wrote:
> TDB does not store a separate list of graphs as it is a quad store i.e. it stores raw quads.
> 
> Therefore, in order to list the available graphs it has to iterate over all the quads and track the unique values for the graph field.
> 
> So what you are doing yourself is probably the best approach

I have recently been doing some related queries (COUNT) and don't see 
this issue but who knows? Details are missing.  How much data? SSD vs 
disk? How much free RAM for file cache? What else is on the machine? 
Cold start or warm database?

An example database to try out would help.

> 
> Rob
> 
> On 16/11/2017, 12:03, "George News" <ge...@gmx.net> wrote:
> 
>      Sorry for insisting on this matter,

That's not how this works.

	Andy


Re: List named models is taking a lot

Posted by Rob Vesse <rv...@dotnetrdf.org>.

TDB does not store a separate list of graphs as it is a quad store i.e. it stores raw quads.

Therefore, in order to list the available graphs it has to iterate over all the quads and track the unique values for the graph field.

So what you are doing yourself is probably the best approach

Rob
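
To make the cost of that approach concrete, here is a toy sketch of listing graphs by scanning quads and tracking the distinct graph values. It is not TDB's actual code: `ToyQuad` and `GraphScan` are hypothetical names, and the quads are plain strings held in memory rather than index entries on disk.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy quad (hypothetical type): graph name plus subject, predicate, object. */
final class ToyQuad {
    final String graph;
    final String subject;
    final String predicate;
    final String object;

    ToyQuad(String graph, String subject, String predicate, String object) {
        this.graph = graph;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class GraphScan {
    /** Visits every quad once and keeps each distinct graph name. */
    static Set<String> distinctGraphs(Iterable<ToyQuad> quads) {
        Set<String> graphs = new LinkedHashSet<>();
        for (ToyQuad quad : quads) {
            graphs.add(quad.graph); // duplicates are ignored by the set
        }
        return graphs;
    }

    public static void main(String[] args) {
        List<ToyQuad> quads = List.of(
            new ToyQuad("urn:example:g1", "s1", "p", "o1"),
            new ToyQuad("urn:example:g1", "s2", "p", "o2"),
            new ToyQuad("urn:example:g2", "s3", "p", "o3"));
        // Three quads are scanned to discover two graph names.
        System.out.println(distinctGraphs(quads));
    }
}
```

With this strategy the work grows with the number of quads, not with the number of graphs, which would explain why a listing can be slow even when there are only a couple of hundred graphs.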


Re: List named models is taking a lot

Posted by George News <ge...@gmx.net>.

Sorry for insisting on this matter, but I really need to remove the
delay and would love to understand how the system works, as otherwise I
have to rework a big part of my code :(
