You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2013/01/16 02:02:06 UTC

Now function a performance bottleneck

Hi Andy,

I ran a profiler through a typical scenario with our software and 
discovered that 10% of the clocked time was spent in the initialization 
of the xsd:dateTime literal needed by the afn:now() and NOW function. 
Below is a snapshot from Yourkit profiler.



Unless this is a measurement error in the tool, I believe there would be 
a low-hanging fruit to optimize performance if the value would be 
computed on demand only. I do understand that you need to get the time 
stamp for the duration of the whole query, but I believe it would be 
much faster to simply get the current time millis as a long, and from 
this derive the actual XSD literal only if someone really calls this 
SPARQL method.

Thanks
Holger


Re: Now function a performance bottleneck

Posted by Andy Seaborne <an...@apache.org>.
Holger - could you put a patch for this on JIRA please?

	Andy


On 28/01/13 06:10, Holger Knublauch wrote:
> Hi Andy,
>
> I have meanwhile collected more conclusive evidence of the difference
> that a change to the now() function would make. I ran typical test
> scenarios in our EVN tool, which creates thousands of SPARQL queries to
> build up the UI. Average results over three runs are:
>
> - Current code: 9 seconds
> - Code with a constant xsd:dateTime: 8.5 seconds.
>
> As I said this code creates thousands of QueryExecutions, so it
> certainly is a bit extreme - yet I guess any SPARQL end point based on
> Jena would have the same performance "bottleneck" if used extensively.
>
> HTH
> Holger
>
>
> On 1/17/2013 9:24, Andy Seaborne wrote:
>> On 16/01/13 22:48, Holger Knublauch wrote:
>>>
>>> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>>>
>>>> On 16/01/13 01:02, Holger Knublauch wrote:
>>>>> Hi Andy,
>>>>>
>>>>> I ran a profiler through a typical scenario with our software and
>>>>> discovered that 10% of the clocked time was spent in the
>>>>> initialization
>>>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>>>> Below is a snapshot from Yourkit profiler.
>>>>>
>>>>>
>>>>>
>>>>> Unless this is a measurement error in the tool, I believe there
>>>>> would be
>>>>> a low-hanging fruit to optimize performance if the value would be
>>>>> computed on demand only. I do understand that you need to get the time
>>>>> stamp for the duration of the whole query, but I believe it would be
>>>>> much faster to simply get the current time millis as a long, and from
>>>>> this derive the actual XSD literal only if someone really calls this
>>>>> SPARQL method.
>>>>>
>>>>> Thanks
>>>>> Holger
>>>>>
>>>>
>>>> Holger,
>>>>
>>>> This isn't clear to me - I can not see the code context for the call
>>>> to NodeFactory.nodeAsDateTime.
>>>>
>>>> what is the SPARQL query being executed?
>>>
>>> Any QueryExecution. The relevant call tree starts in the class Context:
>>>
>>>      context.set(ARQConstants.sysCurrentTime,
>>> NodeFactory.nowAsDateTime()) ;
>>>
>>> My suggestion is to use a different constant such as
>>> sysCurrentTimeMillis for a simple long and compute the sysCurrentTime
>>> on demand only, on the first call of the now function.
>>
>> Holger,
>>
>> What I need is a comparison - have you run with such a change? Your
>> suggestion looks sensible IF it addresses the root cause.
>>
>> What is the profile with a dummy value for
>>   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>>
>> because the report seems to show at least 889ms (0.9 seconds) per query.
>> My experience is that each query does not take almost 1 second
>> overhead in a warmed up JVM!
>>
>> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
>>
>> Have you run with such a change?
>> What is the profile with a dummy value for
>>   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>>
>> otherwise isn't it better to look at Integer.toString because that is
>> used elsewhere?
>>
>> (note - locale processing in Java is sometimes expensive and delayed
>> and it is needed for string conversions)
>>
>> And which version of Jena/ARQ is this?
>>
>>     Andy
>>
>>>
>>> HTH
>>> Holger
>>>
>>
>>
>> Is this the first query to run in the JVM?
>> Are you running a single query in a fresh JVM?
>> Is there a nested select?
>>
>>
>>
>>     Andy
>>
>


Re: Now function a performance bottleneck

Posted by Holger Knublauch <ho...@knublauch.com>.
Hi Andy,

I have meanwhile collected more conclusive evidence of the difference 
that a change to the now() function would make. I ran typical test 
scenarios in our EVN tool, which creates thousands of SPARQL queries to 
build up the UI. Average results over three runs are:

- Current code: 9 seconds
- Code with a constant xsd:dateTime: 8.5 seconds.

As I said this code creates thousands of QueryExecutions, so it 
certainly is a bit extreme - yet I guess any SPARQL end point based on 
Jena would have the same performance "bottleneck" if used extensively.

HTH
Holger


On 1/17/2013 9:24, Andy Seaborne wrote:
> On 16/01/13 22:48, Holger Knublauch wrote:
>>
>> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>>
>>> On 16/01/13 01:02, Holger Knublauch wrote:
>>>> Hi Andy,
>>>>
>>>> I ran a profiler through a typical scenario with our software and
>>>> discovered that 10% of the clocked time was spent in the 
>>>> initialization
>>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>>> Below is a snapshot from Yourkit profiler.
>>>>
>>>>
>>>>
>>>> Unless this is a measurement error in the tool, I believe there 
>>>> would be
>>>> a low-hanging fruit to optimize performance if the value would be
>>>> computed on demand only. I do understand that you need to get the time
>>>> stamp for the duration of the whole query, but I believe it would be
>>>> much faster to simply get the current time millis as a long, and from
>>>> this derive the actual XSD literal only if someone really calls this
>>>> SPARQL method.
>>>>
>>>> Thanks
>>>> Holger
>>>>
>>>
>>> Holger,
>>>
>>> This isn't clear to me - I can not see the code context for the call 
>>> to NodeFactory.nodeAsDateTime.
>>>
>>> what is the SPARQL query being executed?
>>
>> Any QueryExecution. The relevant call tree starts in the class Context:
>>
>>      context.set(ARQConstants.sysCurrentTime, 
>> NodeFactory.nowAsDateTime()) ;
>>
>> My suggestion is to use a different constant such as 
>> sysCurrentTimeMillis for a simple long and compute the sysCurrentTime 
>> on demand only, on the first call of the now function.
>
> Holger,
>
> What I need is a comparison - have you run with such a change? Your 
> suggestion looks sensible IF it addresses the root cause.
>
> What is the profile with a dummy value for
>   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> because the report seems to show at least 889ms (0.9 seconds) per query.
> My experience is that each query does not take almost 1 second 
> overhead in a warmed up JVM!
>
> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
>
> Have you run with such a change?
> What is the profile with a dummy value for
>   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> otherwise isn't it better to look at Integer.toString because that is 
> used elsewhere?
>
> (note - locale processing in Java is sometimes expensive and delayed 
> and it is needed for string conversions)
>
> And which version of Jena/ARQ is this?
>
>     Andy
>
>>
>> HTH
>> Holger
>>
>
>
> Is this the first query to run in the JVM?
> Are you running a single query in a fresh JVM?
> Is there a nested select?
>
>
>
>     Andy
>


Re: Now function a performance bottleneck

Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 23:24, Andy Seaborne wrote:
> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.

There is fast Integer to StringBuilder code in NumberUtils - it's used 
by TDB when reconstructing nodes from encoded binary values.

	Andy


Re: Now function a performance bottleneck

Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 22:48, Holger Knublauch wrote:
>
> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>
>> On 16/01/13 01:02, Holger Knublauch wrote:
>>> Hi Andy,
>>>
>>> I ran a profiler through a typical scenario with our software and
>>> discovered that 10% of the clocked time was spent in the initialization
>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>> Below is a snapshot from Yourkit profiler.
>>>
>>>
>>>
>>> Unless this is a measurement error in the tool, I believe there would be
>>> a low-hanging fruit to optimize performance if the value would be
>>> computed on demand only. I do understand that you need to get the time
>>> stamp for the duration of the whole query, but I believe it would be
>>> much faster to simply get the current time millis as a long, and from
>>> this derive the actual XSD literal only if someone really calls this
>>> SPARQL method.
>>>
>>> Thanks
>>> Holger
>>>
>>
>> Holger,
>>
>> This isn't clear to me - I can not see the code context for the call to NodeFactory.nodeAsDateTime.
>>
>> what is the SPARQL query being executed?
>
> Any QueryExecution. The relevant call tree starts in the class Context:
>
>      context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> My suggestion is to use a different constant such as sysCurrentTimeMillis for a simple long and compute the sysCurrentTime on demand only, on the first call of the now function.

Holger,

What I need is a comparison - have you run with such a change? Your 
suggestion looks sensible IF it addresses the root cause.

What is the profile with a dummy value for
   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;

because the report seems to show at least 889ms (0.9 seconds) per query.
My experience is that each query does not take almost 1 second overhead 
in a warmed up JVM!

The profile shows 140ms in Integer.toString and 15ms in calcTimezone.

Have you run with such a change?
What is the profile with a dummy value for
   context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;

otherwise isn't it better to look at Integer.toString because that is 
used elsewhere?

(note - locale processing in Java is sometimes expensive and delayed and 
it is needed for string conversions)

And which version of Jena/ARQ is this?

	Andy

>
> HTH
> Holger
>


Is this the first query to run in the JVM?
Are you running a single query in a fresh JVM?
Is there a nested select?



	Andy


Re: Now function a performance bottleneck

Posted by Holger Knublauch <ho...@knublauch.com>.
On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:

> On 16/01/13 01:02, Holger Knublauch wrote:
>> Hi Andy,
>> 
>> I ran a profiler through a typical scenario with our software and
>> discovered that 10% of the clocked time was spent in the initialization
>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>> Below is a snapshot from Yourkit profiler.
>> 
>> 
>> 
>> Unless this is a measurement error in the tool, I believe there would be
>> a low-hanging fruit to optimize performance if the value would be
>> computed on demand only. I do understand that you need to get the time
>> stamp for the duration of the whole query, but I believe it would be
>> much faster to simply get the current time millis as a long, and from
>> this derive the actual XSD literal only if someone really calls this
>> SPARQL method.
>> 
>> Thanks
>> Holger
>> 
> 
> Holger,
> 
> This isn't clear to me - I can not see the code context for the call to NodeFactory.nodeAsDateTime.
> 
> what is the SPARQL query being executed?

Any QueryExecution. The relevant call tree starts in the class Context:
        
    context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;

My suggestion is to use a different constant such as sysCurrentTimeMillis for a simple long and compute the sysCurrentTime on demand only, on the first call of the now function.

HTH
Holger


Re: Now function a performance bottleneck

Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 01:02, Holger Knublauch wrote:
> Hi Andy,
>
> I ran a profiler through a typical scenario with our software and
> discovered that 10% of the clocked time was spent in the initialization
> of the xsd:dateTime literal needed by the afn:now() and NOW function.
> Below is a snapshot from Yourkit profiler.
>
>
>
> Unless this is a measurement error in the tool, I believe there would be
> a low-hanging fruit to optimize performance if the value would be
> computed on demand only. I do understand that you need to get the time
> stamp for the duration of the whole query, but I believe it would be
> much faster to simply get the current time millis as a long, and from
> this derive the actual XSD literal only if someone really calls this
> SPARQL method.
>
> Thanks
> Holger
>

Holger,

This isn't clear to me - I can not see the code context for the call to 
NodeFactory.nodeAsDateTime.

what is the SPARQL query being executed?

	Andy