You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2013/01/16 02:02:06 UTC
Now function a performance bottleneck
Hi Andy,
I ran a profiler through a typical scenario with our software and
discovered that 10% of the clocked time was spent in the initialization
of the xsd:dateTime literal needed by the afn:now() and NOW function.
Below is a snapshot from Yourkit profiler.
Unless this is a measurement error in the tool, I believe there would be
a low-hanging fruit to optimize performance if the value would be
computed on demand only. I do understand that you need to get the time
stamp for the duration of the whole query, but I believe it would be
much faster to simply get the current time millis as a long, and from
this derive the actual XSD literal only if someone really calls this
SPARQL method.
Thanks
Holger
Re: Now function a performance bottleneck
Posted by Andy Seaborne <an...@apache.org>.
Holger - could you put a patch for this on JIRA please?
Andy
On 28/01/13 06:10, Holger Knublauch wrote:
> Hi Andy,
>
> I have meanwhile collected more conclusive evidence of the difference
> that a change to the now() function would make. I ran typical test
> scenarios in our EVN tool, which creates thousands of SPARQL queries to
> build up the UI. Average results over three runs are:
>
> - Current code: 9 seconds
> - Code with a constant xsd:dateTime: 8.5 seconds.
>
> As I said this code creates thousands of QueryExecutions, so it
> certainly is a bit extreme - yet I guess any SPARQL end point based on
> Jena would have the same performance "bottleneck" if used extensively.
>
> HTH
> Holger
>
>
> On 1/17/2013 9:24, Andy Seaborne wrote:
>> On 16/01/13 22:48, Holger Knublauch wrote:
>>>
>>> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>>>
>>>> On 16/01/13 01:02, Holger Knublauch wrote:
>>>>> Hi Andy,
>>>>>
>>>>> I ran a profiler through a typical scenario with our software and
>>>>> discovered that 10% of the clocked time was spent in the
>>>>> initialization
>>>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>>>> Below is a snapshot from Yourkit profiler.
>>>>>
>>>>>
>>>>>
>>>>> Unless this is a measurement error in the tool, I believe there
>>>>> would be
>>>>> a low-hanging fruit to optimize performance if the value would be
>>>>> computed on demand only. I do understand that you need to get the time
>>>>> stamp for the duration of the whole query, but I believe it would be
>>>>> much faster to simply get the current time millis as a long, and from
>>>>> this derive the actual XSD literal only if someone really calls this
>>>>> SPARQL method.
>>>>>
>>>>> Thanks
>>>>> Holger
>>>>>
>>>>
>>>> Holger,
>>>>
>>>> This isn't clear to me - I can not see the code context for the call
>>>> to NodeFactory.nodeAsDateTime.
>>>>
>>>> what is the SPARQL query being executed?
>>>
>>> Any QueryExecution. The relevant call tree starts in the class Context:
>>>
>>> context.set(ARQConstants.sysCurrentTime,
>>> NodeFactory.nowAsDateTime()) ;
>>>
>>> My suggestion is to use a different constant such as
>>> sysCurrentTimeMillis for a simple long and compute the sysCurrentTime
>>> on demand only, on the first call of the now function.
>>
>> Holger,
>>
>> What I need is a comparison - have you run with such a change? Your
>> suggestion looks sensible IF it addresses the root cause.
>>
>> What is the profile with a dummy value for
>> context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>>
>> because the report seems to show at least 889ms (0.9 seconds) per query.
>> My experience is that each query does not take almost 1 second
>> overhead in a warmed up JVM!
>>
>> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
>>
>> Have you run with such a change?
>> What is the profile with a dummy value for
>> context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>>
>> otherwise isn't it better to look at Integer.toString because that is
>> used elsewhere?
>>
>> (note - locale processing in Java is sometimes expensive and delayed
>> and it is needed for string conversions)
>>
>> And which version of Jena/ARQ is this?
>>
>> Andy
>>
>>>
>>> HTH
>>> Holger
>>>
>>
>>
>> Is this the first query to run in the JVM?
>> Are you running a single query in a fresh JVM?
>> Is there a nested select?
>>
>>
>>
>> Andy
>>
>
Re: Now function a performance bottleneck
Posted by Holger Knublauch <ho...@knublauch.com>.
Hi Andy,
I have meanwhile collected more conclusive evidence of the difference
that a change to the now() function would make. I ran typical test
scenarios in our EVN tool, which creates thousands of SPARQL queries to
build up the UI. Average results over three runs are:
- Current code: 9 seconds
- Code with a constant xsd:dateTime: 8.5 seconds.
As I said this code creates thousands of QueryExecutions, so it
certainly is a bit extreme - yet I guess any SPARQL end point based on
Jena would have the same performance "bottleneck" if used extensively.
HTH
Holger
On 1/17/2013 9:24, Andy Seaborne wrote:
> On 16/01/13 22:48, Holger Knublauch wrote:
>>
>> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>>
>>> On 16/01/13 01:02, Holger Knublauch wrote:
>>>> Hi Andy,
>>>>
>>>> I ran a profiler through a typical scenario with our software and
>>>> discovered that 10% of the clocked time was spent in the
>>>> initialization
>>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>>> Below is a snapshot from Yourkit profiler.
>>>>
>>>>
>>>>
>>>> Unless this is a measurement error in the tool, I believe there
>>>> would be
>>>> a low-hanging fruit to optimize performance if the value would be
>>>> computed on demand only. I do understand that you need to get the time
>>>> stamp for the duration of the whole query, but I believe it would be
>>>> much faster to simply get the current time millis as a long, and from
>>>> this derive the actual XSD literal only if someone really calls this
>>>> SPARQL method.
>>>>
>>>> Thanks
>>>> Holger
>>>>
>>>
>>> Holger,
>>>
>>> This isn't clear to me - I can not see the code context for the call
>>> to NodeFactory.nodeAsDateTime.
>>>
>>> what is the SPARQL query being executed?
>>
>> Any QueryExecution. The relevant call tree starts in the class Context:
>>
>> context.set(ARQConstants.sysCurrentTime,
>> NodeFactory.nowAsDateTime()) ;
>>
>> My suggestion is to use a different constant such as
>> sysCurrentTimeMillis for a simple long and compute the sysCurrentTime
>> on demand only, on the first call of the now function.
>
> Holger,
>
> What I need is a comparison - have you run with such a change? Your
> suggestion looks sensible IF it addresses the root cause.
>
> What is the profile with a dummy value for
> context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> because the report seems to show at least 889ms (0.9 seconds) per query.
> My experience is that each query does not take almost 1 second
> overhead in a warmed up JVM!
>
> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
>
> Have you run with such a change?
> What is the profile with a dummy value for
> context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> otherwise isn't it better to look at Integer.toString because that is
> used elsewhere?
>
> (note - locale processing in Java is sometimes expensive and delayed
> and it is needed for string conversions)
>
> And which version of Jena/ARQ is this?
>
> Andy
>
>>
>> HTH
>> Holger
>>
>
>
> Is this the first query to run in the JVM?
> Are you running a single query in a fresh JVM?
> Is there a nested select?
>
>
>
> Andy
>
Re: Now function a performance bottleneck
Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 23:24, Andy Seaborne wrote:
> The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
There is fast Integer to StringBuilder code in NumberUtils - it's used
by TDB when reconstructing nodes from encoded binary values.
Andy
Re: Now function a performance bottleneck
Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 22:48, Holger Knublauch wrote:
>
> On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
>
>> On 16/01/13 01:02, Holger Knublauch wrote:
>>> Hi Andy,
>>>
>>> I ran a profiler through a typical scenario with our software and
>>> discovered that 10% of the clocked time was spent in the initialization
>>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>>> Below is a snapshot from Yourkit profiler.
>>>
>>>
>>>
>>> Unless this is a measurement error in the tool, I believe there would be
>>> a low-hanging fruit to optimize performance if the value would be
>>> computed on demand only. I do understand that you need to get the time
>>> stamp for the duration of the whole query, but I believe it would be
>>> much faster to simply get the current time millis as a long, and from
>>> this derive the actual XSD literal only if someone really calls this
>>> SPARQL method.
>>>
>>> Thanks
>>> Holger
>>>
>>
>> Holger,
>>
>> This isn't clear to me - I can not see the code context for the call to NodeFactory.nodeAsDateTime.
>>
>> what is the SPARQL query being executed?
>
> Any QueryExecution. The relevant call tree starts in the class Context:
>
> context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
>
> My suggestion is to use a different constant such as sysCurrentTimeMillis for a simple long and compute the sysCurrentTime on demand only, on the first call of the now function.
Holger,
What I need is a comparison - have you run with such a change? Your
suggestion looks sensible IF it addresses the root cause.
What is the profile with a dummy value for
context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
because the report seems to show at least 889ms (0.9 seconds) per query.
My experience is that each query does not take almost 1 second overhead
in a warmed up JVM!
The profile shows 140ms in Integer.toString and 15ms in calcTimezone.
Have you run with such a change?
What is the profile with a dummy value for
context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
otherwise isn't it better to look at Integer.toString because that is
used elsewhere?
(note - locale processing in Java is sometimes expensive and delayed and
it is needed for string conversions)
And which version of Jena/ARQ is this?
Andy
>
> HTH
> Holger
>
Is this the first query to run in the JVM?
Are you running a single query in a fresh JVM?
Is there a nested select?
Andy
Re: Now function a performance bottleneck
Posted by Holger Knublauch <ho...@knublauch.com>.
On Jan 17, 2013, at 12:52 AM, Andy Seaborne wrote:
> On 16/01/13 01:02, Holger Knublauch wrote:
>> Hi Andy,
>>
>> I ran a profiler through a typical scenario with our software and
>> discovered that 10% of the clocked time was spent in the initialization
>> of the xsd:dateTime literal needed by the afn:now() and NOW function.
>> Below is a snapshot from Yourkit profiler.
>>
>>
>>
>> Unless this is a measurement error in the tool, I believe there would be
>> a low-hanging fruit to optimize performance if the value would be
>> computed on demand only. I do understand that you need to get the time
>> stamp for the duration of the whole query, but I believe it would be
>> much faster to simply get the current time millis as a long, and from
>> this derive the actual XSD literal only if someone really calls this
>> SPARQL method.
>>
>> Thanks
>> Holger
>>
>
> Holger,
>
> This isn't clear to me - I can not see the code context for the call to NodeFactory.nodeAsDateTime.
>
> what is the SPARQL query being executed?
Any QueryExecution. The relevant call tree starts in the class Context:
context.set(ARQConstants.sysCurrentTime, NodeFactory.nowAsDateTime()) ;
My suggestion is to use a different constant such as sysCurrentTimeMillis for a simple long and compute the sysCurrentTime on demand only, on the first call of the now function.
HTH
Holger
Re: Now function a performance bottleneck
Posted by Andy Seaborne <an...@apache.org>.
On 16/01/13 01:02, Holger Knublauch wrote:
> Hi Andy,
>
> I ran a profiler through a typical scenario with our software and
> discovered that 10% of the clocked time was spent in the initialization
> of the xsd:dateTime literal needed by the afn:now() and NOW function.
> Below is a snapshot from Yourkit profiler.
>
>
>
> Unless this is a measurement error in the tool, I believe there would be
> a low-hanging fruit to optimize performance if the value would be
> computed on demand only. I do understand that you need to get the time
> stamp for the duration of the whole query, but I believe it would be
> much faster to simply get the current time millis as a long, and from
> this derive the actual XSD literal only if someone really calls this
> SPARQL method.
>
> Thanks
> Holger
>
Holger,
This isn't clear to me - I can not see the code context for the call to
NodeFactory.nodeAsDateTime.
what is the SPARQL query being executed?
Andy