You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Marc Strapetz <ma...@syntevo.com> on 2015/03/13 20:34:22 UTC

1.9.x JavaHL: long initial delay when performing a log

I'm experiencing a strange initial delay when performing a log using JavaHL.

svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x

shows first results after 2-3 seconds, while following code snippet 
takes at least 20 seconds (sometimes significantly more, might depend on 
the server's load):

ISVNRemote session =
   factory.openRemoteSession("http://svn.apache.org/repos/asf");

List<String> paths =
   Collections.singletonList("subversion/branches/1.8.x");

session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
                false, false, null, new LogMessageCallback() {
   public void singleMessage(Set<ChangePath> changedPaths,
     long revision, Map<String, byte[]> revprops, boolean hasChildren) {
     System.out.println("DATA");
   }
});

Once the log responds, a bunch of revisions are reported, so it seems 
that there is some kind of caching of log records.

I've tested with latest 1.9.x sources on Windows but have seen the same 
behavior with javahl-1.8-extensions branch on Linux, too.

-Marc

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Mark Phippard <ma...@gmail.com>.
> On Mar 15, 2015, at 5:10 PM, Branko Čibej <br...@wandisco.com> wrote:
> 
>> On 15.03.2015 19:30, Bert Huijben wrote:
>> 
>>> -----Original Message-----
>>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>>> Sent: zondag 15 maart 2015 12:02
>>> To: dev@subversion.apache.org
>>> Subject: Re: 1.9.x JavaHL: long initial delay when performing a log
>>> 
>>>> Same here on OSX. However, I can't any place in the code that would
>>>> cause the delay. I added similar time-printing code to the C++ part of
>>>> JavaHL and got extremely strange results:
>>>> 
>>>> TestStatus (Java): 2015-03-13 22:21:40.403
>>>> svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
>>>> callback: 2015-03-13T21:21:50.098592Z
>>>> invoke: 2015-03-13T21:21:50.098671Z
>>>> TestStatus (Java): 2015-03-13 22:21:50.098 1666354
>>>> return: 2015-03-13T21:21:50.099058Z
>>> I can confirm this delay in native code on Windows. I've tried to dig
>>> deeper into svn_ra_get_log2, however I'm lost at
>>> session->vtable->get_log ... is there some kind of "core loop" which
>>> processes incoming HTTP data, so we could place debug output there?
>>> 
>>>> Now I'm really beginning to wonder what the native JavaHL implementation
>>>> is doing differently from libsvn_client.
>>> Just a vague idea: could there be some kind of input caching in
>>> low-level HTTP libraries before information is sent to Subversion and
>>> converted to log entries? Maybe JavaHL would initialize this caching
>>> differently than command line or not at all?
>> Are you using the same libraries in both cases?
>> (I think TortoiseSVN still explicitly disables ipv6 support in apr)
>> 
>> One case where I sometimes get reports of slow operations is when the used hostname resolves to both an IPv4 and an IPv6 address. (Localhost is such a name on modern windows versions). Using 127.0.0.1 in your local testcode (or making sure the server process and network also work properly on ipv6) might help here. Declaring a specific host-/dns-name for Subversion, resolving to only the addresses really listening is usually the best approach... like it would be for a normal website.
> 
> I reproduced this with trunk, built locally, with JavaHL using exactly
> the same RA libs as the comand line client.
> 

Does JavaHL reverse the start/end revisions when calling the API? Sending a range of 0:HEAD to the server could explain the difference.  That said, I'd assume you would have spotted that by now. I cannot imagine what else it could be, but I have also not tried this new ISVNRemote API.

Mark

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Branko Čibej <br...@wandisco.com>.
On 15.03.2015 19:30, Bert Huijben wrote:
>
>> -----Original Message-----
>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>> Sent: zondag 15 maart 2015 12:02
>> To: dev@subversion.apache.org
>> Subject: Re: 1.9.x JavaHL: long initial delay when performing a log
>>
>>> Same here on OSX. However, I can't any place in the code that would
>>> cause the delay. I added similar time-printing code to the C++ part of
>>> JavaHL and got extremely strange results:
>>>
>>> TestStatus (Java): 2015-03-13 22:21:40.403
>>> svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
>>> callback: 2015-03-13T21:21:50.098592Z
>>> invoke: 2015-03-13T21:21:50.098671Z
>>> TestStatus (Java): 2015-03-13 22:21:50.098 1666354
>>> return: 2015-03-13T21:21:50.099058Z
>> I can confirm this delay in native code on Windows. I've tried to dig
>> deeper into svn_ra_get_log2, however I'm lost at
>> session->vtable->get_log ... is there some kind of "core loop" which
>> processes incoming HTTP data, so we could place debug output there?
>>
>>> Now I'm really beginning to wonder what the native JavaHL implementation
>>> is doing differently from libsvn_client.
>> Just a vague idea: could there be some kind of input caching in
>> low-level HTTP libraries before information is sent to Subversion and
>> converted to log entries? Maybe JavaHL would initialize this caching
>> differently than command line or not at all?
> Are you using the same libraries in both cases?
> (I think TortoiseSVN still explicitly disables ipv6 support in apr)
>
> One case where I sometimes get reports of slow operations is when the used hostname resolves to both an IPv4 and an IPv6 address. (Localhost is such a name on modern windows versions). Using 127.0.0.1 in your local testcode (or making sure the server process and network also work properly on ipv6) might help here. Declaring a specific host-/dns-name for Subversion, resolving to only the addresses really listening is usually the best approach... like it would be for a normal website.

I reproduced this with trunk, built locally, with JavaHL using exactly
the same RA libs as the comand line client.

-- Brane


RE: 1.9.x JavaHL: long initial delay when performing a log

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
> Sent: zondag 15 maart 2015 12:02
> To: dev@subversion.apache.org
> Subject: Re: 1.9.x JavaHL: long initial delay when performing a log
> 
> > Same here on OSX. However, I can't any place in the code that would
> > cause the delay. I added similar time-printing code to the C++ part of
> > JavaHL and got extremely strange results:
> >
> > TestStatus (Java): 2015-03-13 22:21:40.403
> > svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
> > callback: 2015-03-13T21:21:50.098592Z
> > invoke: 2015-03-13T21:21:50.098671Z
> > TestStatus (Java): 2015-03-13 22:21:50.098 1666354
> > return: 2015-03-13T21:21:50.099058Z
> 
> I can confirm this delay in native code on Windows. I've tried to dig
> deeper into svn_ra_get_log2, however I'm lost at
> session->vtable->get_log ... is there some kind of "core loop" which
> processes incoming HTTP data, so we could place debug output there?
> 
> > Now I'm really beginning to wonder what the native JavaHL implementation
> > is doing differently from libsvn_client.
> 
> Just a vague idea: could there be some kind of input caching in
> low-level HTTP libraries before information is sent to Subversion and
> converted to log entries? Maybe JavaHL would initialize this caching
> differently than command line or not at all?

Are you using the same libraries in both cases?
(I think TortoiseSVN still explicitly disables ipv6 support in apr)

One case where I sometimes get reports of slow operations is when the used hostname resolves to both an IPv4 and an IPv6 address. (Localhost is such a name on modern windows versions). Using 127.0.0.1 in your local testcode (or making sure the server process and network also work properly on ipv6) might help here. Declaring a specific host-/dns-name for Subversion, resolving to only the addresses really listening is usually the best approach... like it would be for a normal website.

	Bert
> 
> -Marc
> 
> 
> On 13.03.2015 22:28, Branko Čibej wrote:
> > [Since when are we top-posting? grr...]
> >
> > On 13.03.2015 21:17, Bert Huijben wrote:
> >> Are you requesting the results in the same order in both cases? (I
> >> don't know what the arguments in your code represent)
> >>
> >> If you retrieve oldest to youngest some delay is expected as then
> >> first all interesting revisions are fetched (youngest to oldest) and
> >> then results+detailed are spooled back the other way.
> >
> > It's a standard youngest-to-oldest log; parameters for getLogs are
> > (paths, start-revision, end-revision, limit, ...).
> >
> >> The normal svn invocation you compare to is the most efficient one...
> >>
> >> Bert
> >> ------------------------------------------------------------------------
> >> From: Marc Strapetz <ma...@syntevo.com>
> >> Sent: ‎13-‎3-‎2015 20:35
> >> To: dev@subversion.apache.org <ma...@subversion.apache.org>
> >> Subject: 1.9.x JavaHL: long initial delay when performing a log
> >>
> >> I'm experiencing a strange initial delay when performing a log using
> >> JavaHL.
> >>
> >> svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x
> >>
> >> shows first results after 2-3 seconds, while following code snippet
> >> takes at least 20 seconds (sometimes significantly more, might depend on
> >> the server's load):
> >>
> >> ISVNRemote session =
> >>     factory.openRemoteSession("http://svn.apache.org/repos/asf");
> >>
> >> List<String> paths =
> >>     Collections.singletonList("subversion/branches/1.8.x");
> >>
> >> session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
> >>                  false, false, null, new LogMessageCallback() {
> >>     public void singleMessage(Set<ChangePath> changedPaths,
> >>       long revision, Map<String, byte[]> revprops, boolean hasChildren) {
> >>       System.out.println("DATA");
> >>     }
> >> });
> >>
> >> Once the log responds, a bunch of revisions are reported, so it seems
> >> that there is some kind of caching of log records.
> >
> > I tried this with a slight change, setting the limit parameter of
> > getLog() to 1 instead of 0; here are the results:
> >
> > $ time svn log --limit 1
> http://svn.apache.org/repos/asf/subversion/branches/1.8.x
> > ...
> >
> > real	0m1.574s
> > user	0m0.007s
> > sys	0m0.006s
> >
> > $ time java -cp ... -Djava.library.path=... TestStatus
> > DATA
> >
> > real	0m1.430s
> > user	0m0.138s
> > sys	0m0.036s
> >
> >
> > So, no real difference here.
> >
> > Without the limit, it does take a bit more than 10 seconds to begin
> > displaying results. So I tested when the callback was actually invoked:
> > I added code to print the current time just before the call to getLog(),
> > and the current time and revision in the log receiver callback. The
> > output confirms that the delay is real and not, for example, an artefact
> > of some caching in stdout, for example:
> >
> > $ time java -cp ... -Djava.library.path=... TestStatus
> > 2015-03-13 21:59:46.731
> > 2015-03-13 21:59:57.223 1666354
> > 2015-03-13 21:59:57.223 1666269
> > ...
> > 2015-03-13 22:00:27.318 836421
> > 2015-03-13 22:00:27.318 836420
> >
> >
> >> I've tested with latest 1.9.x sources on Windows but have seen the same
> >> behavior with javahl-1.8-extensions branch on Linux, too.
> >
> > Same here on OSX. However, I can't any place in the code that would
> > cause the delay. I added similar time-printing code to the C++ part of
> > JavaHL and got extremely strange results:
> >
> > TestStatus (Java): 2015-03-13 22:21:40.403
> > svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
> > callback: 2015-03-13T21:21:50.098592Z
> > invoke: 2015-03-13T21:21:50.098671Z
> > TestStatus (Java): 2015-03-13 22:21:50.098 1666354
> > return: 2015-03-13T21:21:50.099058Z
> >
> > (note that there's an hour of difference between local time printed by
> > Java and UTC printed from the native code).
> >
> > This confirms that there is an actual delay of 10 seconds in the
> > *native* code between the call to svn_ra_get_log2() and the first
> > invocation of the (native) callback wrapper; each callback invocation
> > takes about half a millisecond.
> >
> > Now I'm really beginning to wonder what the native JavaHL implementation
> > is doing differently from libsvn_client.
> >
> > -- Brane
> >


Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Marc Strapetz <ma...@syntevo.com>.
> Same here on OSX. However, I can't any place in the code that would
> cause the delay. I added similar time-printing code to the C++ part of
> JavaHL and got extremely strange results:
>
> TestStatus (Java): 2015-03-13 22:21:40.403
> svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
> callback: 2015-03-13T21:21:50.098592Z
> invoke: 2015-03-13T21:21:50.098671Z
> TestStatus (Java): 2015-03-13 22:21:50.098 1666354
> return: 2015-03-13T21:21:50.099058Z

I can confirm this delay in native code on Windows. I've tried to dig 
deeper into svn_ra_get_log2, however I'm lost at 
session->vtable->get_log ... is there some kind of "core loop" which 
processes incoming HTTP data, so we could place debug output there?

> Now I'm really beginning to wonder what the native JavaHL implementation
> is doing differently from libsvn_client.

Just a vague idea: could there be some kind of input caching in 
low-level HTTP libraries before information is sent to Subversion and 
converted to log entries? Maybe JavaHL would initialize this caching 
differently than command line or not at all?

-Marc


On 13.03.2015 22:28, Branko Čibej wrote:
> [Since when are we top-posting? grr...]
>
> On 13.03.2015 21:17, Bert Huijben wrote:
>> Are you requesting the results in the same order in both cases? (I
>> don't know what the arguments in your code represent)
>>
>> If you retrieve oldest to youngest some delay is expected as then
>> first all interesting revisions are fetched (youngest to oldest) and
>> then results+detailed are spooled back the other way.
>
> It's a standard youngest-to-oldest log; parameters for getLogs are
> (paths, start-revision, end-revision, limit, ...).
>
>> The normal svn invocation you compare to is the most efficient one...
>>
>> Bert
>> ------------------------------------------------------------------------
>> From: Marc Strapetz <ma...@syntevo.com>
>> Sent: ‎13-‎3-‎2015 20:35
>> To: dev@subversion.apache.org <ma...@subversion.apache.org>
>> Subject: 1.9.x JavaHL: long initial delay when performing a log
>>
>> I'm experiencing a strange initial delay when performing a log using
>> JavaHL.
>>
>> svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x
>>
>> shows first results after 2-3 seconds, while following code snippet
>> takes at least 20 seconds (sometimes significantly more, might depend on
>> the server's load):
>>
>> ISVNRemote session =
>>     factory.openRemoteSession("http://svn.apache.org/repos/asf");
>>
>> List<String> paths =
>>     Collections.singletonList("subversion/branches/1.8.x");
>>
>> session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
>>                  false, false, null, new LogMessageCallback() {
>>     public void singleMessage(Set<ChangePath> changedPaths,
>>       long revision, Map<String, byte[]> revprops, boolean hasChildren) {
>>       System.out.println("DATA");
>>     }
>> });
>>
>> Once the log responds, a bunch of revisions are reported, so it seems
>> that there is some kind of caching of log records.
>
> I tried this with a slight change, setting the limit parameter of
> getLog() to 1 instead of 0; here are the results:
>
> $ time svn log --limit 1 http://svn.apache.org/repos/asf/subversion/branches/1.8.x
> ...
>
> real	0m1.574s
> user	0m0.007s
> sys	0m0.006s
>
> $ time java -cp ... -Djava.library.path=... TestStatus
> DATA
>
> real	0m1.430s
> user	0m0.138s
> sys	0m0.036s
>
>
> So, no real difference here.
>
> Without the limit, it does take a bit more than 10 seconds to begin
> displaying results. So I tested when the callback was actually invoked:
> I added code to print the current time just before the call to getLog(),
> and the current time and revision in the log receiver callback. The
> output confirms that the delay is real and not, for example, an artefact
> of some caching in stdout, for example:
>
> $ time java -cp ... -Djava.library.path=... TestStatus
> 2015-03-13 21:59:46.731
> 2015-03-13 21:59:57.223 1666354
> 2015-03-13 21:59:57.223 1666269
> ...
> 2015-03-13 22:00:27.318 836421
> 2015-03-13 22:00:27.318 836420
>
>
>> I've tested with latest 1.9.x sources on Windows but have seen the same
>> behavior with javahl-1.8-extensions branch on Linux, too.
>
> Same here on OSX. However, I can't any place in the code that would
> cause the delay. I added similar time-printing code to the C++ part of
> JavaHL and got extremely strange results:
>
> TestStatus (Java): 2015-03-13 22:21:40.403
> svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
> callback: 2015-03-13T21:21:50.098592Z
> invoke: 2015-03-13T21:21:50.098671Z
> TestStatus (Java): 2015-03-13 22:21:50.098 1666354
> return: 2015-03-13T21:21:50.099058Z
>
> (note that there's an hour of difference between local time printed by
> Java and UTC printed from the native code).
>
> This confirms that there is an actual delay of 10 seconds in the
> *native* code between the call to svn_ra_get_log2() and the first
> invocation of the (native) callback wrapper; each callback invocation
> takes about half a millisecond.
>
> Now I'm really beginning to wonder what the native JavaHL implementation
> is doing differently from libsvn_client.
>
> -- Brane
>

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Branko Čibej <br...@wandisco.com>.
[Since when are we top-posting? grr...]

On 13.03.2015 21:17, Bert Huijben wrote:
> Are you requesting the results in the same order in both cases? (I
> don't know what the arguments in your code represent)
>
> If you retrieve oldest to youngest some delay is expected as then
> first all interesting revisions are fetched (youngest to oldest) and
> then results+detailed are spooled back the other way.

It's a standard youngest-to-oldest log; parameters for getLogs are
(paths, start-revision, end-revision, limit, ...).

> The normal svn invocation you compare to is the most efficient one...
>
> Bert
> ------------------------------------------------------------------------
> From: Marc Strapetz <ma...@syntevo.com>
> Sent: ‎13-‎3-‎2015 20:35
> To: dev@subversion.apache.org <ma...@subversion.apache.org>
> Subject: 1.9.x JavaHL: long initial delay when performing a log
>
> I'm experiencing a strange initial delay when performing a log using
> JavaHL.
>
> svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x
>
> shows first results after 2-3 seconds, while following code snippet
> takes at least 20 seconds (sometimes significantly more, might depend on
> the server's load):
>
> ISVNRemote session =
>    factory.openRemoteSession("http://svn.apache.org/repos/asf");
>
> List<String> paths =
>    Collections.singletonList("subversion/branches/1.8.x");
>
> session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
>                 false, false, null, new LogMessageCallback() {
>    public void singleMessage(Set<ChangePath> changedPaths,
>      long revision, Map<String, byte[]> revprops, boolean hasChildren) {
>      System.out.println("DATA");
>    }
> });
>
> Once the log responds, a bunch of revisions are reported, so it seems
> that there is some kind of caching of log records.

I tried this with a slight change, setting the limit parameter of
getLog() to 1 instead of 0; here are the results:

$ time svn log --limit 1 http://svn.apache.org/repos/asf/subversion/branches/1.8.x
...

real	0m1.574s
user	0m0.007s
sys	0m0.006s

$ time java -cp ... -Djava.library.path=... TestStatus
DATA

real	0m1.430s
user	0m0.138s
sys	0m0.036s


So, no real difference here.

Without the limit, it does take a bit more than 10 seconds to begin
displaying results. So I tested when the callback was actually invoked:
I added code to print the current time just before the call to getLog(),
and the current time and revision in the log receiver callback. The
output confirms that the delay is real and not, for example, an artefact
of some caching in stdout, for example:

$ time java -cp ... -Djava.library.path=... TestStatus
2015-03-13 21:59:46.731
2015-03-13 21:59:57.223 1666354
2015-03-13 21:59:57.223 1666269
...
2015-03-13 22:00:27.318 836421
2015-03-13 22:00:27.318 836420


> I've tested with latest 1.9.x sources on Windows but have seen the same
> behavior with javahl-1.8-extensions branch on Linux, too.

Same here on OSX. However, I can't any place in the code that would
cause the delay. I added similar time-printing code to the C++ part of
JavaHL and got extremely strange results:

TestStatus (Java): 2015-03-13 22:21:40.403
svn_ra_get_log2: 2015-03-13T21:21:40.404731Z
callback: 2015-03-13T21:21:50.098592Z
invoke: 2015-03-13T21:21:50.098671Z
TestStatus (Java): 2015-03-13 22:21:50.098 1666354
return: 2015-03-13T21:21:50.099058Z

(note that there's an hour of difference between local time printed by
Java and UTC printed from the native code).

This confirms that there is an actual delay of 10 seconds in the
*native* code between the call to svn_ra_get_log2() and the first
invocation of the (native) callback wrapper; each callback invocation
takes about half a millisecond.

Now I'm really beginning to wonder what the native JavaHL implementation
is doing differently from libsvn_client.

-- Brane

RE: 1.9.x JavaHL: long initial delay when performing a log

Posted by Bert Huijben <be...@qqmail.nl>.
Are you requesting the results in the same order in both cases? (I don't know what the arguments in your code represent)

If you retrieve oldest to youngest some delay is expected as then first all interesting revisions are fetched (youngest to oldest) and then results+detailed are spooled back the other way.

The normal svn invocation you compare to is the most efficient one... 

Bert

-----Original Message-----
From: "Marc Strapetz" <ma...@syntevo.com>
Sent: ‎13-‎3-‎2015 20:35
To: "dev@subversion.apache.org" <de...@subversion.apache.org>
Subject: 1.9.x JavaHL: long initial delay when performing a log

I'm experiencing a strange initial delay when performing a log using JavaHL.

svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x

shows first results after 2-3 seconds, while following code snippet 
takes at least 20 seconds (sometimes significantly more, might depend on 
the server's load):

ISVNRemote session =
   factory.openRemoteSession("http://svn.apache.org/repos/asf");

List<String> paths =
   Collections.singletonList("subversion/branches/1.8.x");

session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
                false, false, null, new LogMessageCallback() {
   public void singleMessage(Set<ChangePath> changedPaths,
     long revision, Map<String, byte[]> revprops, boolean hasChildren) {
     System.out.println("DATA");
   }
});

Once the log responds, a bunch of revisions are reported, so it seems 
that there is some kind of caching of log records.

I've tested with latest 1.9.x sources on Windows but have seen the same 
behavior with javahl-1.8-extensions branch on Linux, too.

-Marc

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> To me this algorithm seems better.

Oops.

My argument for wanting something 'better' than the current trunk
implementation (which flushes after 4, 16, 64, 256 log entries) has
been blown out of the water. My argument depended on an assumption
that the rate of discovery of log entries could be very bursty: for
example, the first 2 entries are both recent and are discovered
quickly, then a long time before the server discovers the 3rd entry
because it is a million revs older. But Stefan has just told me that
is not the case: each log entry to be reported takes a broadly similar
time, related to the depth of the path and the amount of authz
checking, no matter how many million revisions away.

Given that news, there is no reason to want anything different from
the current implementation. Sorry for the noise.

- Julian

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> "Bert Huijben" <be...@qqmail.nl> writes:
>> BTW 500 msec is about two or three the time you expect for google page of
>> results.... Why this arbritrary number?
>
> How do you explain the 2 and 2048 in the current code?  They are all
> just arbitrary numbers.  Make it 100ms instead of 500ms.

Exactly. It's just a time that users will perceive as fairly short.
Pick any arbitrary number you like.

[...]
> I'm not claiming this is the best algorithm, I just want to know if it
> is the sort of thing we want.

To me this algorithm seems better.

- Julian

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Philip Martin <ph...@wandisco.com>.
"Bert Huijben" <be...@qqmail.nl> writes:

> Nice idea on the concept, but this patch doesn't implement that behavior
>
> You will see the first result very fast. (No behavior change)

I don't understand, I see a behaviour change when I try it.

The first few writes with my patch:

[pid  8814] writev(17, [{"HTTP/1.1 200 OK\r\nDate: Mon, 16 M"..., 189}, {"1f53\r\n", 6}, {"<?xml version=\"1.0\" encoding=\"ut"..., 7965}, {"<S:log-item>\n<D:version-name>166"..., 54}, {"\r\n", 2}], 5) = 8216
[pid  8814] writev(17, [{"1f55\r\n", 6}, {"<D:creator-displayname>rhuijben<"..., 7976}, {"<S:date>2015-02-14T00:17:25.9215"..., 45}, {"\r\n", 2}], 4) = 8029
[pid  8814] writev(17, [{"1f6d\r\n", 6}, {"</S:log-item>\n<S:log-item>\n<D:ve"..., 8000}, {"<S:date>2015-01-15T04:00:33.4982"..., 45}, {"\r\n", 2}], 4) = 8053
[pid  8814] writev(17, [{"1f4e\r\n", 6}, {"</S:log-item>\n<S:log-item>\n<D:ve"..., 7958}, {"<D:creator-displayname>rhuijben<"..., 56}, {"\r\n", 2}], 4) = 8022
[pid  8814] writev(17, [{"1f47\r\n", 6}, {"<S:date>2014-11-25T12:20:06.2266"..., 7953}, {"<S:log-item>\n<D:version-name>162"..., 54}, {"\r\n", 2}], 4) = 8015
[pid  8814] writev(17, [{"ba0\r\n", 5}, {"<D:creator-displayname>stsp</D:c"..., 2976}, {"\r\n", 2}], 3) = 2983
[pid  8814] writev(17, [{"1f52\r\n", 6}, {"<S:log-item>\n<D:version-name>161"..., 7973}, {"<S:date>2014-06-30T13:46:18.9128"..., 45}, {"\r\n", 2}], 4) = 8026

The current code without my patch:

[pid  9113] writev(17, [{"HTTP/1.1 200 OK\r\nDate: Mon, 16 M"..., 189}, {"fd\r\n", 4}, {"<?xml version=\"1.0\" encoding=\"ut"..., 253}, {"\r\n", 2}], 4) = 448
[pid  9113] writev(17, [{"a9\r\n", 4}, {"<S:log-item>\n<D:version-name>166"..., 169}, {"\r\n", 2}], 3) = 175
[pid  9113] writev(17, [{"14e\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 334}, {"\r\n", 2}], 3) = 341
[pid  9113] writev(17, [{"294\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 660}, {"\r\n", 2}], 3) = 667
[pid  9113] writev(17, [{"53a\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 1338}, {"\r\n", 2}], 3) = 1345
[pid  9113] writev(17, [{"a7c\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 2684}, {"\r\n", 2}], 3) = 2691
[pid  9113] writev(17, [{"1511\r\n", 6}, {"<S:log-item>\n<D:version-name>166"..., 5393}, {"\r\n", 2}], 3) = 5401
[pid  9113] writev(17, [{"1f46\r\n", 6}, {"<S:log-item>\n<D:version-name>166"..., 7951}, {"<D:creator-displayname>stefan2</"..., 55}, {"\r\n", 2}], 4) = 8014
[pid  9113] writev(17, [{"aba\r\n", 5}, {"<S:date>2015-02-13T10:00:18.5170"..., 2746}, {"\r\n", 2}], 3) = 2753
[pid  9113] writev(17, [{"1f61\r\n", 6}, {"<S:log-item>\n<D:version-name>165"..., 7988}, {"<S:date>2014-12-06T02:05:01.7811"..., 45}, {"\r\n", 2}], 4) = 8041
[pid  9113] writev(17, [{"1f60\r\n", 6}, {"</S:log-item>\n<S:log-item>\n<D:ve"..., 7976}, {"<D:creator-displayname>rhuijben<"..., 56}, {"\r\n", 2}], 4) = 8040
[pid  9113] writev(17, [{"153a\r\n", 6}, {"<S:date>2014-10-20T12:44:19.4131"..., 5434}, {"\r\n", 2}], 3) = 5442


> BTW 500 msec is about two or three the time you expect for google page of
> results.... Why this arbritrary number?

How do you explain the 2 and 2048 in the current code?  They are all
just arbitrary numbers.  Make it 100ms instead of 500ms.  The trace
above is what I got for a debug build and 200ms.  Several 8K writes in
the first 200ms followed by a short 3K write when the first extra flush
triggers after 200ms.  That's a very fast server that doesn't need the
extra flushes.

If I introduce a delay into the log loop, 50ms between revisions, I see
more of the extra flushes:

[pid  9751] writev(17, [{"HTTP/1.1 200 OK\r\nDate: Mon, 16 M"..., 189}, {"588\r\n", 5}, {"<?xml version=\"1.0\" encoding=\"ut"..., 1416}, {"\r\n", 2}], 4) = 1612
[pid  9751] writev(17, [{"53a\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 1338}, {"\r\n", 2}], 3) = 1345
[pid  9751] writev(17, [{"a7c\r\n", 5}, {"<S:log-item>\n<D:version-name>166"..., 2684}, {"\r\n", 2}], 3) = 2691
[pid  9751] writev(17, [{"1511\r\n", 6}, {"<S:log-item>\n<D:version-name>166"..., 5393}, {"\r\n", 2}], 3) = 5401
[pid  9751] writev(17, [{"1f46\r\n", 6}, {"<S:log-item>\n<D:version-name>166"..., 7951}, {"<D:creator-displayname>stefan2</"..., 55}, {"\r\n", 2}], 4) = 8014
[pid  9751] writev(17, [{"aba\r\n", 5}, {"<S:date>2015-02-13T10:00:18.5170"..., 2746}, {"\r\n", 2}], 3) = 2753
[pid  9751] writev(17, [{"1f61\r\n", 6}, {"<S:log-item>\n<D:version-name>165"..., 7988}, {"<S:date>2014-12-06T02:05:01.7811"..., 45}, {"\r\n", 2}], 4) = 8041
[pid  9751] writev(17, [{"1f60\r\n", 6}, {"</S:log-item>\n<S:log-item>\n<D:ve"..., 7976}, {"<D:creator-displayname>rhuijben<"..., 56}, {"\r\n", 2}], 4) = 8040
[pid  9751] writev(17, [{"153a\r\n", 6}, {"<S:date>2014-10-20T12:44:19.4131"..., 5434}, {"\r\n", 2}], 3) = 5442
[pid  9751] writev(17, [{"1f52\r\n", 6}, {"<S:log-item>\n<D:version-name>161"..., 7973}, {"<S:date>2014-06-30T13:46:18.9128"..., 45}, {"\r\n", 2}], 4) = 8026

The first flush happens after 8 revisions because the first 4 revisions
all get written within the first 200ms.  The 8th revision is the first
flush after 200ms.

I'm not claiming this is the best algorithm, I just want to know if it
is the sort of thing we want.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

RE: 1.9.x JavaHL: long initial delay when performing a log

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: maandag 16 maart 2015 18:10
> To: Julian Foad
> Cc: Bert Huijben; Marc Strapetz; dev
> Subject: Re: 1.9.x JavaHL: long initial delay when performing a log
> 
> Julian Foad <ju...@btopenworld.com> writes:
> 
> > Flushing after the 1st, 2nd, 4th, 8th, ... log entry provides a crude
> > approximation to the desired semantics which is more like "don't delay
> > the first result more than a small fraction of a second, and don't
> > delay the next few results much more than that either". In other
> > words, the user's requirement is more about time delays. I wonder if
> > we could implement something closer to what the user really wants.
> 
> apr_time_now() is not free so we don't want to call it on every
> revision, but we could call it before every extra flush.  How about no
> more than one extra flush every 500ms:

Nice idea on the concept, but this patch doesn't implement that behavior

You will see the first result very fast. (No behavior change)


But then you use one algorithm to select when to check for a flush, and
another to decide whether to flush.


This might have the effect of showing a single result, and then thousands
more after all buffering layers are filled, because the code doesn't just
flush after 500 msec... But after X revisions (one algorithm) *and* 500
msec.
 
BTW 500 msec is about two or three the time you expect for google page of
results.... Why this arbritrary number?
 
How are you going to explain this behavior to users?


The current <= 1.8 behavior is that "Subversion is slow"... and it makes
them move to a DVCS (or svnserver), as accessing history via mod_dav is
(too) expensive... while this is just buffering.


After a patch like this, the users don't understand why Subversion is slow
after just the very first result.

The old patch had a back-off behavior of showing the first results fast, and
then slowing down to the old behavior of returning all results as afficient
as possible.


I don't see any of this in this behavior... Just a change that will most
likely be reverted after experiencing it in practice.


This 1 result and then nothing for some time is not user friendly towards
svn users, which already start reading... and then eventually everything
scrolls away.


It is also not how the streamy callbacks were intended. (But that is not
unlike how we currently get the data here)... 

We designed things to arrive as soon as the server is done processing, and
the same level of buffering (8 Kbyte) just simply doesn't apply to all
operations. It works great for updates and operations that deliver much data
fast, or when there is not much data... but not for operations that deliver
data slowly.


	Bert


Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Philip Martin writes:
>> Julian Foad writes:
>>> Flushing after the 1st, 2nd, 4th, 8th, ... log entry provides a crude
>>> approximation to the desired semantics which is more like "don't delay
>>> the first result more than a small fraction of a second, and don't
>>> delay the next few results much more than that either". In other
>>> words, the user's requirement is more about time delays. I wonder if
>>> we could implement something closer to what the user really wants.
>>
>> apr_time_now() is not free so we don't want to call it on every
>> revision, but we could call it before every extra flush.  How about no
>> more than one extra flush every 500ms:

+1

> Occassionally the system time will jump because somebody sets the system
> clock.  If the system clock were to jump forwards there might be a flush
> that would have been avoided without the jump.  If the system clock were
> to jump backwards there may be no extra flushes.

That's good enough fallback behaviour IMO.

- Julian

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Julian Foad <ju...@btopenworld.com> writes:
>
>> Flushing after the 1st, 2nd, 4th, 8th, ... log entry provides a crude
>> approximation to the desired semantics which is more like "don't delay
>> the first result more than a small fraction of a second, and don't
>> delay the next few results much more than that either". In other
>> words, the user's requirement is more about time delays. I wonder if
>> we could implement something closer to what the user really wants.
>
> apr_time_now() is not free so we don't want to call it on every
> revision, but we could call it before every extra flush.  How about no
> more than one extra flush every 500ms:

Occassionally the system time will jump because somebody sets the system
clock.  If the system clock were to jump forwards there might be a flush
that would have been avoided without the jump.  If the system clock were
to jump backwards there may be no extra flushes.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> Flushing after the 1st, 2nd, 4th, 8th, ... log entry provides a crude
> approximation to the desired semantics which is more like "don't delay
> the first result more than a small fraction of a second, and don't
> delay the next few results much more than that either". In other
> words, the user's requirement is more about time delays. I wonder if
> we could implement something closer to what the user really wants.

apr_time_now() is not free so we don't want to call it on every
revision, but we could call it before every extra flush.  How about no
more than one extra flush every 500ms:

Index: subversion/mod_dav_svn/reports/log.c
===================================================================
--- subversion/mod_dav_svn/reports/log.c	(revision 1667018)
+++ subversion/mod_dav_svn/reports/log.c	(working copy)
@@ -69,6 +69,7 @@ struct log_receiver_baton
   /* Helper variables to force early bucket brigade flushes */
   int result_count;
   int next_forced_flush;
+  apr_time_t event_time;
 };
 
 
@@ -291,22 +292,30 @@ log_receiver(void *baton,
      and busy servers. This algorithm makes sure the client gets the first
      two results very fast (but less efficient), while it gradually removes
      its performance hit and falls back to the APR standard buffer handling,
-     which streamlines the http processing */
+     which streamlines the http processing. Don't flush more than once per
+     500ms (an arbitrary 'short' time) or within the first 500ms. */
+#define FLUSH_USEC 500000
   lrb->result_count++;
   if (lrb->result_count == lrb->next_forced_flush)
     {
-      apr_off_t len = 0;
-      (void)apr_brigade_length(lrb->bb, FALSE, &len);
+      apr_time_t now = apr_time_now();
 
-      if (len != 0)
+      if (now > lrb->event_time + FLUSH_USEC)
         {
-          apr_status_t apr_err = ap_fflush(lrb->output, lrb->bb);
-          if (apr_err)
-            return svn_error_create(apr_err, NULL, NULL);
+          apr_off_t len = 0;
+          (void)apr_brigade_length(lrb->bb, FALSE, &len);
 
-          if (lrb->output->c->aborted)
-            return svn_error_create(SVN_ERR_APMOD_CONNECTION_ABORTED,
-                                    NULL, NULL);
+          if (len != 0)
+            {
+              apr_status_t apr_err = ap_fflush(lrb->output, lrb->bb);
+              if (apr_err)
+                return svn_error_create(apr_err, NULL, NULL);
+
+              if (lrb->output->c->aborted)
+                return svn_error_create(SVN_ERR_APMOD_CONNECTION_ABORTED,
+                                        NULL, NULL);
+            }
+          lrb->event_time = now;
         }
 
       if (lrb->result_count < 2048)
@@ -460,6 +469,7 @@ dav_svn__log_report(const dav_resource *resource,
 
   lrb.result_count = 0;
   lrb.next_forced_flush = 1;
+  lrb.event_time = apr_time_now();
 
   /* Our svn_log_entry_receiver_t sends the <S:log-report> header in
      a lazy fashion.  Before writing the first log message, it assures

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Julian Foad <ju...@btopenworld.com>.
I voiced some concerns on IRC (
http://colabti.org/irclogger/irclogger_log/svn-dev?date=2015-03-16#l80
) about the patch http://svn.apache.org/r1666965 nominated for
backport to 1.9.x. More details below.

Bert Huijben wrote:
> Julian Foad wrote:
>> Marc Strapetz wrote:
>>> On 16.03.2015 01:50, Bert Huijben wrote:
>>>> Our server reports use an apr feature that buffers +- 8 KByte data before
>>>> sending out the first data.
>>>>
>>>> In this specific JavaHL case you ask for just the revision numbers. [...]
>>>> I think every revision would (encoded in our Xml protocol) cost about 70
>>>> bytes, so there would fit at least 100 revisions in that buffer.
>>
>> On my testing against the ASF EU mirror, the buffer size seems to be
>> 48 KB, the response is compressed, and around 3000 log entries (with
>> -q) or 10000 log entries (with --xml --with-no-revprops) are buffered
>> at a time.

The scenario tested above has a huge latency (10 or 20 seconds in my
case) and it looks like this patch would bring it down hugely (to ~0.5
seconds in my case). That's wonderful.

However, it hasn't been sufficiently analyzed and tested yet,
especially not for backport to 1.9, so I have voted against the 1.9
backport proposal for the time being.

>> I used this tracing command:
>>     strace -tt -e trace=read svn log -q
>>         http://svn.eu.apache.org/repos/asf/subversion/trunk --limit=10000
>>         2>&1 > /dev/null | grep -v "EAGAIN"
>
> The first level of buffering (The 8 KByte I quoted, or more precisely
> APR_BUCKET_BUFF_SIZE) is before the compression output filter even
> receives the first byte.
>
> It is quite possible further layers have their own buffering limits,

Yes, and the biggest buffer (or the total size of all buffers) is what
determines the overall latency till the user sees the first result
with no forced flushes.

> but if they follow the filter rules the flush should still get through. The apr brigades add a special flush frame to trigger this behavior, and the documentation explicitly says that filters should take care to handle this properly.

I agree, it should work. Just needs to be tested.

The patch flushes the buffer after the first (1, 2, 4, 8, ..., 2048)
log entries. That's fine for a fast network and a compute-bound
server. The testing above was against a 1.8 server. We don't yet have
any test data against a 1.9 server. The 1.9 server is going to be, I
understand, *much* faster at computing logs. When the computation is
fast so that thousands of logs are available within a fraction of a
second, then we get a burst of extra flushes/packets at the start of
every log but the user experience is no better. Might the additional
cost of flushes/packets be significant? Probably not, we imagine, but
I have no concrete evidence of this.

Flushing after the 1st, 2nd, 4th, 8th, ... log entry provides a crude
approximation to the desired semantics which is more like "don't delay
the first result more than a small fraction of a second, and don't
delay the next few results much more than that either". In other
words, the user's requirement is more about time delays. I wonder if
we could implement something closer to what the user really wants.

- Julian

RE: 1.9.x JavaHL: long initial delay when performing a log

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com]
> Sent: maandag 16 maart 2015 14:32
> To: Marc Strapetz
> Cc: Bert Huijben; dev
> Subject: Re: 1.9.x JavaHL: long initial delay when performing a log
> 
> On 16 March 2015 at 11:00, Marc Strapetz <ma...@syntevo.com>
> wrote:
> > On 16.03.2015 01:50, Bert Huijben wrote:
> >> Our server reports use an apr feature that buffers +- 8 KByte data before
> >> sending out the first data.
> >>
> >> In this specific JavaHL case you ask for just the revision numbers.
> >> (Unlike the C api, JavaHL's session.getLog() appears to handle a null list
> >> of revision properties as no revision properties. Not the standard set!)
> >>
> >> I think every revision would (encoded in our Xml protocol) cost about 70
> >> bytes, so there would fit at least 100 revisions in that buffer.
> 
> On my testing against the ASF EU mirror, the buffer size seems to be
> 48 KB, the response is compressed, and around 3000 log entries (with
> -q) or 10000 log entries (with --xml --with-no-revprops) are buffered
> at a time.
> 
> I used this tracing command:
>     strace -tt -e trace=read svn log -q
> http://svn.eu.apache.org/repos/asf/subversion/trunk --limit=10000 2>&1
> > /dev/null | grep -v "EAGAIN"

The first level of buffering (The 8 KByte I quoted, or more precisely APR_BUCKET_BUFF_SIZE) is before the compression output filter even receives the first byte.

It is quite possible further layers have their own buffering limits, but if they follow the filter rules the flush should still get through. The apr brigades add a special flush frame to trigger this behavior, and the documentation explicitly says that filters should take care to handle this properly.

I would guess that 'log -q' has a very good compression ratio. (90% same xml elements; a few author names and then revision numbers)

	Bert


Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Julian Foad <ju...@btopenworld.com>.
On 16 March 2015 at 11:00, Marc Strapetz <ma...@syntevo.com> wrote:
> On 16.03.2015 01:50, Bert Huijben wrote:
>> Our server reports use an apr feature that buffers +- 8 KByte data before
>> sending out the first data.
>>
>> In this specific JavaHL case you ask for just the revision numbers.
>> (Unlike the C api, JavaHL's session.getLog() appears to handle a null list
>> of revision properties as no revision properties. Not the standard set!)
>>
>> I think every revision would (encoded in our Xml protocol) cost about 70
>> bytes, so there would fit at least 100 revisions in that buffer.

On my testing against the ASF EU mirror, the buffer size seems to be
48 KB, the response is compressed, and around 3000 log entries (with
-q) or 10000 log entries (with --xml --with-no-revprops) are buffered
at a time.

I used this tracing command:
    strace -tt -e trace=read svn log -q
http://svn.eu.apache.org/repos/asf/subversion/trunk --limit=10000 2>&1
> /dev/null | grep -v "EAGAIN"

- Julian



> That's it! I now ran my previous example as it was (3 times) and it took
> average 19s until the first revision was reported.
>
> With discoverPath set to true, time until first response dropped to average
> 7s.
>
> With discoverPath set back to false, but revisionProperties set to
> "svn:log", time until first response is 1.5s - 2s -- so timings like the
> command line client!
>
> Regarding your patch, it could make sense to stop doubling next_forced_flush
> at a certain limit (say 128), if the sent log records are small, i.e. if
> discoverPath=false and revisionProperties = null .

Re: 1.9.x JavaHL: long initial delay when performing a log

Posted by Marc Strapetz <ma...@syntevo.com>.
On 16.03.2015 01:50, Bert Huijben wrote:
>> -----Original Message-----
>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>> Once the log responds, a bunch of revisions are reported, so it seems
>> that there is some kind of caching of log records.
>>
>> I've tested with latest 1.9.x sources on Windows but have seen the same
>> behavior with javahl-1.8-extensions branch on Linux, too.
>
> I can only find a server side buffering... But this might explain what you see.
>
>
> Our server reports use an apr feature that buffers +- 8 KByte data before sending out the first data.
>
>
> In this specific JavaHL case you ask for just the revision numbers. (Unlike the C api, JavaHL's session.getLog() appears to handle a null list of revision properties as no revision properties. Not the standard set!)
>
> I think every revision would (encoded in our Xml protocol) cost about 70 bytes, so there would fit at least 100 revisions in that buffer.

That's it! I now ran my previous example as it was (3 times) and it took 
average 19s until the first revision was reported.

With discoverPath set to true, time until first response dropped to 
average 7s.

With discoverPath set back to false, but revisionProperties set to 
"svn:log", time until first response is 1.5s - 2s -- so timings like the 
command line client!

Regarding your patch, it could make sense to stop doubling 
next_forced_flush at a certain limit (say 128), if the sent log records 
are small, i.e. if discoverPath=false and revisionProperties = null .

-Marc


On 16.03.2015 01:50, Bert Huijben wrote:
>
>
>> -----Original Message-----
>> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
>> Sent: vrijdag 13 maart 2015 20:34
>> To: dev@subversion.apache.org
>> Subject: 1.9.x JavaHL: long initial delay when performing a log
>>
>> I'm experiencing a strange initial delay when performing a log using JavaHL.
>>
>> svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x
>>
>> shows first results after 2-3 seconds, while following code snippet
>> takes at least 20 seconds (sometimes significantly more, might depend on
>> the server's load):
>>
>> ISVNRemote session =
>>     factory.openRemoteSession("http://svn.apache.org/repos/asf");
>>
>> List<String> paths =
>>     Collections.singletonList("subversion/branches/1.8.x");
>>
>> session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
>>                  false, false, null, new LogMessageCallback() {
>>     public void singleMessage(Set<ChangePath> changedPaths,
>>       long revision, Map<String, byte[]> revprops, boolean hasChildren) {
>>       System.out.println("DATA");
>>     }
>> });
>>
>> Once the log responds, a bunch of revisions are reported, so it seems
>> that there is some kind of caching of log records.
>>
>> I've tested with latest 1.9.x sources on Windows but have seen the same
>> behavior with javahl-1.8-extensions branch on Linux, too.
>
> I can only find a server side buffering... But this might explain what you see.
>
>
> Our server reports use an apr feature that buffers +- 8 KByte data before sending out the first data.
>
>
> In this specific JavaHL case you ask for just the revision numbers. (Unlike the C api, JavaHL's session.getLog() appears to handle a null list of revision properties as no revision properties. Not the standard set!)
>
> I think every revision would (encoded in our Xml protocol) cost about 70 bytes, so there would fit at least 100 revisions in that buffer.
>
>
> For each of these revisions the apr repository has to handle a security check for every changed path... And many branch revisions involve more than a few paths.
>
>
> This looks like an extremely worst case for this operation. The first 100 revisions would have to be fully processed before you get the first result.
>
>
>
> I think a patch like the one attached should fix most of the usecases without affecting server performance too much... But it has to be applied at the server.
> (I'm trying to create a testcase to see how much this helps)
>
> 	Bert
>

RE: 1.9.x JavaHL: long initial delay when performing a log

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Marc Strapetz [mailto:marc.strapetz@syntevo.com]
> Sent: vrijdag 13 maart 2015 20:34
> To: dev@subversion.apache.org
> Subject: 1.9.x JavaHL: long initial delay when performing a log
> 
> I'm experiencing a strange initial delay when performing a log using JavaHL.
> 
> svn log http://svn.apache.org/repos/asf/subversion/branches/1.8.x
> 
> shows first results after 2-3 seconds, while following code snippet
> takes at least 20 seconds (sometimes significantly more, might depend on
> the server's load):
> 
> ISVNRemote session =
>    factory.openRemoteSession("http://svn.apache.org/repos/asf");
> 
> List<String> paths =
>    Collections.singletonList("subversion/branches/1.8.x");
> 
> session.getLog(paths, Revision.SVN_INVALID_REVNUM, 0, 0, false,
>                 false, false, null, new LogMessageCallback() {
>    public void singleMessage(Set<ChangePath> changedPaths,
>      long revision, Map<String, byte[]> revprops, boolean hasChildren) {
>      System.out.println("DATA");
>    }
> });
> 
> Once the log responds, a bunch of revisions are reported, so it seems
> that there is some kind of caching of log records.
> 
> I've tested with latest 1.9.x sources on Windows but have seen the same
> behavior with javahl-1.8-extensions branch on Linux, too.

I can only find a server side buffering... But this might explain what you see.


Our server reports use an apr feature that buffers +- 8 KByte data before sending out the first data.


In this specific JavaHL case you ask for just the revision numbers. (Unlike the C api, JavaHL's session.getLog() appears to handle a null list of revision properties as no revision properties. Not the standard set!)

I think every revision would (encoded in our Xml protocol) cost about 70 bytes, so there would fit at least 100 revisions in that buffer.


For each of these revisions the apr repository has to handle a security check for every changed path... And many branch revisions involve more than a few paths.


This looks like an extremely worst case for this operation. The first 100 revisions would have to be fully processed before you get the first result.



I think a patch like the one attached should fix most of the usecases without affecting server performance too much... But it has to be applied at the server. 
(I'm trying to create a testcase to see how much this helps)

	Bert