Posted to dev@jackrabbit.apache.org by Eugeny N Dzhurinsky <eu...@jdevelop.com> on 2005/10/13 17:34:05 UTC

performance (again)

Well, I'm really stuck.
I wrote this code (see attachment) to create a repository. It creates a
5-leaves tree with depth = 5 (each node has 6 child nodes and 10
properties).

I started it at 5:20 PM and it is 6:30 PM now, and it's still running.
It seems to have built the complete tree within 30 minutes and then started
writing it to disk. Last time it wrote 19300 nodes (or something like that)
in 1.5 hours.

Is this normal?
-- 
Eugene N Dzhurinsky

Re: was [performance (again)] -> concurrent access?

Posted by Marcel Reutegger <ma...@gmx.net>.

David Nuescheler wrote:
> 
>>random access : 453 ms
>>search by property: 453 ms
> 
> can you describe what you mean by "random access"? is that accessing
> a node by path or by uuid? or something different.
> is it just coincidence that search by property takes exactly the same
> amount of time as the "random" access?
> if you say "search by property" how many entries do you have in
> your search result?
> 
> i would say that 453 ms would be too slow for a search with few results
> and is certainly way too slow for a single access of an item by
> path or uuid.

the first query may be slower because caches have to be populated, 
subsequent queries will be much faster. If I remember correctly the test 
only executes one single query.

regards
  marcel
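
The warm-up effect described above can be illustrated with a small,
self-contained sketch. It does not use Jackrabbit at all: WarmupDemo, lookup
and the map-based cache are hypothetical stand-ins, meant only to show why a
test that times a single query mostly measures the cold path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a repository with an internal cache; not Jackrabbit code.
class WarmupDemo {
    private final Map<String, String> cache = new HashMap<>();
    int slowLoads = 0; // how many times the expensive path was taken

    // Returns the value for a path, taking the expensive path only on a cache miss.
    String lookup(String path) {
        String cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        slowLoads++;
        String loaded = "node:" + path; // pretend this reads from disk or an index
        cache.put(path, loaded);
        return loaded;
    }

    // Times n consecutive lookups of the same path, in nanoseconds.
    long[] timeLookups(String path, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            lookup(path);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        WarmupDemo repo = new WarmupDemo();
        long[] nanos = repo.timeLookups("/a/b/c", 5);
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("lookup " + i + ": " + nanos[i] + " ns");
        }
        System.out.println("expensive loads: " + repo.slowLoads);
    }
}
```

In a real benchmark it would probably make sense to run the query a few times
and report the first timing separately from the later ones.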

Re: was [performance (again)] -> concurrent access?

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.

On Wed, Oct 19, 2005 at 11:53:34AM +0200, David Nuescheler wrote:
> > random access : 453 ms
> > search by property: 453 ms
> can you describe what you mean by "random access"? is that accessing
> a node by path or by uuid? or something different.

it's just accessing a node by path.

> is it just coincidence that search by property takes exactly the same
> amount of time as the "random" access?

I tried several times; the equal times in the results I posted seem to be
just a coincidence. Usually they differ by ~100 ms.

> if you say "search by property" how many entries do you have in
> your search result?

20 or so, can't recall right now.

> > Now the question is how several users can have concurrent access to
> > repository for reading and writing operations?
> every user operates on their own session, is their any further
> information that you need? maybe i misunderstand your question, it
> seems pretty obvious to me...

Okay, in other words: if several users work with the same repository, and
user A edits siblingA in nodeA while user B at the same time edits siblingB
in the same node (nodeA), will this situation be handled properly if they
try to save their changes at the same time?

-- 
Eugene N Dzhurinsky

Re: was [performance (again)] -> concurrent access?

Posted by David Nuescheler <da...@gmail.com>.

hi eugene,

> random access : 453 ms
> search by property: 453 ms
can you describe what you mean by "random access"? is that accessing
a node by path or by uuid? or something different.
is it just coincidence that search by property takes exactly the same
amount of time as the "random" access?
if you say "search by property" how many entries do you have in
your search result?

i would say that 453 ms would be too slow for a search with few results
and is certainly way too slow for a single access of an item by
path or uuid.

> Now the question is how several users can have concurrent access to
> repository for reading and writing operations?
every user operates on their own session, is there any further
information that you need? maybe i misunderstand your question, it
seems pretty obvious to me...

regards,
david

was [performance (again)] -> concurrent access?

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.

On Fri, Oct 14, 2005 at 02:49:09PM +0200, Stefan Guggisberg wrote:
> > > because i am using a windows box i modified repository.xml to
> > > use CQFileSystem instead of LocalFileSystem for the default
> > > workspace.
> > > here's the results when i ran it on my machine (with -Xmx128):
> > > Build 19530 in 796453 ms
> > > Traverse 19530 in 36219 ms
> > > node  found in 0 ms
> > > i.e.
> > > - 19'530 nodes
> > > - 410'130 properties with 195'300 being BINARY!
> > > - 429'660 items on total
> > > - 1.8ms per item
> > > - 40ms per node
> > > i guess that's not too bad.

my results are:
build  19350 nodes in 519220 ms
random access : 453 ms
search by property: 453 ms

Now the question is: how can several users have concurrent access to the
repository for reading and writing operations?

-- 
Eugene N Dzhurinsky

Re: performance (again)

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.

On Fri, Oct 14, 2005 at 02:49:09PM +0200, Stefan Guggisberg wrote:
> i tried to but it never finished;)

???
impossible, but yes, it could take a lot of time...
Okay, I will try your code and let you know whether there is any difference
from your results. I think it might be because JCR produces about 140,000
files in the filesystem; maybe it's just a UFS-related issue that slows
performance down so dramatically...

-- 
Eugene N Dzhurinsky

Re: performance (again)

Posted by Stefan Guggisberg <st...@gmail.com>.

On 10/14/05, Eugeny N Dzhurinsky <eu...@jdevelop.com> wrote:
> On Fri, Oct 14, 2005 at 02:32:20PM +0200, Stefan Guggisberg wrote:
> > eugeny, fyi:
> >
> > in your code, i changed SAVE_INTERVAL to 100 and fixed
> > the loop writing the properties as follows:
> >
> >             for (int j = 0; j < PROPERTY_COUNT; j++) {
> >                 InputStream in = new FileInputStream("repotest/repository.xml");
> >                 n.setProperty("prop_blob", in);
> >                 in.close();
> >                 n.setProperty("prop" + j, level + "_" + i + "_" + j);
> >             }
> >
> > because i am using a windows box i modified repository.xml to
> > use CQFileSystem instead of LocalFileSystem for the default
> > workspace.
> >
> > here's the results when i ran it on my machine (with -Xmx128):
> >
> > Build 19530 in 796453 ms
> > Traverse 19530 in 36219 ms
> > node  found in 0 ms
> >
> > i.e.
> > - 19'530 nodes
> > - 410'130 properties with 195'300 being BINARY!
> > - 429'660 items on total
> > - 1.8ms per item
> > - 40ms per node
> > i guess that's not too bad.
>
> Yep, looks cool.
> But with my results, I think issue with replacing the same property won't
> affect performance SO much...
>
> Could you please execute original code to see if it will produce different
> results?

i tried to but it never finished;)

>
> --
> Eugene N Dzhurinsky
>

Re: performance (again)

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.

On Fri, Oct 14, 2005 at 02:32:20PM +0200, Stefan Guggisberg wrote:
> eugeny, fyi:
> 
> in your code, i changed SAVE_INTERVAL to 100 and fixed
> the loop writing the properties as follows:
> 
>             for (int j = 0; j < PROPERTY_COUNT; j++) {
>                 InputStream in = new FileInputStream("repotest/repository.xml");
>                 n.setProperty("prop_blob", in);
>                 in.close();
>                 n.setProperty("prop" + j, level + "_" + i + "_" + j);
>             }
> 
> because i am using a windows box i modified repository.xml to
> use CQFileSystem instead of LocalFileSystem for the default
> workspace.
> 
> here's the results when i ran it on my machine (with -Xmx128):
> 
> Build 19530 in 796453 ms
> Traverse 19530 in 36219 ms
> node  found in 0 ms
> 
> i.e.
> - 19'530 nodes
> - 410'130 properties with 195'300 being BINARY!
> - 429'660 items on total
> - 1.8ms per item
> - 40ms per node
> i guess that's not too bad.

Yep, looks cool.
But given my results, I don't think the issue with replacing the same
property would affect performance SO much...

Could you please run the original code to see whether it produces different
results?

-- 
Eugene N Dzhurinsky

Re: performance (again)

Posted by Stefan Guggisberg <st...@gmail.com>.

eugeny, fyi:

in your code, i changed SAVE_INTERVAL to 100 and fixed
the loop writing the properties as follows:

            for (int j = 0; j < PROPERTY_COUNT; j++) {
                InputStream in = new FileInputStream("repotest/repository.xml");
                n.setProperty("prop_blob", in);
                in.close();
                n.setProperty("prop" + j, level + "_" + i + "_" + j);
            }

because i am using a windows box i modified repository.xml to
use CQFileSystem instead of LocalFileSystem for the default
workspace.

here's the results when i ran it on my machine (with -Xmx128):

Build 19530 in 796453 ms
Traverse 19530 in 36219 ms
node  found in 0 ms

i.e.
- 19'530 nodes
- 410'130 properties with 195'300 being BINARY!
- 429'660 items on total
- 1.8ms per item
- 40ms per node

i guess that's not too bad.

cheers
stefan
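
The per-item and per-node figures follow directly from the raw numbers and can
be rechecked with a few lines of Java. The sketch below is only an
illustration: the class and method names are made up, and reading 19'530 as a
tree with five children per node over six levels is an assumption based on the
arithmetic, not something the thread states.

```java
// Recomputes the figures quoted above from the raw numbers in the message.
class BuildStats {
    static final long NODES = 19530;
    static final long PROPERTIES = 410130;
    static final long BUILD_MS = 796453;

    // Nodes in a tree where every node has `children` children, `levels` levels
    // deep (root not counted): children + children^2 + ... + children^levels.
    static long nodeCount(int children, int levels) {
        long total = 0;
        long levelSize = 1;
        for (int i = 0; i < levels; i++) {
            levelSize *= children;
            total += levelSize;
        }
        return total;
    }

    static long totalItems() { return NODES + PROPERTIES; }
    static double msPerItem() { return (double) BUILD_MS / totalItems(); }
    static double msPerNode() { return (double) BUILD_MS / NODES; }

    public static void main(String[] args) {
        System.out.println("properties per node: " + PROPERTIES / NODES);
        System.out.println("total items: " + totalItems());
        System.out.printf("ms per item: %.2f%n", msPerItem());
        System.out.printf("ms per node: %.2f%n", msPerNode());
        System.out.println("5 children, 6 levels: " + nodeCount(5, 6));
    }
}
```

The totals work out to 21 properties per node, about 1.85 ms per item and
about 40.8 ms per node, which matches the rounded-down values in the message.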

Re: performance (again)

Posted by Stefan Guggisberg <st...@gmail.com>.

hi eugeny,

the *bad* performance you're experiencing doesn't surprise me
at all as there are a number of issues with your code:

1. you think that you're creating 20 properties per node (10 String
   and 10 Binary properties) whereas in fact you're only creating
   2 properties.
2. you're using BLOBFileValue for setting a binary value. this is
   an internal class from jackrabbit's core and should never be
   used by an application. for setting binary values you can use
   any of the following:
     - javax.jcr.Node.setProperty(..., InputStream)
     - javax.jcr.Property.setValue(InputStream)
     - or use javax.jcr.ValueFactory to create a binary Value object
3. you only save after you've transiently created 20000 nodes.
    with your current code this would be a total of 80'000 transient items!
    (per node: 2 explicitly created properties and 1 autocreated property
    jcr:primaryType; that's three properties plus the node itself, i.e.
    four items per node)
    i would recommend to save smaller sets of transient changes, e.g.
    every 100 or 1000 nodes.
4. you're creating a lot of binary properties. binary properties are,
    for obvious reasons, more *expensive* than non-binary properties as
    they're taking up more resources.
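
Point 1 can be illustrated without a repository. In the sketch below a plain
Map stands in for a node's property set (a hypothetical simplification, not
the JCR API): setting a property under a name that already exists replaces
its value, so reusing the same two names in a loop leaves only two entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A plain Map stands in for a node's properties (hypothetical; not the JCR API).
class PropertyNameDemo {
    // Mimics the original loop: the same two names are reused on every iteration.
    static Map<String, String> fixedNames(int count) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int j = 0; j < count; j++) {
            props.put("prop_blob", "blob_" + j);
            props.put("prop", "value_" + j);
        }
        return props;
    }

    // Puts the loop index into both names, so every iteration adds new entries.
    static Map<String, String> indexedNames(int count) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int j = 0; j < count; j++) {
            props.put("prop_blob" + j, "blob_" + j);
            props.put("prop" + j, "value_" + j);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println("fixed names:   " + fixedNames(10).size());
        System.out.println("indexed names: " + indexedNames(10).size());
    }
}
```

With ten iterations the first variant ends up with 2 entries and the second
with 20, which is the difference between what the code did and what it was
meant to do.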

i suggest you fix your code with something like this:

    //private static final int SAVE_INTERVAL = 20000;
    private static final int SAVE_INTERVAL = 100;

...

/*
            for (int j = 0; j < PROPERTY_COUNT; j++) {
                n.setProperty("prop_blob", new BLOBFileValue((level + "_" + i
                        + "_" + j).getBytes()));
                n.setProperty("prop", session.getValueFactory().createValue(
                        level + "_" + i + "_" + j));
            }
*/
            n.setProperty("prop_blob", new FileInputStream("repotest/repository.xml"));
            for (int j = 0; j < PROPERTY_COUNT; j++) {
                n.setProperty("prop" + j, level + "_" + i + "_" + j);
            }
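
The save-interval idea can be sketched on its own as follows. This is a
minimal illustration under stated assumptions, not the attached test code:
Saveable is a hypothetical stand-in for the session (in real code that role
would be played by javax.jcr.Session and its save() method), and BatchedSaver
is a made-up helper name.

```java
// Hypothetical stand-in for the part of a session used here; in real code this
// would be javax.jcr.Session, whose save() persists pending transient changes.
interface Saveable {
    void save();
}

// Counts created nodes and saves once per full batch (made-up helper).
class BatchedSaver {
    private final Saveable session;
    private final int saveInterval;
    private int pending = 0;

    BatchedSaver(Saveable session, int saveInterval) {
        this.session = session;
        this.saveInterval = saveInterval;
    }

    // Call once per created node; saves when the batch is full.
    void nodeCreated() {
        pending++;
        if (pending >= saveInterval) {
            flush();
        }
    }

    // Saves whatever is still pending; call once more after the last node.
    void flush() {
        if (pending > 0) {
            session.save();
            pending = 0;
        }
    }

    public static void main(String[] args) {
        final int[] saves = {0};
        BatchedSaver saver = new BatchedSaver(() -> saves[0]++, 100);
        for (int i = 0; i < 19530; i++) {
            saver.nodeCreated();
        }
        saver.flush();
        System.out.println("saves: " + saves[0]);
    }
}
```

The final flush() matters: without it, the last partial batch would stay
transient and never be written.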


cheers
stefan



On 10/13/05, Eugeny N Dzhurinsky <eu...@jdevelop.com> wrote:
> On Thu, Oct 13, 2005 at 06:34:05PM +0300, Eugeny N Dzhurinsky wrote:
> > Well, i'm really stucked.
> > I created this code (see attachment) to create respository. It creates
> > 5-leaves tree with depth = 5. (each node has 6 children nodes and 10
> > properties).
> >
> > I started this at 5:20 PM and it is 6:30 PM now, but it's still working.
> > It seem to create complete tree with 30 minutes, and started to write it on
> > the disk. Previous time it wrote 19300 nodes (or soemthing like this) in 1.5
> > hour.
>
> okay, here is the output:
> [java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:116) - Build 19530 in 4119509
> ms
> [java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:119) - Traverse 19530 in 25234
> ms
> [java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:121) - node  found in 1 ms
>
> search and traverse speed is really impressive (traverse includes log4j, so I
> assume clean time will be something like 15 seconds or so)
>
> Getting node by name looks cool too, but what about building time???
>
> I'm running FreeBSD 4.11 with native BSD JDK 1.4.2 on P-IV 2.6 GHz with 512 Mb
> ram
>
> the JVM parameters are -Xms128m -XMX512m
>
> --
> Eugene N Dzhurinsky
>

Re: performance (again)

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.

On Thu, Oct 13, 2005 at 06:34:05PM +0300, Eugeny N Dzhurinsky wrote:
> Well, i'm really stucked.
> I created this code (see attachment) to create respository. It creates
> 5-leaves tree with depth = 5. (each node has 6 children nodes and 10
> properties).
> 
> I started this at 5:20 PM and it is 6:30 PM now, but it's still working.
> It seem to create complete tree with 30 minutes, and started to write it on
> the disk. Previous time it wrote 19300 nodes (or soemthing like this) in 1.5
> hour.

okay, here is the output:
[java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:116) - Build 19530 in 4119509 ms
[java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:119) - Traverse 19530 in 25234 ms
[java] DEBUG 13/36/05 06:36:17 [main] (JCRTest:121) - node  found in 1 ms

search and traverse speed is really impressive (the traverse time includes
log4j output, so I assume the net time would be something like 15 seconds or so)

Getting node by name looks cool too, but what about building time???

I'm running FreeBSD 4.11 with the native BSD JDK 1.4.2 on a P-IV 2.6 GHz with
512 MB of RAM.

The JVM parameters are -Xms128m -Xmx512m

-- 
Eugene N Dzhurinsky