Posted to users@jackrabbit.apache.org by Cheng Zhang <zh...@yahoo.com> on 2009/01/23 20:08:56 UTC

session save and performance question

I see that saving the session takes too much time. I guess it might be caused by too many nodes being saved. What's the best practice for saving the session to get the best performance?

Thanks a lot,
Kevin


Re: session save and performance question

Posted by Alexander Klimetschek <ak...@day.com>.
On Sat, Jan 24, 2009 at 2:13 AM, Cheng Zhang <zh...@yahoo.com> wrote:
> What I'm currently doing is using Jackrabbit to store XML documents. Each XML element's attribute is mapped to a node property. Each XML text element, for example "<age>13</age>", is also mapped to a node property. All other XML elements are mapped to Jackrabbit nodes.
>
> I now doubt whether mine is the right use case for Jackrabbit. In my case, one XML document is mapped to 500 nodes/child nodes. If I have 1 million XML documents, that means 500M nodes will be created and indexed.
>
> Or should I map only the nodes/properties I'm interested in and, in addition, store the XML in a file node?

Have a look at http://markmail.org/message/edhqnsb4hkmfg5eu

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: session save and performance question

Posted by Mark Nüßler <ma...@9elements.com>.
Hi Kevin,

i have no experience with mapping xml to a jackrabbit structure.

I think you should give your last thought a try. If you don't need every
piece of information from your xml file, just map the parts you are
interested in. If you save the xml file as an additional property (stream)
on the content node, you can analyse the xml again later. Maybe you
could have some kind of background worker.

just a thought:
1. create a content-node and add the xml-file as blob,
    this shouldn't take much time
2. run a background-worker to do the hard stuff

i also have no experience with more than ~30 properties,
maybe you could have a look at the benchmark classes
of jackrabbit and write a performance test.

 > I now doubt whether mine is the right use case for Jackrabbit.
^^ if your xml files all have the same structure and you always
query the same properties, a relational db is much faster.
but if you are dealing with different semi-structured files
and want to be as flexible as possible - i don't know anything better
than jackrabbit. in my use case i must say i have rather 'abused'
the rabbit ;-)


best regards

derMark



Cheng Zhang wrote:
> Hi Mark & all,
> 
> What I'm currently doing is using Jackrabbit to store XML documents. Each XML element's attribute is mapped to a node property. Each XML text element, for example "<age>13</age>", is also mapped to a node property. All other XML elements are mapped to Jackrabbit nodes.
> 
> I now doubt whether mine is the right use case for Jackrabbit. In my case, one XML document is mapped to 500 nodes/child nodes. If I have 1 million XML documents, that means 500M nodes will be created and indexed.
> 
> Or should I map only the nodes/properties I'm interested in and, in addition, store the XML in a file node?
> 
> Best,
> Kevin

Re: session save and performance question

Posted by Cheng Zhang <zh...@yahoo.com>.
Hi Mark & all,

What I'm currently doing is using Jackrabbit to store XML documents. Each XML element's attribute is mapped to a node property. Each XML text element, for example "<age>13</age>", is also mapped to a node property. All other XML elements are mapped to Jackrabbit nodes.

I now doubt whether mine is the right use case for Jackrabbit. In my case, one XML document is mapped to 500 nodes/child nodes. If I have 1 million XML documents, that means 500M nodes will be created and indexed.

Or should I map only the nodes/properties I'm interested in and, in addition, store the XML in a file node?
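
A minimal sketch of that last idea, using only the JDK's XML parser: pick out the few element values that are worth querying and leave everything else in the raw document, which would be stored separately as a single binary. The class name SelectiveXmlMapper and the method extract are hypothetical, and writing the returned map into repository properties is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: extracts only the elements of interest from an XML document.
final class SelectiveXmlMapper {
    private SelectiveXmlMapper() {}

    /**
     * Returns the text of the first element found for each wanted tag name.
     * Tags that do not occur in the document are simply absent from the result.
     */
    static Map<String, String> extract(String xml, Set<String> wanted) {
        Document doc;
        try {
            // For untrusted input, the factory should be hardened against external entities.
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (String name : wanted) {
            NodeList hits = doc.getElementsByTagName(name);
            if (hits.getLength() > 0) {
                props.put(name, hits.item(0).getTextContent().trim());
            }
        }
        return props;
    }
}
```

With this approach each document would contribute a handful of properties plus one binary instead of roughly 500 nodes, at the price of having to re-parse the stored XML whenever another field becomes interesting.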

Best,
Kevin





Re: session save and performance question

Posted by Mark Nüßler <ma...@9elements.com>.
Hello Kevin,

i think you are right that saving the session is expensive,
but it depends on:

1. how your application works internally
    - do you really have to save the session every time,
      or can you work with a transaction concept?
2. what kind of structure you use
    - flat or hierarchical
3. whether you need/use versioning (mix:versionable)
4. ... maybe others i've not mentioned here?
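
As an illustration of point 1, here is a minimal sketch of saving in batches rather than after every change. BatchSaver is a hypothetical helper, not part of Jackrabbit, and the batch size is an arbitrary value that would need measuring. The save action is passed in as a Runnable; in a JCR application it would presumably wrap session.save() and handle its checked RepositoryException.

```java
// Hypothetical helper: triggers a save only after a number of changes have piled up.
final class BatchSaver {
    private final int batchSize;
    private final Runnable saveAction;
    private int pending;

    BatchSaver(int batchSize, Runnable saveAction) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.saveAction = saveAction;
    }

    /** Call after each node has been added or modified. */
    void changed() {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Saves whatever is still pending; call once at the end of the import. */
    void flush() {
        if (pending > 0) {
            saveAction.run();
            pending = 0;
        }
    }
}
```

With a batch size of 100, adding 250 nodes would trigger two saves during the loop and a third on the final flush(), instead of 250 individual saves. Larger batches hold more unsaved state in memory, so the right size depends on the application.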

i made some tests regarding 'invisible' structure nodes
vs a flat hierarchy when adding 50k content nodes.
[50k is a small amount within my current project]

once you exceed x child nodes under one parent, it is always better
to have some kind of structure above your content nodes.

because i haven't done all of the tests, i am not really sure
what i would suggest for x - the users list says ~10k; somewhere
else (or was it an old entry?) 4k was mentioned.
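
One way to get such 'invisible' structure nodes is to derive intermediate path segments from a hash of the node name. The sketch below is only an assumption about how this could be done, not the scheme used in the tests mentioned above; BucketPaths and bucketedPath are hypothetical names, and it only computes the path (creating the intermediate nodes in the repository is left out).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: spreads content nodes over two levels of hash buckets.
final class BucketPaths {
    private BucketPaths() {}

    /** Returns a path of the form root/xx/yy/nodeName, where xx and yy are hex bytes. */
    static String bucketedPath(String root, String nodeName) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(nodeName.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return root + "/" + level1 + "/" + level2 + "/" + nodeName;
    }
}
```

Two levels of 256 buckets give 65,536 leaf folders, so 1 million content nodes would average about 15 per folder, and no parent would have more than 256 structure children. The same name always maps to the same path, so lookups need no extra index.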

@Kevin, if you haven't yet, read some of the older list entries


best regards

derMark


Cheng Zhang wrote:
> I see that saving the session takes too much time. I guess it might be caused by too many nodes being saved. What's the best practice for saving the session to get the best performance?
> 
> Thanks a lot,
> Kevin