Posted to users@jackrabbit.apache.org by Marcin Nowak <ma...@comarch.com> on 2007/04/23 08:21:26 UTC
eXist
Hi,
Recently I've discovered an XML database quite similar in general
concepts to Jackrabbit. It does not provide versioning or referencing
between nodes, but it was really fast when I compared it with
Jackrabbit, especially in querying and importing nodes. The question
is: why does Jackrabbit perform so badly in comparison to eXist?
Project webpage:
http://exist.sourceforge.net/
BR,
Marcin Nowak
Re: eXist
Posted by Marcin Nowak <ma...@comarch.com>.
Hi,
First of all, my intention was definitely not to troll. I am looking
for the best solution for XML storage. My favourite is Jackrabbit, but
I've found something that in my opinion performs better, and I am only
asking why. I really want to use Jackrabbit; I like its versioning and
referencing features, but I need it to be a high-performance XML store.
My question was based on short testing, but not just 5 minutes :)
I created a repository containing three collections nested in each
other, each with three 4.5 MB XML files. Import times are impressive
(a 4.5 MB XML file in ca. 10 seconds), don't you agree? If not, show
me how to configure Jackrabbit to perform that well: the same import
into Jackrabbit took ca. 16 minutes on the same machine. Again, please
don't take this as trolling - **I really want to know how to configure
Jackrabbit to be high-performance**. Then I launched a query, which was
really simple:
for $x in //type where $x='STRING_SINGLE'
return $x
and was performed on the whole DB - correct me if I am wrong. I received
the query results after less than 4 seconds.
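The FLWOR expression above selects every "type" element, at any depth,
whose value is 'STRING_SINGLE'. The same selection can be written as the
single XPath expression //type[. = 'STRING_SINGLE']. Below is a minimal
sketch of that selection using only the JDK's javax.xml.xpath package.
It uses neither the eXist nor the Jackrabbit API, and the sample
document and class name are invented for illustration, since the real
test files were not published.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TypeQuerySketch {

    // Hypothetical sample data; the element names other than "type" are made up.
    static final String SAMPLE =
          "<root>"
        + "<item><type>STRING_SINGLE</type></item>"
        + "<item><type>INT</type></item>"
        + "<group><item><type>STRING_SINGLE</type></item></group>"
        + "</root>";

    /** Counts "type" elements at any depth whose string value is STRING_SINGLE. */
    static int countStringSingle(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                "//type[. = 'STRING_SINGLE']",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            // Wrap the checked exception so callers stay simple.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matches: " + countStringSingle(SAMPLE));
    }
}
```

Note that this evaluates the path by walking an in-memory DOM with no
index, so its timing says nothing about either product.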
I know very well how Jackrabbit performs in its default configuration
and on Derby, MySQL, and Oracle. You can find the results of my tests
here in the mailing list archives; I published a detailed report some
time ago. After that report I ran the tests again because of changes
made in the Jackrabbit source code. The results were better but, in
comparison to eXist, again not too optimistic.
My main question is: is there anything that can speed up Jackrabbit
to get close to the performance achieved with eXist? Please take this
question seriously - performance is one of the main requirements for
the XML storage I need.
BR,
Marcin Nowak
Re: eXist
Posted by Jean-Baptiste Quenot <jb...@apache.org>.
* Marcin Nowak:
> Recently I've discovered XML database quite similar in general
> concepts to Jackrabbit, in fact it does not provide versioning
> and referencing between nodes but it is really fast as
> I compared it with Jackrabbit, especially in querying and
> importing nodes, question is why Jackrabbit performs so badly in
> comparison to eXist?
You're very obviously asking for a troll, so I won't comment on
it, but there are a few things worth mentioning:
1. eXist is an XML database and Jackrabbit is not, so you are
comparing two unrelated things. Moreover, even if the query
syntax looks similar, eXist returns XML, whereas JCR returns
Java objects. You need to understand the implications of this:
parsing the resulting XML and working with it can quickly
lead to memory and CPU starvation, especially when the query
returns a lot of documents. JCR plays nicely with this, as it
returns an iterator over the data set.
2. Jackrabbit is mostly seen as a Java API, whereas eXist is a
standalone beast with specific servlets that talk XML-RPC, REST,
and so on, mostly accessed using HTTP requests, which causes
additional overhead. eXist even has a front-end based on
Cocoon. A *lot* of caching is done on the eXist side, while
with Jackrabbit you will need a second-level cache in your own
code to address that.
3. In my book, eXist is not designed to let you query the whole
database at once, whereas Jackrabbit allows you to return a
sorted subset of documents from the whole repository very
efficiently, by design. Accessing one XML document is very
different from querying the whole database with 10k+ documents.
Play with eXist for more than 5 minutes with a serious data set
and you will notice it yourself.
4. Jackrabbit's efficiency at importing nodes depends largely on
the persistence and filesystem implementation you are using.
For example I've seen the BDB storage backend perform 10 times
faster than the XML-file-based one.
5. When you compare two approaches (one XML database, one JCR
repository) for your own use case, and moreover when you ask for
feedback about your experiments, publish the results of your
benchmarks and be very careful to mention *what* you tested and
*how*. You also need to report the numeric figures, of course.
Otherwise you're just spreading FUD.
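As an illustration of point 5, here is a minimal sketch of a timing
harness that makes the "how" and the figures explicit: it runs a task a
stated number of times after untimed warm-up runs and returns every
measurement, not just one. The class and method names are hypothetical,
and the task is a placeholder for whatever import or query is under
test.

```java
import java.util.Arrays;

final class TimingSketch {

    /**
     * Runs the task "warmups" times untimed, then "runs" times timed.
     * Returns the elapsed times in milliseconds, sorted ascending.
     */
    static long[] timeMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(elapsed);
        return elapsed;
    }

    /** Middle element of an already sorted, non-empty array. */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the import or query being measured.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        long[] times = timeMillis(task, 2, 5);
        System.out.println("runs=" + times.length
            + " min=" + times[0] + "ms"
            + " median=" + median(times) + "ms"
            + " max=" + times[times.length - 1] + "ms");
    }
}
```

Reporting the run count, warm-up count, and the min/median/max together
with the data set and configuration lets others reproduce the numbers.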
Cheers,
--
Jean-Baptiste Quenot
aka John Banana Qwerty
http://caraldi.com/jbq/
Re: eXist
Posted by Marcin Nowak <ma...@comarch.com>.
So you suggest that storing data as attributes could be more efficient
in Jackrabbit? After the weekend I'll try to provide some results for
the same test cases but with another set of XML files that store data
in attributes. I'll also make some comparison charts.
If you have any, give me some suggestions on how data should be
organized to best fit the Jackrabbit architecture: what should I avoid,
and where are the limitations (depth, number of subtags on one level,
etc.)?
Re: eXist
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On 4/24/07, Marcin Nowak <ma...@comarch.com> wrote:
> I can't share those files but I can give you some stats:
Your data set seems to primarily use tags instead of attributes for
storing content. Jackrabbit nodes are quite a bit "heavier" than DOM
nodes, which probably explains the difference in performance.
As a rule of thumb I've sometimes used a rough metric that a
Jackrabbit node is about an order of magnitude more expensive than a
DOM node. I think we probably could improve this quite a bit.
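To make the tags-versus-attributes point concrete: under the JCR
document view mapping, each XML element becomes a node, each attribute
becomes a property, and each run of text inside an element becomes an
extra child node named jcr:xmltext holding a jcr:xmlcharacters
property. The sketch below counts those items with the JDK's DOM parser
only. It assumes the files are imported through the document view
mapping (the thread does not say how the import was done), ignores
system properties such as jcr:primaryType, and uses invented sample
documents and names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ImportCostSketch {

    // Hypothetical samples: the same two values stored as tags and as attributes.
    static final String TAG_STYLE = "<rec><name>a</name><kind>b</kind></rec>";
    static final String ATTR_STYLE = "<rec name=\"a\" kind=\"b\"/>";

    /** Returns {elements, attributes, non-blank text runs} for the document. */
    static int[] tally(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            int[] counts = new int[3];
            walk(doc.getDocumentElement(), counts);
            return counts;
        } catch (Exception e) {
            // Wrap parser exceptions so callers stay simple.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    private static void walk(Node node, int[] counts) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            counts[0]++;
            counts[1] += node.getAttributes().getLength();
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            counts[2]++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, counts);
        }
    }

    /** One node per element plus one jcr:xmltext node per text run. */
    static int estimatedNodes(int[] c) {
        return c[0] + c[2];
    }

    /** One property per attribute plus one jcr:xmlcharacters per text run. */
    static int estimatedProperties(int[] c) {
        return c[1] + c[2];
    }

    public static void main(String[] args) {
        for (String xml : new String[] {TAG_STYLE, ATTR_STYLE}) {
            int[] c = tally(xml);
            System.out.println(xml + " -> nodes=" + estimatedNodes(c)
                + " properties=" + estimatedProperties(c));
        }
    }
}
```

For these two samples the tag style needs five nodes where the attribute
style needs one, with two properties in both cases.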
BR,
Jukka Zitting
Re: eXist
Posted by Marcin Nowak <ma...@comarch.com>.
I can't share those files but I can give you some stats:
The XML contains 3321 subtags under the root.
There are two types of subtags:
1. Tags containing only a text value (2090 tags)
2. Tags containing the following structure, where every subtag also
contains a text value (1231 tags):
document root
|----->subtag
| |------>subtag attrib1 attrib2
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | |------>subtag
| |------>subtag attrib1 attrib2
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | |------>subtag
| |------>subtag attrib1 attrib2
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | | |------>subtag
| | |------>subtag
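The figures above allow a rough element count for one file. The sketch
below does the arithmetic. The two tag counts come from the message;
the per-tag shape (three attribute-carrying subtags, each with five
children, the fourth of which has five children of its own) is one
reading of the diagram, not a published figure, and the class name is
hypothetical.

```java
final class DatasetShapeSketch {

    // Counts stated in the message.
    static final int TEXT_ONLY_TAGS = 2090;
    static final int STRUCTURED_TAGS = 1231;

    // Shape read off the diagram (an interpretation).
    static final int GROUPS_PER_STRUCTURED_TAG = 3; // subtags with attrib1 and attrib2
    static final int CHILDREN_PER_GROUP = 5;
    static final int GRANDCHILDREN_PER_GROUP = 5;   // all under the fourth child
    static final int ATTRIBUTES_PER_GROUP = 2;

    /** Elements in one structured tag, including the tag itself. */
    static int elementsPerStructuredTag() {
        int perGroup = 1 + CHILDREN_PER_GROUP + GRANDCHILDREN_PER_GROUP;
        return 1 + GROUPS_PER_STRUCTURED_TAG * perGroup;
    }

    /** All elements in one file, including the document root. */
    static int totalElements() {
        return 1 + TEXT_ONLY_TAGS + STRUCTURED_TAGS * elementsPerStructuredTag();
    }

    /** All attributes in one file. */
    static int totalAttributes() {
        return STRUCTURED_TAGS * GROUPS_PER_STRUCTURED_TAG * ATTRIBUTES_PER_GROUP;
    }

    public static void main(String[] args) {
        System.out.println("elements per structured tag: " + elementsPerStructuredTag());
        System.out.println("total elements: " + totalElements());
        System.out.println("total attributes: " + totalAttributes());
    }
}
```

Under this reading each structured tag holds 34 elements, so one file
has 1 + 2090 + 1231 * 34 = 43945 elements and 7386 attributes.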
BR,
Marcin Nowak
Re: eXist
Posted by David Nuescheler <da...@gmail.com>.
hi marcin,
> ... some junk-XML documents of size 4715740 B ...
is this a valid use case for your application and therefore
similar to what you expect your application to be
working with?
...do you think you can share those xml files with the list as well?
regards,
david
Re: eXist
Posted by Marcin Nowak <ma...@comarch.com>.
Hi,
For testing Jackrabbit I (in fact we :)) used the attached classes and
some junk-XML documents of size 4715740 B. Testing eXist was not so
complex, as we used the demo application provided by the authors of
eXist and imported the same files with the same procedure as we did for
Jackrabbit. The report on Jackrabbit performance can be found in this
mailing list archive. As for the results achieved with eXist, I don't
have a formal report on them now, but you can easily reproduce those
tests. The Jackrabbit performance report was based on Jackrabbit
v. 1.1.1. After that we ran the tests again, following the same
procedure, with Jackrabbit v. 1.2.1; the results were ca. 20% better.
In fact the tests should now be run again because of the bundle
persistence manager.
Looking forward to your reply :)
BR,
Marcin Nowak
Re: eXist
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On 4/23/07, Marcin Nowak <ma...@comarch.com> wrote:
> But that is not the point :) anyone have an idea how to configure
> Jackrabbit to perform like eXist?
Let's see how well we can do. Given a quick look it seems that eXist
will certainly beat Jackrabbit in the performance comparison, but I'd
be interested in seeing how close we can get and what are the limiting
factors we face.
Could you share the test code you are using for both eXist and Jackrabbit?
BR,
Jukka Zitting
Re: eXist
Posted by Marcin Nowak <ma...@comarch.com>.
Hi,
FolDeRol wrote:
> Marcin,
>
> I used to work with eXist 2.5 years ago. JCR and XML:DB concepts are
> actually have some common moments. The reason of eXist's performance
> is, as
> far as I know, the fact that eXists keeps the whole database as in-memory
> DOM model
I've run some tests, and I'm not sure where the DB is being stored. I
did the following:
1. Started the repository and checked its memory usage.
2. Added 30 MB of XML files.
3. Shut down the repository.
4. Started it again and checked memory usage.
Memory usage was about the same as in step 1.
But that is not the point :) anyone have an idea how to configure
Jackrabbit to perform like eXist?
BR,
Marcin Nowak
> and, in addition uses advanced indexes like those that allow quick
> processing of XPath expressions like "/x//y".
>
> Regards
Re: eXist
Posted by FolDeRol <fo...@gmail.com>.
Marcin,
I used to work with eXist 2.5 years ago. JCR and XML:DB concepts
actually have some things in common. The reason for eXist's performance
is, as far as I know, the fact that eXist keeps the whole database as an
in-memory DOM model and, in addition, uses advanced indexes such as
those that allow quick processing of XPath expressions like "/x//y".
Regards