Posted to xindice-users@xml.apache.org by Werner Frieb <w....@gmx.net> on 2003/12/19 17:13:16 UTC

Report on XML Databases ready

Hi list members !

I've just finished a report on XML databases
http://www.studierstube.org/world/xml_databases_compared.html
where I've tried to evaluate and compare Tamino - Xindice - eXist.

Please, let me know what you think and
send a copy of your answer directly to my e-mail address (w.frieb@gmx.net),
so that I don't miss it.

Merry Christmas !

Werner.


Re: Report on XML Databases ready

Posted by Don Saxton <ds...@pacbell.net>.
Werner
I think your report will be a real contribution to further progress. Great
job.
Don

----- Original Message ----- 
From: "Werner Frieb" <w....@gmx.net>
To: <xi...@xml.apache.org>
Sent: Saturday, December 20, 2003 10:49 AM
Subject: Re: Report on XML Databases ready


>
>
> >>One thing you forgot to test, and hence include in your report, was
> >>performance when using non "large" documents.
> >>In writing the XML:DB benchmarks (xmldbench.sourceforge.net)
>
> Performance is not really important to us, so we did not do any
> benchmarking.
>
> >>I've noted that using xml documents of about 30K in size, eXist
> >>falls over at anything over and above about 30 000 resources, whereas
> >>Xindice 1.0 AND 1.1b happily perform up to about 100 000
> >>documents.
> >>
> >>Your test data of 5MB documents would have missed this important note,
> >>and I still question the reasoning in having a 5MB xml document anyway -
> >>usually this is symptomatic of a design problem. (IMHO)
> >
> >OTOH, it would be interesting to know what amount of memory was given to
> >Xindice during this test (my bet is that it was the JVM default,
> >64Mb), and what amount of memory would be enough to pass the test without
> >OOME. But Werner has not gone this far, so you won't see a memory
> >requirements comparison for the databases.
>
> No, I did not change the memory settings of the JVM.
> Why not make a note in the documentation explaining that?
>
> And no - I don't think it's a design problem having large XML
> documents; it's more the requirement not to have zillions of mini documents,
> because they are unwieldy when it comes to importing/exporting data from the
> database. Furthermore, it does not always make sense to split up a
> document. In some cases there is even a good reason for avoiding that.
> Think of a Shakespeare play, for example...
>
> >Also, I noticed that you referenced the old Xindice 1.0 documentation.
> >All documentation which is work in progress and which is relevant to
> >Xindice 1.1 is located under the dev/ folder (dev tab in the navigation).
> >
> >Hope this clears up the version mess you had with the documentation.
>
> Yes, this is the first time I've noticed that there is separate
> documentation for the new version of Xindice.
> But I was on the dev page before, so I think these version numbers are
> quite new.
> And the headings of my hardcopy of the documentation read version numbers
> like 0.7.1 and 0.9.2 and not 1.0 and 1.1...
>
> Werner.


Re: Report on XML Databases ready

Posted by Vadim Gritsenko <va...@verizon.net>.
Werner Frieb wrote:

>
>>> One thing you forgot to test, and hence include in your report,
>>> was performance when using non "large" documents.
>>> In writing the XML:DB benchmarks (xmldbench.sourceforge.net)
>>
>
> Performance is not really important to us, so we did not do any 
> benchmarking.
>
>>> I've noted that using xml documents of about 30K in size, eXist
>>> falls over at anything over and above about 30 000 resources, 
>>> whereas Xindice 1.0 AND 1.1b happily perform up to about 100 000
>>> documents.
>>>
>>> Your test data of 5MB documents would have missed this important 
>>> note, and I still question the reasoning in having a 5MB xml 
>>> document anyway - usually this is symptomatic of a design problem. 
>>> (IMHO)
>>
>>
>> OTOH, it would be interesting to know what amount of memory was given
>> to Xindice during this test (my bet is that it was the JVM default,
>> 64Mb), and what amount of memory would be enough to pass the
>> test without OOME. But Werner has not gone this far, so you won't
>> see a memory requirements comparison for the databases.
>
>
> No, I did not change the memory settings of the JVM.
> Why not make a note in the documentation explaining that?


Send in a patch! :-)


> And no - I don't think it's a design problem having large XML
> documents; it's more the requirement not to have zillions of mini
> documents, because they are unwieldy when it comes to importing/exporting
> data from the database. Furthermore, it does not always make sense to
> split up a document. In some cases there is even a good reason for
> avoiding that. Think of a Shakespeare play, for example...


It's not a design problem in general, but it is a design problem when
designing specifically for Xindice, due to the Xindice architecture, as
noted in the FAQ.


>> Also, I noticed that you referenced the old Xindice 1.0
>> documentation. All documentation which is work in progress and which
>> is relevant to Xindice 1.1 is located under the dev/ folder (dev tab
>> in the navigation).
>>
>> Hope this clears up the version mess you had with the documentation.
>
>
> Yes, this is the first time I've noticed that there is separate
> documentation for the new version of Xindice.
> But I was on the dev page before, so I think these version numbers are
> quite new.
> And the headings of my hardcopy of the documentation read version
> numbers like 0.7.1 and 0.9.2 and not 1.0 and 1.1...


Documentation for 1.1 was there, but I changed the doc titles (and
updated the content a bit: CORBA -> XML-RPC) to clearly show which
Xindice version each doc applies to.

The version number of each doc is currently at the very bottom, and it is
obtained from the CVS revision tag.

Vadim



Re: Report on XML Databases ready

Posted by Werner Frieb <w....@gmx.net>.

>>One thing you forgot to test, and hence include in your report, was
>>performance when using non "large" documents.
>>In writing the XML:DB benchmarks (xmldbench.sourceforge.net)

Performance is not really important to us, so we did not do any benchmarking.

>>I've noted that using xml documents of about 30K in size, eXist
>>falls over at anything over and above about 30 000 resources, whereas 
>>Xindice 1.0 AND 1.1b happily perform up to about 100 000
>>documents.
>>
>>Your test data of 5MB documents would have missed this important note, 
>>and I still question the reasoning in having a 5MB xml document anyway - 
>>usually this is symptomatic of a design problem. (IMHO)
>
>OTOH, it would be interesting to know what amount of memory was given to
>Xindice during this test (my bet is that it was the JVM default,
>64Mb), and what amount of memory would be enough to pass the test without
>OOME. But Werner has not gone this far, so you won't see a memory
>requirements comparison for the databases.

No, I did not change the memory settings of the JVM.
Why not make a note in the documentation explaining that?

And no - I don't think it's a design problem having large XML
documents; it's more the requirement not to have zillions of mini documents,
because they are unwieldy when it comes to importing/exporting data from the
database. Furthermore, it does not always make sense to split up a
document. In some cases there is even a good reason for avoiding that.
Think of a Shakespeare play, for example...

>Also, I noticed that you referenced the old Xindice 1.0 documentation.
>All documentation which is work in progress and which is relevant to
>Xindice 1.1 is located under the dev/ folder (dev tab in the navigation).
>
>Hope this clears up the version mess you had with the documentation.

Yes, this is the first time I've noticed that there is separate documentation
for the new version of Xindice.
But I was on the dev page before, so I think these version numbers are quite
new.
And the headings of my hardcopy of the documentation read version numbers
like 0.7.1 and 0.9.2 and not 1.0 and 1.1...

Werner.


Re: Report on XML Databases ready

Posted by Vadim Gritsenko <va...@verizon.net>.
(CCing Werner as per his request)

webhiker@tiscali.fr wrote:

> One thing you forgot to test, and hence include in your report,
> was performance when using non "large" documents.
> In writing the XML:DB benchmarks (xmldbench.sourceforge.net)


(Sidenote: from http://xmldbench.sourceforge.net: "All databases are
tested 'out of the box', and no tweaking of parameters will be done".
Xindice out of the box will use the JVM of the servlet container; when
the xindice.[sh|bat] script is used, it allows setting the JAVA_OPT
variable, but there are no default settings, which means that only 64Mb
will be used. For comparison, eXist is allowed to use 256Mb of memory.)
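A quick way to check which heap ceiling a JVM actually received is to print `Runtime.getRuntime().maxMemory()` from inside it. The sketch below is a standalone illustration, not part of Xindice, and the class name `HeapReport` is hypothetical. The standard HotSpot option for raising the ceiling is `-Xmx` (e.g. `-Xmx256m`); assuming the xindice.[sh|bat] script passes JAVA_OPT through to the `java` command, setting it that way would give Xindice the same 256Mb eXist gets.

```java
// Hypothetical helper: reports the maximum heap the current JVM may use.
class HeapReport {
    /** Returns the max heap in megabytes, or -1 if the JVM reports no limit. */
    static long maxHeapMegabytes() {
        long max = Runtime.getRuntime().maxMemory();
        // Per the Runtime.maxMemory() contract, Long.MAX_VALUE means "no inherent limit".
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        long mb = maxHeapMegabytes();
        System.out.println(mb < 0 ? "max heap: no limit" : "max heap: " + mb + " MB");
    }
}
```

Running `java -Xmx256m HeapReport` should print a value close to 256 MB; the exact figure varies by JVM. Note that newer JVMs typically size the default heap from physical memory rather than using a fixed 64Mb.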


> I've noted that using xml documents of about 30K in size, eXist
> falls over at anything over and above about 30 000 resources, whereas 
> Xindice 1.0 AND 1.1b happily perform up to about 100 000
> documents.
>
> Your test data of 5MB documents would have missed this important note, 
> and I still question the reasoning in having a 5MB xml document anyway 
> - usually this is symptomatic of a design problem. (IMHO)


I think you are right. Xindice FAQ, question #2:


    2. What is Xindice not?

    Xindice is not a persistent DOM implementation. It was not designed to
    store and manage single monster sized documents, where one document is
    treated as a set of mini documents. It was specifically designed for
    managing many small to medium sized documents.
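For readers who do want to follow the FAQ's advice, a large document can be split into many small ones before it is stored. This is a minimal sketch using only the JDK's DOM and Transformer APIs, assuming each element child of the root (e.g. each act of a play) is a sensible unit on its own. The class and method names are hypothetical, and storing each part into a Xindice collection is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: one small document per element child of the root.
class DocumentSplitter {
    static List<String> splitByTopLevelChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        List<String> parts = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, etc.
            }
            // Serialize just this subtree as a standalone document.
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(n), new StreamResult(out));
            parts.add(out.toString());
        }
        return parts;
    }
}
```

As Werner and Murray point out in this thread, splitting is not always appropriate: the parts no longer share one ID namespace, so IDREFs that point across parts cannot be resolved within a single stored document.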



OTOH, it would be interesting to know what amount of memory was given to
Xindice during this test (my bet is that it was the JVM default,
64Mb), and what amount of memory would be enough to pass the test
without OOME. But Werner has not gone this far, so you won't see a
memory requirements comparison for the databases.



> WH
>
> Werner Frieb wrote:
>
>>
>> Hi list members !
>>
>> I've just finished a report on XML databases
>> http://www.studierstube.org/world/xml_databases_compared.html
>> where I've tried to evaluate and compare Tamino - Xindice - eXist.
>>
>> Please, let me know what you think and
>


Also, I noticed that you referenced the old Xindice 1.0
documentation. All documentation which is work in progress and which is
relevant to Xindice 1.1 is located under the dev/ folder (dev tab in the
navigation).

Hope this clears up the version mess you had with the documentation.

Regards,
Vadim



>> send a copy of your answer directly to my e-mail address 
>> (w.frieb@gmx.net),
>> so that I don't miss it.
>>
>> Merry Christmas !
>>
>> Werner.
>



Re: Report on XML Databases ready

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Wolfram Horwath wrote:

>
> Vadim Gritsenko wrote:
>
>> Wolfram Horwath wrote:
>>
>>> In the DB I have 100 documents, each about 2.6K in size. I found that
>>> the conversion was rather slow and tried to improve performance. I
>>> reduced the opening/closing of a Collection, and reduced database
>>> queries by caching often-needed results. As it was still slow, I
>>> thought it might be the writing of files to disk.
>>>
>>> I then remembered someone's hint to have a look at eXist and, as
>>> both use XML:DB, switched databases for testing purposes. What I
>>> found neat was the way I could access the results of a query with
>>> eXist, but then came the drawback: while the conversion using
>>> Xindice took about 140s, eXist needed about 20s, which I find
>>> puzzling, as the documents are really small.
>>>
>>> These are just my observations, any comments?
>>
>>
>>
>> Unit tests in Xindice insert 10000 documents and then read them all
>> back in just several seconds, which is fast enough for me.
>
>
> So, what could be causing it to be so slow in my task?


I don't know - I don't have your task, and I don't have your environment.

Can you take a look at the 
org.apache.xindice.integration.client.basic.DocumentTest, 
org.apache.xindice.core.filer.FilerTestBase.testInsertManyDocuments(), 
and come up with some test (you can add it to DocumentTest or 
XMLResourceTest) which would be similar to your task (and as slow as 
your task)?

You can then run your test, and see how it works for you. Simply follow 
these steps:
  * build
  * xindice start
  * build test
Then read the HTML reports in build/test-report.

Once there is a test, it will be possible to determine why it is so
slow for you.


PS Wild guess: do you have DTD declarations in your XML pointing to a
DTD located on an external server, so that every time a document is
parsed, the parser goes over the network for the DTD? You can easily
detect this by unplugging from the network.
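If that guess turns out to be right, one common remedy in client code is to stop the parser from fetching the DTD at all. This is a minimal sketch with the JDK's DOM parser and a SAX `EntityResolver` that answers every external entity (including the DOCTYPE's system id) with an empty stream. It assumes the documents do not rely on entities or default attributes declared in the DTD, since the DTD is effectively treated as empty. The class name `OfflineParser` and the DTD URL are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses XML without ever going to the network for DTDs.
class OfflineParser {
    static Document parseOffline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // validation would need the real DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve every external entity to an empty stream instead of fetching it.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE play SYSTEM \"http://dtd.example.invalid/play.dtd\">\n"
                + "<play><title>Hamlet</title></play>";
        System.out.println(parseOffline(xml).getDocumentElement().getTagName());
    }
}
```

If parsing with this resolver is dramatically faster than without it, network DTD lookups were the bottleneck.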


Vadim


Re: Report on XML Databases ready

Posted by Wolfram Horwath <wo...@innovations.de>.

Vadim Gritsenko wrote:

> Wolfram Horwath wrote:
>
>> Hi!
>>
>> Maybe my observation regarding performance can be of interest...
>>
>> I am currently working on a project for my studies, where the
>> documents I work on are stored in an XML database, which is Xindice.
>>
>> The task I was recently working on was converting the documents
>> stored in the DB to another XML format, which is done using Java
>> (yes, I tried XSLT but came to the conclusion that XSLT could not
>> do this transformation).
>>
>> In the DB I have 100 documents, each about 2.6K in size. I found that the
>> conversion was rather slow and tried to improve performance. I reduced
>> the opening/closing of a Collection, and reduced database queries by
>> caching often-needed results. As it was still slow, I thought it
>> might be the writing of files to disk.
>>
>> I then remembered someone's hint to have a look at eXist and, as
>> both use XML:DB, switched databases for testing purposes. What I
>> found neat was the way I could access the results of a query with
>> eXist, but then came the drawback: while the conversion using Xindice
>> took about 140s, eXist needed about 20s, which I find puzzling, as
>> the documents are really small.
>>
>> These are just my observations, any comments?
>
>
>
> Unit tests in Xindice insert 10000 documents and then read them all
> back in just several seconds, which is fast enough for me.

So, what could be causing it to be so slow in my task?

Wolfram


Re: Report on XML Databases ready

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Wolfram Horwath wrote:

> Hi!
>
> Maybe my observation regarding performance can be of interest...
>
> I am currently working on a project for my studies, where the
> documents I work on are stored in an XML database, which is Xindice.
>
> The task I was recently working on was converting the documents stored
> in the DB to another XML format, which is done using Java (yes, I
> tried XSLT but came to the conclusion that XSLT could not do this
> transformation).
>
> In the DB I have 100 documents, each about 2.6K in size. I found that the
> conversion was rather slow and tried to improve performance. I reduced
> the opening/closing of a Collection, and reduced database queries by
> caching often-needed results. As it was still slow, I thought it might
> be the writing of files to disk.
>
> I then remembered someone's hint to have a look at eXist and, as
> both use XML:DB, switched databases for testing purposes. What I
> found neat was the way I could access the results of a query with
> eXist, but then came the drawback: while the conversion using Xindice
> took about 140s, eXist needed about 20s, which I find puzzling, as the
> documents are really small.
>
> These are just my observations, any comments?


Unit tests in Xindice insert 10000 documents and then read them all
back in just several seconds, which is fast enough for me.

Vadim


Re: Report on XML Databases ready

Posted by Wolfram Horwath <wo...@innovations.de>.
Hi!

Maybe my observation regarding performance can be of interest...

I am currently working on a project for my studies, where the documents
I work on are stored in an XML database, which is Xindice.

The task I was recently working on was converting the documents stored
in the DB to another XML format, which is done using Java (yes, I
tried XSLT but came to the conclusion that XSLT could not do this
transformation).

In the DB I have 100 documents, each about 2.6K in size. I found that the
conversion was rather slow and tried to improve performance. I reduced
the opening/closing of a Collection, and reduced database queries by
caching often-needed results. As it was still slow, I thought it might
be the writing of files to disk.
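The second optimization mentioned here, caching often-needed query results, can be sketched as a small memoizing wrapper. This is an illustration only: `QueryCache` is hypothetical, and the `runQuery` function stands in for whatever actually queries the database (for instance an XPath query issued through the XML:DB API). It only helps if the same query text is issued repeatedly and the stored documents do not change during the conversion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizing wrapper around an expensive query function.
class QueryCache {
    private final Map<String, String> results = new HashMap<>();
    private final Function<String, String> runQuery; // stand-in for the real database query
    private int misses = 0;

    QueryCache(Function<String, String> runQuery) {
        this.runQuery = runQuery;
    }

    /** Returns the cached result for this query text, running the query only on first use. */
    String get(String query) {
        String cached = results.get(query);
        if (cached == null) {
            misses++;
            cached = runQuery.apply(query);
            results.put(query, cached);
        }
        return cached;
    }

    /** Number of times the underlying query actually ran. */
    int misses() {
        return misses;
    }
}
```

Counting misses makes it easy to confirm whether caching is actually removing queries; if the miss count stays close to the number of lookups, the time is being spent elsewhere.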

I then remembered someone's hint to have a look at eXist and, as both
use XML:DB, switched databases for testing purposes. What I found
neat was the way I could access the results of a query with eXist, but
then came the drawback: while the conversion using Xindice took about
140s, eXist needed about 20s, which I find puzzling, as the documents
are really small.

These are just my observations, any comments?

Greetings,

Wolfram Horwath

webhiker@tiscali.fr wrote:

> One thing you forgot to test, and hence include in your report,
> was performance when using non "large" documents.
> In writing the XML:DB benchmarks (xmldbench.sourceforge.net) I've
> noted that using xml documents of about 30K in size, eXist
> falls over at anything over and above about 30 000 resources, whereas
> Xindice 1.0 AND 1.1b happily perform up to about 100 000
> documents.
>
> Your test data of 5MB documents would have missed this important note,
> and I still question the reasoning in having a 5MB xml document anyway
> - usually this is symptomatic of a design problem. (IMHO)
>
>
>
> WH
>
> Werner Frieb wrote:
>
>>
>> Hi list members !
>>
>> I've just finished a report on XML databases
>> http://www.studierstube.org/world/xml_databases_compared.html
>> where I've tried to evaluate and compare Tamino - Xindice - eXist.
>>
>> Please, let me know what you think and
>> send a copy of your answer directly to my e-mail address 
>> (w.frieb@gmx.net),
>> so that I don't miss it.
>>
>> Merry Christmas !
>>
>> Werner.
>>
>>
>
>


Re: Report on XML Databases ready

Posted by Murray Altheim <m....@open.ac.uk>.
webhiker@tiscali.fr wrote:
> One thing you forgot to test, and hence include in your report, was
> performance when using non "large" documents.
> In writing the XML:DB benchmarks (xmldbench.sourceforge.net) I've noted
> that using xml documents of about 30K in size, eXist
> falls over at anything over and above about 30 000 resources, whereas
> Xindice 1.0 AND 1.1b happily perform up to about 100 000
> documents.
> 
> Your test data of 5MB documents would have missed this important note, 
> and I still question the reasoning in having a 5MB xml document anyway - 
> usually this is symptomatic of a design problem. (IMHO)

I wouldn't take that generalization too seriously. XML happens to
be a common serialization format for a lot of content, and I've
commonly seen 25-100MB XML documents which I'd hardly characterize
as "design problems". E.g., the ITIS zoological database has chunks
already broken up from the bigger database, and each chunk is as big as
25MB. I assume the XML serializations of the Cyc ontology will be
very large, like 100MB or bigger. It really depends on the demands
of any specific application. It might be quite inappropriate to
break up certain documents into smaller pieces, especially if one
wants to conserve the ID namespace, etc.

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .



Re: Report on XML Databases ready

Posted by "webhiker@tiscali.fr" <we...@tiscali.fr>.
One thing you forgot to test, and hence include in your report, was
performance when using non "large" documents.
In writing the XML:DB benchmarks (xmldbench.sourceforge.net) I've noted
that using xml documents of about 30K in size, eXist
falls over at anything over and above about 30 000 resources, whereas
Xindice 1.0 AND 1.1b happily perform up to about 100 000
documents.

Your test data of 5MB documents would have missed this important note,
and I still question the reasoning in having a 5MB xml document anyway -
usually this is symptomatic of a design problem. (IMHO)



WH

Werner Frieb wrote:

>
> Hi list members !
>
> I've just finished a report on XML databases
> http://www.studierstube.org/world/xml_databases_compared.html
> where I've tried to evaluate and compare Tamino - Xindice - eXist.
>
> Please, let me know what you think and
> send a copy of your answer directly to my e-mail address 
> (w.frieb@gmx.net),
> so that I don't miss it.
>
> Merry Christmas !
>
> Werner.
>
>