Posted to xindice-users@xml.apache.org by Bent Andre Solheim <be...@bent-andre.com> on 2003/09/08 19:14:03 UTC

Xindice 1.1b2 stability, reliability and installation.

Hi users and developers,

I checked out Xindice 1.1b from CVS using the build-1_1_b2 tag. Was
unable to make it work properly. I then checked the working version from
CVS out and replaced the xindice-1.1.b2.jar from the b2 version with the
new one both on the server and on the commandline tool. Everything
appears to be ok.

However,

I have been performing serious tests of Xindice 1.0 the past weeks and
found it unreliable and unstable. I inserted some 3GB of data and tested
with several simultaneous accesses and heavy queries. On some occasions
the results from the queries were incorrect (returned 0), and finally,
what made me consider using the CVS version, the database corrupted some
of the inserted data and was unable to restart (hung).

I have tried to make the version tagged as build-1_1_b2 work with no
success. Got it to work with the newest version from CVS, but I am
reluctant to use a working-version of a database in a production
environment. The current scenario is inserting 500-3000 documents a day
varying from 20 to 200 MB total. These numbers will increase in time.

My question is, to both users and developers, how stable and reliable is
Xindice in its current state? Is it production ready? I'm very close to
selecting another database for my project, because it is not acceptable
to risk a corrupt database with a probability not very close to zero. I
originally wanted Xindice as my primary data store because it offers a
great deal of flexibility not offered by the alternatives. At the time
being, I do not dare using it this way, because I cannot afford losing
data or taking my service down for hours to restore from backup.

Any answers from anybody with experience in this area are greatly
appreciated!


best regards,


Bent André Solheim


Collection numbers: Here's what's going on

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Hello all,

First of all, I'm surprised other people haven't run into this kind of
problem. I have one collection with about 95 subcollections, each of which
has four subcollections. It kills my system after a little while because
it runs out of file descriptors. Java gives all sorts of errors about
having too many open files, after I've already increased the number to
Linux's system max of 1048576. Not only that, but on another occasion, it
somehow corrupted the database when I ran out of file descriptors, which
was making it appear as though the problem was something else. I thought
there was a concurrency problem because one of the collections was
corrupted, but it appears as though this is it. Has anyone else dealt with
this? Can I please take a poll of how many collections people have and how
many documents are in each? I mean, most databases can handle hundreds of
thousands of records for tables, so I don't know what to do here.

Thanks,
David
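
The layout described above (one collection, about 95 subcollections, four subcollections under each) comes to 476 collections in total. A minimal Java sketch for checking that arithmetic and for watching the JVM's descriptor usage is given here. It assumes a Java 5 or later JVM on a Unix-like host, where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`; the names `DescriptorCheck`, `collectionCount`, `openDescriptors`, and `maxDescriptors` are made up for illustration. How many descriptors Xindice opens per collection is not stated in this thread, so the sketch does not guess at it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class DescriptorCheck {

    /** Total collections in a two-level tree: the root, its children, and their children. */
    static long collectionCount(int children, int grandchildrenPerChild) {
        return 1L + children + (long) children * grandchildrenPerChild;
    }

    /** Open descriptors for this JVM, or -1 when the platform bean does not expose them. */
    static long openDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Descriptor limit for this JVM, or -1 when the platform bean does not expose it. */
    static long maxDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("collections: " + collectionCount(95, 4));
        System.out.println("open descriptors: " + openDescriptors());
        System.out.println("descriptor limit: " + maxDescriptors());
    }
}
```

Logging `openDescriptors()` before and after opening a batch of collections would show whether the count grows with the number of collections or with the number of open/close cycles, which may help separate a per-collection cost from a leak.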


Threading and Collections

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Hello all,

I've been working on a problem for quite a while now and I would greatly
appreciate any insight. I'm using 1.1 b1 with the embed driver with jdk
1.4.1.

Essentially, I have one class that queries the database. It instantiates
it as such (exception handling removed):

Collection sc = null;
Database maindb = null;

String driver = "org.apache.xindice.client.xmldb.embed.DatabaseImpl";
Class c = Class.forName(driver);

// Instantiate the embedded driver and register it with the XML:DB manager.
maindb = (Database) c.newInstance();
DatabaseManager.registerDatabase(maindb);

// collectionName stands for the [collection_name] placeholder.
sc = DatabaseManager.getCollection("xmldb:xindice-embed:///db/" + collectionName);


I have many threads trying to do this at once, and then querying it with
something like sc.listResources();.

However, if one set of threads is using one collection, and another set
of threads is using another, they conflict and I keep getting a
NullPointerException. Do I need to make some sort of mutex locking to
ensure that only one collection is open at once? I was under the
impression that this wasn't so, but it appears otherwise. Thank you for
your advice.

Kind regards,
David
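
Whether the embedded driver tolerates being instantiated and registered from many threads is not settled in this thread. One thing worth ruling out is every thread creating and registering its own `DatabaseImpl`. The sketch below uses a hypothetical `SharedHandle` helper that runs a factory at most once and hands the same instance to every thread; in the code above, the factory would wrap the `Class.forName` and `registerDatabase` steps. It is a workaround to try under that assumption, not a confirmed fix.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: runs the factory at most once and shares the result across threads. */
final class SharedHandle<T> {
    private final Callable<T> factory;
    private T instance;      // guarded by "this"
    private int created;     // number of times the factory actually ran

    SharedHandle(Callable<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use. */
    synchronized T get() throws Exception {
        if (instance == null) {
            instance = factory.call();
            created++;
        }
        return instance;
    }

    synchronized int timesCreated() {
        return created;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in factory; with Xindice this is where the driver would be
        // instantiated and registered (assumption, not tested against Xindice).
        final SharedHandle<Object> handle = new SharedHandle<Object>(Object::new);
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    handle.get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("factory ran " + handle.timesCreated() + " time(s)");
    }
}
```

A plain `synchronized` accessor is used instead of double-checked locking because it is correct on any JVM version and the lock is uncontended after the first call.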




On Thu, 11 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Well, I think I've *almost* reproduced the problem consistently. I'm
> >afraid it has something to do with threading and having multiple
> >concurrent instances of a collection. It always seems to fail when trying
> >to instantiate a new CollectionManagementService, which returns null if
> >another open database inside another thread has already gotten a service.
> >
> >
>
> So, if I set off two-three threads simultaneously to get collection
> management service, it should fail? Good candidate for the unit test then.
>
>
> >Any ideas at all?
> >
> >
>
> No, I'm not that deep into xindice internals yet.
>
> Vadim
>
>
>



Re: Threading and CollectionManagementService

Posted by Vadim Gritsenko <va...@verizon.net>.
David J. Thomson wrote:

>Well, I think I've *almost* reproduced the problem consistently. I'm
>afraid it has something to do with threading and having multiple
>concurrent instances of a collection. It always seems to fail when trying
>to instantiate a new CollectionManagementService, which returns null if
>another open database inside another thread has already gotten a service.
>  
>

So, if I set off two-three threads simultaneously to get collection 
management service, it should fail? Good candidate for the unit test then.


>Any ideas at all?
>  
>

No, I'm not that deep into xindice internals yet.

Vadim
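
The unit test suggested above could start from a small harness like the following sketch. `ConcurrentProbe` and `countFailures` are hypothetical names, and the task in `main` is a stand-in; against Xindice the task would presumably be a call such as `collection.getService("CollectionManagementService", "1.0")` from the XML:DB API, which is an assumption here and has not been run against any Xindice build.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical harness: starts several threads at the same moment and counts failed calls. */
final class ConcurrentProbe {

    /** Runs the task once per thread; a null result or an exception counts as a failure. */
    static <T> int countFailures(final Callable<T> task, int threads) throws InterruptedException {
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);
        final AtomicInteger failures = new AtomicInteger();

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();                 // hold every thread at the gate
                    if (task.call() == null) {
                        failures.incrementAndGet();
                    }
                } catch (Exception e) {
                    failures.incrementAndGet();    // an exception counts as a failure too
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown();                         // release all threads together
        done.await();                              // wait until every thread has finished
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in task that always succeeds; replace with the call under test.
        int failures = countFailures(() -> "service", 3);
        System.out.println("failures: " + failures);
    }
}
```

The start latch matters: without it the threads tend to run one after another, and a race that only shows up under true overlap may never trigger.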



Threading and CollectionManagementService

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
On Wed, 10 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Also, in my performance testing, I'm trying to create subcollections in a
> >nested loop. Is there some reason why the subcollections are becoming
> >corrupted this way? I'm closing the main collection before trying to open
> >it and create a subcollection under it, but it doesn't seem to matter.
> >
> >BTW, this is all with the embedded version, in case it matters.
> >
> >
>
> If you can consistently reproduce the problem with some simple java
> code, please go and file a bug with test case to bugzilla. If you know
> how to fix it -- don't go, run and file a patch to bugzilla! :)
>
> Vadim
>
>
>

Well, I think I've *almost* reproduced the problem consistently. I'm
afraid it has something to do with threading and having multiple
concurrent instances of a collection. It always seems to fail when trying
to instantiate a new CollectionManagementService, which returns null if
another open database inside another thread has already gotten a service.

Any ideas at all?

Thank you.

David


Re: Maximum number of collections + subcollections

Posted by Vadim Gritsenko <va...@verizon.net>.
David J. Thomson wrote:

>Also, in my performance testing, I'm trying to create subcollections in a
>nested loop. Is there some reason why the subcollections are becoming
>corrupted this way? I'm closing the main collection before trying to open
>it and create a subcollection under it, but it doesn't seem to matter.
>
>BTW, this is all with the embedded version, in case it matters.
>  
>

If you can consistently reproduce the problem with some simple java 
code, please go and file a bug with test case to bugzilla. If you know 
how to fix it -- don't go, run and file a patch to bugzilla! :)

Vadim



Re: Maximum number of collections + subcollections

Posted by Barzilai Spinak <ba...@internet.com.uy>.
David J. Thomson wrote:

>Thank you for the reply. I think you're right that it has a lot to do with
>memory and the particular server/machine running it. There's at least a
>problem with the max number of files that can be open at any given time,
>since it seems to need to open a file descriptor for each collection. I'm
>not sure what the max is for any given OS, but that's probably the answer.
>My dev Linux box seems to max out at a ulimit on open files of 1048576,
>but I'm not sure where that number comes from.
>
>  
>
I don't know if it means anything but  1024*1024=1048576  That is, 2^20

BarZ





Re: Maximum number of collections + subcollections

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Thank you for the reply. I think you're right that it has a lot to do with
memory and the particular server/machine running it. There's at least a
problem with the max number of files that can be open at any given time,
since it seems to need to open a file descriptor for each collection. I'm
not sure what the max is for any given OS, but that's probably the answer.
My dev Linux box seems to max out at a ulimit on open files of 1048576,
but I'm not sure where that number comes from.

Also, in my performance testing, I'm trying to create subcollections in a
nested loop. Is there some reason why the subcollections are becoming
corrupted this way? I'm closing the main collection before trying to open
it and create a subcollection under it, but it doesn't seem to matter.

BTW, this is all with the embedded version, in case it matters.

-David

 On Wed, 10 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Hello all,
> >
> >What is the maximum number of collections allowed? I read somewhere in the
> >archives that ~1000 documents was the limit per collection from a
> >performance standpoint, but any idea on the maximum number of collections?
> >
> >
>
> Collection stores list of sub collections in-memory, in the hash map.
> So, one limiter will be memory.
> Collection loads up list of sub collections on start-up. Another limiter
> will be startup time.
>
>
> >I'm thinking of needing a max of about 20,000,000, with relatively few
> >documents in each, i.e. nowhere near 1000 per collection. Is that even
> >close?
> >
> >
>
> Try it and tell us.
>
> Vadim
>
>
>


Re: Maximum number of collections

Posted by Vadim Gritsenko <va...@verizon.net>.
David J. Thomson wrote:

>Hello all,
>
>What is the maximum number of collections allowed? I read somewhere in the
>archives that ~1000 documents was the limit per collection from a
>performance standpoint, but any idea on the maximum number of collections?
>  
>

Collection stores list of sub collections in-memory, in the hash map. 
So, one limiter will be memory.
Collection loads up list of sub collections on start-up. Another limiter 
will be startup time.


>I'm thinking of needing a max of about 20,000,000, with relatively few
>documents in each, i.e. nowhere near 1000 per collection. Is that even
>close?
>  
>

Try it and tell us.

Vadim
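
A back-of-envelope sketch of the memory limiter for the 20,000,000 collections asked about. The per-entry cost is an assumption picked purely for illustration (100 bytes per hash-map entry, meant to cover the key, the value reference, and map overhead); the real figure depends on the JVM and on what Xindice keeps per subcollection, neither of which this thread states. `MapEstimate` and `estimateBytes` are hypothetical names.

```java
final class MapEstimate {

    /** Estimated heap bytes for the given number of map entries at an assumed cost per entry. */
    static long estimateBytes(long entries, long assumedBytesPerEntry) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(entries, assumedBytesPerEntry);
    }

    public static void main(String[] args) {
        long collections = 20_000_000L;   // figure from the question
        long bytesPerEntry = 100L;        // ASSUMPTION, not a measured value
        long bytes = estimateBytes(collections, bytesPerEntry);
        System.out.println(bytes + " bytes for the subcollection map alone");
    }
}
```

Under that assumed cost the map alone would need 2,000,000,000 bytes, before any documents or indexes are counted. Measuring the actual heap growth after creating a few thousand subcollections would replace the guess with a real number.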



Maximum number of collections

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Hello all,

What is the maximum number of collections allowed? I read somewhere in the
archives that ~1000 documents was the limit per collection from a
performance standpoint, but any idea on the maximum number of collections?
I'm thinking of needing a max of about 20,000,000, with relatively few
documents in each, i.e. nowhere near 1000 per collection. Is that even
close?

Thanks,
David



RE: Xindice 1.1b2 stability, reliability and installation.

Posted by Bent Andre Solheim <be...@bent-andre.com>.
> Vadim Gritsenko wrote:
> > Bent Andre Solheim wrote:
> [...]
> >> But back to my original question; do you have any opinion on how 
> >> stable Xindice is? Which release is the most stable? Can you 
> >> recommend it in a production environment?
> > 
> > I don't have answer on this, as I have not used 1.0 version. I'm using
> > 1.1 CVS lightly, don't have issues with it, but I don't have your
> > requirements also. I'd recommend you to test current CVS as it will be
> > candidate for the next beta (1.1b2), and, eventually, 1.1 final release.
>
> A note on Xindice 1.0, perhaps only anecdotal, but for what it's worth:
>
> I can't speak for performance, since that's not an issue for my
> application, but I've been successfully using Xindice 1.0 since it was
> released, and I've never had any data corruption problems. It seems
> pretty stable, stable enough that I'm not considering 1.1 until *it*
> has been around awhile, *and* I get some time to actually implement it.

1.0 worked great for me as well, but when I put it in an actual
production environment - and, as you say, tested under system load on
planned hardware - I encountered problems. The problems occurred when not
"treating" the database well: violently closing connections during heavy
queries and shutting the database down abruptly. It is a sad thing, but
such situations might occur in my system, so my database must deal with
them in some way. I'm not saying that the database should do miracles, but
some sort of error recovery must exist for cases where the database
encounters unexpected behavior. I'm going to keep testing the 1.1b2dev, and
will hopefully be able to contribute something to this project...

Thank you for your answer!


Regards
Bent
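
For the abrupt-shutdown case described above, one partial mitigation on the application side is a JVM shutdown hook that closes open handles before the process exits. A sketch follows; `GracefulClose` and `closeAll` are hypothetical names, and `AutoCloseable` stands in for whatever handles the application holds (an XML:DB `Collection` has a `close()` method but would need a small adapter). A hook runs on normal exit and on SIGTERM, not on `kill -9` or power loss, so it does not replace recovery inside the database itself.

```java
import java.util.ArrayList;
import java.util.List;

final class GracefulClose {

    /** Closes handles in reverse order of opening; returns how many closed without error. */
    static int closeAll(List<? extends AutoCloseable> handles) {
        int closed = 0;
        for (int i = handles.size() - 1; i >= 0; i--) {
            try {
                handles.get(i).close();
                closed++;
            } catch (Exception e) {
                // Keep going so one failing handle does not block the rest.
                System.err.println("close failed: " + e);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        final List<AutoCloseable> open = new ArrayList<AutoCloseable>();
        open.add(() -> System.out.println("closing handle"));   // stand-in resource

        // Register the hook once, early in startup.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> closeAll(open)));
        System.out.println("working...");
    }
}
```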


Re: Xindice 1.1b2 stability, reliability and installation.

Posted by Murray Altheim <m....@open.ac.uk>.
Vadim Gritsenko wrote:
> Bent Andre Solheim wrote:
[...]
>> But back to my original question; do you have any opinion on how stable
>> Xindice is? Which release is the most stable? Can you recommend it in a
>> production environment?
> 
> I don't have answer on this, as I have not used 1.0 version. I'm using 
> 1.1 CVS lightly, don't have issues with it, but I don't have your 
> requirements also. I'd recommend you to test current CVS as it will be 
> candidate for the next beta (1.1b2), and, eventually, 1.1 final release.

A note on Xindice 1.0, perhaps only anecdotal, but for what it's worth:

I can't speak for performance, since that's not an issue for my application,
but I've been successfully using Xindice 1.0 since it was released, and
I've never had any data corruption problems. It seems pretty stable,
stable enough that I'm not considering 1.1 until *it* has been around
awhile, *and* I get some time to actually implement it.

Since performance is usually a consideration in a production environment,
I can only say that testing with the planned hardware under a simulated
system load is probably your best bet.

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

           "The current and future international political
            environment severely constrains this country's
            ability to conduct long-range strike missions." -- DARPA
            http://news.bbc.co.uk/1/hi/world/americas/3035332.stm


RE: Xindice 1.1b2 stability, reliability and installation.

Posted by Bent Andre Solheim <be...@bent-andre.com>.
> >>>Hi users and developers,
> >>>
> >>>I checked out Xindice 1.1b from CVS using the build-1_1_b2 tag. Was
> >>>unable to make it work properly. I then checked the working version
> >>>from CVS out and replaced the xindice-1.1.b2.jar from the b2 version
> >>>with the new one both on the server and on the commandline tool.
> >>>Everything appears to be ok.
> >>
> >>There is no 1.1b2 release. There is 1.1b1 (see
> >>http://xml.apache.org/xindice/download.cgi), and there is current CVS
> >>which is 1.1b2-dev
> >>...
> >
> >There is a 1.1b2 tag in CVS.
>
> There might be a tag, yes, but no official release "1.1b2" has ever
> been made, to the best of my knowledge.
>
> >I checked out using this. Building went
> >fine, but I was unable to make it work properly.
>
> I know for sure that current CVS version works for me. You can update
> your checkout using:
> cvs -q -z3 update -PdA
>
> >The download page has
> >changed since I checked out, I think. Until recently, the download
> >page only contained the 1.0 version, if I am not mistaken.
>
> No, you are not, yes, it has been changed :)
> ...
>
> >But back to my original question; do you have any opinion on how stable
> >Xindice is? Which release is the most stable? Can you recommend it in a
> >production environment?
>
> I don't have answer on this, as I have not used 1.0 version. I'm using
> 1.1 CVS lightly, don't have issues with it, but I don't have your
> requirements also. I'd recommend you to test current CVS as it will be
> candidate for the next beta (1.1b2), and, eventually, 1.1 final release.
>
> Vadim
>

Thank you for your answer, Vadim. I will continue testing the 1.1b2dev
and will hopefully be able to contribute some bug reports / bug fixes as
testing progresses.

best regards
Bent


Re: Xindice 1.1b2 stability, reliability and installation.

Posted by Vadim Gritsenko <va...@verizon.net>.
Bent Andre Solheim wrote:

>>-----Original Message-----
>>From: Vadim Gritsenko [mailto:vadim.gritsenko@verizon.net] 
>>Sent: 9. september 2003 14:17
>>To: xindice-users@xml.apache.org
>>Subject: Re: Xindice 1.1b2 stability, reliability and installation.
>>
>>
>>Bent Andre Solheim wrote:
>>
>>>Hi users and developers,
>>>
>>>I checked out Xindice 1.1b from CVS using the build-1_1_b2 tag. Was
>>>unable to make it work properly. I then checked the working version
>>>from CVS out and replaced the xindice-1.1.b2.jar from the b2 version
>>>with the new one both on the server and on the commandline tool.
>>>Everything appears to be ok.
>>
>>There is no 1.1b2 release. There is 1.1b1 (see
>>http://xml.apache.org/xindice/download.cgi), and there is current CVS
>>which is 1.1b2-dev
>>...
>>
>
>There is a 1.1b2 tag in CVS.
>

There might be a tag, yes, but no official release "1.1b2" has ever 
been made, to the best of my knowledge.


>I checked out using this. Building went
>fine, but I was unable to make it work properly.
>

I know for sure that current CVS version works for me. You can update 
your checkout using:
cvs -q -z3 update -PdA


>The download page has
>changed since I checked out, I think. Until recently, the download page
>only contained the 1.0 version, if I am not mistaken.
>  
>

No, you are not, yes, it has been changed :)
...


>But back to my original question; do you have any opinion on how stable
>Xindice is? Which release is the most stable? Can you recommend it in a
>production environment?
>  
>

I don't have answer on this, as I have not used 1.0 version. I'm using 
1.1 CVS lightly, don't have issues with it, but I don't have your 
requirements also. I'd recommend you to test current CVS as it will be 
candidate for the next beta (1.1b2), and, eventually, 1.1 final release.

Vadim



RE: Xindice 1.1b2 stability, reliability and installation.

Posted by Bent Andre Solheim <be...@bent-andre.com>.


/Bent



> -----Original Message-----
> From: Vadim Gritsenko [mailto:vadim.gritsenko@verizon.net] 
> Sent: 9. september 2003 14:17
> To: xindice-users@xml.apache.org
> Subject: Re: Xindice 1.1b2 stability, reliability and installation.
> 
> 
> Bent Andre Solheim wrote:
> 
> >Hi users and developers,
> >
> >I checked out Xindice 1.1b from CVS using the build-1_1_b2 tag. Was 
> >unable to make it work properly. I then checked the working version 
> >from CVS out and replaced the xindice-1.1.b2.jar from the b2 version 
> >with the new one both on the server and on the commandline tool. 
> >Everything appears to be ok.
> >  
> >
> 
> There is no 1.1b2 release. There is 1.1b1 (see 
> http://xml.apache.org/xindice/download.cgi), and there is current CVS 
> which is 1.1b2-dev
> ...

There is a 1.1b2 tag in CVS. I checked out using this. Building went
fine, but I was unable to make it work properly. The download page has
changed since I checked out, I think. Until recently, the download page
only contained the 1.0 version, if I am not mistaken.


> 
> 
> >My question is, to both users and developers, how stable and reliable
> >is Xindice in its current state? Is it production ready? I'm very close
> >to selecting another database for my project, because it is not
> >acceptable risking a corrupt database with a probability not very close
> >to zero. I originally wanted Xindice as my primary data store because
> >it offers a great deal of flexibility not offered by the alternatives.
> >At the time being, I do not dare using it this way, because I cannot
> >afford losing data or taking my service down for hours to restore from
> >backup.
> >  
> >
> 
> Why don't you perform your tests against latest CVS version? If you
> find corruption of the data, you are very welcome to file a bug into
> Bugzilla. Currently, I don't see any open data corruption issues [1]
> 
> Vadim

I would very much like to spend time testing the newest version, but
when I first started using the database, only 1.0 was available. And my
project requires having a stable and reliable datastore. Most of the
time, the official releases are the most stable ones, so I did my test
against 1.0.

I did not file a bug because I was unable to trace the cause of the
database hanging, but it started after shutting the database down
brutally, and the bug
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22775 sounds very
familiar.

But back to my original question; do you have any opinion on how stable
Xindice is? Which release is the most stable? Can you recommend it in a
production environment?


regards
Bent
> 
> [1]
> http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&product=Xindice&cmdtype=doit



Re: Xindice 1.1b2 stability, reliability and installation.

Posted by Vadim Gritsenko <va...@verizon.net>.
Bent Andre Solheim wrote:

>Hi users and developers,
>
>I checked out Xindice 1.1b from CVS using the build-1_1_b2 tag. Was
>unable to make it work properly. I then checked the working version from
>CVS out and replaced the xindice-1.1.b2.jar from the b2 version with the
>new one both on the server and on the commandline tool. Everything
>appears to be ok.
>  
>

There is no 1.1b2 release. There is 1.1b1 (see 
http://xml.apache.org/xindice/download.cgi), and there is current CVS 
which is 1.1b2-dev
...


>My question is, to both users and developers, how stable and reliable is
>Xindice in its current state? Is it production ready? I'm very close to
>selecting another database for my project, because it is not acceptable
>risking a corrupt database with a probability not very close to zero. I
>originally wanted Xindice as my primary data store because it offers a
>great deal of flexibility not offered by the alternatives. At the time
>being, I do not dare using it this way, because I cannot afford losing
>data or taking my service down for hours to restore from backup.
>  
>

Why don't you perform your tests against latest CVS version? If you find 
corruption of the data, you are very welcome to file a bug into 
Bugzilla. Currently, I don't see any open data corruption issues [1]

Vadim

[1] 
http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&product=Xindice&cmdtype=doit