Posted to xindice-users@xml.apache.org by "David J. Thomson" <dt...@eecs.tufts.edu> on 2003/09/10 16:35:00 UTC

Re: Maximum number of collections + subcollections

Thank you for the reply. I think you're right that it has a lot to do with
memory and the particular server/machine running it. There's at least a
problem with the max number of files that can be open at any given time,
since it seems to need to open a file descriptor for each collection. I'm
not sure what the max is for any given OS, but that's probably the answer.
My dev Linux box seems to max out at a ulimit on open files of 1048576,
but I'm not sure where that number comes from.
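
One way to watch how close the JVM gets to that limit is to ask the JVM itself. The sketch below is not from the original thread. It assumes a HotSpot/OpenJDK-style JVM on a Unix-like system (Java 5 or later), where the platform OperatingSystemMXBean also implements com.sun.management.UnixOperatingSystemMXBean. The class name FdUsage is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {
    /** Returns {open, max} descriptor counts, or null if this JVM does not expose them. */
    public static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: the Unix-specific extension is present (HotSpot/OpenJDK on Linux, macOS, etc.).
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, or the extension is unavailable
    }

    public static void main(String[] args) {
        long[] fds = snapshot();
        if (fds == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + fds[0] + " max=" + fds[1]);
        }
    }
}
```

Logging this snapshot before and after opening each collection would show whether the open count really grows with the number of collections, and by how much per collection.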

Also, in my performance testing, I'm trying to create subcollections in a
nested loop. Is there some reason why the subcollections are becoming
corrupted this way? I'm closing the main collection before trying to open
it and create a subcollection under it, but it doesn't seem to matter.

BTW, this is all with the embedded version, in case it matters.

-David

 On Wed, 10 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Hello all,
> >
> >What is the maximum number of collections allowed? I read somewhere in the
> >archives that ~1000 documents was the limit per collection from a
> >performance standpoint, but any idea on the maximum number of collections?
> >
> >
>
> A collection stores its list of subcollections in memory, in a hash map.
> So, one limiter will be memory.
> A collection loads its list of subcollections on startup. Another limiter
> will be startup time.
>
>
> >I'm thinking of needing a max of about 20,000,000, with relatively few
> >documents in each, i.e. nowhere near 1000 per collection. Is that even
> >close?
> >
> >
>
> Try it and tell us.
>
> Vadim
>
>
>


Collection numbers: Here's what's going on

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Hello all,

First of all, I'm surprised other people haven't run into this kind of
problem. I have one collection with about 95 subcollections, each of which
has four subcollections. It kills my system after a little while because
it runs out of file descriptors. Java gives all sorts of errors about
having too many open files, after I've already increased the number to
Linux's system max of 1048576. Not only that, but on another occasion, it
somehow corrupted the database when I ran out of file descriptors, which
was making it appear as though the problem was something else. I thought
there was a concurrency problem because one of the collections was
corrupted, but it appears as though this is it. Has anyone else dealt with
this? Can I please take a poll of how many collections people have and how
many documents are in each? I mean, most databases can handle hundreds of
thousands of records for tables, so I don't know what to do here.

Thanks,
David


Threading and Collections

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
Hello all,

I've been working on a problem for quite a while now and I would greatly
appreciate any insight. I'm using Xindice 1.1b1 with the embedded driver on
JDK 1.4.1.

Essentially, I have one class that queries the database. It sets up the
connection like this (exception handling removed):

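import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;

Collection sc = null;
Database maindb = null;

// Load and register the embedded Xindice driver.
String driver = "org.apache.xindice.client.xmldb.embed.DatabaseImpl";
Class c = Class.forName(driver);
maindb = (Database) c.newInstance();
DatabaseManager.registerDatabase(maindb);

// collectionName stands in for the name of the collection being opened.
sc = DatabaseManager.getCollection("xmldb:xindice-embed:///db/" + collectionName);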


I have many threads trying to do this at once, and then querying it with
something like sc.listResources();.

However, if one set of threads is using one collection, and another set
of threads is using another, they conflict and I keep getting a
NullPointerException. Do I need to make some sort of mutex locking to
ensure that only one collection is open at once? I was under the
impression that this wasn't so, but it appears otherwise. Thank you for
your advice.
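
Whether locking is actually required here was not settled in this thread. As a stopgap while the cause is unknown, one option is to funnel every open call through a single shared lock. The following is a minimal sketch under that assumption, written for a modern JDK rather than 1.4.1. SerializedOpener and the opener callback are hypothetical names. In real use the callback would wrap something like DatabaseManager.getCollection(uri).

```java
import java.util.concurrent.Callable;

public final class SerializedOpener {
    // One lock shared by every thread in the JVM.
    private static final Object OPEN_LOCK = new Object();

    private SerializedOpener() {}

    /** Runs the given open action while holding the shared lock. */
    public static <T> T open(Callable<T> opener) throws Exception {
        synchronized (OPEN_LOCK) {
            return opener.call();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for DatabaseManager.getCollection(...): returns the URI it was asked to open.
        String uri = "xmldb:xindice-embed:///db/example";
        String opened = open(() -> uri);
        System.out.println("opened " + opened);
    }
}
```

This only serializes the open step. If the conflict happens during queries instead, those calls would need the same treatment, at the cost of losing most of the concurrency.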

Kind regards,
David




On Thu, 11 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Well, I think I've *almost* reproduced the problem consistently. I'm
> >afraid it has something to do with threading and having multiple
> >concurrent instances of a collection. It always seems to fail when trying
> >to instantiate a new CollectionManagementService, which returns null if
> >another open database inside another thread has already gotten a service.
> >
> >
>
> So, if I set off two-three threads simultaneously to get collection
> management service, it should fail? Good candidate for the unit test then.
>
>
> >Any ideas at all?
> >
> >
>
> No, I'm not that deep into xindice internals yet.
>
> Vadim
>
>
>



Re: Threading and CollectionManagementService

Posted by Vadim Gritsenko <va...@verizon.net>.
David J. Thomson wrote:

>Well, I think I've *almost* reproduced the problem consistently. I'm
>afraid it has something to do with threading and having multiple
>concurrent instances of a collection. It always seems to fail when trying
>to instantiate a new CollectionManagementService, which returns null if
>another open database inside another thread has already gotten a service.
>  
>

So, if I set off two-three threads simultaneously to get collection 
management service, it should fail? Good candidate for the unit test then.
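
A test along those lines could release several threads at the same instant and count how many of them get null back. The sketch below is not Xindice code: ConcurrentNullCheck and its Supplier argument are hypothetical, and in a real test the supplier would open a collection and request its CollectionManagementService. It assumes a modern JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class ConcurrentNullCheck {
    /** Calls the supplier from `threads` threads at once; returns how many calls produced null. */
    public static <T> int countNulls(int threads, Supplier<T> supplier) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch start = new CountDownLatch(1);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    ready.countDown();
                    start.await();               // hold until every worker is ready
                    return supplier.get() == null;
                }));
            }
            ready.await();
            start.countDown();                   // release all workers together
            int nulls = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    nulls++;
                }
            }
            return nulls;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in supplier that never returns null, so the count here is 0.
        int nulls = countNulls(3, Object::new);
        System.out.println("null results: " + nulls);
    }
}
```

A non-zero count with the real supplier would reproduce the behaviour described above and make a reasonable attachment for a bug report.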


>Any ideas at all?
>  
>

No, I'm not that deep into xindice internals yet.

Vadim



Threading and CollectionManagementService

Posted by "David J. Thomson" <dt...@eecs.tufts.edu>.
On Wed, 10 Sep 2003, Vadim Gritsenko wrote:

> David J. Thomson wrote:
>
> >Also, in my performance testing, I'm trying to create subcollections in a
> >nested loop. Is there some reason why the subcollections are becoming
> >corrupted this way? I'm closing the main collection before trying to open
> >it and create a subcollection under it, but it doesn't seem to matter.
> >
> >BTW, this is all with the embedded version, in case it matters.
> >
> >
>
> If you can consistently reproduce the problem with some simple java
> code, please go and file a bug with test case to bugzilla. If you know
> how to fix it -- don't go, run and file a patch to bugzilla! :)
>
> Vadim
>
>
>

Well, I think I've *almost* reproduced the problem consistently. I'm
afraid it has something to do with threading and having multiple
concurrent instances of a collection. It always seems to fail when trying
to instantiate a new CollectionManagementService, which returns null if
another open database inside another thread has already gotten a service.

Any ideas at all?

Thank you.

David


Re: Maximum number of collections + subcollections

Posted by Vadim Gritsenko <va...@verizon.net>.
David J. Thomson wrote:

>Also, in my performance testing, I'm trying to create subcollections in a
>nested loop. Is there some reason why the subcollections are becoming
>corrupted this way? I'm closing the main collection before trying to open
>it and create a subcollection under it, but it doesn't seem to matter.
>
>BTW, this is all with the embedded version, in case it matters.
>  
>

If you can consistently reproduce the problem with some simple java
code, please go and file a bug with test case to bugzilla. If you know 
how to fix it -- don't go, run and file a patch to bugzilla! :)

Vadim



Re: Maximum number of collections + subcollections

Posted by Barzilai Spinak <ba...@internet.com.uy>.
David J. Thomson wrote:

>Thank you for the reply. I think you're right that it has a lot to do with
>memory and the particular server/machine running it. There's at least a
>problem with the max number of files that can be open at any given time,
>since it seems to need to open a file descriptor for each collection. I'm
>not sure what the max is for any given OS, but that's probably the answer.
>My dev Linux box seems to max out at a ulimit on open files of 1048576,
>but I'm not sure where that number comes from.
>
>  
>
I don't know if it means anything, but 1024*1024 = 1048576. That is, 2^20.

BarZ


