
Posted to xindice-dev@xml.apache.org by Cam Bazz <ca...@gmail.com> on 2007/07/04 07:47:54 UTC

btree on disk

Hello Developers,

I have been looking for a solid implementation of a Java BTree on disk for
the longest time. Would it be possible to use Xindice's BTree on disk
separately?
If so, where would be a good place to start? I have looked at the source
code, instantiated a new BTree and added values to it, but I got a null
pointer exception saying that the file was not opened.

Thanks to all,
-C.B.

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

Yeah, I am game. I will at least try.

Best,
-C.B.

On 7/11/07, Vadim Gritsenko <va...@reverycodes.com> wrote:
>
> Are you game? ;)
>
> Vadim

Re: btree on disk

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Cam Bazz wrote:
> This sounds interesting. Would it be too difficult to implement linear
> hashing for Xindice's current HashFiler?

There is no reason to implement linear hashing within the current HashFiler, but
it makes sense to implement a separate LinearHashFiler class. If you use
HashFiler's code as a starting point, it should not be too hard.

Are you game? ;)

Vadim

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

This sounds interesting. Would it be too difficult to implement linear
hashing for Xindice's current HashFiler?

On 7/11/07, Joel Nelson <jo...@gmail.com> wrote:
>
> True for Xindice's implementation of hashing, but if Xindice supported
> Linear Hashing, even a growing number of records would be OK.
>
> http://en.wikipedia.org/wiki/Linear_hash

Re: btree on disk

Posted by Joel Nelson <jo...@gmail.com>.

True for Xindice's implementation of hashing, but if Xindice supported
Linear Hashing, even a growing number of records would be OK.

http://en.wikipedia.org/wiki/Linear_hash

A lot of papers have been written about this; unfortunately, I lost
the PDFs I had a while ago. However, I think Berkeley DB (the well-known
embedded DB) implements linear hashing, and you should be able to find
more information by googling.
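A minimal in-memory sketch, in Java, of the linear hashing idea (Litwin's scheme): instead of rehashing the whole table when it fills up, the table grows one bucket at a time by splitting the bucket at a split pointer, so each growth step moves only one bucket's records. The class name, the initial bucket count of 4 and the split trigger (average load above 2 records per bucket) are assumptions for illustration; this is not Xindice's or Berkeley DB's code, and a real LinearHashFiler would map each bucket to a disk page with overflow pages.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory linear hashing table for long keys. Each bucket stands in
// for a disk page; a LinearHashFiler would keep buckets (and overflow pages) on disk.
final class LinearHashTable {
    private static final int INITIAL_BUCKETS = 4;   // N: bucket count at level 0 (assumed)
    private static final double MAX_LOAD = 2.0;     // split when records per bucket exceed this (assumed)

    private final List<List<Long>> buckets = new ArrayList<>();
    private int level = 0;  // number of completed doubling rounds
    private int next = 0;   // split pointer: next bucket to split in this round
    private int size = 0;

    LinearHashTable() {
        for (int i = 0; i < INITIAL_BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private static int hash(long key) {
        return Long.hashCode(key) & 0x7fffffff;     // non-negative hash
    }

    // Use h mod N*2^level; buckets left of the split pointer were already split,
    // so they use the next round's function h mod N*2^(level+1).
    private int address(long key) {
        int h = hash(key);
        int a = h % (INITIAL_BUCKETS << level);
        if (a < next) {
            a = h % (INITIAL_BUCKETS << (level + 1));
        }
        return a;
    }

    boolean contains(long key) {
        return buckets.get(address(key)).contains(key);
    }

    void insert(long key) {
        if (contains(key)) {
            return;
        }
        buckets.get(address(key)).add(key);
        size++;
        if ((double) size / buckets.size() > MAX_LOAD) {
            split();
        }
    }

    // Grow by one bucket: only the records in bucket `next` are rehashed and moved.
    private void split() {
        List<Long> old = buckets.get(next);
        buckets.set(next, new ArrayList<>());
        buckets.add(new ArrayList<>());              // lands at index next + N*2^level
        int mod = INITIAL_BUCKETS << (level + 1);
        for (long k : old) {
            buckets.get(hash(k) % mod).add(k);
        }
        next++;
        if (next == (INITIAL_BUCKETS << level)) {    // round complete: table has doubled
            level++;
            next = 0;
        }
    }

    int size() { return size; }

    int bucketCount() { return buckets.size(); }

    public static void main(String[] args) {
        LinearHashTable table = new LinearHashTable();
        for (long k = 0; k < 1000; k++) {
            table.insert(k);
        }
        System.out.println(table.size() + " records in " + table.bucketCount() + " buckets");
    }
}
```

Running `main` prints `1000 records in 500 buckets`: the table grew from 4 to 500 buckets without ever rehashing everything at once, which is why growth is less of a problem than with a fixed-size hash file.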

On 7/10/07, Natalia Shilenkova <ns...@gmail.com> wrote:
> While it is true, hash table is not always a preferred data structure
> because as more values are added, it will get collisions more often
> and will get slower.

Re: btree on disk

Posted by Natalia Shilenkova <ns...@gmail.com>.

On 7/10/07, Joel Nelson <jo...@gmail.com> wrote:
> > In other words, I have to check if a graph contains a node n, if it does not
> > write a new node, and if it does retrieve the node. I need to be able to do
> > this very fast. I have tried object oriented dbms, although they worked
> > nicely for graph things, they were very poor at identity search.
> >
>
> Remember what a btree is good at: requiring a minimal number of disk
> accesses to do range queries, e.g. "give me all nodes with a key
> between 54-98"
>
> If you do not need range queries, and you don't need to iterate over
> the nodes in some sorted order, there is no advantage I know of to
> using a b-tree, or any other tree structure.
>
> Instead, it sounds like you already know what exact key value to look
> for (there is no range). In this case, a hash table (and therefore the
> HashFiler) may be a lot faster for you. A hash table can be configured
> so that it almost always finds (or reports that it cannot find) your
> node with 1 disk access, versus a btree which can require 3-5.

While that is true, a hash table is not always the preferred data structure,
because as more values are added, it gets collisions more often and becomes
slower. I am not sure whether this appears anywhere in the Xindice
documentation, but the general guidelines for choosing between HashFiler
and BTreeFiler are the following:

1. If the number of records to be stored can be estimated and won't change
much, HashFiler is the way to go. In this case the filer can be configured
to allocate the optimal amount of space for fast data access.

2. If the number of records is unknown or likely to grow, or the hash
function does not work too well, BTreeFiler will show better performance.

This applies to any kind of data.
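A rough cost model, in Java, of why the first guideline depends on knowing the record count: if a hash file's bucket count is fixed when it is created, lookups stay near one page read while the data fits, but extra records spill into overflow chains. The layout (one primary page per bucket holding `perPage` records, with chained overflow pages) and the sequential integer keys, which spread perfectly evenly and are therefore a best case, are assumptions; this is a simplified model, not HashFiler's actual page format.

```java
// Rough cost model for a hash file whose bucket count is fixed up front: each bucket
// is one primary page holding `perPage` records, and extra records spill into chained
// overflow pages. Simplified; not HashFiler's actual page layout.
final class StaticHashCost {
    // Average page reads for a successful lookup, assuming nothing is cached.
    static double avgPageReads(int records, int buckets, int perPage) {
        int[] chainLength = new int[buckets];
        for (int k = 0; k < records; k++) {
            chainLength[Math.floorMod(Integer.hashCode(k), buckets)]++;
        }
        long totalReads = 0;
        for (int c : chainLength) {
            // the i-th record in a chain (0-based) lives on page i / perPage,
            // so reaching it costs i / perPage + 1 page reads
            for (int i = 0; i < c; i++) {
                totalReads += i / perPage + 1;
            }
        }
        return (double) totalReads / records;
    }

    public static void main(String[] args) {
        // 100 buckets x 10 records per page: sized for about 1,000 records
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(n + " records: " + avgPageReads(n, 100, 10) + " page reads");
        }
    }
}
```

With 100 buckets of 10 records each, `main` reports an average of 1.0 page reads for 1,000 records, 5.5 for 10,000 and 50.5 for 100,000: the cost grows linearly once the file outgrows its initial sizing.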

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

Hello Joel,

Thanks for the hints.

Yes, I am researching how to design my own file format. Since an adjacency
matrix does not scale well, and removals are problematic, I decided to
use an adjacency list representation of the graph.

I have tried the B-tree approach, and yes, it was really slow.

I mainly need two things:

a. being able to find things on disk by id.

b. being able to link things by referencing some physical id on disk.

    Let me explain that further. Let's say we have an edge (a, b) and we put
    it somewhere on disk. To find vertex b, I don't want to search a hash
    table, but rather just follow a link to b's location on disk.
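A minimal sketch, in Java, of point (b): if the store assigns node ids itself so that `id * NODE_SIZE` is a node's byte offset in a file, then finding a node by id (point a) is a single seek, and each edge can store its target's offset directly, so following an edge never touches a hash table. The two-file layout, the record sizes and the class name `DiskGraph` are hypothetical; node payloads, deletions, free-space reuse and crash safety are left out. If node ids come from outside (for example, strings), a hash or B-tree index from external id to offset would still be needed for (a).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical on-disk adjacency list. Node record: [firstEdgeOffset].
// Edge record: [targetNodeOffset, nextEdgeOffset]. -1 marks "no edge".
final class DiskGraph implements AutoCloseable {
    private static final int NODE_SIZE = 8;
    private static final long NIL = -1L;

    private final RandomAccessFile nodes;
    private final RandomAccessFile edges;

    DiskGraph(Path dir) throws IOException {
        nodes = new RandomAccessFile(dir.resolve("nodes.dat").toFile(), "rw");
        edges = new RandomAccessFile(dir.resolve("edges.dat").toFile(), "rw");
    }

    // Appends an empty node record; its id is its byte offset / NODE_SIZE.
    long addNode() throws IOException {
        long offset = nodes.length();
        nodes.seek(offset);
        nodes.writeLong(NIL);
        return offset / NODE_SIZE;
    }

    // Appends an edge record that stores the target's physical offset, then
    // makes it the head of the source node's edge chain.
    void addEdge(long from, long to) throws IOException {
        long fromOffset = from * NODE_SIZE;
        nodes.seek(fromOffset);
        long oldHead = nodes.readLong();
        long edgeOffset = edges.length();
        edges.seek(edgeOffset);
        edges.writeLong(to * NODE_SIZE);
        edges.writeLong(oldHead);
        nodes.seek(fromOffset);
        nodes.writeLong(edgeOffset);
    }

    // Walks the chain by seeking to stored offsets; no index lookup is involved.
    List<Long> neighbors(long node) throws IOException {
        List<Long> result = new ArrayList<>();
        nodes.seek(node * NODE_SIZE);
        long edge = nodes.readLong();
        while (edge != NIL) {
            edges.seek(edge);
            long targetOffset = edges.readLong();
            edge = edges.readLong();
            result.add(targetOffset / NODE_SIZE);
        }
        return result;
    }

    @Override
    public void close() throws IOException {
        nodes.close();
        edges.close();
    }

    public static void main(String[] args) throws IOException {
        try (DiskGraph g = new DiskGraph(Files.createTempDirectory("graph"))) {
            long a = g.addNode(), b = g.addNode(), c = g.addNode();
            g.addEdge(a, b);
            g.addEdge(a, c);
            System.out.println(g.neighbors(a));
        }
    }
}
```

Running `main` prints `[2, 1]`: edges are prepended to a node's chain, so neighbors come back newest first. Fixed-size records are what make an id usable as a physical address; variable-size node data would need a separate heap file referenced from the node record.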

Best Regards,

-C.B.


On 7/10/07, Joel Nelson <jo...@gmail.com> wrote:
> Here are some common structures used for storing graphs:
>
> http://en.wikipedia.org/wiki/Adjacency_list
> http://en.wikipedia.org/wiki/Adjacency_matrix
>
> Hope this helps.
>

Re: btree on disk

Posted by Joel Nelson <jo...@gmail.com>.

> In other words, I have to check if a graph contains a node n, if it does not
> write a new node, and if it does retrieve the node. I need to be able to do
> this very fast. I have tried object oriented dbms, although they worked
> nicely for graph things, they were very poor at identity search.
>

Remember what a B-tree is good at: requiring a minimal number of disk
accesses to answer range queries, e.g. "give me all nodes with a key
between 54 and 98".

If you do not need range queries, and you don't need to iterate over
the nodes in some sorted order, there is no advantage I know of to
using a B-tree, or any other tree structure.

Instead, it sounds like you already know the exact key value to look
for (there is no range). In this case, a hash table (and therefore the
HashFiler) may be a lot faster for you. A hash table can be configured
so that it almost always finds (or reports that it cannot find) your
node with one disk access, whereas a B-tree can require 3-5.
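A back-of-the-envelope check, in Java, of the one-access-versus-3-5 comparison: if every B-tree page is full and holds `fanout` entries, finding one of `n` records takes about ceil(log_fanout(n)) page reads when nothing is cached, while a well-sized hash file needs about one. The fanout values used here are illustrative assumptions; real fanout depends on page and key size, and the root and upper levels are often kept in memory, which narrows the gap.

```java
// Back-of-the-envelope B-tree depth: levels needed so that fanout^levels >= records,
// assuming every page is full. With a cold cache, each level is one page read.
final class BTreeDepth {
    static int levels(long records, int fanout) {
        if (fanout < 2) {
            throw new IllegalArgumentException("fanout must be at least 2");
        }
        if (records <= 1) {
            return 1;
        }
        int levels = 0;
        long reach = 1;   // records addressable with `levels` levels
        while (reach < records) {
            // saturate instead of overflowing for very large inputs
            reach = reach > Long.MAX_VALUE / fanout ? Long.MAX_VALUE : reach * fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(levels(1_000_000L, 100));
        System.out.println(levels(100_000_000L, 100));
        System.out.println(levels(1_000_000L, 10));
    }
}
```

For example, `levels(1_000_000L, 100)` is 3 and `levels(100_000_000L, 100)` is 4, while a small fanout of 10 pushes a million records to 6 levels.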

Also, depending on what you are trying to do with these graphs, there
are specialized data structures that may work even better than a hash
table. If you are willing, as you said, to create your own database, you
should not be afraid to investigate using your own file format.

Here are some common structures used for storing graphs:

http://en.wikipedia.org/wiki/Adjacency_list
http://en.wikipedia.org/wiki/Adjacency_matrix

Hope this helps.

Re: btree on disk

Posted by Natalia Shilenkova <ns...@gmail.com>.

On 7/10/07, Cam Bazz <ca...@gmail.com> wrote:
> Hello Natalia;
>
> If I were to put records inside a BTreeFiler in a way:
>
>
> filer.writeRecord(new Key(somekeyvalue),new Value(somevalue))
>
> and then search this filer based on somekeyvalue, would I have to use a
> ValueIndexer?

No, you do not need an indexer for that. Searching by key is the
fastest way to find a record. ValueIndexer is used when you store XML
and want to search data based on element/attribute values.

> A typical operation for adding nodes to a graph is:
>
> record r;
> if(filer.readRecord(new Key(keysearched))==null)
> {
>    r = new Record(new Key(keysearched), new Value(somevalue));
>    filer.writeRecord (r);
> }
> else
> {
>   r = filer.readRecord(new Key(keysearched));
> }
>
> In other words, I have to check if a graph contains a node n, if it does not
> write a new node, and if it does retrieve the node. I need to be able to do
> this very fast. I have tried object oriented dbms, although they worked
> nicely for graph things, they were very poor at identity search.

That would work. I would suggest changing it a bit, however, so that the
code does not read a record twice when the key is found: retrieving data
from the file system is pretty expensive.
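A minimal sketch, in Java, of the suggested change: read once, keep the result, and write only when the key is absent. `RecordStore` and `CountingStore` are hypothetical stand-ins, not Xindice types; in Xindice terms `read` plays the role of `filer.readRecord(...)` (which the snippet above treats as returning null for a missing key) and `write` the role of `filer.writeRecord(...)`. The store counts reads so the saving is visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal key/value store standing in for a Filer (not the Xindice API).
interface RecordStore {
    String read(String key);               // returns null when the key is absent
    void write(String key, String value);
}

// In-memory store that counts reads, so the cost of each lookup pattern is visible.
final class CountingStore implements RecordStore {
    private final Map<String, String> data = new HashMap<>();
    int reads = 0;

    @Override
    public String read(String key) {
        reads++;
        return data.get(key);
    }

    @Override
    public void write(String key, String value) {
        data.put(key, value);
    }
}

final class FindOrCreate {
    // Reads once and reuses the result; writes only when the key was absent.
    // Not atomic: with concurrent writers, guard the read and write with a lock.
    static String findOrCreate(RecordStore store, String key, String newValue) {
        String existing = store.read(key);
        if (existing != null) {
            return existing;
        }
        store.write(key, newValue);
        return newValue;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        findOrCreate(store, "node-1", "payload");  // absent: one read, one write
        findOrCreate(store, "node-1", "ignored");  // present: one read, no write
        System.out.println("reads = " + store.reads);
    }
}
```

Running `main` prints `reads = 2`; the original pattern needs three reads for the same two calls, because when the key exists it reads the record again after the existence check.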

> Any ideas, recommendations and help greatly appreciated,
> -C.B.

Natalia

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

Hello Natalia,

If I were to put records into a BTreeFiler like this:

filer.writeRecord(new Key(somekeyvalue), new Value(somevalue));

and then search this filer based on somekeyvalue, would I have to use a
ValueIndexer?

A typical operation for adding nodes to a graph is:

Key key = new Key(keysearched);
Record r;
if (filer.readRecord(key) == null)
{
   Value value = new Value(somevalue);
   filer.writeRecord(key, value);
   r = new Record(key, value);
}
else
{
   r = filer.readRecord(key);
}

In other words, I have to check whether a graph contains a node n: if it does
not, write a new node, and if it does, retrieve the node. I need to be able to
do this very fast. I have tried object-oriented DBMSs; although they worked
nicely for graph things, they were very poor at identity search.

Any ideas, recommendations and help are greatly appreciated,
-C.B.

Re: btree on disk

Posted by Natalia Shilenkova <ns...@gmail.com>.

On 7/9/07, Cam Bazz <ca...@gmail.com> wrote:
> Hello All,
>
> That was really helpful. Thanks.
>
> I also noticed FSFiler, and HashFiler. Are those for Indexing files?

Xindice allows data to be stored in several ways. Currently it has
BTreeFiler (based on a tree-like structure), HashFiler (based on a hash
table), FSFiler (uses the file system), MemFiler (in-memory), and
SizableMemFiler (in-memory). A document collection can be configured
to use any of those filers, depending on the collection's characteristics.

Additionally, a collection can have an index. There are currently three
indexers in Xindice: NameIndexer, ValueIndexer, and MemValueIndexer.

> I am working on a graph database project. I have tried several oodbms, and
> finally decided to build my own.
>
> Also, could it be possible to use xindice as a backend for Jena?

No idea, really. Never heard of Jena before.

Regards,
Natalia

Re: btree on disk

Posted by Gianugo Rabellino <gi...@apache.org>.

On 7/9/07, Cam Bazz <ca...@gmail.com> wrote:
> I am working on a graph database project. I have tried several oodbms, and
> finally decided to build my own.
>
> Also, could it be possible to use xindice as a backend for Jena?

Possible, I guess it is (but with lots of pain). Sensible, I don't
think so. Graphs are horrible beasts to deal with, and tree-based
storage doesn't look like the best starting point.

Ciao,

-- 
Gianugo Rabellino
Sourcesense, making sense of Open Source: http://www.sourcesense.com
Orixo, the XML business alliance: http://www.orixo.com
(blogging at http://www.rabellino.it/blog/)

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

Hello All,

That was really helpful. Thanks.

I also noticed FSFiler and HashFiler. Are those for indexing files?

I am working on a graph database project. I have tried several OODBMSs, and
finally decided to build my own.

Also, would it be possible to use Xindice as a backend for Jena?

On 7/6/07, Vadim Gritsenko <va...@reverycodes.com> wrote:
>
> You have to open it before writing stuff in it. Please see BTreeFilerTest and
> FilerTestBase test classes, they have everything you need.
>
> Vadim

Re: btree on disk

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Cam Bazz wrote:
> Hello Vadim,
>
> I have been looking at and experimenting with BTreeFiler. So far, I have been
> able to insert into the BTree, but I have not been able to write to disk (the
> initially created file stays the same size).
> 
> Best Regards,
> -C.B
> 
> //
> Configuration config = new Configuration(DOMParser.toDocument(Xindice.DEFAULT_CONFIGURATION), false);

That is not a valid configuration for the filer.


> BTreeFiler btf = new BTreeFiler();
>
> btf.setConfig(config);
>
> File file = new File("c:\\");
> btf.setLocation(file, "test");
> btf.create();
>
> // this works and does indeed insert into the btree
> for(int i=0; i<100000; i++)
> {
>    btf.addValue(new Value("hello"+i), i);
> }
>
> // this throws an exception saying page file not open.
> btf.writeRecord(new Key("key"), new Value("val"));

You have to open it before writing anything to it. Please see the BTreeFilerTest
and FilerTestBase test classes; they have everything you need.

Vadim


> btf.close();

Re: btree on disk

Posted by Cam Bazz <ca...@gmail.com>.

Hello Vadim,

I have been looking at and experimenting with BTreeFiler. So far, I have been
able to insert into the BTree, but I have not been able to write to disk (the
initially created file stays the same size).

Best Regards,
-C.B

//
Configuration config = new Configuration(DOMParser.toDocument(Xindice.DEFAULT_CONFIGURATION), false);

BTreeFiler btf = new BTreeFiler();

btf.setConfig(config);

File file = new File("c:\\");
btf.setLocation(file, "test");
btf.create();

// this works and does indeed insert into the btree
for(int i=0; i<100000; i++)
{
   btf.addValue(new Value("hello"+i), i);
}

// this throws an exception saying page file not open.
btf.writeRecord(new Key("key"), new Value("val"));

btf.close();

On 7/5/07, Vadim Gritsenko <va...@reverycodes.com> wrote:
>
> I don't think you can use BTree class directly. Take a look at BTreeFiler - that
> should be better fit. Once you instantiate it, you need to pass in configuration
> and open it. Once you are done working with it, it needs to be closed.
>
> Vadim

Re: btree on disk

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Cam Bazz wrote:
> Hello Developers,
> 
> I have been looking for a solid implementation of a Java BTree on disk
> for the longest time. Would it be possible to use Xindice's BTree on disk
> separately?
> If so, where would be a good place to start? I have looked at the source
> code, instantiated a new BTree and added values to it, but I got a null
> pointer exception saying that the file was not opened.

I don't think you can use the BTree class directly. Take a look at BTreeFiler;
that should be a better fit. Once you instantiate it, you need to pass in a
configuration and open it. Once you are done working with it, it needs to be closed.

See also org.apache.xindice.core.Collection, where you can see how to work with
Filers.

PS Have you seen JISP [1]?

Vadim

[1] http://www.coyotegulch.com/products/jisp/