Posted to xindice-dev@xml.apache.org by Gary Shea <sh...@gtsdesign.com> on 2003/03/04 04:59:37 UTC

binary resource in xindice

I have made a first-draft version of xindice that enables binary
resources to be saved in the database.  It works, but has exposed an
interesting issue.

I have used separate API calls for binary and xml resources, so that the
ability to add binary resources has no impact on xml performance -- it's
an xml database, after all!  The implementation creates (lazily) a
separate b-tree in any collection where binary resources are to be
stored.  All binary resources go into the binary-only b-tree; the xml
side never knows it's there.  Works great.

The problem comes up when implementing the XML:DB get() method.  Store()
and remove() both allow the specification of the type of resource to
operate on, by passing the desired resource type: BinaryResource or
XMLResource.  Get() takes only a String key.  As a result, the get()
code cannot determine what kind of resource you want, and if the desired
resource is not found in the xml database, the binary database must be
queried before null may be returned.  That is, of course, an unacceptable
amount of overhead!
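
A minimal sketch of the lookup cost described above, under loud
assumptions: TreeMap stands in for the on-disk b-tree, and every class,
field, and method name here is hypothetical rather than taken from the
Xindice source.  It only illustrates why a key-only get() has to probe
both trees on a miss.

```java
import java.util.TreeMap;

// Hypothetical stand-in for a collection with separate xml and binary b-trees.
// TreeMap replaces the real b-tree; none of these names come from Xindice.
public class TwoTreeLookup {
    private final TreeMap<String, String> xmlTree = new TreeMap<>();
    private TreeMap<String, byte[]> binaryTree; // created lazily, as in the draft
    int probes = 0; // counts tree lookups, standing in for possible disk accesses

    void storeXml(String key, String xml) {
        xmlTree.put(key, xml);
    }

    void storeBinary(String key, byte[] data) {
        if (binaryTree == null) {
            binaryTree = new TreeMap<>();
        }
        binaryTree.put(key, data);
    }

    // get() receives only a key, so it cannot tell which tree to consult.
    Object get(String key) {
        probes++;
        Object hit = xmlTree.get(key);
        if (hit != null || binaryTree == null) {
            return hit;
        }
        probes++; // second probe before null may be returned
        return binaryTree.get(key);
    }

    public static void main(String[] args) {
        TwoTreeLookup c = new TwoTreeLookup();
        c.storeXml("doc1", "<a/>");
        c.storeBinary("img1", new byte[] {1, 2, 3});
        c.get("doc1");    // found in the xml tree: one probe
        c.get("missing"); // absent everywhere: two probes
        System.out.println("probes = " + c.probes); // prints "probes = 3"
    }
}
```

In this sketch a miss, or any binary hit, costs two probes, which is the
overhead at issue.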

At this point I can think of two possibilities, either or both of which
would solve this problem:

1) Store the binary resources in the same b-tree as the xml resources.
This requires (based on my current understanding) one additional byte
for each resource, which isn't much overhead when you consider that a
disk access is fairly probable on any given database request.

2) Change the XML:DB definition so that get() may be called with a
Resource object.

Any thoughts?

Regards,

	Gary

RE: binary resource in xindice [PATCH]

Posted by Gary Shea <sh...@gtsdesign.com>.
Hi Kevin --

It's a big relief to hear back from a committer!  I was getting
anxious...

Thanks for the suggestion about bugzilla; I should have done that first
thing.  I have submitted both sets of patches (xmlrpc xerces
independence and binary resources) to bugzilla, where they are,
respectively, bugs 17777 and 17778.

It would be great to see a work plan for a release, if that's what it
takes to get to the point where these patches can be evaluated/applied.
Is there one already, and I'm just not aware of it?  I may be able to
chip in.

Regards,

	Gary

On Fri, 7 Mar 2003, at 10:14 [-0600], Kevin Ross (Kevin.Ross@iVerticalLeap....:

> Hi Gary,
> 
> Thanks for your contribution.  A couple of things:
> 
> 1.  I didn't receive a binary file for either of your submitted
> patches through the mailing list.
> 2.  This situation is best handled by bugzilla.  For each patch, please
> enter a new issue in bugzilla and make sure to add the 'keyword',
> something like PATCH_INCLUDED.
> 
> Both of the issues you have addressed with a patch are important, and
> valuable to the community.  I want to make sure they don't get lost, and
> are on our list of things to do.  I've been in and out of things lately,
> but I believe we are trying to get 1.1 out before making more changes.
> 
> Please make sure to submit your work as a 'unified' patch, so we can
> easily apply it.
> 
> Thanks again,
> 
> Kevin Ross
> 

RE: binary resource in xindice [PATCH]

Posted by Kevin Ross <Ke...@iVerticalLeap.com>.
Hi Gary,

Thanks for your contribution.  A couple of things:

1.  I didn't receive a binary file for either of your submitted
patches through the mailing list.
2.  This situation is best handled by bugzilla.  For each patch, please
enter a new issue in bugzilla and make sure to add the 'keyword',
something like PATCH_INCLUDED.

Both of the issues you have addressed with a patch are important, and
valuable to the community.  I want to make sure they don't get lost, and
are on our list of things to do.  I've been in and out of things lately,
but I believe we are trying to get 1.1 out before making more changes.

Please make sure to submit your work as a 'unified' patch, so we can
easily apply it.

Thanks again,

Kevin Ross

-----Original Message-----
From: Gary Shea [mailto:shea@gtsdesign.com] 
Sent: Thursday, March 06, 2003 2:45 PM
To: xindice-dev@xml.apache.org
Subject: RE: binary resource in xindice [PATCH]

It occurred to me that some folks would prefer zip compression, so the
zip versions are attached to this email.

	Gary

On Thu, 6 Mar 2003, at 13:20 [-0700], Gary Shea (shea@gtsdesign.com)
wrote:

> Attached find patch and tar files (bz2 encoded) to enable binary
> resources in Xindice.  Tests included.  The patches currently only
> handle embedded XML:DB.  Adding the native XML-RPC messages and remote
> XML:DB client should be no problem, but I wanted to get the internals
> accepted first.
> 
> A couple notes about the attached patches.  First, they include the
> patches I submitted a week or two ago for XML-RPC parser independence,
> as those haven't found their way into the codebase yet.  Sorry for the
> extra clutter. Also, the patches include tons of log.debug() stuff.
> I'll be happy to remove all that when the code settles down.
> 
> The approach I've taken is to start out by creating a flexible and
> extensible inline metadata service, activated on a per-collection basis.
> (By inline metadata, I mean that the metadata is a header at the
> beginning of the BTree data.) That gives me an efficient way of
> determining what type of resource a particular BTree record holds.  All
> the code for the inline metadata service is in
> org.apache.xindice.core.inlinemeta.
> 
> Because the Xindice BTree doesn't care much about what you put into it
> (bytes, bytes, bytes), binary records are ordinary BTree records.
> 
> Most of the details for managing binary resources are found in
> Collection.  The low-level getDocument() and putDocument() methods have
> been joined by getEntry() (record type agnostic, needed for XML:DB),
> getBinary(), and putBinary().
> 
> All unit tests succeed; test-integration-embed works except for the
> XUpdate test that's been failing all along; I actually haven't tried
> test-integration-xmlrpc, whoops!
> 
> To run a binary-specific test, try: bin/ant test-embed-binary
> 
> One final note: these patches also fix a bug associated with
> XMLSerializable.  Most of the implementations of streamFromXML(Document)
> added themselves to the parent document, but putObject() in Collection
> also added the Element returned by streamFromXML(Document) to the document,
> causing it to end up in there twice.  Took me a while to figure out that
> one!  To me it made more sense that a method should act as locally as
> possible, reducing side-effects, so now no
> XMLSerializable.streamFromXML() implementation adds itself to its parent
> Document.
> 
> Have fun!
> 
> 	Gary



RE: binary resource in xindice [PATCH]

Posted by Gary Shea <sh...@gtsdesign.com>.
It occurred to me that some folks would prefer zip compression, so the
zip versions are attached to this email.

	Gary

On Thu, 6 Mar 2003, at 13:20 [-0700], Gary Shea (shea@gtsdesign.com) wrote:

> Attached find patch and tar files (bz2 encoded) to enable binary
> resources in Xindice.  Tests included.  The patches currently only
> handle embedded XML:DB.  Adding the native XML-RPC messages and remote
> XML:DB client should be no problem, but I wanted to get the internals
> accepted first.
> 
> A couple notes about the attached patches.  First, they include the
> patches I submitted a week or two ago for XML-RPC parser independence,
> as those haven't found their way into the codebase yet.  Sorry for the
> extra clutter. Also, the patches include tons of log.debug() stuff.
> I'll be happy to remove all that when the code settles down.
> 
> The approach I've taken is to start out by creating a flexible and
> extensible inline metadata service, activated on a per-collection basis.
> (By inline metadata, I mean that the metadata is a header at the
> beginning of the BTree data.) That gives me an efficient way of
> determining what type of resource a particular BTree record holds.  All
> the code for the inline metadata service is in
> org.apache.xindice.core.inlinemeta.
> 
> Because the Xindice BTree doesn't care much about what you put into it
> (bytes, bytes, bytes), binary records are ordinary BTree records.
> 
> Most of the details for managing binary resources are found in
> Collection.  The low-level getDocument() and putDocument() methods have
> been joined by getEntry() (record type agnostic, needed for XML:DB),
> getBinary(), and putBinary().
> 
> All unit tests succeed; test-integration-embed works except for the
> XUpdate test that's been failing all along; I actually haven't tried
> test-integration-xmlrpc, whoops!
> 
> To run a binary-specific test, try: bin/ant test-embed-binary
> 
> One final note: these patches also fix a bug associated with
> XMLSerializable.  Most of the implementations of streamFromXML(Document)
> added themselves to the parent document, but putObject() in Collection
> also added the Element returned by streamFromXML(Document) to the document,
> causing it to end up in there twice.  Took me a while to figure out that
> one!  To me it made more sense that a method should act as locally as
> possible, reducing side-effects, so now no
> XMLSerializable.streamFromXML() implementation adds itself to its parent
> Document.
> 
> Have fun!
> 
> 	Gary

RE: binary resource in xindice [PATCH]

Posted by Gary Shea <sh...@gtsdesign.com>.
Attached find patch and tar files (bz2 encoded) to enable binary
resources in Xindice.  Tests included.  The patches currently only
handle embedded XML:DB.  Adding the native XML-RPC messages and remote
XML:DB client should be no problem, but I wanted to get the internals
accepted first.

A couple notes about the attached patches.  First, they include the
patches I submitted a week or two ago for XML-RPC parser independence,
as those haven't found their way into the codebase yet.  Sorry for the
extra clutter. Also, the patches include tons of log.debug() stuff.
I'll be happy to remove all that when the code settles down.

The approach I've taken is to start out by creating a flexible and
extensible inline metadata service, activated on a per-collection basis.
(By inline metadata, I mean that the metadata is a header at the
beginning of the BTree data.) That gives me an efficient way of
determining what type of resource a particular BTree record holds.  All
the code for the inline metadata service is in
org.apache.xindice.core.inlinemeta.
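
To make the inline header idea concrete, here is a rough sketch assuming
a single type byte at the front of each record.  The tag values, class
name, and helper names are hypothetical and are not the contents of
org.apache.xindice.core.inlinemeta; TreeMap again stands in for the
BTree.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical one-byte inline header; not the actual inlinemeta format.
public class InlineTypeHeader {
    static final byte TYPE_XML = 0;    // assumed tag value
    static final byte TYPE_BINARY = 1; // assumed tag value

    // Prepend the type byte to the payload to form the stored record.
    static byte[] wrap(byte type, byte[] payload) {
        byte[] record = new byte[payload.length + 1];
        record[0] = type;
        System.arraycopy(payload, 0, record, 1, payload.length);
        return record;
    }

    // The resource type is read from the first byte of the record.
    static byte typeOf(byte[] record) {
        return record[0];
    }

    // Everything after the header byte is the original payload.
    static byte[] payloadOf(byte[] record) {
        return Arrays.copyOfRange(record, 1, record.length);
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> tree = new TreeMap<>(); // stand-in for the BTree
        tree.put("doc1", wrap(TYPE_XML, "<a/>".getBytes(StandardCharsets.UTF_8)));
        tree.put("img1", wrap(TYPE_BINARY, new byte[] {1, 2, 3}));

        byte[] record = tree.get("img1"); // a single lookup, whatever the type
        String kind = typeOf(record) == TYPE_BINARY ? "binary" : "xml";
        System.out.println(kind + ", " + payloadOf(record).length + " payload bytes");
    }
}
```

With the type carried in the record itself, a key-only lookup needs one
tree probe and can still decide which kind of resource to hand back.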

Because the Xindice BTree doesn't care much about what you put into it
(bytes, bytes, bytes), binary records are ordinary BTree records.

Most of the details for managing binary resources are found in
Collection.  The low-level getDocument() and putDocument() methods have
been joined by getEntry() (record type agnostic, needed for XML:DB),
getBinary(), and putBinary().

All unit tests succeed; test-integration-embed works except for the
XUpdate test that's been failing all along; I actually haven't tried
test-integration-xmlrpc, whoops!

To run a binary-specific test, try: bin/ant test-embed-binary

One final note: these patches also fix a bug associated with
XMLSerializable.  Most of the implementations of streamFromXML(Document)
added themselves to the parent document, but putObject() in Collection
also added the Element returned by streamFromXML(Document) to the document,
causing it to end up in there twice.  Took me a while to figure out that
one!  To me it made more sense that a method should act as locally as
possible, reducing side-effects, so now no
XMLSerializable.streamFromXML() implementation adds itself to its parent
Document.
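
The convention can be sketched with the standard JAXP/W3C DOM API.  This
is an illustration only: the Setting class and its toElement() method
are hypothetical, not Xindice's XMLSerializable, and the standard DOM
may not reproduce the duplicate-element symptom described above.  The
point shown is just that the serializing method builds and returns an
element without attaching it, and the caller attaches it exactly once.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LocalSerialization {
    // Hypothetical serializable object; stands in for an XMLSerializable.
    static class Setting {
        final String name;

        Setting(String name) {
            this.name = name;
        }

        // Builds an element owned by doc but does not attach it anywhere.
        Element toElement(Document doc) {
            Element e = doc.createElement("setting");
            e.setAttribute("name", name);
            return e;
        }
    }

    // Creates an empty DOM document; wraps the checked exception for brevity.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element e = new Setting("cache").toElement(doc);
        // The document is still empty: serialization had no side effect.
        System.out.println(doc.getDocumentElement() == null);
        doc.appendChild(e); // the caller attaches the element, once
        System.out.println(doc.getDocumentElement().getAttribute("name"));
    }
}
```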

Have fun!

	Gary

RE: binary resource in xindice

Posted by Matt Liotta <ml...@r337.com>.
> At this point I can think of two possibilities, either or both of which
> would solve this problem:
> 
> 1) Store the binary resources in the b-tree.  This requires (based on my
> current understanding) one additional byte for each resource, which
> isn't much overhead when you consider that a disk access is fairly
> probable on any given database request.
> 
> 2) Change the XML:DB definition so that get() may be called with a
> Resource object.
> 
> Any thoughts?
> 
It seems to me like the best solution would be for both to be done. I
would imagine you could do the first method to get binary resource
support into the tree and then use that as a lobbying point to get
XML:DB updated.

-Matt