You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@cocoon.apache.org by "Williams, Murray Todd" <mu...@merck.com> on 2000/04/21 17:09:11 UTC

Storing XML in databases w/o lots of string parsing....

I've presently got a small web application that stores XML content
(personnel/project progress logs) in a database (MySQL) and allows a viewer
to selectively view these logs through a web form in Cocoon.

In order to insert the XML content into the rest of the page, I pull the
standard trick of creating a SAX Parser, having the parser read the stored
(ASCII-text) XML content, and connecting the parsers events to the XSP
page's default event handlers.

This works, but I can't help feeling a little bothered by the fact that I
have to store the XML content in ASCII and run lots and lots of parsers on
it.  What I'm wondering is if there is a better way to create some sort of
binary serialization of the XML snippets so I can store them in more
quickly-accessible form in some sort of SQL binary BLOB object.  I image
this is akin to how the Cocoon cache system works, and I tried going through
the source code, but I couldn't quite understand what was going on.

Can anyone enlighten me as to whether there's a better way to store XML
content in a database?

Thanks!

Murray Todd Williams
Merck & Co., Inc.

Re: Storing XML in databases w/o lots of string parsing....

Posted by Kevin Sonney <ke...@webslingerZ.com>.
On Fri, 21 Apr 2000, Mike Engelhart wrote:
> Much better design --- I just wrote my classes as a testbed to handle
> writing a HashMap that were in HttpSession's to a database using the session
> id as primary key (which btw is apparently not a good idea as session id's
> get reused for some reason..)

Yeah, that's the case. Sessiosn cannot be considered persistant across
Session IDs. 

We just stored each document with a unique id in both th DB and the
document - something like :

<page id="4657">

has a Primary Key field of 4657 in the database. You can then use stored
procedured or auto-incrementing columns to insure that the id is unique on
a per-document basis. 

For sessions, we implemented a login, and serialized the important session
data to/from the database on each request. That way, you can have
persistant settings on some options. 

> Actually the postgresql driver has some non-standard JDBC calls that will
> actually just write an object to a table without doing anything at but
> calling a method that accepts the Object.  It's very cool but it wasn't what
> I needed.

That is very cool, but also not very portable. The prepaired statement +
ByteStream trick works with any database and any serialized  object. I
learned that little trick from the IBM Websphere sample code. *grin* 

-- 
- Kevin Sonney
  kevin@webslingerZ.com


Re: Storing XML in databases w/o lots of string parsing....

Posted by Mike Engelhart <me...@earthtrip.com>.
Kevin Sonney wrote:

> Yup. I've done something similar in a grander scale when writing a wrapper
> to store whole XML documents into SQL databases. I'd keep a large doc in a
> CLOB - allowing for "where XMLDOC like '%keyword%'" searches. No need to
> parse docs until *AFTER* you've determined which ones contain the data
> you need. You only parse to see if the data is in the right
> Element/Attributes *grin*
Much better design --- I just wrote my classes as a testbed to handle
writing a HashMap that were in HttpSession's to a database using the session
id as primary key (which btw is apparently not a good idea as session id's
get reused for some reason..)

Actually the postgresql driver has some non-standard JDBC calls that will
actually just write an object to a table without doing anything at but
calling a method that accepts the Object.  It's very cool but it wasn't what
I needed.

mike


Re: Storing XML in databases w/o lots of string parsing....

Posted by Kevin Sonney <ke...@webslingerZ.com>.
On Fri, 21 Apr 2000, Mike Engelhart wrote:
> I have two simple classes (ObjectReader and ObjectWriter) that convert
> arbitrary objects to ByteArrayOutputStream and ByteArrayInputStream to pass
> onto a database using the standard JDBC call setBytes().

Yup. I've done something similar in a grander scale when writing a wrapper
to store whole XML documents into SQL databases. I'd keep a large doc in a
CLOB - allowing for "where XMLDOC like '%keyword%'" searches. No need to
parse docs until *AFTER* you've determined which ones contain the data
you need. You only parse to see if the data is in the right
Element/Attributes *grin*

Alas, all that code (and, technically, the knowledge on how to do it) is
wrapped up in an NDA, and I can't re-use any of my ideas on that. Yet. 
*grin* It expires in October.

> anyway, let me know if you interested and I'll forward the source.  You may
> need to tweak it a little bit to work with mySQL as I only tested it with
> PostgreSQL.

Nope, if it's standard JDBC, those calls work on MySQL, PostgresSQL, and
DB/2 out of the box. I figure it works almost anywhere, assuming the
driver supports it.

-- 
- Kevin Sonney
  kevin@webslingerZ.com


Re: Storing XML in databases w/o lots of string parsing....

Posted by Mike Engelhart <me...@earthtrip.com>.
Williams, Murray Todd wrote:

> I've presently got a small web application that stores XML content
> (personnel/project progress logs) in a database (MySQL) and allows a viewer
> to selectively view these logs through a web form in Cocoon.
> 
 
> This works, but I can't help feeling a little bothered by the fact that I
> have to store the XML content in ASCII and run lots and lots of parsers on
> it.  What I'm wondering is if there is a better way to create some sort of
> binary serialization of the XML snippets so I can store them in more
> quickly-accessible form in some sort of SQL binary BLOB object.  I image
> this is akin to how the Cocoon cache system works, and I tried going through
> the source code, but I couldn't quite understand what was going on.

I have two simple classes (ObjectReader and ObjectWriter) that convert
arbitrary objects to ByteArrayOutputStream and ByteArrayInputStream to pass
onto a database using the standard JDBC call setBytes().
I haven't used it in a while because I wrote it to come up with a way to
load balance HttpSession objects but then found out that JServ plus Tomcat
could do a lot of that for me and I'm currently not concerned with
maintaining state for more than the time the user is on the site.  It should
be able to take a Node or any arbitrary object and stick it into a database.

anyway, let me know if you interested and I'll forward the source.  You may
need to tweak it a little bit to work with mySQL as I only tested it with
PostgreSQL.

Mike

mengelhart@earthtrip.com