Posted to general@xml.apache.org by Stefano Mazzocchi <st...@apache.org> on 2001/10/18 21:53:08 UTC

[vote] A native XML database project under Apache

Hi,

while the world of native XML databases is full of marketing hype and
promises, it is evident (for all those who tried) that mapping general
XML schemas to relational databases can sometimes be very painful and
not very efficient.

In fact, it is widely recognized by the database research community
that while well-structured data can be easily and efficiently mapped to
a relational database, less structured (often called semi-structured)
data is much more difficult to map.

Don't get me wrong: there are a number of ways to store XML in a
database to add ACID properties to XML documents, but while this is a
straightforward process for very repetitive and well-structured schemas
(invoices, stock quotes, money transactions), it is not so for
semi-structured schemas such as DocBook, SVG or even XSLT.

I hear you say: I use BLOBs and I'm fine with them. I'm sure you are,
but in all honesty, I'm not. And for a few reasons:

1) each documentation system requires a repository for documents. This
is often called a "content management system". Since publishing is
moving toward replacing all content with an XML syntax (and we would all
love to see that happen to its full extent), we must consider that such
a system will require a persistent way to manage the content and a fast
and efficient way to query it.

If you use BLOBs you lose any efficient way to look into the blobs
themselves so you are doomed before you even start.

You can fragment the XML document and map the semi-structured data
onto relational tables (and remember that documentation is almost always
semi-structured!) but it can be shown that this is hard, very expensive
and might require (depending on the document schema) a very high number
of nested queries to translate even a very simple XPath expression.

Add complexities such as namespaces and the proposed XQL and you see
that an XQL -> SQL translation might well be possible but is clearly
going to become a nightmare to manage and very painful to optimize for
efficiency.

The remaining option is to create a specific solution that leaves
structured data to RDBMS (where they really shine, no question about it)
but moves semi-structured data over to a more specific and
algorithmically optimized system.

Note that while ODBMS were supposed to solve the problem of
semi-structured data, they, in fact, do not.

This is why we need a native XML DB solution with full support for
namespaced content, XPath and XQL for querying, RDF for metadata.

2) so, the content management system that everybody is crying out loud
for requires a storage solution and I believe that a native XML DB is
the way to go.

Also because:

3) if we ever want to get deeper into the semantic web (and I,
personally, do), we must forget well-structured data. Vocabularies
such as RDF, RDF Schema, Topic Maps and the like are *not* going to be
easily mapped into relational databases and efficiently searched.
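
A rough illustration of point 1, as a minimal Python sketch rather than
anything taken from dbXML: the table layout (`blob_doc`, `node`), the
sample document and the helper names `open_store`, `store`,
`query_blobs` and `xpath_to_sql` are all assumptions made up for the
example. It stores one document both as a BLOB and as a generic
parent/child table, then shows that the BLOB route has to parse every
stored document to answer a path query, while the shredded route needs
one self-join per XPath step.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely DocBook-like.
DOC = """<book>
  <title>Native XML</title>
  <chapter><title>Intro</title><para>Hello</para></chapter>
  <chapter><title>Storage</title><para>World</para></chapter>
</book>"""


def open_store():
    """Create an in-memory database with both storage layouts (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob_doc (id INTEGER PRIMARY KEY, body BLOB)")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, doc INTEGER, "
        "parent INTEGER, name TEXT, text TEXT)"
    )
    return conn


def store(conn, xml_text):
    """Store one document as a BLOB and also shredded into one row per element."""
    cur = conn.execute(
        "INSERT INTO blob_doc (body) VALUES (?)", (xml_text.encode("utf-8"),)
    )
    doc_id = cur.lastrowid

    def walk(elem, parent_id):
        row = conn.execute(
            "INSERT INTO node (doc, parent, name, text) VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = row.lastrowid
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    return doc_id


def query_blobs(conn, path):
    """BLOB route: the database cannot look inside, so every document is parsed."""
    steps = [s for s in path.split("/") if s]
    found = []
    for (body,) in conn.execute("SELECT body FROM blob_doc ORDER BY id"):
        root = ET.fromstring(body)  # full parse of each stored document
        if root.tag != steps[0]:
            continue
        rest = "/".join(steps[1:])
        elems = root.findall(rest) if rest else [root]
        found.extend((e.text or "").strip() for e in elems)
    return found


def xpath_to_sql(path):
    """Shredded route: translate an absolute child-axis path such as
    /a/b/c into SQL with one self-join of `node` per step."""
    steps = [s for s in path.split("/") if s]
    froms, wheres, params = [], [], []
    for i, name in enumerate(steps):
        froms.append(f"node AS n{i}")
        wheres.append(f"n{i}.name = ?")
        params.append(name)
        if i == 0:
            wheres.append("n0.parent IS NULL")
        else:
            wheres.append(f"n{i}.parent = n{i - 1}.id")
    last = f"n{len(steps) - 1}"
    sql = (
        f"SELECT {last}.text FROM {', '.join(froms)} "
        f"WHERE {' AND '.join(wheres)} ORDER BY {last}.id"
    )
    return sql, params


if __name__ == "__main__":
    conn = open_store()
    store(conn, DOC)
    sql, params = xpath_to_sql("/book/chapter/title")
    print(sql)
    print(conn.execute(sql, params).fetchall())
    print(query_blobs(conn, "/book/chapter/title"))
```

Even this three-step path turns into a three-way self-join. The sketch
handles only plain child steps; descendant axes (`//`), predicates and
namespaces are left out, and each would add further joins or recursion.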

So, this is why I propose the creation of a project hosted here under
xml.apache.org to implement this effort.

Since it's generally very hard to bootstrap an open development
community without some code to start working on, I suggest starting this
project from the code that the dbXML guys are willing to donate to the
ASF, in order to create a development community that can research and
implement in this new field and, by doing so, hopefully lead the way in
reducing the marketing crap and the hype around this.

FYI, dbXML (www.dbxml.org) is an implementation of a native XML database
written in the Java language that is close to reaching its first final
release.

I've been talking to one of the community leaders (copied here) who
independently came to the same conclusion and wanted to propose
dbXML for donation even before I expressed my intentions.

Also Sam Ruby has been subscribed to their development list watching
over them.

dbXML was created with the sponsorship of a commercial entity called
"dbXML Group", which still exists but has no economic energy to continue
its development, and the main developers are now working on the project
unpaid.

But I'd like something to be clear: I'm *NOT* proposing that Apache
takes over 'dbXML group' to save dbXML and continue its development. I'm
proposing that Apache creates a new project for the creation of a
production quality native XML database solution that implements existing
and future standards (and hopefully has the power to influence their
establishment) and that, in order to help bootstrap the community, we
start with the current dbXML implementation which is going to be donated
to the ASF.

To show this and to avoid confusion with past releases and the "dbXML
group" commercial entity, the project is *NOT* going to be called Apache
dbXML, but rather something without acronyms, in the spirit of
xml.apache.org.

Kimbro and I have been talking about "Apache BooBoo", but that is just
the first name that crossed my mind :) If you have better names, please,
let us discuss this publicly if the deal gets approved.

Anyway, the dbXML folks are willing to donate the code, to change the
name as long as we give proper credit to "dbXML group" for having
bootstrapped and donated the code (as we do for IBM, Lotus, Sun and
others), and more than willing to help in both development, user
support, research, community and evangelization. In fact, if the deal is
accepted by this list, they are even willing to close down the site and
move everything over here with the new name.

Let me finish by saying that I do not consider important what the actual
code implementation is (a few, myself included, might not like some of
their architectural choices, such as the use of CORBA and Juggernaut),
but I'm *NOT* asking for a vote on their _actual_ technological status,
I'm asking for a vote to create a community that can create, maintain
and show the power of a native XML DB solution.

It might take years to have something solid enough to compete with big
commercial names, but it is important, IMO, for Apache to have something
to say even on this front by creating a community and attracting people
and their ideas.

In fact, the dbXML guys are willing to donate the code, but also very
happy about the possibility of a higher visibility that would bring more
people and more ideas into the design process that is going to happen
for their next major release.

So, people, I'm asking you to judge the idea of creating a community,
rather than the current dbXML implementation, which is only a way to
give users the meat they look for in that area and then attract them to
new development and further research.

Sorry for the long mail.

Please, place your vote.

Thanks.

Stefano.



---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


RE: [vote] A native XML database project under Apache

Posted by Paulo Gaspar <pa...@krankikom.de>.
Object databases are having much more success being adapted to store XML
than relational ones. (And I do not like Object databases. =:o( )

Have fun,
Paulo Gaspar

http://www.krankikom.de
http://www.ruhronline.de
 

> -----Original Message-----
> From: Stefano Mazzocchi [mailto:stefano@apache.org]
> Sent: Friday, October 19, 2001 12:33 PM
> 
> ........ 
>
> "Kevin A. Burton" wrote:
> > 
> > Also.  What about other approaches. AKA the Exist project uses 
> relational
> > database systems to provide a persistence layer for XML.  I 
> think this is a
> > solid and logical approach.
> 
> I don't, but this is just my personal opinion. It appears elegant from a
> strictly mathematical point of view but becomes impractical when put in
> production (I know this from experience)
>
> ....



Re: [vote] A native XML database project under Apache

Posted by Stefano Mazzocchi <st...@apache.org>.
"Kevin A. Burton" wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Stefano Mazzocchi <st...@apache.org> writes:
> <snip>
> > Kimbro and I have been talking about "Apache BooBoo", but that is just
> > the first name that crossed my mind :) If you have better names, please,
> > let us discuss this publicly if the deal gets approved.
>
> BooBoo is too close to bonobo which is a GNOME component model.
> 
> Also.. at least in the US, a "booboo" is not a very powerful word.  It is
> generally how a small child would describe a small cut or bruise...

Gosh, people, it was supposed to be ironic :) Of course, I would not
propose that name.
 
> >
> > It might take years to have something solid enough to compete with big
> > commercial names, but it is important, IMO, for Apache to have something
> > to say even on this front by creating a community and attracting people
> > and their ideas.
> <snip>
> 
> Also.  What about other approaches. AKA the Exist project uses relational
> database systems to provide a persistence layer for XML.  I think this is a
> solid and logical approach.

I don't, but this is just my personal opinion. It appears elegant from a
strictly mathematical point of view but becomes impractical when put in
production (I know this from experience)
 
> If this project does well does this mean that the Apache project wouldn't accept
> it because we would already have had an XML db project?

No.

I'm simply proposing the creation of a subproject to back up the
creation of a native XML db solution because I think this is the way to
go.

But I'm *NOT* proposing to throw out other proposals since I might find
myself wrong in the future (as I did several times in the past).

Apache is committed to diversity and I couldn't change that even if I
wanted.

And I don't want that!

Stefano.





Re: [vote] A native XML database project under Apache

Posted by "Kevin A. Burton" <bu...@relativity.yi.org>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stefano Mazzocchi <st...@apache.org> writes:
<snip>
> Kimbro and I have been talking about "Apache BooBoo", but that is just
> the first name that crossed my mind :) If you have better names, please,
> let us discuss this publicly if the deal gets approved.

BooBoo is too close to bonobo which is a GNOME component model.

Also.. at least in the US, a "booboo" is not a very powerful word.  It is
generally how a small child would describe a small cut or bruise...

Anyway... -0 
<snip>

> 
> It might take years to have something solid enough to compete with big
> commercial names, but it is important, IMO, for Apache to have something
> to say even on this front by creating a community and attracting people
> and their ideas.
<snip>

Also.  What about other approaches. AKA the Exist project uses relational
database systems to provide a persistence layer for XML.  I think this is a
solid and logical approach.

If this project does well does this mean that the Apache project wouldn't accept
it because we would already have had an XML db project?

Kevin

- -- 

   Need a good Engineer?  Hire me!  ( Java | P2P | XML | Linux | Open Source )

Kevin A. Burton burton@apache.org, burton@openprivacy.org, burtonator@acm.org )
  Location: San Francisco, CA Cell: 415-595-9965 URL: http://relativity.yi.org 

Nearly all men can stand adversity, but if you want to test a man's
 character, give him power.      - Abraham Lincoln
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt

iD8DBQE7z8NwAwM6xb2dfE0RAi8zAJ9Y7HbhMnk7uoOUE7fYG1y9PGDskwCfe/tM
x/BB79AzPHUxqBRUElGVeUc=
=jgtV
-----END PGP SIGNATURE-----



Re: [vote] A native XML database project under Apache

Posted by Stefano Mazzocchi <st...@apache.org>.
Eric van der Vlist wrote:
> 
> Hi Stefano,
> 
> Stefano Mazzocchi wrote:
> 
> > Hi,
> >
> > while the world of native XML databases is full of marketing hype and
> > promises, it is evident (for all those who tried) that mapping general
> > XML schemas to relational databases can sometimes be very painful and
> > not very efficient.
> >
> > In fact, it is widely recognized by the database research community
> > that while well-structured data can be easily and efficiently mapped to
> > a relational database, less structured (often called semi-structured)
> > data is much more difficult to map.
> >
> 
> Just for fun and also to take a step backward and evaluate the changes
> and invariants in our perception of XML systems, I couldn't resist
> reading some old emails exchanged two years ago!
> 
> http://groups.yahoo.com/group/xml-server/message/32
> 
> Enjoy :)

:)

Eric,

thanks much for this. Really! I have not forgotten and let me tell you,
I really love it when I make mistakes and I understand that I did.
Understanding my faults is the best way to learn and to earn experience
and respect for others' opinions.

Of course, I will make mistakes in the future and I could even be making
a mistake now by proposing this project, I don't know, but one thing is
for sure: every time I did or I changed my mind, I humbly acknowledged
it.

Yes, in the past I thought (as others are doing now) that the relational
model was sufficient to map XML documents, thus an XML->relational
binding was the right way to go. This is why I thought XML DBs were pure
marketing hype.

Now I have changed my mind, especially after reading the papers written
by the Stanford DB group, which worked for years on a DBMS solution for
semi-structured data, showing that relational is enough but not well
suited.

(just because it's possible to write an operating system in Java or
Assembly it doesn't mean you should do it, right?)

So, yes, just like I did with several architectural decisions made for
Cocoon1, I have changed my mind on XML databases and want to start a
community exactly to research the best way to add ACID properties to
semi-structured XML documents.

If this way turns out to be a relational mapping, I was wrong again, if
not, well, we'll have a solid solution to build our XML repositories on.

Stefano.





Re: [vote] A native XML database project under Apache

Posted by Eric van der Vlist <vd...@dyomedea.com>.
Hi Stefano,

Stefano Mazzocchi wrote:

> Hi,
> 
> while the world of native XML databases is full of marketing hype and
> promises, it is evident (for all those who tried) that mapping general
> XML schemas to relational databases can sometimes be very painful and
> not very efficient.
> 
> In fact, it is widely recognized by the database research community
> that while well-structured data can be easily and efficiently mapped to
> a relational database, less structured (often called semi-structured)
> data is much more difficult to map.
> 


Just for fun and also to take a step backward and evaluate the changes 
and invariants in our perception of XML systems, I couldn't resist 
reading some old emails exchanged two years ago!

http://groups.yahoo.com/group/xml-server/message/32

Enjoy :)

Eric
-- 
Rendez-vous à Paris pour le Forum XML.
                    http://www.technoforum.fr/Pages/forumXML01/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------




Re: [vote] A native XML database project under Apache

Posted by Kimbro Staken <ks...@dbxmlgroup.com>.
I wanted to add my two cents to this and clarify a few points that Stefano 
made.

On Thursday, October 18, 2001, at 12:53 PM, Stefano Mazzocchi wrote:
>
> FYI, dbXML (www.dbxml.org) is an implementation of a native XML database
> written in the Java language that is close to reaching its first final
> release.
>
>

First I want to make it clear that dbXML is Open Source and has been for 
over a year. It started out LGPL and was changed to an Apache style 
license about 6 months ago. We have an existing user community several 
hundred strong and this community is very interested in seeing dbXML 
become a part of the ASF family of projects. The current developer 
community is small right now, but will hopefully grow significantly in the 
coming months.

>
> dbXML was created with the sponsor of a commercial entity called "dbXML
> Group" which still exists but has no economic energy to continue its
> development and the main developers are now working on the project
> unpaid.

> But I'd like something to be clear: I'm *NOT* proposing that Apache
> takes over 'dbXML group' to save dbXML and continue its development. I'm

The dbXML Group is not proposing that this has anything to do with the 
company either; the company is going to go do whatever it is that 
companies do and shouldn't factor too much into any decision. The only 
thing they needed to approve was the transfer of the copyright and that is 
already taken care of.

The dbXML Project which is the open source effort will continue, 
regardless of the existence of the company, just as it already is. Stefano 
kind of implies that the project is in trouble which isn't true. The dbXML 
Project definitely could use more help and visibility but isn't really in 
any trouble beyond that. We simply felt that now would be a good time to 
offer the code to the ASF as we're about to reach an initial 1.0 release. 
This makes it a good time to bring in more mindshare to enable a big leap 
forward when a 2.0 release comes in the future. The 1.0 release will 
convincingly prove the concept, now we want to take the next step and 
really make dbXML into a solid production level native XML database.

>
> Kimbro and I have been talking about "Apache BooBoo", but that is just
> the first name that crossed my mind :) If you have better names, please,
> let us discuss this publicly if the deal gets approved.
>
> Anyway, the dbXML folks are willing to donate the code, to change the
> name as long as we give proper credit to "dbXML group" for having
> bootstrapped and donated the code (as we do for IBM, Lotus, Sun and
> others), and more than willing to help in both development, user
> support, research, community and evangelization. In fact, if the deal is
> accepted by this list, they are even willing to close down the site and
> move everything over here with the new name.
>

One other contingency is that in my opinion the existing dbXML community 
should really be allowed to choose the new name, subject to approval by 
the wider Apache community of course. We've already started this search on 
our mailing list and I'd really like that to continue, assuming approval 
of the creation of the project. Everybody is very interested in keeping it 
in line with the style of the existing XML apache project names.

> Let me finish by saying that I do not consider important what the actual
> code implementation is (a few, myself included, might not like some of
> their architectural choices, such as the use of CORBA and Juggernaut),
> but I'm *NOT* asking for a vote on their _actual_ technological status,
> I'm asking for a vote to create a community that can create, maintain
> and show the power of a native XML DB solution.
>

I think Stefano makes dbXML sound much worse than it really is. :-) Is 
dbXML immature? Yes, of course it is. It is a version 1.0 piece of software 
for a technology that itself is brand new. However, it is also pretty 
close to being the most widely used native XML database around. So while 
compared to MySQL the numbers are small, compared to our commercial 
brethren in the native XML space they are not. Also if you've used any of 
the commercial products, other than Tamino and Excelon, you'll also know 
that dbXML is already about as good. Now, we just need to focus on getting 
the engine mature enough to blow away the Taminos and Excelons in the 
space. As Stefano mentioned, it's going to take a long time to get to a 
truly industrial strength solution, but I think dbXML provides a solid 
foundation to start with.

Not that it really matters, but just to clarify Stefano's specific issues: 
the CORBA layer is just that, a layer, and it is already slated to be 
replaced in the next version of dbXML with something like SOAP. It isn't a 
central piece of the database architecture itself. Also, to use the 
server you don't have to deal with CORBA at all, as long as you're working 
in Java. CORBA is there for other languages. Juggernaut was actually 
developed as part of dbXML before it was even made open source, and is 
something that we split into a separate project to make it clearer what is 
the database and what is the server framework that the database runs under. 
Juggernaut is also slated to be replaced in the next version, most likely 
with Avalon.

> It might take years to have something solid enough to compete with big
> commercial names, but it is important, IMO, for Apache to have something
> to say even on this front by creating a community and attracting people
> and their ideas.
>
> In fact, the dbXML guys are willing to donate the code, but also very
> happy about the possibility of a higher visibility that would bring more
> people and more ideas into the design process that is going to happen
> for their next major release.

I want to make it clear here too, the current dbXML community is totally 
committed to the future development of dbXML (or whatever it is called in 
the future). We decided to offer the code to the ASF for the very reasons 
that Stefano mentions here. We want dbXML to grow into what becomes the 
de facto standard for native XML databases. Forming an ASF project around 
the code is just the next step on the path to achieving that goal. We're 
committed to achieving this, and hopefully continuing the effort under the 
Apache umbrella will give us the advantage that's needed and a significant 
benefit for everyone involved. Our current community will move over to 
support the new effort and to continue development into the future.

>
> So, people, I'm asking you to judge the idea to create a community,
> rather than the current dbXML implementation which is only a way to give
> to users the meat they look for in that area, but then attract them for
> new development and further research.
>
> Sorry for the long mail.
>
> Please, place your vote.
>
> Thanks.
>
> Stefano.
>
Kimbro Staken
The dbXML Project
http://www.dbxml.org




Re: [vote] A native XML database project under Apache

Posted by Vivek Chopra <vi...@yahoo.com>.
I am in favor of having individual components

--- Stefano Mazzocchi <st...@apache.org> wrote:
> Hi,
> 
> while the world of native XML databases is full of
> marketing hype and
> promises, it is evident (for all those who tried)
> that mapping general
> XML schemas to relational databases can sometimes be
> very painful and
> not very efficient.





Re: [vote] A native XML database project under Apache

Posted by Stefano Mazzocchi <st...@apache.org>.
Martin Stricker wrote:
 
> +1 for the general idea (but I'm not a committer), *but* before we can
> really vote we'll have to get several things clear (others already
> mentioned a lot of them).
> 
> My personal opinion about the programming language: C/C++! Java is far
> too slow.

Ok, I'll state this once again: the vote is *NOT* about taking over
dbXML and making it an Apache project, but is about creating a
development community that will create, maintain and research a native
XML database and in order to bootstrap it, the dbXML codebase is
proposed as a seed.

So, if any of you have a better seed to propose, I'm all ears.

If not, please, avoid having technical discussions about something that
is not even accepted yet (even if, so far, we have more votes than
needed).

If you want to discuss technical details, I welcome any of you to
participate in the development of that newly created subproject: the
more of you follow my suggestion and the more ideas get discussed, the
better the software will turn out to be.

This concludes the official response.

                             - o -

Now, talking personally, I believe that Java is a perfect choice
because even if it's not recognized as the fastest language in the world
(even if Sergio is right: there is no such thing as a fast language and
a slow one, it always depends on the situation and on the written
software), it *IS* recognized as an easier language to deal with
compared to C/C++ and friends, and allows one to research algorithmic
solutions faster than any other language while maintaining a rock solid
architectural foundation (sometimes, even better than what results out
of C/C++)

And if development speed and ease of research were not enough for you,
there are examples (Sergio's is one of them, but there are many) where
Java on the server side proves itself as fast as native solutions, also
because HotSpot JVMs work much better on server software, since they
have more cycles to use to optimize a program that normally stays up for
days or even months.

And if HotSpot Java performance matching native performance was not
enough for you, I suggest you take a look at the new NIO API (New I/O,
which adds non-blocking I/O) that is very likely to turn Java into a
very powerful environment even for I/O intensive uses such as databases,
file servers or media servers.

Finally, if you think Java is slow, you can always volunteer to rewrite
dbXML in C++. By the time you're done, the newly released version of the
software with better algorithms, new JVM optimizations on the server side
(which is where Sun is heading) and hardware advancements might make the
entire port useless.

But if this still doesn't stop you, I'll welcome your submissions and
let the community decide the future of this possible native port.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





Re: [vote] A native XML database project under Apache

Posted by Sergio Carvalho <se...@acm.org>.
On Sun, 21 Oct 2001 01:00:42 +0200, Martin Stricker wrote:
From: Martin Stricker <sh...@gmx.de>
--
 
> +1 for the general idea (but I'm not a committer), *but* before we can
> really vote we'll have to get several things clear (others already
> mentioned a lot of them).
> 
> My personal opinion about the programming language: C/C++! Java is far
> too slow. But the database must have built-in APIs for C/C++, Java,
> Perl, Python and Tcl/Tk, maybe more.

Unless backed by strong testing, I always take 'XYZ language is too slow'
claims with a grain of salt. In this case (Java), I have personally been happily
surprised with Java, its speed, scalability and stability in a recent Cocoon 2
deployment. 

I have no personal opinion on the subject of the implementation language for
dbXML, but wouldn't discard the existing codebase without a very strong reason.



--
Sergio Carvalho
---------------
sergio.carvalho@acm.org

If at first you don't succeed, skydiving is not for you



Re: [vote] A native XML database project under Apache

Posted by Martin Stricker <sh...@gmx.de>.
> Stefano Mazzocchi wrote:
> while the world of native XML databases is full of marketing hype and
> promises, it is evident (for all those who tried) that mapping general
> XML schemas to relational databases can sometimes be very painful and
> not very efficient.
<snip/>

+1 for the general idea (but I'm not a committer), *but* before we can
really vote we'll have to get several things clear (others already
mentioned a lot of them).

My personal opinion about the programming language: C/C++! Java is far
too slow. But the database must have built-in APIs for C/C++, Java,
Perl, Python and Tcl/Tk, maybe more.

Best regards,
Martin Stricker
-- 
Homepage: http://www.martin-stricker.de/
Registered Linux user #210635: http://counter.li.org/



RE: [vote] A native XML database project under Apache

Posted by Davanum Srinivas <di...@yahoo.com>.
+1. We need to start somewhere. dbXML is as good a starting place as anything else...

Thanks,
dims

--- Carsten Ziegeler <cz...@sundn.de> wrote:
> Absolutely +1.
> 
> It might be a little bit difficult, if the project starts with dbXML
> as a base, to switch over to the "real" solution (I don't want to
> say that dbXML is bad, I don't even know it, but it's often the
> case when some new people are involved that they find a different
> solution etc.)
> 
> But as having a base which you can discuss is a thousand times
> better than creating lots of theoretical threads with no practice
> behind them, this should be the way to go. But there need to be
> some strong personalities in the project which are able to
> fulfill that job.
> 
> Carsten


=====
Davanum Srinivas - http://jguru.com/dims/



RE: [vote] A native XML database project under Apache

Posted by Carsten Ziegeler <cz...@sundn.de>.
Absolutely +1.

It might be a little difficult, if the project starts with dbXML
as a base, to switch over to the "real" solution later (I don't want to
say that dbXML is bad, I don't even know it, but it's often the
case that when new people get involved they find a different
solution, etc.).

But since having a base to discuss is a thousand times
better than creating lots of theoretical threads with no practice
behind them, this should be the way to go. There need to be
some strong personalities in the project, though, who are able to
do that job.

Carsten



RE: [vote] A native XML database project under Apache

Posted by Ki-Nam Choi <kc...@acm.org>.
+1

thanks,
KI

-----Original Message-----
From: Stefano Mazzocchi [mailto:stefano@apache.org]
Sent: Thursday, October 18, 2001 3:53 PM
To: Apache XML; Kimbro Staken
Subject: [vote] A native XML database project under Apache

