Posted to xindice-dev@xml.apache.org by vl...@apache.org on 2002/11/27 09:23:49 UTC
cvs commit: xml-xindice README
vladimir 2002/11/27 00:23:49
Modified: . README
Log:
from the old documentation
Revision Changes Path
1.2 +637 -3 xml-xindice/README
Index: README
===================================================================
RCS file: /home/cvs/xml-xindice/README,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- README 6 Dec 2001 19:33:45 -0000 1.1
+++ README 27 Nov 2002 08:23:49 -0000 1.2
@@ -1,3 +1,637 @@
-Please refer to docs/README for updates about this version. For help about
-installation refer to docs/INSTALL for unix and docs/INSTALL.windows for
-windows. All other documentation can also be found under the docs directory.
+See docs/LICENSE for The Xindice License
+
+
+Table of Contents
+=================
+ Introduction
+ Target Platforms
+ Release Notes
+ Acknowledgments
+
+
+
+Introduction
+------------
+
+Xindice is an open source Native XML Database. It stores and indexes
+compressed XML documents in order to provide that data to a client
+application with very little server-side processing overhead. It
+also provides functionality that is unique to XML data, which can't
+easily be reproduced by relational databases.
+
+
+Target Platforms
+----------------
+
+The Xindice code base is written in Java. Because of this, Xindice
+will work on any operating system that supports the Java Development
+Kit. The following is a list of tested platforms and the JDK version
+that was used:
+
+  Platform     Java VM Used
+  ===========  ====================
+  Linux        Sun's Java 2 SDK 1.3 (ulimit -s 2048 for SDK 1.3.1)
+  Linux        IBM Java 2 SDK 1.3
+  Windows NT   Sun's Java 2 SDK 1.3
+  Solaris      Sun's Java 2 SDK 1.3
+  Mac OS X     Apple Java 2 SDK 1.3 (Included with Mac OS X)
+
+
+Release Notes
+-------------
+
+Apache Xindice Version 1.1
+=============================
+
+- The network access API is now based on XML-RPC rather than
+ CORBA. This was done for simplification and to eliminate the
+ constant problems with the CORBA ORB and consumption of
+ resources. It also was done to address UTF-8 encoding issues
+ that were present with CORBA. Initial tests show a minimal
+ performance impact from this change.
+
+- All CORBA related code has been removed from the system.
+
+- The server should now fully support the storage and retrieval
+ of documents encoded as UTF-8.
+
+- There is now an embedded version of the XML:DB API. This allows
+ you to build Xindice applications that access the database
+ without using the network. The API should be fully compatible
+ with the network-enabled XML:DB API implementation. An embedded
+ database can be used by simply changing the XML:DB URI from
+ xmldb:xindice:// to xmldb:xindice-embed://.
+
+- The xindiceadmin tool has been removed. All commands that were
+ previously only accessible through xindiceadmin are now
+ available through the xindice command. This should make
+ working with the server a little simpler.
+
+- An option was added to the command line tools to allow the
+ specification of namespaces to be used with XPath queries.
+ This option is -s. Refer to the Tools Reference document for
+ more information.
+
+- In the command line tools, the confirmation prompt during deletion
+ has been removed. Along with this, the -y option that forced
+ automatic deletion has also been removed.
+
+- The command line tools can be run against a network version of
+ Xindice or against a local database. See the -l and -d options
+ to learn more about local database access.
+
+- XMLObjects have been removed.
+
+Apache Xindice Version 1.0
+=============================
+This is the first production release of Xindice. Changes from the
+last release candidate are minimal.
+
+- Fixed a path traversal security problem in the HTTP server.
+- Fixed the Addressbook example to not send data to the client
+ after the connection had already been committed.
+- SAXGenerator now properly generates prefixMapping events.
+
+Known issues in version 1.0:
+
+- UTF-8 Encoding is not entirely clean. Most Latin-derived
+ languages should be OK, but English is the most
+ robust. Xindice 1.1 will resolve any issues in this area.
+- XPath queries that return a single atomic value (e.g. the value
+ of an attribute) rather than a node will return no result. You
+ must retrieve the containing element to retrieve the content
+ of an attribute.
+- When using XUpdate with JDK 1.4 you must use the
+ standards override mechanism to replace the version of
+ Xalan included in the JDK with the version included in
+ Xindice.
+ See: http://java.sun.com/j2se/1.4/docs/guide/standards/index.html
+ for more information.
+- On Windows, command line queries can have problems with the
+ quote handling of the Windows shell. In general you should
+ put double quotes around the entire query string and use
+ single quotes in your XPath.
+- This initial release of Xindice does not have any built-in
+ security. If you run it on a public server you should ensure
+ that remote access to port 4080 is restricted at the network
+ level. Security will be added in a future release.
+
+Apache Xindice Version 1.0rc2
+=============================
+The focus of this release is on stabilization of the server.
+
+- Fixed the Index corrupted error that some people were seeing
+ with 1.0rc1. If you saw this error it is recommended that
+ you rebuild your database files.
+- Changed the way Xindice locates its files to make it easier
+ to embed the server into another process. Files are now
+ located relative to the xindice.home system property instead
+ of the working directory of the process.
+- Changed the kernel to enable running it embedded without
+ exiting the VM on startup error and exit.
+- Minor encoding fixes in the command line tools. More serious
+ attention will be paid to encoding issues in the 1.1 release
+ of Xindice. As it is, some languages such as Russian and
+ Chinese cannot be successfully stored in the server. This
+ will be fixed in a Xindice 1.1 release.
+
+Apache Xindice Version 1.0rc1
+=============================
+dbXML is now an Apache project, and has been renamed to Xindice
+(Zeen-dee-chay). Parts of the dbXML 1.5 tree were merged into
+the dbXML 1.0 tree in the process of this name change and
+migration, so we thought it best to release at least one release
+candidate as an Apache project. There are also many changes as a
+result of the branch merging.
+
+- Name changes. There have been a lot of changes in package, class,
+ documentation, and identifier naming throughout the project as a
+ result of the migration to the Apache project. The most important
+ are summarized here.
+
+ - XML:DB URI changes. All XML:DB API URIs should now be of the form
+ xmldb:xindice: instead of xmldb:dbxml:
+
+ - Source package changes. If you have any code that imported any
+ org.dbxml.* source code it will need to be changed to import the
+ proper packages from org.apache.xindice.*.
+
+ - XML Namespace changes. XML namespaces that were defined by
+ dbXML have been renamed. The "http://www.dbxml.org/" portion
+ of those namespaces has been changed to
+ "http://xml.apache.org/xindice/"
+
+- The Collection configuration system now uses the Database's
+ system collection instead of the system.xml file. The
+ system.xml file is now read-only, and is used for configuring
+ the server framework. Collection management is read/write and
+ uses the Xindice native file system to maintain configuration.
+
+- As installed the server no longer has any default collections
+ that can store documents. You must create a collection manually
+ before attempting to store any documents in the server.
+
+- Complete JAXP bootstrapping. Xindice will bootstrap with
+ whichever JAXP-capable XML parser the Java VM will resolve.
+ You can override the JAXP SAXParserFactory using vm.cfg. It is
+ not recommended that you override the JAXP DocumentBuilderFactory
+ because Xindice implements an optimized DOM that utilizes the
+ Xindice compression system.
+
+- Lazy writes have been added to the Paged system, which is the
+ foundation for standard Filers and Indexers in Xindice. Long
+ operations (like index creation) will now delay writes until
+ the write buffer is filled or until the operation is completed.
+ This can yield a 10% to 30% performance increase on index
+ creation.
+
+- The --pagesize and --maxkeysize switches now work on Collection
+ creation in addition to Index creation.
+
+
+Version 1.0b4 (Final Beta... No Really)
+=======================================
+After releasing beta 3, we found out that there were some stability
+issues with the latest developer releases of Xalan, whose XPath
+engine we use for our query resolver. Some users were experiencing
+query failures with certain data sets. Because of this, we've had
+to roll back to a previous version of Xalan (2.0.1).
+
+
+Version 1.0b3 (Final Beta)
+==========================
+Beta 3 is the final beta for dbXML before we release our 1.0 FCS
+version. This version provides improved concurrency, as well as
+several bug fixes.
+
+
+Version 1.0b2 (Beta 2!)
+=======================
+Improved stability and scalability of the server.
+
+- ORB Change. In the past JacORB was used as the dbXML CORBA ORB. With
+ this release JacORB has been replaced with OpenORB. It was found that
+ JacORB utilized too much memory while running as part of the server,
+ which severely limited the capacity of the system.
+
+- The XML:DB API has once again been brought into conformance with
+ the latest draft.
+
+- Several DOM Level 3 Core methods have been added, and the version
+ of Xerces shipped with dbXML is now the most recent version in the
+ Xerces 1 distribution.
+
+- Several bugs within the XUpdate system have been fixed.
+
+
+Version 1.0b1 (Beta!)
+=====================
+We have reached Beta status. The server is fully functional, and
+the number of bugs should be minimal at this point.
+
+- Namespace support. The query and indexing systems now properly
+ support namespaced elements and attributes (regardless of prefix
+ consistency).
+
+- The most recent draft of the XML:DB API is now supported. This
+ includes namespace support for XPath queries, and a few minor
+ changes to the API.
+
+- A Testing framework has been added under java/tests. It is
+ based on junit and can be used to perform regression testing
+ against the server.
+
+- GZip compression was removed from the filers. It was slow.
+ Also, because it was both buggy and out of our control, we had
+ to get rid of it.
+
+- Lots of little bugs fixed here and there.
+
+
+Version 0.9.1 (The Broken ORB)
+==============================
+Some minor updates, nothing to be alarmed about. Move along.
+
+- The XUpdateQueryService is now available via the XML:DB
+ Collection class.
+
+- A lot of the problems that were being reported regarding ORB
+ versioning and VM configuration have been resolved.
+
+- Our DOM was broken with respect to DocumentFragments. Also, a bug
+ in reporting node modification status up the tree has been
+ fixed. This was causing XUpdate queries to break in some cases.
+
+- The Exception system has been further refined.
+
+
+Version 0.9 (Feature Complete)
+==============================
+Several major changes have happened to the dbXML code base between
+versions 0.6 and 0.9. The most important is that we are
+now feature complete.
+
+- We are now feature complete. All of the features that will be
+ in the 1.0 version of dbXML are now available. All we have to
+ do now is continue to stabilize the server and fix bugs as they
+ pop up. You can consider the status of the project to be Alpha
+ quality now.
+
+- dbXML is now based on an Apache style license. We decided that
+ the LGPL was too restrictive regarding what you could do with
+ the source code. Beyond that, we're using several BSD and
+ Apache licensed libraries, and it seemed unfair that we could
+ build from their code, but they couldn't build from ours.
+
+- dbXML now includes support for the XML:DB XUpdate specification.
+ We've integrated The Infozone Group's Lexus library into dbXML
+ in order to provide support for XUpdate update logic.
+
+- Wire Compression is now supported by the CORBA APIs. The style
+ of compression that is used by our DOM and SAX classes for
+ Document storage is now being exposed via CORBA. This allows
+ Documents and query results to be retrieved without requiring
+ textual serialization on the server or parsing on the client.
+ This capability is transparently supported by the XML:DB API.
+
+- NodeIndexer has split into NameIndexer and ValueIndexer. The
+ ValueIndexer is used as NodeIndexer was, to store values for
+ predicate comparisons. NameIndexer is used to store element
+ references for standalone name components in location paths.
+ Use a type of 'name' when defining an Index to create a
+ NameIndexer.
+
+- Better Exception categorization. Exception fault codes have
+ been further defined, categorized, and broken out by severity.
+ The FaultCodes class now includes several utility methods for
+ generating APIException instances and examining the fault
+ codes that are stored in various types of Exception classes.
+
+- Application has been renamed to Database. Also, references to
+ Application in various methods need to be changed to Database.
+ Ex: getApplication() is now getDatabase()
+
+- An Address Book example is included, built on Tomcat. You can
+ find more information in java/examples/Addressbook/README
+
+- The CORBA ORB used by the server is now easily pluggable. So
+ far, JacORB and OpenORB are known to work.
+
+- SAX support has been added to the XML:DB API implementation.
+
+- The HTTP server port has been changed to 4080 to avoid conflicts
+ on the commonly used 8080 port. This is mainly because Tomcat
+ uses that port. Also, the Gopher port has changed to 4070.
+
+- More and more documentation.
+
+
+Version 0.6 (Much Closer)
+=========================
+In the past couple of weeks, we've made quite a bit of progress in
+building out the server, and contributing to its overall stability.
+There's a lot left to do, but it's getting very close to being
+usable.
+
+- Lots of bug fixes in this version, but many more to come.
+
+- The Developer's Guide has been fleshed out quite a bit, and the
+ Command-line Tools reference has been updated and converted to
+ DocBook format.
+
+- dbXML fully supports the XML:DB API as it is currently published
+ by the XML:DB Initiative. XML:DB API documentation is now
+ included in the distribution.
+
+- Types are now supported by the NodeIndexer to ensure proper
+ sorting. The available types can roughly be mapped to the Java
+ native types (string, short, int, etc...)
+
+- The XPathQueryResolver now supports partial evaluation of some
+ functions and index-based evaluation of the starts-with function.
+
+- The XPathQueryResolver also supports the highly experimental,
+ very cool, and potentially catastrophic autoindex feature. By
+ default, it's turned off, so there's nothing to worry about.
+
+- IndexManager now performs background indexing instead of
+ synchronous indexing. Issuing a create index command will now
+ immediately return as successful even though the index itself
+ hasn't yet been built.
+
+- Query results now include a set of namespaced attributes that
+ identify the collection and document that a particular node
+ was retrieved from.
+
+- The command line tools now require an instance name when
+ referencing a collection. The default instance name in a dbXML
+ server is 'db'. So, for example, you might refer to a collection
+ as '/db/root/addressbook'. Also, the short forms of some of the
+ action verbs have changed. See the Tools reference for more
+ information.
+
+
+Version 0.5 (Woah!)
+===================
+We've made some major changes to dbXML between version 0.4 and 0.5
+that will affect the type of applications that can be developed
+solely with dbXML, so it's important to read this change log for
+more information.
+
+- dbXML has been broken into three separate projects, with the
+ development focus remaining on the dbXML Core database server.
+ Two other projects: The Juggernaut Server Framework, and dbXML
+ App Services are available as separate CVS trees and are being
+ developed in parallel. The Juggernaut class files are available
+ in a Jar file as part of the distribution. The following is a
+ list of the features that have been removed from the dbXML Core,
+ and where they are now:
+
+ - Juggernaut - cvs co Juggernaut
+   - Service Framework
+   - HTTP Server
+ - App Services - cvs co dbXML-AppServices
+   - GetObject (HTTP Retrieval)
+   - SOAP Support
+   - Cocoon Support
+   - Scripting Support
+   - Schema Compiler
+   - XMLObject Compiler
+
+- We've renamed our packages from com.dbxml.* to org.dbxml.*
+
+- The ENTIRE Filing, Indexing, and Query systems have been
+ completely rearchitected and rewritten pretty much from scratch.
+ As a result:
+
+ - QueryResolvers can be developed and plugged into the QueryEngine.
+ - Full XPath syntax is now supported for Collection queries. This
+ functionality is provided by the XPathQueryResolver.
+ - The Indexing system participates in queries wherever possible.
+ - You can safely add and remove Indexes to existing Collections.
+
+- A new Filer named BTreeFiler is available in addition to
+ HashFiler. BTreeFiler is much more space conservative and doesn't
+ suffer from collision and overflow issues as the Collection begins
+ to grow past its original bounds. Both Filers are useful, but
+ which you choose depends on your needs. By default, dbXML core
+ will use BTreeFilers.
+
+- The Application class now extends Collection and can be thought of
+ as a top-level root Collection. At some point in the future,
+ Application will be renamed Database.
+
+- There have been a few changes to the Collection class. You can no
+ longer store binary data in a Collection, only Documents. The
+ getDocumentSet method allows you to enumerate through the Documents
+ in a particular Collection. Collection has been broken into two
+ classes. CollectionManager contains all management functionality
+ for nested Collections (create, drop, list) while Collection
+ contains functionality for the Collection instance (getDocument,
+ insertDocument, etc...)
+
+- XMLObjects have been scaled back. There is now only one type of
+ XMLObject. Application and Document XMLObjects have been removed.
+ Because Application is now derived from Collection, a standard
+ XMLObject can serve both roles. Document XMLObjects have been
+ removed completely, requiring a developer to implement this
+ functionality manually (it's about 1 line of code). The mapping
+ looks like this:
+
+   ApplicationContext   -> XMLObject
+   ApplicationXMLObject -> (gone)
+   CollectionContext    -> XMLObject
+   CollectionXMLObject  -> SimpleXMLObject
+   DocumentContext      -> (gone)
+   DocumentXMLObject    -> (gone)
+
+- The dbXML Client API has been replaced by an XML:DB Core Level 1
+ implementation. The XML:DB API is still a work in progress, and
+ is likely to change, but this opens the doors to interoperable
+ XML Database applications. For more information on the XML:DB
+ API, visit http://www.xmldb.org
+
+- The Command-Line Tools have been broken into two separate tools.
+ dbxmladmin provides administrative commands, while dbxml provides
+ user-level commands. The Command-Line Tools now utilize the
+ XML:DB API instead of the Client API. Some new features in the
+ Command-Line Tools include:
+
+ - Server Shutdown - You can now safely shut down the server,
+ instead of having to send it a KILL signal.
+ - Import/Export - You can import/export multiple Documents and
+ directory structures between Collections and the file system.
+ - XMLObject invocation - You can execute XMLObject methods
+ and retrieve their results.
+
+- We're now using JacORB for our CORBA services. The JDK's ORB was
+ very much lacking in a lot of areas.
+
+- JAXP support for creating and parsing dbXML compressed DOM
+ Documents is now available.
+
+- And a whole bunch of other stuff.
+
+
+Version 0.4 (Progress)
+======================
+We've made quite a bit of progress between version 0.3 and 0.4 in
+features and in general system stability and performance.
+
+- The Indexing System and XPath querying are working. The indexing
+ system now allows you to specify an XPath for narrowing individual
+ indexes.
+
+- The Compressed DOM is essentially complete.
+
+- We've integrated Cocoon into dbXML to maximize transformation
+ performance.
+
+- XMLObjects can now be created at various contexts within the
+ server. These are Application, Collection, and Document. The
+ ability to associate business logic at various levels of the
+ repository is a powerful application design/management capability.
+
+ As part of this:
+ - What used to be XMLObjects are now DocumentContext XMLObjects.
+ - What used to be Procedures are now CollectionContext XMLObjects.
+
+- Nested Collections. You can now manage collections of documents in
+ a nested fashion for logically laying out your data stores.
+ Databases have been replaced by top-level Collections.
+
+- The SystemCollection class will automatically compile a Schema
+ using the XMLSchemaCompiler upon calling the setSchema() method.
+
+- XMLSerializable objects are classes whose state can be serialized
+ to and from XML documents. The serialization is not an automated
+ process at the moment, but the ability to introspect an object
+ graph and produce XML is planned for a future release.
+ XMLSerializable objects can be stored/retrieved to/from the
+ database with the Collection set/getObject methods.
+
+ As part of this:
+ - SymbolTables are now represented using XMLSerializable objects.
+ - Schemas are now represented using XMLSerializable objects.
+ - XMLSchemaCompiler now produces XMLSerializable objects.
+
+- A Compressed DOM Symbol Table can be defined in the system
+ configuration for hard-coding or using standardized symbol tables.
+ SystemCollection uses a hard-coded symbol table to store
+ compressed symbol tables.
+
+- A Gopher Service is now available, allowing Gopher-based directory
+ and document browsing and querying of a dbXML repository. Gopher
+ is useful for quickly browsing to documents being stored in the
+ repository.
+
+
+Version 0.3 (Bye Bye C++)
+=========================
+The C++ code is gone. dbXML is now 100% Java code. There have also
+been a few major additions to the system:
+
+- More Documentation. Yippee!
+
+- The Configuration framework is essentially fully functional.
+
+- The Compressed DOM is functional but still in an experimental
+ state. A compressed Collection can be created by setting the
+ compressed attribute to 'true' in the collection element. There
+ are still some missing implementations, especially where DTD types
+ are concerned, but most of the document core should work.
+
+- The foundation for dbXML autolinking is part of the dbXML
+ Compressed DOM system. dbXML will automatically expand elements
+ with links and respect document caching policies in expanding those
+ links. See the User's Guide for more information.
+
+- The Indexing system is getting much closer to completion. Basic
+ XPath querying is also in an experimental state.
+
+- A command line tool for managing the running server has been added.
+ It uses the CORBA APIs to manage the server.
+
+- XML Schema Compiler - The XML Schema compiler takes a W3C XML
+ Schema (xsd) resource, and generates a set of Java classes based on
+ the element, attribute, and element-relationship definitions in the
+ Schema. The compiler still needs a lot of work in order to
+ generate typed attributes (right now everything is a string), but
+ it's a good start. In the future, this compiler will be an
+ internal process, compiling all stored schemas for utilization by
+ XMLObjects (so you don't have to use the DOM directly).
+
+- SOAP Support - All XMLObjects and Procedures are automatically
+ exposed by the server as SOAP services (as well as their original
+ native protocol). SOAP support is limited to the capabilities of
+ Procedures and XMLObjects. Object structure serialization may be
+ implemented in a future release.
+
+
+Version 0.2 (Switch To Java)
+============================
+A major architectural shift occurred between 0.1 and 0.2. A design
+that had once consisted of about a 90%/10% C++ to Java ratio has
+flip-flopped to a 95%/5% Java to C++ ratio. There are several
+reasons for this. First, in order to provide better integration with
+existing open source XML Server architectures, which are almost all
+Java-based, we decided that it would be best to avoid mixing the Java
+and non-Java worlds wherever possible. Second, we would be able to
+afford ourselves a major kick-start by utilizing some of the better
+parts of the Juggernaut architecture in our design. Third, doing XML
+in C++ is a headache. You spend more time worrying about memory
+management than you do in actually writing functioning code. In
+order to maintain a certain level of sanity for our staff, and
+contributors to the dbXML source code, we decided that Java would be
+the best choice for an implementation language.
+
+
+Version 0.1 (In The Beginning)
+==============================
+The Three Filing Systems are sort of finished. There are likely a
+lot of places to optimize them and there are absolutely some
+re-entrant code issues, but these will be ironed out as I actually
+start using the filers with the Parser and Query Engine. HashFiler
+is a disk-based hashed bucket filing system. FSFiler is a filer that
+loads data directly from the operating system's file system based on
+file name. MemFiler is a memory-based filing system, mainly
+for temporary in-memory tables and query result sets.
+
+A quick note about the HashFiler. dbXML's filing system was not
+written to be disk-space conservative; it was written to be
+incredibly efficient for handling large, variable-sized chunks of
+data. Where systems like gdbm and dbm try to be everything to
+everyone, HashFiler is really targeted for the dbXML project.
+HashFiler provides a simple block read caching mechanism with a
+default size of 50 blocks. All writes are performed immediately.
+
+Blocks should generally be optimized to a multiple of the operating
+system's block size, and the number of pages per block should be a
+power of 2. The resulting size of a page should be large enough to
+store the PageHeader (~64 bytes), key (up to you), and at least a
+fair amount of record data.
+
+HashFiler supports record compression if the size of a record spans
+past a single block and if compression will actually yield a
+smaller value (meaning that if compression would lengthen the
+record, the compression is canceled). Compression can be toggled
+with the setCompressed method and tested with the isCompressed
+method. If a HashFiler has operated for a time with compression and
+compression is then turned off, existing compressed records are not
+decompressed until rewritten. Compression is performed using zlib
+with a compression method set to Z_BEST_SPEED which will perform well
+against variable length textual data (such as XML documents) but not
+most binary data.
+
+
+Acknowledgments
+---------------
+
+This product includes software developed by the Infozone Group
+(http://www.infozone.org)
+
+This product includes software developed by the XML:DB Initiative
+(http://www.xmldb.org)
+
+This product includes software developed by the Exolab Project
+(http://www.exolab.org)
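
The Version 0.1 notes in the README above describe HashFiler's record
compression rule: compress only when a record spans more than one block,
use zlib at its fastest setting, and keep the raw record if compression
does not make it smaller. The following is a minimal sketch of that rule
in Java, not the original HashFiler code (which is not shown in this
commit). The class name RecordCompressionSketch, the Stored type, and the
blockSize parameter are hypothetical names chosen for illustration. It
assumes java.util.zip, where Deflater.BEST_SPEED corresponds to zlib's
Z_BEST_SPEED.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration only; not part of dbXML or Xindice. */
final class RecordCompressionSketch {

    /** A stored record plus a flag saying whether its bytes are deflated. */
    static final class Stored {
        final byte[] bytes;
        final boolean compressed;

        Stored(byte[] bytes, boolean compressed) {
            this.bytes = bytes;
            this.compressed = compressed;
        }
    }

    /** Compresses a record only if it spans past one block and actually shrinks. */
    static Stored store(byte[] record, int blockSize) {
        if (record.length <= blockSize) {
            return new Stored(record, false); // fits in a single block: store as-is
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(record);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(record.length);
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            byte[] packed = out.toByteArray();
            // Cancel compression when it does not yield a smaller value.
            return packed.length < record.length
                    ? new Stored(packed, true)
                    : new Stored(record, false);
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Returns the original record bytes, inflating them if necessary. */
    static byte[] load(Stored stored) {
        if (!stored.compressed) {
            return stored.bytes;
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored.bytes);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported record");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed record", e);
        } finally {
            inflater.end();
        }
    }
}
```

Keeping a per-record compressed flag is one way to get the behavior the
notes describe when compression is later turned off: records that were
already compressed stay readable and are simply stored uncompressed the
next time they are rewritten.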