Posted to java-user@lucene.apache.org by Che Dong <ch...@hotmail.com> on 2003/06/05 18:55:01 UTC

[PLAN]: SAXIndexer, indexing database via XML gateway

The current WebLucene project includes a SAX-based XML source indexer:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/index/

It can parse an XML data source like the following example:
<?xml version="1.0" encoding="GB2312"?>
<Table>
 <Record id="1">
  <Field name="Id">39314</Field>
  <Field name="Title">title of document</Field>
  <Field name="Author">chedong</Field>
  <Field name="Content">blah blah</Field>
  <Field name="PubTime">2003-06-06</Field>
  <Index name="FullIndex">Title,Content</Index>
  <Index name="TitleIndex" token="no">Author</Index>
 </Record>
 ...
 
</Table>

I use two Index elements in each Record block to specify the field => index mapping. The SAXIndexer parses this XML source, keeps Id, Title, Author, Content, and PubTime as store-only Lucene fields, and creates two additional index fields:
one index field built from Title + Content
one index field built from Author, without tokenization
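
A rough sketch of the mapping described above, using only the JDK SAX API. This is not the actual WebLucene SAXIndexer code: the class name RecordMappingSketch, the Record holder, and the choice to join source fields with a space are all assumptions for illustration. It also assumes the Index elements come after the Field elements they refer to, as in the example. In a real indexer the collected values would be handed to Lucene rather than kept in maps.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the real SAXIndexer: collects each <Record>
// into stored fields plus the index fields derived from <Index> rules.
class RecordMappingSketch extends DefaultHandler {

    // One parsed <Record>.
    static class Record {
        final Map<String, String> stored = new LinkedHashMap<>();
        final Map<String, String> indexed = new LinkedHashMap<>();
        final Map<String, Boolean> tokenized = new LinkedHashMap<>();
    }

    final List<Record> records = new ArrayList<>();
    private Record current;
    private String name;       // "name" attribute of the open Field/Index
    private boolean tokenize;  // false when token="no"
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // drop whitespace seen between elements
        if (qName.equals("Record")) {
            current = new Record();
        } else if (qName.equals("Field") || qName.equals("Index")) {
            name = atts.getValue("name");
            tokenize = !"no".equals(atts.getValue("token"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        if (qName.equals("Field")) {
            current.stored.put(name, text.toString());
        } else if (qName.equals("Index")) {
            // Join the values of the listed source fields into one index field.
            StringBuilder joined = new StringBuilder();
            for (String source : text.toString().split(",")) {
                String value = current.stored.get(source.trim());
                if (value == null) {
                    continue; // unknown field name: skipped in this sketch
                }
                if (joined.length() > 0) {
                    joined.append(' ');
                }
                joined.append(value);
            }
            current.indexed.put(name, joined.toString());
            current.tokenized.put(name, tokenize);
        } else if (qName.equals("Record")) {
            records.add(current);
            current = null;
        }
    }

    static List<Record> parse(String xml) throws Exception {
        RecordMappingSketch handler = new RecordMappingSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Table><Record id=\"1\">"
                + "<Field name=\"Title\">title of document</Field>"
                + "<Field name=\"Author\">chedong</Field>"
                + "<Field name=\"Content\">blah blah</Field>"
                + "<Index name=\"FullIndex\">Title,Content</Index>"
                + "<Index name=\"TitleIndex\" token=\"no\">Author</Index>"
                + "</Record></Table>";
        for (Record r : parse(xml)) {
            System.out.println(r.stored + " -> " + r.indexed + " " + r.tokenized);
        }
    }
}
```

The handler buffers character data per element because SAX may deliver the text of one element in several characters() calls.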

Recently I have noticed that more and more applications provide XML interfaces very similar to RSS.
For example, you can even dump a table to XML output from phpMyAdmin, like the following:
<?xml version="1.0" encoding="iso-8859-1"?>
<mysql>
  <!-- Table user -->
    <user>
        <Host>localhost</Host>
        <User>root</User>
        <Password></Password>
        <Select_priv>Y</Select_priv>
        <Insert_priv>Y</Insert_priv>
        <Update_priv>Y</Update_priv>
        <Delete_priv>Y</Delete_priv>
        <Create_priv>Y</Create_priv>
        <Drop_priv>Y</Drop_priv>
        <Reload_priv>Y</Reload_priv>
        <Shutdown_priv>Y</Shutdown_priv>
        <Process_priv>Y</Process_priv>
        <File_priv>Y</File_priv>
        <Grant_priv>Y</Grant_priv>
        <References_priv>Y</References_priv>
        <Index_priv>Y</Index_priv>
        <Alter_priv>Y</Alter_priv>
        <Show_db_priv>Y</Show_db_priv>
        <Super_priv>Y</Super_priv>
        <Create_tmp_table_priv>Y</Create_tmp_table_priv>
        <Lock_tables_priv>Y</Lock_tables_priv>
        <Execute_priv>Y</Execute_priv>
        <Repl_slave_priv>Y</Repl_slave_priv>
        <Repl_client_priv>Y</Repl_client_priv>
        <ssl_type></ssl_type>
        <ssl_cipher></ssl_cipher>
        <x509_issuer></x509_issuer>
        <x509_subject></x509_subject>
        <max_questions>0</max_questions>
        <max_updates>0</max_updates>
        <max_connections>0</max_connections>
    </user>
    ...
</mysql>

The SAXIndexer would be able to index a database XML dump directly if it let you specify the field => index mapping rules from an external program.
For example:
java IndexRunner -c field_index_mapping.conf -i http://localhost/table_dump.xml

# the config file looks like the following:
FullIndex       Title,Content 
AuthorIndex  Author          no
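
As a sketch of how such a config file could be read, assuming whitespace-separated columns (index name, comma-separated source fields, and an optional "no" meaning no tokenization) and "#" comment lines. The class MappingConfigSketch, the Rule type, and the apply helper are hypothetical names, not part of IndexRunner or WebLucene, and joining source values with a space is my own assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the field => index mapping file shown above.
class MappingConfigSketch {

    // One line of the config: index name, source fields, tokenize flag.
    static class Rule {
        final String indexName;
        final List<String> sourceFields;
        final boolean tokenize;

        Rule(String indexName, List<String> sourceFields, boolean tokenize) {
            this.indexName = indexName;
            this.sourceFields = sourceFields;
            this.tokenize = tokenize;
        }
    }

    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 2) {
                throw new IllegalArgumentException("Expected at least 2 columns: " + line);
            }
            // Assumption: a third column of "no" turns tokenization off.
            boolean tokenize = !(cols.length > 2 && cols[2].equalsIgnoreCase("no"));
            rules.add(new Rule(cols[0], Arrays.asList(cols[1].split(",")), tokenize));
        }
        return rules;
    }

    // Builds the index field values for one row of an XML dump, where the
    // row maps element names (column names) to their text.
    static Map<String, String> apply(List<Rule> rules, Map<String, String> row) {
        Map<String, String> indexed = new LinkedHashMap<>();
        for (Rule rule : rules) {
            StringBuilder joined = new StringBuilder();
            for (String field : rule.sourceFields) {
                String value = row.get(field);
                if (value == null) {
                    continue; // column missing from this row
                }
                if (joined.length() > 0) {
                    joined.append(' ');
                }
                joined.append(value);
            }
            indexed.put(rule.indexName, joined.toString());
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList(
                "FullIndex       Title,Content",
                "AuthorIndex  Author          no"));
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Title", "title of document");
        row.put("Content", "blah blah");
        row.put("Author", "chedong");
        System.out.println(apply(rules, row));
    }
}
```

Keeping the rules outside the XML means the same dump can be indexed in different ways without touching the application that produces it.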

I hope this SAXIndexer can be added to the Lucene demos, so that Lucene end users can build Lucene indexes from their existing database applications.

Regards

Che, Dong
http://www.chedong.com/