You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Che Dong <ch...@hotmail.com> on 2003/06/05 18:55:01 UTC
[PLAN]: SAXIndexer, indexing database via XML gateway
In current weblucene project including a SAX Based xml source indexer:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/index/
It can parse xml data source like following example:
<?xml version="1.0" encoding="GB2312"?>
<Table>
<Record id="1">
<Field name="Id">39314</Field>
<Field name="Title">title of document</Field>
<Field name="Author">chedong</Field>
<Field name="Content">blah blah</Field>
<Field name="PubTime">2003-06-06</Field>
<Index name="FullIndex">Title,Content</Index>
<Index name="TitleIndex" token="no">Author</Index>
</Record>
...
</Table>
I use two Index elements in each Record block to speciefy field => index mapping, The SAXIndexer will parse this xml source into Id, Title, Author, Content ,PubTime into Lucene store only Fields and create another two index fields:
one index field with Title + Content
one index field Author without token
Recently I notice more and more application provided xml interface very similar to RSS:
for example: you can even dump table into xml output from phpMyAdmin like following:
<?xml version="1.0" encoding="iso-8859-1"?>
<mysql>
<!-- Table user -->
<user>
<Host>localhost</Host>
<User>root</User>
<Password></Password>
<Select_priv>Y</Select_priv>
<Insert_priv>Y</Insert_priv>
<Update_priv>Y</Update_priv>
<Delete_priv>Y</Delete_priv>
<Create_priv>Y</Create_priv>
<Drop_priv>Y</Drop_priv>
<Reload_priv>Y</Reload_priv>
<Shutdown_priv>Y</Shutdown_priv>
<Process_priv>Y</Process_priv>
<File_priv>Y</File_priv>
<Grant_priv>Y</Grant_priv>
<References_priv>Y</References_priv>
<Index_priv>Y</Index_priv>
<Alter_priv>Y</Alter_priv>
<Show_db_priv>Y</Show_db_priv>
<Super_priv>Y</Super_priv>
<Create_tmp_table_priv>Y</Create_tmp_table_priv>
<Lock_tables_priv>Y</Lock_tables_priv>
<Execute_priv>Y</Execute_priv>
<Repl_slave_priv>Y</Repl_slave_priv>
<Repl_client_priv>Y</Repl_client_priv>
<ssl_type></ssl_type>
<ssl_cipher></ssl_cipher>
<x509_issuer></x509_issuer>
<x509_subject></x509_subject>
<max_questions>0</max_questions>
<max_updates>0</max_updates>
<max_connections>0</max_connections>
</user>
...
</mysql>
the SAXIndexer will be able to database xml dump directly if SAXIndexer can let specify field => index mapping rule from enternal program.
for example:
java IndexRunner -c field_index_mapping.conf -i http://localhost/table_dump.xml
#the config file like following:
FullIndex Title,Content
AuthorIndex Author no
Hope this SAXIndexer can be added into Lucene demos make lucene end user can make lucene index from current database applications.
Regards
Che, Dong
http://www.chedong.com/