Posted to java-user@lucene.apache.org by Paul Libbrecht <pa...@activemath.org> on 2005/04/19 21:55:36 UTC

Passing XML objects to the analyzer ?

Hi,

I am working on an index to search XML data in a fixed format that I 
know well.
The idea is that the XML content (which I have as a JDOM object) actually 
carries the semantics, which would best be converted directly into tokens 
by something like an analyzer. However, fields are added not from 
the result of the analysis (or a stream thereof) but from readers or 
strings.

I have two choices and would like to know which is best:
- make the text passed to the analyzer a simple "instruction" which 
will fetch the XML objects and do the analysis there
- make a pre-analysis step which converts the XML into tokens of text, 
which my analyzer then picks up again.
I'd be more inclined toward the first solution, but I fear there's a catch.

Is there one?

paul
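
A minimal sketch of the second option (the pre-analysis step), under some loud assumptions: it uses the JDK's built-in DOM parser rather than JDOM, the class name XmlPreAnalysis and the "elem_" marker prefix are made up for illustration, and the output is just a whitespace-separated string that a whitespace-splitting analyzer could tokenize again. It is not part of Lucene or JDOM.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML fragment into one line of tokens.
class XmlPreAnalysis {

    // Returns the tokens joined by single spaces, in document order.
    static String toTokenText(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> tokens = new ArrayList<>();
            collect(root, tokens);
            return String.join(" ", tokens);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not pre-analyze XML", ex);
        }
    }

    // Depth-first walk: one marker token per element, one token per word of text.
    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // "elem_" is an assumed convention that keeps structure searchable.
            out.add("elem_" + node.getNodeName());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            for (String word : node.getNodeValue().trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }
}
```

The resulting string could then be given to a field as an ordinary string value, which keeps the analyzer itself unaware of XML at the cost of tokenizing twice.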


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Passing XML objects to the analyzer ?

Posted by Paul Libbrecht <pa...@activemath.org>.
On 19 Apr 2005, at 22:50, Erik Hatcher wrote:
> The only catch that I know of is that an Analyzer is invoked on a 
> per-field basis.  I can't tell exactly what you have in mind, but a 
> Lucene Analyzer cannot split data into separate fields itself - the 
> data has to have been split beforehand.

That's an easy one... ok, yes, I was clearly aware of this.

> I'm indexing a lot of XML myself, with JDOM in the middle, and using 
> XPath to extract data per field before building the Document.

So wouldn't a Field.UnStored(Object) actually make sense?
That object, instead of being a reader, would be passed around until the 
analyzer call, which would then decide to accept, say, JDOM objects...

paul




Re: Passing XML objects to the analyzer ?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote:

>
> Hi,
>
> I am working on an index to search XML data in a fixed format that I 
> know well.
> The idea is that the XML content (which I have as a JDOM object) 
> actually carries the semantics, which would best be converted directly 
> into tokens by something like an analyzer. However, fields are 
> added not from the result of the analysis (or a stream thereof) but 
> from readers or strings.
>
> I have two choices and would like to know which is best:
> - make the text passed to the analyzer a simple "instruction" which 
> will fetch the XML objects and do the analysis there
> - make a pre-analysis step which converts the XML into tokens of text, 
> which my analyzer then picks up again.
> I'd be more inclined toward the first solution, but I fear there's a catch.
>
> Is there one?

The only catch that I know of is that an Analyzer is invoked on a 
per-field basis.  I can't tell exactly what you have in mind, but a 
Lucene Analyzer cannot split data into separate fields itself - the 
data has to have been split beforehand.

I'm indexing a lot of XML myself, with JDOM in the middle, and using 
XPath to extract data per field before building the Document.

	Erik
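
A rough sketch of the "XPath per field, then build the Document" approach described above. It is only an illustration under stated assumptions: it uses the JDK's javax.xml.xpath and DOM classes instead of JDOM, the class name XPathFieldExtractor and the element names in the usage note are hypothetical, and it stops at a plain field-name-to-text map rather than creating Lucene fields.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: splits one XML document into per-field text
// before any analyzer is involved.
class XPathFieldExtractor {

    // fieldToXPath maps a field name to the XPath expression selecting its text.
    static Map<String, String> extract(String xml, Map<String, String> fieldToXPath) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : fieldToXPath.entrySet()) {
                // evaluate(String, Object) returns the string value of the match
                // (the first node if several match, "" if none does).
                String text = xpath.evaluate(entry.getValue(), dom).trim();
                fields.put(entry.getKey(), text);
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not extract fields", ex);
        }
    }
}
```

For instance, mapping "title" to /exercise/title and "body" to /exercise/body yields one string per field; each string would then be added to the Lucene Document as its own field, so the analyzer only ever sees data that has already been split.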

