Posted to java-user@lucene.apache.org by Paul Libbrecht <pa...@activemath.org> on 2005/04/19 21:55:36 UTC
Passing XML objects to the analyzer ?
Hi,
I am working on an index to search XML data in a fixed format that I
know well.
The idea is that the XML content (which I have as a JDOM object) actually
carries the semantics, which would best be converted directly into tokens
by something like an analyzer. However, fields are not added from the
result of the analysis (or a stream thereof) but from readers or
strings.
I have two choices and would like to know which is best:
- make the text passed to the analyzer a simple "instruction" which
will fetch the XML objects and do the analysis there
- add a pre-analysis step which converts the XML into text tokens which
my analyzer then picks up again.
I'd be more inclined toward the first solution, but I fear there's a catch.
Is there one?
paul
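
A minimal sketch of the second option (the pre-analysis step), under
stated assumptions: it uses the JDK's built-in DOM parser as a stand-in
for JDOM, and the class name XmlPreAnalyzer, the method toTokenText and
the "el_" marker prefix are hypothetical, chosen only for illustration.
It flattens an XML fragment into a whitespace-separated token string
that could then be handed to the index as ordinary field text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns XML structure into plain text tokens.
final class XmlPreAnalyzer {

    /** Returns one space-separated string of tokens for the given XML. */
    static String toTokenText(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> tokens = new ArrayList<>();
            collect(root, tokens);
            return String.join(" ", tokens);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not pre-analyze XML", ex);
        }
    }

    // Depth-first walk: each element adds a marker token, each text node its words.
    private static void collect(Node node, List<String> tokens) {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            // "el_" is an assumed marker scheme, not anything Lucene defines.
            tokens.add("el_" + node.getNodeName());
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                collect(children.item(i), tokens);
            }
        } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            for (String word : node.getNodeValue().trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
    }
}
```

The resulting string only keeps its structure if the analyzer applied
afterwards leaves the marker tokens intact, so a plain whitespace-based
analyzer is probably the safer match for this approach.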
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Passing XML objects to the analyzer ?
Posted by Paul Libbrecht <pa...@activemath.org>.
On 19 Apr 2005, at 22:50, Erik Hatcher wrote:
> The only catch that I know of is that an Analyzer is invoked on a
> per-field basis. I can't tell exactly what you have in mind, but a
> Lucene Analyzer cannot split data into separate fields itself - it has
> to have been split prior.
That's an easy one... ok, yes, I was clearly aware of this.
> I'm indexing a lot of XML myself, with JDOM in the middle, and using
> XPath to extract data per field before building the Document.
So wouldn't Field.Unstored(Object) actually make sense?
That object, rather than being a reader, would be passed along until the
analyzer call, which could then decide to accept, say, JDOM objects...
paul
Re: Passing XML objects to the analyzer ?
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote:
>
> Hi,
>
> I am working on an index to search XML data in a fixed format that I
> master well...
> The idea is that the XML content (which I have as JDOM object)
> actually carries the semantic which would be best converted directly
> into tokens by something like an analyzer. However, adding fields is
> done not using the result of the analysis (or a stream thereof) but
> using readers or strings.
>
> I have two choices and would like to know what's the best:
> - make the text passed to the analyzer a simple "instruction" which
> will fetch the XML objects and do the analysis there
> - make a pre-analysis step which converts it into tokens of text which
> then my analyzer catches again.
> I'd be more inclined for the first solution but I fear there's a catch.
>
> Is there one ?
The only catch that I know of is that an Analyzer is invoked on a
per-field basis. I can't tell exactly what you have in mind, but a
Lucene Analyzer cannot split data into separate fields itself; the data
has to have been split beforehand.
I'm indexing a lot of XML myself, with JDOM in the middle, and using
XPath to extract data per field before building the Document.
Erik
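
The per-field extraction described above might be sketched roughly as
follows. This is an illustration under assumptions, not the code used
in either project: it relies on the JDK's DOM and javax.xml.xpath
classes instead of JDOM, and the class name XmlFieldExtractor and the
method extractFields are hypothetical. Each XPath yields one string per
field, and those strings would then be added to the Lucene Document,
where the Analyzer sees them one field at a time.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: splits XML into per-field text before indexing.
final class XmlFieldExtractor {

    /**
     * Evaluates one XPath expression per field name and returns
     * field name -> extracted text, in the order the fields were given.
     */
    static Map<String, String> extractFields(String xml, Map<String, String> fieldXPaths) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : fieldXPaths.entrySet()) {
                // evaluate(String, Object) returns the string value of the match,
                // or an empty string when nothing matches.
                String text = xpath.evaluate(entry.getValue(), dom);
                fields.put(entry.getKey(), text.trim());
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not extract fields", ex);
        }
    }
}
```

Called with, say, "title" mapped to "/item/title" and "body" mapped to
"/item/body" (both element names invented for this example), the method
returns one string per field, which is the splitting that has to happen
before the Analyzer is involved.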