Posted to general@lucene.apache.org by Lmhelp <le...@ign.fr> on 2010/06/28 16:18:38 UTC
Lucene - Retrieve extracted/indexed tokens for further analysis
Hi,
Thank you for reading my post.
Here is what I would like to do.
I have an XML file with the following structure:
------------------------------
<root_element>
<page>
<title>[...]</title>
<text>[...]</text>
</page>
[...]
<page>
<title>[...]</title>
<text>[...]</text>
</page>
</root_element>
------------------------------
I would like to:
- have Lucene extract the tokens of each "text" element
- get these tokens back from Lucene for further analysis.
--------------------------------------
- "text" element 1 => list of tokens 1
- "text" element 2 => list of tokens 2
[...]
- "text" element n => list of tokens n
--------------------------------------
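The first half of this, getting one string per "text" element out of the XML, does not involve Lucene at all. A minimal sketch using the StAX reader that ships with the JDK is given here. The class name TextElementReader and the method readTextElements are hypothetical, and the sketch assumes each "text" element contains character data only (no nested elements).

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; the class and method names are illustrative only.
class TextElementReader {

    /** Returns the character content of every <text> element, in document order. */
    static List<String> readTextElements(Reader xml) {
        List<String> texts = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "text".equals(reader.getLocalName())) {
                        // getElementText() reads up to the matching end tag and
                        // fails if it meets a nested element (assumption above).
                        texts.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample data invented for the demonstration.
        String xml = "<root_element>"
                + "<page><title>First</title><text>Hello Lucene world</text></page>"
                + "<page><title>Second</title><text>Another page</text></page>"
                + "</root_element>";
        List<String> texts = readTextElements(new StringReader(xml));
        for (int i = 0; i < texts.size(); i++) {
            System.out.println("text element " + (i + 1) + " => " + texts.get(i));
        }
    }
}
```

Each returned string can then be wrapped in a StringReader and handed to whatever tokenizer is used next.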
Is it possible to do such a thing?
Could you point me in the right direction?
Thanks and all the best,
--
Lmhelp
--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Retrieve-extracted-indexed-tokens-for-further-analysis-tp927910p927910.html
Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Lucene - Retrieve extracted/indexed tokens for further analysis
Posted by Lmhelp <le...@ign.fr>.
Hi,
Thank you for your answer.
I must work with Java.
Well, suppose that (using an XML stream reader) I can provide
Lucene with one character stream per "text" element.
Can I then use Lucene to extract the corresponding
tokens and store them for further use?
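The shape of that step (one Reader in, one list of tokens out, stored per element) can be sketched in plain Java. The tokenizer below is a deliberately simple stand-in, not Lucene: it lowercases and splits on anything that is not a letter or digit. With Lucene itself, the analogous step would be to obtain a TokenStream from an Analyzer via tokenStream(fieldName, reader) and loop on incrementToken(); the attribute class used to read each term depends on the Lucene version, so it is not shown here. The names SimpleTokenizer, tokenize and tokenizeAll are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's API.
class SimpleTokenizer {

    /** Splits a character stream into lowercase runs of letters and digits. */
    static List<String> tokenize(Reader input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isLetterOrDigit(c)) {
                    current.append(Character.toLowerCase((char) c));
                } else if (current.length() > 0) {
                    // A separator ends the token collected so far.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    /** Tokenizes each text and stores the result under its 1-based position. */
    static Map<Integer, List<String>> tokenizeAll(List<String> texts) {
        Map<Integer, List<String>> tokensByElement = new LinkedHashMap<>();
        for (int i = 0; i < texts.size(); i++) {
            tokensByElement.put(i + 1, tokenize(new StringReader(texts.get(i))));
        }
        return tokensByElement;
    }

    public static void main(String[] args) {
        // Sample data invented for the demonstration.
        List<String> texts = Arrays.asList("Hello, Lucene world!", "Another page");
        tokenizeAll(texts).forEach((n, tokens) ->
                System.out.println("\"text\" element " + n + " => " + tokens));
    }
}
```

Keeping the tokens in an insertion-ordered map keyed by element position preserves the "text element n => list of tokens n" correspondence for later analysis.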
Thanks and all the best,
--
Lmhelp
--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Retrieve-extracted-indexed-tokens-for-further-analysis-tp927910p927968.html