Posted to general@lucene.apache.org by Lmhelp <le...@ign.fr> on 2010/06/28 16:18:38 UTC

Lucene - Retrieve extracted/indexed tokens for further analysis

Hi,

Thank you for reading my post.

Here is what I would like to do.

I have an XML file with the following structure:
------------------------------
<root_element>
    <page>
        <title>[...]</title>
        <text>[...]</text>
    </page>
    [...]
    <page>
        <title>[...]</title>
        <text>[...]</text>
    </page>
</root_element>
------------------------------

I would like to:
- have Lucene extract the tokens of each "text" element
- get these tokens back for further analysis:
 
     --------------------------------------
     - "text" element 1 => list of tokens 1
     - "text" element 2 => list of tokens 2
       [...]
     - "text" element n => list of tokens n
     --------------------------------------

Is this possible?
Could you point me in the right direction?

Thanks and all the best,
--
Lmhelp


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Retrieve-extracted-indexed-tokens-for-further-analysis-tp927910p927910.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Re: Lucene - Retrieve extracted/indexed tokens for further analysis

Posted by Lmhelp <le...@ign.fr>.
Hi,

Thank you for your answer.

I must work with Java.

Suppose that, using an XML stream reader, I can give Lucene
one stream of characters per "text" element. Can I then use
Lucene to extract the corresponding tokens and store them
for further use?
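
To make the question concrete, here is a minimal sketch of the pipeline I
have in mind, using only the JDK (StAX). The class name TextElementTokens,
both method names and the tokenizer itself are hypothetical: simpleTokenize
is a plain stand-in, NOT Lucene. As far as I understand, the Lucene
equivalent of that step would be to pass a Reader to
Analyzer.tokenStream(fieldName, reader) and loop on
TokenStream.incrementToken(), reading each term from a term attribute whose
class depends on the Lucene version, so that part is left out here.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TextElementTokens {

    // Stand-in tokenizer (an assumption, not Lucene): lower-cases the text
    // and splits it on runs of characters that are neither letters nor digits.
    static List<String> simpleTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Streams over the XML and returns one token list per "text" element,
    // in document order. The tokenizer is pluggable so that a Lucene-based
    // one could be substituted for the stand-in above.
    static List<List<String>> tokensPerTextElement(
            Reader xml, Function<String, List<String>> tokenizer) {
        List<List<String>> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "text".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag; it
                    // assumes "text" contains character data only.
                    result.add(tokenizer.apply(reader.getElementText()));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<root_element>"
                + "<page><title>First</title><text>Hello, big World!</text></page>"
                + "<page><title>Second</title><text>More text here</text></page>"
                + "</root_element>";
        List<List<String>> lists = tokensPerTextElement(
                new StringReader(xml), TextElementTokens::simpleTokenize);
        for (int i = 0; i < lists.size(); i++) {
            System.out.println("\"text\" element " + (i + 1) + " => " + lists.get(i));
        }
    }
}
```

The token lists are kept in memory here; for a large file they could just
as well be written out one "text" element at a time.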

Thanks and all the best,
--
Lmhelp

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Retrieve-extracted-indexed-tokens-for-further-analysis-tp927910p927968.html