Posted to java-user@lucene.apache.org by manjula wijewickrema <ma...@gmail.com> on 2010/05/14 11:35:49 UTC

Access indexed terms

Hi,

Is it possible to put the indexed terms into an array in Lucene? For
example, imagine I have indexed a single document in Lucene and now I want
to access those terms in the index. Is it possible to retrieve those
terms as array elements? If so, how?

Thanks,
Manjula

Re: Access indexed terms

Posted by manjula wijewickrema <ma...@gmail.com>.

Dear Andrzej,

Thanks for your valuable help. I also noticed this HighFreqTerms approach in
the Lucene email archive and tried to use it. To do that, I downloaded
lucene-misc-2.9.1.jar and added the org.apache.lucene.misc package to my
project. Now I think I have to call the HighFreqTerms class from my code,
but I was unable to find any guidance on how to do that. Could you please
tell me how to use this class in my code?

Thanks
Manjula
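
The core idea behind a "high frequency terms" utility can be sketched without Lucene at all: keep only the N best terms in a bounded heap while walking over every (term, count) pair. The class and method names below (TopTermsSketch, topTerms) are hypothetical and are not part of lucene-misc. As far as I know, HighFreqTerms in the 2.9 contrib jar is a command-line tool with a main method that takes the index directory (and optionally a field name) and ranks terms by document frequency, so for within-document counts you would feed a selection like this with counts taken from the document's term vector instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Plain-Java illustration of top-N selection by frequency; this is not Lucene code. */
final class TopTermsSketch {

    /**
     * Returns up to n entries with the highest counts, highest first.
     * Ties are broken alphabetically so the result is deterministic.
     */
    static List<Map.Entry<String, Integer>> topTerms(Map<String, Integer> counts, int n) {
        Comparator<Map.Entry<String, Integer>> bestFirst = (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };
        // Reversed ordering puts the currently worst-ranked entry at the head,
        // so it is the one evicted when the heap grows beyond n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(bestFirst.reversed());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.add(entry);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(bestFirst);
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("lucene", 5);
        counts.put("index", 3);
        counts.put("term", 3);
        counts.put("array", 1);
        for (Map.Entry<String, Integer> e : topTerms(counts, 3)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The heap never holds more than n + 1 entries, which matters when the index has many distinct terms and only a handful are wanted.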


On Fri, May 14, 2010 at 6:16 PM, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-05-14 14:24, manjula wijewickrema wrote:
> > Hi Andrzej
> >
> > Thanks for the reply. But as you mentioned, creating arrays of indexed
> > terms seems to be rather difficult. My intention is to find the term
> > frequencies of an indexed document. I can find the frequency of a
> > particular term (given as a query) if I specify the term in the code.
> > What I really want is the term frequency (or even just the number of
> > times each term appears in the document) of all indexed terms (or of
> > the high-frequency terms) without naming them in the code. Is there an
> > alternative way to do that?
>
> Yes, see the discussion here:
>
> https://issues.apache.org/jira/browse/LUCENE-2393
>
>
> --
>  Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Access indexed terms

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 2010-05-14 14:24, manjula wijewickrema wrote:
> Hi Andrzej
>
> Thanks for the reply. But as you mentioned, creating arrays of indexed
> terms seems to be rather difficult. My intention is to find the term
> frequencies of an indexed document. I can find the frequency of a
> particular term (given as a query) if I specify the term in the code.
> What I really want is the term frequency (or even just the number of
> times each term appears in the document) of all indexed terms (or of
> the high-frequency terms) without naming them in the code. Is there an
> alternative way to do that?

Yes, see the discussion here:

https://issues.apache.org/jira/browse/LUCENE-2393



Re: Access indexed terms

Posted by manjula wijewickrema <ma...@gmail.com>.

Hi Andrzej

Thanks for the reply. But as you mentioned, creating arrays of indexed
terms seems to be rather difficult. My intention is to find the term
frequencies of an indexed document. I can find the frequency of a
particular term (given as a query) if I specify the term in the code.
What I really want is the term frequency (or even just the number of times
each term appears in the document) of all indexed terms (or of the
high-frequency terms) without naming them in the code. Is there an
alternative way to do that?

Thanks
Manjula


On Fri, May 14, 2010 at 4:00 PM, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-05-14 11:35, manjula wijewickrema wrote:
> > Hi,
> >
> > Is it possible to put the indexed terms into an array in Lucene? For
> > example, imagine I have indexed a single document in Lucene and now I
> > want to access those terms in the index. Is it possible to retrieve
> > those terms as array elements? If so, how?
>
> In short: unless you stored term vectors (TermFreqVector) when adding
> the document, the answer is "with great difficulty".
>
> For working code that does this, see:
>
> http://code.google.com/p/luke/source/browse/trunk/src/org/getopt/luke/DocReconstructor.java
>
> If you really need this kind of access in your application, add your
> documents with term vectors that include offsets and positions. Even
> then, depending on the Analyzer you used, the process is lossy: input
> data that the Analyzer discarded is simply no longer available.

Re: Access indexed terms

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 2010-05-14 11:35, manjula wijewickrema wrote:
> Hi,
>
> Is it possible to put the indexed terms into an array in Lucene? For
> example, imagine I have indexed a single document in Lucene and now I want
> to access those terms in the index. Is it possible to retrieve those
> terms as array elements? If so, how?

In short: unless you stored term vectors (TermFreqVector) when adding the
document, the answer is "with great difficulty".

For working code that does this, see:

http://code.google.com/p/luke/source/browse/trunk/src/org/getopt/luke/DocReconstructor.java

If you really need this kind of access in your application, add your
documents with term vectors that include offsets and positions. Even then,
depending on the Analyzer you used, the process is lossy: input data that
the Analyzer discarded is simply no longer available.
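
To make concrete what a per-document term vector holds, and why reconstruction from it is lossy, here is a minimal plain-Java sketch. It does not use Lucene: the class TermVectorSketch, its crude tokenizer and its stop list are assumptions made purely for illustration. In Lucene 2.9 itself the corresponding pieces would be a field indexed with Field.TermVector.WITH_POSITIONS_OFFSETS and a later call to IndexReader.getTermFreqVector(docNumber, fieldName), whose TermFreqVector result exposes getTerms() and getTermFrequencies() as parallel arrays.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java illustration of a per-document term vector; this is not Lucene code. */
final class TermVectorSketch {

    // Hypothetical stop list, standing in for whatever a real Analyzer drops.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the"));

    private final String[] terms;     // distinct terms, sorted alphabetically
    private final int[] frequencies;  // frequencies[i] belongs to terms[i]

    TermVectorSketch(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Crude analysis: lowercase, split on non-letters, drop stop words.
        // Case, punctuation, stop words and word order are all lost here.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        terms = counts.keySet().toArray(new String[0]);
        frequencies = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            frequencies[i] = counts.get(terms[i]);
        }
    }

    /** The document's terms as an array, which is what the original question asks for. */
    String[] getTerms() {
        return terms.clone();
    }

    /** Within-document frequency of each term, parallel to getTerms(). */
    int[] getTermFrequencies() {
        return frequencies.clone();
    }

    public static void main(String[] args) {
        TermVectorSketch tv =
                new TermVectorSketch("The quick fox and the lazy dog. The fox is quick!");
        String[] t = tv.getTerms();
        int[] f = tv.getTermFrequencies();
        for (int i = 0; i < t.length; i++) {
            System.out.println(t[i] + "\t" + f[i]);
        }
    }
}
```

Nothing in the two arrays records that the input contained "The", "and" or "is", which is the lossiness described above.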
