Posted to java-user@lucene.apache.org by Gregory Tarr <Gr...@detica.com> on 2010/07/30 10:16:06 UTC

Closing and reopening readers

I'm having trouble with the IndexReader class, as shown below (using
Lucene 2.9.1):

RAMDirectory dir = new RAMDirectory();
createIndex(dir);
IndexReader reader = IndexReader.open(dir);
IndexReader reader2 = reader.reopen();
reader.close();
reader2.terms(); // AlreadyClosedException - this IndexReader is closed

Can anyone see where I'm going wrong?

Thanks

Greg


This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.  The contents of this email may relate to dealings with other companies within the Detica Limited group of companies.

Detica Limited is registered in England under No: 1337451.

Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.

Re: lucene/solr full text search

Posted by Shuai Weng <sh...@genome.stanford.edu>.

I just tried the long query string as you suggested and it works great.

Thanks,
Shuai

On Jul 30, 2010, at 1:35 PM, Ian Lea wrote:

> Yes, you can do that.  Make a Query for the 30 papers and combine it
> with your main query in a BooleanQuery if doing it programmatically.
> Or, with so few documents and papers to match, just put everything in
> one long query string via QueryParser.  See
> http://lucene.apache.org/java/3_0_2/queryparsersyntax.html for details
> on query parser syntax - that is also the answer to your other
> question.
> 
> If performance is a concern and you will be reusing the same set of
> pubmed ids you should look at filters, specifically QueryWrapperFilter
> and CachingWrapperFilter.
> 
> 
> --
> Ian.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: lucene indexing configuration

Posted by Shuai Weng <sh...@genome.stanford.edu>.

Oh, thanks.

Shuai

On Fri, 20 Aug 2010, Otis Gospodnetic wrote:

> Hi,
>
> Are you actually talking about Solr?  Sounds like it.  Check solr-user@lucene
> list.
>
> Maybe you need to treat those words as protected words?  See the protwords.txt
> file in the conf dir.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>


Re: lucene indexing configuration

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi,

Are you actually talking about Solr?  Sounds like it.  Check solr-user@lucene 
list.

Maybe you need to treat those words as protected words?  See the protwords.txt
file in the conf dir.
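A toy model of what a protected-words list does, assuming (the thread does not confirm the cause) that 'met1' is being broken at the letter/digit boundary by a word-splitting filter, as Solr's WordDelimiterFilterFactory does by default. Filters that honour protwords.txt pass the listed words through unchanged. The class and method names are hypothetical; this is plain Java, not Solr's filter code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names, not Solr code): split a token where it
// switches between letters and digits, unless the token is a protected word.
class ProtectedWordsDemo {

    static List<String> analyze(String token, Set<String> protectedWords) {
        List<String> parts = new ArrayList<>();
        if (protectedWords.contains(token)) {
            // Protected words (think: one line each in protwords.txt) pass through unchanged.
            parts.add(token);
            return parts;
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // A boundary is a change from letter to digit or digit to letter.
            boolean boundary = current.length() > 0
                    && Character.isDigit(c) != Character.isDigit(current.charAt(current.length() - 1));
            if (boundary) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> protwords = Set.of("met1", "met2", "met3");
        System.out.println(analyze("met1", Set.of()));  // [met, 1]
        System.out.println(analyze("met1", protwords)); // [met1]
    }
}
```

Whatever the real cause turns out to be, a change to the analysis chain in schema.xml only affects documents indexed after the change, so the existing pages would need to be reindexed.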

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Shuai Weng <sh...@genome.stanford.edu>
> To: java-user@lucene.apache.org
> Sent: Fri, August 20, 2010 5:47:31 PM
> Subject: Re: lucene indexing configuration
> 
> 
> Hey,
> 
> Currently we have indexed some biological full-text pages. I was wondering
> how to configure schema.xml so that the gene names 'met1', 'met2', 'met3'
> will be treated as different words. Currently they are all mapped to 'met'.
>
> 
> Thanks,
> Shuai


Re: lucene indexing configuration

Posted by Shuai Weng <sh...@genome.stanford.edu>.

Hey,

Currently we have indexed some biological full-text pages. I was wondering how to configure schema.xml so that
the gene names 'met1', 'met2', 'met3' will be treated as different words. Currently they are all mapped to 'met'.

Thanks,
Shuai

Re: lucene/solr full text search

Posted by Ian Lea <ia...@gmail.com>.

Yes, you can do that.  Make a Query for the 30 papers and combine it
with your main query in a BooleanQuery if doing it programmatically.
Or, with so few documents and papers to match, just put everything in
one long query string via QueryParser.  See
http://lucene.apache.org/java/3_0_2/queryparsersyntax.html for details
on query parser syntax - that is also the answer to your other
question.

If performance is a concern and you will be reusing the same set of
pubmed ids you should look at filters, specifically QueryWrapperFilter
and CachingWrapperFilter.
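A minimal sketch of the "long query string via QueryParser" option, assuming the PubMed ID is indexed untokenized in a field called pmid and the body in a field called text (both field names are hypothetical). The search term is required with "+", and the parenthesized group of ID clauses is required as a whole, so at least one of the IDs must match. This only builds the string that would then be handed to QueryParser.parse(); it assumes the term contains no QueryParser special characters (otherwise run it through QueryParser.escape() first).

```java
import java.util.List;
import java.util.StringJoiner;

// Builds a QueryParser-syntax string; "text" and "pmid" are hypothetical field names.
class SubsetQueryString {

    static String restrictToIds(String term, List<String> pmids) {
        // (pmid:101 pmid:202 ...): optional clauses in one group, so any single ID may match.
        StringJoiner idGroup = new StringJoiner(" ", "(", ")");
        for (String id : pmids) {
            idGroup.add("pmid:" + id);
        }
        // A leading '+' makes both the term clause and the ID group required.
        return "+text:" + term + " +" + idGroup;
    }

    public static void main(String[] args) {
        String q = restrictToIds("kinase", List.of("101", "202", "303"));
        System.out.println(q); // +text:kinase +(pmid:101 pmid:202 pmid:303)
    }
}
```

Thirty IDs stay well below BooleanQuery's default limit of 1024 clauses. If the same ID set is reused, the filter route suggested above would instead wrap the ID query in a QueryWrapperFilter, cache it with CachingWrapperFilter, and pass it as the filter argument of IndexSearcher.search(query, filter, n).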


--
Ian.


On Fri, Jul 30, 2010 at 6:19 PM, Shuai Weng <sh...@genome.stanford.edu> wrote:
>
> Sorry for the confusion.
>
> Currently we have a total of 7000 full-text papers (with the PubMed IDs stored as unique IDs)
> in the Lucene index.  We were wondering if we can search for a given term in a subset of these papers
> (e.g., 30 papers, by providing a list of their PubMed IDs) instead of searching for the term in all 7000
> papers. In other words, we only care about hits in the 30 given papers. How can we easily filter out
> the other papers?
>
> Thanks,
> Shuai
>


Re: lucene/solr full text search

Posted by Shuai Weng <sh...@genome.stanford.edu>.

Sorry for the confusion.

Currently we have a total of 7000 full-text papers (with the PubMed IDs stored as unique IDs)
in the Lucene index.  We were wondering if we can search for a given term in a subset of these papers
(e.g., 30 papers, by providing a list of their PubMed IDs) instead of searching for the term in all 7000
papers. In other words, we only care about hits in the 30 given papers. How can we easily filter out
the other papers?

Thanks,
Shuai


On Jul 30, 2010, at 9:56 AM, Ian Lea wrote:

> Yes, depending on what exactly you mean by "subset" and "index pool".
> 
> If you've got one lucene index containing docs
> 
> docno: 1
> category: computers
> text: some words about computers
> 
> docno: 2
> category: computers
> text: some more words about computers
> 
> docno: 3
> category: finance
> text: some words about finance
> 
> then a search for "text:words" will match all 3 whereas a search for
> "category:computers text:words" will only match 2.
> 
> 
> If this isn't what you are asking about, I suggest you provide more detail.
> 
> 
> --
> Ian.
> 



Re: lucene/solr full text search

Posted by Shuai Weng <sh...@genome.stanford.edu>.

Hi Ian,

In your example below, how do we set the parameters so we can search for "category:computers" AND "text:words"?

Thanks,
Shuai

On Jul 30, 2010, at 9:56 AM, Ian Lea wrote:

> Yes, depending on what exactly you mean by "subset" and "index pool".
> 
> If you've got one lucene index containing docs
> 
> docno: 1
> category: computers
> text: some words about computers
> 
> docno: 2
> category: computers
> text: some more words about computers
> 
> docno: 3
> category: finance
> text: some words about finance
> 
> then a search for "text:words" will match all 3 whereas a search for
> "category:computers text:words" will only match 2.
> 
> 
> If this isn't what you are asking about, I suggest you provide more detail.
> 
> 
> --
> Ian.
> 



Re: lucene/solr full text search

Posted by Ian Lea <ia...@gmail.com>.

Yes, depending on what exactly you mean by "subset" and "index pool".

If you've got one lucene index containing docs

docno: 1
category: computers
text: some words about computers

docno: 2
category: computers
text: some more words about computers

docno: 3
category: finance
text: some words about finance

then a search for "text:words" will match all 3 whereas a search for
"category:computers text:words" will only match 2.
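One caveat on that example: with QueryParser's default operator (OR), "category:computers text:words" makes both clauses optional, so doc 3 still matches through text:words and merely scores lower. Restricting the hits to docs 1 and 2 needs both clauses required, e.g. "+category:computers +text:words", "category:computers AND text:words", or setDefaultOperator(QueryParser.AND_OPERATOR) on the parser. The toy model below (plain Java with hypothetical names, not Lucene's matching code) counts matches over the three docs above under both readings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy model of optional (OR) vs required (AND / +) clauses; not Lucene code.
class ClauseSemanticsDemo {

    // The three example docs: field name -> field text.
    static final List<Map<String, String>> DOCS = List.of(
            Map.of("docno", "1", "category", "computers", "text", "some words about computers"),
            Map.of("docno", "2", "category", "computers", "text", "some more words about computers"),
            Map.of("docno", "3", "category", "finance", "text", "some words about finance"));

    // True if the field contains the word as a whitespace-separated token.
    static boolean matches(Map<String, String> doc, String field, String word) {
        String value = doc.get(field);
        return value != null && Arrays.asList(value.split("\\s+")).contains(word);
    }

    // Each clause is {field, word}. requireAll=false: any clause may match (default OR).
    // requireAll=true: every clause must match (+a +b, or a AND b).
    static int countMatches(String[][] clauses, boolean requireAll) {
        int count = 0;
        for (Map<String, String> doc : DOCS) {
            int hits = 0;
            for (String[] clause : clauses) {
                if (matches(doc, clause[0], clause[1])) {
                    hits++;
                }
            }
            if (requireAll ? hits == clauses.length : hits > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[][] query = {{"category", "computers"}, {"text", "words"}};
        System.out.println(countMatches(query, false)); // 3
        System.out.println(countMatches(query, true));  // 2
    }
}
```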


If this isn't what you are asking about, I suggest you provide more detail.


--
Ian.

On Fri, Jul 30, 2010 at 4:15 PM, Shuai Weng <sh...@genome.stanford.edu> wrote:
>
> Hey,
>
> I was wondering whether we can search a subset of the papers
> instead of the whole index pool.
>
> Thanks,
> Shuai


lucene/solr full text search

Posted by Shuai Weng <sh...@genome.stanford.edu>.

Hey,

I was wondering whether we can search a subset of the papers
instead of the whole index pool.

Thanks,
Shuai




Re: Closing and reopening readers

Posted by Ian Lea <ia...@gmail.com>.

http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29

...
If the index has not changed since this instance was (re)opened, then
this call is a NOOP and returns this instance
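That explains the exception in the snippet: nothing was written to the index between open() and reopen(), so reader2 is the very same object as reader, and closing one closes both. Because reopen() never closes the old instance itself, the usual pattern (the reopen() Javadoc shows essentially this) is to close the old reader only when a different instance comes back: IndexReader newReader = reader.reopen(); if (newReader != reader) { reader.close(); } reader = newReader;. The self-contained model below makes that contract testable without Lucene; ToyIndex and ToyReader are hypothetical stand-ins, not Lucene classes, and IllegalStateException stands in for Lucene's AlreadyClosedException (a subclass of it).

```java
// Toy model of the IndexReader.reopen() contract; hypothetical classes, not Lucene.
class ReopenIdiom {

    // Stands in for the Directory/index: just a version number bumped on commit.
    static final class ToyIndex {
        long version = 1;
        void commit() { version++; }
    }

    static final class ToyReader {
        private final ToyIndex index;
        private final long version; // index version this reader was opened on
        private boolean closed;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.version = index.version;
        }

        // Unchanged index: a no-op that returns this same instance.
        // Changed index: a new reader; the old one is NOT closed.
        ToyReader reopen() {
            ensureOpen();
            return index.version == version ? this : new ToyReader(index);
        }

        long version() {
            ensureOpen();
            return version;
        }

        void close() { closed = true; }

        boolean isClosed() { return closed; }

        private void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("this reader is closed");
            }
        }
    }

    // Safe refresh: close the old reader only if reopen() handed back a new one.
    static ToyReader refresh(ToyReader reader) {
        ToyReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();
        }
        return newReader;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        reader = refresh(reader); // index unchanged: same instance, still open
        index.commit();
        reader = refresh(reader); // index changed: new instance, old one closed
        System.out.println(reader.version()); // 2
    }
}
```

Applied to the original snippet, this means either not closing reader when reader2 == reader, or simply carrying on with reader until something has actually been written to the index.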


--
Ian.

On Fri, Jul 30, 2010 at 9:16 AM, Gregory Tarr <Gr...@detica.com> wrote:
> I'm having trouble with the IndexReader class, as shown below (using
> Lucene 2.9.1):
>
> RAMDirectory dir = new RAMDirectory();
> createIndex(dir);
> IndexReader reader = IndexReader.open(dir);
> IndexReader reader2 = reader.reopen();
> reader.close();
> reader2.terms(); // AlreadyClosedException - this IndexReader is closed
>
> Can anyone see where I'm going wrong?
>
> Thanks
>
> Greg
>
