Posted to java-user@lucene.apache.org by semelak ss <se...@yahoo.com> on 2008/11/01 18:03:07 UTC

Re: Exact Phrase Query

When using Luke, searching for the following now gives me hits:
"insurer storm"
The syntax of the query as parsed by Luke is:
word:"insurer storm"

The code I am using is as follows:
----------------------
_searcher = new IndexSearcher(INDEX_DIR);
_parser = new QueryParser("word", new WhitespaceAnalyzer());
Query q = _parser.parse(query);
System.out.println(q.toString()); // this outputs ->  word:"insurer storm"
TopDocs vv= _searcher.search(q, 1);
Hits tmph = _searcher.search(q);
---------------------------------

both vv and tmph give no results (their size is 0)
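
A minimal sketch, in plain Java rather than Lucene, of how whitespace tokenization interacts with exact-phrase matching. The class and method names are hypothetical. The model assumes that tokens are split on whitespace only, with no lowercasing or stemming, and that a phrase matches only when its tokens appear consecutively and in order. It is meant as an aid for checking what an analyzer is expected to produce, and does not by itself explain the zero-hit result above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene: a simplified model of
// whitespace tokenization and exact-phrase matching.
final class PhraseMatchSketch {

    // Split on runs of whitespace only; tokens keep their case and form.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // True when the phrase tokens occur in the document consecutively and in order.
    static boolean matchesPhrase(String document, String phrase) {
        List<String> docTokens = tokenize(document);
        List<String> phraseTokens = tokenize(phrase);
        if (phraseTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(docTokens, phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        // Same tokens in the same order.
        System.out.println(matchesPhrase("homeowners work", "homeowners work"));
        // "homeowner" is a different token from "homeowners".
        System.out.println(matchesPhrase("homeowners work", "homeowner work"));
    }
}
```

Under these assumptions, a phrase only matches when every token is identical to an indexed token, so a singular/plural difference between the query and the stored text is enough to prevent a hit.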



--- On Fri, 10/31/08, semelak ss <se...@yahoo.com> wrote:

> From: semelak ss <se...@yahoo.com>
> Subject: Re: Exact Phrase Query
> To: java-user@lucene.apache.org
> Date: Friday, October 31, 2008, 9:41 AM
> For indexing, I use the following:
> ===========
> writer = new IndexWriter(INDEX_DIR, new WhitespaceAnalyzer(), true,
>         IndexWriter.MaxFieldLength.UNLIMITED);
> Document doc = new Document();
> String tmpword = this.getProperForm(word1, word2);
> doc.add(new Field("WORDS", tmpword, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("score", Double.toString(score), Field.Store.YES, Field.Index.NO));
> writer.addDocument(adoc);
> ============
> 
> For searching, I use the following (query = "homeowner work" gives no
> hits, "homeowner" gives results):
> ============
> _searcher = new IndexSearcher(INDEX_DIR);
> _parser = new QueryParser("WORDS", new WhitespaceAnalyzer());
> q = _parser.parse(query);
> Hits tmph = _searcher.search(q);
> 
> ============
> 
> A sample document (contained in the index) is the following:
> 
> field: value
> -----: -----
> WORDS:"homeowners work"
> score: 0.1515417
> 
> 
> Also, please note that I tried using Luke to browse the
> index, and the fields seem to be filled with words just
> as expected. Searching with exact phrases, however, yields no
> hits. Searching with single words gives hits.
> 
> 
> 
> 
> --- On Fri, 10/31/08, Erick Erickson <er...@gmail.com> wrote:
> 
> > From: Erick Erickson <er...@gmail.com>
> > Subject: Re: Exact Phrase Query
> > To: java-user@lucene.apache.org, semelak_14@yahoo.com
> > Date: Friday, October 31, 2008, 5:57 AM
> > You need to give us more information for meaningful replies, like
> > the analyzers you use when indexing and searching, the exact
> > query you use, perhaps the snippets of the code, etc.
> > 
> > That said, things to check:
> > Get a copy of Luke and examine your index. You can even
> > run queries through that tool and see what gets sent to the
> > database and what responses you get with those analyzers.
> > 
> > Make sure your analyzers at query and index time are doing
> > what you expect. Query.toString() is your friend. If you don't
> > take the time to understand analyzers, you'll spend lots of time
> > spinning your wheels.
> > 
> > And you really should wait more than 9 minutes before pinging
> > the list....
> > 
> > Best
> > Erick
> > 
> > 
> > 
> > On Fri, Oct 31, 2008 at 8:44 AM, semelak ss <se...@yahoo.com> wrote:
> > 
> > > I have documents containing multiple words in the field "word".
> > > For example, one of the documents contains the following in the
> > > field "word":
> > > homeowners work
> > >
> > > When searching for single words (e.g. homeowners) I get hits.
> > >
> > > However, searching for the exact phrase "homeowners work" gives me no
> > > hits!! I use the double quotes when searching for exact phrases.
> > >
> > > Any idea why?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Exact Phrase Query

Posted by semelak ss <se...@yahoo.com>.
Hello Erick,

If it weren't for your help and kind response, I would be struggling now with the initial problem I had. The solution to that problem turned out to be the one you mentioned in your response (indexwriters/indexreaders both being opened at the same time).

The problem I mentioned in my last response is different from the initial question I posted. It's really a request for thoughts and people's input on how to improve searching, given the structure of the data described in my last response.

Again, I appreciate your help (and I am not saying this because I am looking forward to your response.) 


--- On Sun, 11/2/08, Erick Erickson <er...@gmail.com> wrote:

> From: Erick Erickson <er...@gmail.com>
> Subject: Re: Exact Phrase Query
> To: java-user@lucene.apache.org, semelak_14@yahoo.com
> Date: Sunday, November 2, 2008, 12:11 PM
> Sorry, but I've really run out of patience here. You have consistently
> stated only part of the problem, never posting enough information to
> allow me to answer helpfully. You haven't even taken the time to proofread
> your posts, which has wasted my (limited, volunteer) time.
> 
> In the future, please consider the fact that people trying to help with
> your problem are volunteering their time and respect that fact by making a
> greater effort to make it easy and efficient for us to help with what is,
> after all, *your* problem.
> 
> Best
> Erick
> 


Re: Exact Phrase Query

Posted by Erick Erickson <er...@gmail.com>.
Sorry, but I've really run out of patience here. You have consistently
stated only part of the problem, never posting enough information to allow
me to answer helpfully. You haven't even taken the time to proofread your
posts, which has wasted my (limited, volunteer) time.

In the future, please consider the fact that people trying to help with your
problem are volunteering their time and respect that fact by making a
greater effort to make it easy and efficient for us to help with what is,
after all, *your* problem.

Best
Erick

On Sun, Nov 2, 2008 at 11:03 AM, semelak ss <se...@yahoo.com> wrote:

> Also, is there a way to pass a null tokenizer, or no tokenizer at all, when
> writing the field "words" to the index? I have no need to tokenize the words,
> and the exact query will always be known.
>
> To explain the problem better: we are performing word comparisons across a
> large number of text documents. Each word in each sentence is compared with
> the rest of the words in the other sentences. A similarity score is computed
> for each pair and stored in the index for fast retrieval in the future
> (computation of the score is resource intensive). What we used to do was
> construct a matrix, store the words in alphabetical order (for binary
> search), and then load the words when the program was launched. Due to the
> size of the files generated, updating was a real struggle.
>
> Thus, we decided to use Lucene and store a score for each pair of words.
> Updates should be much easier and faster; however, improving the search is
> something we're looking into. We are new to Lucene and would appreciate any
> input in this regard.
>
> Given that each document would contain only two fields, score and words,
> and that no tokenization is needed, what would be the most efficient way of
> implementing this index using Lucene?
>
>
> --- On Sun, 11/2/08, semelak ss <se...@yahoo.com> wrote:
>
> > From: semelak ss <se...@yahoo.com>
> > Subject: Re: Exact Phrase Query
> > To: java-user@lucene.apache.org
> > Date: Sunday, November 2, 2008, 7:26 AM
> > I was in a hurry when copying and pasting the code. What
> > I've been using is only writer. RamWriter was never used
> > as it never really worked (thanks to you, I now understand
> > the reason).
> >
> > The above is not really related to the problem I was
> > facing. I modified my code so that an
> > IndexReader/IndexWriter is opened right before the word
> > comparison takes place and is closed right after (currently
> > not using RamDir due to the problems faced earlier).
> >
> > Considering that the program is basically a loop that does
> > thousands and thousands of comparisons, this is definitely
> > not the most efficient way of handling things.
> >
> > I would appreciate any input in this regard on how to
> > improve the efficiency.
> >
> >
> >
> > --- On Sat, 11/1/08, Erick Erickson <er...@gmail.com> wrote:
> >
> > > From: Erick Erickson <er...@gmail.com>
> > > Subject: Re: Exact Phrase Query
> > > To: java-user@lucene.apache.org, semelak_14@yahoo.com
> > > Date: Saturday, November 1, 2008, 5:06 PM
> > > ahhhh, finally. I'm almost completely sure you can't *write* to a
> > > RAMDirectory and expect the underlying FSDir to be updated. The intent
> > > of RAMDirectories is to *read* in an index from disk and keep it in memory.
> > > Essentially I believe that your RAMDirectory constructor is taking a
> > > snapshot of the underlying disk index, modifying that in-memory copy, and
> > > throwing it away without ever writing it to disk. I wouldn't expect opening
> > > the FSDirectory after writing to the RAMDirectory to find anything. Ever.
> > >
> > > If you really need the RAMDir, I suspect you'll have to open an FS-based
> > > writer as well as a RAM-based writer, and write to both when necessary.
> > > You'll probably also have to open/search your RAM-based index as the
> > > faster alternative to re-opening the FS-based index. Either way, reopening
> > > the index is probably expensive; are you sure you need to? Is there a way
> > > to keep your information in an internal data structure for some period of
> > > time?
> > >
> > > Best
> > > Erick
> > >
> > >
> > >
> > > On Sat, Nov 1, 2008 at 6:31 PM, semelak ss <se...@yahoo.com> wrote:
> > >
> > > > I am not entirely sure if this can be the cause, but here is something I
> > > > thought might be related:
> > > > The idea is to have an index containing documents where each document has a
> > > > combination of two words, word1 and word2, and a score for these two words.
> > > > The index would be searched first to see if the two words exist, and if not
> > > > the score would be computed on the fly and then added to the index. This
> > > > process would be repeated thousands of times for thousands of words.
> > > >
> > > > Hence, I have an IndexWriter and a searcher:
> > > > --------------------
> > > > RAMDirectory ramDir = new RAMDirectory(INDEX_DIR);
> > > > IndexWriter ramWriter = new IndexWriter(ramDir, new WhitespaceAnalyzer(),
> > > >         true, IndexWriter.MaxFieldLength.UNLIMITED);
> > > > writer = new IndexWriter(INDEX_DIR, new WhitespaceAnalyzer(), true,
> > > >         IndexWriter.MaxFieldLength.UNLIMITED);
> > > >
> > > > FSDirectory fsdir = FSDirectory.getDirectory(INDEX_DIR);
> > > > IndexReader ir = IndexReader.open(fsdir);
> > > > _searcher = new IndexSearcher(ir);
> > > > --------------------
> > > >
> > > > The IndexWriter is closed near the end of the program (it's open while
> > > > searching for word combinations).
> > > >
> > > > When using Luke, I was able to search successfully for exact phrases. My
> > > > guess is that the problem I am facing has something to do with the
> > > > IndexWriter, but I cannot pinpoint the exact cause of the problem.
> > > >
> > > >
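
The look-up-or-compute flow described in the quoted messages (check whether a score for the word pair already exists, otherwise compute it and store it), combined with the suggestion to keep the information in an internal data structure for some period of time, could be sketched roughly as follows. This is plain Java with no Lucene calls. The class name, the key format, and the placeholder scoring function are all hypothetical, and the sketch assumes that the score is symmetric in the two words and that the words contain no whitespace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical in-memory cache of word-pair scores; not taken from the thread.
final class PairScoreCache {
    private final Map<String, Double> scores = new HashMap<>();
    private final ToDoubleBiFunction<String, String> scorer;
    private int computations = 0;

    PairScoreCache(ToDoubleBiFunction<String, String> scorer) {
        this.scorer = scorer;
    }

    // Canonical key: order the two words so (a, b) and (b, a) share one entry.
    // Assumes words contain no whitespace, so a space is a safe separator.
    static String key(String word1, String word2) {
        return word1.compareTo(word2) <= 0
                ? word1 + " " + word2
                : word2 + " " + word1;
    }

    // Return the cached score if present; otherwise compute, store, and return it.
    double score(String word1, String word2) {
        String k = key(word1, word2);
        Double cached = scores.get(k);
        if (cached != null) {
            return cached;
        }
        double computed = scorer.applyAsDouble(word1, word2);
        computations++;
        scores.put(k, computed);
        return computed;
    }

    // Number of times the (expensive) scorer was actually invoked.
    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        // Placeholder scorer for illustration only; it stands in for the
        // resource-intensive similarity computation mentioned in the thread.
        PairScoreCache cache = new PairScoreCache((a, b) -> a.length() + b.length());
        System.out.println(cache.score("insurer", "storm"));
        System.out.println(cache.score("storm", "insurer"));
        System.out.println(cache.computations());
    }
}
```

A cache like this only avoids repeated index look-ups within a single run. Persisting the scores between runs, whether in a Lucene index or elsewhere, would still need a separate write step, for example in batches rather than reopening the index for every pair.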
> > > > --- On Sat, 11/1/08, semelak ss
> > > <se...@yahoo.com> wrote:
> > > >
> > > > > From: semelak ss
> > <se...@yahoo.com>
> > > > > Subject: Re: Exact Phrase Query
> > > > > To: java-user@lucene.apache.org
> > > > > Date: Saturday, November 1, 2008, 10:03 AM
> > > > > When using Luke,, searching for the
> > followings
> > > gives me hits
> > > > > now:
> > > > > "insurer storm"
> > > > > The synatx of the query as parsed by Luke is
> > :
> > > > > word:"insurer storm"
> > > > >
> > > > > The code I am using is as follows:
> > > > > ----------------------
> > > > > _searcher = new IndexSearcher(INDEX_DIR);
> > > > > _parser = new QueryParser("word",
> > new
> > > > > WhitespaceAnalyzer());
> > > > > Query q = _parser.parse(query);
> > > > > System.out.println(q.toString()); // this
> > outputs
> > > ->
> > > > > word:"insurer storm"
> > > > > TopDocs vv= _searcher.search(q, 1);
> > > > > Hits tmph = _searcher.search(q);
> > > > > ---------------------------------
> > > > >
> > > > > both vv and tmph give no results (their size
> > is
> > > 0)
> > > > >
> > > > >
> > > > >
> > > > > --- On Fri, 10/31/08, semelak ss
> > > > > <se...@yahoo.com> wrote:
> > > > >
> > > > > > From: semelak ss
> > > <se...@yahoo.com>
> > > > > > Subject: Re: Exact Phrase Query
> > > > > > To: java-user@lucene.apache.org
> > > > > > Date: Friday, October 31, 2008, 9:41 AM
> > > > > > For indexing, I use the following:
> > > > > > ===========
> > > > > > writer = new IndexWriter(INDEX_DIR,new
> > > > > > WhitespaceAnalyzer(),true
> > > > > > ,IndexWriter.MaxFieldLength.UNLIMITED);
> > > > > > Document doc = new Document();
> > > > > > String tmpword =
> > this.getProperForm(word1,
> > > word2);
> > > > > > doc.add(new Field("WORDS",
> > > tmpword,
> > > > > > Field.Store.YES,
> > Field.Index.TOKENIZED));
> > > > > > doc.add(new Field("score",
> > > > > Double.toString(score)
> > > > > > , Field.Store.YES, Field.Index.NO));
> > > > > > writer.addDocument(adoc);
> > > > > > ============
> > > > > >
> > > > > > For searching,, I use the following
> > (query =
> > > > > > "homeowner work" gives no
> > hits ,,
> > > > > > "homeowner" gives results):
> > > > > > ============
> > > > > > _searcher = new
> > IndexSearcher(INDEX_DIR);
> > > > > > _parser = new
> > QueryParser("WORDS",
> > > new
> > > > > > WhitespaceAnalyzer());
> > > > > > q = _parser.parse(query);
> > > > > > Hits tmph = _searcher.search(q);
> > > > > >
> > > > > > ============
> > > > > >
> > > > > > A sample document (contained in the
> > index)
> > > is the
> > > > > > following:
> > > > > >
> > > > > > filed: value
> > > > > > -----: -----
> > > > > > WORDS:"homeowners work"
> > > > > > score: 0.1515417
> > > > > >
> > > > > >
> > > > > > Also, please note that I tried using
> > Luke to
> > > browse
> > > > > the
> > > > > > index and the fields seem to be filled
> > out
> > > with words
> > > > > just
> > > > > > as expected. Searching, however, with
> > exact
> > > phrases
> > > > > yield no
> > > > > > answer. Searching with single words
> > gives
> > > hits.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --- On Fri, 10/31/08, Erick Erickson
> > > > > > <er...@gmail.com> wrote:
> > > > > >
> > > > > > > From: Erick Erickson
> > > > > <er...@gmail.com>
> > > > > > > Subject: Re: Exact Phrase Query
> > > > > > > To: java-user@lucene.apache.org,
> > > > > semelak_14@yahoo.com
> > > > > > > Date: Friday, October 31, 2008,
> > 5:57 AM
> > > > > > > You need to give us more information for meaningful replies, like
> > > > > > > the analyzers you use when indexing and searching, the exact
> > > > > > > query you use, perhaps the snippets of the code, etc.
> > > > > > >
> > > > > > > That said, things to check:
> > > > > > > Get a copy of Luke and examine your index. You can even
> > > > > > > run queries through that tool and see what gets sent to the
> > > > > > > database and what responses you get with those analyzers.
> > > > > > >
> > > > > > > Make sure you're analyzers at query and index time are doing
> > > > > > > what you expect. Query.toString() is your friend. If you don't
> > > > > > > take the time to understand analyzers, you'll spend lots of time
> > > > > > > spinning your wheels.
> > > > > > >
> > > > > > > And you really should wait more than 9 minutes before pinging
> > > > > > > the list....
> > > > > > >
> > > > > > > Best
> > > > > > > Erick
> > > > > > >
> > > > > > > On Fri, Oct 31, 2008 at 8:44 AM, semelak ss <se...@yahoo.com> wrote:
> > > > > > >
> > > > > > > > I have documents containing multiple words in the the field "word"
> > > > > > > > for example, one of the documents contain in the field "word" the
> > > > > > > > following:
> > > > > > > > homeowners work
> > > > > > > >
> > > > > > > > When searching for single words (i.e. homewoners ) I get hits.
> > > > > > > >
> > > > > > > > However, searching for the exact phrase "homeowners work" gives me no
> > > > > > > > hits!! I use the double quotes when searching for exact phrases.
> > > > > > > >
> > > > > > > > Any idea why ??

Re: Exact Phrase Query

Posted by semelak ss <se...@yahoo.com>.
Also, is there a way to pass a null tokenizer (or none at all) when writing the field "words" to the index? I have no need to tokenize the words, and the exact query will always be known.

To explain the problem better: we are performing word comparisons across a large number of text documents. Each word in each sentence is compared with the words in the other sentences. A similarity score is computed for each pair and stored in the index for fast retrieval later, since computing the score is resource intensive. Previously, we built a matrix, stored the words in alphabetical order (for binary search), and loaded them when the program was launched. Because of the size of the generated files, updating was a real struggle.

Thus, we decided to use Lucene and store a score for each pair of words. Updates should be much easier and faster; however, we are still looking into how to improve the search. We are new to Lucene and would appreciate any input in this regard.

Given that each document contains only two fields, score and words, and that no tokenization is needed, what would be the most efficient way to implement this index using Lucene?
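
One way to picture the lookup described here, independent of Lucene: each word pair acts as a single exact key, so nothing has to be tokenized. The sketch below is plain Java, not Lucene code. The class name PairScoreLookup and the helper pairKey are hypothetical; pairKey is only a guess at what a helper such as getProperForm might do. In Lucene releases from this period, the closest equivalent is, as far as I know, to index the field with Field.Index.NOT_ANALYZED (named UN_TOKENIZED before 2.4) and to look it up with a TermQuery rather than through QueryParser; the Javadoc of the version in use should be checked before relying on this.

```java
import java.util.HashMap;
import java.util.Map;

public class PairScoreLookup {

    // Hypothetical helper: builds one canonical key for an unordered word pair,
    // so ("work", "homeowners") and ("homeowners", "work") map to the same entry.
    public static String pairKey(String word1, String word2) {
        return word1.compareTo(word2) <= 0
                ? word1 + " " + word2
                : word2 + " " + word1;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        // Sample pair and score taken from the thread.
        scores.put(pairKey("homeowners", "work"), 0.1515417);

        // The whole key is matched as one unit; no tokenization is involved.
        System.out.println(scores.get(pairKey("work", "homeowners")));

        // A key that differs by a single character is simply absent (null).
        System.out.println(scores.get(pairKey("homeowner", "work")));
    }
}
```

The design choice is that an exact-match store only ever compares complete keys, which is why the canonical ordering of the two words matters more than any analyzer setting.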


--- On Sun, 11/2/08, semelak ss <se...@yahoo.com> wrote:

> From: semelak ss <se...@yahoo.com>
> Subject: Re: Exact Phrase Query
> To: java-user@lucene.apache.org
> Date: Sunday, November 2, 2008, 7:26 AM
> I was in a hurry when copying and pasting the code. What
> I've been using is only writer. RamWriter was never used
> as it never really worked (thanks to you, I now understand
> the reason).
> 
> The above is not really related to the problem I was
> facing. I modified my code so that an
> indexreader/indexwriter is opened right before the words
> comparison takes place and is closed right after. (currently
> not using RamDir due to the problems faced earlier)
> 
> Considering that the program is basically a loop that does
> thousands and thousands of comparison, this is definitely
> not the most efficient way of handling things.
> 
> I would appreciate any input in this regard on how to
> improve the efficiency. 
> 
> 
> 


      


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Exact Phrase Query

Posted by semelak ss <se...@yahoo.com>.
I was in a hurry when copying and pasting the code. The only writer I have actually been using is writer. The ramWriter was never used, as it never really worked (thanks to you, I now understand why).

That, however, is not really related to the problem I was facing. I modified my code so that an IndexReader/IndexWriter is opened right before each word comparison takes place and closed right after. (I am currently not using the RAMDirectory because of the problems I ran into earlier.)

Considering that the program is basically a loop that performs thousands and thousands of comparisons, this is definitely not the most efficient way of handling things.

I would appreciate any input on how to improve the efficiency.
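
A minimal sketch of one alternative to opening and closing the index around every comparison: keep known scores in an in-memory map, compute a score only on a miss, and write new entries out in batches. This is plain Java under stated assumptions, not Lucene code. ScoreCache, its batch size, and the scoring function passed in are all hypothetical, and flush() only marks the place where a single long-lived writer would add the pending documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ScoreCache {
    private final Map<String, Double> known = new HashMap<>(); // scores already available
    private final List<String> pending = new ArrayList<>();    // new keys not yet written out
    private final Function<String, Double> expensiveScore;     // stand-in for the costly computation
    private final int batchSize;
    private int flushes = 0;

    public ScoreCache(Function<String, Double> expensiveScore, int batchSize) {
        this.expensiveScore = expensiveScore;
        this.batchSize = batchSize;
    }

    // Returns the cached score, computing and recording it only on a miss.
    public double score(String pairKey) {
        Double hit = known.get(pairKey);
        if (hit != null) {
            return hit;
        }
        double computed = expensiveScore.apply(pairKey);
        known.put(pairKey, computed);
        pending.add(pairKey);
        if (pending.size() >= batchSize) {
            flush();
        }
        return computed;
    }

    // Placeholder: here one writer would add all pending documents at once,
    // instead of an open/close cycle per comparison.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        pending.clear();
        flushes++;
    }

    public int flushCount() {
        return flushes;
    }
}
```

For example, new ScoreCache(key -> computeSimilarity(key), 1000) would touch the persistent store once per thousand new pairs, where computeSimilarity stands for whatever scoring routine the program already has (a hypothetical name).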



--- On Sat, 11/1/08, Erick Erickson <er...@gmail.com> wrote:

> From: Erick Erickson <er...@gmail.com>
> Subject: Re: Exact Phrase Query
> To: java-user@lucene.apache.org, semelak_14@yahoo.com
> Date: Saturday, November 1, 2008, 5:06 PM
> ahhhh, finally. I'm almost completely sure you can't
> *write* to a
> RAMDirectory
> and expect the underlying FSDir to be updated. The intent
> of RAMDirectorys
> is to *read* in an index from disk and keep it in memory.
> Essentially I
> believe
> that your RAMDirecotry constructor is taking a snapshot of
> the underlying
> disk index, modifying that in-memory copy, and throwing it
> away without
> ever writing it to disk. I wouldn't expect opening the
> FSDirectory after
> writing
> to the RAMDirectory to find anything. Ever.
> 
> If you really need the RAMDir, I suspect you'll have to
> open an FS-based
> writer as well as a RAM-based writer, and write to both
> when necessary.
> You'll probably also have to open/search your RAM-based
> index as the
> faster alternative to re-opening the FS-based index. Either
> way, reopening
> the index is probably expensive, are you sure you need to?
> Is there a way
> to keep your information in an internal data structure for
> some period of
> time?
> 
> Best
> Erick
> 
> 
> 


Re: Exact Phrase Query

Posted by Erick Erickson <er...@gmail.com>.
Ahhhh, finally. I'm almost completely sure you can't *write* to a RAMDirectory
and expect the underlying FSDir to be updated. The intent of a RAMDirectory
is to *read* in an index from disk and keep it in memory. Essentially, I believe
that your RAMDirectory constructor is taking a snapshot of the underlying
disk index; you then modify that in-memory copy and throw it away without
ever writing it to disk. I wouldn't expect opening the FSDirectory after writing
to the RAMDirectory to find anything. Ever.

If you really need the RAMDir, I suspect you'll have to open an FS-based
writer as well as a RAM-based writer, and write to both when necessary.
You'll probably also have to open/search your RAM-based index as the
faster alternative to re-opening the FS-based index. Either way, reopening
the index is probably expensive. Are you sure you need to? Is there a way
to keep your information in an internal data structure for some period of
time?

Best
Erick
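
The snapshot behaviour described in this message can be pictured with a plain-Java analogy. This is not Lucene code, and the names (SnapshotAnalogy, snapshotOf, onDisk, inMemory) are made up for illustration: a copy taken at construction time is independent of its source, so later writes to the copy never reach the original.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotAnalogy {

    // Returns an independent in-memory copy, loosely analogous to loading an
    // on-disk index into RAM when the in-memory directory is constructed.
    public static Map<String, Double> snapshotOf(Map<String, Double> onDisk) {
        return new HashMap<>(onDisk);
    }

    public static void main(String[] args) {
        Map<String, Double> onDisk = new HashMap<>();
        onDisk.put("insurer storm", 0.5); // made-up score, for illustration only

        Map<String, Double> inMemory = snapshotOf(onDisk);
        inMemory.put("homeowners work", 0.1515417); // written to the copy only

        System.out.println(inMemory.containsKey("homeowners work")); // true
        System.out.println(onDisk.containsKey("homeowners work"));   // false
    }
}
```

Anything that has to survive the end of the program must be written to the on-disk side explicitly, which is the point of the suggestion to write to both.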



On Sat, Nov 1, 2008 at 6:31 PM, semelak ss <se...@yahoo.com> wrote:

> I am not entirely sure if this can be the cause, but here is something I
> thought might be related:
> The idea is have an index containing documents where each document has a
> combination of two words : word1 and word2 and a score for these two words.
> The index would be searched first if the two words exist, and if not the
> score would be computed on the fly and then added to the index. This process
> would be repeated thousands of times for thousands of words.
>
> Hence, I have an indexwriter and a searcher
> --------------------
> RAMDirectory ramDir = new RAMDirectory(INDEX_DIR);
> IndexWriter  ramWriter = new IndexWriter(ramDir, new WhitespaceAnalyzer(),
> true,IndexWriter.MaxFieldLength.UNLIMITED);
> writer = new IndexWriter(INDEX_DIR,new WhitespaceAnalyzer(),true
> ,IndexWriter.MaxFieldLength.UNLIMITED);
>
> FSDirectory fsdir = FSDirectory.getDirectory(INDEX_DIR);
> IndexReader ir = IndexReader.open(fsdir);
> _searcher = new IndexSearcher(ir);
> --------------------
>
> The indexWriter is closed near the end of the program (it's open while
> searching for words combinations ).
>
> When using Luke,, I was able to search successfully for exact phrases. My
> guess is that the problem I am facing has something to do with the
> indexWriter, but I can not pinpoint the exact cause of the problem.
>
>

Re: Exact Phrase Query

Posted by semelak ss <se...@yahoo.com>.
I am not entirely sure whether this is the cause, but here is something I thought might be related.
The idea is to have an index in which each document holds a combination of two words, word1 and word2, and a score for that pair.
The index is searched first to check whether the two words already exist; if they do not, the score is computed on the fly and then added to the index. This process is repeated thousands of times for thousands of words.
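
A minimal sketch of this lookup-or-compute loop, with a plain in-memory map standing in for the index. No Lucene is involved here; the class name, the key format, and computeScore are hypothetical placeholders used only to show the control flow:

```java
import java.util.HashMap;
import java.util.Map;

public class PairScoreCache {
    // Stand-in for the index: maps "word1 word2" to the score of that pair.
    private final Map<String, Double> scores = new HashMap<>();
    // Counts how often a score actually had to be computed.
    private int computations = 0;

    // Hypothetical key format: the two words joined by a single space.
    static String key(String word1, String word2) {
        return word1 + " " + word2;
    }

    // Hypothetical placeholder for the real scoring function.
    double computeScore(String word1, String word2) {
        computations++;
        return 1.0 / (1 + Math.abs(word1.length() - word2.length()));
    }

    // Look the pair up first; compute and store the score only on a miss.
    double scoreFor(String word1, String word2) {
        String k = key(word1, word2);
        Double cached = scores.get(k);
        if (cached != null) {
            return cached;
        }
        double score = computeScore(word1, word2);
        scores.put(k, score);
        return score;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        PairScoreCache cache = new PairScoreCache();
        cache.scoreFor("homeowners", "work"); // miss: computed and stored
        cache.scoreFor("homeowners", "work"); // hit: served from the map
        System.out.println("scores computed: " + cache.computations());
    }
}
```

The sketch only works because a pair stored on a miss is visible to the very next lookup. With a real index, that depends on when the searcher was opened relative to the writes.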

Hence, I have an IndexWriter and a searcher open at the same time:
--------------------
RAMDirectory ramDir = new RAMDirectory(INDEX_DIR);
IndexWriter ramWriter = new IndexWriter(ramDir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
writer = new IndexWriter(INDEX_DIR, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

FSDirectory fsdir = FSDirectory.getDirectory(INDEX_DIR);
IndexReader ir = IndexReader.open(fsdir);
_searcher = new IndexSearcher(ir);
--------------------

The IndexWriter is closed near the end of the program (it stays open while I search for word combinations).

When using Luke, I was able to search successfully for exact phrases. My guess is that the problem I am facing has something to do with the IndexWriter, but I cannot pinpoint the exact cause.
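
One thing that may be worth ruling out here: a Lucene IndexReader searches the index as it looked when the reader was opened, so documents added afterwards through a still-open IndexWriter do not show up until the writer's changes are committed and a new reader is opened. Separately, passing true as the create argument of IndexWriter creates a new index or overwrites an existing one at that location. The toy model below is plain Java with no Lucene calls, and all class names in it are hypothetical; it illustrates only the point-in-time behaviour, not Lucene's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    // Toy stand-in for an index: holds committed documents only.
    static class ToyIndex {
        private final List<String> committed = new ArrayList<>();
    }

    // Toy writer: buffers added documents until commit() is called.
    static class ToyWriter {
        private final ToyIndex index;
        private final List<String> pending = new ArrayList<>();

        ToyWriter(ToyIndex index) {
            this.index = index;
        }

        void addDocument(String text) {
            pending.add(text);
        }

        void commit() {
            index.committed.addAll(pending);
            pending.clear();
        }
    }

    // Toy reader: copies the committed documents at the moment it is opened.
    static class ToyReader {
        private final List<String> snapshot;

        ToyReader(ToyIndex index) {
            this.snapshot = new ArrayList<>(index.committed);
        }

        boolean contains(String text) {
            return snapshot.contains(text);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyWriter writer = new ToyWriter(index);
        ToyReader early = new ToyReader(index);    // opened before any add

        writer.addDocument("insurer storm");
        writer.commit();

        ToyReader reopened = new ToyReader(index); // opened after the commit
        System.out.println("early reader sees it:    " + early.contains("insurer storm"));
        System.out.println("reopened reader sees it: " + reopened.contains("insurer storm"));
    }
}
```

In this model the early reader never sees the new document, while a reader opened after the commit does. If the real program behaves the same way, a searcher created once at start-up would keep missing pairs that were added later in the run.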




      

