Posted to java-user@lucene.apache.org by VIGNESH S <vi...@gmail.com> on 2013/10/03 07:53:42 UTC

Re: Multiphrase Query in Lucene 4.3

Hi Ian,

Is there any default analyzer in Lucene that ignores only whitespace?
It should preserve everything else: numbers, punctuation, dates.

I created my own analyzer with a tokenizer whose token-character test
returns Character.isDefined(cn) && (!Character.isWhitespace(cn)).
The analyzer applies a lowercase filter on top of the tokenizer. This works
perfectly in 3.6.
In 4.3 it creates problems with the offsets of the tokens.
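
As a point of comparison for the offsets, here is a minimal plain-Java sketch of the tokenization described above: split only on whitespace, keep every other defined character, lowercase the result, and record start/end offsets into the original text. It uses no Lucene classes; the class, method, and field names are illustrative assumptions, not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch (no Lucene classes); all names here are illustrative.
class WhitespaceOffsets {

    /** A lowercased token plus the offsets it covers in the original text. */
    static final class Token {
        final String text;
        final int start; // inclusive offset of the first character
        final int end;   // exclusive offset, one past the last character

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Same test as in the message: any defined character that is not whitespace. */
    static boolean isTokenChar(char c) {
        return Character.isDefined(c) && !Character.isWhitespace(c);
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip the separator run (whitespace) before the next token.
            while (i < input.length() && !isTokenChar(input.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < input.length() && isTokenChar(input.charAt(i))) i++;
            if (i > start) {
                String text = input.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Romer  Geoffrey 2013/10/03")) {
            System.out.println(t);
        }
    }
}
```

Comparing the offsets a custom tokenizer reports against a reference like this on a few short inputs is one way to narrow down where they start to diverge.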




On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ia...@gmail.com> wrote:

> Whenever someone says they are using a custom analyzer that has to be
> a suspect.  Does it work if you use one of the core lucene analyzers
> instead?  Have you used Luke to verify that the index holds what you
> think it does?
>
>
> --
> Ian.
>
>
> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vi...@gmail.com>
> wrote:
> > Hi,
> >
> > It is not a problem with case, because I am using LowerCaseFilter.
> >
> > My analyzer is a custom analyzer that ignores only whitespace. It keeps
> > everything else: numbers, dates, and other special characters. The same
> > analyzer works in Lucene 3.6.
> >
> >
> > When I run a single-term query for "Geoffrey" it returns hits, but when the
> > term is part of a MultiPhraseQuery it is not found. When the code below is
> > executed with, say, word = "Geoffrey", it does not find the word itself.
> >
> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
> >   do {
> >     String s = trm.term().utf8ToString();
> >     if (s.equals(word)) {
> >       termsWithPrefix.add(new Term("content", s));
> >     } else {
> >       break;
> >     }
> >   } while (trm.next() != null);
> > }
> >
> >
> >
> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ia...@gmail.com> wrote:
> >
> >> Whenever someone says something along the lines of a search for
> >> "geoffrey" not matching "Geoffrey", the case difference springs out.
> >> Can't recall what, if anything, you said about the analysis side of
> >> things, but that could be the cause.  See
> >>
> >>
> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> >>
> >> If on the other hand the problem is more obscure, and only related to
> >> the multi phrase stuff, I suggest you build a tiny but complete
> >> RAMDirectory based program or test case that shows the problem and
> >> post it here.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >>
> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vi...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Thanks for your reply. The problem I face is that my field contains the
> >> > name "Geoffrey Romer".
> >> >
> >> > I am building the MultiPhraseQuery object correctly for "Geoffrey Romer",
> >> > but when I search, it returns no hits. This does not happen for all
> >> > phrases, only for a few.
> >> >
> >> > When I run a single-term query for Geoffrey it returns a hit, but in the
> >> > MultiPhraseQuery "geoffrey" is not found. I confirmed this by calling
> >> > trm.seekCeil(new BytesRef("Geoffrey")) and then
> >> > String s = trm.term().utf8ToString(). The enum points to a different word
> >> > instead of geoffrey. seekCeil works properly for many other phrases, though.
> >> >
> >> > What could be the problem? Please kindly suggest.
> >> >
> >> >
> >> >
> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B.
> >> > <tallison@mitre.org> wrote:
> >> >
> >> >> 1) An alternate method to your original question would be to do
> >> >> something like this (I haven't compiled or tested this!):
> >> >>
> >> >> Query q = new PrefixQuery(new Term("field", "app"));
> >> >>
> >> >> q = q.rewrite(indexReader);
> >> >> Set<Term> terms = new HashSet<Term>();
> >> >> q.extractTerms(terms);
> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
> >> >> mpq.add(new Term("field", "microsoft"));
> >> >> mpq.add(arr);
> >> >>
> >> >>
> >> >> 2) At a higher level, do you need to generate your query
> >> >> programmatically? Here are three parsers that could handle this:
> >> >>   a) ComplexPhraseQueryParser
> >> >>   b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
> >> >>   c) experimental: <self_promotion degree="shameless">
> >> >> http://issues.apache.org/jira/browse/LUCENE-5205</self_promotion>
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: VIGNESH S [mailto:vigneshklncit@gmail.com]
> >> >> Sent: Friday, September 27, 2013 3:33 AM
> >> >> To: java-user@lucene.apache.org
> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
> >> >>
> >> >> Hi,
> >> >>
> >> >> The phrase I am searching for is "Romer Geoffrey". It is in the field.
> >> >>
> >> >> When I call trm.seekCeil(new BytesRef("Geoffrey")) and then
> >> >> String s = trm.term().utf8ToString(), it returns a different word. I
> >> >> think this is why my MultiPhraseQuery is not giving the desired results.
> >> >>
> >> >> What may be the reason?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshklncit@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Hi Ian,
> >> >> >
> >> >> > Thanks for your reply.
> >> >> >
> >> >> > I am doing something similar to this. The MultiPhraseQuery object
> >> >> > contains the correct phrase, but it does not return any hits.
> >> >> >
> >> >> > In Lucene 3.6 I implemented the same logic and it works.
> >> >> >
> >> >> > In Lucene 4.3 I built the index for that field using
> >> >> >
> >> >> >  FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
> >> >> >  offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> >> >> >
> >> >> > For MultiPhraseQuery, do I need to set any other option in addition
> >> >> > to this while indexing?
> >> >> >
> >> >> > Is there a MultiPhraseQueryTest Java file for Lucene 4.3? I checked
> >> >> > the Lucene branch and was not able to find one. Please kindly help.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ia...@gmail.com> wrote:
> >> >> >
> >> >> >> I use the code below to do something like this.  Not exactly what
> >> >> >> you want but should be easy to adapt.
> >> >> >>
> >> >> >>
> >> >> >> public List<String> findTerms(IndexReader _reader,
> >> >> >>                               String _field) throws IOException {
> >> >> >>   List<String> l = new ArrayList<String>();
> >> >> >>   Fields ff = MultiFields.getFields(_reader);
> >> >> >>   Terms trms = ff.terms(_field);
> >> >> >>   TermsEnum te = trms.iterator(null);
> >> >> >>   BytesRef br;
> >> >> >>   while ((br = te.next()) != null) {
> >> >> >>     l.add(br.utf8ToString());
> >> >> >>   }
> >> >> >>   return l;
> >> >> >> }
> >> >> >>
> >> >> >> --
> >> >> >> Ian.
> >> >> >>
> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshklncit@gmail.com>
> >> >> >> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > The MultiPhraseQuery documentation gives this example:
> >> >> >> >
> >> >> >> > "To use this class, to search for the phrase "Microsoft app*" first
> >> >> >> > use add(Term) on the term "Microsoft", then find all terms that have
> >> >> >> > "app" as prefix using IndexReader.terms(Term), and use
> >> >> >> > MultiPhraseQuery.add(Term[] terms) to add them to the query"
> >> >> >> >
> >> >> >> >
> >> >> >> > How can I do the same in Lucene 4.3, since IndexReader.terms(Term)
> >> >> >> > is no longer available?
> >> >> >> >
> >> >> >> > --
> >> >> >> > Thanks and Regards
> >> >> >> > Vignesh Srinivasan
> >> >> >>
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org


-- 
Thanks and Regards
Vignesh Srinivasan
9739135640
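
One possible reading of the seekCeil symptom quoted in this message, sketched in plain Java. The assumptions are loud ones: the indexed terms are taken to be lowercased (a LowerCaseFilter is in the chain) while the seek text "Geoffrey" is not, and a sorted TreeSet stands in for the term dictionary. This is an analogy, not Lucene code; Lucene orders terms by UTF-8 bytes, which agrees with String order for the ASCII terms used here. The second method shows the "seek to the prefix, then walk forward while the term still starts with it" pattern behind the "Microsoft app*" example. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Analogy only: a sorted set of strings stands in for a term dictionary.
class TermDictionarySketch {

    /** Smallest term >= text, or null if there is none (where a ceiling seek would land). */
    static String ceil(TreeSet<String> terms, String text) {
        return terms.ceiling(text);
    }

    /** Seek to the first term >= prefix, then collect terms while they still start with it. */
    static List<String> termsWithPrefix(TreeSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match once one fails
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "app", "apple", "application", "geoffrey", "microsoft", "romer"));

        // Uppercase 'G' sorts before every lowercase letter, so the seek lands
        // on the first lowercase term rather than on "geoffrey".
        System.out.println(ceil(terms, "Geoffrey"));   // prints "app"
        System.out.println(ceil(terms, "geoffrey"));   // prints "geoffrey"
        System.out.println(termsWithPrefix(terms, "app")); // prints "[app, apple, application]"
    }
}
```

If this is what is happening, lowercasing the seek text the same way the analyzer lowercases indexed text would make the exact-match check succeed; whether it applies here depends on what the index actually contains.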

Re: Multiphrase Query in Lucene 4.3

Posted by Ian Lea <ia...@gmail.com>.

Then I suggest you start a new thread, posting all relevant details
and preferably a cut-down but complete program, with all relevant
code and no irrelevant code, and with simple examples, input and output,
of what does and doesn't work.


--
Ian.


On Thu, Oct 3, 2013 at 12:28 PM, VIGNESH S <vi...@gmail.com> wrote:
> Ian,
> Thanks for your reply.
> I face the same problem when I use WhitespaceTokenizer as well.
> My analyzer works perfectly with Lucene 3.6.
>
> Thanks and Regards
> Vignesh Srinivasan
>
> On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea <ia...@gmail.com> wrote:
>
>> Certainly sounds like a bug in your analyzer.  You could start a new
>> thread if you need help with that.  But from your previous email it
>> sounds like you could use WhitespaceTokenizer chained with
>> LowerCaseFilter.
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vi...@gmail.com> wrote:
>> > Hi,
>> >
>> > In my analyzer, the problem actually occurs for words that are preceded
>> > by punctuation marks.
>> >
>> > For example, if I index the content ",Andrey Gubarev,JingGoogle,Inc."
>> > and search for "Andrew Gubarev", it does not work properly because the
>> > word Andrew is preceded by the punctuation ",".


Re: Multiphrase Query in Lucene 4.3

Posted by VIGNESH S <vi...@gmail.com>.

Ian,
Thanks for your reply.
I face the same problem when I use WhitespaceTokenizer as well.
My analyzer works perfectly with Lucene 3.6.

Thanks and Regards
Vignesh Srinivasan
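
A plain-Java sketch of why splitting only on whitespace could give the punctuation symptom reported earlier in the thread. It assumes the sample content ",Andrey Gubarev,JingGoogle,Inc." from that report and uses "Andrey Gubarev" as an illustrative phrase; the class and method names are hypothetical and no Lucene classes are involved. With whitespace-only splitting the comma stays attached to the word, so the indexed terms are not the terms the phrase asks for.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch only: whitespace splitting plus lowercasing, then a consecutive-term phrase check.
class PhraseOnWhitespaceTokens {

    /** Lowercase the text and split it on runs of whitespace; punctuation is kept. */
    static List<String> terms(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** True if the phrase's terms occur as consecutive terms of the indexed text. */
    static boolean containsPhrase(String indexedText, String phraseText) {
        List<String> phrase = terms(phraseText);
        if (phrase.isEmpty()) {
            return false; // an empty phrase matches nothing in this sketch
        }
        return Collections.indexOfSubList(terms(indexedText), phrase) >= 0;
    }

    public static void main(String[] args) {
        String content = ",Andrey Gubarev,JingGoogle,Inc.";
        // The indexed terms keep their punctuation: ",andrey" and "gubarev,jinggoogle,inc."
        System.out.println(terms(content));
        // The phrase terms "andrey" and "gubarev" are therefore not present.
        System.out.println(containsPhrase(content, "Andrey Gubarev")); // prints "false"
        System.out.println(containsPhrase("Romer Geoffrey", "romer geoffrey")); // prints "true"
    }
}
```

Since whitespace splitting would behave this way in any version, this would not by itself explain a difference between 3.6 and 4.3; it only shows why a phrase next to punctuation can miss under this kind of analysis.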

> >>> >
> >>> > --
> >>> > Thanks and Regards
> >>> > Vignesh Srinivasan
> >>> > 9739135640
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>
> >>
> >> --
> >> Thanks and Regards
> >> Vignesh Srinivasan
> >> 9739135640
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Vignesh Srinivasan
> > 9739135640
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

Re: Multiphrase Query in Lucene 4.3

Posted by Ian Lea <ia...@gmail.com>.

Certainly sounds like a bug in your analyzer.  You could start a new
thread if you need help with that.  But from your previous email it
sounds like you could use WhitespaceTokenizer chained with
LowerCaseFilter.
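
As a rough illustration of what such a chain would emit: the sketch below
uses no Lucene classes at all. It is a plain-Java model, assuming that
whitespace tokenization splits on whitespace only, keeps punctuation
attached, and that lowercasing is applied per token. The names
WhitespaceLowercaseSketch, Token and tokenize are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java model (no Lucene classes) of whitespace tokenization followed by lowercasing. */
final class WhitespaceLowercaseSketch {

  /** One token with its start (inclusive) and end (exclusive) character offsets. */
  static final class Token {
    final String text;
    final int start;
    final int end;

    Token(String text, int start, int end) {
      this.text = text;
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return text + "[" + start + "," + end + ")";
    }
  }

  /** Splits on whitespace only, keeps punctuation and digits, lowercases each token. */
  static List<Token> tokenize(String input) {
    List<Token> tokens = new ArrayList<Token>();
    int i = 0;
    while (i < input.length()) {
      // Skip the run of whitespace before the next token.
      while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
        i++;
      }
      int start = i;
      // Consume every non-whitespace character, punctuation included.
      while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
        i++;
      }
      if (i > start) {
        // Offsets refer to the original input, not to the lowercased text.
        tokens.add(new Token(input.substring(start, i).toLowerCase(Locale.ROOT), start, i));
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    for (Token t : tokenize(",Andrey Gubarev,JingGoogle,Inc.")) {
      System.out.println(t);
    }
  }
}
```

For the sample content from this thread the model yields two tokens,
",andrey" at offsets [0,7) and "gubarev,jinggoogle,inc." at [8,31). Offsets
reported by a custom tokenizer can be compared against values like these.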


--
Ian.


On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vi...@gmail.com> wrote:
> Hi,
>
> In my analyzer, the problem actually occurs for words that are preceded by
> punctuation marks.
>
> For example, if I index the content ",Andrey Gubarev,JingGoogle,Inc."
> and then search for "Andrey Gubarev", it does not work properly, because
> the word Andrey is preceded by the punctuation mark ",".

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Multiphrase Query in Lucene 4.3

Posted by VIGNESH S <vi...@gmail.com>.

Hi,

In my analyzer, the problem actually occurs for words that are preceded by
punctuation marks.

For example, if I index the content ",Andrey Gubarev,JingGoogle,Inc."
and then search for "Andrey Gubarev", it does not work properly, because
the word Andrey is preceded by the punctuation mark ",".
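
One possible reading of this symptom, sketched in plain Java rather than
with Lucene itself: if the analyzer keeps punctuation attached and
lowercases, the stored term is ",andrey", so an exact lookup of "Andrey" or
"andrey" cannot hit it. The sketch assumes exactly that analyzer behaviour
and uses TreeSet.ceiling as a stand-in for TermsEnum.seekCeil. For these
ASCII terms String ordering agrees with byte ordering. TermLookupSketch,
seekCeil and found are names invented for this example.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Models a sorted term dictionary with a TreeSet; this is not Lucene's TermsEnum. */
final class TermLookupSketch {

  // Terms a whitespace-only, lowercasing analyzer is assumed to store for the
  // content ",Andrey Gubarev,JingGoogle,Inc." (punctuation stays attached).
  static final TreeSet<String> TERMS =
      new TreeSet<String>(Arrays.asList(",andrey", "gubarev,jinggoogle,inc."));

  /** Returns the smallest stored term that is >= the probe, or null if there is none. */
  static String seekCeil(String probe) {
    return TERMS.ceiling(probe);
  }

  /** True only when the probe is itself a stored term (the FOUND case). */
  static boolean found(String probe) {
    return probe.equals(seekCeil(probe));
  }

  public static void main(String[] args) {
    for (String probe : new String[] {"Andrey", "andrey", ",andrey"}) {
      System.out.println(probe + " -> " + seekCeil(probe) + " (found=" + found(probe) + ")");
    }
  }
}
```

In this model only the probe ",andrey" is found. Both "Andrey" and "andrey"
land on the next term in sort order, "gubarev,jinggoogle,inc.", which would
match the "pointing to a different word" behaviour described earlier in the
thread.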


On Thu, Oct 3, 2013 at 11:23 AM, VIGNESH S <vi...@gmail.com> wrote:

> Hi Ian,
>
> Is there a default analyzer in Lucene that ignores only whitespace? It
> should preserve everything else: numbers, punctuation, dates.
>
> I created my analyzer with a tokenizer that accepts a character when
> Character.isDefined(cn) && (!Character.isWhitespace(cn)) is true.
> My analyzer applies a lowercase filter on top of the tokenizer. This works
> perfectly in 3.6. In 4.3 it produces wrong token offsets.
>
>
>
>
> On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ia...@gmail.com> wrote:
>
>> Whenever someone says they are using a custom analyzer that has to be
>> a suspect.  Does it work if you use one of the core lucene analyzers
>> instead?  Have you used Luke to verify that the index holds what you
>> think it does?
>>
>>
>> --
>> Ian.
>>
>>
>> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vi...@gmail.com> wrote:
>> > Hi,
>> >
>> > It is not a problem with case, because I am using LowerCaseFilter.
>> >
>> > My analyzer is a custom analyzer that ignores only whitespace. It keeps
>> > numbers, dates and other special characters. The same analyzer works
>> > with Lucene 3.6.
>> >
>> > A single-term query for "Geoffrey" returns hits, but as part of a
>> > MultiPhraseQuery the term is not found. When the code below is executed
>> > with, say, word = "Geoffrey", it does not find the word itself.
>> >
>> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
>> >   do {
>> >     String s = trm.term().utf8ToString();
>> >     if (s.equals(word)) {
>> >       termsWithPrefix.add(new Term("content", s));
>> >     } else {
>> >       break;
>> >     }
>> >   } while (trm.next() != null);
>> > }
>> >
>> >
>> >
>> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ia...@gmail.com> wrote:
>> >
>> >> Whenever someone says something along the lines of a search for
>> >> "geoffrey" not matching "Geoffrey" the case difference springs out.
>> >> Can't recall what if anything you said about the analysis side of
>> >> things but that could be the cause.  See
>> >>
>> >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>> >>
>> >> If on the other hand the problem is more obscure, and only related to
>> >> the multi phrase stuff, I suggest you build a tiny but complete
>> >> RAMDirectory based program or test case that shows the problem and
>> >> post it here.
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >>
>> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vi...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > Thanks for your reply. The problem I face is that the name "Geoffrey
>> >> > Romer" occurs in my field.
>> >> >
>> >> > I build the MultiPhraseQuery object correctly, e.g. for "Geoffrey
>> >> > Romer", but the search returns no hits. This happens only for a few
>> >> > phrases, not for all of them.
>> >> >
>> >> > A single-term query for Geoffrey gives a hit, but the MultiPhraseQuery
>> >> > code is not able to find "geoffrey". I confirmed this by calling
>> >> > trm.seekCeil(new BytesRef("Geoffrey")) and then
>> >> > String s = trm.term().utf8ToString(). It points to a different word
>> >> > instead of geoffrey. seekCeil works properly for many other phrases,
>> >> > though.
>> >> >
>> >> > What could be the problem? Please suggest.
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B. <tallison@mitre.org> wrote:
>> >> >
>> >> >> 1) An alternate method to your original question would be to do
>> >> >> something like this (I haven't compiled or tested this!):
>> >> >>
>> >> >> Query q = new PrefixQuery(new Term("field", "app"));
>> >> >>
>> >> >> q = q.rewrite(indexReader);
>> >> >> Set<Term> terms = new HashSet<Term>();
>> >> >> q.extractTerms(terms);
>> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
>> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
>> >> >> mpq.add(new Term("field", "microsoft"));
>> >> >> mpq.add(arr);
>> >> >>
>> >> >>
>> >> >> 2) At a higher level, do you need to generate your query
>> >> >> programmatically? Here are three parsers that could handle this:
>> >> >>   a) ComplexPhraseQueryParser
>> >> >>   b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
>> >> >>   c) experimental: <self_promotion degree="shameless">
>> >> >> http://issues.apache.org/jira/browse/LUCENE-5205</self_promotion>
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: VIGNESH S [mailto:vigneshklncit@gmail.com]
>> >> >> Sent: Friday, September 27, 2013 3:33 AM
>> >> >> To: java-user@lucene.apache.org
>> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> The phrase I am searching for is "Romer Geoffrey". It is present in
>> >> >> the field.
>> >> >>
>> >> >> I call trm.seekCeil(new BytesRef("Geoffrey")) and then
>> >> >> String s = trm.term().utf8ToString();
>> >> >>
>> >> >> This gives a different word. I think this is why my MultiPhraseQuery
>> >> >> is not giving the desired results.
>> >> >>
>> >> >> What may be the reason?
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >> >>
>> >> >> > Hi Ian,
>> >> >> >
>> >> >> > Thanks for your reply.
>> >> >> >
>> >> >> > I am doing something similar to this. The actual phrase goes into
>> >> >> > the MultiPhraseQuery object correctly, but it does not return any
>> >> >> > hits.
>> >> >> >
>> >> >> > In Lucene 3.6 I implemented the same logic and it works.
>> >> >> >
>> >> >> > In Lucene 4.3 I set up the indexed field using
>> >> >> >
>> >> >> >   FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>> >> >> >   offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>> >> >> >
>> >> >> > Do I need to set any other parameter at index time for
>> >> >> > MultiPhraseQuery?
>> >> >> >
>> >> >> > Is there a MultiPhraseQueryTest Java file for Lucene 4.3? I checked
>> >> >> > the Lucene branch and was not able to find one. Please help.
>> >> >> >
>> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ia...@gmail.com> wrote:
>> >> >> >
>> >> >> >> I use the code below to do something like this.  Not exactly what
>> >> >> >> you want but should be easy to adapt.
>> >> >> >>
>> >> >> >>
>> >> >> >> public List<String> findTerms(IndexReader _reader,
>> >> >> >>                               String _field) throws IOException {
>> >> >> >>   List<String> l = new ArrayList<String>();
>> >> >> >>   Fields ff = MultiFields.getFields(_reader);
>> >> >> >>   Terms trms = ff.terms(_field);
>> >> >> >>   TermsEnum te = trms.iterator(null);
>> >> >> >>   BytesRef br;
>> >> >> >>   while ((br = te.next()) != null) {
>> >> >> >>     l.add(br.utf8ToString());
>> >> >> >>   }
>> >> >> >>   return l;
>> >> >> >> }
>> >> >> >>
>> >> >> >> --
>> >> >> >> Ian.
>> >> >> >>
>> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshklncit@gmail.com> wrote:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > The example in the MultiPhraseQuery documentation says:
>> >> >> >> >
>> >> >> >> > "To use this class, to search for the phrase "Microsoft app*"
>> >> >> >> > first use add(Term) on the term "Microsoft", then find all terms
>> >> >> >> > that have "app" as prefix using IndexReader.terms(Term), and use
>> >> >> >> > MultiPhraseQuery.add(Term[] terms) to add them to the query"
>> >> >> >> >
>> >> >> >> > How can I do the same in Lucene 4.3, since
>> >> >> >> > IndexReader.terms(Term) is no longer available?
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Thanks and Regards
>> >> >> >> > Vignesh Srinivasan



-- 
Thanks and Regards
Vignesh Srinivasan
9739135640