Posted to java-user@lucene.apache.org by VIGNESH S <vi...@gmail.com> on 2013/10/30 15:59:38 UTC

trm.seekCeil() not giving proper value when used in MP Query for some words

Hi,

I have indexed the text file "filename.txt" below using the test code
G1.java.

When I search for "check for old", trm.seekCeil() returns "checking"
and "checks" and ignores "check", which is present in the text document.

It works for most cases except a few.

Please kindly help me.

-- 
Thanks and Regards
Vignesh Srinivasan
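The symptom in this post matches how a ceiling seek behaves when the exact term is missing from the term dictionary: the enum lands on the smallest term that sorts at or after the target. The sketch below is a hypothetical simulation in plain Java, not Lucene code. It mirrors the FOUND / NOT_FOUND / END outcomes of TermsEnum.seekCeil using a TreeSet. It assumes, as the later messages in this thread suggest, that "check" was actually indexed as "•check". The terms "checking" and "checks" are assumed to come from elsewhere in filename.txt. Lucene orders terms by UTF-8 bytes, which agrees with String order for the characters used here.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical simulation of a seekCeil-style lookup over a sorted term set (not Lucene code). */
public class SeekCeilDemo {
    // Mirrors the three outcomes of a ceiling seek.
    enum SeekStatus { FOUND, NOT_FOUND, END }

    /** Positions on the smallest term >= target and reports whether it was an exact hit. */
    static SeekStatus seekCeil(TreeSet<String> terms, String target, StringBuilder positionedOn) {
        positionedOn.setLength(0);
        String ceil = terms.ceiling(target); // null if every term sorts before target
        if (ceil == null) {
            return SeekStatus.END;
        }
        positionedOn.append(ceil);
        return ceil.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
    }

    public static void main(String[] args) {
        // Assumed term dictionary: "check" only exists with the bullet attached (U+2022),
        // and U+2022 sorts after all ASCII letters, so "\u2022check" is the last term.
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("checking", "checks", "for", "old", "\u2022check"));
        StringBuilder pos = new StringBuilder();
        SeekStatus status = seekCeil(terms, "check", pos);
        System.out.println(status + " " + pos); // prints "NOT_FOUND checking"
    }
}
```

A caller that only reads the current term after the seek, without checking for FOUND, will therefore see "checking" and never "check".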

Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by Michael McCandless <lu...@mikemccandless.com>.
You can instantiate StandardAnalyzer, passing an empty stopwords set.
Or make a custom analyzer that doesn't insert StopFilter ...

I'm not aware of any changes in how WhitespaceAnalyzer(Tokenizer)
tokenizes between 3.6.x and 4.x; both versions seem to use
Character.isWhitespace to detect which characters to tokenize on.  So
it's odd you're seeing a difference in behavior between the two
versions.
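To illustrate the Character.isWhitespace point: the bullet that precedes "Check" in the indexed contents (U+2022) is not whitespace, so a whitespace-only tokenizer keeps it attached to the word. The sketch below is a minimal, hypothetical illustration of the "custom analyzer" alternative. It is plain Java that only simulates a tokenization rule: every character that is not a letter or digit acts as a separator, each token is lowercased, and no stop words are removed. In Lucene 4.x a comparable rule would typically live in a CharTokenizer subclass that overrides isTokenChar(int); verify the constructor signatures against the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: letters/digits form tokens, everything else separates; no stop words dropped. */
public class LetterOrDigitTokensDemo {
    // The rule a custom tokenizer might apply per code point (assumption, not Lucene's default).
    static boolean isTokenChar(int cp) {
        return Character.isLetterOrDigit(cp);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp)); // lowercase while building
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // separator ends the current token
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace('\u2022')); // false: the bullet is not whitespace
        System.out.println(tokenize("\u2022Check for old and vulnerable versions"));
        // prints [check, for, old, and, vulnerable, versions]
    }
}
```

Splitting on every non-alphanumeric character also breaks apart things like hyphenated words or e-mail addresses, so this is a trade-off rather than a drop-in replacement.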

Mike McCandless

http://blog.mikemccandless.com


On Thu, Oct 31, 2013 at 7:57 AM, VIGNESH S <vi...@gmail.com> wrote:
> Hi Mike,
>
> I cannot use other analyzers since they involve stop words.
>
> I just need to index every word.
>
> I have used WhitespaceAnalyzer in Lucene 3.6 and it indexes
> properly. I am facing this problem only in Lucene 4.3.
>
>
> Thanks and Regards
> Vignesh Srinivasan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by VIGNESH S <vi...@gmail.com>.
Hi Mike,

I cannot use other analyzers since they involve stop words.

I just need to index every word.

I have used WhitespaceAnalyzer in Lucene 3.6 and it indexes
properly. I am facing this problem only in Lucene 4.3.


Thanks and Regards
Vignesh Srinivasan


On Thu, Oct 31, 2013 at 4:12 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Pick a better analyzer.
>
> Maybe StandardAnalyzer?
>
> Mike McCandless
>
> http://blog.mikemccandless.com


-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by Michael McCandless <lu...@mikemccandless.com>.
Pick a better analyzer.

Maybe StandardAnalyzer?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Oct 31, 2013 at 2:22 AM, VIGNESH S <vi...@gmail.com> wrote:
> Hi Mike,
>
> I am using a whitespace analyzer with a lowercase filter. The test code is
> the same as the one I sent above.
>
> The contents I am indexing are:
>
>         String contents = "•Check for vulnerable ports  •Check for old and "
>                 + "vulnerable versions of services on open ports  •Transfer a code which";
>
> In that text, "Check" is not being indexed properly because it is preceded
> by the "•" symbol. How can I index it properly?


Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by VIGNESH S <vi...@gmail.com>.
Hi Mike,

I am using a whitespace analyzer with a lowercase filter. The test code is
the same as the one I sent above.

The contents I am indexing are:

        String contents = "•Check for vulnerable ports  •Check for old and "
                + "vulnerable versions of services on open ports  •Transfer a code which";

In that text, "Check" is not being indexed properly because it is preceded
by the "•" symbol. How can I index it properly?
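A minimal, self-contained way to confirm this diagnosis is to approximate whitespace tokenization followed by lowercasing in plain Java and inspect the resulting tokens. The sketch below is an approximation, not Lucene's WhitespaceTokenizer or LowerCaseFilter: it splits wherever Character.isWhitespace is true and lowercases each token. On the contents above it produces "•check" rather than "check", which is why an exact lookup for "check" fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Approximation of whitespace tokenization plus lowercasing (not Lucene's implementation). */
public class WhitespaceTokensDemo {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                // Whitespace ends the current token; U+2022 BULLET is NOT whitespace.
                if (current.length() > 0) {
                    tokens.add(current.toString().toLowerCase(Locale.ROOT));
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String contents = "\u2022Check for vulnerable ports  \u2022Check for old and "
                + "vulnerable versions of services on open ports  \u2022Transfer a code which";
        List<String> tokens = tokenize(contents);
        System.out.println(tokens.contains("check"));       // false
        System.out.println(tokens.contains("\u2022check")); // true
    }
}
```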

On Thu, Oct 31, 2013 at 9:58 AM, VIGNESH S <vi...@gmail.com> wrote:

> Hi Mike,
> I found the problem. The term is not being indexed properly.



-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by VIGNESH S <vi...@gmail.com>.
Hi Mike,
I found the problem. The term is not being indexed properly.


On Thu, Oct 31, 2013 at 7:19 AM, VIGNESH S <vi...@gmail.com> wrote:

> Hi Mike,
>
> Please find the attached test case G1.java.



-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by VIGNESH S <vi...@gmail.com>.
Hi Mike,

Please find the attached test case G1.java.


On Wed, Oct 30, 2013 at 8:41 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I don't see any java sources here?
>
> Make sure "check" is in fact being indexed; can you boil it down to a
> small test case?
>
> Mike McCandless
>
> http://blog.mikemccandless.com


-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

Re: trm.seekCeil() not giving proper value when used in MP Query for some words

Posted by Michael McCandless <lu...@mikemccandless.com>.
I don't see any java sources here?

Make sure "check" is in fact being indexed; can you boil it down to a
small test case?

Mike McCandless

http://blog.mikemccandless.com


On Wed, Oct 30, 2013 at 10:59 AM, VIGNESH S <vi...@gmail.com> wrote:
> Hi,
>
> I have indexed the text file "filename.txt" below using the test code
> G1.java.
>
> When I search for "check for old", trm.seekCeil() returns "checking" and
> "checks" and ignores "check", which is present in the text document.
>
> It works for most cases except a few.
>
> Please kindly help me.
>
> --
> Thanks and Regards
> Vignesh Srinivasan