You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by vhanuman kumarbaburavi <V....@NIIT-Tech.com> on 2010/03/19 16:36:57 UTC

Prefix And Fuzzy

Dear Paul,



Still I am fighting with Prefix and Fuzzy search. Based on archive post I have noticed once thing i.e. when the prefix search will work only for "Field.Index.NOT_ANALYZED" type. I have changed that one prefix working perfectly. But the fuzzy search is working stopped, I hope because of the Index type when I changed to "Field.Index.ANALYZED" it starts working. I am using "lucene 3.0"version.



I have implemented these two searches in my application following way.



IndexSearcher is = new IndexSearcher(FSDirectory.open(new File("index11")));

is.setSimilarity(Similarity.getDefault());

Analyzer analyzer = new StopAnalyzer(Version.LUCENE_30);

QueryParser parser = new QueryParser(Version.LUCENE_30, "name",analyzer);

parser.setLowercaseExpandedTerms(true);

parser.setEnablePositionIncrements(true);

parser.setAllowLeadingWildcard(true);

parser.setFuzzyPrefixLength(4);

parser.setFuzzyMinSim(0.8f);



BooleanQuery booleanQuery = new BooleanQuery(true);

parser.setAllowLeadingWildcard(true);

booleanQuery.add(parser.parse("Tick*"), BooleanClause.Occur.SHOULD);

booleanQuery.add(parser.parse("Tick~"), BooleanClause.Occur.SHOULD);



ScoreDoc[] hits = is.search(booleanQuery, 1000).scoreDocs;

for (int i = 0; i < hits.length; i++) {

      Document hitDoc = is.doc(hits[i].doc);

      System.out.println(hitDoc.get("name") + "\t" + i);

}



Please suggest me ASAP.



Thanks & Regards



Veera.





-----Original Message-----
From: Paul Taylor [mailto:paul_t100@fastmail.fm]
Sent: Friday, March 19, 2010 5:02 PM
To: java-user@lucene.apache.org
Subject: Version.onOrAfter() complaing its Deprecated but it isnt



Hi since downloading Lucene 3.1 my code complains that

Version.onOrAfter() complaing its deprecated but i also have svn access

to the source and it isn't deprecated , and doesnt look like it ever has

been, anyone else get this ?



Paul



---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org

For additional commands, e-mail: java-user-help@lucene.apache.org



________________________________
DISCLAIMER

The content of this email and any attachments ('email') is confidential, may be privileged, subject to copyright and may be read and used only by the intended recipient. If you are not the intended recipient please notify us by return email or telephone and erase all copies and do not disclose the email or any part of it to any person. Email transmission cannot be guaranteed to be secure, or error free as information could be intercepted, corrupted, lost or destroyed as a result of the transmission process. The sender, therefore, does not accept liability for any errors, omissions, viruses or delay in transmission as a result of this mail. We monitor email communications through our networks for regulatory compliance purposes and to protect our clients, employees and business. Opinions, conclusions, and other information in this message that do not relate to the official business of NIIT Technologies Ltd. or its affiliate(s) shall be understood to be neither given nor endorsed by NIIT Technologies Ltd. or its affiliate(s).

Re: Prefix And Fuzzy

Posted by Anshum <an...@gmail.com>.
Hi Veera,
I'd say you should get yourself a copy of Lucene In Action 2 Ed. It would
really help you figure out the hows and why's in case you are looking at
using lucene further.
Talking about your solution, let me explain it to you a bit. When
you analyse a particular field in lucene, it gets tokenized/processed prior
to getting indexed. The way the processing would happen depends on your
analyzer (which here is StopAnalyzer). So point 1. If you analyze a field
with value *'My name is anshum' *it would get broken down into tokens, e.g.
[my] [name] [is] [anshum] where each term in the [] is a token
indexed separately.
A search on my or a search on name or a search on anshum would get you this
document. Whereas in case you don't analyze it, the field would get indexed
as it is and this document would only be picked up if its an exact match for
the complete value i.e. the document would have one token with the value *[my
name is anshum].*
**Hope you get what 'm trying to put forward here.
Reanalyze your requirement, you might really want to analyze the field. Do
not avoid analysis of the field only to get a particular test case work
fine, it might trouble you later!
Analyze your field using a suitable analyzer and the searches should just
work fine.

--
Anshum Gupta
http://anshumgupta.net

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Fri, Mar 19, 2010 at 9:06 PM, vhanuman kumarbaburavi <
V.Ravi@niit-tech.com> wrote:

> Dear Paul,
>
>
>
> Still I am fighting with Prefix and Fuzzy search. Based on archive post I
> have noticed once thing i.e. when the prefix search will work only for
> "Field.Index.NOT_ANALYZED" type. I have changed that one prefix working
> perfectly. But the fuzzy search is working stopped, I hope because of the
> Index type when I changed to "Field.Index.ANALYZED" it starts working. I am
> using "lucene 3.0"version.
>
>
>
> I have implemented these two searches in my application following way.
>
>
>
> IndexSearcher is = new IndexSearcher(FSDirectory.open(new
> File("index11")));
>
> is.setSimilarity(Similarity.getDefault());
>
> Analyzer analyzer = new StopAnalyzer(Version.LUCENE_30);
>
> QueryParser parser = new QueryParser(Version.LUCENE_30, "name",analyzer);
>
> parser.setLowercaseExpandedTerms(true);
>
> parser.setEnablePositionIncrements(true);
>
> parser.setAllowLeadingWildcard(true);
>
> parser.setFuzzyPrefixLength(4);
>
> parser.setFuzzyMinSim(0.8f);
>
>
>
> BooleanQuery booleanQuery = new BooleanQuery(true);
>
> parser.setAllowLeadingWildcard(true);
>
> booleanQuery.add(parser.parse("Tick*"), BooleanClause.Occur.SHOULD);
>
> booleanQuery.add(parser.parse("Tick~"), BooleanClause.Occur.SHOULD);
>
>
>
> ScoreDoc[] hits = is.search(booleanQuery, 1000).scoreDocs;
>
> for (int i = 0; i < hits.length; i++) {
>
>      Document hitDoc = is.doc(hits[i].doc);
>
>      System.out.println(hitDoc.get("name") + "\t" + i);
>
> }
>
>
>
> Please suggest me ASAP.
>
>
>
> Thanks & Regards
>
>
>
> Veera.
>
>
>
>
>
> -----Original Message-----
> From: Paul Taylor [mailto:paul_t100@fastmail.fm]
> Sent: Friday, March 19, 2010 5:02 PM
> To: java-user@lucene.apache.org
> Subject: Version.onOrAfter() complaing its Deprecated but it isnt
>
>
>
> Hi since downloading Lucene 3.1 my code complains that
>
> Version.onOrAfter() complaing its deprecated but i also have svn access
>
> to the source and it isn't deprecated , and doesnt look like it ever has
>
> been, anyone else get this ?
>
>
>
> Paul
>
>
>
> ---------------------------------------------------------------------
>
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ________________________________
> DISCLAIMER
>
> The content of this email and any attachments ('email') is confidential,
> may be privileged, subject to copyright and may be read and used only by the
> intended recipient. If you are not the intended recipient please notify us
> by return email or telephone and erase all copies and do not disclose the
> email or any part of it to any person. Email transmission cannot be
> guaranteed to be secure, or error free as information could be intercepted,
> corrupted, lost or destroyed as a result of the transmission process. The
> sender, therefore, does not accept liability for any errors, omissions,
> viruses or delay in transmission as a result of this mail. We monitor email
> communications through our networks for regulatory compliance purposes and
> to protect our clients, employees and business. Opinions, conclusions, and
> other information in this message that do not relate to the official
> business of NIIT Technologies Ltd. or its affiliate(s) shall be understood
> to be neither given nor endorsed by NIIT Technologies Ltd. or its
> affiliate(s).
>