Posted to java-user@lucene.apache.org by Pranav goyal <pr...@gmail.com> on 2011/06/22 09:30:48 UTC

Lucene Searching

Hi all,

I am in a fix regarding Lucene search. I know a little bit about Lucene and
have successfully created an index and run a lot of queries against it.
My main worry is that whenever I search for, let's say, "000" it doesn't
give me any result, while if I search for "00000341" it gives me a hit.
Even if I search for 341 it doesn't give me anything.

I have also checked with Luke, and Luke shows no results either.

Do I have to use a different analyzer? Currently I am using
KeywordAnalyzer.

Thanks
Pranav
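A minimal sketch of what is going on, assuming the field is indexed with KeywordAnalyzer or Field.Index.NOT_ANALYZED, so the whole value becomes a single term, and assuming a term query only matches a term it equals exactly. This is a plain-Java model, not Lucene code: a Set<String> stands in for the field's terms. The split-on-punctuation tokenizer is a hypothetical stand-in for an analyzer that breaks the value into pieces, applied to the value 00000341-000-000-DR that comes up later in the thread.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Simplified model, not Lucene code: a Set<String> stands in for the terms
// indexed in one field, and a term query is an exact lookup in that set.
class KeywordVsTokenized {

    // KeywordAnalyzer / Field.Index.NOT_ANALYZED: the whole value is one term.
    static Set<String> keywordTerms(String value) {
        Set<String> terms = new LinkedHashSet<String>();
        terms.add(value);
        return terms;
    }

    // Hypothetical tokenizer: split on every run of characters that are not
    // ASCII letters or digits, and lowercase each piece.
    static Set<String> splitTerms(String value) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String piece : value.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A term query hits only if the query text equals one indexed term exactly.
    static boolean termQueryHits(Set<String> indexedTerms, String queryText) {
        return indexedTerms.contains(queryText);
    }

    public static void main(String[] args) {
        Set<String> keyword = keywordTerms("00000341");
        Set<String> split = splitTerms("00000341-000-000-DR");
        for (String q : Arrays.asList("000", "341", "00000341")) {
            System.out.println(q + ": keyword field hit=" + termQueryHits(keyword, q)
                    + ", split field hit=" + termQueryHits(split, q));
        }
    }
}
```

In this model, splitting makes 000 match, but 341 still misses because the indexed token is 00000341; matching inside a term needs wildcards, a regex, or n-grams.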

Re: Lucene Searching

Posted by digy digy <di...@gmail.com>.
Maybe you need
queryParser.setLowercaseExpandedTerms(false)

DIGY
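A minimal plain-Java model of why that flag matters here, not Lucene's own code. It assumes what the thread reports: the field is NOT_ANALYZED, so a term such as Item_1 keeps its capital I, and by default the Lucene 3.x QueryParser (MultiFieldQueryParser included) lowercases wildcard and prefix terms before expanding them. The wildcardToRegex helper and the sample terms (Item_10 in particular) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java model, not Lucene code: how lowercasing an expanded (wildcard)
// term interacts with case-sensitive NOT_ANALYZED terms.
class ExpandedTermCase {

    // Hypothetical helper: turn a Lucene-style wildcard (* and ?) into a
    // regular expression, quoting every other character literally.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Return the indexed terms the wildcard matches, optionally lowercasing
    // the wildcard first (the parser's default in this model).
    static List<String> expand(List<String> indexedTerms, String wildcard,
                               boolean lowercaseExpandedTerms) {
        String pattern = lowercaseExpandedTerms
                ? wildcard.toLowerCase(Locale.ROOT) : wildcard;
        Pattern regex = wildcardToRegex(pattern);
        List<String> matches = new ArrayList<String>();
        for (String term : indexedTerms) {
            if (regex.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Terms as a NOT_ANALYZED field would store them: case is preserved.
        List<String> terms = Arrays.asList("Item_1", "Item_10", "00000341-000-000-DR");
        for (String q : Arrays.asList("*Item_1*", "*341*")) {
            System.out.println(q + " lowercased=" + expand(terms, q, true)
                    + " as-is=" + expand(terms, q, false));
        }
    }
}
```

With lowercasing on, *Item_1* becomes *item_1* and finds nothing, while *341* contains no letters and is unaffected. Calling setLowercaseExpandedTerms(false) on the parser corresponds to the lowercaseExpandedTerms=false case in the model.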

On Thu, Jun 23, 2011 at 9:37 AM, Pranav goyal <pr...@gmail.com> wrote:

> I tried it and it worked, although there is one peculiarity.
>
> When I search for Item_1 it gives me 110 hits, but when I use *Item_1* it
> gives me 0 hits. What mistake am I making here?
>
> Also, when I search for *341* it gives me the correct result, i.e.
> 00000341-000-000-DR, but it's not working for the case above.
>
>
> Thanks
> Pranav

Re: Lucene Searching

Posted by Ian Lea <ia...@gmail.com>.
Looks OK to me.  You are searching on Item without adding any docs
with that field, and you could use writer.updateDocument() rather than
delete and add, but those are just quibbles and don't explain your
searching problem.
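A toy in-memory model of the two quibbles, in plain Java rather than Lucene: updateDocument(term, doc) has the same end result as deleteDocuments(term) followed by addDocument(doc) (Lucene additionally makes the pair atomic for readers, which this model ignores), and a field that no document ever received, such as Item in the posted indexing code, can never produce a hit. ToyIndex, its methods, and the sample values are hypothetical; every field is treated as one exact value, like NOT_ANALYZED.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene: each document maps a field name to one exact value.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Remove every document whose field holds exactly this value.
    void deleteDocuments(String field, String value) {
        for (int i = docs.size() - 1; i >= 0; i--) {
            if (value.equals(docs.get(i).get(field))) {
                docs.remove(i);
            }
        }
    }

    // Same end result as delete-then-add, done in one call.
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    // Count documents whose field equals the value; a field that was never
    // added to any document can never match.
    int count(String field, String value) {
        int n = 0;
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {
                n++;
            }
        }
        return n;
    }

    // Builds a document with the same three fields the posted code indexes.
    static Map<String, String> doc(String docId, String all, String cust) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("DocId", docId);
        d.put("All", all);
        d.put("Cust", cust);
        return d;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        String id = "00000341-000-000-DR";
        index.updateDocument("DocId", id, doc(id, "first name", "ACME"));
        index.updateDocument("DocId", id, doc(id, "renamed", "ACME"));
        System.out.println(index.count("DocId", id));       // 1
        System.out.println(index.count("Item", "Item_1")); // 0
    }
}
```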

Having done most of the hard work, why don't you adapt the code you
posted into a simple standalone program or test case that demonstrates
the problem.  As simple as possible, no external dependencies, clearly
showing what you are indexing and what you are searching on, with one
search that works and one that doesn't.

One warning: using MultiFieldQueryParser with leading wildcards is
pretty much guaranteed to be slow on a large index.
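A rough plain-Java model of why, not Lucene's real term enumeration: the terms of a field are kept sorted, so a prefix query such as 000003* can seek straight to the first candidate and stop at the first term past the prefix, while a leading wildcard such as *341 (or *341*) has no fixed prefix and must test every term in the field, and MultiFieldQueryParser repeats that for each field it searches. A TreeSet stands in for the term dictionary, and the 10,000 zero-padded IDs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified model, not Lucene code: a sorted set stands in for one field's
// term dictionary, and each method reports how many terms it had to read.
class WildcardCost {

    // Prefix query (e.g. 000003*): seek to the first term >= prefix and stop
    // at the first term that no longer starts with the prefix.
    static int prefixScan(SortedSet<String> terms, String prefix, List<String> matches) {
        int read = 0;
        for (String t : terms.tailSet(prefix)) {
            read++;
            if (!t.startsWith(prefix)) {
                break;
            }
            matches.add(t);
        }
        return read;
    }

    // Leading wildcard (e.g. *341): no prefix to seek to, so every term in
    // the field has to be read and tested.
    static int leadingWildcardScan(SortedSet<String> terms, String suffix, List<String> matches) {
        int read = 0;
        for (String t : terms) {
            read++;
            if (t.endsWith(suffix)) {
                matches.add(t);
            }
        }
        return read;
    }

    public static void main(String[] args) {
        // Made-up, zero-padded IDs: 00000000 .. 00009999
        SortedSet<String> terms = new TreeSet<String>();
        for (int i = 0; i < 10000; i++) {
            terms.add(String.format(Locale.ROOT, "%08d", i));
        }
        List<String> prefixHits = new ArrayList<String>();
        List<String> suffixHits = new ArrayList<String>();
        int prefixRead = prefixScan(terms, "000003", prefixHits);
        int suffixRead = leadingWildcardScan(terms, "341", suffixHits);
        System.out.println("000003*: " + prefixHits.size() + " hits, " + prefixRead + " terms read");
        System.out.println("*341:    " + suffixHits.size() + " hits, " + suffixRead + " terms read");
    }
}
```

With this data the prefix query reads 101 terms to find 100 hits, while the leading wildcard reads all 10,000 to find 10.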


--
Ian.


On Thu, Jun 23, 2011 at 10:08 AM, Pranav goyal
<pr...@gmail.com> wrote:
> Here's the code I am using (the indexing and searching parts are in
> different files).

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Searching

Posted by Pranav goyal <pr...@gmail.com>.
Here's the code I am using (the indexing and searching parts are in
different files).

Indexing part:

        d = new Document();
        File indexDir = new File("index-dir");
        KeywordAnalyzer analyzer = new KeywordAnalyzer();

        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31, analyzer);
        try {
            writer = new IndexWriter(FSDirectory.open(indexDir), conf);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        String q1 = contract.getDocId();
        String q2 = contract.getDocName();
        String q3 = contract.getCustomer(ctx).getMemberName();

        Term term = new Term("DocId", contract.getDocId());
        writer.deleteDocuments(term);

        d.add(new Field("DocId", q1, Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("All", q2, Field.Store.NO, Field.Index.NOT_ANALYZED));
        d.add(new Field("Cust", q3, Field.Store.NO, Field.Index.NOT_ANALYZED));

        try {
            writer.addDocument(d);
            writer.close();
            endTime = System.currentTimeMillis();
            // System.out.println("Time taken to index the contract with DocID "
            //         + q1 + " is -> " + (endTime - startTime));
        } catch (IOException e1) {
            e1.printStackTrace();
        }


Searching code:

            File indexDir = new File("index-dir");
            KeywordAnalyzer analyzer = new KeywordAnalyzer();
            IndexSearcher searcher = new IndexSearcher(FSDirectory.open(indexDir));

            String[] fields = new String[] { "DocId", "Item", "Cust", "All" };
            MultiFieldQueryParser parser =
                    new MultiFieldQueryParser(Version.LUCENE_31, fields, analyzer);
            parser.setAllowLeadingWildcard(true);

            String queryString = field.getValue().toString();
            Query query1 = parser.parse(queryString);
            TopDocs results = searcher.search(query1, 1000);

            System.out.println("total hits: " + results.totalHits);
            ScoreDoc[] hits = results.scoreDocs;
            ArrayList<String> docIds = new ArrayList<String>();
            for (ScoreDoc hit : hits) {
                Document doc = searcher.doc(hit.doc);
                System.out.println(doc.get("DocId"));
                docIds.add(doc.get("DocId"));
            }
            // Application-specific code; not relevant to the search problem
            IMnCriterion criterion =
                    contractQuery.createInCriterion(contractQuery.ATTR_P_DOC_ID, docIds);
            contractQuery.setCriterion(criterion);
            searcher.close();
        }

Re: Lucene Searching

Posted by Ian Lea <ia...@gmail.com>.
What exactly is "it"?  Show us what you are indexing, how, and how you
are building the query and we may be able to help.

Whenever I see a report of incorrect results on a Mixed Case field I
always suspect that the term is being lowercased on indexing and not
at searching, or vice versa.
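A small plain-Java illustration of that failure mode, assuming a term query matches only when the indexed form and the query form are identical strings. KEEP_CASE and LOWERCASE are hypothetical stand-ins for a KeywordAnalyzer/NOT_ANALYZED field and for an analyzer that lowercases; they are not Lucene classes.

```java
import java.util.Locale;

// Plain-Java model, not Lucene code: a term query hits only if the value as
// indexed and the value as queried end up as the same string.
class CaseMismatch {

    // Hypothetical stand-in for "what the analyzer does to the text".
    interface Normalizer {
        String apply(String s);
    }

    // Like KeywordAnalyzer / NOT_ANALYZED: the text is kept exactly as given.
    static final Normalizer KEEP_CASE = new Normalizer() {
        public String apply(String s) {
            return s;
        }
    };

    // Like an analyzer that lowercases every token.
    static final Normalizer LOWERCASE = new Normalizer() {
        public String apply(String s) {
            return s.toLowerCase(Locale.ROOT);
        }
    };

    static boolean hits(String stored, Normalizer atIndex, String typed, Normalizer atQuery) {
        return atIndex.apply(stored).equals(atQuery.apply(typed));
    }

    public static void main(String[] args) {
        String value = "Item_1";
        System.out.println(hits(value, KEEP_CASE, value, KEEP_CASE)); // true
        System.out.println(hits(value, KEEP_CASE, value, LOWERCASE)); // false
        System.out.println(hits(value, LOWERCASE, value, KEEP_CASE)); // false
        System.out.println(hits(value, LOWERCASE, value, LOWERCASE)); // true
    }
}
```

In a real index the quickest comparison is to look at the indexed term in Luke and at what query.toString() prints for the parsed query.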

--
Ian.


On Thu, Jun 23, 2011 at 7:37 AM, Pranav goyal
<pr...@gmail.com> wrote:
> I tried it and it worked, although there is one peculiarity.
>
> When I search for Item_1 it gives me 110 hits, but when I use *Item_1* it
> gives me 0 hits. What mistake am I making here?
>
> Also, when I search for *341* it gives me the correct result, i.e.
> 00000341-000-000-DR, but it's not working for the case above.
>
>
> Thanks
> Pranav


Re: Lucene Searching

Posted by Pranav goyal <pr...@gmail.com>.
I tried it and it worked, although there is one peculiarity.

When I search for Item_1 it gives me 110 hits, but when I use *Item_1* it
gives me 0 hits. What mistake am I making here?

Also, when I search for *341* it gives me the correct result, i.e.
00000341-000-000-DR, but it's not working for the case above.


Thanks
Pranav

On Wed, Jun 22, 2011 at 2:10 PM, Ian Lea <ia...@gmail.com> wrote:

> What does Luke show as being indexed for that field?  Other useful
> tips at
> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>
> If that field is numeric you could use a NumericField, which gets rid of
> problems with leading zeros.
>
> If by "I just want to get everything which has 341 in it" you mean you
> want to match aaa341bbb and 0000341 and 341, see the related thread on
> this list from yesterday, or org.apache.lucene.search.regex.RegexQuery.


-- 
I'm very responsible, whenever something goes wrong they always say I'm
responsible --

Re: Lucene Searching

Posted by Ian Lea <ia...@gmail.com>.
What does Luke show as being indexed for that field?  Other useful
tips at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F

If that field is numeric you could use a NumericField, which gets rid of
problems with leading zeros.
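A plain-Java sketch of why a numeric field helps with leading zeros: parsed as numbers, 00000341 and 341 are the same value, whereas as strings they are different terms. In Lucene 3.x that would mean indexing with NumericField and querying with NumericRangeQuery; this code only models the comparison and assumes the whole value is a plain integer, which 00000341-000-000-DR is not.

```java
// Plain-Java model, not Lucene code: compare an indexed value with a query
// value either as text (like a NOT_ANALYZED string field) or as a number
// (the idea behind NumericField). Assumes both values are plain integers.
class LeadingZeros {

    // String terms: "00000341" and "341" are different terms.
    static boolean equalAsText(String indexed, String query) {
        return indexed.equals(query);
    }

    // Numeric values: leading zeros disappear when the text is parsed.
    static boolean equalAsNumber(String indexed, String query) {
        return Long.parseLong(indexed) == Long.parseLong(query);
    }

    public static void main(String[] args) {
        System.out.println(equalAsText("00000341", "341"));   // false
        System.out.println(equalAsNumber("00000341", "341")); // true
        System.out.println(equalAsNumber("00000341", "000")); // false: 341 != 0
    }
}
```

Numeric matching does not give substring matching: 000 parses to 0 and still does not match 341.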

If by "I just want to get everything which has 341 in it" you mean you
want to match aaa341bbb and 0000341 and 341, see the related thread on
this list from yesterday, or org.apache.lucene.search.regex.RegexQuery.
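A plain-Java model of what a regex query does conceptually: every term of the field is tested against the pattern, so .*341.* picks up aaa341bbb, 0000341 and 341 alike. It uses java.util.regex over a made-up list of terms and requires the pattern to match the whole term; it is not the contrib RegexQuery itself, whose regex flavour and options are worth checking for your version. Like a leading wildcard, this has to visit every term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java model, not Lucene's RegexQuery: test every term of a field
// against a regular expression that must match the whole term.
class RegexOverTerms {

    static List<String> matchingTerms(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> out = new ArrayList<String>();
        for (String term : terms) { // every term is visited, like a leading wildcard
            if (pattern.matcher(term).matches()) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up terms: the first three contain 341, the last two do not.
        List<String> terms = Arrays.asList("aaa341bbb", "0000341", "341", "3401", "00000342");
        System.out.println(matchingTerms(terms, ".*341.*")); // prints [aaa341bbb, 0000341, 341]
    }
}
```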



--
Ian.


On Wed, Jun 22, 2011 at 9:16 AM, Pranav goyal
<pr...@gmail.com> wrote:
> I can always use the * and ? wildcards.
>
> But that's not what I'm asking about here. I just want to get everything
> which has 341 in it. How do I do that without * or ?


Re: Lucene Searching

Posted by Pranav goyal <pr...@gmail.com>.
I can always use the * and ? wildcards.

But that's not what I'm asking about here. I just want to get everything
which has 341 in it. How do I do that without * or ?
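One common way to get substring hits without typing * or ? is to index n-grams: every substring of a fixed length is stored as its own term, so a plain term query for 341 finds 00000341-000-000-DR. This is a plain-Java model with a gram length of 3, chosen only for the example; in Lucene the equivalent would be an n-gram tokenizer or filter from the contrib analyzers (check which classes your version provides). It makes the index considerably larger.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java model of n-gram indexing, not Lucene code: every substring of
// length n becomes its own term, so substring search becomes a term lookup.
class NGrams {

    // Distinct substrings of length n, in order of first appearance.
    static Set<String> grams(String value, int n) {
        Set<String> out = new LinkedHashSet<String>();
        for (int i = 0; i + n <= value.length(); i++) {
            out.add(value.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> terms = grams("00000341-000-000-DR", 3);
        System.out.println(terms.contains("341")); // true
        System.out.println(terms.contains("000")); // true
        System.out.println(terms.contains("342")); // false
    }
}
```

A query whose length differs from the gram length needs extra handling (a range of gram sizes, or splitting the query into grams), which this sketch leaves out.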

On Wed, Jun 22, 2011 at 1:00 PM, Pranav goyal <pr...@gmail.com> wrote:

> Hi all,
>
> I am in a fix regarding Lucene search. I know a little bit about Lucene and
> have successfully created an index and run a lot of queries against it.
> My main worry is that whenever I search for, let's say, "000" it doesn't
> give me any result, while if I search for "00000341" it gives me a hit.
> Even if I search for 341 it doesn't give me anything.
>
> I have also checked with Luke, and Luke shows no results either.
>
> Do I have to use a different analyzer? Currently I am using
> KeywordAnalyzer.
>
> Thanks
> Pranav
>


