Posted to java-user@lucene.apache.org by nischal reddy <ni...@gmail.com> on 2013/10/14 10:12:37 UTC

wildcard search not working on file paths

Hi,

I have a problem with wildcard searches on file path fields.

I have a field "filePath" where I store the complete path of each file.

I used StringField for this field (I assume that a StringField is not
tokenized by default).

doc.add(new StringField(FIELD_FILE_PATH, resourcePath, Store.YES));

I am using StandardAnalyzer for the IndexWriter, but since the field is a
StringField, its value is not analyzed.

After indexing the files I checked the index with Luke, and the paths look
fine. When I run wildcard searches in Luke, I get the desired results.

But when I run the same search in my code with IndexSearcher, I get zero
docs.

My search code looks something like this:

indexSearcher.search(new WildcardQuery(new Term("filePath", "*SuperClass.cls")), 100);

This returns zero documents.

But when I use just "*" in the query, it returns all the documents:

indexSearcher.search(new WildcardQuery(new Term("filePath", "*")), 100);

Only queries such as prefix wildcards are not working.

What could be going wrong?

Thanks,
Nischal Y

Re: wildcard search not working on file paths

Posted by Ian Lea <ia...@gmail.com>.
You seem to be indexing paths delimited by backslashes and then saying
that a search for Samples/* doesn't match anything. No surprises there, if
I've read your code correctly. Since you are creating wildcard queries
directly from Terms, I don't think that Lucene escaping is relevant here,
but the presence of all the backslashes in paths and Java code doesn't
help. I'd convert them all to standard Unix /a/b/c format, for searching
anyway: you can always store the original if you want to use that in
results.
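
A minimal sketch of the conversion suggested above, in plain Java with no
Lucene dependency. The class and method names (PathNormalizer,
toSearchForm) are hypothetical. The assumption is that the normalized value
goes into the searchable field while the original path is kept in a
separate stored field.

```java
public class PathNormalizer {

    // Hypothetical helper: convert Windows separators to '/' so that the
    // indexed term contains no backslashes.
    public static String toSearchForm(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // In Java source, "\\" is a single backslash character.
        String original = "\\Samples\\Batching\\runner.p";
        String searchable = toSearchForm(original);
        System.out.println(searchable); // prints /Samples/Batching/runner.p
    }
}
```

With the searchable field indexed in this form, a pattern such as
"*Samples/*" uses the same separator as the indexed terms.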

One further small tip: your sample program is good, with no external
dependencies, but would be even better if you used RAMDirectory.  That
way I could run it on my non-Windows system if I wanted to, with the
addition of some imports.


--
Ian.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: wildcard search not working on file paths

Posted by nischal reddy <ni...@gmail.com>.
Hi Ian,

Please find below a sample program that better illustrates the scenario:


import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestWriter {
    public static void main(String[] args) throws IOException {
        createIndex();
        searchIndex();
    }

    public static void createIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("C:\\temp"));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));

        IndexWriter iWriter = new IndexWriter(directory, config);

        // StringField values are indexed as a single, unanalyzed term.
        Document document1 = new Document();
        document1.add(new StringField("FILE_PATH",
                "\\Samples\\Batching\\runner.p", Store.YES));
        document1.add(new StringField("contents", "runnerfile", Store.YES));
        iWriter.addDocument(document1);

        Document document2 = new Document();
        document2.add(new StringField("FILE_PATH",
                "\\Samples\\Business\\stopper.p", Store.YES));
        document2.add(new StringField("contents", "stopperfile", Store.YES));
        iWriter.addDocument(document2);

        iWriter.commit();
        iWriter.close();
    }

    public static void searchIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("C:\\temp"));
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        // Wildcard query to get all file paths.
        // This query works fine and returns all the docs in the index.
        Query query1 = new WildcardQuery(new Term("FILE_PATH", "*"));
        TopDocs topDocs = indexSearcher.search(query1, 100);
        System.out.println("total no of docs " + topDocs.totalHits);

        // Wildcard query to search for paths starting with /Samples.
        // This query doesn't work and returns zero docs.
        // It doesn't work with "*Samples//*" either,
        // but it works with "*Samples*".
        Query query2 = new WildcardQuery(new Term("FILE_PATH", "*Samples/*"));
        TopDocs topDocs2 = indexSearcher.search(query2, 100);
        System.out.println("total no of docs " + topDocs2.totalHits);

        // Wildcard query to search for paths ending with runner.p.
        // This query works and returns one doc.
        Query query3 = new WildcardQuery(new Term("FILE_PATH", "*runner.p"));
        TopDocs topDocs3 = indexSearcher.search(query3, 100);
        System.out.println("total no of docs " + topDocs3.totalHits);

        // Queries on the "contents" field.

        // Wildcard query to search for contents starting with runner.
        // This query works and returns one doc.
        Query query4 = new WildcardQuery(new Term("contents", "runner*"));
        TopDocs topDocs4 = indexSearcher.search(query4, 100);
        System.out.println("total no of docs " + topDocs4.totalHits);

        // Wildcard query to search for contents ending with file.
        // This query works and returns two docs.
        Query query5 = new WildcardQuery(new Term("contents", "*file"));
        TopDocs topDocs5 = indexSearcher.search(query5, 100);
        System.out.println("total no of docs " + topDocs5.totalHits);

        indexReader.close();
    }
}


I observed that the file path separator I am using in the field and the
Lucene escape character seem to be the same. So whenever I use an escape
character in the query the search fails; if I don't use the escape
sequence, it returns the results properly.

Even though I am escaping "\" by writing "\\", the query still fails.

One way to solve this problem is to replace every "\" with "/" while
indexing, and then to use "/" as the file path separator while searching.

But I would prefer not to meddle with the file path. Is there any
alternative that solves this problem without modifying the file path?

TIA,
Nischal Y
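
One possible alternative that leaves the indexed path untouched, sketched
under the assumption that WildcardQuery treats "\" as its own escape
character (WildcardQuery.WILDCARD_ESCAPE) in the Lucene version in use. A
literal backslash then has to appear twice in the pattern string, which
means four backslashes in Java source. The class and helper names
(WildcardPatterns, escapeBackslashes) are hypothetical, and the example is
plain Java, so it only shows how the pattern string would be built.

```java
public class WildcardPatterns {

    // Hypothetical helper: double every backslash so that each one is read
    // as a literal character rather than as an escape for the next one.
    public static String escapeBackslashes(String text) {
        return text.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // The pattern as a user would think of it: *\Samples\*
        String raw = "*\\Samples\\*";
        String pattern = escapeBackslashes(raw);
        System.out.println(pattern); // prints *\\Samples\\*
        // Assumed usage: new WildcardQuery(new Term("FILE_PATH", pattern))
    }
}
```

If that assumption holds, it would also explain why "*Samples\\*" in Java
source found nothing: the pattern reaches Lucene as *Samples\*, where the
backslash escapes the trailing "*" and turns it into a literal asterisk.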




Re: wildcard search not working on file paths

Posted by Ian Lea <ia...@gmail.com>.
Seems to me that it should work.  I suggest you show us a complete
self-contained example program that demonstrates the problem.


--
Ian.




Re: wildcard search not working on file paths

Posted by nischal reddy <ni...@gmail.com>.
Hi Ian,

Actually, I am able to do wildcard searches on all the fields except the
"filePath" field. Both leading and trailing wildcard searches work on all
the other fields, but a wildcard search on the filePath field somehow does
not work. An example file path looks something like this: "\Samples\F1.cls".
I think it is failing because of the "\" present in the field. When I do a
wildcard search with the query "filePath : *", it does return all the docs
in the index. But any other wildcard search (leading or trailing) does not
work. Any clues why it works on other fields but not on the "filePath"
field?

TIA,
Nischal Y



Re: wildcard search not working on file paths

Posted by Ian Lea <ia...@gmail.com>.
Do some googling on leading wildcards and read things like
http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
an option you like.


--
Ian.

