Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2004/05/27 07:00:27 UTC

Range Query Sombody HELP please

Hi
Lucene developers

Is it possible to search the indexed documents and retrieve only the hits
that fall within a specific range, similar to this SQL query?

select  *  from BOOKSHELF where  book1  between 100 and 200

For example:

       "search_word"  ,   Book between  100   AND   200

[ Note: Book is a unique field that is already indexed. ]


Somebody please help me   :(


with regards
Karthik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Bigram Co-occurrences will be the better way for Word Discrimination. Re: Will CJKAnalyser be release with Lucene 1.4?

Posted by Che Dong <ch...@hotmail.com>.

> I would be against such a move.  I think Lucene's core has too many 
> analyzers in it already, such as the German and Russian ones.  The core 
> could do without any of the concrete analyzers altogether, in my 
> opinion - but it is handy to have a few general purpose convenience 
> ones.
+1
> 
> What benefit, besides convenience, would there be in bringing 
> CJKAnalyzer into the core?  What about all the others in the sandbox?  
> If we bring one in, why not all of them?
But CJK text has no spaces to mark word boundaries, so bigram co-occurrences are a better basis for word discrimination.
For example: if the term C1C2 is segmented into C1 and C2, the results will also contain C2C1, but in Chinese the words C1C2 and C2C1 may have different meanings.
Compared to the single-character (unigram) tokenization implemented in StandardTokenizer, bigram-based tokens return MUCH better results.

According to the feedback I have received on CJKTokenizer:
for CJK users, the bigram-based CJKTokenizer is strongly recommended for better results.
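
A minimal sketch of the C1C2 / C2C1 point, in plain Java rather than Lucene's CJKTokenizer. The class and method names are hypothetical, the example words (Shanghai and "at sea", the same two characters in opposite order) are chosen only for illustration, and matching is modelled as "every query token occurs in the document, order ignored", which is an assumption standing in for a simple boolean AND query.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of unigram and overlapping-bigram tokens (not Lucene's CJKTokenizer). */
final class BigramSketch {

    /** One token per char; surrogate pairs are ignored to keep the sketch short. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    /** Overlapping two-char tokens: "ABC" gives "AB", "BC". */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    /** True when every query token occurs somewhere in the document tokens. */
    static boolean matchesAll(List<String> docTokens, List<String> queryTokens) {
        return docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String doc = "\u4e0a\u6d77";   // C1C2: Shanghai
        String query = "\u6d77\u4e0a"; // C2C1: "at sea", a different word

        // Unigram tokens: the reversed word still matches the document.
        System.out.println(matchesAll(unigrams(doc), unigrams(query))); // true
        // Bigram tokens: the reversed word no longer matches.
        System.out.println(matchesAll(bigrams(doc), bigrams(query)));   // false
    }
}
```

With single-character tokens the query for C2C1 finds the document containing C1C2; with bigram tokens it does not, which is the false match the message describes.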

for more:
Word Discrimination Based on Bigram Co-occurrences
... There is a match routine that detects any common segment between the target word and each of the ... The entries
of the matrix indicate whether a reference word and a lexicon word share at least one n-gram ... It also
shows the bigram match list for an unknown word generated by the feature-matching process ... 
www.ecse.rpi.edu/homepages/nagy/PDF_files/ElNasan-Nagy-ICDAR01.pdf 

Segmenting Chinese in Unicode
... However, to date no in-depth analysis has been performed analyzing the deficiencies in segmentation
that lead to the improved performance of the simpler bigram methods. ... The part-of-speech of the segment
and the ... A study on integrating Chinese word segmentation and part-of-speech tagging. ... 
www.basistech.com/papers/chinese/iuc-16-paper.pdf 

> 
> It has been brought up to bring in the SnowballAnalyzer - as it 
> actually is general purpose and spans many languages.  I'm not really 
> for bringing that one in either.
> 
> I'm but one voice and would not veto bringing in other analyzers, I 
> just don't think there is much benefit, especially if we improve the 
> release process to incorporate the sandbox goodies into a single 
> distribution but as separate JARs.
> 
> Erik
Thank you, Erik. I hope we can have more discussion on this issue with other East Asian language users.

Che Dong


Re: Will CJKAnalyser be release with Lucene 1.4?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 29, 2004, at 11:37 PM, Che Dong wrote:
> Is it possible to move CJKAnalyzer out of the sandbox into the 
> jakarta-lucene package?

I would be against such a move.  I think Lucene's core has too many 
analyzers in it already, such as the German and Russian ones.  The core 
could do without any of the concrete analyzers altogether, in my 
opinion - but it is handy to have a few general purpose convenience 
ones.

What benefit, besides convenience, would there be in bringing 
CJKAnalyzer into the core?  What about all the others in the sandbox?  
If we bring one in, why not all of them?

It has been brought up to bring in the SnowballAnalyzer - as it 
actually is general purpose and spans many languages.  I'm not really 
for bringing that one in either.

I'm but one voice and would not veto bringing in other analyzers, I 
just don't think there is much benefit, especially if we improve the 
release process to incorporate the sandbox goodies into a single 
distribution but as separate JARs.

	Erik

Re: Will CJKAnalyser be release with Lucene 1.4?

Posted by Che Dong <ch...@hotmail.com>.

Hi Erik:
Is it possible to move CJKAnalyzer out of the sandbox into the jakarta-lucene package?

Regards

Che Dong
----- Original Message ----- 
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Sunday, May 30, 2004 5:17 AM
Subject: Re: Will CJKAnalyser be release with Lucene 1.4?


> I'm not sure I understand your question.
> 
> At this point there is no plan to "release" the sandbox components -  
> they are really there in a batteries-not-included fashion at this point  
> but it's all there to freely use if you like.
> 
> I did centralize the build system in contributions area, so each piece  
> should easily build into JAR files.
> 
> Is there more that you desire?
> 
> Erik
> 
> 
> On May 29, 2004, at 3:24 PM, Che Dong wrote:
> 
> > Hi All:
> > I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
> > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/ 
> > contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/
> >
> > The original version works fine at http://search.163.com  
> > http://search.soufun.com and www.blogchina.com/weblucene/
> >
> > Regards
> >
> > Che Dong
> > http://www.chedong.com/tech/lucene.html

Re: question on design for ordering of field names written to FieldInfos object

Posted by Peter M Cipollone <lu...@bihvhar.com>.

sorry.  I meant to send this to the dev list...

----- Original Message ----- 
From: "Peter M Cipollone" <lu...@bihvhar.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, May 29, 2004 6:42 PM
Subject: question on design for ordering of field names written to
FieldInfos object


> Hi,
>
> I have a question about the following code from
> org.apache.lucene.index.SegmentMerger.  I would like to know if the
> ordering of the fields as they are stored to the FieldInfos object is
> critical to some other purpose.


question on design for ordering of field names written to FieldInfos object

Posted by Peter M Cipollone <lu...@bihvhar.com>.

Hi,

I have a question about the following code from
org.apache.lucene.index.SegmentMerger.  I would like to know if the ordering
of the fields as they are stored to the FieldInfos object is critical to
some other purpose.

In the code below (from a CVS pull about a week ago), the fields are stored in
the following order:
1. fields indexed=true, termVectors=true
2. fields indexed=true, termVectors=false
3. fields stored=true, indexed=false

The reason I ask is because I am working on some functionality that will
require the order of fields to be immutable across merges.  At present
FieldInfos are created in two ways, one from a Document as it is merged into
an index, and again when index segments are merged.  They use two different
ordering mechanisms.

Thanks for your help.
Pete

private final int mergeFields() throws IOException {
    fieldInfos = new FieldInfos();    // merge field names
    int docCount = 0;
    for (int i = 0; i < readers.size(); i++) {
      IndexReader reader = (IndexReader) readers.elementAt(i);
      fieldInfos.addIndexed(reader.getIndexedFieldNames(true), true);    // 1. indexed, with term vectors
      fieldInfos.addIndexed(reader.getIndexedFieldNames(false), false);  // 2. indexed, without term vectors
      fieldInfos.add(reader.getFieldNames(false), false);                // 3. stored, not indexed
    }
    fieldInfos.write(directory, segment + ".fnm");
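
A simplified model of why the two code paths can number fields differently, written in plain Java. It is not Lucene's FieldInfos: the FieldFlags type and both method names are hypothetical, and the only behaviour assumed is that a field keeps the position at which its name was first added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of field numbering by first insertion (not Lucene's FieldInfos). */
final class FieldOrderSketch {

    /** Hypothetical per-field flags, standing in for what a reader would report. */
    static final class FieldFlags {
        final String name;
        final boolean indexed;
        final boolean termVectors;

        FieldFlags(String name, boolean indexed, boolean termVectors) {
            this.name = name;
            this.indexed = indexed;
            this.termVectors = termVectors;
        }
    }

    /** Order when fields are numbered as they appear in a document. */
    static List<String> documentOrder(List<FieldFlags> fields) {
        Set<String> order = new LinkedHashSet<>();
        for (FieldFlags f : fields) {
            order.add(f.name);
        }
        return new ArrayList<>(order);
    }

    /** Order when fields are added in the three passes listed in the message. */
    static List<String> mergeOrder(List<FieldFlags> fields) {
        Set<String> order = new LinkedHashSet<>();
        for (FieldFlags f : fields) {                    // pass 1
            if (f.indexed && f.termVectors) order.add(f.name);
        }
        for (FieldFlags f : fields) {                    // pass 2
            if (f.indexed && !f.termVectors) order.add(f.name);
        }
        for (FieldFlags f : fields) {                    // pass 3
            if (!f.indexed) order.add(f.name);
        }
        return new ArrayList<>(order);
    }

    public static void main(String[] args) {
        List<FieldFlags> fields = Arrays.asList(
                new FieldFlags("path", false, false),     // stored only
                new FieldFlags("contents", true, true),   // indexed, term vectors
                new FieldFlags("title", true, false));    // indexed, no term vectors
        System.out.println(documentOrder(fields)); // [path, contents, title]
        System.out.println(mergeOrder(fields));    // [contents, title, path]
    }
}
```

Under that assumption the same three fields end up in a different order after a merge than in the document that introduced them, which is the mismatch the question is about.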


Re: Will CJKAnalyser be release with Lucene 1.4?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I'm not sure I understand your question.

At this point there is no plan to "release" the sandbox components -  
they are really there in a batteries-not-included fashion at this point  
but it's all there to freely use if you like.

I did centralize the build system in contributions area, so each piece  
should easily build into JAR files.

Is there more that you desire?

	Erik

On May 29, 2004, at 3:24 PM, Che Dong wrote:

> Hi All:
> I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/ 
> contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/
>
> The original version works fine at http://search.163.com  
> http://search.soufun.com and www.blogchina.com/weblucene/
>
> Regards
>
> Che Dong
> http://www.chedong.com/tech/lucene.html

Will CJKAnalyser be release with Lucene 1.4?

Posted by Che Dong <ch...@hotmail.com>.

Hi All:
I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/

The original version works fine at http://search.163.com http://search.soufun.com and www.blogchina.com/weblucene/

Regards

Che Dong
http://www.chedong.com/tech/lucene.html

Re: Range Query Sombody HELP please

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Try my AnalysisDemo code on some filename field samples:

	http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

You mentioned earlier, I think, that you are using a custom analyzer.  
Give us the output of AnalysisDemo on some samples so we can see what 
is coming out.

If you can put together a 10-line Java program that uses RAMDirectory 
and has some sample hard-coded text that I can easily run standalone I 
would look into your situation further.  As it is, you are providing 
far more complexity than I have time to delve into.  Narrow it down to 
a very very simple example that we can all see in one screen.

	Erik


On May 31, 2004, at 7:47 AM, Karthik N S wrote:

> Hey Ype...
>
> 1) I switched off the multi-search scenario.
>
> 2) Changing the field type from Text to Keyword
>     makes searches on the "filename" field fail,
>     so I have kept it as Text.
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name     : B10181_P388
>
>
> 3) A range search between the 2 file names  B10181_P702  and  
> B01081_P355
>     still returns 0 hits  [with the space included before the 2nd '+']
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button +filename:[b10181_p702 TO b10181_p355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
>
> 				or
>
> D:\JAVA\lucene\src\demo>java com.controlnet.indexing.search.SearchFiles
> Search Keyword : +contents:button +filename:[b10181_p702 TO 
> b10181_p355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+contents:button +filename:[b10181_p702 TO b10181_p355]'] in 
> Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
>
>
> Also, does the search vary with the field type? If so, my indexed 
> field types are as below:
>
> doc.add(Field.Text("path", fhtml.getPath()));
> doc.add(Field.Keyword("modified",fhtml.lastModified()+""));
> doc.add(Field.Text("filename",fhtml.getName()));
> doc.add(Field.Keyword("creation",CREATION_));
> doc.add(Field.Keyword("bookid",BOOKID_));
> doc.add(Field.Text("chapNme",CHAPNAME_));
> doc.add(Field.Text("itmName",ITEMNAME_));
>
>
>
> please do advise me.
> Karthik
>
>
>
> [ James Goslink says   Microsoft has More Money to burn then GOD has
>   ...on his visit to India,In an interview to MSNBC TV Last night ]
>
>
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Monday, May 31, 2004 2:52 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
>
> On Monday 31 May 2004 11:09, Karthik N S wrote:
>
> ...
>> I re indexed my folder 10181 [Seem's to be corrupted]
>
> Was the index writer closed?
>
>> Now I am getting the hits as....
>>
>>
>> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
>> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
>
> The query needs to have space before the 2nd + :
>
> +button  +filename:[B10181_P702 TO B01081_P355]
>
>> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
>> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
>> e:/indexer3/b10181/b10181_indx_
>> Not a Found document(s) that matched query Field 'filename':
>> Not a Found document(s) that matched query Field 'bookid':
>> Not a Found document(s) that matched query Field 'creation':
>> Not a Found document(s) that matched query Field 'contents':
>> Not a Found document(s) that matched query Field 'chapNme':
>> Not a Found document(s) that matched query Field 'itmName':
>
> You seem to use a search mechanism that searches all these fields.
> I'd recommend to switch this off until a query with explicit fields 
> works,
> eg.:
>
> +contents:button  +filename:[B10181_P702 TO B01081_P355]
>
> Btw. You'll need to make sure that a term like B10181_P702 is
> not split at the underscore _ by a tokenizer at indexing time.
> If your filename is not a keyword field, you might consider
> changing it into a keyword field.
>
> You seem to index book pages as Lucene documents, which is ok.
> However, you may also need to index larger parts of the books in
> order to retrieve books with multiple subjects on different pages.
> Is this what your original question is about?
>
> Have fun,
> Ype

Re: Range Query Sombody HELP please

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jun 1, 2004, at 8:10 AM, Karthik N S wrote:

> Hey Ype/Erick
>
>   Apologies, please.
>
>    I sent you some code by mail.
>    Did you receive it, or shall I resend it?

I did not receive it.  Please just copy/paste it into an e-mail to the 
list.

	Erik


Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype/Erick

  Apologies, please.

   I sent you some code by mail.
   Did you receive it, or shall I resend it?

with regards
Karthik

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 8:41 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


Karthik,

On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched off the multi-search scenario.
>
> 2) Changing the field type from Text to Keyword
>     makes searches on the "filename" field fail,
>     so I have kept it as Text.

Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.

> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name     : B10181_P388
>
>
> 3) A range search between the 2 file names  B10181_P702  and  B01081_P355
>     still returns 0 hits  [with the space included before the 2nd '+']
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]

Could you try this:

+button +filename:[b10181_p355 TO b10181_p702]

?
If this does not work, please narrow your problem down to a java test
program of 10-20 lines, and post the code.

Regards,
Ype


Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Thursday 03 June 2004 07:10, Karthik N S wrote:
> Hey Ype,
>
>    The range query
>
>    +button +shirt +filename:[b10181_p100 TO b10181_p200]
>
>   did not work for me, but the alternative
>
>   +(button OR shirt) +filename:[b10181_p100 TO b10181_p200]
>
>   gave me 2 hits, each page containing one of the terms "button" / "shirt",
> but not both.
>
>  From the HTML files I found that both words are present in more than 2
> files.
>
>  Is there any other way to get hits for both words?

Your index contains book pages as Lucene documents.
In this case you need to index larger parts of the books
as Lucene documents in order to retrieve books with multiple
subjects on different pages.
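
A minimal sketch of that point in plain Java, without Lucene. The class and method names are hypothetical, the page contents are made up for illustration, and the book id is assumed to be the part of the file name before the underscore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: requiring two terms per page versus per book (no Lucene involved). */
final class PageVersusBookSketch {

    /** Returns the keys of the documents whose term set contains every required term. */
    static List<String> containingAll(Map<String, Set<String>> docs, String... required) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            if (doc.getValue().containsAll(Arrays.asList(required))) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    /** Merges page-level term sets into one term set per book id (assumed: text before '_'). */
    static Map<String, Set<String>> groupByBook(Map<String, Set<String>> pages) {
        Map<String, Set<String>> books = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> page : pages.entrySet()) {
            String name = page.getKey();
            int cut = name.indexOf('_');
            String bookId = cut < 0 ? name : name.substring(0, cut);
            books.computeIfAbsent(bookId, id -> new TreeSet<>()).addAll(page.getValue());
        }
        return books;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> pages = new LinkedHashMap<>();
        pages.put("b10181_p100", new HashSet<>(Arrays.asList("button")));
        pages.put("b10181_p150", new HashSet<>(Arrays.asList("shirt")));

        // Page-level documents: no single page holds both terms.
        System.out.println(containingAll(pages, "button", "shirt"));              // []
        // Book-level documents: the merged book holds both.
        System.out.println(containingAll(groupByBook(pages), "button", "shirt")); // [b10181]
    }
}
```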


Kind regards,
Ype


Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype,

   The range query

   +button +shirt +filename:[b10181_p100 TO b10181_p200]

  did not work for me, but the alternative

  +(button OR shirt) +filename:[b10181_p100 TO b10181_p200]

  gave me 2 hits, each page containing one of the terms "button" / "shirt",
but not both.

 From the HTML files I found that both words are present in more than 2
files.

 Is there any other way to get hits for both words?


with regards
Karthik


-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, June 03, 2004 12:26 AM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
> On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
> > I still have 3 small questions.
> >
> > 1) When creating the range query, is it possible for Lucene to do
> > something like this?
> >
> >      +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
> >
> >      [Do you think this will work?]  It returns no hits, but I do get
> > hits when I search for either "shirt" or "button" alone.
>
> My guess is that none of your documents in that range
> have button AND shirt in them.

You can also try this:

+button +shirt +filename:[b10181_p100 TO b10181_p200]

I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.

Regards,
Ype


Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
> On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
> > I still have 3 small questions.
> >
> > 1) When creating the range query, is it possible for Lucene to do
> > something like this?
> >
> >      +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
> >
> >      [Do you think this will work?]  It returns no hits, but I do get
> > hits when I search for either "shirt" or "button" alone.
>
> My guess is that none of your documents in that range
> have button AND shirt in them.

You can also try this:

+button +shirt +filename:[b10181_p100 TO b10181_p200]

I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.

Regards,
Ype


Re: Range Query Sombody HELP please

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
>
> Hey Ype/Erick

If you're gonna ask for help, the least ya could do is spell my name 
correctly :)

> I still have 3 small questions.
>
> 1) When creating the range query, is it possible for Lucene to do 
> something like this?
>
>      +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
>
>      [Do you think this will work?]  It returns no hits, but I do get 
> hits when I search for either "shirt" or "button" alone.

My guess is that none of your documents in that range 
have button AND shirt in them.

> 2) When the indexer indexes, does it work in alphabetical order,
> or in some other way?

I don't understand the question, sorry.  Terms in the index are ordered 
lexicographically, if that is what you mean.
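
One consequence of lexicographic ordering, sketched in plain Java: values are compared as strings, so numbers of different lengths do not sort numerically unless they are padded to a fixed width. The pad helper and the width of 4 are hypothetical choices for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy illustration of string ordering for numeric values. */
final class LexicographicOrderSketch {

    /** Hypothetical helper: zero-pads a non-negative number to a fixed width. */
    static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Returns the terms sorted by String.compareTo, i.e. lexicographically. */
    static List<String> sorted(String... terms) {
        List<String> copy = new ArrayList<>(Arrays.asList(terms));
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded: "1000" sorts between "100" and "200", and "99" sorts last.
        System.out.println(sorted("100", "200", "1000", "99"));
        // [100, 1000, 200, 99]

        // Padded to four digits: string order now agrees with numeric order.
        System.out.println(sorted(pad(100, 4), pad(200, 4), pad(1000, 4), pad(99, 4)));
        // [0099, 0100, 0200, 1000]
    }
}
```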

> 3) The field type "Keyword" does not work for file names as it 
> indexes them.
>    [ Try indexing filenames and then searching on them; the search will
> definitely return 0 hits, lucene1.3-final version ]
>
>      doc.add(Field.Text("filename",file.getName()))
>           < --------------------  Will return Hits
>
>     doc.add(Field.Keyword("filename",file.getName()))
> <--------------------  Will Not return Hits
>
>
>  Why?

Because of your analyzer.  Try indexing as a Keyword and search using a 
TermQuery.  Don't use QueryParser at first - it gets in the way of 
understanding what is really going on.  For fun, look at the .toString 
of the Query generated by QueryParser if you like.  Look at the 
AnalysisParalysis page on the wiki for more details.  Read my java.net 
articles to get a better understanding.   The short answer is that it 
is analysis that is bogging you down here.

You need to decide how to index file names based on how you plan on 
querying for them.  We cannot answer this for you.
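
A toy model of the analysis mismatch, in plain Java with no Lucene classes. Everything here is an assumption made for illustration: the stand-in analyzer only lowercases and splits on whitespace, a Keyword-style field is modelled as one exact term, and a parsed query is modelled as running the user input through the same analyzer before lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of analyzed versus exact terms (not Lucene's analyzers or QueryParser). */
final class KeywordVersusTextSketch {

    /** Assumed stand-in analyzer: lowercases and splits on whitespace. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Text-style field: the value goes through the analyzer before indexing. */
    static Set<String> indexAsText(String value) {
        return new HashSet<>(analyze(value));
    }

    /** Keyword-style field: the whole value is indexed as one exact term. */
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    /** Parsed-query style lookup: the user input is analyzed first. */
    static boolean parsedQueryMatches(Set<String> indexedTerms, String userInput) {
        return indexedTerms.containsAll(analyze(userInput));
    }

    /** Term-query style lookup: the term is used exactly as given. */
    static boolean exactTermMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String fileName = "B10181_P388";
        System.out.println(parsedQueryMatches(indexAsText(fileName), "b10181_p388"));    // true
        System.out.println(parsedQueryMatches(indexAsKeyword(fileName), "b10181_p388")); // false
        System.out.println(exactTermMatches(indexAsKeyword(fileName), "B10181_P388"));   // true
    }
}
```

In this model the Keyword-style term keeps its original case, so an analyzed (lowercased) query misses it while an exact term lookup finds it.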

	Erik

Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype/Erick

Thanks for helping me with the range queries.
I was finally able to trace the faulty processes within my code and close
them.

I still have 3 small questions.

1) When creating the range query, is it possible for Lucene to do something
like this?

     +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]

     [Do you think this will work?]  It returns no hits, but I do get hits
when I search for either "shirt" or "button" alone.

2) When the indexer indexes, does it work in alphabetical order,
or in some other way?

3) The field type "Keyword" does not work for file names as it indexes them.
   [ Try indexing filenames and then searching on them; the search will
definitely return 0 hits, lucene1.3-final version ]

     doc.add(Field.Text("filename",file.getName()))
          < --------------------  Will return Hits

    doc.add(Field.Keyword("filename",file.getName()))
<--------------------  Will Not return Hits


 Why?



with regards
Karthik


On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched off the multi-search scenario.
>
> 2) Changing the field type from Text to Keyword
>     makes searches on the "filename" field fail,
>     so I have kept it as Text.

Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.

> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name     : B10181_P388
>
>
> 3) A range search between the 2 file names  B10181_P702  and  B01081_P355
>     still returns 0 hits  [with the space included before the 2nd '+']
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]

Could you try this:

+button +filename:[b10181_p355 TO b10181_p702]

?
If this does not work, please narrow your problem down to a java test
program of 10-20 lines, and post the code.

Regards,
Ype


Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

Karthik,

On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched off the multi-search scenario.
>
> 2) Changing the field type from Text to Keyword
>     makes searches on the "filename" field fail,
>     so I have kept it as Text.

Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.

> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name     : B10181_P388
>
>
> 3) A range search between the 2 file names  B10181_P702  and  B01081_P355
>     still returns 0 hits  [with the space included before the 2nd '+']
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]

Could you try this:

+button +filename:[b10181_p355 TO b10181_p702]

?
If this does not work, please narrow your problem down to a java test program
of 10-20 lines, and post the code.
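
One likely reason the reversed bounds return nothing: a range over terms is evaluated in plain string order, so the lower bound has to sort before the upper bound. Below is a minimal sketch in plain Java, without any Lucene classes. The class name RangeOrderSketch, the method termsInRange and the lower-cased sample terms are assumptions made for illustration, based on the file names shown earlier in this thread. It only mimics an inclusive range over a sorted set of terms:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class RangeOrderSketch {

    // Collects every term t with lower <= t <= upper, compared as plain strings.
    public static List<String> termsInRange(SortedSet<String> terms, String lower, String upper) {
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical indexed terms for the filename field.
        SortedSet<String> terms = new TreeSet<>();
        terms.add("b10181_p355");
        terms.add("b10181_p379");
        terms.add("b10181_p40");
        terms.add("b10181_p512");
        terms.add("b10181_p702");
        terms.add("b10181_p703");

        // Lower bound sorts before upper bound: matches are found.
        System.out.println(termsInRange(terms, "b10181_p355", "b10181_p702"));
        // prints [b10181_p355, b10181_p379, b10181_p40, b10181_p512, b10181_p702]

        // Bounds reversed: no term can be >= p702 and <= p355 at the same time.
        System.out.println(termsInRange(terms, "b10181_p702", "b10181_p355"));
        // prints []
    }
}
```

Note that b10181_p40 falls inside the range in string order, which is one reason numeric parts are usually zero-padded to a fixed width.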

Regards,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

RE: Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype...

1) I switched off the multi-search scenario.

2) Changing the field type from Text to Keyword
    makes the search on the field "filename" fail,
    so I have kept it as Text.

D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : b10181_p388
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_

Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
Field :'filename'
File Name     : B10181_P388


3) A search for the range between the two file names B10181_P702 and B01081_P355
    still returns 0 hits [I included the space before the 2nd '+'].

D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+button +filename:[b10181_p702 TO b10181_p355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':

				or

D:\JAVA\lucene\src\demo>java com.controlnet.indexing.search.SearchFiles
Search Keyword : +contents:button +filename:[b10181_p702 TO b10181_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+contents:button +filename:[b10181_p702 TO b10181_p355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':


Also, does the search depend on the field type? If so, my indexed field
types are as below:

doc.add(Field.Text("path", fhtml.getPath()));
doc.add(Field.Keyword("modified",fhtml.lastModified()+""));
doc.add(Field.Text("filename",fhtml.getName()));
doc.add(Field.Keyword("creation",CREATION_));
doc.add(Field.Keyword("bookid",BOOKID_));
doc.add(Field.Text("chapNme",CHAPNAME_));
doc.add(Field.Text("itmName",ITEMNAME_));



please do advise me.
Karthik



[ James Gosling says Microsoft has more money to burn than God has
  ...on his visit to India, in an interview with MSNBC TV last night ]



-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 2:52 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


On Monday 31 May 2004 11:09, Karthik N S wrote:

...
> I re indexed my folder 10181 [Seem's to be corrupted]

Was the index writer closed?

> Now I am getting the hits as....
>
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]

The query needs to have space before the 2nd + :

+button  +filename:[B10181_P702 TO B01081_P355]

> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
> Not a Found document(s) that matched query Field 'bookid':
> Not a Found document(s) that matched query Field 'creation':
> Not a Found document(s) that matched query Field 'contents':
> Not a Found document(s) that matched query Field 'chapNme':
> Not a Found document(s) that matched query Field 'itmName':

You seem to use a search mechanism that searches all these fields.
I'd recommend to switch this off until a query with explicit fields works,
eg.:

+contents:button  +filename:[B10181_P702 TO B01081_P355]

Btw. You'll need to make sure that a term like B10181_P702 is
not split at the underscore _ by a tokenizer at indexing time.
If your filename is not a keyword field, you might consider
changing it into a keyword field.

You seem to index book pages as Lucene documents, which is ok.
However, you may also need to index larger parts of the books in
order to retrieve books with multiple subjects on different pages.
Is this what your original question is about?

Have fun,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Monday 31 May 2004 11:09, Karthik N S wrote:

...
> I re indexed my folder 10181 [Seem's to be corrupted]

Was the index writer closed?

> Now I am getting the hits as....
>
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]

The query needs to have space before the 2nd + :

+button  +filename:[B10181_P702 TO B01081_P355]

> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
> Not a Found document(s) that matched query Field 'bookid':
> Not a Found document(s) that matched query Field 'creation':
> Not a Found document(s) that matched query Field 'contents':
> Not a Found document(s) that matched query Field 'chapNme':
> Not a Found document(s) that matched query Field 'itmName':

You seem to use a search mechanism that searches all these fields.
I'd recommend to switch this off until a query with explicit fields works,
eg.:

+contents:button  +filename:[B10181_P702 TO B01081_P355]

Btw. You'll need to make sure that a term like B10181_P702 is
not split at the underscore _ by a tokenizer at indexing time.
If your filename is not a keyword field, you might consider
changing it into a keyword field.
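
To make the splitting concern concrete, here is a small sketch in plain Java. It does not use any Lucene analyzer: tokenizedTerms is a made-up stand-in for an analyzer that lower-cases and splits on anything that is not a letter or digit, and keywordTerms stands in for a field whose value is indexed as one unmodified term. The real analyzer in use may behave differently, so check its output or look at the index with Luke.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenSplitSketch {

    // Hypothetical analyzer: lower-case, then split on every non-alphanumeric character.
    public static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Keyword-style indexing: the whole value becomes a single term, case preserved.
    public static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(tokenizedTerms("B10181_P702")); // prints [b10181, p702]
        System.out.println(keywordTerms("B10181_P702"));   // prints [B10181_P702]
    }
}
```

With the split terms there is no single term b10181_p702 for a range to run over. With the keyword term the underscore survives, but so does the upper case, so a lower-cased query term would not match it unless the value is lower-cased before indexing.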

You seem to index book pages as Lucene documents, which is ok.
However, you may also need to index larger parts of the books in
order to retrieve books with multiple subjects on different pages.
Is this what your original question is about?

Have fun,
Ype

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype

Sorry, and apologies once again for my last mail.

I re-indexed my folder 10181 [the index seems to have been corrupted].
Now I am getting the following results:


D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'contents':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':


204 Total milliseconds

D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : button+filename:[B10181_P702 TO B01081_P355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['button+filename:[B10181_P702 TO B01081_P355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'contents':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':


Is this correct,
or is something still wrong as far as the query string is concerned?



with regards
Karthik


-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 1:47 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


Karthik,

On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype

...
>
> My Question now is, If I want to Use Range Query  to  get search hits
> between
> fileName "B10181_P702"  and   "B10181_P355" only Instead of all the 67
hits
> ,
>
In this case there is no need to override range query, just use

+fileName:[B10181_P702 TO B10181_P355]

as part of the query.

Kind regards,
Ype



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey YPE

Apologies again.

I did as your mail suggested, but see the error:

Search Keyword : +king+filename:[b10181_p702 TO b01081_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
The Exception Raised file = SearchFiles.searchIndx0
java.lang.NegativeArraySizeException
        at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
        at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
        at org.apache.lucene.store.Lock$With.run(Lock.java:148)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:80)
        at com.controlnet.indexing.search.SearchFiles.searchIndex0(SearchFiles.java:68)
        at com.controlnet.indexing.search.SearchFiles.main(SearchFiles.java:240)

[Note: the field name is filename, in lower case, not fileName. Sorry about that.]

Am I doing something wrong here?

With regards
Karthik


-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 1:47 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


Karthik,

On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype

...
>
> My Question now is, If I want to Use Range Query  to  get search hits
> between
> fileName "B10181_P702"  and   "B10181_P355" only Instead of all the 67
hits
> ,
>
In this case there is no need to override range query, just use

+fileName:[B10181_P702 TO B10181_P355]

as part of the query.

Kind regards,
Ype



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

Karthik,

On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype

...
>
> My Question now is, If I want to Use Range Query  to  get search hits
> between
> fileName "B10181_P702"  and   "B10181_P355" only Instead of all the 67 hits
> ,
>
In this case there is no need to override range query, just use

+fileName:[B10181_P702 TO B10181_P355]

as part of the query.

Kind regards,
Ype



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype

Apologies, please.

Have a look at the search hits in this sample output from my index:


================== Start Searching ==========================
Search Keyword : king~
Source path [ E:/po/aaaa ] : e:/indexer/b10181
Query: ['king~'] in Folder e:/indexer/b10181/b10181_indx_

Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':

Found document(s) that matched : 'king~' no of hits :'67' in query Field
:'contents'

File Name     : B10181_P703
File Path     : E:\po\catalog\B10181\B10181_P703
Modified Date : 1080036442000
Bookid        : B10181
Chapter Name  :
Item Name     :

File Name     : B10181_P702
File Path     : E:\po\catalog\B10181\B10181_P702
Modified Date : 1080036442000
Bookid        : B10181
Chapter Name  :
Item Name     :

File Name     : B10181_P512
File Path     : E:\po\catalog\B10181\B10181_P512
Modified Date : 1080036438000
Bookid        : B10181
Chapter Name  :
Item Name     :

File Name     : B10181_P40
File Path     : E:\po\catalog\B10181\B10181_P40
Modified Date : 1080036444000
Bookid        : B10181
Chapter Name  :
Item Name     :

File Name     : B10181_P355
File Path     : E:\po\catalog\B10181\B10181_P355
Modified Date : 1080036436000
Bookid        : B10181
Chapter Name  :
Item Name     :

File Name     : B10181_P379
File Path     : E:\po\catalog\B10181\B10181_P379
Modified Date : 1080036436000
Bookid        : B10181
Chapter Name  :
Item Name     :


 .   .   .   .   .    .



328 Total milliseconds

================== End Searching ============


The output reports 67 hits in total [I have snipped most of them for
readability]. The search word is present in the field "Contents",
where the content part of each HTML file is indexed.

The field "File Name" is unique, and it is indexed and displayed as it
appears in Windows Explorer.

My question now is: if I want to use a range query to get only the search hits
between fileName "B10181_P702" and "B10181_P355", instead of all 67 hits,
how do I do it?

[Please give a clear example or send me an attachment for the same.
  I overrode the getRangeQuery() method as per your last mail, but I am still
not able to get the results.]




with regards
Karthik









-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Saturday, May 29, 2004 12:10 AM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please


On Friday 28 May 2004 10:54, Karthik N S wrote:
> Hey ype
>
> Thx for the advice but still I need to get the  exact situation working ,
>
> 1) I have a unique Field [ called filename ] which is indexed of type
Text.
>     It accepts the name of the HTML files as  the indexing parameter ,
>    Also there is another Field called "Contents"   which stores all the
> contents of that
>    indicated unique named html file.
>
> 2) The indexer complete indexes for about 5000 html files  sucessfully .
>
> 3) When I do a search for word ,it returns a hit of  400  on various html
> files
>
> Now in this situation if I want to limit the hits  between  First 200  to
> 400  html Page Names  only
> what exactly should I do to using getRange() method.

A range query will provide a range of indexed values, and
I thought you needed to add the record number as an indexed field
in each record.

However, you seem to use the 200 and 400 here as the order number
for each record in the result of the query on the Contents field.
Is that correct?
When so, in which order do you expect the results of your query?

Kind regards,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Friday 28 May 2004 10:54, Karthik N S wrote:
> Hey ype
>
> Thx for the advice but still I need to get the  exact situation working ,
>
> 1) I have a unique Field [ called filename ] which is indexed of type Text.
>     It accepts the name of the HTML files as  the indexing parameter ,
>    Also there is another Field called "Contents"   which stores all the
> contents of that
>    indicated unique named html file.
>
> 2) The indexer complete indexes for about 5000 html files  sucessfully .
>
> 3) When I do a search for word ,it returns a hit of  400  on various html
> files
>
> Now in this situation if I want to limit the hits  between  First 200  to
> 400  html Page Names  only
> what exactly should I do to using getRange() method.

A range query will provide a range of indexed values, and
I thought you needed to add the record number as an indexed field
in each record.

However, you seem to be using 200 and 400 here as positions
in the result list of the query on the Contents field.
Is that correct?
If so, in which order do you expect the results of your query?

Kind regards,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Erik

  Apologies again.

 [ You probably do not want to use Field.Text for a filename.  Use
Field.Keyword instead. ]

1)  When I change the field type from Text to Keyword, I do not get any hits
at all
     [since most of the parameters available to this field are of String type,
e.g. file[i].getName()].

2) After successful indexing, the search returns 400 hits on various HTML
files
     where the search word is present in the content field.

3) I need to limit the hits to those whose "filename" field lies between
file[100].getName() and file[200].getName()
    for the searched word.

I did what Ype advised in his last mail, but the situation has not improved.

I need to get only the hits that fall between the two files [100 files in
between],
 and not the full set of hits.

Please advise me
how to proceed.

4)
  I installed Luke [via Java Web Start] from
http://www.getopt.org/luke/webstart.html
  but my index files are built with a custom analyzer [not one of the
standard analyzers available in the drop-down box].
 Will Luke still be able to search the index?

with regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, May 28, 2004 3:38 PM
To: Lucene Users List
Subject: Re: Range Query Sombody HELP please

On May 28, 2004, at 4:54 AM, Karthik N S wrote:
> 1) I have a unique Field [ called filename ] which is indexed of type
> Text.

You probably do not want to use Field.Text for a filename.  Use
Field.Keyword instead.

> 2) The indexer complete indexes for about 5000 html files  sucessfully
> .

Now use Luke (Google for _luke lucene_) to browse your index, and check
that you are getting what you think.  You can do ad-hoc queries there
also.

> Now in this situation if I want to limit the hits  between  First 200
> to
> 400  html Page Names  only
> what exactly should I do to using getRange() method.

If you want the first 200 - 400, start your Hits walking at index 200,
and proceed through 400.

Is there some field you want to key off to do the range?  Or do you
just want the 200th - 400th hits from the search, which is an entirely
different question than about ranges.

> Please advise on how to proceed ...

Please send (succinct) code examples in the future to really keep this
discussion concrete and clear.

	Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Range Query Sombody HELP please

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 28, 2004, at 4:54 AM, Karthik N S wrote:
> 1) I have a unique Field [ called filename ] which is indexed of type 
> Text.

You probably do not want to use Field.Text for a filename.  Use 
Field.Keyword instead.

> 2) The indexer complete indexes for about 5000 html files  sucessfully 
> .

Now use Luke (Google for _luke lucene_) to browse your index, and check 
that you are getting what you think.  You can do ad-hoc queries there 
also.

> Now in this situation if I want to limit the hits  between  First 200  
> to
> 400  html Page Names  only
> what exactly should I do to using getRange() method.

If you want the first 200 - 400, start your Hits walking at index 200, 
and proceed through 400.

Is there some field you want to key off to do the range?  Or do you 
just want the 200th - 400th hits from the search, which is an entirely 
different question than about ranges.
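
Walking the hits from position 200 through 400 is plain paging over the ranked result list and needs no range query. Here is a sketch with an ordinary java.util.List standing in for the search results. HitsWindowSketch, window and the "hit-N" strings are made up for this illustration; with Lucene you would loop over the Hits object returned by the searcher instead.

```java
import java.util.ArrayList;
import java.util.List;

public class HitsWindowSketch {

    // Returns the hits at positions from (inclusive) to to (exclusive),
    // clamped so that a short result list never causes an exception.
    public static List<String> window(List<String> rankedHits, int from, int to) {
        int start = Math.max(0, Math.min(from, rankedHits.size()));
        int end = Math.max(start, Math.min(to, rankedHits.size()));
        return new ArrayList<>(rankedHits.subList(start, end));
    }

    public static void main(String[] args) {
        // Pretend the search returned 450 ranked hits.
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            hits.add("hit-" + i);
        }

        List<String> page = window(hits, 200, 400);
        System.out.println(page.size());               // prints 200
        System.out.println(page.get(0));               // prints hit-200
        System.out.println(page.get(page.size() - 1)); // prints hit-399
    }
}
```

The window depends on the order in which the search ranks the hits, not on any field value, which is why it is a different question from a range on filename.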

> Please advise on how to proceed ...

Please send (succinct) code examples in the future to really keep this 
discussion concrete and clear.

	Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey ype

Thanks for the advice, but I still need to get the exact situation working.

1) I have a unique field [called filename] which is indexed as type Text.
    It takes the name of the HTML file as its value.
   There is also another field called "Contents", which stores all the
contents of that
   uniquely named HTML file.

2) The indexer successfully completes indexing of about 5000 HTML files.

3) When I search for a word, it returns 400 hits on various HTML
files.

Now, in this situation, if I want to limit the hits to the first 200 to
400 HTML page names only,
what exactly should I do using the getRange() method?

Please advise on how to proceed ...

with regards
Karthik

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Friday, May 28, 2004 1:14 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please

Karthik,

On Friday 28 May 2004 05:54, Karthik N S wrote:

...
> Weh we do a search in SQL  using '*' we all know that the result would be
> total no of records in the table,but when  we want to get limit our record
> we apply  range between 2 specific row records [Which we call it as
> subsearch]
>
>
>    Similarly  on a indexed  record  I would like perform the same tecnique
> as above.

In case you need to reuse the limitation a filter is the way to go in
Lucene.
However it seems to be better to get the range query working first.

>   In fact I was looking at the url u sent me in the last mail on using
> getRange Queries
>  and was working on the same
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

The query I gave uses two +'s prefixed to the query parts:

+search_word +(book:[100 TO 200])

Both query parts are required because of the +'s, ie. it works
as the AND operator in SQL. The TO operator queries the range
in the book field.

> and
>
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
>
> but witou results for the last 12 hrs.

You have probably seen a lot of different things that will be useful later.

> If u could spare a few minuts and please expalin or provide a simple  [
> full ] example using and
> over riding the  getRange() method .

The problem you'll probably run into is that Lucene does not
support numbers directly, you'll have to index them as strings,
eg. by prefixing zero's:

As Erik
indicated: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

You may have to reindex your data for this. In case you have a lot of data
consider setting up a test first.

Then in the getRangeQuery() method of your parser you'll need to prefix the
queried
numbers in the same way. The example in the article is about date fields,
but the adaptation to numbers shouldn't be a problem.

When you override this in your query parser:
getRangeQuery(String field, Analyzer analyzer, String start, String end,
boolean inclusive)
it will be called for the example query with  start = "100" and end = "200".

(See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
under Customizing query parser).

In the overriding method you can then call the super method with the
start and end prefixed with zero's as indicated in searching numerical
fields
referred to above.

Have fun, you'll get it working,

Ype

> with regards
> Karthik
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
> > Hi
> >    Lucene -Developer My main intention was
> >
> >  Search for an word hit  in a Unique Field  between  ranges     say
> > book100  - book 200  indexed numbers
> >  It's something like creating a SUBSEARCH  with in the SEARCHINDEX.
...
> Could you explain what you mean by subsearch?
> I suppose you might want to have a look at the various filter classes
> in the org.apache.lucene.search package.
>
> Regards,
> Ype
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

Karthik,

On Friday 28 May 2004 05:54, Karthik N S wrote:

...
> Weh we do a search in SQL  using '*' we all know that the result would be
> total no of records in the table,but when  we want to get limit our record
> we apply  range between 2 specific row records [Which we call it as
> subsearch]
>
>
>    Similarly  on a indexed  record  I would like perform the same tecnique
> as above.

In case you need to reuse the limitation a filter is the way to go in Lucene.
However it seems to be better to get the range query working first.

>   In fact I was looking at the url u sent me in the last mail on using
> getRange Queries
>  and was working on the same
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

The query I gave uses two +'s prefixed to the query parts:

+search_word +(book:[100 TO 200])

Both query parts are required because of the +'s, ie. it works
as the AND operator in SQL. The TO operator queries the range
in the book field.

> and
>
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
>
> but witou results for the last 12 hrs.

You have probably seen a lot of different things that will be useful later.

> If u could spare a few minuts and please expalin or provide a simple  [
> full ] example using and
> over riding the  getRange() method .

The problem you'll probably run into is that Lucene does not
support numbers directly. You'll have to index them as strings
of equal length, e.g. by prefixing zeros, so that string order
matches numeric order.

As Erik indicated: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

You may have to reindex your data for this. In case you have a lot of data
consider setting up a test first.

Then in the getRangeQuery() method of your parser you'll need to prefix the queried
numbers in the same way. The example in the article is about date fields,
but the adaptation to numbers shouldn't be a problem.

When you override this in your query parser:
getRangeQuery(String field, Analyzer analyzer, String start, String end, boolean inclusive)
it will be called for the example query with  start = "100" and end = "200".

(See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
under Customizing query parser).

In the overriding method you can then call the super method with the
start and end values prefixed with zeros, as described on the
SearchNumericalFields page referred to above.
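
A small sketch of why the padding matters, in plain Java without Lucene. The class PaddedNumberSketch, the width of 6 and the helper names are assumptions chosen for illustration. The same pad() call would be applied both when indexing the number and to the start and end values inside an overridden getRangeQuery().

```java
import java.util.Locale;

public class PaddedNumberSketch {

    // Hypothetical fixed width; it must cover the largest number that is indexed.
    static final int WIDTH = 6;

    // Left-pads a non-negative number with zeros, e.g. 100 -> "000100".
    public static String pad(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("negative numbers need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", number);
    }

    // Inclusive range check in plain string order, as a term range would do it.
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "1000" sorts between "100" and "200", which is wrong numerically.
        System.out.println(inRange("1000", "100", "200"));          // prints true

        // Padded: "001000" is outside "000100" .. "000200".
        System.out.println(inRange(pad(1000), pad(100), pad(200))); // prints false

        // Padded: "000150" is inside the range, as expected.
        System.out.println(inRange(pad(150), pad(100), pad(200)));  // prints true
    }
}
```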

Have fun, you'll get it working,

Ype

> with regards
> Karthik
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
> > Hi
> >    Lucene -Developer My main intention was
> >
> >  Search for an word hit  in a Unique Field  between  ranges     say
> > book100  - book 200  indexed numbers
> >  It's something like creating a SUBSEARCH  with in the SEARCHINDEX.
...
> Could you explain what you mean by subsearch?
> I suppose you might want to have a look at the various filter classes
> in the org.apache.lucene.search package.
>
> Regards,
> Ype
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

RE: Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hey Ype

     Apologies for the misconduct.

When we search in SQL using '*', we all know that the result is the
total number of records in the table. When we want to limit the records,
we apply a range between two specific rows [which we call a
subsearch].

   Similarly, I would like to perform the same technique on indexed
records.
  In fact, I was looking at the URLs you sent me in the last mail on
range queries
 and was working from them:

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

and

http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

but without results for the last 12 hours.

If you could spare a few minutes, please explain or provide a simple
[full] example of using and
overriding the getRange() method.

with regards
Karthik

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, May 27, 2004 11:03 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please

On Thursday 27 May 2004 09:37, Karthik N S wrote:
> Hi
>    Lucene developers, my main intention was:
>
>  Search for a word hit in a unique field within a range, say
> book 100 - book 200 (indexed numbers).
>  It's something like creating a SUBSEARCH within the SEARCHINDEX.

You don't need to shout (uppercase), I've been teaching SQL.

Could you explain what you mean by subsearch?
I suppose you might want to have a look at the various filter classes
in the org.apache.lucene.search package.

Regards,
Ype

Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Thursday 27 May 2004 09:37, Karthik N S wrote:
> Hi
>    Lucene developers, my main intention was:
>
>  Search for a word hit in a unique field within a range, say
> book 100 - book 200 (indexed numbers).
>  It's something like creating a SUBSEARCH within the SEARCHINDEX.

You don't need to shout (uppercase), I've been teaching SQL.

Could you explain what you mean by subsearch?
I suppose you might want to have a look at the various filter classes
in the org.apache.lucene.search package.

Regards,
Ype


Re: Range Query Sombody HELP please

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 27, 2004, at 3:37 AM, Karthik N S wrote:
> Hi
>    Lucene developers, my main intention was:
>
>  Search for a word hit in a unique field within a range, say
> book 100 - book 200 (indexed numbers).
>  It's something like creating a SUBSEARCH within the SEARCHINDEX.
>
>   This is similar to the SQL:
>
>      select  *  from BOOKSHELF.
>                  or
>      select  *  from BOOKSHELF where  book1  between 100 and  200.

Karthik - unfortunately I'm having a hard time understanding your
questions.  Ype replied with a suggested solution: overriding
getRangeQuery in a custom QueryParser subclass.  You also need to ensure
you are indexing numbers in a padded fashion:

	http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

Erik


RE: Range Query Sombody HELP please

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi
   Lucene developers, my main intention was:

 Search for a word hit in a unique field within a range, say
book 100 - book 200 (indexed numbers).
 It's something like creating a SUBSEARCH within the SEARCHINDEX.

  This is similar to the SQL:

     select  *  from BOOKSHELF.
                 or
     select  *  from BOOKSHELF where  book1  between 100 and  200.

with regards
Karthik

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, May 27, 2004 12:46 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please

On Thursday 27 May 2004 07:00, Karthik N S wrote:
> Hi
> Lucene developers
>
> Is it possible to do Search and retrieve relevant information on the
> Indexed Document
> within in specific range settings which may be  similar to an
>
> Query in SQL  =  select  *  from BOOKSHELF where  book1  between 100 and
> 200
>
> ex:-
>
>        "search_word"  ,   Book between  100   AND   200
>
> [ Note:- where Book uniquefield  hit info which is already Indexed ]

The query parser can construct this query for you (assuming search_word
is in the query default field):

+search_word +(book:[100 TO 200])

See also: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

One problem you might run into is that Lucene does not support numbers
directly; only strings are indexed. You can index these numbers with
enough leading zeros and add the same leading zeros in the query.

Erik Hatcher wrote an article on how to make the query:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
You'll need to override the getRangeQuery() method.

Have fun,
Ype

Re: Range Query Sombody HELP please

Posted by Ype Kingma <yk...@xs4all.nl>.

On Thursday 27 May 2004 07:00, Karthik N S wrote:
> Hi
> Lucene developers
>
> Is it possible to do Search and retrieve relevant information on the
> Indexed Document
> within in specific range settings which may be  similar to an
>
> Query in SQL  =  select  *  from BOOKSHELF where  book1  between 100 and
> 200
>
> ex:-
>
>        "search_word"  ,   Book between  100   AND   200
>
> [ Note:- where Book uniquefield  hit info which is already Indexed ]

The query parser can construct this query for you (assuming search_word
is in the query default field):

+search_word +(book:[100 TO 200])

See also: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

One problem you might run into is that Lucene does not support numbers
directly; only strings are indexed. You can index these numbers with enough
leading zeros and add the same leading zeros in the query.
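
To illustrate the padding point: terms are compared as strings, so an
unpadded range such as [100 TO 200] also admits a term like "1000". What
follows is only a minimal sketch in plain Java, with no Lucene classes
involved. The names PaddedNumbers, pad and inRange, and the width of 6, are
hypothetical choices made for this illustration.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class PaddedNumbers {
    private PaddedNumbers() {}

    /** Left-pads a non-negative number with zeros to a fixed width. */
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Inclusive range test using plain string (lexicographic) comparison. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "1000" sorts between "100" and "200", so it wrongly matches.
        System.out.println(inRange("1000", "100", "200"));                    // true
        // Padded to 6 digits: string order now agrees with numeric order.
        System.out.println(inRange(pad(1000, 6), pad(100, 6), pad(200, 6))); // false
        System.out.println(inRange(pad(150, 6), pad(100, 6), pad(200, 6)));  // true
    }
}
```

The same width has to be used when indexing and when building the query,
otherwise the indexed terms and the range bounds will not line up.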

Erik Hatcher wrote an article on how to make the query:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
You'll need to override the getRangeQuery() method.
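
The core of such an override is rewriting both range endpoints into the
same padded form that was used at index time. The sketch below shows only
that rewriting step as plain Java, deliberately without the Lucene classes,
so it is not the actual QueryParser code. In a real QueryParser subclass the
padded endpoints would presumably be handed on to the inherited
getRangeQuery(). RangeBounds, padBound, rangeClause and WIDTH are
hypothetical names for this illustration.

```java
/** Hypothetical sketch of the endpoint rewriting; not Lucene API. */
final class RangeBounds {
    /** Assumption: must equal the width used when the field was indexed. */
    static final int WIDTH = 6;

    private RangeBounds() {}

    /** Pads one range endpoint, e.g. "100" becomes "000100". */
    static String padBound(String part) {
        String digits = part.trim();
        if (digits.isEmpty() || digits.length() > WIDTH) {
            throw new IllegalArgumentException("bad range endpoint: " + part);
        }
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a number: " + part);
            }
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Builds an inclusive range clause in query parser syntax. */
    static String rangeClause(String field, String lower, String upper) {
        return field + ":[" + padBound(lower) + " TO " + padBound(upper) + "]";
    }

    public static void main(String[] args) {
        // Prints: +search_word +(book:[000100 TO 000200])
        System.out.println("+search_word +(" + rangeClause("book", "100", "200") + ")");
    }
}
```

Doing the padding inside the parser means users can keep typing
book:[100 TO 200] while the index holds the padded terms.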

Have fun,
Ype

Re: Range Query Sombody HELP please

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Karthik, namaste!

I seem to be getting multiple copies of your email.
I received 4 copies of this email.

Could you please limit things to 1 message per subject?
I get hundreds of messages every day as is. :(

Thank you,
Otis

--- Karthik N S <ka...@controlnet.co.in> wrote:
> 
> Hi
> Lucene developers
> 
> Is it possible to do Search and retrieve relevant information on the
> Indexed
> Document
> within in specific range settings which may be  similar to an
> 
> Query in SQL  =  select  *  from BOOKSHELF where  book1  between 100
> and 200
> 
> ex:-
> 
>        "search_word"  ,   Book between  100   AND   200
> 
> [ Note:- where Book uniquefield  hit info which is already Indexed ]
> 
> 
> Sombody Please Help me   :(
> 
> 
> with regards
> Karthik
> 
> 

