Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2004/05/27 07:00:27 UTC
Range Query Sombody HELP please
Hi Lucene developers,
Is it possible to search the indexed documents and retrieve relevant
information within a specific range, similar to this SQL query?
select * from BOOKSHELF where book1 between 100 and 200
For example:
"search_word" , Book between 100 AND 200
[ Note: Book is a unique field whose values are already indexed. ]
Somebody please help me :(
with regards
Karthik
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Bigram Co-occurrences will be the better way for Word Discrimination. Re: Will CJKAnalyser be release with Lucene 1.4?
Posted by Che Dong <ch...@hotmail.com>.
> I would be against such a move. I think Lucene's core has too many
> analyzers in it already, such as the German and Russian ones. The core
> could do without any of the concrete analyzers altogether, in my
> opinion - but it is handy to have a few general purpose convenience
> ones.
+1
>
> What benefit, besides convenience, would there be in CJKAnalyzer into
> the core? What about the all the others in the sandbox? If we bring
> one in, why not all of them?
But CJK text has no spaces to mark word boundaries, so bigram co-occurrence is the better way to discriminate words.
For example, if the term C1C2 is segmented into C1 and C2, the results will also contain C2C1, but in Chinese the words C1C2 and C2C1 may have different meanings.
Compared to the single-character (unigram) tokenizing implemented in StandardTokenizer, bigram-based tokens return MUCH better results.
According to the feedback I have received on CJKTokenizer:
for CJK users, the bigram-based CJKTokenizer is strongly recommended for better results.
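The C1C2 / C2C1 point can be illustrated with a small standalone sketch. This is not the sandbox CJKTokenizer code; the class and method names are hypothetical, the sample words are chosen only for illustration, and the tokens are matched as independent terms with word order ignored. It also assumes every character fits in a single Java char.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: overlapping character bigrams versus single characters. */
class BigramSketch {
    /** Splits text into single-character tokens (unigrams). */
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            out.add(text.substring(i, i + 1));
        }
        return out;
    }

    /** Splits text into overlapping two-character tokens (bigrams). */
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    /** True if every query token occurs among the document tokens, order ignored. */
    static boolean matchesAll(List<String> docTokens, List<String> queryTokens) {
        return docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String doc = "\u4e0a\u6d77";   // C1C2
        String query = "\u6d77\u4e0a"; // C2C1, a different word
        // Unigram tokens: the reversed word still matches.
        System.out.println(matchesAll(unigrams(doc), unigrams(query))); // true
        // Bigram tokens: the reversed word no longer matches.
        System.out.println(matchesAll(bigrams(doc), bigrams(query)));   // false
    }
}
```

With single-character tokens the document for C1C2 is also returned for C2C1, while the bigram token keeps the two words apart.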
for more:
Word Discrimination Based on Bigram Co-occurrences
... There is a match routine that detects any common segment between the target word and each of the ... The entries
of the matrix indicate whether a reference word and a lexicon word share at least one n-gram ... It also
shows the bigram match list for an unknown word generated by the feature-matching process ...
www.ecse.rpi.edu/homepages/nagy/PDF_files/ElNasan-Nagy-ICDAR01.pdf
Segmenting Chinese in Unicode
... However, to date no in-depth analysis has been performed analyzing the deficiencies in segmentation
that lead to the improved performance of the simpler bigram methods. ... The part-of-speech of the segment
and the ... A study on integrating Chinese word segmentation and part-of-speech tagging. ...
www.basistech.com/papers/chinese/iuc-16-paper.pdf
>
> It has been brought up to bring in the SnowballAnalyzer - as it
> actually is general purpose and spans many languages. I'm not really
> for bringing that one in either.
>
> I'm but one voice and would not veto bringing in other analyzers, I
> just don't think there is much benefit, especially if we improve the
> release process to incorporate the sandbox goodies into a single
> distribution but as separate JARs.
>
> Erik
Thank you, Erik. I hope we can have more discussion on this issue with other East Asian language users.
Che Dong
Re: Will CJKAnalyser be release with Lucene 1.4?
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 29, 2004, at 11:37 PM, Che Dong wrote:
> Is it possable move CJKAnalyser out of sandbox to jakarta-lucene
> package?
I would be against such a move. I think Lucene's core has too many
analyzers in it already, such as the German and Russian ones. The core
could do without any of the concrete analyzers altogether, in my
opinion - but it is handy to have a few general purpose convenience
ones.
What benefit, besides convenience, would there be in bringing CJKAnalyzer
into the core? What about all the others in the sandbox? If we bring
one in, why not all of them?
It has been brought up to bring in the SnowballAnalyzer - as it
actually is general purpose and spans many languages. I'm not really
for bringing that one in either.
I'm but one voice and would not veto bringing in other analyzers, I
just don't think there is much benefit, especially if we improve the
release process to incorporate the sandbox goodies into a single
distribution but as separate JARs.
Erik
Re: Will CJKAnalyser be release with Lucene 1.4?
Posted by Che Dong <ch...@hotmail.com>.
Hi Erik:
Is it possible to move CJKAnalyser out of the sandbox and into the jakarta-lucene package?
Regards
Che Dong
----- Original Message -----
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Sunday, May 30, 2004 5:17 AM
Subject: Re: Will CJKAnalyser be release with Lucene 1.4?
> I'm not sure I understand your question.
>
> At this point there is no plan to "release" the sandbox components -
> they are really there in a batteries-not-included fashion at this point
> but its all there to freely use if you like.
>
> I did centralize the build system in contributions area, so each piece
> should easily build into JAR files.
>
> Is there more that you desire?
>
> Erik
>
>
> On May 29, 2004, at 3:24 PM, Che Dong wrote:
>
> > Hi All:
> > I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
> > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/
> > contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/
> >
> > The original version works fine at http://search.163.com
> > http://search.soufun.com and www.blogchina.com/weblucene/
> >
> > Regards
> >
> > Che Dong
> > http://www.chedong.com/tech/lucene.html
Re: question on design for ordering of field names written to FieldInfos object
Posted by Peter M Cipollone <lu...@bihvhar.com>.
sorry. I meant to send this to the dev list...
----- Original Message -----
From: "Peter M Cipollone" <lu...@bihvhar.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, May 29, 2004 6:42 PM
Subject: question on design for ordering of field names written to
FieldInfos object
> Hi,
>
> I have a question about the following code from
> org.apache.lucene.index.SegmentMerger. I would like to know if the
> ordering of the fields as they are stored to the FieldInfos object is
> critical to some other purpose.
question on design for ordering of field names written to FieldInfos object
Posted by Peter M Cipollone <lu...@bihvhar.com>.
Hi,
I have a question about the following code from
org.apache.lucene.index.SegmentMerger. I would like to know if the ordering
of the fields as they are stored to the FieldInfos object is critical to
some other purpose.
In the code below (from a CVS pull about a week ago), the fields are stored
in the following order:
1. fields indexed=true, termVectors=true
2. fields indexed=true, termVectors=false
3. fields stored=true, indexed=false
The reason I ask is because I am working on some functionality that will
require the order of fields to be immutable across merges. At present
FieldInfos are created in two ways, one from a Document as it is merged into
an index, and again when index segments are merged. They use two different
ordering mechanisms.
Thanks for your help.
Pete
private final int mergeFields() throws IOException {
  fieldInfos = new FieldInfos(); // merge field names
  int docCount = 0;
  for (int i = 0; i < readers.size(); i++) {
    IndexReader reader = (IndexReader) readers.elementAt(i);
    // 1. indexed=true, termVectors=true
    fieldInfos.addIndexed(reader.getIndexedFieldNames(true), true);
    // 2. indexed=true, termVectors=false
    fieldInfos.addIndexed(reader.getIndexedFieldNames(false), false);
    // 3. stored=true, indexed=false
    fieldInfos.add(reader.getFieldNames(false), false);
  }
  fieldInfos.write(directory, segment + ".fnm");
Re: Will CJKAnalyser be release with Lucene 1.4?
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I'm not sure I understand your question.
At this point there is no plan to "release" the sandbox components.
They are there in a batteries-not-included fashion, but it's all
there to use freely if you like.
I did centralize the build system in contributions area, so each piece
should easily build into JAR files.
Is there more that you desire?
Erik
On May 29, 2004, at 3:24 PM, Che Dong wrote:
> Hi All:
> I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/
> contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/
>
> The original version works fine at http://search.163.com
> http://search.soufun.com and www.blogchina.com/weblucene/
>
> Regards
>
> Che Dong
> http://www.chedong.com/tech/lucene.html
Will CJKAnalyser be release with Lucene 1.4?
Posted by Che Dong <ch...@hotmail.com>.
Hi All:
I checked the org/apache/lucene/analysis/cjk/ in lucene sandbox:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cjk/
The original version works fine at http://search.163.com http://search.soufun.com and www.blogchina.com/weblucene/
Regards
Che Dong
http://www.chedong.com/tech/lucene.html
Re: Range Query Sombody HELP please
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Try my AnalysisDemo code on some filename field samples:
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis
You mentioned earlier, I think, that you are using a custom analyzer.
Give us the output of AnalysisDemo on some samples so we can see what
is coming out.
If you can put together a 10-line Java program that uses RAMDirectory
and has some sample hard-coded text that I can easily run standalone I
would look into your situation further. As it is, you are providing
far more complexity than I have time to delve into. Narrow it down to
a very very simple example that we can all see in one screen.
Erik
On May 31, 2004, at 7:47 AM, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched Off the Multi search Senerio.
>
> 2) Changing the Field type from Text to Keyword
> will fail When I search for the the Field type "filename"
> so,I still maintained it to be Text
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name : B10181_P388
>
>
> 3)On Search for range between 2 file names B10181_P702 to
> B01081_P355
> still returns me 0 hits [Included space before the 2nd '+' ]
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button +filename:[b10181_p702 TO b10181_p355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
>
> or
>
> D:\JAVA\lucene\src\demo>java com.controlnet.indexing.search.SearchFiles
> Search Keyword : +contents:button +filename:[b10181_p702 TO
> b10181_p355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+contents:button +filename:[b10181_p702 TO b10181_p355]'] in
> Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
>
>
> Also the does the search varies on the Field Type if so My Indexed
> Field
> types as below....
>
> doc.add(Field.Text("path", fhtml.getPath()));
> doc.add(Field.Keyword("modified",fhtml.lastModified()+""));
> doc.add(Field.Text("filename",fhtml.getName()));
> doc.add(Field.Keyword("creation",CREATION_));
> doc.add(Field.Keyword("bookid",BOOKID_));
> doc.add(Field.Text("chapNme",CHAPNAME_));
> doc.add(Field.Text("itmName",ITEMNAME_));
>
>
>
> please do advise me.
> Karthik
>
>
>
> [ James Goslink says Microsoft has More Money to burn then GOD has
> ...on his visit to India,In an interview to MSNBC TV Last night ]
>
>
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Monday, May 31, 2004 2:52 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
>
> On Monday 31 May 2004 11:09, Karthik N S wrote:
>
> ...
>> I re indexed my folder 10181 [Seem's to be corrupted]
>
> Was the index writer closed?
>
>> Now I am getting the hits as....
>>
>>
>> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
>> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
>
> The query needs to have space before the 2nd + :
>
> +button +filename:[B10181_P702 TO B01081_P355]
>
>> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
>> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
>> e:/indexer3/b10181/b10181_indx_
>> Not a Found document(s) that matched query Field 'filename':
>> Not a Found document(s) that matched query Field 'bookid':
>> Not a Found document(s) that matched query Field 'creation':
>> Not a Found document(s) that matched query Field 'contents':
>> Not a Found document(s) that matched query Field 'chapNme':
>> Not a Found document(s) that matched query Field 'itmName':
>
> You seem to use a search mechanism that searches all these fields.
> I'd recommend to switch this off until a query with explicit fields
> works,
> eg.:
>
> +contents:button +filename:[B10181_P702 TO B01081_P355]
>
> Btw. You'll need to make sure that a term like B10181_P702 is
> not split at the underscore _ by a tokenizer at indexing time.
> If your filename is not a keyword field, you might consider
> changing it into a keyword field.
>
> You seem to index book pages as Lucene documents, which is ok.
> However, you may also need to index larger parts of the books in
> order to retrieve books with multiple subjects on different pages.
> Is this what your original question is about?
>
> Have fun,
> Ype
>
>
Re: Range Query Sombody HELP please
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 1, 2004, at 8:10 AM, Karthik N S wrote:
> Hey Ype/Erick
>
> Apologies please
>
> I sent u guys some code as per mail
> did u recieve it or shall i re send them.
I did not receive it. Please just copy/paste it into an e-mail to the
list.
Erik
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype/Erick
My apologies.
I sent you both some code by mail.
Did you receive it, or shall I resend it?
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 8:41 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
Karthik,
On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched Off the Multi search Senerio.
>
> 2) Changing the Field type from Text to Keyword
> will fail When I search for the the Field type "filename"
> so,I still maintained it to be Text
Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name : B10181_P388
>
>
> 3)On Search for range between 2 file names B10181_P702 to B01081_P355
> still returns me 0 hits [Included space before the 2nd '+' ]
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
Could you try this:
+button +filename:[b10181_p355 TO b10181_p702]
?
If this does not work, please narrow your problem down to a java test
program
of 10-20 lines, and post the code.
Regards,
Ype
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Thursday 03 June 2004 07:10, Karthik N S wrote:
> Hey
>
> Ype the Query of range
>
> +button +shirt +filename:[b10181_p100 TO b10181_p200]
>
> did not work for me but on other way around
>
> +(button OR shirt) +filename:[b10181_p100 TO b10181_p200]
>
> resulted to me in 2 hits with either one term "button / shirt " in each
> page,but not both of them
>
> I found from the Html file that both words are present in more then 2
> files,
>
> Are there any other possibilities for getting both words.
Your index contains book pages as Lucene documents.
In this case you need to index larger parts of the books
as Lucene documents in order to retrieve books with multiple
subjects on different pages.
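A minimal sketch of this point, without Lucene: each document is modelled as a plain set of terms, and a query with all terms required only hits a document that contains every term. The class name, helper names and page contents are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustration only: pages as documents versus a whole book as one document. */
class PageVsBookSketch {
    /** Counts documents whose term set contains every required term. */
    static int countMatches(Map<String, Set<String>> docs, String... required) {
        int hits = 0;
        for (Set<String> terms : docs.values()) {
            if (terms.containsAll(Arrays.asList(required))) {
                hits++;
            }
        }
        return hits;
    }

    /** Merges the term sets of all pages into one book-level document. */
    static Map<String, Set<String>> mergeIntoBook(String bookId, Map<String, Set<String>> pages) {
        Set<String> all = new HashSet<>();
        for (Set<String> terms : pages.values()) {
            all.addAll(terms);
        }
        Map<String, Set<String>> book = new LinkedHashMap<>();
        book.put(bookId, all);
        return book;
    }

    /** Two made-up pages; each page mentions only one of the two query words. */
    static Map<String, Set<String>> samplePages() {
        Map<String, Set<String>> pages = new LinkedHashMap<>();
        pages.put("b10181_p100", new HashSet<>(Arrays.asList("button", "thread")));
        pages.put("b10181_p150", new HashSet<>(Arrays.asList("shirt", "collar")));
        return pages;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> pages = samplePages();
        // Page-level documents: no single page has both words.
        System.out.println(countMatches(pages, "button", "shirt")); // 0
        // Book-level document: both words are present.
        System.out.println(countMatches(mergeIntoBook("b10181", pages), "button", "shirt")); // 1
    }
}
```

Indexing both granularities, pages and larger units, is one way to keep page-level hits while still finding books that cover both subjects.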
Kind regards,
Ype
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype,
the range query
+button +shirt +filename:[b10181_p100 TO b10181_p200]
did not work for me, but the other form
+(button OR shirt) +filename:[b10181_p100 TO b10181_p200]
gave me 2 hits, each page containing only one of the terms "button" / "shirt",
but not both of them.
I found from the HTML files that both words are present in more than 2
files.
Is there any other way to get hits that contain both words?
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, June 03, 2004 12:26 AM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
> On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
> > I still have 3 small Questions.
> >
> > 1)While creating the Range Query Is it possible for Lucene to do
> > somthing
> > similar..
> >
> > +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
> >
> > [Do you think this will work] It's not on returning hits , but
> > it does
> > return hits with either one of them "Shirt" or "button" Only.
>
> My guess is you have documents none of your documents in that range
> have button AND shirt in them.
You can also try this:
+button +shirt +filename:[b10181_p100 TO b10181_p200]
I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.
Regards,
Ype
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
> On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
> > I still have 3 small Questions.
> >
> > 1)While creating the Range Query Is it possible for Lucene to do
> > somthing
> > similar..
> >
> > +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
> >
> > [Do you think this will work] It's not on returning hits , but
> > it does
> > return hits with either one of them "Shirt" or "button" Only.
>
> My guess is you have documents none of your documents in that range
> have button AND shirt in them.
You can also try this:
+button +shirt +filename:[b10181_p100 TO b10181_p200]
I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.
Regards,
Ype
Re: Range Query Sombody HELP please
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
>
> Hey Ype/Erick
If you're gonna ask for help, the least ya could do is spell my name
correctly :)
> I still have 3 small Questions.
>
> 1)While creating the Range Query Is it possible for Lucene to do
> somthing
> similar..
>
> +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
>
> [Do you think this will work] It's not on returning hits , but
> it does
> return hits with either one of them "Shirt" or "button" Only.
My guess is that none of your documents in that range have both
button AND shirt in them.
> 2)When the indexer start indexing does it do according to alphabetic
> order
> or is it some other way...
I don't understand the question, sorry. Terms in the index are ordered
lexicographically, if that is what you mean.
> 3)The Field Type "Keyword" is not accepting name of Files as it
> indexes
> [ Try indexing filenames and then do a search on them ,the hits will
> return u 0 defnitly, lucene1.3-final version ]
>
> doc.add(Field.Text("filename",file.getName()))
> < -------------------- Will return Hits
>
> doc.add(Field.Keyword("filename",file.getName()))
> <-------------------- Will Not return Hits
>
>
> why???
Because of your analyzer. Try indexing as a Keyword and search using a
TermQuery. Don't use QueryParser at first - it gets in the way of
understanding what is really going on. For fun, look at the .toString
of the Query generated by QueryParser if you like. Look at the
AnalysisParalysis page on the wiki for more details. Read my java.net
articles to get a better understanding. The short answer is that it
is analysis that is bogging you down here.
You need to decide how to index file names based on how you plan to
query them. We cannot answer this for you.
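The analysis mismatch can be sketched without Lucene. In this illustration the indexed terms are a plain set of strings and analyze() is a hypothetical stand-in for an analyzer that only lowercases; the actual custom analyzer in this thread is unknown and may also split at the underscore.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustration only: a verbatim (keyword-style) term versus analyzed query text. */
class KeywordLookupSketch {
    /** Hypothetical analyzer stand-in: lowercases the text and does nothing else. */
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String fileName = "B10181_P388";

        // Text-style field: the value is analyzed before it is indexed.
        Set<String> textTerms = new HashSet<>();
        textTerms.add(analyze(fileName));

        // Keyword-style field: the value is indexed exactly as given.
        Set<String> keywordTerms = new HashSet<>();
        keywordTerms.add(fileName);

        // Query text that goes through the analyzer, as a query parser would do.
        String parsed = analyze("B10181_P388");
        System.out.println(textTerms.contains(parsed));      // true
        System.out.println(keywordTerms.contains(parsed));   // false

        // Exact term lookup, the analogue of building the term query by hand.
        System.out.println(keywordTerms.contains(fileName)); // true
    }
}
```

Under these assumptions the keyword-style term is in the index all along; it is the analyzed query text that no longer equals it.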
Erik
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype/Erick
Thanks for helping me with the range queries.
Finally I was able to trace the faulty process within my code and close
it.
I still have 3 small questions.
1) While creating the range query, is it possible for Lucene to do something
like this?
+(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
[Do you think this will work?] It is not returning hits, but it does
return hits with either one of "shirt" or "button" alone.
2) When the indexer starts indexing, does it proceed in alphabetical order
or in some other way?
3) The field type "Keyword" does not work for file names when indexing
[ Try indexing file names and then searching on them; the search will
definitely return 0 hits, lucene1.3-final version ]
doc.add(Field.Text("filename",file.getName()))
< -------------------- Will return hits
doc.add(Field.Keyword("filename",file.getName()))
<-------------------- Will not return hits
Why?
with regards
Karthik
On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched Off the Multi search Senerio.
>
> 2) Changing the Field type from Text to Keyword
> will fail When I search for the the Field type "filename"
> so,I still maintained it to be Text
Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name : B10181_P388
>
>
> 3)On Search for range between 2 file names B10181_P702 to B01081_P355
> still returns me 0 hits [Included space before the 2nd '+' ]
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
Could you try this:
+button +filename:[b10181_p355 TO b10181_p702]
?
If this does not work, please narrow your problem down to a java test
program
of 10-20 lines, and post the code.
Regards,
Ype
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
Karthik,
On Monday 31 May 2004 13:47, Karthik N S wrote:
> Hey Ype...
>
> 1) I switched Off the Multi search Senerio.
>
> 2) Changing the Field type from Text to Keyword
> will fail When I search for the the Field type "filename"
> so,I still maintained it to be Text
Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : b10181_p388
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
>
> Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
> Field :'filename'
> File Name : B10181_P388
>
>
> 3)On Search for range between 2 file names B10181_P702 to B01081_P355
> still returns me 0 hits [Included space before the 2nd '+' ]
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
Could you try this:
+button +filename:[b10181_p355 TO b10181_p702]
?
If this does not work, please narrow your problem down to a java test program
of 10-20 lines, and post the code.
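Why the order of the two bounds matters can be shown with plain string comparison. This is a sketch that models an inclusive range over lexicographically ordered terms; it does not call Lucene, and the class and method names are hypothetical.

```java
/** Illustration only: range bounds compared as strings, lower bound first. */
class RangeOrderSketch {
    /** Inclusive range check using lexicographic string order. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String term = "b10181_p388";
        // Bounds as in the failing query: the lower bound sorts after the upper one.
        System.out.println(inRange(term, "b10181_p702", "b10181_p355")); // false
        // Bounds swapped: the term now falls inside the range.
        System.out.println(inRange(term, "b10181_p355", "b10181_p702")); // true
        // String order is not numeric order: "p99" sorts after "p200".
        System.out.println(inRange("b10181_p99", "b10181_p100", "b10181_p200")); // false
    }
}
```

The last line is a reminder that page numbers need a fixed width (for example p099) if a string range is meant to behave like a numeric one.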
Regards,
Ype
RE: Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype...
1) I switched off the multi-search scenario.
2) Changing the field type from Text to Keyword
makes the search on the "filename" field fail,
so I have kept it as Text.
D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : b10181_p388
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_
Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
Field :'filename'
File Name : B10181_P388
3) A range search between the 2 file names B10181_P702 and B10181_P355
still returns 0 hits [I included the space before the 2nd '+']
D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+button +filename:[b10181_p702 TO b10181_p355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
or
D:\JAVA\lucene\src\demo>java com.controlnet.indexing.search.SearchFiles
Search Keyword : +contents:button +filename:[b10181_p702 TO b10181_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+contents:button +filename:[b10181_p702 TO b10181_p355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Also, does the search vary with the Field type? If so, my indexed Field
types are as below:
doc.add(Field.Text("path", fhtml.getPath()));
doc.add(Field.Keyword("modified",fhtml.lastModified()+""));
doc.add(Field.Text("filename",fhtml.getName()));
doc.add(Field.Keyword("creation",CREATION_));
doc.add(Field.Keyword("bookid",BOOKID_));
doc.add(Field.Text("chapNme",CHAPNAME_));
doc.add(Field.Text("itmName",ITEMNAME_));
please do advise me.
Karthik
[ James Gosling says Microsoft has more money to burn than God has
...on his visit to India, in an interview with MSNBC TV last night ]
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 2:52 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
On Monday 31 May 2004 11:09, Karthik N S wrote:
...
> I re indexed my folder 10181 [Seem's to be corrupted]
Was the index writer closed?
> Now I am getting the hits as....
>
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
The query needs to have space before the 2nd + :
+button +filename:[B10181_P702 TO B01081_P355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
> Not a Found document(s) that matched query Field 'bookid':
> Not a Found document(s) that matched query Field 'creation':
> Not a Found document(s) that matched query Field 'contents':
> Not a Found document(s) that matched query Field 'chapNme':
> Not a Found document(s) that matched query Field 'itmName':
You seem to use a search mechanism that searches all these fields.
I'd recommend to switch this off until a query with explicit fields works,
eg.:
+contents:button +filename:[B10181_P702 TO B01081_P355]
Btw. You'll need to make sure that a term like B10181_P702 is
not split at the underscore _ by a tokenizer at indexing time.
If your filename is not a keyword field, you might consider
changing it into a keyword field.
You seem to index book pages as Lucene documents, which is ok.
However, you may also need to index larger parts of the books in
order to retrieve books with multiple subjects on different pages.
Is this what your original question is about?
Have fun,
Ype
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Monday 31 May 2004 11:09, Karthik N S wrote:
...
> I re-indexed my folder 10181 [it seems to have been corrupted]
Was the index writer closed?
> Now I am getting the hits as....
>
>
> D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
> Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
The query needs to have space before the 2nd + :
+button +filename:[B10181_P702 TO B01081_P355]
> Source path [ E:/po/aaaa ] : e:/indexer3/b10181
> Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
> e:/indexer3/b10181/b10181_indx_
> Not a Found document(s) that matched query Field 'filename':
> Not a Found document(s) that matched query Field 'bookid':
> Not a Found document(s) that matched query Field 'creation':
> Not a Found document(s) that matched query Field 'contents':
> Not a Found document(s) that matched query Field 'chapNme':
> Not a Found document(s) that matched query Field 'itmName':
You seem to use a search mechanism that searches all these fields.
I'd recommend to switch this off until a query with explicit fields works,
eg.:
+contents:button +filename:[B10181_P702 TO B01081_P355]
Btw. You'll need to make sure that a term like B10181_P702 is
not split at the underscore _ by a tokenizer at indexing time.
If your filename is not a keyword field, you might consider
changing it into a keyword field.
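
To illustrate the point about the underscore, here is a plain-Java sketch of the two behaviours. It is only a stand-in: the tokenize method below is hypothetical and does not reproduce any particular Lucene analyzer, and keyword simply models "the whole value becomes one term, unchanged".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of tokenized versus keyword-style indexing; not Lucene code.
class UnderscoreSplitSketch {
    // Assumed tokenizer: lowercases, then splits on anything that is not a letter or digit.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Keyword-style handling: the whole value is a single term, case preserved.
    static List<String> keyword(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("B10181_P702")); // [b10181, p702]
        System.out.println(keyword("B10181_P702"));  // [B10181_P702]
    }
}
```

If the indexed terms look like the first line, no single term B10181_P702 exists for a range to select. If they look like the second line, the term is case sensitive, so a lowercase query term such as b10181_p702 would not equal it. Checking the real index with Luke shows which case applies.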
You seem to index book pages as Lucene documents, which is ok.
However, you may also need to index larger parts of the books in
order to retrieve books with multiple subjects on different pages.
Is this what your original question is about?
Have fun,
Ype
---------------------------------------------------------------------
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype
Sorry, once again apologies for my last mail.
I re-indexed my folder 10181 [it seems to have been corrupted].
Now I am getting the hits as follows:
D:\JAVA\lucene\src\demo>java org.lucene.src.indexer.search.SearchFiles
Search Keyword : +button+filename:[B10181_P702 TO B01081_P355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['+button+filename:[B10181_P702 TO B01081_P355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'contents':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':
204 Total milliseconds
D:\JAVA\lucene\src\demo>java java org.lucene.src.indexer.search.SearchFiles
Search Keyword : button+filename:[B10181_P702 TO B01081_P355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
Query: ['button+filename:[B10181_P702 TO B01081_P355]'] in Folder
e:/indexer3/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'contents':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':
Is this correct,
or is something still wrong with the query string being parsed?
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 1:47 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
Karthik,
On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype
...
>
> My Question now is, If I want to Use Range Query to get search hits
> between
> fileName "B10181_P702" and "B10181_P355" only Instead of all the 67
hits
> ,
>
In this case there is no need to override range query, just use
+fileName:[B10181_P702 TO B10181_P355]
as part of the query.
Kind regards,
Ype
---------------------------------------------------------------------
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype
Apologies again
I did as your mail suggested, but see the error:
Search Keyword : +king+filename:[b10181_p702 TO b01081_p355]
Source path [ E:/po/aaaa ] : e:/indexer3/b10181
The Exception Raised file = SearchFiles.searchIndx0
java.lang.NegativeArraySizeException
at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:148)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:80)
at com.controlnet.indexing.search.SearchFiles.searchIndex0(SearchFiles.java:68)
at com.controlnet.indexing.search.SearchFiles.main(SearchFiles.java:240)
[Note that the field name is filename, in lower case, not fileName. Sorry about that.]
Am I doing something wrong here?
With regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Monday, May 31, 2004 1:47 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
Karthik,
On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype
...
>
> My Question now is, If I want to Use Range Query to get search hits
> between
> fileName "B10181_P702" and "B10181_P355" only Instead of all the 67
hits
> ,
>
In this case there is no need to override range query, just use
+fileName:[B10181_P702 TO B10181_P355]
as part of the query.
Kind regards,
Ype
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
Karthik,
On Monday 31 May 2004 06:12, Karthik N S wrote:
> Hey Ype
...
>
> My Question now is, If I want to Use Range Query to get search hits
> between
> fileName "B10181_P702" and "B10181_P355" only Instead of all the 67 hits
> ,
>
In this case there is no need to override range query, just use
+fileName:[B10181_P702 TO B10181_P355]
as part of the query.
Kind regards,
Ype
---------------------------------------------------------------------
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype
Apologies please
Have a look at the Search Factor hits in the O/p sample of my indexed file
================== Start Searching ==========================
Search Keyword : king~
Source path [ E:/po/aaaa ] : e:/indexer/b10181
Query: ['king~'] in Folder e:/indexer/b10181/b10181_indx_
Not a Found document(s) that matched query Field 'filename':
Not a Found document(s) that matched query Field 'bookid':
Not a Found document(s) that matched query Field 'creation':
Not a Found document(s) that matched query Field 'chapNme':
Not a Found document(s) that matched query Field 'itmName':
Found document(s) that matched : 'king~' no of hits :'67' in query Field
:'contents'
File Name : B10181_P703
File Path : E:\po\catalog\B10181\B10181_P703
Modified Date : 1080036442000
Bookid : B10181
Chapter Name :
Item Name :
File Name : B10181_P702
File Path : E:\po\catalog\B10181\B10181_P702
Modified Date : 1080036442000
Bookid : B10181
Chapter Name :
Item Name :
File Name : B10181_P512
File Path : E:\po\catalog\B10181\B10181_P512
Modified Date : 1080036438000
Bookid : B10181
Chapter Name :
Item Name :
File Name : B10181_P40
File Path : E:\po\catalog\B10181\B10181_P40
Modified Date : 1080036444000
Bookid : B10181
Chapter Name :
Item Name :
File Name : B10181_P355
File Path : E:\po\catalog\B10181\B10181_P355
Modified Date : 1080036436000
Bookid : B10181
Chapter Name :
Item Name :
File Name : B10181_P379
File Path : E:\po\catalog\B10181\B10181_P379
Modified Date : 1080036436000
Bookid : B10181
Chapter Name :
Item Name :
. . . . . .
328 Total milliseconds
================== End Searching ============
The output shows 67 hits in total [I have snipped most of them for
readability]; the search word is present in the field "contents",
where the content part of the HTML file is indexed.
The field "File Name" is unique and is indexed and displayed as
shown in Windows Explorer.
My question now is: if I want to use a range query to get only the search hits
between
fileName "B10181_P702" and "B10181_P355", instead of all 67 hits,
how do I do it? [Please give a clear example or send me an attachment
for the same.
I overrode the getRange() query method as per your last mail, but I am still
not able to achieve the results.]
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Saturday, May 29, 2004 12:10 AM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
On Friday 28 May 2004 10:54, Karthik N S wrote:
> Hey ype
>
> Thx for the advice but still I need to get the exact situation working ,
>
> 1) I have a unique Field [ called filename ] which is indexed of type
Text.
> It accepts the name of the HTML files as the indexing parameter ,
> Also there is another Field called "Contents" which stores all the
> contents of that
> indicated unique named html file.
>
> 2) The indexer complete indexes for about 5000 html files sucessfully .
>
> 3) When I do a search for word ,it returns a hit of 400 on various html
> files
>
> Now in this situation if I want to limit the hits between First 200 to
> 400 html Page Names only
> what exactly should I do to using getRange() method.
A range query will provide a range of indexed values, and
I thought you needed to add the record number as an indexed field
in each record.
However, you seem to use the 200 and 400 here as the order number
for each record in the result of the query on the Contents field.
Is that correct?
If so, in which order do you expect the results of your query?
Kind regards,
Ype
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Friday 28 May 2004 10:54, Karthik N S wrote:
> Hey ype
>
> Thanks for the advice, but I still need to get the exact situation working.
>
> 1) I have a unique field [ called filename ] which is indexed as type Text.
> It takes the name of the HTML file as the indexing parameter.
> There is also another field called "Contents" which stores all the
> contents of that
> uniquely named HTML file.
>
> 2) The indexer completes indexing of about 5000 HTML files successfully.
>
> 3) When I search for a word, it returns 400 hits on various HTML
> files.
>
> Now, in this situation, if I want to limit the hits to the HTML page names
> between the first 200 and 400 only,
> what exactly should I do using the getRange() method?
A range query will provide a range of indexed values, and
I thought you needed to add the record number as an indexed field
in each record.
However, you seem to use the 200 and 400 here as the order number
for each record in the result of the query on the Contents field.
Is that correct?
If so, in which order do you expect the results of your query?
Kind regards,
Ype
---------------------------------------------------------------------
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Erik
Apologies again
[ You probably do not want to use Field.Text for a filename. Use
Field.Keyword instead. ]
1) When I change the Field type from Text to Keyword, I do not get any hits
at all
[since the parameters available to this Field are mostly of String type,
... file[i].getName() ]
2) After successful indexing, the search returns 400 hits on
various HTML files
where the search word is present in the contents field.
3) I need to limit the hits to those between the file names file[100].getName() and
file[200].getName()
on the field "filename" for the searched word.
I did it the way Ype advised in his last mail, but there is still no improvement in the
situation.
I need the hits that fall between the 2 files [ 100 files in between ]
and not the maximum number of hits.
Please advise me
how to proceed.
4)
I installed Luke [ via Java Web Start ] from
http://www.getopt.org/luke/webstart.html
but my index files are built with a custom-made Analyzer [ not one of the
standard analyzers available from the drop-down box ].
Will it still be able to search the index?
with regards
Karthik
-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, May 28, 2004 3:38 PM
To: Lucene Users List
Subject: Re: Range Query Sombody HELP please
On May 28, 2004, at 4:54 AM, Karthik N S wrote:
> 1) I have a unique Field [ called filename ] which is indexed of type
> Text.
You probably do not want to use Field.Text for a filename. Use
Field.Keyword instead.
> 2) The indexer complete indexes for about 5000 html files sucessfully
> .
Now use Luke (Google for _luke lucene_) to browse your index, and check
that you are getting what you think. You can do ad-hoc queries there
also.
> Now in this situation if I want to limit the hits between First 200
> to
> 400 html Page Names only
> what exactly should I do to using getRange() method.
If you want the first 200 - 400, start your Hits walking at index 200,
and proceed through 400.
Is there some field you want to key off to do the range? Or do you
just want the 200th - 400th hits from the search, which is an entirely
different question than about ranges.
> Please advise on how to proceed ...
Please send (succinct) code examples in the future to really keep this
discussion concrete and clear.
Erik
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 28, 2004, at 4:54 AM, Karthik N S wrote:
> 1) I have a unique Field [ called filename ] which is indexed of type
> Text.
You probably do not want to use Field.Text for a filename. Use
Field.Keyword instead.
> 2) The indexer complete indexes for about 5000 html files sucessfully
> .
Now use Luke (Google for _luke lucene_) to browse your index, and check
that you are getting what you think. You can do ad-hoc queries there
also.
> Now in this situation if I want to limit the hits between First 200
> to
> 400 html Page Names only
> what exactly should I do to using getRange() method.
If you want the first 200 - 400, start your Hits walking at index 200,
and proceed through 400.
Is there some field you want to key off to do the range? Or do you
just want the 200th - 400th hits from the search, which is an entirely
different question than about ranges.
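
Walking the hits from position 200 up to 400, as suggested above, can be sketched in plain Java. A List of Strings stands in for the search results here; the class, the fakeHits helper and the document names are all made up for illustration, and the end position is treated as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: taking a window of results by position; not Lucene code.
class HitWindowSketch {
    // Returns the hits at positions from (inclusive) to to (exclusive),
    // clamped to the number of hits actually available.
    static List<String> window(List<String> hits, int from, int to) {
        int start = Math.max(0, Math.min(from, hits.size()));
        int end = Math.max(start, Math.min(to, hits.size()));
        return new ArrayList<>(hits.subList(start, end));
    }

    // Builds n placeholder hits named doc0, doc1, ... for the demonstration.
    static List<String> fakeHits(int n) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            hits.add("doc" + i);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> page = window(fakeHits(450), 200, 400);
        System.out.println(page.size() + " hits, first is " + page.get(0));
    }
}
```

This selects by position in the result list, which depends on the order in which the search returns its hits. It is unrelated to the value of any field, so it answers the second of the two questions above, not the range question.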
> Please advise on how to proceed ...
Please send (succinct) code examples in the future to really keep this
discussion concrete and clear.
Erik
---------------------------------------------------------------------
Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey ype
Thanks for the advice, but I still need to get the exact situation working.
1) I have a unique field [ called filename ] which is indexed as type Text.
It takes the name of the HTML file as the indexing parameter.
There is also another field called "Contents" which stores all the
contents of that
uniquely named HTML file.
2) The indexer completes indexing of about 5000 HTML files successfully.
3) When I search for a word, it returns 400 hits on various HTML
files.
Now, in this situation, if I want to limit the hits to the HTML page names
between the first 200 and 400 only,
what exactly should I do using the getRange() method?
Please advise on how to proceed ...
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Friday, May 28, 2004 1:14 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
Karthik,
On Friday 28 May 2004 05:54, Karthik N S wrote:
...
> Weh we do a search in SQL using '*' we all know that the result would be
> total no of records in the table,but when we want to get limit our record
> we apply range between 2 specific row records [Which we call it as
> subsearch]
>
>
> Similarly on a indexed record I would like perform the same tecnique
> as above.
In case you need to reuse the limitation a filter is the way to go in
Lucene.
However it seems to be better to get the range query working first.
> In fact I was looking at the url u sent me in the last mail on using
> getRange Queries
> and was working on the same
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
The query I gave uses two +'s prefixed to the query parts:
+search_word +(book:[100 TO 200])
Both query parts are required because of the +'s, ie. it works
as the AND operator in SQL. The TO operator queries the range
in the book field.
> and
>
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
>
> but witou results for the last 12 hrs.
You have probably seen a lot of different things that will be useful later.
> If u could spare a few minuts and please expalin or provide a simple [
> full ] example using and
> over riding the getRange() method .
The problem you'll probably run into is that Lucene does not
support numbers directly; you'll have to index them as strings,
e.g. by prefixing zeros,
as Erik
indicated: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields
You may have to reindex your data for this. In case you have a lot of data
consider setting up a test first.
Then in the getRangeQuery() method of your parser you'll need to prefix the
queried
numbers in the same way. The example in the article is about date fields,
but the adaptation to numbers shouldn't be a problem.
When you override this in your query parser:
getRangeQuery(String field, Analyzer analyzer, String start, String end,
boolean inclusive)
it will be called for the example query with start = "100" and end = "200".
(See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
under Customizing query parser).
In the overriding method you can then call the super method with the
start and end prefixed with zeros, as indicated in searching numerical
fields
referred to above.
Have fun, you'll get it working,
Ype
> with regards
> Karthik
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
> > Hi
> > Lucene -Developer My main intention was
> >
> > Search for an word hit in a Unique Field between ranges say
> > book100 - book 200 indexed numbers
> > It's something like creating a SUBSEARCH with in the SEARCHINDEX.
...
> Could you explain what you mean by subsearch?
> I suppose you might want to have a look at the various filter classes
> in the org.apache.lucene.search package.
>
> Regards,
> Ype
>
>
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
Karthik,
On Friday 28 May 2004 05:54, Karthik N S wrote:
...
> When we do a search in SQL using '*', we all know that the result is the
> total number of records in the table, but when we want to limit the records
> we apply a range between 2 specific rows [which we call a
> subsearch].
>
>
> Similarly, on an indexed record I would like to perform the same technique
> as above.
In case you need to reuse the limitation a filter is the way to go in Lucene.
However it seems to be better to get the range query working first.
> In fact I was looking at the URLs you sent me in the last mail on using
> getRange queries
> and was working on the same
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
The query I gave uses two +'s prefixed to the query parts:
+search_word +(book:[100 TO 200])
Both query parts are required because of the +'s, ie. it works
as the AND operator in SQL. The TO operator queries the range
in the book field.
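
The shape of that query string can be sketched with a small plain-Java helper. The helper is hypothetical and only assembles text; it does not call the Lucene query parser. It puts a '+' in front of each clause and a single space between clauses, which the parser needs in order to see two separate required parts.

```java
// Hypothetical helper that assembles a query string of required clauses; not Lucene code.
class RequiredClauseSketch {
    // Formats an inclusive range clause: field:[lower TO upper]
    static String range(String field, String lower, String upper) {
        return field + ":[" + lower + " TO " + upper + "]";
    }

    // Prefixes every clause with '+' and separates the clauses with one space.
    static String require(String... clauses) {
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(clause);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = require("search_word", "(" + range("book", "100", "200") + ")");
        System.out.println(query); // +search_word +(book:[100 TO 200])
    }
}
```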
> and
>
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
>
> but without results for the last 12 hrs.
You have probably seen a lot of different things that will be useful later.
> If you could spare a few minutes, please explain or provide a simple [
> full ] example of using and
> overriding the getRange() method.
The problem you'll probably run into is that Lucene does not
support numbers directly; you'll have to index them as strings,
e.g. by prefixing zeros,
as Erik indicated: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields
You may have to reindex your data for this. In case you have a lot of data
consider setting up a test first.
Then in the getRangeQuery() method of your parser you'll need to prefix the queried
numbers in the same way. The example in the article is about date fields,
but the adaptation to numbers shouldn't be a problem.
When you override this in your query parser:
getRangeQuery(String field, Analyzer analyzer, String start, String end, boolean inclusive)
it will be called for the example query with start = "100" and end = "200".
(See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
under Customizing query parser).
In the overriding method you can then call the super method with the
start and end prefixed with zeros, as indicated in searching numerical fields
referred to above.
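
The zero prefixing itself can be sketched in plain Java. The class name, the width of 6 and the paddedRange helper are assumptions for illustration only. In a real parser the same padding would be applied to start and end inside the overridden getRangeQuery() before calling the super method, and the same width must be used when indexing.

```java
// Hypothetical sketch of zero padding for string-ordered number ranges; not Lucene code.
class PaddedNumberSketch {
    // Left-pads a string of digits with zeros up to the given width.
    static String pad(String digits, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Rewrites the bounds of a range clause with padded numbers.
    static String paddedRange(String field, String start, String end, int width) {
        return field + ":[" + pad(start, width) + " TO " + pad(end, width) + "]";
    }

    public static void main(String[] args) {
        // Unpadded, "99" sorts after "100" because '9' is greater than '1'.
        System.out.println("99".compareTo("100") > 0);                 // true
        // Padded to the same width, string order agrees with numeric order.
        System.out.println(pad("99", 6).compareTo(pad("100", 6)) < 0); // true
        System.out.println(paddedRange("book", "100", "200", 6));      // book:[000100 TO 000200]
    }
}
```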
Have fun, you'll get it working,
Ype
> with regards
> Karthik
>
> -----Original Message-----
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: lucene-user@jakarta.apache.org
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
> > Hi
> > Lucene -Developer My main intention was
> >
> > Search for an word hit in a Unique Field between ranges say
> > book100 - book 200 indexed numbers
> > It's something like creating a SUBSEARCH with in the SEARCHINDEX.
...
> Could you explain what you mean by subsearch?
> I suppose you might want to have a look at the various filter classes
> in the org.apache.lucene.search package.
>
> Regards,
> Ype
>
>
---------------------------------------------------------------------
RE: Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hey Ype
Apologies for the misconduct.
When we do a search in SQL using '*', we all know that the result is the
total number of records in the table, but when we want to limit the records
we apply a range between 2 specific rows [which we call a
subsearch].
Similarly, on an indexed record I would like to perform the same technique
as above.
In fact I was looking at the URLs you sent me in the last mail on using
getRange queries
and was working on the same
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
and
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
but without results for the last 12 hrs.
If you could spare a few minutes, please explain or provide a simple [
full ] example of using and
overriding the getRange() method.
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, May 27, 2004 11:03 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
On Thursday 27 May 2004 09:37, Karthik N S wrote:
> Hi
> Lucene -Developer My main intention was
>
> Search for an word hit in a Unique Field between ranges say
> book100 - book 200 indexed numbers
> It's something like creating a SUBSEARCH with in the SEARCHINDEX.
You don't need to shout (uppercase), I've been teaching SQL.
Could you explain what you mean by subsearch?
I suppose you might want to have a look at the various filter classes
in the org.apache.lucene.search package.
Regards,
Ype
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Thursday 27 May 2004 09:37, Karthik N S wrote:
> Hi
> Lucene -Developer My main intention was
>
> Search for an word hit in a Unique Field between ranges say
> book100 - book 200 indexed numbers
> It's something like creating a SUBSEARCH with in the SEARCHINDEX.
You don't need to shout (uppercase), I've been teaching SQL.
Could you explain what you mean by subsearch?
I suppose you might want to have a look at the various filter classes
in the org.apache.lucene.search package.
Regards,
Ype
---------------------------------------------------------------------
Re: Range Query Sombody HELP please
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 27, 2004, at 3:37 AM, Karthik N S wrote:
> Hi
> Lucene developers, my main intention was to
>
> search for a word hit in a unique field within a range, say
> book100 to book200 (indexed numbers).
> It's something like creating a SUBSEARCH within the SEARCHINDEX.
>
> This is similar to the SQL
>
> select * from BOOKSHELF
> or
> select * from BOOKSHELF where book1 between 100 and 200
Karthik - unfortunately, I'm having a hard time understanding your
questions. Ype replied with a suggested solution: overriding
getRangeQuery in a custom QueryParser subclass. You need to ensure you
are indexing numbers in a padded fashion:
http://wiki.apache.org/jakarta-lucene/SearchNumericalFields
Erik
RE: Range Query Sombody HELP please
Posted by Karthik N S <ka...@controlnet.co.in>.
Hi
Lucene developers, my main intention was to
search for a word hit in a unique field within a range, say
book100 to book200 (indexed numbers).
It's something like creating a SUBSEARCH within the SEARCHINDEX.
This is similar to the SQL
select * from BOOKSHELF
or
select * from BOOKSHELF where book1 between 100 and 200
with regards
Karthik
-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Thursday, May 27, 2004 12:46 PM
To: lucene-user@jakarta.apache.org
Subject: Re: Range Query Sombody HELP please
On Thursday 27 May 2004 07:00, Karthik N S wrote:
> Hi
> Lucene developers
>
> Is it possible to search and retrieve relevant information from the
> indexed documents
> within a specific range, similar to this
>
> query in SQL: select * from BOOKSHELF where book1 between 100 and
> 200
>
> e.g.
>
> "search_word", Book between 100 AND 200
>
> [ Note: Book is a unique field holding hit info, and is already indexed ]
The query parser can construct this query for you (assuming search_word
is in the query default field):
+search_word +(book:[100 TO 200])
See also: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
One problem you might run into is that Lucene does not support numbers
directly; only strings are indexed. You can index these numbers with
enough leading zeros and add the same leading zeros in the query.
Erik Hatcher wrote an article on how to make the query:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
You'll need to override the getRangeQuery() method.
Have fun,
Ype
Re: Range Query Sombody HELP please
Posted by Ype Kingma <yk...@xs4all.nl>.
On Thursday 27 May 2004 07:00, Karthik N S wrote:
> Hi
> Lucene developers
>
> Is it possible to search and retrieve relevant information from the
> indexed documents
> within a specific range, similar to this
>
> query in SQL: select * from BOOKSHELF where book1 between 100 and
> 200
>
> e.g.
>
> "search_word", Book between 100 AND 200
>
> [ Note: Book is a unique field holding hit info, and is already indexed ]
The query parser can construct this query for you (assuming search_word
is in the query default field):
+search_word +(book:[100 TO 200])
See also: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
One problem you might run into is that Lucene does not support numbers
directly; only strings are indexed. You can index these numbers with enough
leading zeros and add the same leading zeros in the query.
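To make the padding point concrete, here is a minimal plain-Java sketch with no Lucene dependency. It assumes that indexed terms are compared the way String.compareTo compares them, which holds for ASCII digits. The class name, the WIDTH constant and the helper names are hypothetical, chosen only for illustration.

```java
import java.util.Locale;

public class PaddedNumbers {
    // Hypothetical fixed width; it must cover the largest number you index.
    public static final int WIDTH = 10;

    /** Left-pads a non-negative number with zeros to WIDTH digits. */
    public static String pad(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", n);
    }

    /** Inclusive range test using plain string ordering. */
    public static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded, "1000" wrongly falls inside ["100", "200"].
        System.out.println(inRange("1000", "100", "200"));                 // true
        // Padded, 1000 is correctly outside and 150 is correctly inside.
        System.out.println(inRange(pad(1000), pad(100), pad(200)));        // false
        System.out.println(inRange(pad(150), pad(100), pad(200)));         // true
    }
}
```

The same width has to be used when indexing and when querying, otherwise the string comparison goes wrong again.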
Erik Hatcher wrote an article on how to make the query:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
You'll need to override the getRangeQuery() method.
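As a rough sketch of what such an override would presumably do to each bound before delegating to the parent parser, the following plain Java rewrites the two ends of a range to the padded form. It deliberately avoids any Lucene classes, since the exact getRangeQuery() signature depends on the Lucene version. RangeBounds, normalize, rangeClause and WIDTH are hypothetical names, and the width is an assumption that has to match whatever was used at indexing time.

```java
import java.util.Locale;

public class RangeBounds {
    // Hypothetical width; must match the padding applied when indexing.
    public static final int WIDTH = 10;

    /** Pads a purely numeric bound; returns any other bound trimmed but unchanged. */
    public static String normalize(String bound) {
        String trimmed = bound.trim();
        if (!trimmed.matches("\\d{1," + WIDTH + "}")) {
            return trimmed;
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", Long.parseLong(trimmed));
    }

    /** Builds an inclusive range clause in query parser syntax. */
    public static String rangeClause(String field, String lower, String upper) {
        return field + ":[" + normalize(lower) + " TO " + normalize(upper) + "]";
    }

    public static void main(String[] args) {
        // prints: +search_word +(book:[0000000100 TO 0000000200])
        System.out.println("+search_word +(" + rangeClause("book", "100", "200") + ")");
    }
}
```

In a real QueryParser subclass, the overridden method would apply something like normalize() to both bounds and then call the superclass implementation, so users can keep typing book:[100 TO 200].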
Have fun,
Ype
Re: Range Query Sombody HELP please
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Karthik, namaste!
I seem to be getting multiple copies of your email.
I received 4 copies of this email.
Could you please limit things to 1 message per subject?
I get hundreds of messages every day as is. :(
Thank you,
Otis
--- Karthik N S <ka...@controlnet.co.in> wrote:
>
> Hi
> Lucene developers
>
> Is it possible to search and retrieve relevant information from the
> indexed documents
> within a specific range, similar to this
>
> query in SQL: select * from BOOKSHELF where book1 between 100
> and 200
>
> e.g.
>
> "search_word", Book between 100 AND 200
>
> [ Note: Book is a unique field holding hit info, and is already indexed ]
>
>
> Somebody please help me :(
>
>
> with regards
> Karthik
>
>