You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Chris Sibert <ch...@attbi.com> on 2002/06/25 09:22:14 UTC

Keywords

How do I do a search on a keyword field ? I am using the Query object with a SimpleAnalyzer, and it's parsing my keyword into terms, which I don't want.

RE: Problems understanding RangeQuery...

Posted by Karl Øie <ka...@gan.no>.
thank you, that works! :-) and saves my day!

mvh karl øie



-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: 10. august 2002 18:29
To: Lucene Users List; karl@gan.no
Subject: Re: Problems understanding RangeQuery...


Hi Karl,

I have discovered that with range queries you *must* ensure there is a space
on either side of the dash.

That is, [1971 - 1979] rather than [1971-1979].  If you don't, Lucene will
interpret it as [1979 - null].

To illustrate a bit more, here are some result totals that I get on my
index:
pub_mo:[07 - 08]  --> 8370 (note the spaces around the dash
pub_mo:[07-08]    --> 2133 (note the absence of spaces)
pub_mo:[08 - null] --> 2133
pub_mo:(07 08)    --> 8370 (note the use of parentheses, not brackets)

Just put the spaces in and all should be OK.

Regards,

Terry




----- Original Message -----
From: "Karl Øie" <ka...@gan.no>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, August 10, 2002 11:47 AM
Subject: Problems understanding RangeQuery...


> Hi, i have a problem with understanding RangeQueries in Lucene-1.2:
>
> I have created an index with posts that has the field W_PUBLISHING_YEAR
> which contains the YYYY year of publishing. After indexing i loop through
> the terms and finds that i have the following terms present in the index:
>
>
1923,1925,1926,1930,1933,1935,1936,1938,1942,1943,1945,1946,1947,1948,1949,1
>
950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,19
>
65,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,198
>
0,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995
> ,1996,1997,1998,1999,2000,2001,2002,2003,2004,2010,2018,2097 in 232290
> documents.
>
> Then i run these queries on the index "W_PUBLISHING_YEAR:[1971-1979]" and
> "W_PUBLISHING_YEAR:[2000-2002]" and both queries gives me some strange
> results:
>
>
> W_PUBLISHING_YEAR:[1971-1979]
>
> found={1975, 1974, 1973, 1972, 1999, 1998, 1997, 1996, 1995, 1994, 1993,
> 2018, 1992, 1991, 1990, 2010, 1989, 1988, 1987, 1986, 1985, 1984, 1983,
> 1982, 1981, 1980, 2004, 2003, 2002, 2001, 2097, 2000, 1979, 1978, 1977,
> 1976} in 150793 matching documents.
>
>
> W_PUBLISHING_YEAR:[2000-2002]
>
> found={2002, 2001, 2097, 2010, 2018, 2004, 2003} in 10756 matching
> documents.
>
>
>
> Is there something i do wrong here? How is the RangeQuery supposed to
work?
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Problems understanding RangeQuery...

Posted by Terry Steichen <te...@net-frame.com>.
Hi Karl,

I have discovered that with range queries you *must* ensure there is a space
on either side of the dash.

That is, [1971 - 1979] rather than [1971-1979].  If you don't, Lucene will
interpret it as [1979 - null].

To illustrate a bit more, here are some result totals that I get on my
index:
pub_mo:[07 - 08]  --> 8370 (note the spaces around the dash
pub_mo:[07-08]    --> 2133 (note the absence of spaces)
pub_mo:[08 - null] --> 2133
pub_mo:(07 08)    --> 8370 (note the use of parentheses, not brackets)

Just put the spaces in and all should be OK.

Regards,

Terry




----- Original Message -----
From: "Karl Øie" <ka...@gan.no>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, August 10, 2002 11:47 AM
Subject: Problems understanding RangeQuery...


> Hi, i have a problem with understanding RangeQueries in Lucene-1.2:
>
> I have created an index with posts that has the field W_PUBLISHING_YEAR
> which contains the YYYY year of publishing. After indexing i loop through
> the terms and finds that i have the following terms present in the index:
>
>
1923,1925,1926,1930,1933,1935,1936,1938,1942,1943,1945,1946,1947,1948,1949,1
>
950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,19
>
65,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,198
>
0,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995
> ,1996,1997,1998,1999,2000,2001,2002,2003,2004,2010,2018,2097 in 232290
> documents.
>
> Then i run these queries on the index "W_PUBLISHING_YEAR:[1971-1979]" and
> "W_PUBLISHING_YEAR:[2000-2002]" and both queries gives me some strange
> results:
>
>
> W_PUBLISHING_YEAR:[1971-1979]
>
> found={1975, 1974, 1973, 1972, 1999, 1998, 1997, 1996, 1995, 1994, 1993,
> 2018, 1992, 1991, 1990, 2010, 1989, 1988, 1987, 1986, 1985, 1984, 1983,
> 1982, 1981, 1980, 2004, 2003, 2002, 2001, 2097, 2000, 1979, 1978, 1977,
> 1976} in 150793 matching documents.
>
>
> W_PUBLISHING_YEAR:[2000-2002]
>
> found={2002, 2001, 2097, 2010, 2018, 2004, 2003} in 10756 matching
> documents.
>
>
>
> Is there something i do wrong here? How is the RangeQuery supposed to
work?
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Problems understanding RangeQuery...

Posted by Karl Øie <ka...@gan.no>.
Hi, i have a problem with understanding RangeQueries in Lucene-1.2:

I have created an index with posts that has the field W_PUBLISHING_YEAR
which contains the YYYY year of publishing. After indexing i loop through
the terms and finds that i have the following terms present in the index:

1923,1925,1926,1930,1933,1935,1936,1938,1942,1943,1945,1946,1947,1948,1949,1
950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,19
65,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,198
0,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995
,1996,1997,1998,1999,2000,2001,2002,2003,2004,2010,2018,2097 in 232290
documents.

Then i run these queries on the index "W_PUBLISHING_YEAR:[1971-1979]" and
"W_PUBLISHING_YEAR:[2000-2002]" and both queries gives me some strange
results:


W_PUBLISHING_YEAR:[1971-1979]

found={1975, 1974, 1973, 1972, 1999, 1998, 1997, 1996, 1995, 1994, 1993,
2018, 1992, 1991, 1990, 2010, 1989, 1988, 1987, 1986, 1985, 1984, 1983,
1982, 1981, 1980, 2004, 2003, 2002, 2001, 2097, 2000, 1979, 1978, 1977,
1976} in 150793 matching documents.


W_PUBLISHING_YEAR:[2000-2002]

found={2002, 2001, 2097, 2010, 2018, 2004, 2003} in 10756 matching
documents.



Is there something i do wrong here? How is the RangeQuery supposed to work?


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Keywords

Posted by Chris Sibert <ch...@attbi.com>.
My initial post may have been a little confusing. Here's my situation. I
have a keyword field stored in a Lucene index. It is not tokenized, as it's
a keyword field. But, when I go to run a search against this keyword field,
it seems to be tokenizing the search string.

            String searchString = "T0001-001 M"

            myQuery   = QueryParser.parse ( searchString,
"FieldDocumentNumber", simpleAnalyzer )    ;
            hits            = indexSearcher.search ( myQuery )     ;


How do I do a search on a keyword field ? Is there another search method to
use for this purpose, other than the one that I am using here ?


----- Original Message -----
From: "Chris Sibert" <ch...@attbi.com>
To: <lu...@jakarta.apache.org>
Sent: Tuesday, June 25, 2002 3:22 AM
Subject: Keywords


How do I do a search on a keyword field ? I am using the Query object with a
SimpleAnalyzer, and it's parsing my keyword into terms, which I don't want.



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>