Posted to java-user@lucene.apache.org by Rathinapriya Nagalingam <rn...@in.ibm.com> on 2009/10/02 17:51:42 UTC
lucene 2.4.1 : document in index but not returned in search
Hi,
I created an index of around 45000 documents and search on the Title and
Abstract fields (using Lucene 2.4.1).
When I look at the index in Luke (lukeall), some titles are present in the
index, but I don't get them back when I search using the title as the keyword.
I have copied code snippets below.
We recently upgraded from Lucene 2.0 to 2.4.1, and I am fairly new to
Lucene. Please let me know what the possible issue could be.
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
        pCreate, IndexWriter.MaxFieldLength.UNLIMITED);
........................
SimpleAnalyzer cjkAnalyzer = new SimpleAnalyzer();
doc.add(new Field(LuceneDocument.TITLE_FIELD, pTitle, Field.Store.YES,
        Field.Index.ANALYZED_NO_NORMS));
doc.add(new Field(LuceneDocument.ABSTRACT_FIELD, pDescription,
        Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
...............
if (localeStr.equals("zh_CN") || localeStr.equals("ko_KR")) {
    writer.addDocument(doc, cjkAnalyzer);
} else {
    writer.addDocument(doc);
}
While searching the index, I use the analyser as below.
// construct the proper analyzer based on locale
if (pLocale.equals("zh_CN") || pLocale.equals("ko_KR")) { /* NOI18N */
    analyzer = new SimpleAnalyzer();
} else {
    analyzer = new StandardAnalyzer();
}
// analyze the keywords
ts = analyzer.tokenStream("abstract", //$NON-NLS-1$
        new StringReader(sb.toString()));
tokens = new ArrayList();
try {
    while (true) {
        Token token = ts.next();
        if (token == null) {
            break;
        }
        tokens.add(token.termText());
    }
} catch (IOException ex) {
    Logger.logException(Logger.TYPE_ERR, this,
            "analyzeKeywords", ex); //$NON-NLS-1$
}
// Loop through the keywords
for (int i = 0; i < tokens.size(); i++) {
    // tokens is a raw ArrayList, so use size()/get(i) and cast to String
    String keyword = (String) tokens.get(i);
    // Each keyword must be queried against the title and abstract.
    BooleanQuery bQuery = new BooleanQuery();
    TermQuery titleTerm = new TermQuery(
            new Term("title", keyword)); //$NON-NLS-1$
    TermQuery abstractTerm = new TermQuery(
            new Term("abstract", keyword)); //$NON-NLS-1$
    if (keywordStatusListSize > i
            && ((Boolean) keywordStatusList.get(i)).booleanValue()) {
        bQuery.add(titleTerm, BooleanClause.Occur.MUST);
        bQuery.add(abstractTerm, BooleanClause.Occur.MUST);
    } else {
        bQuery.add(titleTerm, BooleanClause.Occur.SHOULD);
        bQuery.add(abstractTerm, BooleanClause.Occur.SHOULD);
    }
    if (flag || ((Boolean) keywordStatusList.get(i)).booleanValue()) {
        keyQuery.add(bQuery, BooleanClause.Occur.MUST);
    } else {
        keyQuery.add(bQuery, BooleanClause.Occur.SHOULD);
    }
}
................................
CachingWrapperFilter cf;
Searcher searcher;
HitCollector collector;
................. some assignments...............
searcher.search(keyQuery, cf, collector);
Thanks & Regards,
Priya
PT-7A-012
Residency Road
Bangalore
India
Mob: 99011 22033
Re: lucene 2.4.1 : document in index but not returned in search
Posted by Felipe Lobo <fe...@goshme.com>.
You are right! The docs didn't come back because their score was 0.
I had to review my boosting rules to fix it.
On Fri, Oct 2, 2009 at 4:15 PM, Anshum <an...@gmail.com> wrote:
> Hi Priya,
> You are using different analyzers for searching and indexing in case of
> CH/KR locales. Could you give a snippet of the doc you are trying to
> index and search for? Also, print keyQuery to get an idea of what exactly
> is being searched after (query) analysis. That might give a better picture.
> @Felipe: As far as I know, a hit should have a non-zero score anyway.
> Correct me if I'm wrong :)
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
--
Felipe Lobo
www.jusbrasil.com.br
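
Felipe's report above is that documents with a score of 0 never reached his results. Below is a minimal sketch of that effect in plain Java, with no Lucene classes. Both names are made up for illustration: `score` is a toy formula (not Lucene's scoring), and `collectPositive` is a hypothetical collector that keeps only strictly positive scores, which is an assumption about the behaviour he ran into rather than a copy of any Lucene collector.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroScoreSketch {

    // Toy scoring for illustration only: raw match weight times boost.
    // This is NOT Lucene's scoring formula.
    static float score(float rawWeight, float boost) {
        return rawWeight * boost;
    }

    // Hypothetical collector: keeps the ids of documents whose score is
    // strictly positive and silently drops everything else.
    static List<Integer> collectPositive(float[] scores) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < scores.length; docId++) {
            if (scores[docId] > 0.0f) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] scores = {
            score(1.5f, 2.0f), // doc 0: normal boost
            score(1.5f, 0.0f), // doc 1: matches, but a boost of 0 zeroes the score
            score(0.7f, 1.0f)  // doc 2: normal boost
        };
        // Doc 1 matched the query terms yet is missing from the collected hits.
        System.out.println(collectPositive(scores));
    }
}
```

If something like this is happening, logging the score of a document that is known to match, before any filtering, should show it.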
Re: lucene 2.4.1 : document in index but not returned in search
Posted by Anshum <an...@gmail.com>.
Hi Priya,
You are using different analyzers for searching and indexing in case of
CH/KR locales. Could you give a snippet of the doc you are trying to index
and search for? Also, print keyQuery to get an idea of what exactly
is being searched after (query) analysis. That might give a better picture.
@Felipe: As far as I know, a hit should have a non-zero score anyway.
Correct me if I'm wrong :)
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............
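
One thing a printed keyQuery might reveal, going by the loop in the original post: when a keyword is flagged as required, both the title clause and the abstract clause are added with MUST, so the keyword has to occur in both fields. The sketch below models only that boolean logic in plain Java. `Doc`, `matchesBothMust` and `matchesBothShould` are hypothetical names, and the sample terms are invented; no Lucene classes are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ClauseSemanticsSketch {

    // Hypothetical stand-in for one indexed document: the analyzed terms
    // of its title and abstract fields.
    static final class Doc {
        final Set<String> title;
        final Set<String> abstractTerms;

        Doc(String[] title, String[] abstractTerms) {
            this.title = new HashSet<>(Arrays.asList(title));
            this.abstractTerms = new HashSet<>(Arrays.asList(abstractTerms));
        }
    }

    // Both clauses MUST: the keyword has to occur in the title AND the abstract.
    static boolean matchesBothMust(Doc doc, String keyword) {
        return doc.title.contains(keyword) && doc.abstractTerms.contains(keyword);
    }

    // Both clauses SHOULD (and no other clauses): the keyword has to occur
    // in at least one of the two fields.
    static boolean matchesBothShould(Doc doc, String keyword) {
        return doc.title.contains(keyword) || doc.abstractTerms.contains(keyword);
    }

    public static void main(String[] args) {
        // The keyword appears in the title only.
        Doc doc = new Doc(new String[] {"lucene", "basics"},
                          new String[] {"an", "introduction", "to", "search"});
        System.out.println(matchesBothMust(doc, "lucene"));   // false
        System.out.println(matchesBothShould(doc, "lucene")); // true
    }
}
```

Under these assumptions, a title that is visible in Luke would still not be returned for a required keyword unless the abstract contains the same term.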
On Fri, Oct 2, 2009 at 11:05 PM, Felipe Lobo <fe...@goshme.com> wrote:
> I had a similar problem: the scores of the documents were 0, so
> they didn't come back when I searched!
> Check your score docs!
>
Re: lucene 2.4.1 : document in index but not returned in search
Posted by Felipe Lobo <fe...@goshme.com>.
I had a similar problem: the scores of the documents were 0, so
they didn't come back when I searched!
Check your score docs!
On Fri, Oct 2, 2009 at 2:12 PM, N Hira <nh...@cognocys.com> wrote:
> Which analyzer do you use in luke?
>
> The general practice is to use the same analyzer for indexing and
> searching.
>
> Good luck.
>
> -h
>
>
>
--
Felipe Lobo
www.jusbrasil.com.br
Re: lucene 2.4.1 : document in index but not returned in search
Posted by N Hira <nh...@cognocys.com>.
Which analyzer do you use in luke?
The general practice is to use the same analyzer for indexing and searching.
Good luck.
-h
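
To illustrate why the same analyzer is normally used on both sides, here is a small plain-Java sketch that does not use Lucene. `letterTokens` is only a rough stand-in for SimpleAnalyzer (runs of letters, lowercased), and `whitespaceTokens` is a hypothetical second analyzer invented for contrast. Both method names and the sample strings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzerMismatchSketch {

    // Rough stand-in for SimpleAnalyzer: maximal runs of letters, lowercased.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical second analyzer: split on whitespace only, lowercased.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "Wi-Fi setup, version 2";
        // Terms as they would be indexed with the letter-based analyzer.
        System.out.println(letterTokens(title));     // [wi, fi, setup, version]
        // Terms produced at search time by the other analyzer: "wi-fi" and
        // "setup," are not among the indexed terms, so exact term lookups miss.
        System.out.println(whitespaceTokens(title)); // [wi-fi, setup,, version, 2]

        // Four CJK ideographs with no separators: Character.isLetter is true
        // for each of them, so the whole run stays a single token.
        System.out.println(letterTokens("\u641c\u7d22\u5f15\u64ce").size()); // 1
    }
}
```

The last line is worth checking against the real index for the zh_CN and ko_KR documents: if a whole run of text is stored as one term, a keyword that is only part of that run will not match it with a TermQuery.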
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org