Posted to user@lucenenet.apache.org by Oliver Castle <Ol...@TescoDiets.com> on 2007/10/22 11:18:24 UTC

Multi field search problems

Hello

Firstly, great work to everyone involved in Lucene.NET; it's a great 
project. We are going to use it as the search for a new project that we 
are working on, which is due to go live in the next few months. We get 
great results when searching a single field, but our users will 
generally be selecting from two or three fields, and the 
MultiFieldQueryParser does not seem to produce great results in that 
case. I was wondering if anyone might be able to help us?

To create the index, which is written to disk, I am using the code below:

            luceneDocument.Add(new Field(Constant.LuceneConstant_FoodItemId,
                foodItem.FoodItemId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_ParentFoodItemId,
                foodItem.ParentFoodItemId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_DataProviderItemId,
                foodItem.DataProviderItemId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_EANcode,
                (foodItem.EuropeanArticleNumberCode == null)
                    ? string.Empty
                    : foodItem.EuropeanArticleNumberCode.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_Summary,
                foodItem.Summary,
                Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_StorageTypeId,
                foodItem.StorageTypeId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_FoodItemSupplierId,
                foodItem.FoodItemSupplierId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_FoodItemBrandId,
                foodItem.FoodItemBrandId.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_MeasurementType,
                ((int)foodItem.MeasurementType).ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_NoOfUnits,
                foodItem.NoOfUnits.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_PackSize,
                foodItem.PackSize.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_PortionSize,
                foodItem.PortionSize.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_Approved,
                foodItem.Approved.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_Created,
                foodItem.Created.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));
            luceneDocument.Add(new Field(Constant.LuceneConstant_Updated,
                foodItem.Updated.ToString(),
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.NO));

The index contains about 20,000 food products, which creates an index of 
about 6 MB. The code to search this index is below:

        public static SortableResultsCollection<FoodItem> FoodItemSearch(
            short departmentCode, short ailseCode, short shelfCode,
            short supermarketId, byte searchType, string searchQuery,
            int startValue, int amountOfResults)
        {
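            //Note: queryList holds the field names and queryFieldList holds
            //the query text for each field (the names read the other way round)
            List<string> queryFieldList = new List<string>();
            List<string> queryList = new List<string>();
            List<BooleanClause.Occur> queryClauseList = new List<BooleanClause.Occur>();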
            //Process the department code
            if (departmentCode > 0)
            {
                queryList.Add(Constant.LuceneConstant_CategoryTopLevel);
                queryFieldList.Add(departmentCode.ToString());
                queryClauseList.Add(BooleanClause.Occur.MUST);
            }
            //Process the aisle code
            if (ailseCode > 0)
            {
                queryList.Add(Constant.LuceneConstant_CategorySecondLevel);
                queryFieldList.Add(ailseCode.ToString());
                queryClauseList.Add(BooleanClause.Occur.MUST);
            }
            //Process the shelf code
            if (shelfCode > 0)
            {
                queryList.Add(Constant.LuceneConstant_CategoryThirdLevel);
                queryFieldList.Add(shelfCode.ToString());
                queryClauseList.Add(BooleanClause.Occur.MUST);
            }
            //Process the supermarket
            if (supermarketId > 0)
            {
                queryList.Add(Constant.LuceneConstant_FoodItemSupplierId);
                queryFieldList.Add(supermarketId.ToString());
                queryClauseList.Add(BooleanClause.Occur.MUST);
            }
            //Process the search query
            if (searchQuery != string.Empty)
            {
                if (searchType == 1)
                {
                    queryList.Add(Constant.LuceneConstant_Summary);
                    queryFieldList.Add(searchQuery);
                    queryClauseList.Add(BooleanClause.Occur.MUST);
                }
                else
                {
                    queryList.Add(Constant.LuceneConstant_EANcode);
                    queryFieldList.Add(searchQuery);
                    queryClauseList.Add(BooleanClause.Occur.MUST);
                }
            }
            //Create the arrays to pass to the query
            string[] queryFieldArray = new string[queryFieldList.Count];
            string[] queryArray = new string[queryList.Count];
            BooleanClause.Occur[] occurArray = new BooleanClause.Occur[queryList.Count];
            //Assign the list data to the array
            int rowCount = queryList.Count - 1;
            for (int i = 0; i <= rowCount; i++)
            {
                queryFieldArray[i] = queryFieldList[i];
                queryArray[i] = queryList[i];
                occurArray[i] = queryClauseList[i];
            }
           
            Query query = MultiFieldQueryParser.Parse(
                queryFieldArray, queryArray, occurArray, new StandardAnalyzer());
            EGC.SortableResultsCollection<FoodItem> collection =
                new EGC.SortableResultsCollection<FoodItem>();
            string indexPath = ConfigurationManager.AppSettings[CONST_LUCENE_PATH];
            IndexSearcher indexSearcher = new IndexSearcher(indexPath);
            Hits searchHits = indexSearcher.Search(query);
            int totalHitsCount = searchHits.Length();
            collection.TotalResultCount = totalHitsCount - 1;
            collection.StartResult = startValue;
            //Exclusive end of the requested page, clamped to the number of hits
            //so that searchHits.Doc() is never asked for a hit that does not exist
            //(this treats startValue as a zero-based offset)
            int endValue = Math.Min(startValue + amountOfResults, totalHitsCount);
            for (int hitNumber = startValue; hitNumber < endValue; hitNumber++)
            {
                collection.Add(LuceneDocumentToFoodItem(searchHits.Doc(hitNumber)));
            }
            //Close the index searcher
            indexSearcher.Close();
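
The paging arithmetic in the search method above is the part I am least 
sure about, so here is a minimal sketch of how I think it should work, 
separated from Lucene so it can be checked on its own. It assumes 
startValue is a zero-based offset into the hits; the Paging class and 
PageEnd method are hypothetical names for illustration and are not part 
of Lucene.NET or of our code.

```csharp
using System;

public static class Paging
{
    // Returns the exclusive end index of a page that begins at the
    // zero-based offset `start` and holds at most `pageSize` items,
    // clamped so that it never runs past `total`.
    public static int PageEnd(int start, int pageSize, int total)
    {
        if (start < 0 || pageSize < 0 || total < 0)
        {
            throw new ArgumentOutOfRangeException();
        }
        // Add in long arithmetic so a very large pageSize cannot overflow int.
        long end = Math.Min((long)start + pageSize, (long)total);
        return (int)end;
    }
}
```

A loop of the form "for (int i = start; i < Paging.PageEnd(start, 
pageSize, total); i++)" then visits only hits that exist, and visits 
nothing at all when start is at or beyond total.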

I know the code is not amazing at the moment, but we are trying to get 
the concept correct before we do any tuning. I have tried changing 
BooleanClause.Occur.MUST to BooleanClause.Occur.SHOULD, but we have seen 
no improvement in the search. Is the MultiFieldQueryParser the correct 
way to search for products in the index, or do I need to run 
single-field searches and then apply filters to narrow the results?

Any help would be gratefully received.

Thanks

Ollie Castle