Posted to dev@lucene.apache.org by emmettwalsh <em...@gmail.com> on 2007/07/03 23:03:37 UTC

Indexing and searching help

Hi,

The Lucene documentation seems very confusing... here is my predicament.

I have an object like the following:

public class PropertyImpl implements Property {

	String id;

	List<String> names = new ArrayList<String>();
	
	String address = "";

	String city = "";

	String street = "";

	String state = "";

	String cachedMatchingString;

...

..
public String toString() {
		// save some time
		if (cachedMatchingString == null) {

			StringBuffer buf = new StringBuffer();

			for (String name : names) {
				//buf.append(name + ",");
				buf.append(name + " ");
			}

			if (address != null) {
				//buf.append(address + ",");
				buf.append(address + " ");
			}
			if (city != null) {
				//buf.append(city + ",");
				buf.append(city + " ");
			}
			if (street != null) {
				//buf.append(street + ",");
				buf.append(street + " ");
			}
			if (state != null) {
				//buf.append(state);
				buf.append(state);
			}

			// get rid of any spaces
			//cachedMatchingString = buf.toString().replace(" ", "");
			cachedMatchingString = buf.toString();		
		}
		return cachedMatchingString;
	}


A typical instance of this object would have the following data

id=123134
names= {'Emmett Walsh', 'John Walsh'}
address="Milltown."
city="Dublin"
street="Bankside Cottages"
state="Ireland"
cachedMatchingString="Emmett Walsh John Walsh Milltown Dublin Bankside
Cottages Ireland"



I have about 8000 of these objects, which I store in a HashMap for direct
access based on their id. It is the id(s) that I need Lucene to return to me
so that I can retrieve the desired object(s) from the HashMap.

The cachedMatchingString string is what I use when indexing the information.
For each of the 8000 objects I do the following to build the index:

idxWriter.addDocument(createDocument(prop));


private static Document createDocument(Property prop) {
		Document doc = new Document();

		Field field = new Field("id", prop.getId(), Store.YES, Index.NO);
		doc.add(field);

		Field field2 = new Field("content", prop.toString(), Store.NO,
				Index.TOKENIZED);  //will tokenise our long string
		doc.add(field2);

		return doc;
	}



Searching

I have a text field in my app that typically takes in strings like:

"Main s", which ends up as a query like "Main AND s*"

"B", which ends up as a query like "b*"

The query is built as follows:

			queryString = queryString.trim();  // the string taken in from the textbox
			
			
			String[] strings = queryString.split(" ");
			int numStrings = strings.length;
			
			StringBuffer luceneQuery = new StringBuffer();
			
			for(int i=0;i<numStrings;i++){
				String aString = strings[i];
				if(aString.equals("")){
					continue;
				}
				luceneQuery.append(aString);
				if((i+1) != numStrings){
					luceneQuery.append(" AND ");
				}else{
					luceneQuery.append("*");
				}
			}
			
			//BooleanQuery.setMaxClauseCount(2*1024);
			
			
			QueryParser parser = new QueryParser("content",
					new StandardAnalyzer());
			
			
			org.apache.lucene.search.Query query =
					parser.parse(luceneQuery.toString());

			// Search for the query
			Hits hits = searcher.search(query);


but I keep getting TooManyClauses exceptions.

Any suggestions on how to index or search better?
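
The string handling in the loop above can be tried on its own, without Lucene. What follows is a minimal sketch using only the standard library: it reproduces the "Main s" to "Main AND s*" rewriting and adds one optional guard, appending the wildcard only when the last word has a minimum length. The names QueryStringSketch, buildQuery and minPrefixLength are hypothetical and not part of the original code, and the guard is just one possible way to avoid very short prefixes, not something Lucene requires.

```java
public class QueryStringSketch {

    // Joins the words with AND and turns the last word into a prefix term,
    // but only if it has at least minPrefixLength characters.
    // minPrefixLength is a hypothetical guard; pass 1 to get the original behaviour.
    static String buildQuery(String input, int minPrefixLength) {
        // Split on runs of whitespace so repeated spaces do not yield empty words.
        String[] words = input.trim().split("\\s+");
        StringBuilder query = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            if (word.isEmpty()) {
                continue; // only happens when the whole input is blank
            }
            if (query.length() > 0) {
                query.append(" AND ");
            }
            query.append(word);
            boolean last = (i == words.length - 1);
            if (last && word.length() >= minPrefixLength) {
                query.append('*');
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Main s", 1));   // prints: Main AND s*
        System.out.println(buildQuery("Main s", 2));   // prints: Main AND s
        System.out.println(buildQuery("Main  st", 2)); // prints: Main AND st*
    }
}
```

With a minimum length of 2, a single letter such as "s" is searched as an exact term instead of a prefix. Note that the sketch does not escape query syntax characters that a user might type into the text field.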

-- 
View this message in context: http://www.nabble.com/Indexing-and-searching-help-tf4020937.html#a11420608
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Indexing and searching help

Posted by Chris Hostetter <ho...@fucit.org>.
Questions about *using* the Lucene APIs should be sent to the *user* list
... the dev list is for discussion about the development of the internals.

Please ask your question on that list, but before doing so you may want to
check out the FAQ on TooManyClauses and search the archives for
"prefix TooManyClauses" ... there has been a lot of helpful discussion in
the (not too distant) past...

http://wiki.apache.org/lucene-java/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831
http://www.nabble.com/forum/Search.jtp?forum=44&local=y&query=TooManyClauses+prefix
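
As a rough illustration of the usual cause: in Lucene versions of this era, a prefix term such as s* is rewritten into a BooleanQuery with one clause per indexed term that starts with the prefix, and BooleanQuery.TooManyClauses is thrown once the clause count passes the limit (1024 by default). The sketch below models that expansion with plain Java collections only. PrefixExpansionSketch, expandPrefix, MAX_CLAUSES and the sample terms are hypothetical names made up for illustration; none of them are Lucene APIs, and an IllegalStateException stands in for the real exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    // Stand-in for BooleanQuery's clause limit (Lucene's default is 1024).
    static final int MAX_CLAUSES = 1024;

    // Collects every indexed term that starts with the prefix, one "clause" each,
    // and fails once the limit would be exceeded.
    static List<String> expandPrefix(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<String>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("too many clauses for prefix '" + prefix + "'");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<String>();
        terms.add("bankside");
        terms.add("dublin");
        for (int i = 0; i < 2000; i++) {
            terms.add("street" + i); // 2000 made-up terms that all start with "s"
        }

        System.out.println(expandPrefix(terms, "b", MAX_CLAUSES)); // prints: [bankside]
        try {
            expandPrefix(terms, "s", MAX_CLAUSES);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints: too many clauses for prefix 's'
        }
    }
}
```

The commented-out BooleanQuery.setMaxClauseCount(2*1024) call in the original code raises the limit, at the cost of larger and slower queries, while requiring longer prefixes keeps the expansion small.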



-Hoss

