Posted to java-user@lucene.apache.org by Carsten Schnober <sc...@ids-mannheim.de> on 2012/04/11 18:12:48 UTC

Indexing Pre-analyzed Field

Hi,
I've been wondering about the best way to index a pre-analyzed field. By
pre-analyzed, I mean essentially one that I'd like to initialize with
the constructor Field(String name, TokenStream tokenStream).

I loop over a set of documents, each with a pre-defined tokenization;
these are stored in the variable tokenizations. One by one, the Lucene
documents are added to the writer. The writer is an IndexWriter object
that has been initialized and configured beforehand.

I have implemented a custom TokenStream class for that purpose, and I've
approached the problem as follows:

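CustomTokenStream ts = new CustomTokenStream();
// Tokenization is a placeholder name for the element type of tokenizations.
for (Tokenization tokenization : tokenizations) {
	// Start with a fresh document so fields do not accumulate across iterations.
	Document luceneDocument = new Document();
	// doc is the source document that belongs to this tokenization.
	Field idField = new Field("id", doc.getDocid(), Field.Store.YES,
			Field.Index.NOT_ANALYZED);

	// Point the reusable stream at the current document's tokens.
	ts.setTokenization(tokenization);
	Field textField = new Field("text", ts);

	luceneDocument.add(idField);
	luceneDocument.add(textField);
	try {
		writer.addDocument(luceneDocument);
	} catch (IOException e) {
		System.err.println("Error adding document:\n" + e.getLocalizedMessage());
	}
}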
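
To make the reuse pattern in the loop above concrete, here is a minimal
sketch in plain Java that does not use any Lucene classes. The names
PreToken and PreTokenizedCursor are assumptions made for illustration only;
a real CustomTokenStream would extend Lucene's TokenStream and fill in its
attributes inside incrementToken(). The sketch only shows the contract the
loop relies on: setTokenization() swaps in new data and rewinds, and
incrementToken() returns false once the tokens are used up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one pre-analyzed token: term text plus character offsets. */
final class PreToken {
    final String term;
    final int startOffset;
    final int endOffset;

    PreToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

/**
 * Hypothetical cursor over pre-analyzed tokens (not a Lucene class).
 * It mimics a reusable token stream: one instance serves many documents.
 */
final class PreTokenizedCursor {
    private List<PreToken> tokens = new ArrayList<>();
    private int position = -1; // -1 means "before the first token"

    /** Replaces the token list and rewinds the cursor. */
    void setTokenization(List<PreToken> tokenization) {
        this.tokens = new ArrayList<>(tokenization);
        this.position = -1;
    }

    /** Advances to the next token; returns false once the list is exhausted. */
    boolean incrementToken() {
        if (position + 1 >= tokens.size()) {
            return false;
        }
        position++;
        return true;
    }

    /** Returns the current token; only valid after incrementToken() returned true. */
    PreToken current() {
        if (position < 0) {
            throw new IllegalStateException("call incrementToken() first");
        }
        return tokens.get(position);
    }

    /** Drains the cursor and collects the terms, the way a consumer would. */
    static List<String> drainTerms(PreTokenizedCursor cursor) {
        List<String> terms = new ArrayList<>();
        while (cursor.incrementToken()) {
            terms.add(cursor.current().term);
        }
        return terms;
    }
}
```

Calling setTokenization() again after the cursor has been drained makes the
same instance usable for the next document, which is what the single ts
object in the loop depends on.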

The problem, it seems, is that I cannot query the text field. Or can I?

I've also tried other approaches, such as initializing the text field with

textField = new Field("text", value, Field.Store.YES, Field.Index.ANALYZED);

and setting

textField.setTokenStream(ts);


However, this does not seem to make sense, since I don't want to use a
built-in Lucene analyzer, and it is not clear to me what I should pass
as the value in the latter approach.

Any help is very welcome! Thank you very much!
Best regards,
Carsten

-- 
Carsten Schnober
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP -- Korpusanalyseplattform der nächsten Generation
http://korap.ids-mannheim.de/ | Tel.: +49-(0)621-1581-238
