You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/01/19 23:43:11 UTC

Re: languages & files

Hello,
For the first problem (indexing different types of documents), you can use the mini-framework for doing just that.  Just get the source code that comes with Lucene in Action, and play - http://www.lucenebook.com/

For the Analyzers, look what Snowball provides (do a search at lucenebook.com or google.com).  I imagine only Dutch is supported, and I imagine you may be able to find a Bulgarian Analyzer somewhere, but the other two languages may be harder ...

Otis

----- Original Message ----
From: arnaudbuffet@free.fr
To: java-user@lucene.apache.org
Sent: Thu 19 Jan 2006 11:01:07 PM EST
Subject: languages &  files 

Hi,

I begin working with lucene and need few explanations to do what i want,
thanks for your helpful answers.

I have to add lucene into a java application and I have two targets:

- To enable search throw different types of files, like MS Word, PDF or
Excel files. 
I read that each type of document must be indexed with the appropriate
indexer. So how can I do it easily? I found an API called Lius which
seem to index different types of documents directly, is anyone know this
product? Other ones?

- Secondly, the search system must work with different like Dutch,
Turkish, Bulgarian and tomorrow Thaï or Chinese. 
The lucene documentation talks about using different Analyzer for none
ISO languages. Lucene's sandbox is quite empty, and I do not understand
which kind of treatment much be done to read correctly data. I think I
still have problem searching into indexed documents with accents. So how
should I work on the languages particularity?

Thanks

A


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org