Posted to general@lucene.apache.org by Farzad Mahdikhani <fa...@yahoo.com> on 2007/08/13 12:30:34 UTC

cross-lingual IR

 Dear All, 
 
 I would like to implement a cross-lingual IR system with support for Persian and English for an academic research task. How can I use Lucene for this? How should I proceed? What are the requirements? 
 
 Regards, 
 Farzad

Re: cross-lingual IR

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Farzad,

Hmmm, where to begin...  This is a tough question, and one that
warrants a fair amount of research.  I would start by taking a look
at the TREC cross-language tracks and the CLEF conference.

I have used Lucene to index and search documents in English, Arabic,
French, Spanish, Dutch, and other languages.  In general, you need
some way of transforming a source-language query into a
target-language query, OR you need some way of automatically
translating all your documents into the same language.  How you do
this is really the matter of research, eh?  The most basic approach
to the query transformation problem is to use a dictionary to look up
the source-language terms and get their target-language equivalents.
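
A minimal sketch of that dictionary lookup, written in Python for
brevity rather than as Lucene code.  The three-entry lexicon and the
function name translate_query are hypothetical, made up for
illustration; a real system would load a full bilingual dictionary.
When a term has several translations, the sketch groups them with OR,
which is the parenthesised form Lucene's query parser syntax accepts.

```python
from typing import Dict, List

# Hypothetical toy English -> Persian lexicon (illustration only).
EN_FA: Dict[str, List[str]] = {
    "book": ["کتاب"],
    "library": ["کتابخانه"],
    "free": ["آزاد", "رایگان"],  # two senses: liberty / no cost
}


def translate_query(query: str, lexicon: Dict[str, List[str]]) -> str:
    """Translate a query term by term, OR-ing alternative translations."""
    clauses = []
    for term in query.lower().split():
        translations = lexicon.get(term)
        if not translations:
            # Out-of-vocabulary terms (names, acronyms) pass through as-is.
            clauses.append(term)
        elif len(translations) == 1:
            clauses.append(translations[0])
        else:
            # Keep every candidate translation instead of guessing one.
            clauses.append("(" + " OR ".join(translations) + ")")
    return " ".join(clauses)
```

Keeping all candidate translations trades precision for recall;
choosing among them (disambiguation) is where much of the research
effort in this area goes.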

As for Lucene, you will need an Analyzer that handles Persian (try
googling "Persian Lucene Analyzer"); you may very well have to write
your own.  The actual indexing and search tasks are relatively
straightforward as Lucene tasks, and there are a number of good
tutorials and books on how to do that.
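
To give a rough idea of what a hand-written Persian analyzer has to
do, here is a small Python sketch of the normalization and
tokenization steps.  It is not Lucene's Analyzer API; the function
name analyze_persian and the particular character mappings are
assumptions chosen for illustration, and a real analyzer would likely
also need stop words and possibly stemming.

```python
import re

# Assumed, illustrative character mappings (not taken from any library).
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0640": None,      # drop TATWEEL (kashida elongation)
    "\u200C": " ",       # treat ZERO WIDTH NON-JOINER as a word break
})

# Arabic-script short vowels and related marks (fathatan .. sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Runs of Unicode letters/digits; punctuation and spaces separate tokens.
TOKEN = re.compile(r"\w+")


def analyze_persian(text):
    """Normalize character variants, strip diacritics, return tokens."""
    normalized = DIACRITICS.sub("", text.translate(CHAR_MAP))
    return TOKEN.findall(normalized)
```

Whatever normalization is chosen, the same steps have to run on both
the documents at index time and the queries at search time, otherwise
variant spellings will not match.  Splitting on the zero-width
non-joiner is only one option; removing it instead would keep
compound forms as single tokens.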

Good luck,
Grant

On Aug 13, 2007, at 6:30 AM, Farzad Mahdikhani wrote:

>  Dear All,
>
>  I would like to implement a cross-lingual IR system with support  
> for Persian and English languages for an academic research task.  
> How can I use Lucene for my task? How shall I proceed? what are the  
> requirements?
>
>  Regards,
>  Farzad

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ