Posted to java-user@lucene.apache.org by Liaqat Ali <li...@gmail.com> on 2007/09/18 23:23:19 UTC

lucene for Arabic and Urdu

Hello All

I am new to the field of Information Retrieval and am working on 
developing a search engine for languages like Arabic and Urdu. Could you 
please advise how Lucene can be used for this purpose?
Can anybody tell me exactly what I should do to design a search engine 
from scratch using Lucene?

Liaqat Ali


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene for Arabic and Urdu

Posted by Grant Ingersoll <gs...@apache.org>.
http://wiki.apache.org/lucene-java/IndexingOtherLanguages is slightly  
out of date, but still has some tips.  You may also want to consider  
starting with Solr, as it has many features a search engine needs.

In the past, I have written Arabic Analyzers for Lucene (sorry, I can't  
share them), but you can find others by searching on Google.

And, as Karl said, the "Lucene In Action" book is excellent, even if  
it is a bit out of date.  Most of the concepts still apply.


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: lucene for Arabic and Urdu

Posted by Karl Wettin <ka...@gmail.com>.
On 18 Sep 2007, at 23:23, Liaqat Ali wrote:

> I am new to the field of Information Retrieval and am working on  
> developing a search engine for languages like Arabic and Urdu. Could  
> you please advise how Lucene can be used for this purpose?

Lucene makes no distinction between languages. All data is discrete  
chunks of characters, also known as tokens. Tokens are represented  
in fields, and the combination of a token and a specific field is  
known as a term. Which tokens your index ends up containing depends on  
the analyzer strategy you use. An analyzer can be  
language-sensitive, but it can also be something completely different.
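
To make the terminology concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene API; the class TermSketch and its
methods analyze, add and search are hypothetical names made up for this
illustration. The "analyzer" is an assumed, very naive one: it splits on
whitespace and strips Arabic short-vowel marks and tatweel, so that a
vocalized and an unvocalized spelling end up as the same token. A real
analyzer for Arabic or Urdu would likely need to do considerably more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index, NOT the Lucene API.
class TermSketch {
    // term ("field:token") -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Assumed analyzer strategy: whitespace split, then remove the Arabic
    // short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.replaceAll("[\\u064B-\\u0652\\u0640]", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index a field of a document: every token becomes a term in that field.
    void add(int docId, String field, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Return the documents whose field contains every token of the query.
    // The query goes through the same analyzer as the indexed text.
    Set<Integer> search(String field, String queryText) {
        Set<Integer> result = null;
        for (String token : analyze(queryText)) {
            Set<Integer> docs =
                postings.getOrDefault(field + ":" + token, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TermSketch index = new TermSketch();
        // Document 1: "kitab jadid" (new book), first word written with vowel marks.
        index.add(1, "title", "\u0643\u0650\u062A\u064E\u0627\u0628 \u062C\u062F\u064A\u062F");
        // The query has no vowel marks but is analyzed to the same term.
        System.out.println(index.search("title", "\u0643\u062A\u0627\u0628"));
    }
}
```

The point of the sketch is only that the same analysis has to be applied
at index time and at query time, and that a term is tied to its field:
the same token in another field is a different term.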

> Can anybody tell me exactly what I should do to design a search  
> engine from scratch using Lucene?

You need to define what your search engine is supposed to do in order  
to get an answer that makes sense.


"Lucene in Action" is a pretty good book, even though it covers Lucene  
1.4 or so. The SVN repository contains a demo application. There is  
also the Wiki and this mailing list.

-- 
karl
