Posted to java-user@lucene.apache.org by Liaqat Ali <li...@gmail.com> on 2007/09/18 23:23:19 UTC
lucene for Arabic and Urdu
Hello all,
I am new to the field of Information Retrieval and am working on a
search engine for languages such as Arabic and Urdu. Could you advise
me on how Lucene can be used for this purpose?
Can anybody tell me exactly what I should do to design a search engine
from scratch using Lucene?
Liaqat Ali
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: lucene for Arabic and Urdu
Posted by Grant Ingersoll <gs...@apache.org>.
http://wiki.apache.org/lucene-java/IndexingOtherLanguages is slightly
out of date, but still has some tips. You may also want to consider
starting with Solr, as it has many features a search engine needs.
In the past, I have written Arabic Analyzers for Lucene (sorry, can't
share them), but I can tell you that you can also find some by
searching for them on Google.
And, as Karl said, the "Lucene In Action" book is excellent, even if
it is a bit out of date. Most of the concepts still apply.
On Sep 18, 2007, at 5:23 PM, Liaqat Ali wrote:
> Hello all,
>
> I am new to the field of Information Retrieval and am working on a
> search engine for languages such as Arabic and Urdu. Could you
> advise me on how Lucene can be used for this purpose?
> Can anybody tell me exactly what I should do to design a search
> engine from scratch using Lucene?
>
> Liaqat Ali
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: lucene for Arabic and Urdu
Posted by Karl Wettin <ka...@gmail.com>.
On 18 Sep 2007, at 23:23, Liaqat Ali wrote:
> I am new to the field of Information Retrieval and am working on a
> search engine for languages such as Arabic and Urdu. Could you
> advise me on how Lucene can be used for this purpose?
Lucene makes no distinction between languages. All data is discrete
chunks of characters, also known as tokens. Tokens are stored in
fields, and the combination of a token and the specific field it
appears in is known as a term. Which tokens your index ends up
containing depends on the analyzer strategy you use. An analyzer could
be language sensitive, but it could also be something completely
different.
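As a rough illustration of the paragraph above, here is a minimal
sketch in plain Java. It does not use Lucene's API: the class
TermSketch and the methods analyze and terms are hypothetical names
made up for this example, and the "field:token" string is only one
assumed way to picture a term. The analyzer shown is a simple
language-sensitive one that drops Arabic short-vowel marks and the
tatweel, so vocalized and unvocalized spellings yield the same token.

```java
import java.util.ArrayList;
import java.util.List;

public class TermSketch {

    // Hypothetical analyzer (an assumption, not a Lucene class): splits
    // on anything that is not a letter, and drops Arabic short-vowel
    // marks (U+064B..U+0652) and the tatweel (U+0640) without ending
    // the current token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean diacritic = cp >= 0x064B && cp <= 0x0652;
            boolean tatweel = cp == 0x0640;
            if (diacritic || tatweel) {
                continue; // skipped, but the token keeps growing
            }
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // A term pairs a field name with a token; shown here as "field:token".
    static List<String> terms(String field, String text) {
        List<String> out = new ArrayList<>();
        for (String token : analyze(text)) {
            out.add(field + ":" + token);
        }
        return out;
    }

    public static void main(String[] args) {
        // The Arabic word for "book", with and without vowel marks,
        // written as Unicode escapes to avoid source-encoding problems.
        String vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628";
        String plain = "\u0643\u062A\u0627\u0628";
        System.out.println(terms("title", vocalized));
        System.out.println(terms("title", plain));
        // The same token in a different field is a different term.
        System.out.println(terms("body", plain));
    }
}
```

With a different analyzer (for example, one that keeps the vowel
marks) the same text would produce different tokens, and a query
would have to be analyzed the same way to match them.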
> Can anybody tell me exactly what I should do to design a search
> engine from scratch using Lucene?
You need to define what your search engine is supposed to do in order
to get an answer that makes sense.
Lucene in Action is a pretty good book, even though it covers version
1.4 or so. The SVN repository contains a demo application. There is
also the wiki and this forum.
--
karl