Posted to solr-user@lucene.apache.org by Sagar Chaturvedi <sa...@nectechnologies.in> on 2013/05/29 22:21:27 UTC

Support for Mongolian language

Hi All,

Does Solr provide support for the Mongolian language?

Also, which filters and tokenizers should be used for the Chinese, Japanese and Korean languages?

Regards,
Sagar Chaturvedi


DISCLAIMER:
-----------------------------------------------------------------------------------------------------------------------
The contents of this e-mail and any attachment(s) are confidential and
intended
for the named recipient(s) only. 
It shall not attach any liability on the originator or NEC or its
affiliates. Any views or opinions presented in 
this email are solely those of the author and may not necessarily reflect the
opinions of NEC or its affiliates. 
Any form of reproduction, dissemination, copying, disclosure, modification,
distribution and / or publication of 
this message without the prior written consent of the author of this e-mail is
strictly prohibited. If you have 
received this email in error please delete it and notify the sender
immediately. .
-----------------------------------------------------------------------------------------------------------------------

RE: Support for Mongolian language

Posted by Sagar Chaturvedi <sa...@nectechnologies.in>.
Thanks Alexandre for the link. It was really helpful.

The original text will be in UTF-8.





Re: Support for Mongolian language

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Well, you would need a tokenizer, probably a stemmer, and a list of
stop-words (to ignore). Is the original text in UTF-8, or is it in some
alternative encoding?
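
To make the three pieces named above concrete, here is a minimal Python sketch of an analysis chain: a tokenizer, a stop-word filter and a crude suffix-stripping stemmer. It is only an illustration of the idea, not Solr or Lucene code. The function name `analyze` is hypothetical, the regular-expression split merely stands in for a real tokenizer, and the stop-word and suffix lists in the usage line are English placeholders; real Mongolian lists would have to come from a linguistic resource.

```python
import re


def analyze(text, stop_words, suffixes):
    """Toy analysis chain: tokenize, lowercase, drop stop words, strip one suffix."""
    # Tokenizer stand-in: \w+ is Unicode-aware for str patterns in Python 3,
    # so it also matches non-Latin letters such as Cyrillic.
    tokens = re.findall(r"\w+", text)
    terms = []
    for token in tokens:
        token = token.lower()
        # Stop-word filter: skip words that should not be indexed.
        if token in stop_words:
            continue
        # Crude stemmer: strip the longest matching suffix, but only if
        # at least three characters of the word remain afterwards.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


# Placeholder English data, used only to show the call shape.
print(analyze("The indexing of documents", {"the", "of"}, ["ing", "s"]))
```

In Solr itself these steps would be configured as a tokenizer and filters on a field type rather than written by hand; the sketch only shows what each stage contributes.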

A quick search showed that there is an academic paper where they are
trying to work with Mongolian to get it into Lucene. It seems quite
relevant and would be a great point to start:
http://scholar.google.ca/scholar?cluster=15851397934729234574&hl=en&as_sdt=0,5

It also lists a lot of challenges that happened with other languages
before UTF-8 became the main standard (Russian and Ukrainian come to
mind).

Hope it helps,
    Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)



RE: Support for Mongolian language

Posted by bbarani <bb...@gmail.com>.
Please create a new topic for any new questions.



--
View this message in context: http://lucene.472066.n3.nabble.com/Support-for-Mongolian-language-tp4066871p4067374.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Support for Mongolian language

Posted by Sagar Chaturvedi <sa...@nectechnologies.in>.
Hi,

In the Solr admin UI, I am trying to highlight some fields in a query. I have set hl=true and given a comma-separated list of field names in hl.fl, but the fields are not getting highlighted. Any insights?

Regards,
Sagar
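
For reference, the request described above can be sketched in Python as follows. This is a hedged illustration, not a diagnosis: the base URL, the core name `collection1` and the field names `title` and `content` are assumptions, as are the helper names. Two things worth checking in a case like this are that the fields listed in hl.fl are stored, and that the snippets are being looked for in the separate "highlighting" section of the response rather than inside the documents themselves.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, fields):
    """Build a /select URL with highlighting switched on for the given fields."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": ",".join(fields),  # comma-separated list of fields to highlight
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def highlighted_snippets(response, doc_id, field):
    """Read snippets from the 'highlighting' section of a parsed JSON response.

    Solr keys that section by the document's unique key, then by field name.
    """
    return response.get("highlighting", {}).get(doc_id, {}).get(field, [])


# Assumed host, core and field names, for illustration only.
print(highlight_query_url("http://localhost:8983/solr/collection1", "solr", ["title", "content"]))
```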








Re: Support for Mongolian language

Posted by Jack Krupansky <ja...@basetechnology.com>.
Try using the "text_general" field type and see how well the standard
tokenizer identifies reasonable word breaks for some sample Mongolian text.

Use the Analysis page of the Solr Admin UI to see what each of the term
analysis filters outputs.
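
The same check can be scripted instead of clicked through. The sketch below only builds the request URL and sends nothing. It assumes the field analysis handler is registered at /analysis/field and that the core is named collection1 on localhost; both depend on the installation. The helper name is hypothetical and the sample text is a two-word placeholder ("Mongolian language").

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, field_type, sample_text):
    """Build a URL that asks Solr to run sample text through a field type's analyzer."""
    params = {
        "analysis.fieldtype": field_type,    # e.g. "text_general"
        "analysis.fieldvalue": sample_text,  # text to analyze, sent as UTF-8
        "wt": "json",
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Assumed host and core name, for illustration only.
url = field_analysis_url(
    "http://localhost:8983/solr/collection1", "text_general", "Монгол хэл"
)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the tokens produced at each stage of the analysis chain, which is the same information the Admin UI page displays.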

-- Jack Krupansky






RE: Support for Mongolian language

Posted by Sagar Chaturvedi <sa...@nectechnologies.in>.
What would be the steps if we want to use Mongolian or any other language that is not supported?





Re: Support for Mongolian language

Posted by Jack Krupansky <ja...@basetechnology.com>.
No, there is not.

-- Jack Krupansky






RE: Support for Mongolian language

Posted by Sagar Chaturvedi <sa...@nectechnologies.in>.
I have already checked this link, but could not find any mention of the Mongolian language. Is there a plugin available for it?





Re: Support for Mongolian language

Posted by Upayavira <uv...@odoko.co.uk>.

On Wed, May 29, 2013, at 09:34 PM, bbarani wrote:
> Check out:
> 
> wiki.apache.org/solr/LanguageAnalysis
> 
> For some reason the above site takes a long time to open.

There's a known performance issue with the wiki. Admins are working on
it.

Upayavira

Re: Support for Mongolian language

Posted by bbarani <bb...@gmail.com>.
Check out:

wiki.apache.org/solr/LanguageAnalysis

For some reason the above site takes a long time to open.






--
View this message in context: http://lucene.472066.n3.nabble.com/Support-for-Mongolian-language-tp4066871p4066874.html
Sent from the Solr - User mailing list archive at Nabble.com.