Posted to solr-user@lucene.apache.org by "Sundling, Paul" <pa...@sonyconnect.com> on 2007/07/27 04:26:34 UTC

Solr and Chinese/Japanese

Are there any known Solr sites that are in Chinese or Japanese?
 
I need to include links to such sites for a comparison I'm doing on
enterprise search engines.
 
I realize that if I stay with UTF-8 it should work, and that I can use
the CJK analyzer.
 
Paul Sundling
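
As a rough illustration of the CJK analyzer mentioned above: Lucene's CJK
analysis indexes runs of Chinese, Japanese, and Korean characters as
overlapping two-character tokens (bigrams) instead of trying to segment
words. The Python sketch below only imitates that idea. The names is_cjk
and cjk_bigrams and the code point ranges are assumptions made for this
example; this is not Lucene's implementation.

```python
def is_cjk(ch):
    """Return True for characters in a few common CJK blocks (assumed ranges)."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul syllables


def cjk_bigrams(text):
    """Split text into overlapping CJK bigrams and lowercased non-CJK words."""
    tokens = []
    run = ""   # current run of consecutive CJK characters
    word = ""  # current run of non-CJK letters and digits

    def flush_run():
        nonlocal run
        if len(run) == 1:
            # A lone CJK character is kept as a single-character token.
            tokens.append(run)
        else:
            # "ABC" becomes "AB", "BC"; an empty run yields nothing.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""

    def flush_word():
        nonlocal word
        if word:
            tokens.append(word.lower())
        word = ""

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run += ch
        elif ch.isalnum():
            flush_run()
            word += ch
        else:
            # Whitespace and punctuation end whatever token is in progress.
            flush_run()
            flush_word()
    flush_run()
    flush_word()
    return tokens


tokens = cjk_bigrams("Solr 火水木")
```

Because both the indexed text and the query are split the same way, a
two-character query matches wherever those two characters appear next to
each other, which is why no dictionary is needed.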

Re: Solr and Chinese/Japanese

Posted by Alan Darnell <al...@utoronto.ca>.
What about scripts that are written right to left?  How does Solr
handle these in terms of sorting and searching?  Can left-to-right
and right-to-left scripts be handled in the same Solr document?

Alan


On 27-Jul-07, at 8:29 AM, Erik Hatcher wrote:

>
> On Jul 27, 2007, at 6:17 AM, Erik Hatcher wrote:
>
>>
>> On Jul 26, 2007, at 10:26 PM, Sundling, Paul wrote:
>>> Are there any known Solr sites that are in Chinese or Japanese?
>>
>> This might be the first mention of this project in the Solr  
>> community, and I'm certainly not confident our server can handle  
>> the load but here goes anyway :)
>>
>> 	<http://blacklight.betech.virginia.edu/>
>>
>> The bulk of the content, 3.8M documents, is not Chinese, but there  
>> are 320 Tang dynasty poems indexed there with both English and  
>> Chinese content.  Click on the "Tang Dynasty Poems" facet at the top
>> right.  You can also search in Chinese without any problem:
>
> I had trouble with the link I sent before, but maybe this one will  
> work more generally:
>
> 	<http://blacklight.betech.virginia.edu/search?q=%E7%81%AB+AND+%E6%B0%B4>
>
>


Re: Solr and Chinese/Japanese

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 27, 2007, at 6:17 AM, Erik Hatcher wrote:

>
> On Jul 26, 2007, at 10:26 PM, Sundling, Paul wrote:
>> Are there any known Solr sites that are in Chinese or Japanese?
>
> This might be the first mention of this project in the Solr  
> community, and I'm certainly not confident our server can handle  
> the load but here goes anyway :)
>
> 	<http://blacklight.betech.virginia.edu/>
>
> The bulk of the content, 3.8M documents, is not Chinese, but there  
> are 320 Tang dynasty poems indexed there with both English and  
> Chinese content.  Click on the "Tang Dynasty Poems" facet at the top
> right.  You can also search in Chinese without any problem:

I had trouble with the link I sent before, but maybe this one will  
work more generally:

	<http://blacklight.betech.virginia.edu/search?q=%E7%81%AB+AND+%E6%B0%B4>



Re: Solr and Chinese/Japanese

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 26, 2007, at 10:26 PM, Sundling, Paul wrote:
> Are there any known Solr sites that are in Chinese or Japanese?

This might be the first mention of this project in the Solr  
community, and I'm certainly not confident our server can handle the  
load but here goes anyway :)

	<http://blacklight.betech.virginia.edu/>

The bulk of the content, 3.8M documents, is not Chinese, but there  
are 320 Tang dynasty poems indexed there with both English and  
Chinese content.  Click on the "Tang Dynasty Poems" facet at the top
right.  You can also search in Chinese without any problem:

	<http://blacklight.betech.virginia.edu/search?q=火+AND+水>   
(hopefully that link will pass through e-mail ok)
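
If the raw characters do not survive a mail client, the same query can be
sent percent-encoded as UTF-8, which is the form used in the follow-up
messages in this thread. A minimal Python sketch of producing that form
follows; the helper name build_search_url is made up for this example, and
the base URL and the q parameter are taken from the link above.

```python
from urllib.parse import urlencode, parse_qs


def build_search_url(base, query):
    """Return base?q=<query>, percent-encoded as UTF-8 with '+' for spaces."""
    # urlencode encodes non-ASCII text as UTF-8 bytes by default.
    return base + "?" + urlencode({"q": query})


url = build_search_url("http://blacklight.betech.virginia.edu/search", "火 AND 水")
print(url)

# Decoding the query string recovers the original characters.
decoded = parse_qs(url.split("?", 1)[1])["q"][0]
print(decoded)
```

The encoded URL contains only ASCII, so line wrapping in e-mail is then the
only thing left that can break it.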

Blacklight is an unsupported demo of library data + Solr + Ruby on  
Rails.  The library data comes from 3 different sources:

	* MARC data from our integrated library system, converted to UTF-8 -
there are non-English words in some of this data (tinker with the
language facet to stumble on Russian and other stuff)

	* TEI data sample from our "Digital Library"

	* HTML-scraped Tang dynasty poems

   Erik