Posted to solr-user@lucene.apache.org by Rupert Fiasco <ru...@gmail.com> on 2009/08/11 01:03:17 UTC

Question mark glyphs in indexed content

Hello, I am using the latest Solr4j to index content. When I look at
that content in the Solr Admin web utility I see weird characters like
this:

http://brockwine.com/images/solrglyphs.png

When I look at the text in the MySQL DB those chars appear to just be
plain hyphens. The MySQL table character set is utf8 and the collation
is utf8.

Environment:
OS X 10.5.8
java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)

Solr Specification Version: 1.3.0
Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
Lucene Specification Version: 2.4-dev
Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16

Jetty 6.1.3

Any thoughts?

Thanks
/Rupert

Re: Question mark glyphs in indexed content

Posted by Chris Hostetter <ho...@fucit.org>.
: Hello, I am using the latest Solr4j to index content. When I look at
: that content in the Solr Admin web utility I see weird characters like
: this:
: 
: http://brockwine.com/images/solrglyphs.png
: 
: When I look at the text in the MySQL DB those chars appear to just be
: plain hyphens. The MySQL table character set is utf8 and the collation
: is utf8.

What do you mean by "Solr4j" ?

More than likely, there is a character encoding problem somewhere between
your database and Solr ... Solr expects UTF-8 when you index content, but
just because it's utf8 in your database doesn't mean the code reading from
your database and sending it to Solr is using UTF-8 along the way ...
knowing exactly what that code looks like is necessary to understand what
might be happening here.
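
To make the failure mode concrete, here is a minimal sketch of how a
dash that looks like a plain hyphen can turn into stray glyphs when one
hop in the pipeline uses a different charset. It is an illustration
only, not the poster's indexing code: the sample string, the variable
names, and the guess that the "hyphen" is really an en dash (U+2013)
are all assumptions.

```python
# Illustration only: the sample text and the choice of an en dash are
# assumptions, not taken from the poster's data.
EN_DASH = "\u2013"  # renders like a plain hyphen in many fonts
text = "Pinot Noir " + EN_DASH + " 2007"

# The bytes as they would be stored in a UTF-8 column.
utf8_bytes = text.encode("utf-8")

# Case 1: UTF-8 bytes are misread as Latin-1 somewhere along the way.
# The single dash becomes three unrelated characters.
misread = utf8_bytes.decode("latin-1")

# Case 2: the text is sent as Windows-1252 bytes to a consumer that
# expects UTF-8. The lone 0x96 byte is invalid UTF-8 and is replaced
# with U+FFFD, which usually renders as a question mark glyph.
cp1252_bytes = text.encode("cp1252")
replaced = cp1252_bytes.decode("utf-8", errors="replace")

# Case 3: UTF-8 at every hop preserves the text unchanged.
round_trip = utf8_bytes.decode("utf-8")

# Print escaped forms so the result is visible on any terminal.
print(ascii(misread))
print(ascii(replaced))
print(ascii(round_trip))
```

If the indexed value looks like case 1 or case 2, the place to look is
the charset used by the database connection and by the HTTP request
that posts documents to Solr, not the table definition.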


-Hoss