Posted to solr-user@lucene.apache.org by Walid ABDELKABIR <ab...@gmail.com> on 2009/03/18 12:46:20 UTC
solrj : probleme with utf-8 content
When I execute this code, the "includes" field in my index ends up with the
value "????? ???? ????????????? ?????":
---------------------------
String content ="eaiou with circumflexes: êâîôû";
SolrInputDocument doc = new SolrInputDocument();
doc.addField( "id", "123", 1.0f );
doc.addField( "includes", content, 1.0f );
server.add( doc );
---------------------------
But this code works fine:
-------------------------------
String addContent = "<add><doc boost=\"1.0\">"
    + "<field name=\"id\">123</field>"
    + "<field name=\"includes\">eaiou with circumflexes:âîôû</field>"
    + "</doc></add>";
DirectXmlRequest up = new DirectXmlRequest( "/update", addContent );
server.request( up );
-------------------------------
Thanks for your help.
Re: solrj : probleme with utf-8 content
Posted by Pascal Dimassimo <th...@hotmail.com>.
Hi,
I have that problem too. But I noticed that it only happens when I send my
data via SolrJ. If I send it via the solr-ruby gem
(http://wiki.apache.org/solr/solr-ruby), everything is fine.
Here is my JRuby script:
-------------------------------
require 'rubygems'
require 'solr'
require 'rexml/document'
include Java
# Index a document through SolrJ (the SolrJ jars must be on the JRuby classpath)
def send_via_solrj(text, url)
  doc = org.apache.solr.common.SolrInputDocument.new
  doc.addField('id', '1')
  doc.addField('text', text)
  server = org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.new(url)
  server.add(doc)
  server.commit
end
# Index the same text through the solr-ruby gem
def send_via_gem(text, url)
  solr_doc = Solr::Document.new
  solr_doc['id'] = '2'
  solr_doc['text'] = text
  options = {
    :autocommit => :on
  }
  conn = Solr::Connection.new(url, options)
  conn.add(solr_doc)
end
host = 'localhost'
port = '8888'
path = '/solr/core0'
url = "http://#{host}:#{port}#{path}"
text = "eaiou with circumflexes: êâîôû"
send_via_solrj(text, url)
send_via_gem(text, url)
puts "done!"
-------------------------------
When I watch the HTTP messages with tcpmon, I see that the data sent via SolrJ
is encoded in cp1252, while the data sent via the gem is UTF-8.
Does anyone have an idea how we can configure SolrJ to send UTF-8?
Thanks in advance.
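The cp1252-vs-UTF-8 mismatch seen in tcpmon can be reproduced with the JDK alone. The sketch below is a minimal illustration, not SolrJ code: the class and method names are hypothetical, and it only assumes that the sender encodes with cp1252 (windows-1252) while the receiver decodes as UTF-8. Each of êâîôû is one byte in cp1252 but two bytes in UTF-8, and a lone cp1252 byte such as 0xEA is not valid UTF-8, so the decoded text comes back with replacement characters instead of the accents.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // windows-1252 is the "cp1252" encoding observed on the wire.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Formats bytes as space-separated uppercase hex, e.g. "C3 AA". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Hypothetical round trip: sender encodes as cp1252, receiver assumes UTF-8. */
    static String sendCp1252ReadUtf8(String text) {
        byte[] wire = text.getBytes(CP1252);
        // new String(..., UTF_8) substitutes U+FFFD for malformed sequences.
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the demo independent of the source file's encoding.
        String accents = "\u00ea\u00e2\u00ee\u00f4\u00fb"; // êâîôû
        System.out.println("cp1252: " + hex(accents.getBytes(CP1252)));
        System.out.println("UTF-8 : " + hex(accents.getBytes(StandardCharsets.UTF_8)));
        String received = sendCp1252ReadUtf8(accents);
        System.out.println("round trip equal? " + received.equals(accents));
    }
}
```

Running it shows EA E2 EE F4 FB for cp1252, C3 AA C3 A2 C3 AE C3 B4 C3 BB for UTF-8, and a failed round trip. Passing an explicit charset, as `getBytes(StandardCharsets.UTF_8)` does here, rather than relying on the platform default is the general way to avoid this kind of bug in your own code.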
--
View this message in context: http://www.nabble.com/solrj-%3A-probleme-with-utf-8-content-tp22577377p22620317.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrj : probleme with utf-8 content
Posted by Pascal Dimassimo <th...@hotmail.com>.
Yes, it now works fine with the trunk sources.
Thanks!
Noble Paul നോബിള് नोब्ळ् wrote:
>
> SOLR-973 seems to have caused the problem
Re: solrj : probleme with utf-8 content
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
SOLR-973 seems to have caused the problem
On Fri, Mar 20, 2009 at 11:01 PM, Ryan McKinley <ry...@gmail.com> wrote:
> do you know if your java file is encoded with utf-8?
>
> sometimes it will be encoded as something different and that can cause funny
> problems..
--
--Noble Paul
Re: solrj : probleme with utf-8 content
Posted by Ryan McKinley <ry...@gmail.com>.
Do you know if your Java source file is encoded in UTF-8?
Sometimes it is saved in a different encoding, and that can cause odd
problems.
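The source-encoding question can be checked from inside the program. In this minimal sketch (the class name is hypothetical), the same five characters are written once as typed and once as Unicode escapes. The escapes are plain ASCII, so javac reads them identically whatever encoding it assumes for the file; if the two strings differ, the file was decoded with the wrong encoding. Compiling with `javac -encoding UTF-8` tells the compiler the file is UTF-8, and writing non-ASCII literals as escapes avoids the question altogether.

```java
public class SourceEncodingCheck {

    // As typed in the editor: javac must decode these bytes with the right charset.
    static final String TYPED = "êâîôû";

    // Pure-ASCII escapes: read the same way regardless of javac's -encoding setting.
    static final String ESCAPED = "\u00ea\u00e2\u00ee\u00f4\u00fb";

    /** True if the typed literal survived compilation intact. */
    static boolean sourceEncodingOk() {
        return TYPED.equals(ESCAPED);
    }

    public static void main(String[] args) {
        if (sourceEncodingOk()) {
            System.out.println("Literal OK: the source file was decoded correctly.");
        } else {
            // Show what javac actually produced, one code point at a time.
            StringBuilder sb = new StringBuilder();
            TYPED.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
            System.out.println("Mismatch, javac saw: " + sb.toString().trim());
            System.out.println("Recompile with: javac -encoding UTF-8 SourceEncodingCheck.java");
        }
    }
}
```

If the check passes but the index still shows "?????", the literal is fine and the problem lies in how the request is sent, not in the source file.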