Posted to solr-user@lucene.apache.org by Thierry Collogne <th...@gmail.com> on 2007/03/21 10:05:46 UTC

Problems with special characters

Hello,

I am using the post.jar file to update the search indexes. Problem is that
foreign characters like é, à, ... don't work correctly.

Even when I use the example xml files (like utf8-example.xml), the
characters don't work. Could this be a problem with the post.jar?

When I open the files in BabelPad, the characters are shown correctly and the
editor tells me that the file is in UTF-8.

Where can I get the latest version of this file, or is there another way of
updating the index on Windows?

Re: Problems with special characters

Posted by Ma...@ibsbe.be.
No, I didn't try to use it, since we don't use Solr to display the results.
The only thing our Solr server returns are IDs, so there is nothing to
put highlights on.

But the code doesn't look half bad :)
Let's hope the client developers pick up on it :)




"Thierry Collogne" <th...@gmail.com> 
wrote on 22/03/2007 11:27:

> Thanks. Did you also try using it?


Re: Problems with special characters

Posted by Thierry Collogne <th...@gmail.com>.
Thanks. Did you also try using it?

On 22/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> nice one !

Re: Problems with special characters

Posted by Ma...@ibsbe.be.
nice one !




"Thierry Collogne" <th...@gmail.com> 
wrote on 22/03/2007 09:00:

> Thanks. I made some modifications to SolrQuery.java to allow highlighting.
> I will post the code on
>
> http://issues.apache.org/jira/browse/SOLR-20





Re: Problems with special characters

Posted by Thierry Collogne <th...@gmail.com>.
Thanks. I made some modifications to SolrQuery.java to allow highlighting. I
will post the code on

http://issues.apache.org/jira/browse/SOLR-20



On 22/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> we didnt use it, but i took a quick look :
>
> you need to implement the "hl=on" attribute in the getquerystring() method
> of the solrqueryImpl
>
> the resultdocs allready contain highlighting, that's why you found
> processHighlighting in the Resultparser
>
> good luck !
> m

Re: Problems with special characters

Posted by Ma...@ibsbe.be.
We didn't use it, but I took a quick look:

You need to implement the "hl=on" parameter in the getquerystring() method
of the solrqueryImpl.

The result docs already contain highlighting; that's why you found
processHighlighting in the Resultparser.

good luck !
m
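
As a rough illustration of the "hl=on" suggestion above, here is a minimal sketch of a query string builder that switches highlighting on. It is not the code posted to SOLR-20: the class name HighlightQueryString and the method build are made up for this example, and hl.fl (the list of fields to highlight) is a standard Solr highlighting parameter that should be checked against the Solr version in use. The query is URL-encoded as UTF-8 so that accented characters survive the trip.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the Solr Java Client discussed in this thread.
class HighlightQueryString {

    // Builds "q=...&hl=on", optionally followed by "&hl.fl=<field>".
    static String build(String query, String highlightField) {
        try {
            StringBuilder sb = new StringBuilder();
            sb.append("q=").append(URLEncoder.encode(query, "UTF-8"));
            sb.append("&hl=on"); // the parameter mentioned in the message above
            if (highlightField != null) {
                sb.append("&hl.fl=").append(URLEncoder.encode(highlightField, "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent, written as an escape so the
        // example does not depend on the encoding of the source file.
        System.out.println(build("caf\u00e9", "name"));
    }
}
```

The resulting string would be appended to the select URL of the Solr server; how the response highlighting is parsed is left to the client's existing result parser.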




"Thierry Collogne" <th...@gmail.com> 
wrote on 21/03/2007 17:04:

> Thank you. When I add the code you described, the Solr Java Client works.
> One more question about the Solr Java Client.
>
> Does it allow the use of highlighting? I found a processHighlighting method
> in ResultsParser.java, but I can't find a way of enabling it.
>
> Did you use highlighting?



Re: Problems with special characters

Posted by Thierry Collogne <th...@gmail.com>.
Thank you. When I add the code you described, the Solr Java Client works.
One more question about the Solr Java Client.

Does it allow the use of highlighting? I found a processHighlighting method
in ResultsParser.java, but I can't find a way of enabling it.

Did you use highlighting?

On 21/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> hey,
>
> we had the same problem with the Solr Java Client ...
>
> they forgot to put UTF-8 encoding on the stream ...
>
> i posted our fix on http://issues.apache.org/jira/browse/SOLR-20
> it's this post :
> http://issues.apache.org/jira/browse/SOLR-20#action_12478810
> Frederic Hennequin [07/Mar/07 08:27 AM]
>
> grts,m

Re: Problems with special characters

Posted by Ma...@ibsbe.be.
hey,

We had the same problem with the Solr Java Client ...

They forgot to set UTF-8 encoding on the stream ...

I posted our fix on http://issues.apache.org/jira/browse/SOLR-20
It's this post:
http://issues.apache.org/jira/browse/SOLR-20#action_12478810
Frederic Hennequin [07/Mar/07 08:27 AM] 

grts,m 
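
For readers who want to see what "UTF-8 encoding on the stream" can mean in practice, here is a minimal sketch. It is not the fix attached to SOLR-20; the class name Utf8StreamWriting and the method writeUtf8 are made up for illustration. The idea is to wrap the output stream in an OutputStreamWriter with an explicit charset instead of relying on the JVM default. In a real client the stream would presumably come from HttpURLConnection.getOutputStream(), with a matching charset in the Content-Type header; a ByteArrayOutputStream stands in for it here so the example runs without a server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the Solr Java Client.
class Utf8StreamWriting {

    // Writes the body as UTF-8 bytes, whatever the JVM default charset is.
    static void writeUtf8(String body, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(body);
        writer.flush(); // push the encoded bytes through to the underlying stream
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeUtf8("\u00e9", sink); // e with acute accent, as a Unicode escape
        System.out.println(sink.size()); // prints 2: the bytes 0xC3 0xA9
    }
}
```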





"Bertrand Delacretaz" <bd...@apache.org> 
wrote on 21/03/2007 11:19:

> On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
>
> > I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
> > and the problem isn't there anymore...
>
> ok, thanks for the feedback!
>
> -Bertrand


Re: Problems with special characters

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:

> I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
> and the problem isn't there anymore...

ok, thanks for the feedback!

-Bertrand

Re: Problems with special characters

Posted by Thierry Collogne <th...@gmail.com>.
I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
and the problem isn't there anymore.

Thanks a lot for the help.

On 21/03/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
>
> > ...What would be the best way of building the SimplePostTool.java
>
> You can use "ant example" in the top-level directory of the Solr source
> code.
>
> I have attached the current post.jar to SOLR-194 for convenience.
>
> -Bertrand
>

Re: Problems with special characters

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:

> ...What would be the best way of building the SimplePostTool.java

You can use "ant example" in the top-level directory of the Solr source code.

I have attached the current post.jar to SOLR-194 for convenience.

-Bertrand

Re: Problems with special characters

Posted by Thierry Collogne <th...@gmail.com>.
I have used your first workaround and this works for me. What would be the
best way of building SimplePostTool.java
<https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java>?

On 21/03/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> On 3/21/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> > ...For now, using this as a workaround should help:
> >
> > java -Dfile.encoding=UTF-8 -jar post.jar
> > http://localhost:8983/solr/update utf8-example.xml..
>
> Should be fixed now, if you can grab the latest SimplePostToolCode [1]
> it should work irrelevant of the default JVM encoding. Please confirm
> if you test it.
>
> It's a kind of brute force fix, I have hardcoded the encoding as
> UTF-8, I'm keeping SOLR-194 open so that we don't forget to fix this
> (but considering SOLR-190 it's not urgent to fix).
>
> -Bertrand
>
> [1]
> https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java
>

Re: Problems with special characters

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Bertrand Delacretaz <bd...@apache.org> wrote:

> ...For now, using this as a workaround should help:
>
> java -Dfile.encoding=UTF-8 -jar post.jar
> http://localhost:8983/solr/update utf8-example.xml..

Should be fixed now. If you can grab the latest SimplePostTool code [1],
it should work irrespective of the default JVM encoding. Please confirm
if you test it.

It's a kind of brute-force fix: I have hardcoded the encoding as
UTF-8. I'm keeping SOLR-194 open so that we don't forget to fix this
(but considering SOLR-190 it's not urgent to fix).

-Bertrand

[1] https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java
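
To illustrate what hardcoding the encoding can look like, here is a minimal sketch that reads a file as UTF-8 regardless of the JVM default. It is not the actual SimplePostTool change; the class name Utf8FileReading and the method readUtf8 are assumptions made for this example, and it uses java.nio.file and StandardCharsets, which are newer than the Java versions current when this thread was written.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the SimplePostTool source.
class Utf8FileReading {

    // Decodes the whole file as UTF-8, ignoring -Dfile.encoding and the platform default.
    static String readUtf8(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             Reader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringWriter out = new StringWriter();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 document to a temporary file, then read it back.
        Path tmp = Files.createTempFile("utf8-example", ".xml");
        String doc = "<add><doc><field name=\"name\">\u00e9\u00e0</field></doc></add>";
        Files.write(tmp, doc.getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(tmp).equals(doc)); // prints true
        Files.delete(tmp);
    }
}
```

The trade-off is the one described above: a hardcoded charset is robust against the JVM default, but it silently assumes every input file really is UTF-8.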

Re: Problems with special characters

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:

> ...I am using the post.jar file to update the search indexes. Problem is that
> foreign characters like é, à, ... don't work correctly...

You're right, I have entered the issue in
https://issues.apache.org/jira/browse/SOLR-194

For now, using this as a workaround should help:

java -Dfile.encoding=UTF-8 -jar post.jar http://localhost:8983/solr/update utf8-example.xml

-Bertrand
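
A note on why the workaround above can help: if a tool decodes a UTF-8 file with the JVM's default charset and that default is a single-byte Western encoding (commonly the case on Windows JVMs of that era), each accented character turns into two wrong characters. The sketch below reproduces this with ISO-8859-1 standing in for such a default; the class name EncodingMismatchDemo and the method decode are made up for illustration, and the results are printed as booleans and lengths so the output does not depend on the console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, written for this thread only.
class EncodingMismatchDemo {

    // Decodes bytes the way a reader configured with the given charset would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u00e9" is e with acute accent; in UTF-8 it is the two bytes 0xC3 0xA9.
        byte[] onDisk = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: one character comes back.
        System.out.println(decode(onDisk, StandardCharsets.UTF_8).equals("\u00e9")); // prints true

        // Wrong charset: each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(onDisk, StandardCharsets.ISO_8859_1).length()); // prints 2

        // The charset this JVM would use when none is given explicitly.
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing -Dfile.encoding=UTF-8 changes that default for the whole JVM, which is why it works as a stopgap even when the tool itself never names a charset.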