Posted to solr-user@lucene.apache.org by Thierry Collogne <th...@gmail.com> on 2007/03/21 10:05:46 UTC
Problems with special characters
Hello,
I am using the post.jar file to update the search indexes. Problem is that
foreign characters like é, à, ... don't work correctly.
Even when I use the example xml files (like utf8-example.xml), the
characters don't work. Could this be a problem with the post.jar?
When I open the files in BabelPad, the characters are shown ok and the
editor tells me that the file is in UTF-8.
Where can I get the latest version of this file, or is there another way of
updating the index on windows?
Re: Problems with special characters
Posted by Ma...@ibsbe.be.
No, I didn't try to use it (because we don't use Solr to display the
results). The only thing our Solr server returns are IDs, so there is
nothing to put highlights on.
But the code doesn't look half bad :)
Let's hope the client developers pick up on it :)
"Thierry Collogne" <th...@gmail.com>
wrote on 22/03/2007 11:27:
> Thanks. Did you also try using it?
Re: Problems with special characters
Posted by Thierry Collogne <th...@gmail.com>.
Thanks. Did you also try using it?
On 22/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> nice one !
Re: Problems with special characters
Posted by Ma...@ibsbe.be.
nice one !
"Thierry Collogne" <th...@gmail.com>
wrote on 22/03/2007 09:00:
> Thanks. I made some modifications to SolrQuery.java to allow highlighting.
> I will post the code on
> http://issues.apache.org/jira/browse/SOLR-20
Re: Problems with special characters
Posted by Thierry Collogne <th...@gmail.com>.
Thanks. I made some modifications to SolrQuery.java to allow highlighting. I
will post the code on
http://issues.apache.org/jira/browse/SOLR-20
On 22/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> we didn't use it, but I took a quick look:
>
> you need to implement the "hl=on" attribute in the getquerystring() method
> of the solrqueryImpl
>
> the resultdocs already contain highlighting, that's why you found
> processHighlighting in the Resultparser
>
> good luck !
> m
Re: Problems with special characters
Posted by Ma...@ibsbe.be.
we didn't use it, but I took a quick look:
you need to implement the "hl=on" attribute in the getquerystring() method
of the solrqueryImpl
the resultdocs already contain highlighting, that's why you found
processHighlighting in the Resultparser
good luck !
m
"Thierry Collogne" <th...@gmail.com>
wrote on 21/03/2007 17:04:
> Thank you. When I add the code you described, the Solr Java Client works.
> One more question about the Solr Java Client.
> Does it allow the use of highlighting? I found a processHighlighting method
> in ResultsParser.java, but I can't find a way of enabling it.
> Did you use highlighting?
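The "hl=on" suggestion in this message can be sketched roughly as follows. This is not the Solr Java Client code from SOLR-20; the class HighlightQuery and the method buildQueryString are hypothetical names used only to illustrate appending the highlighting flag to a query string. Solr may also need further parameters (such as the fields to highlight) that are not covered here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the Solr Java Client.
public class HighlightQuery {

    // Build a query string for the user query and, when requested,
    // append the hl=on parameter mentioned in the thread.
    static String buildQueryString(String query, boolean highlight) {
        StringBuilder sb = new StringBuilder("q=");
        try {
            // Encode the query as UTF-8 so accented characters survive.
            sb.append(URLEncoder.encode(query, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
        if (highlight) {
            sb.append("&hl=on");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("solr", true));
        System.out.println(buildQueryString("caf\u00e9", false));
    }
}
```

The resulting string would be appended to the select URL of the Solr server; how the real client assembles its request is not shown in this thread.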
Re: Problems with special characters
Posted by Thierry Collogne <th...@gmail.com>.
Thank you. When I add the code you described, the Solr Java Client works.
One more question about the Solr Java Client.
Does it allow the use of highlighting? I found a processHighlighting method
in ResultsParser.java, but I can't find a way of enabling it.
Did you use highlighting?
On 21/03/07, Maarten.De.Vilder@ibsbe.be <Ma...@ibsbe.be> wrote:
>
> hey,
>
> we had the same problem with the Solr Java Client ...
>
> they forgot to set the UTF-8 encoding on the stream ...
>
> I posted our fix on http://issues.apache.org/jira/browse/SOLR-20
> it's this post:
> http://issues.apache.org/jira/browse/SOLR-20#action_12478810
> Frederic Hennequin [07/Mar/07 08:27 AM]
>
> grts,m
Re: Problems with special characters
Posted by Ma...@ibsbe.be.
hey,
we had the same problem with the Solr Java Client ...
they forgot to set the UTF-8 encoding on the stream ...
I posted our fix on http://issues.apache.org/jira/browse/SOLR-20
it's this post:
http://issues.apache.org/jira/browse/SOLR-20#action_12478810
Frederic Hennequin [07/Mar/07 08:27 AM]
grts,m
"Bertrand Delacretaz" <bd...@apache.org>
wrote on 21/03/2007 11:19:
> On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
> > I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
> > and the problem isn't there anymore...
>
> ok, thanks for the feedback!
>
> -Bertrand
Re: Problems with special characters
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
> I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
> and the problem isn't there anymore...
ok, thanks for the feedback!
-Bertrand
Re: Problems with special characters
Posted by Thierry Collogne <th...@gmail.com>.
I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call
and the problem isn't there anymore.
Thanks a lot for the help.
On 21/03/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
>
> > ...What would be the best way of building the SimplePostTool.java
>
> You can use "ant example" in the top-level directory of the Solr source
> code.
>
> I have attached the current post.jar to SOLR-194 for convenience.
>
> -Bertrand
>
Re: Problems with special characters
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
> ...What would be the best way of building the SimplePostTool.java
You can use "ant example" in the top-level directory of the Solr source code.
I have attached the current post.jar to SOLR-194 for convenience.
-Bertrand
Re: Problems with special characters
Posted by Thierry Collogne <th...@gmail.com>.
I have used your first workaround and this works for me. What would be the
best way of building SimplePostTool.java?
https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java
On 21/03/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> On 3/21/07, Bertrand Delacretaz <bd...@apache.org> wrote:
>
> > ...For now, using this as a workaround should help:
> >
> > java -Dfile.encoding=UTF-8 -jar post.jar
> > http://localhost:8983/solr/update utf8-example.xml..
>
> Should be fixed now, if you can grab the latest SimplePostTool code [1]
> it should work regardless of the default JVM encoding. Please confirm
> if you test it.
>
> It's a kind of brute force fix, I have hardcoded the encoding as
> UTF-8, I'm keeping SOLR-194 open so that we don't forget to fix this
> (but considering SOLR-190 it's not urgent to fix).
>
> -Bertrand
>
> [1]
> https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java
>
Re: Problems with special characters
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Bertrand Delacretaz <bd...@apache.org> wrote:
> ...For now, using this as a workaround should help:
>
> java -Dfile.encoding=UTF-8 -jar post.jar
> http://localhost:8983/solr/update utf8-example.xml..
Should be fixed now, if you can grab the latest SimplePostTool code [1]
it should work regardless of the default JVM encoding. Please confirm
if you test it.
It's a kind of brute force fix, I have hardcoded the encoding as
UTF-8, I'm keeping SOLR-194 open so that we don't forget to fix this
(but considering SOLR-190 it's not urgent to fix).
-Bertrand
[1] https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java
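As a rough illustration of what "hardcoding the encoding as UTF-8" can look like in Java, here is a minimal sketch. It is not the actual SimplePostTool change; the class Utf8Pipe and the method pipe are hypothetical names, and in-memory streams stand in for the file and the HTTP connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the SimplePostTool source.
public class Utf8Pipe {

    // Copy characters from in to out, decoding and encoding as UTF-8
    // no matter what the JVM default charset is.
    static void pipe(InputStream in, OutputStream out) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "éà" inside a Solr-style field, stored as UTF-8 bytes.
        byte[] xml = "<field name=\"name\">\u00e9\u00e0</field>"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        pipe(new ByteArrayInputStream(xml), sink);
        // The bytes that arrive match the bytes that were read.
        System.out.println(Arrays.equals(xml, sink.toByteArray()));
    }
}
```

Because both the reader and the writer name the charset explicitly, the -Dfile.encoding flag is no longer needed for the copy to preserve accented characters.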
Re: Problems with special characters
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/21/07, Thierry Collogne <th...@gmail.com> wrote:
> ...I am using the post.jar file to update the search indexes. Problem is that
> foreign characters like é, à, ... don't work correctly...
You're right, I have entered the issue in
https://issues.apache.org/jira/browse/SOLR-194
For now, using this as a workaround should help:
java -Dfile.encoding=UTF-8 -jar post.jar http://localhost:8983/solr/update utf8-example.xml
-Bertrand
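The workaround suggests that the tool was depending on the JVM default encoding. A minimal Java sketch of that effect follows; it is not taken from post.jar, and the class DefaultCharsetDemo and the helper roundTrip are hypothetical names. It encodes text with one charset and decodes it with another, which is roughly what happens when a sender uses a non-UTF-8 default while the receiver expects UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr.
public class DefaultCharsetDemo {

    // Encode text with the writer's charset, then decode the bytes
    // with the reader's charset.
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String sample = "\u00e9\u00e0"; // the characters é and à
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Same charset on both sides: the text is preserved.
        String ok = roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 both sides preserved: " + ok.equals(sample));

        // Written as ISO-8859-1 but read as UTF-8: the text is garbled.
        String bad = roundTrip(sample, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Mismatch preserved: " + bad.equals(sample));
    }
}
```

Running the class with and without -Dfile.encoding=UTF-8 shows which default charset a given JVM would otherwise use.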