Posted to solr-user@lucene.apache.org by robert mena <ro...@gmail.com> on 2010/07/18 18:16:09 UTC
help finding illegal chars in XML doc
Hi,
I am doing some tests with solr 1.4.1.
I've created an XML file with the documents I'd like to index. With a few
items (1000) everything went fine.
When I went to a more representative import (around 60000) I got an error:
java -jar example/exampledocs/post.jar doc.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file add.xml
SimplePostTool: FATAL: Solr returned an error:
Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
I've tried to track where this problem is located without luck.
Any ideas?
Re: help finding illegal chars in XML doc
Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks for your reply. I could not find any mention of that in the log
: files. By the way, I only have YYYY_MM_DD.request.log files in my directory.
:
: Do I have to enable any specific log or level to catch those errors?
If you are using the "java -jar start.jar" command for the example Jetty
instance, then the log messages I'm referring to are written directly to your
console. If you are running Solr in some other servlet container,
then it all depends on the servlet container...
http://wiki.apache.org/solr/SolrLogging
http://wiki.apache.org/solr/LoggingInDefaultJettySetup
-Hoss
Re: help finding illegal chars in XML doc
Posted by robert mena <ro...@gmail.com>.
Hi Chris,
Thanks for your reply. I could not find any mention of that in the log
files. By the way, I only have YYYY_MM_DD.request.log files in my directory.
Do I have to enable a specific log or log level to catch those errors?
On Sun, Jul 18, 2010 at 3:45 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : SimplePostTool: FATAL: Solr returned an error:
> : Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
> :
> : I've tried to track where this problem is located without luck.
>
> Check your Solr logs; they will contain the "unmunged" version of the error
> message (the version of Jetty used in the 1.4.1 example setup seems to
> think all punctuation should be removed from error messages), complete with
> the row/column of your XML message that had the problem (it's either
> 3,7022847; or 370,22847; or 3702,2847; etc.).
>
>
>
> -Hoss
>
>
Re: help finding illegal chars in XML doc
Posted by Chris Hostetter <ho...@fucit.org>.
: SimplePostTool: FATAL: Solr returned an error:
: Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
:
: I've tried to track where this problem is located without luck.
Check your Solr logs; they will contain the "unmunged" version of the error
message (the version of Jetty used in the 1.4.1 example setup seems to
think all punctuation should be removed from error messages), complete with
the row/column of your XML message that had the problem (it's either
3,7022847; or 370,22847; or 3702,2847; etc.).
-Hoss
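If the logs are hard to get at, the file itself can be scanned for the offending characters. What follows is a rough sketch and not part of the original reply: it assumes the file is UTF-8 and that XML 1.0 rules apply, under which the control characters below 0x20 other than tab, line feed, and carriage return are not allowed (code 27 is ESC, 0x1B). The function names find_illegal_chars, candidate_positions, and scan_file are made up for illustration, and the parser may count columns slightly differently than this script does.

```python
import re

# Control characters that XML 1.0 does not allow:
# everything below 0x20 except tab (0x09), LF (0x0A), and CR (0x0D).
ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_chars(text):
    """Yield (row, col, code) for each illegal character; row and col are 1-based."""
    # split("\n") is used instead of splitlines(), because splitlines()
    # also breaks on some of the control characters we are looking for.
    for row, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML10.finditer(line):
            yield row, match.start() + 1, ord(match.group())


def candidate_positions(digits):
    """Split munged digits such as '37022847' into possible (row, col) pairs.

    Splits whose column part would start with '0' are skipped.
    """
    return [
        (int(digits[:i]), int(digits[i:]))
        for i in range(1, len(digits))
        if digits[i] != "0"
    ]


def scan_file(path):
    """Print every illegal character found in the file at `path`."""
    # errors="replace" keeps the scan going even if some bytes are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for row, col, code in find_illegal_chars(text):
        print(f"row {row}, col {col}: control character code {code}")
```

Calling scan_file("doc.xml") lists every hit, and candidate_positions("37022847") gives the row/column pairs to compare against that list.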
Re: help finding illegal chars in XML doc
Posted by didier deshommes <df...@gmail.com>.
For XML 1.1 documents, you can check whether any of your documents contain
the restricted characters defined here:
http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-RestrictedChar
If they do, you'll have to remove them.
didier
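Removing them could look something like the sketch below. It is only an illustration under a few assumptions: the input is UTF-8, only the control characters below 0x20 (other than tab, LF, and CR) are stripped, and the helper names strip_illegal_chars and clean_file are invented here. It does not touch the 0x7F-0x9F range that the XML 1.1 RestrictedChar production also lists, and silently dropping characters may not be what you want for every field.

```python
import re

# Control characters below 0x20, except tab (0x09), LF (0x0A), and CR (0x0D).
ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_chars(text):
    """Return `text` with the illegal control characters removed."""
    return ILLEGAL_XML10.sub("", text)


def clean_file(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path without illegal characters.

    The file is processed in chunks so a large import file does not have to
    fit in memory. Returns the number of characters that were removed.
    """
    removed = 0
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            cleaned = strip_illegal_chars(chunk)
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

For example, clean_file("doc.xml", "doc.clean.xml") writes a cleaned copy that can then be posted with post.jar, leaving the original file untouched.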
On Sun, Jul 18, 2010 at 11:16 AM, robert mena <ro...@gmail.com> wrote:
> Hi,
>
> I am doing some tests with solr 1.4.1.
>
> I've created an XML file with the documents I'd like to index. With a few
> items (1000) everything went fine.
>
> When I went to a more representative import (around 60000) I got an error:
>
> java -jar example/exampledocs/post.jar doc.xml
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
> other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file add.xml
> SimplePostTool: FATAL: Solr returned an error:
> Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
>
> I've tried to track where this problem is located without luck.
>
> Any ideas?
>