Posted to solr-user@lucene.apache.org by robert mena <ro...@gmail.com> on 2010/07/18 18:16:09 UTC

help finding illegal chars in XML doc

Hi,

I am doing some tests with solr 1.4.1.

I've created an XML file with the documents I'd like to index. With a small
number of items (1000) everything went fine.

When I moved to a more representative import (around 60000 documents), I got this error:

java -jar example/exampledocs/post.jar doc.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file add.xml
SimplePostTool: FATAL: Solr returned an error:
Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847

I've tried to track down where this problem is located, without luck.

Any ideas?

Re: help finding illegal chars in XML doc

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks for your reply. I could not find in the log files any mention of
: that.  By the way I only have YYYY_MM_DD.request.log files in my directory.
: 
: Do I have to enable any specific log or level to catch those errors?

if you are using that "java -jar start.jar" command for the example jetty 
instance then the log messages i'm referring to are written directly to your 
console.  if you are running solr in some other servlet container, 
then it all depends on the servlet container...

http://wiki.apache.org/solr/SolrLogging
http://wiki.apache.org/solr/LoggingInDefaultJettySetup



-Hoss


Re: help finding illegal chars in XML doc

Posted by robert mena <ro...@gmail.com>.
Hi Chris,

Thanks for your reply. I could not find any mention of that in the log
files. By the way, I only have YYYY_MM_DD.request.log files in my directory.

Do I have to enable any specific log or level to catch those errors?

On Sun, Jul 18, 2010 at 3:45 PM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : SimplePostTool: FATAL: Solr returned an error:
> : Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
> :
> : I've tried to track where this problem is located without luck.
>
> check your solr logs, it will contain the "unmunged" version of the error
> message (the version of jetty used in the 1.4.1 example setup seems to
> think all punctuation should be removed from error messages) complete with
> the row/column of your XML message that had the problem (it's either
> 3,7022847; or 370,22847; or 3702,2847; etc...)
>
>
>
> -Hoss
>
>

Re: help finding illegal chars in XML doc

Posted by Chris Hostetter <ho...@fucit.org>.
: SimplePostTool: FATAL: Solr returned an error:
: Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
: 
: I've tried to track where this problem is located without luck.

check your solr logs, it will contain the "unmunged" version of the error 
message (the version of jetty used in the 1.4.1 example setup seems to 
think all punctuation should be removed from error messages) complete with 
the row/column of your XML message that had the problem (it's either 
3,7022847; or 370,22847; or 3702,2847; etc...)
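If the log is hard to get at, the posted file itself can be searched. What follows is a minimal Python sketch, not part of Solr: the helper names `find_illegal_chars` and `candidate_positions` are hypothetical, and it assumes the file is UTF-8 and that rows and columns count from 1. It reports every character XML 1.0 forbids, including ESC (code 27), and lists each way the munged digit run could split into a row and a column. Parsers differ in how they count columns, so a match is a strong hint rather than proof.

```python
import re
import sys

# Characters XML 1.0 never allows: C0 controls other than Tab, LF and CR,
# plus U+FFFE and U+FFFF (lone surrogates are not checked here).
# ESC (code 27, 0x1B) from the Solr error falls in this set.
ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_illegal_chars(path):
    """Yield (row, col, code) for each illegal character; row/col are 1-based."""
    with open(path, encoding="utf-8") as f:
        for row, line in enumerate(f, start=1):
            for m in ILLEGAL_XML10.finditer(line):
                yield row, m.start() + 1, ord(m.group())


def candidate_positions(digits):
    """Every (row, col) pair a punctuation-stripped digit run could stand for."""
    # A real column number never starts with 0, so skip splits that would.
    return [
        (int(digits[:i]), int(digits[i:]))
        for i in range(1, len(digits))
        if digits[i] != "0"
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # the file given to post.jar, e.g. doc.xml
    hits = list(find_illegal_chars(path))
    for row, col, code in hits:
        print(f"row {row}, col {col}: illegal character code {code}")
    # Optional second argument: the digit run from the munged Solr error.
    if len(sys.argv) > 2:
        positions = {(row, col) for row, col, _ in hits}
        for row, col in candidate_positions(sys.argv[2]):
            if (row, col) in positions:
                print(f"the error probably refers to row {row}, col {col}")
```

For example, saved under a hypothetical name such as `find_xml_chars.py`, it could be run as `python find_xml_chars.py doc.xml 37022847`.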



-Hoss


Re: help finding illegal chars in XML doc

Posted by didier deshommes <df...@gmail.com>.
For XML 1.1 documents, you can check whether any of your documents contain
the restricted characters defined here:
http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-RestrictedChar

If they do, you'll have to remove them.
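If the offending characters carry no meaning, one option is to strip them before posting. The following is a minimal Python sketch that assumes UTF-8 input. It swaps in the XML 1.0 rule (characters outside the `Char` production) instead of the 1.1 RestrictedChar list linked above. ESC (code 27) cannot appear as a literal character under either version. The names `strip_illegal_xml_chars` and `clean_file` are hypothetical.

```python
import re
import sys

# Characters XML 1.0 never allows: C0 controls other than Tab, LF and CR,
# plus U+FFFE and U+FFFF. ESC (code 27) from the error falls in this set.
ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with every XML 1.0-illegal character replaced.

    Pass replacement=" " to keep neighbouring words apart.
    """
    return ILLEGAL_XML10.sub(replacement, text)


def clean_file(src, dst):
    """Copy src to dst line by line, dropping illegal characters."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_illegal_xml_chars(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    clean_file(sys.argv[1], sys.argv[2])  # e.g. doc.xml doc-clean.xml
```

The cleaned copy can then be posted as before, e.g. `java -jar example/exampledocs/post.jar doc-clean.xml`. Silently dropping characters can change field values, so it may be better to locate and fix them in the data source instead.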

didier
