Posted to solr-user@lucene.apache.org by Jan Høydahl <ja...@cominvent.com> on 2011/04/01 15:22:37 UTC

Unsupported encoding GB18030

Hi,

Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23

When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
% java -jar post.jar gb18030-example.xml
jar gb18030-example.xml
SimplePostTool: version 1.3
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file gb18030-example.xml
SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap

From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030

The same works on my MacBook with Java 1.6.0_24

Clues?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
> On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> > Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
> >
> > When trying to post example\exampledocs\gb18030-example.xml using
> post.jar I get this error:
> > % java -jar post.jar gb18030-example.xml
> > jar gb18030-example.xml
> > SimplePostTool: version 1.3
> > SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> > SimplePostTool: POSTing file gb18030-example.xml
> > SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding:
> GB18030lap
> >
> > From the stack it is caused by com.ctc.wstx.exc.WstxIOException:
> Unsupported encoding: GB18030
> >
> > The same works on my MacBook with Java1.6.0_24
> 
> Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
> don't have XP around any longer to see if that's the factor somehow.

It seems that the JVM used on this Windows machine does not support this
particular encoding. This is not Solr's fault; maybe it's some stripped-down
third-party JDK like IBM's or whatever. Even Sun guarantees only a few
encodings to be present in every JVM, and GB18030 is definitely optional. As
far as I remember, in the early JDK days there were extra eastern JDKs around
that had extra charsets; maybe that's still the case for Win XP?
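
For anyone who wants to check this on their own machine, here is a minimal sketch. The class and method names are made up for illustration. It uses the six charset names that the java.nio.charset.Charset Javadoc lists as required on every Java platform, and a few optional ones (GB18030, Big5, Shift_JIS) picked only as examples.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class GuaranteedCharsets {
    // The six charsets the Charset Javadoc requires of every Java platform.
    static final String[] STANDARD = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    /** Returns the names from the given list that this JVM cannot resolve. */
    static List<String> missing(String... names) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Missing standard charsets: " + missing(STANDARD));
        // Optional charsets: whether these are present depends on the installation.
        System.out.println("Missing optional charsets: "
                + missing("GB18030", "Big5", "Shift_JIS"));
        System.out.println("Charsets available in this JVM: "
                + Charset.availableCharsets().size());
    }
}
```

If GB18030 shows up in the second list, the JVM itself lacks the charset and Solr cannot do anything about it.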

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Unsupported encoding GB18030

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Apr 4, 2011 at 7:51 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi Jan,
>
>> Regarding the Windows XP VMware instance I was using, it had a Sun JRE
>> (not JDK) which was auto-updated from 1.5 to 1.6.
>> After completely uninstalling Java and re-installing
>> jdk-6u24-windows-i586.exe the GB18030 encoding is supported.
>
> Just out of curiosity: A JRE-only configuration should not work / is not
> supported for Solr at all, as JSP pages don't work. So the admin interface
> of Solr would produce an HTTP error 500, right?

This used to be true a long time ago, but then servlet containers like
Jetty started working with JSPs when using a JRE, since they dropped
the "javac" dependency.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Jan,

> Regarding the Windows XP VMware instance I was using, it had a Sun JRE
> (not JDK) which was auto-updated from 1.5 to 1.6.
> After completely uninstalling Java and re-installing
> jdk-6u24-windows-i586.exe the GB18030 encoding is supported.

Just out of curiosity: A JRE-only configuration should not work / is not
supported for Solr at all, as JSP pages don't work. So the admin interface
of Solr would produce an HTTP error 500, right?

Uwe




RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
From my tests, this only affects Windows XP and earlier.

*nix and OS X always get the full charsets.jar. Windows Vista and Windows 7
"support" all languages by default and report this back through
http://msdn.microsoft.com/en-us/library/dd317827(v=vs.85).aspx , so the
"testing code" in the installer gets back true for all language groups and
is forced to install the full charsets.jar. This is described in the Sun
issue, and I verified it on at least Vista and 7 Ultimate: the installer
seems to install full language support even on German Windows, in contrast
to XP, where no charsets.jar is installed (jre/lib folder).
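
A quick way to see which case applies is to look for the file directly. The following is only a sketch, with a made-up class name, and it assumes the Java 6-era layout discussed here, where the extended charsets live in lib/charsets.jar below the directory named by the java.home system property. Later Java versions package charsets differently, so a "missing" result there says nothing.

```java
import java.io.File;

class CharsetsJarCheck {
    /** Expected location for a Java 6-era layout: <java.home>/lib/charsets.jar */
    static File charsetsJar(String javaHome) {
        return new File(new File(javaHome, "lib"), "charsets.jar");
    }

    public static void main(String[] args) {
        // For a JDK install, java.home normally points at the bundled jre folder.
        File jar = charsetsJar(System.getProperty("java.home"));
        System.out.println(jar + (jar.isFile() ? " is present" : " is missing"));
    }
}
```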

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: Unsupported encoding GB18030

Posted by Jan Høydahl <ja...@cominvent.com>.
Makes sense.

Question is, do we want to require a full JDK to index exampledocs? Most developers will have a JDK, but the occasional semi-tech manager just wanting to test out Solr may get burnt and think "Open Source sucks, just as I thought" :)

I added a note to http://wiki.apache.org/solr/SolrInstall about the need for a JDK for international charsets.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
To come back to the original issue:
If you are using a pure JRE that was installed in your operating system via
the standard mechanism ("browser automatically installs Java Plugin" or
similar), the following applies:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6329080

To reduce the size of downloads, the JRE-only installation does not contain
the full charsets.jar, so the problem is expected. In fact, those JREs
contain only the basic charsets, as Robert said, plus the ones needed for
your region (the installer analyzes your environment and chooses between
western, eastern and possibly others, downloading only the corresponding
charsets.jar).

Maybe we should add a note to Solr that you should in all cases use a
full-locale JRE installation or, better, a JDK; otherwise the full
international functionality of Solr cannot be used.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: Unsupported encoding GB18030

Posted by Chris Hostetter <ho...@fucit.org>.
: with category and prices. We definitely want to show off more advanced
: features, and we should add more example documents for that. Plain test
: docs could be placed in a subfolder "exampledocs/extras" or something.

+1

or "exampledocs/exotic" or "exampledocs/complex"


-Hoss



Re: Unsupported encoding GB18030

Posted by Jan Høydahl <ja...@cominvent.com>.
>>> : I don't see the reason why "exampledocs" should contain docs with narrow
>>> charsets not guaranteed to be supported.
>> personally i would like to see us add a lot more exampledocs in a lot more
>> esoteric encodings, precisely to help end users sanity test this sort of
>> thing. we frequently get questions from people about character encoding
>> wonkiness, and things like test_utf8.sh, utf8-example.xml, and now
>> gb18030-example.xml can help us narrow down the problem: their client
>> code, their servlet container, or solr?
> 
> Same here. In my opinion, an example set of files should also contain "more
> complicated" ones to show what Solr can do. If some of them don't work, it's
> not really a problem. Maybe we should simply add a "tag" to the filename to
> mark them as not working in every configuration.

Positive to more example docs!

My concern was that since indexing exampledocs/*.xml is perhaps THE most common action any new Solr user will do, it should just work, and it's a benefit if the results revolve around the same theme, a set of products with category and prices. We definitely want to show off more advanced features, and we should add more example documents for that. Plain test docs could be placed in a subfolder "exampledocs/extras" or something.

Regarding the Windows XP VMware instance I was using, it had a Sun JRE (not JDK) which was auto-updated from 1.5 to 1.6.
After completely uninstalling Java and re-installing jdk-6u24-windows-i586.exe the GB18030 encoding is supported.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> : I don't see the reason why "exampledocs" should contain docs with narrow
> : charsets not guaranteed to be supported.
> : In my opinion this file belongs in the test suite, also since it only
> : contains "test" content, unsuitable for demoing.
>
> its purpose for being there is to let *users* "test" if their servlet
> container + solr combination is working with alternate encodings -- much
> the same reason utf8-example.xml and test_utf8.sh are included in
> exampledocs.
>
> It's a perfectly valid exampledoc for Solr.  it may not work on all
> platforms, but the *.sh files aren't guaranteed to work on all platforms
> either.  if we moved it to the test directory, end users fetching binary
> releases wouldn't get it, and may not be aware that their servlet
> container isn't supporting that charset.
>
> personally i would like to see us add a lot more exampledocs in a lot more
> esoteric encodings, precisely to help end users sanity test this sort of
> thing.  we frequently get questions from people about character encoding
> wonkiness, and things like test_utf8.sh, utf8-example.xml, and now
> gb18030-example.xml can help us narrow down the problem: their client
> code, their servlet container, or solr?

Same here. In my opinion, an example set of files should also contain "more
complicated" ones to show what Solr can do. If some of them don't work, it's
not really a problem. Maybe we should simply add a "tag" to the filename to
mark them as not working in every configuration.

The servlet container can no longer break those files! Solr now *only*
communicates with the servlet container using Input/OutputStreams. All
charsets are handled by the XML parser or by Readers/Writers created by
Solr's code (this change even brought a serious speed improvement, because
Jetty's servlet Writers are very inefficient...).
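
To illustrate the point (this is not Solr's actual code, just a minimal sketch with made-up class and method names), the snippet below hands raw bytes to the standard javax.xml.stream API and lets the parser pick the charset from the XML declaration. XMLInputFactory.newInstance() returns whichever StAX implementation is found first, so with Woodstox on the classpath it would normally be Woodstox, otherwise the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEncodingSketch {
    /** Parses raw bytes; the parser decides the charset from the XML declaration. */
    static String rootText(byte[] xmlBytes) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** Builds a tiny ASCII-only document that declares the given encoding and parses it. */
    static String parseWithDeclaredEncoding(String encoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?><doc>hello</doc>";
        // The content is pure ASCII, which is byte-identical in UTF-8 and GB18030,
        // so the bytes are valid whichever of those encodings is declared.
        byte[] bytes = xml.getBytes(Charset.forName("US-ASCII"));
        try {
            return rootText(bytes);
        } catch (XMLStreamException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String encoding = args.length > 0 ? args[0] : "GB18030";
        System.out.println(encoding + " -> " + parseWithDeclaredEncoding(encoding));
    }
}
```

No servlet container is involved here, so if this fails for GB18030 the cause is the parser or the JVM underneath it.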

To come back to the original issue: I did extensive testing with different
*Sun/Oracle* JDKs and operating systems in VirtualBox, and none of them
failed! To get to the bottom of the issue, Jan should tell us his complete
configuration:
- Was the JDK freshly installed (not upgraded or whatever)?
- Was it a clean binary Solr distribution (I tested only those)? If it was an
SVN checkout, maybe the SVN client broke the file itself. The error message
in the exception was strange: it contained some trash after the encoding
name, so maybe the file itself was corrupted (maybe Jan opened and saved it
with an incompatible text editor that cannot handle this encoding!). We
should know what Jan changed. Maybe he used the already modified Solr
installation of his project.
- Maybe the classpath of Jan's Solr installation contains some "older" XML
parser libs that cannot handle this GB encoding. From the exception, we
cannot see whether the StAX parser that produced these exceptions is really
the one from Solr itself. Maybe there is some other Wstx in his classpath. As
he uses JDK 1.6, a good test would be to remove wstx from Solr's lib folder.
If the same exception still shows up (and not a different one), there is
another Wstx somewhere. JDK 6 has an internal (but slower) StAX parser, so
removing the file is a good test case.
- Jan should run a short Java program to test whether e.g. new String(new
byte[0], "GB18030") works for him!
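
The last two checks in the list above could be sketched roughly as follows. The class and method names are invented for illustration. To say anything about Solr's parser, the program would have to run with the same classpath as Solr; run standalone it only reports on the plain JVM.

```java
import java.io.UnsupportedEncodingException;
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

class EncodingProbe {
    /** The one-line probe from the list: can this JVM decode the named charset? */
    static boolean canDecode(String charsetName) {
        try {
            new String(new byte[0], charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    /** Class name of the StAX factory that XMLInputFactory.newInstance() picks. */
    static String staxFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /** Where that factory class was loaded from, if the JVM reveals it. */
    static String staxFactoryLocation() {
        CodeSource source = XMLInputFactory.newInstance().getClass()
                .getProtectionDomain().getCodeSource();
        // Classes that ship with the JDK itself typically have no code source.
        return (source == null || source.getLocation() == null)
                ? "(no code source, probably the JDK's own parser)"
                : source.getLocation().toString();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "GB18030";
        System.out.println(name + " supported: " + canDecode(name));
        System.out.println("StAX factory: " + staxFactoryClass());
        System.out.println("Loaded from:  " + staxFactoryLocation());
    }
}
```

A factory class under com.ctc.wstx loaded from an unexpected jar would point to a second Woodstox on the classpath.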

Finally, the JDK's charset support lives in charsets.jar in the JDK lib
folder. It has nothing to do with any operating system support for charsets,
so it also works on any Windows version that is e.g. US English only. The
charset support for Windows XP only serves to display those characters on
screen (it contains only fonts and basic OS support). As Solr has no GUI and
the charset conversions are handled by charsets.jar internally, installing
any operating system patches has no effect.

Uwe




Re: Unsupported encoding GB18030

Posted by Chris Hostetter <ho...@fucit.org>.
: I don't see the reason why "exampledocs" should contain docs with narrow charsets not guaranteed to be supported.
: In my opinion this file belongs in the test suite, also since it only contains "test" content, unsuitable for demoing.

its purpose for being there is to let *users* "test" if their servlet
container + solr combination is working with alternate encodings -- much
the same reason utf8-example.xml and test_utf8.sh are included in
exampledocs.

It's a perfectly valid exampledoc for Solr.  it may not work on all
platforms, but the *.sh files aren't guaranteed to work on all platforms
either.  if we moved it to the test directory, end users fetching binary
releases wouldn't get it, and may not be aware that their servlet
container isn't supporting that charset.

personally i would like to see us add a lot more exampledocs in a lot more
esoteric encodings, precisely to help end users sanity test this sort of
thing.  we frequently get questions from people about character encoding
wonkiness, and things like test_utf8.sh, utf8-example.xml, and now
gb18030-example.xml can help us narrow down the problem: their
client code, their servlet container, or solr?


-Hoss



Re: Unsupported encoding GB18030

Posted by Jan Høydahl <ja...@cominvent.com>.
My XP is a VMware instance, SP3 with Oracle's standard Java. I upgraded Java to 1.6.0_24 but that did not fix it.
Then I installed support for "East Asian languages" and "right to left" in Control Panel, rebooted and tried again. No luck.
Then I installed the GB18030 Support Package from http://go.microsoft.com/fwlink/?LinkID=26235. No luck.

I don't personally have this issue since I don't run Windows, it was a test I did to validate that things work under Windows.

I don't see the reason why "exampledocs" should contain docs with narrow charsets not guaranteed to be supported.
In my opinion this file belongs in the test suite, also since it only contains "test" content, unsuitable for demoing.

+1 to remove gb18030-example.xml from exampledocs. Not sure if it should be moved to a unit test.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Unsupported encoding GB18030

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Being practical, it's all about "If this is likely to fail for enough
users", as I said in my previous post.
I don't really know the answer to that at this point.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


On Fri, Apr 1, 2011 at 11:12 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi Yonik,
>
> I started my VirtualBox with a fresh Windows XP snapshot. Downloaded JDK
> 1.6.0_24 and Solr 3.1.0. Started Solr and then "java -jar post.jar *.xml" ->
> success.
>
> Before we start to "fix" something that's not an issue, you should ask this
> person exactly which JDK he uses and where he downloaded it. Is it maybe not
> an Oracle one? (This GB encoding is very common; if a JVM does not support
> it (it doesn't have to), it can only be some western European one, as I
> mentioned in my mail.)
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Friday, April 01, 2011 4:21 PM
>> To: dev@lucene.apache.org
>> Cc: Robert Muir
>> Subject: Re: Unsupported encoding GB18030
>>
>> On Fri, Apr 1, 2011 at 10:07 AM, Robert Muir <rc...@gmail.com> wrote:
>> > On Fri, Apr 1, 2011 at 10:00 AM, Yonik Seeley
>> > <yo...@lucidimagination.com> wrote:
>> >> On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com>
>> wrote:
>> >>> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>> >>>
>> >>> When trying to post example\exampledocs\gb18030-example.xml using
>> post.jar I get this error:
>> >>> % java -jar post.jar gb18030-example.xml jar gb18030-example.xml
>> >>> SimplePostTool: version 1.3
>> >>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>> >>> SimplePostTool: POSTing file gb18030-example.xml
>> >>> SimplePostTool: FATAL: Solr returned an error #400 Unsupported
>> >>> encoding: GB18030lap
>> >>>
>> >>> From the stack it is caused by com.ctc.wstx.exc.WstxIOException:
>> >>> Unsupported encoding: GB18030
>> >>>
>> >>> The same works on my MacBook with Java1.6.0_24
>> >>
>> >> Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
>> >> don't have XP around any longer to see if that's the factor somehow.
>> >>
>> >
>> > Its worth mentioning, there is no guarantee the JRE will support
>> > GB18030 encoding.
>> >
>> > There are only 6 charsets guaranteed to exist:
>> > http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.
>> > html
>>
>> Indexing *.xml is a very common thing for new users to do.
>> If this is likely to fail for enough users, we should move, remove, or at
> least
>> change the filename to something like gb18030-example.xml.gb18030 so it
>> won't get picked up by accident.
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-
>> 26, San Francisco
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
>> commands, e-mail: dev-help@lucene.apache.org
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Unsupported encoding GB18030

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

I started my VirtualBox with a fresh Windows XP snapshot, downloaded JDK
1.6.0_24 and Solr 3.1.0, started Solr, and then ran "java -jar post.jar *.xml" ->
success.

Before we start to "fix" something that's not an issue, you should ask this
person exactly which JDK he uses and where he downloaded it. Is it maybe not
an Oracle one? (This GB encoding is very common. If a JVM does not support
it (it is not required to), it can only be some Western European one like I
mentioned in my mail.)
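
One way to answer the "which JDK exactly" question is to have the reporter print the JVM's own identifying system properties. The following is only a minimal sketch using the standard library; the class name JvmInfo is made up for illustration and is not part of Solr or post.jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr): collects the system properties
// that identify which JVM build is actually running.
final class JvmInfo {
    static Map<String, String> describe() {
        String[] keys = {
            "java.vendor", "java.version", "java.vm.name",
            "java.home", "os.name", "file.encoding"
        };
        Map<String, String> info = new LinkedHashMap<String, String>();
        for (String key : keys) {
            // getProperty returns null if a JVM does not define the key
            info.put(key, System.getProperty(key));
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : describe().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it with the same java binary that starts Solr shows the vendor and the install directory, which is usually enough to tell a full JDK from a reduced JRE.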

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Friday, April 01, 2011 4:21 PM
> To: dev@lucene.apache.org
> Cc: Robert Muir
> Subject: Re: Unsupported encoding GB18030
> 
> On Fri, Apr 1, 2011 at 10:07 AM, Robert Muir <rc...@gmail.com> wrote:
> > On Fri, Apr 1, 2011 at 10:00 AM, Yonik Seeley
> > <yo...@lucidimagination.com> wrote:
> >> On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com>
> wrote:
> >>> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
> >>>
> >>> When trying to post example\exampledocs\gb18030-example.xml using
> post.jar I get this error:
> >>> % java -jar post.jar gb18030-example.xml jar gb18030-example.xml
> >>> SimplePostTool: version 1.3
> >>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> >>> SimplePostTool: POSTing file gb18030-example.xml
> >>> SimplePostTool: FATAL: Solr returned an error #400 Unsupported
> >>> encoding: GB18030lap
> >>>
> >>> From the stack it is caused by com.ctc.wstx.exc.WstxIOException:
> >>> Unsupported encoding: GB18030
> >>>
> >>> The same works on my MacBook with Java1.6.0_24
> >>
> >> Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
> >> don't have XP around any longer to see if that's the factor somehow.
> >>
> >
> > Its worth mentioning, there is no guarantee the JRE will support
> > GB18030 encoding.
> >
> > There are only 6 charsets guaranteed to exist:
> > http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.
> > html
> 
> Indexing *.xml is a very common thing for new users to do.
> If this is likely to fail for enough users, we should move, remove, or at
> least change the filename to something like gb18030-example.xml.gb18030 so it
> won't get picked up by accident.
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-
> 26, San Francisco
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> commands, e-mail: dev-help@lucene.apache.org





Re: Unsupported encoding GB18030

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Apr 1, 2011 at 10:07 AM, Robert Muir <rc...@gmail.com> wrote:
> On Fri, Apr 1, 2011 at 10:00 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com> wrote:
>>> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>>>
>>> When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
>>> % java -jar post.jar gb18030-example.xml
>>> jar gb18030-example.xml
>>> SimplePostTool: version 1.3
>>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>>> SimplePostTool: POSTing file gb18030-example.xml
>>> SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap
>>>
>>> From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030
>>>
>>> The same works on my MacBook with Java1.6.0_24
>>
>> Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
>> don't have XP around any longer to see if that's the factor somehow.
>>
>
> Its worth mentioning, there is no guarantee the JRE will support
> GB18030 encoding.
>
> There are only 6 charsets guaranteed to exist:
> http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

Indexing *.xml is a very common thing for new users to do.
If this is likely to fail for enough users, we should move, remove, or
at least change the filename to
something like gb18030-example.xml.gb18030 so it won't get picked up
by accident.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



Re: Unsupported encoding GB18030

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Apr 1, 2011 at 10:00 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com> wrote:
>> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>>
>> When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
>> % java -jar post.jar gb18030-example.xml
>> jar gb18030-example.xml
>> SimplePostTool: version 1.3
>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>> SimplePostTool: POSTing file gb18030-example.xml
>> SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap
>>
>> From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030
>>
>> The same works on my MacBook with Java1.6.0_24
>
> Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
> don't have XP around any longer to see if that's the factor somehow.
>

It's worth mentioning, there is no guarantee the JRE will support
GB18030 encoding.

There are only 6 charsets guaranteed to exist:
http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
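
To see what a given JVM actually provides, the check can be sketched with java.nio.charset.Charset. The six names below are the ones the Charset Javadoc lists as required on every Java platform; the class name CharsetCheck and the helper isAvailable are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports whether this JVM can decode a given charset.
final class CharsetCheck {
    // Charsets that every Java platform implementation must support.
    static final String[] GUARANTEED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : GUARANTEED) {
            System.out.println(name + ": " + isAvailable(name));
        }
        // Optional charset: the result depends on the JVM installation.
        System.out.println("GB18030: " + isAvailable("GB18030"));
    }
}
```

If the last line prints false on the machine that runs Solr, the "Unsupported encoding" error would come from the JVM installation rather than from Solr itself.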



Re: Unsupported encoding GB18030

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Apr 1, 2011 at 9:22 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>
> When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
> % java -jar post.jar gb18030-example.xml
> jar gb18030-example.xml
> SimplePostTool: version 1.3
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file gb18030-example.xml
> SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap
>
> From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030
>
> The same works on my MacBook with Java1.6.0_24

Interesting - things seem fine for me on Win7 Java1.6.0_24, but I
don't have XP around any longer to see if that's the factor somehow.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



Re: Unsupported encoding GB18030

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Then why did it work fine with post.jar on my Mac? The Chinese characters show up just fine. It also works using
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary @gb18030-example.xml

So I imagine it must be something with Windows XP or the JVM on my Windows box?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 1. apr. 2011, at 15.34, Li Li wrote:

> post.jar only support utf8. you must do the transformation.
> 
> 2011/4/1 Jan Høydahl <ja...@cominvent.com>:
>> Hi,
>> 
>> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>> 
>> When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
>> % java -jar post.jar gb18030-example.xml
>> jar gb18030-example.xml
>> SimplePostTool: version 1.3
>> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>> SimplePostTool: POSTing file gb18030-example.xml
>> SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap
>> 
>> From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030
>> 
>> The same works on my MacBook with Java1.6.0_24
>> 
>> Clues?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 


Re: Unsupported encoding GB18030

Posted by Li Li <fa...@gmail.com>.
post.jar only supports UTF-8. You must convert the file first.
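
If converting the file to UTF-8 is the route taken, the conversion could look roughly like the sketch below. Assumptions: the class name Transcode is made up for illustration, java.nio.file.Files needs Java 7 or later, and the JVM doing the conversion must itself support GB18030. An XML declaration inside the file that still names GB18030 would have to be edited separately; this sketch only re-encodes the bytes.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: re-encodes text bytes from one charset to another.
final class Transcode {
    static byte[] transcode(byte[] input, String from, String to) {
        // Charset.forName throws UnsupportedCharsetException if this JVM
        // lacks the named charset.
        String text = new String(input, Charset.forName(from));
        return text.getBytes(Charset.forName(to));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Transcode <input> <output>");
            return;
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), transcode(original, "GB18030", "UTF-8"));
    }
}
```

For example, "java Transcode gb18030-example.xml gb18030-example-utf8.xml" would write a UTF-8 copy next to the original; the output file name here is arbitrary.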

2011/4/1 Jan Høydahl <ja...@cominvent.com>:
> Hi,
>
> Testing the new Solr 3.1 release under Windows XP and Java 1.6.0_23
>
> When trying to post example\exampledocs\gb18030-example.xml using post.jar I get this error:
> % java -jar post.jar gb18030-example.xml
> jar gb18030-example.xml
> SimplePostTool: version 1.3
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file gb18030-example.xml
> SimplePostTool: FATAL: Solr returned an error #400 Unsupported encoding: GB18030lap
>
> From the stack it is caused by com.ctc.wstx.exc.WstxIOException: Unsupported encoding: GB18030
>
> The same works on my MacBook with Java1.6.0_24
>
> Clues?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>