Posted to dev@abdera.apache.org by "herbert welker (JIRA)" <ji...@apache.org> on 2007/09/05 11:52:33 UTC

[jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Invalid byte 2 of 3-byte UTF-8 sequence
---------------------------------------

                 Key: ABDERA-61
                 URL: https://issues.apache.org/jira/browse/ABDERA-61
             Project: Abdera
          Issue Type: Bug
    Affects Versions: 0.2.2, 0.3.0
         Environment: The system runs RAD 7.0.0.3 on Windows XP Professional
The JDK is:
C:\Programme\IBM\SDP70\jdk\bin>java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
J9VM - 20070322_12058_lHdSMR
JIT  - 20070109_1805ifx3_r8
GC   - WASIFIX_2007)
JCL  - 20070131

But the program runs under JDK-Compliance-Level-1.4

The Lotus-Connections-Server runs on a WebSphere 6.1.0.9 on a Windows XP Professional System.
            Reporter: herbert welker


When trying to create an Atom entry with the Abdera 0.2.2 client on a Lotus Connections server, the server (Lotus Connections 1.0.1) responds with an HTTP 400 error message:

org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.

Some googling turns up: "...The most likely cause is that the document you are uploading specifies that it is in UTF-8 encoding, but that it contains non-UTF-8 characters. As UTF-8 is the default character set for XML, it might also be the case that the document does not specify a character set at all."
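
As a hedged illustration of the quoted explanation (this is not code from the issue or its attachments): the sketch below encodes a string containing "ä" with a single-byte charset and then checks whether the resulting bytes are valid UTF-8. ISO-8859-1 stands in for the Windows default Cp1252 here, since both map "ä" to the byte 0xE4. The class and method names are made up for this example, and it uses java.nio APIs from Java 7 or later rather than the JDK 1.4 compliance level mentioned in the report.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class MislabeledUtf8Demo {

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String title = "K\u00e4se"; // "Käse"

        // Single-byte encoding: "ä" becomes the lone byte 0xE4.
        byte[] singleByte = title.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 encoding: "ä" becomes the two bytes 0xC3 0xA4.
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);

        System.out.println("single-byte bytes valid as UTF-8: " + isValidUtf8(singleByte));
        System.out.println("UTF-8 bytes valid as UTF-8:       " + isValidUtf8(utf8));
    }
}
```

In UTF-8, 0xE4 is the lead byte of a 3-byte sequence, so a strict parser expects a continuation byte next. In the single-byte case the next byte is a plain "s" (0x73), which is the kind of input that produces "Invalid byte 2 of 3-byte UTF-8 sequence".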

A fuller description and the Java code of my client are in the attached files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by James M Snell <ja...@gmail.com>.
Ok, I think I've spotted the problem.  While Chris' issue was occurring
in the response, Herbert's appears to be on the request... that is, the
entry being sent to the server does not appear to be serialized
properly.  Looking at the serialization code, I was creating an
OutputStreamWriter without specifying the charset.  I'll post a patch
shortly that should address the issue.
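
For readers following along, here is a minimal sketch of the difference described above. It is not the Abdera serialization code or the patch itself; the class and method names are hypothetical, and it uses modern Java (try-with-resources, StandardCharsets) for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetWriterDemo {

    // Writes text through an OutputStreamWriter bound to an explicit charset.
    static byte[] serialize(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Same, but without a charset: the JVM's default charset is used,
    // so the bytes depend on the machine the client runs on.
    static byte[] serializeWithPlatformDefault(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e4"; // "ä"
        System.out.println("explicit UTF-8 byte count:   "
                + serialize(text, StandardCharsets.UTF_8).length);
        System.out.println("platform default byte count: "
                + serializeWithPlatformDefault(text).length);
    }
}
```

With an explicit UTF-8 writer, "ä" is always written as the two bytes 0xC3 0xA4. Without a charset, a JVM whose default is Cp1252 writes the single byte 0xE4, which a server expecting UTF-8 rejects. Note that JDK 18 and later default to UTF-8, so the second method may behave differently on a current JVM than it did in 2007.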

- James

herbert wrote:
> 
> 
> herbert wrote:
>>
>> I will try it again.
>> I will capture the post again, this time with a 0.3.0 client and also in
>> hex-dump notation, and post it to the JIRA.
>>
>>
> 
> I've just added the attachment "UTF-8-problem_2.zip" to the ABDERA-61-issue,
> containing the new ethereal-TCP-stream-shots.
> 
> Herbert

Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by herbert <io...@gmx.de>.
Hi James!

Please ignore the email I've just sent. I wrote it before
I read that you'd posted a second patch.


James M Snell wrote:
> 
> Herbert, please apply this patch and see if this resolves your issue.
> 
> https://issues.apache.org/jira/secure/attachment/12365279/patch2.diff
> 
> 

I applied the patch and it works like a charm!
Thank you very much!

Regards,

Herbert

P.S.:
One thing came to mind while reading the ethereal streams.
Apparently the Abdera client first tries to post the entry once without
authentication, and if that fails, it retries the request with
authorization. Is this the default behaviour? Is it possible
to switch this off?

-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28ABDERA-61%29-Invalid-byte-2-of-3-byte-UTF-8-sequence-tf4383132.html#a12526188
Sent from the abdera-dev mailing list archive at Nabble.com.


Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by James M Snell <ja...@gmail.com>.
Herbert, please apply this patch and see if this resolves your issue.

https://issues.apache.org/jira/secure/attachment/12365279/patch2.diff

herbert wrote:
> 
> 
> herbert wrote:
>>
>> I will try it again.
>> I will capture the post again, this time with a 0.3.0 client and also in
>> hex-dump notation, and post it to the JIRA.
>>
>>
> 
> I've just added the attachment "UTF-8-problem_2.zip" to the ABDERA-61-issue,
> containing the new ethereal-TCP-stream-shots.
> 
> Herbert

Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by herbert <io...@gmx.de>.


herbert wrote:
> 
> 
> I will try it again.
> I will capture the post again, this time with a 0.3.0 client and also in
> hex-dump notation, and post it to the JIRA.
> 
> 

I've just added the attachment "UTF-8-problem_2.zip" to the ABDERA-61-issue,
containing the new ethereal-TCP-stream-shots.

Herbert


Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by James M Snell <ja...@gmail.com>.

herbert wrote:
> [snip]
> I'm fairly sure that I applied the patch correctly, but it was the first
> time I've done this, so it is possible that I did something wrong.
> One way to double-check my patching would be for you to send me the
> already patched jars, though, as I said, I'm quite sure that I did it
> the correct way.
> 

I can send you the jars if you'd like.

> Does anyone have the means to reproduce my issue against a
> Lotus Connections 1.0.1 server?
> 

I should be able to hit the internal IBM activities server later today.

> [snip]
> I do not use chunked requests explicitly. I assume that's something
> the Abdera client does for me, given the way I've set up my code.
> 

This was the default setting for 0.2.2.  It has been disabled by default
in 0.3.0

> Yes, the "ethereal streams" were captured with release 0.2.2, but the
> same error occurs with release 0.3.0.
> 

Hmm.. ok.

> 
> James M Snell wrote:
>>
>> [...]
>> .. and you did not capture the actual bits of the
>> xml in the post.  
>> [...]
>>
>>
> 
> I'm not sure what you mean by "the actual bits of the
> xml in the post". I captured the whole TCP stream as ethereal
> gave it to me. IMO the xml of the post is included as well. Hmmm.
> 

I did not see any xml for the second stream.

> [snip]
> I will try it again.
> I will capture the post again, this time with a 0.3.0 client and also in
> hex-dump notation, and post it to the JIRA.
> 
> Thanks anyway for all the hints and attempts to help!
> 

:-)

- James

Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by herbert <io...@gmx.de>.
Hi there!

1)

Chris Berry wrote:
> 
> 
>>[...]
>>So what is happening is that the Reader (created by StAXUtils and
>>subsequently Woodstox)
>>uses the default encoding (MacRoman in my case)
>>Which is the reason why it works in Linux -- the default encoding is
>>UTF-8.
>>
>>I don't know what Herbert's default encoding is....
> 
> 

I'm using Windows XP Professional, so I assume my default encoding is
Cp1252.
How can I find out exactly which default encoding I'm using?

Is it the file encoding I get when printing out this line:
System.out.println("My encoding: "+System.getProperty("file.encoding")); ?
This gives me Cp1252.
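
A small sketch of ways to inspect the default encoding, offered as an illustration rather than as an answer given in the thread. The class and method names are hypothetical. Charset.defaultCharset() exists since Java 5, so it would not compile at JDK 1.4 compliance level; the "file.encoding" property and OutputStreamWriter.getEncoding() are available there.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class DefaultEncodingCheck {

    // Name of the encoding an OutputStreamWriter picks when none is given.
    // getEncoding() may return a historical name such as "Cp1252" or "UTF8".
    static String writerDefaultEncoding() {
        try (OutputStreamWriter writer =
                     new OutputStreamWriter(new ByteArrayOutputStream())) {
            return writer.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The system property asked about in the message above.
        System.out.println("file.encoding property: "
                + System.getProperty("file.encoding"));
        // The charset the JVM reports as its default.
        System.out.println("Charset.defaultCharset: " + Charset.defaultCharset());
        // What a writer created without a charset actually uses.
        System.out.println("writer default:         " + writerDefaultEncoding());
    }
}
```

On a Windows XP machine with a Western European locale, all three would be expected to report Cp1252 (windows-1252) under different names, which is consistent with the value printed above.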

2)

James M Snell wrote:
> 
> 
>> [...]
>> Please give this patch a try.  This is a refactored version of what
>> Chris provided earlier.
>> [...]
> 
> 

I've checked out the branch 0.3.0-incubating 573049, applied your patch to
it, and ran the ant dist build.

I tried the freshly built jars in my test project, and it still does *not*
work. Same error.

I'm fairly sure that I applied the patch correctly, but it was the first
time I've done this, so it is possible that I did something wrong.
One way to double-check my patching would be for you to send me the
already patched jars, though, as I said, I'm quite sure that I did it
the correct way.

Does anyone have the means to reproduce my issue against a
Lotus Connections 1.0.1 server?

3)

James M Snell wrote:
> 
> 
> [...]
> Hmmm...  notice that in your "ethereal stream 2", you're using chunked
> requests, release 0.2.2... 
> [...]
> 
> 

I do not use chunked requests explicitly. I assume that's something
the Abdera client does for me, given the way I've set up my code.

Yes, the "ethereal streams" were captured with release 0.2.2, but the
same error occurs with release 0.3.0.


James M Snell wrote:
> 
> 
> [...]
> .. and you did not capture the actual bits of the
> xml in the post.  
> [...]
> 
> 

I'm not sure what you mean by "the actual bits of the
xml in the post". I captured the whole TCP stream as ethereal
gave it to me. IMO the xml of the post is included as well. Hmmm.



James M Snell wrote:
> 
> 
> [...]
> Would it be possible for you to capture another trace
> that includes the xml of the post?  Also, would it be possible for you
> to capture the hex output.  I want to see what bytes are actually being
> written to the wire.
> [...]
> 
> 

I will try it again.
I will capture the post again, this time with a 0.3.0 client and also in
hex-dump notation, and post it to the JIRA.

Thanks anyway for all the hints and attempts to help!

Herbert


Re: [jira] Created: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by James M Snell <ja...@gmail.com>.
Hmmm...  notice that in your "ethereal stream 2", you're using chunked
requests, release 0.2.2 and you did not capture the actual bits of the
xml in the post.  Would it be possible for you to capture another trace
that includes the xml of the post?  Also, would it be possible for you
to capture the hex output.  I want to see what bytes are actually being
written to the wire.
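
As a rough illustration of what such a hex view reveals (a hypothetical helper, not something from the thread or from ethereal), the sketch below prints the bytes of the same text under two encodings:

```java
import java.nio.charset.StandardCharsets;

final class HexDump {

    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "K\u00e4se"; // "Käse"
        // prints "UTF-8:      4B C3 A4 73 65"
        System.out.println("UTF-8:      " + toHex(title.getBytes(StandardCharsets.UTF_8)));
        // prints "ISO-8859-1: 4B E4 73 65"
        System.out.println("ISO-8859-1: " + toHex(title.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In a capture of a request body declared as UTF-8, a lone byte such as E4 followed directly by an ASCII byte would indicate that the client serialized with a single-byte platform encoding.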

- James


[jira] Updated: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by "James M Snell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ABDERA-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James M Snell updated ABDERA-61:
--------------------------------

    Attachment: patch2.diff

This patch tells the serializer to use the appropriate charset

>         Attachments: patch.diff, patch2.diff, UTF-8-problem.zip, UTF-8-problem_2.zip



[jira] Updated: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by "James M Snell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ABDERA-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James M Snell updated ABDERA-61:
--------------------------------

    Attachment: patch.diff

Here's a refactored version of the change Chris posted to the list.  This should be a bit more robust and will hopefully address the problem.  Please give this a test in your local environments and if it works, I'll commit the change to the trunk and the 0.3.0 branch.

>         Attachments: patch.diff, UTF-8-problem.zip



[jira] Updated: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by "herbert welker (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ABDERA-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

herbert welker updated ABDERA-61:
---------------------------------

    Attachment: UTF-8-problem_2.zip

>         Attachments: patch.diff, UTF-8-problem.zip, UTF-8-problem_2.zip



[jira] Resolved: (ABDERA-61) Invalid byte 2 of 3-byte UTF-8 sequence

Posted by "James M Snell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ABDERA-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James M Snell resolved ABDERA-61.
---------------------------------

    Resolution: Fixed

Force the use of UTF-8 when serializing an entry when the charset is not otherwise specified
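
The fallback rule in this resolution note can be sketched as follows. This is an assumption-laden illustration, not the contents of patch2.diff or the committed change; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SerializationCharset {

    // Uses the declared charset when one is given, otherwise falls back to
    // UTF-8 instead of the platform default.
    static Charset resolve(String declaredCharset) {
        if (declaredCharset == null || declaredCharset.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName(declaredCharset.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // falls back to UTF-8
        System.out.println(resolve("ISO-8859-1")); // honours the declared charset
    }
}
```

The point of the rule is that the output no longer depends on where the client runs: a Cp1252 Windows machine and a UTF-8 Linux machine produce the same bytes for an entry that does not declare a charset.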

>         Attachments: patch.diff, patch2.diff, UTF-8-problem.zip, UTF-8-problem_2.zip
