Posted to java-dev@axis.apache.org by jayachandra <ja...@gmail.com> on 2005/04/21 14:58:28 UTC

[Axis2] Notes on underlying MXparser of OM

Hi all!
Continuing my work on XMLConformance: I've naively implemented
OMComment, OMPI and OMDTD (makeshift vanilla implementations without
any sort of validation) and run the XMLConformance tests. The test
suite provided by W3C contains a whole lot of invalid and ill-formed
XML files along with valid ones. In this phase I wanted to concentrate
on how well we deal with the valid ones, leaving aside the rejection
of invalid and ill-formed ones. So I pruned the suite down to the valid
XML files and tested OM conformance against them.
Even with the makeshift implementations for DTD, PI and comments, I
expected 100% success in parsing the XML files, but it didn't turn out
that way. Only 761 of the 960 input XML files were parsed. In this
connection, I've observed a few limitations of the StAX parser we are
using that are worth mentioning and require serious attention:

1. The MXParser doesn't seem to support the UTF-8 character set fully.
Japanese XML files weren't parsed properly. In future, this could
become a serious problem, since it can affect the SOAP message
processing of foreign web services.

2. The DTDParser inside MXParser failed to understand the DTD
declaration line(s) of several (complex?) DTDs. This might not seem
like a problem from the point of view of SOAP message processing, but
with such behaviour we certainly cannot give our OM complete XML
infoset support.
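
For anyone who wants to reproduce this kind of count, here is a minimal
sketch of a tally harness. It is only an illustration under stated
assumptions: it drives the plain javax.xml.stream API bundled with the
JDK rather than the OM builder or MXParser used for the numbers above,
and the class and method names (ParseTally, parses, countParsed) are
made up for this example.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Axis2 or the StAX RI.
class ParseTally {

    // Returns true if the document can be pulled through to the end
    // without the parser raising an XMLStreamException.
    public static boolean parses(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    // Counts how many of the given documents parse cleanly.
    public static int countParsed(List<String> docs) {
        int ok = 0;
        for (String doc : docs) {
            if (parses(doc)) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Two well-formed documents and one with a mismatched end tag.
        List<String> docs = Arrays.asList("<a/>", "<a><b></a>", "<a>text</a>");
        System.out.println(countParsed(docs) + " of " + docs.size() + " parsed");
    }
}
```

A real run would read the pruned W3C files from disk instead of the
inline strings used here.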

Thanks
Jaya
-- 
-- Jaya

Re: [Axis2] Notes on underlying MXparser of OM

Posted by Venkat Reddy <vr...@gmail.com>.
On 4/24/05, Aleksander Slominski <as...@cs.indiana.edu> wrote:
> Venkat Reddy wrote:
> 
> >>>I've some problem using yahoogroups
> >>>
> >>what problems do you have?
> >>
> >
> >The URL is blocked by firewall restrictions.  :-(
> >
> so you are not even able to subscribe to mailing list?

Yes.

> 
> does the same happens if you use https://?

Yes.

> 
> alek
> 
> >- venkat
> >
> >
> >
> 
> --
> The best way to predict the future is to invent it - Alan Kay
> 
>

Re: [Axis2] Notes on underlying MXparser of OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Venkat Reddy wrote:

>>>I've some problem using yahoogroups
>>>
>>>
>>>      
>>>
>>what problems do you have?
>>
>>    
>>
>
>The URL is blocked by firewall restrictions.  :-(
>  
>
So you are not even able to subscribe to the mailing list?

Does the same happen if you use https://?

alek

>- venkat
>
>  
>


-- 
The best way to predict the future is to invent it - Alan Kay


Re: [Axis2] Notes on underlying MXparser of OM

Posted by Venkat Reddy <vr...@gmail.com>.
> >I've some problem using yahoogroups
> >
> >
> what problems do you have?
> 

The URL is blocked by firewall restrictions.  :-(

- venkat

Re: [Axis2] Notes on underlying MXparser of OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
jayachandra wrote:

>Alek!
>The problem is that of localization. Businesses which publish local
>language wsdl's can not be understood by Axis2.
>The following (attached) XML file written in chinese/japanese couldn't
>be parsed by the MXParser. Parsing threw a XMLStreamException saying
>"start tag unexpected character \ub9" the moment it encountered a
>foreign character.
>  
>
How did you do the parsing? Did you make sure to pass a Reader that
understands the UTF-8 byte stream?
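
To make the Reader question concrete, here is a minimal sketch of
handing the parser a Reader that decodes the bytes as UTF-8 explicitly
instead of relying on the platform default charset. It assumes the
javax.xml.stream implementation bundled with the JDK, not MXParser
itself, and the class and method names (Utf8StaxDemo, rootName) are
invented for this example. Non-ASCII characters are written as Unicode
escapes so that the source file stays plain ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Axis2 or the StAX RI.
class Utf8StaxDemo {

    // Returns the local name of the root element, or null if none is found.
    public static String rootName(byte[] xmlBytes) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Decode the byte stream as UTF-8 explicitly.
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // \u9031\u5831 is the two-character Japanese root element name.
        String xml = "<?xml version=\"1.0\"?><\u9031\u5831>1997</\u9031\u5831>";
        System.out.println(rootName(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the input is opened with a Reader that uses some other default
charset, multi-byte characters reach the parser already corrupted,
which would be one possible cause of an "unexpected character" error.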

>Other than yahoo group is their some kind of mailing list for stAX
>community? 
>
The Yahoo group is the right place for this kind of question.

>I've some problem using yahoogroups
>  
>
what problems do you have?

thanks,

alek

>On 4/21/05, Aleksander Slominski <as...@cs.indiana.edu> wrote:
>  
>
>>jayachandra wrote:
>>
>>    
>>
>>>1. The MXParser doesn't seem to support the UTF-8 character set fully.
>>>Japanese XML files weren't parsed properly. In future, this could
>>>throw a serious problem. This can have it's effect on the SOAP message
>>>processing of foreign web services.
>>>
>>>
>>>      
>>>
>>what is the problem? makes ure to send email about problems to
>>http://groups.yahoo.com/group/stax_builders/
>>and file bug report(s) in bugzilla for StAX RI
>>(http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX)
>>
>>    
>>
>>>2. The DTDParser inside MXParser was failing to understand the DTD
>>>declaration line(s) of several (complex?) DTDs. Though this might not
>>>seem as a problem if you look at the SOAP message processing part of
>>>it, but certainly with such a behaviour complete XML infoset support
>>>can not be given to our OM.
>>>
>>>
>>>      
>>>
>>are you actually testing validating or non-validating parser?
>>requirement for non-validating parser are different in regard to DTDs.
>>
>>moreover DTDs are to be rejected by SOAP processor anyway ...
>>
>>thanks,
>>
>>alek
>>
>>--
>>The best way to predict the future is to invent it - Alan Kay
>>
>>
>>    
>>
>
>
>  
>
>------------------------------------------------------------------------
>
><?xml version="1.0"?>
><!DOCTYPE 週報 SYSTEM "weekly-utf-8.dtd">
><!-- 週報サンプル -->
><週報>
>  <年月週>
>    <年度>1997</年度>
>    <月度>1</月度>
>    <週>1</週>
>  </年月週>
>
>  <氏名>
>    <氏>山田</氏>
>    <名>太郎</名>
>  </氏名>
>
>  <業務報告リスト>
>    <業務報告>
>      <業務名>XMLエディターの作成</業務名>
>      <業務コード>X3355-23</業務コード>
>      <工数管理>
>        <見積もり工数>1600</見積もり工数>
>        <実績工数>320</実績工数>
>        <当月見積もり工数>160</当月見積もり工数>
>        <当月実績工数>24</当月実績工数>
>      </工数管理>
>      <予定項目リスト>
>        <予定項目>
>          <P>XMLエディターの基本仕様の作成</P>
>        </予定項目>
>      </予定項目リスト>
>      <実施事項リスト>
>        <実施事項>
>          <P>XMLエディターの基本仕様の作成</P>
>        </実施事項>
>        <実施事項>
>          <P>競合他社製品の機能調査</P>
>        </実施事項>
>      </実施事項リスト>
>      <上長への要請事項リスト>
>        <上長への要請事項>
>          <P>特になし</P>
>        </上長への要請事項>
>      </上長への要請事項リスト>
>      <問題点対策>
>        <P>XMLとは何かわからない。</P>
>      </問題点対策>
>    </業務報告>
>
>    <業務報告>
>      <業務名>検索エンジンの開発</業務名>
>      <業務コード>S8821-76</業務コード>
>      <工数管理>
>        <見積もり工数>120</見積もり工数>
>        <実績工数>6</実績工数>
>        <当月見積もり工数>32</当月見積もり工数>
>        <当月実績工数>2</当月実績工数>
>      </工数管理>
>      <予定項目リスト>
>        <予定項目>
>          <P><A href="http://www.goo.ne.jp">goo</A>の機能を調べてみる</P>
>        </予定項目>
>      </予定項目リスト>
>      <実施事項リスト>
>        <実施事項>
>          <P>更に、どういう検索エンジンがあるか調査する</P>
>        </実施事項>
>      </実施事項リスト>
>      <上長への要請事項リスト>
>        <上長への要請事項>
>          <P>開発をするのはめんどうなので、Yahoo!を買収して下さい。</P>
>        </上長への要請事項>
>      </上長への要請事項リスト>
>      <問題点対策>
>        <P>検索エンジンで車を走らせることができない。(要調査)</P>
>      </問題点対策>
>    </業務報告>
>  </業務報告リスト>
></週報>
>  
>
>------------------------------------------------------------------------
>
><!--
>*************************************************************************
>*                                                                       *
>*	DPSD PDG週報用DTD	  weekly.dtd                            *
>*                                                                       *
>*   Copyright 1997 Fuji Xerox Information Systems Co.,Ltd.              *
>*                                                                       *
>*************************************************************************
>-->
>
>
><!-- 変更履歴
>    Version 1.0 1997/10/29 作成   村田真
>-->
>
><!ELEMENT 週報                    (年月週, 氏名, 業務報告リスト)>
>
><!ELEMENT 年月週                  (年度, 月度, 週)>
><!ELEMENT 年度                    (#PCDATA)> <!-- 年度を表す数字 -->
><!ELEMENT 月度                    (#PCDATA)> <!-- 月度を表す数字 -->
><!ELEMENT 週                      (#PCDATA)> <!-- 何週目かを表す数字 -->
>
><!ELEMENT 氏名                    (氏, 名)>
><!ELEMENT 氏                      (#PCDATA)>
><!ELEMENT 名                      (#PCDATA)>
>
><!ELEMENT 業務報告リスト          (業務報告+)>
><!ELEMENT 業務報告                (業務名, 業務コード, 工数管理,
>                                   予定項目リスト,
>                                   実施事項リスト,
>                                   上長への要請事項リスト,
>                                   問題点対策?)>
><!ELEMENT 業務名                  (#PCDATA)>  <!-- 業務コード一覧を参照 -->
><!ELEMENT 業務コード              (#PCDATA)>  <!-- 業務コード一覧を参照 -->
>
><!ELEMENT 工数管理                (見積もり工数, 実績工数, 
>                                   当月見積もり工数, 当月実績工数)>
><!ELEMENT 見積もり工数            (#PCDATA)>  <!-- 単位は時間 -->
><!ELEMENT 実績工数                (#PCDATA)>  <!-- 単位は時間 -->
><!ELEMENT 当月見積もり工数        (#PCDATA)>  <!-- 単位は時間 -->
><!ELEMENT 当月実績工数            (#PCDATA)>  <!-- 単位は時間 -->
>
><!ELEMENT 予定項目リスト          (予定項目*)>
><!ELEMENT 予定項目                ((P | OL | UL)+)>
><!ELEMENT 実施事項リスト          (実施事項*)>
><!ELEMENT 実施事項                ((P | OL | UL)+)>
><!ELEMENT 問題点対策              ((P | OL | UL)+)>
>
><!ELEMENT 上長への要請事項リスト  (上長への要請事項*)>
><!ELEMENT 上長への要請事項        ((P | OL | UL)+)>
>
>
><!-- XMLであらかじめ定義された実体 -->
><!ENTITY lt     "&#38;#60;"> 
><!ENTITY gt     "&#62;"> 
><!ENTITY amp    "&#38;#38;"> 
><!ENTITY apos   "&#39;"> 
><!ENTITY quot   "&#34;">
>
><!-- HTMLの汎用的なタグ -->
><!ELEMENT P      (#PCDATA | EM | STRONG | A)*>
><!ELEMENT OL     (LI)*>
><!ELEMENT UL     (LI)*>
><!ELEMENT LI     (#PCDATA | EM | STRONG | A)*>
><!ELEMENT EM     (#PCDATA | EM | STRONG | A)*>
><!ELEMENT STRONG (#PCDATA | EM | STRONG | A)*>
><!ELEMENT A      (#PCDATA | EM | STRONG)*>
><!ATTLIST A
>        name    CDATA   #IMPLIED
>        href    CDATA   #IMPLIED
>        >
>  
>


-- 
The best way to predict the future is to invent it - Alan Kay


Re: [Axis2] Notes on underlying MXparser of OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
jayachandra wrote:

>Sorry!
>Just now found a solution on the stAX community how to make stAX
>parser understand the UTF encoding
>  
>
great that this problem is solved!

alek

>On 4/22/05, jayachandra <ja...@gmail.com> wrote:
>  
>
>>Alek!
>>The problem is that of localization. Businesses which publish local
>>language wsdl's can not be understood by Axis2.
>>The following (attached) XML file written in chinese/japanese couldn't
>>be parsed by the MXParser. Parsing threw a XMLStreamException saying
>>"start tag unexpected character \ub9" the moment it encountered a
>>foreign character.
>>Other than yahoo group is their some kind of mailing list for stAX
>>community? I've some problem using yahoogroups
>>
>>Thank you
>>Jayachandra
>>
>>On 4/21/05, Aleksander Slominski <as...@cs.indiana.edu> wrote:
>>    
>>
>>>jayachandra wrote:
>>>
>>>      
>>>
>>>>1. The MXParser doesn't seem to support the UTF-8 character set fully.
>>>>Japanese XML files weren't parsed properly. In future, this could
>>>>throw a serious problem. This can have it's effect on the SOAP message
>>>>processing of foreign web services.
>>>>
>>>>
>>>>        
>>>>
>>>what is the problem? makes ure to send email about problems to
>>>http://groups.yahoo.com/group/stax_builders/
>>>and file bug report(s) in bugzilla for StAX RI
>>>(http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX)
>>>
>>>      
>>>
>>>>2. The DTDParser inside MXParser was failing to understand the DTD
>>>>declaration line(s) of several (complex?) DTDs. Though this might not
>>>>seem as a problem if you look at the SOAP message processing part of
>>>>it, but certainly with such a behaviour complete XML infoset support
>>>>can not be given to our OM.
>>>>
>>>>
>>>>        
>>>>
>>>are you actually testing validating or non-validating parser?
>>>requirement for non-validating parser are different in regard to DTDs.
>>>
>>>moreover DTDs are to be rejected by SOAP processor anyway ...
>>>
>>>thanks,
>>>
>>>alek
>>>
>>>--
>>>The best way to predict the future is to invent it - Alan Kay
>>>
>>>
>>>      
>>>
>>--
>>-- Jaya
>>
>>
>>
>>    
>>
>
>
>  
>


-- 
The best way to predict the future is to invent it - Alan Kay


Re: [Axis2] Notes on underlying MXparser of OM

Posted by jayachandra <ja...@gmail.com>.
Sorry!
I just found a solution in the StAX community for making the StAX
parser understand the UTF encoding.

Jaya

On 4/22/05, jayachandra <ja...@gmail.com> wrote:
> Alek!
> The problem is that of localization. Businesses which publish local
> language wsdl's can not be understood by Axis2.
> The following (attached) XML file written in chinese/japanese couldn't
> be parsed by the MXParser. Parsing threw a XMLStreamException saying
> "start tag unexpected character \ub9" the moment it encountered a
> foreign character.
> Other than yahoo group is their some kind of mailing list for stAX
> community? I've some problem using yahoogroups
> 
> Thank you
> Jayachandra
> 
> On 4/21/05, Aleksander Slominski <as...@cs.indiana.edu> wrote:
> > jayachandra wrote:
> >
> > >1. The MXParser doesn't seem to support the UTF-8 character set fully.
> > >Japanese XML files weren't parsed properly. In future, this could
> > >throw a serious problem. This can have it's effect on the SOAP message
> > >processing of foreign web services.
> > >
> > >
> > what is the problem? makes ure to send email about problems to
> > http://groups.yahoo.com/group/stax_builders/
> > and file bug report(s) in bugzilla for StAX RI
> > (http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX)
> >
> > >2. The DTDParser inside MXParser was failing to understand the DTD
> > >declaration line(s) of several (complex?) DTDs. Though this might not
> > >seem as a problem if you look at the SOAP message processing part of
> > >it, but certainly with such a behaviour complete XML infoset support
> > >can not be given to our OM.
> > >
> > >
> > are you actually testing validating or non-validating parser?
> > requirement for non-validating parser are different in regard to DTDs.
> >
> > moreover DTDs are to be rejected by SOAP processor anyway ...
> >
> > thanks,
> >
> > alek
> >
> > --
> > The best way to predict the future is to invent it - Alan Kay
> >
> >
> 
> --
> -- Jaya
> 
> 
> 


-- 
-- Jaya

Re: [Axis2] Notes on underlying MXparser of OM

Posted by jayachandra <ja...@gmail.com>.
Alek!
The problem is one of localization. WSDLs that businesses publish in
their local language cannot be understood by Axis2.
The following (attached) XML file, written in Chinese/Japanese, couldn't
be parsed by the MXParser. Parsing threw an XMLStreamException saying
"start tag unexpected character \ub9" the moment it encountered a
foreign character.
Other than the Yahoo group, is there some kind of mailing list for the
StAX community? I have some problems using Yahoo Groups.

Thank you
Jayachandra

On 4/21/05, Aleksander Slominski <as...@cs.indiana.edu> wrote:
> jayachandra wrote:
> 
> >1. The MXParser doesn't seem to support the UTF-8 character set fully.
> >Japanese XML files weren't parsed properly. In future, this could
> >throw a serious problem. This can have it's effect on the SOAP message
> >processing of foreign web services.
> >
> >
> what is the problem? makes ure to send email about problems to
> http://groups.yahoo.com/group/stax_builders/
> and file bug report(s) in bugzilla for StAX RI
> (http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX)
> 
> >2. The DTDParser inside MXParser was failing to understand the DTD
> >declaration line(s) of several (complex?) DTDs. Though this might not
> >seem as a problem if you look at the SOAP message processing part of
> >it, but certainly with such a behaviour complete XML infoset support
> >can not be given to our OM.
> >
> >
> are you actually testing validating or non-validating parser?
> requirement for non-validating parser are different in regard to DTDs.
> 
> moreover DTDs are to be rejected by SOAP processor anyway ...
> 
> thanks,
> 
> alek
> 
> --
> The best way to predict the future is to invent it - Alan Kay
> 
> 


-- 
-- Jaya

Re: [Axis2] Notes on underlying MXparser of OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
jayachandra wrote:

>1. The MXParser doesn't seem to support the UTF-8 character set fully.
>Japanese XML files weren't parsed properly. In future, this could
>throw a serious problem. This can have it's effect on the SOAP message
>processing of foreign web services.
>  
>
What is the problem? Make sure to send email about problems to
http://groups.yahoo.com/group/stax_builders/
and file bug report(s) in Bugzilla for the StAX RI
(http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX)

>2. The DTDParser inside MXParser was failing to understand the DTD
>declaration line(s) of several (complex?) DTDs. Though this might not
>seem as a problem if you look at the SOAP message processing part of
>it, but certainly with such a behaviour complete XML infoset support
>can not be given to our OM.
>  
>
Are you actually testing a validating or a non-validating parser?
The requirements for a non-validating parser are different with regard
to DTDs.

Moreover, DTDs are to be rejected by a SOAP processor anyway ...
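
As an illustration of the last point, here is a minimal sketch of how a
SOAP-style consumer might notice a DOCTYPE declaration and refuse the
message. It assumes the default, non-validating javax.xml.stream
implementation bundled with the JDK and assumes that this
implementation reports the declaration as a DTD event; the class and
method names (DtdCheckDemo, hasDoctype) are invented for this example
and are not Axis2 API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Axis2 or the StAX RI.
class DtdCheckDemo {

    // Returns true if a DOCTYPE declaration appears before the root element.
    public static boolean hasDoctype(String xml) {
        // The factory defaults are used here; no validation is requested.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.DTD) {
                        return true;
                    }
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // A DOCTYPE can only occur in the prolog.
                        return false;
                    }
                }
                return false;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String withDtd = "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]><a>x</a>";
        if (hasDoctype(withDtd)) {
            System.out.println("rejecting message: DOCTYPE not allowed");
        }
    }
}
```

The check only looks at the prolog, so it stops at the first start tag
and does not read the rest of the message.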

thanks,

alek

-- 
The best way to predict the future is to invent it - Alan Kay