Posted to dev@mina.apache.org by Fernando Padilla <fe...@alum.mit.edu> on 2009/10/14 16:25:57 UTC

[mina] xml codec IoFilter?

Hi.  I would love to have an IoFilter that would take in a stream and be 
able to spit out xml? (kind of like sax?)

Does anyone have any leads for pre-built ones?  Or should I dream about 
making one? :)


Re: [mina] xml codec IoFilter?

Posted by Julien Vermillard <jv...@archean.fr>.
On Wed, 14 Oct 2009 17:58:30 +0200,
Bernd Fondermann <be...@googlemail.com> wrote:

> On Wed, Oct 14, 2009 at 16:25, Fernando Padilla <fe...@alum.mit.edu>
> wrote:
> > Hi.  I would love to have an IoFilter that would take in a stream
> > and be able to spit out xml? (kind of like sax?)
> >
> > Does anyone have any leads for pre-built ones?  Or should I dream
> > about making one? :)
> 
> That would be a nightmare then ;-)
> Seriously, XML is very complex. Namespaces, entities, tree-like
> structures.
> 
> For Vysper, I tried to hook MINA into a streaming XML parser ("StAX").
> But the parsers I looked at were unable to re-parse partial XML and
> had issues as their buffers ran dry (stopped working).
> Maybe I just didn't get it right and you will.
> 
> In the end, for Vysper I wrote such a parser, but it's only parsing
> the XMPP XML subset, not proper XML.
> 
>   Bernd

Yep, I think something like Protocol Buffers or ASN.1/BER is much more
efficient than XML.

Julien

Re: [mina] xml codec IoFilter?

Posted by Emmanuel Lecharny <el...@apache.org>.
Sounds like a no go...

On Thu, Oct 15, 2009 at 10:12 AM, Julien Vermillard
<jv...@archean.fr> wrote:
> On Thu, 15 Oct 2009 08:03:02 +0000 (UTC),
> Fredrik Jonson <fr...@myrealbox.com> wrote:
>
>> Emmanuel Lecharny wrote:
>>
>> >  Actually, this very problem was the one I discussed with Bernd
>> > last spring during ApacheCon, as I was looking for an XML parser
>> > supporting stops in the middle of an XML tag. We need some XML
>> > parser that supports this kind of partial data, and can recover from
>> > it. Not simple ...
>>
>> Haven't used it, but from your description it sounds like the Aalto XML
>> processor might be up for the job. It's developed by one of the guys
>> behind Woodstox.
>>
>> http://wiki.fasterxml.com/AaltoHome
>> http://www.cowtowncoder.com/blog/blog.html
>>
>> Seems the license hasn't been decided yet, a bit of a bummer.
>>
>
> Looks like it's GPLed:
>
>    * Basic GPL ("totally Free") for Free Software (read GNU Manifesto
> for more), such as research projects and free/open software developers
>
>    * Negotiatable commercial license by FasterXML, LLC for commercial
>      entities that prefer full control over their usage of Aalto,
>      including distribution without requirement for opening their
>      source code. (Send email to info@fasterxml.com for more
>      information.)
>
>
>



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [mina] xml codec IoFilter?

Posted by Julien Vermillard <jv...@archean.fr>.
On Thu, 15 Oct 2009 08:03:02 +0000 (UTC),
Fredrik Jonson <fr...@myrealbox.com> wrote:

> Emmanuel Lecharny wrote:
> 
> >  Actually, this very problem was the one I discussed with Bernd
> > last spring during ApacheCon, as I was looking for an XML parser
> > supporting stops in the middle of an XML tag. We need some XML
> > parser that supports this kind of partial data, and can recover from
> > it. Not simple ...
> 
> Haven't used it, but from your description it sounds like the Aalto XML
> processor might be up for the job. It's developed by one of the guys
> behind Woodstox.
> 
> http://wiki.fasterxml.com/AaltoHome
> http://www.cowtowncoder.com/blog/blog.html
> 
> Seems the license hasn't been decided yet, a bit of a bummer.
> 

Looks like it's GPLed:

    * Basic GPL ("totally Free") for Free Software (read GNU Manifesto
      for more), such as research projects and free/open software
      developers

    * Negotiatable commercial license by FasterXML, LLC for commercial
      entities that prefer full control over their usage of Aalto,
      including distribution without requirement for opening their
      source code. (Send email to info@fasterxml.com for more
      information.) 



Re: [mina] xml codec IoFilter?

Posted by Fredrik Jonson <fr...@myrealbox.com>.
Emmanuel Lecharny wrote:

>  Actually, this very problem was the one I discussed with Bernd last
>  spring during ApacheCon, as I was looking for an XML parser supporting
>  stops in the middle of an XML tag. We need some XML parser that supports
>  this kind of partial data, and can recover from it. Not simple ...

Haven't used it, but from your description it sounds like the Aalto XML
processor might be up for the job. It's developed by one of the guys
behind Woodstox.

http://wiki.fasterxml.com/AaltoHome
http://www.cowtowncoder.com/blog/blog.html

Seems the license hasn't been decided yet, a bit of a bummer.

-- 
Fredrik Jonson


Re: [mina] xml codec IoFilter?

Posted by Fernando Padilla <fe...@alum.mit.edu>.
Well, I gave it a try, and reviewing what Vysper actually did makes it
seem a lot more manageable.  There really are only a handful of cases:
some are simply ignored (comment, PI, doctype) and others are just text
handling (CDATA, text).  The more complicated one is the element tag,
which has several sub-states (element name, attribute name, attribute
value).  But Vysper ignores the element-tag sub-states and simply waits
until the whole element tag has arrived before parsing the name and
attributes (<el attr="attr"> or <el attr="attr"/>).  That is a good
first implementation, and it can easily be enhanced later.
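
A minimal sketch of that "wait until the whole tag is there" idea, in
plain Java with no MINA types.  The class name, method name, and the
TAG:/TEXT: token strings are hypothetical; this is not the Vysper parser
or the draft mentioned in this thread.  It assumes '>' never appears
inside an attribute value, that the doctype has no internal subset, and
it does not decode entities.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the Vysper parser and not a MINA API. */
class ChunkedXmlTokenizer {
    // Input received so far that does not yet form a complete token.
    private final StringBuilder pending = new StringBuilder();

    /** Feeds one chunk and returns the tokens that are now complete. */
    List<String> feed(String chunk) {
        pending.append(chunk);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < pending.length()) {
            if (pending.charAt(pos) != '<') {
                // Text: emit everything up to the next '<' (or all we have).
                int lt = pending.indexOf("<", pos);
                int textEnd = (lt < 0) ? pending.length() : lt;
                tokens.add("TEXT:" + pending.substring(pos, textEnd));
                pos = textEnd;
                continue;
            }
            int end = endOfMarkup(pos);
            if (end < 0) {
                break; // construct still incomplete: wait for more input
            }
            String markup = pending.substring(pos, end);
            if (markup.startsWith("<![CDATA[")) {
                // CDATA is plain text once the wrapper is removed.
                tokens.add("TEXT:" + markup.substring(9, markup.length() - 3));
            } else if (!markup.startsWith("<!") && !markup.startsWith("<?")) {
                tokens.add("TAG:" + markup);
            } // comments, processing instructions and doctype are ignored
            pos = end;
        }
        pending.delete(0, pos); // keep only the unfinished tail
        return tokens;
    }

    /** Returns the index just past the construct starting at pos, or -1. */
    private int endOfMarkup(int pos) {
        String rest = pending.substring(pos);
        // Too few characters to tell a comment or CDATA section from a tag.
        if ("<!--".startsWith(rest) || "<![CDATA[".startsWith(rest)) {
            return -1;
        }
        String open;
        String close;
        if (rest.startsWith("<!--")) {
            open = "<!--"; close = "-->";
        } else if (rest.startsWith("<![CDATA[")) {
            open = "<![CDATA["; close = "]]>";
        } else if (rest.startsWith("<?")) {
            open = "<?"; close = "?>";
        } else {
            open = "<"; close = ">"; // element tag or doctype
        }
        int at = rest.indexOf(close, open.length());
        return (at < 0) ? -1 : pos + at + close.length();
    }
}
```

Feeding `<a href="x` returns nothing; the tag only comes out once a later
chunk supplies the closing `>`.  Splitting the tag into name and
attributes would then happen on the complete TAG string.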

I wrote up a draft version of a SAX parser for MINA last week, which I
think is not a bad representation of what I have in mind, since a SAX
parser is free to call back to a listener as it sees fit.  Then I was
thinking we could create another codec/processor with various options
for converting the SAX event stream into a DOM event stream, since
some applications want a full document (only using it for NIO
parsing), while other applications want an unbounded stream of DOM
elements (like Vysper/XMPP).
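
A rough sketch of the second option, an unbounded stream of DOM
elements built from SAX events.  `StanzaBuilder` and `parseStanzas` are
hypothetical names.  The JDK's own blocking SAX parser is used here only
as a stand-in event source so the example is self-contained; the handler
itself just reacts to callbacks, so in principle any SAX-style event
source could drive it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: emits each direct child of the root as a DOM element. */
class StanzaBuilder extends DefaultHandler {
    private final Document doc;                 // used only as a node factory
    private final Consumer<Element> sink;       // receives completed elements
    private final Deque<Element> open = new ArrayDeque<>();
    private int depth = 0;

    StanzaBuilder(Consumer<Element> sink) throws ParserConfigurationException {
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        depth++;
        if (depth == 1) {
            return; // the stream root may never close, so it is not built
        }
        Element element = doc.createElement(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            element.setAttribute(attrs.getQName(i), attrs.getValue(i));
        }
        if (!open.isEmpty()) {
            open.peek().appendChild(element);
        }
        open.push(element);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (depth == 0) {
            return; // root closed: nothing left to emit
        }
        Element done = open.pop();
        if (open.isEmpty()) {
            sink.accept(done); // a direct child of the root is complete
        }
    }

    /** Convenience driver using the JDK SAX parser (not namespace-aware). */
    static List<Element> parseStanzas(String xml) {
        List<Element> stanzas = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new StanzaBuilder(stanzas::add));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
        return stanzas;
    }
}
```

The "full document" option would be the same handler without the depth
check, handing over a single tree when the root element closes.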

Not sure where to put up the code to get comments... maybe I should learn
GitHub. :)



On 10/19/09 10:39 PM, Ashish wrote:
>> Actually, this very problem was the one I discussed with Bernd last spring
>> during ApacheCon, as I was looking for an XML parser supporting stops in the
>> middle of an XML tag. We need some XML parser that supports this kind of
>> partial data, and can recover from it. Not simple ...
>>
>>      
>
> Mine was partially working, though I didn't test it for all the
> use cases.
> Had tried both the approaches, first was to extend an external parser
> to support this. It worked for simple cases.
> The second was a bit dumb solution, but worked fine. Manually just
> look for start and end (root elements) of XML.
> Once complete xml is received, slice the buffer and pass it to a full
> blown parser to do actual XML parsing. It kept life real simple.
> However, the problem was less than solved, as I was unable to handle
> misbehaving clients, like never sending end element, and starting a
> new XML. Though rare but implementation has to be robust enough to
> deal with them.
>
> I will see if I still have the code :-(
>
> A straight out of box solution won't work, as a TCP packet can have
> end of one xml and start of next one :-)
> This was the reason why I opted for dumb approach. Else we make our
> parser to slice the complete xml and leave the unfinished data in
> buffer. This is where the real challenge lies.
>
> What I was thinking was to reduce two passes. Modify XML parser to
> work on packets or on pure stream. Packets approach would be more
> challenging. Parse the packet, keep the XML tree, as and when the tree
> is complete, return the XML tree. Or pass on packets to parser and let
> it parse. Catch incomplete xml/data exception and store the data in
> memory or file system. Once it completes the xml, get the xml, slice
> the stream.
>
> Have to stop here else it shall become an essay :-)
>
> Good Luck
>
>
>    

Re: [mina] xml codec IoFilter?

Posted by Ashish <pa...@gmail.com>.
> Actually, this very problem was the one I discussed with Bernd last spring
> during ApacheCon, as I was looking for an XML parser supporting stops in the
> middle of an XML tag. We need some XML parser that supports this kind of
> partial data, and can recover from it. Not simple ...
>


Mine was partially working, though I didn't test it for all the
use cases.
I had tried two approaches. The first was to extend an external parser
to support this. It worked for simple cases.
The second was a bit of a dumb solution, but it worked fine: manually
look for the start and end (root elements) of the XML.
Once the complete XML is received, slice the buffer and pass it to a
full-blown parser to do the actual XML parsing. It kept life really simple.
However, the problem was less than solved, as I was unable to handle
misbehaving clients, such as ones that never send the end element and
start a new XML document. Though rare, the implementation has to be
robust enough to deal with them.

I will see if I still have the code :-(

A straight out-of-the-box solution won't work, as a TCP packet can
contain the end of one XML document and the start of the next one :-)
This was the reason why I opted for the dumb approach. Otherwise we have
to make our parser slice off the complete XML and leave the unfinished
data in the buffer. This is where the real challenge lies.
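
The "dumb" framing approach could look roughly like the sketch below:
count element depth, slice off each complete document, and leave the
unfinished tail in the buffer.  `XmlDocumentFramer` and `maxBuffered` are
hypothetical names, not MINA API.  The sketch assumes '<' and '>' only
occur as tag delimiters (no comments or CDATA containing them, no '>' in
attribute values).  The size cap is only a crude guard against clients
that never close their root element; it does not detect a client that
starts a new document inside an unfinished one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of depth-counting XML framing; not a MINA class. */
class XmlDocumentFramer {
    private final StringBuilder buffer = new StringBuilder();
    private final int maxBuffered; // cap on unfinished data, in characters
    private int scanPos = 0;       // next index in buffer not yet scanned
    private int depth = 0;         // currently open elements

    XmlDocumentFramer(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    /** Feeds one chunk and returns every document it completed. */
    List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> documents = new ArrayList<>();
        while (true) {
            int lt = buffer.indexOf("<", scanPos);
            if (lt < 0) {
                scanPos = buffer.length(); // only text so far
                break;
            }
            int gt = buffer.indexOf(">", lt);
            if (gt < 0) {
                scanPos = lt; // tag split across chunks: rescan it next time
                break;
            }
            scanPos = gt + 1;
            char second = buffer.charAt(lt + 1);
            if (second == '?' || second == '!') {
                continue; // prolog, comment or doctype: depth unchanged
            }
            if (second == '/') {
                depth--; // end tag
            } else if (buffer.charAt(gt - 1) != '/') {
                depth++; // start tag (self-closing tags leave depth as is)
            }
            if (depth < 0) {
                throw new IllegalStateException("end tag without start tag");
            }
            if (depth == 0) {
                // Root element closed: slice the document off the front.
                documents.add(buffer.substring(0, scanPos).trim());
                buffer.delete(0, scanPos);
                scanPos = 0;
            }
        }
        if (buffer.length() > maxBuffered) {
            throw new IllegalStateException("unfinished document exceeds limit");
        }
        return documents;
    }
}
```

Each returned string can then be handed to a full-blown parser, which is
the two-pass trade-off described above.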

What I was thinking of was to reduce the two passes to one. Modify the
XML parser to work on packets or on a pure stream. The packet approach
would be more challenging: parse the packet, keep the XML tree, and
return the tree as soon as it is complete. Or pass the packets on to the
parser and let it parse, catch the incomplete XML/data exception, and
store the data in memory or on the file system. Once the XML is
complete, get the XML and slice the stream.

Have to stop here else it shall become an essay :-)

Good Luck


-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: [mina] xml codec IoFilter?

Posted by Emmanuel Lecharny <el...@apache.org>.
Bernd Fondermann wrote:
> On Wed, Oct 14, 2009 at 16:25, Fernando Padilla <fe...@alum.mit.edu> wrote:
>   
>> Hi.  I would love to have an IoFilter that would take in a stream and be
>> able to spit out xml? (kind of like sax?)
>>
>> Does anyone have any leads for pre-built ones?  Or should I dream about
>> making one? :)
>>     
>
> That would be a nightmare then ;-)
> Seriously, XML is very complex. Namespaces, entities, tree-like structures.
>
> For Vysper, I tried to hook MINA into a streaming XML parser ("StAX").
> But the parsers I looked at were unable to re-parse partial XML and
> had issues as their buffers ran dry (stopped working).
> Maybe I just didn't get it right and you will.
>
> In the end, for Vysper I wrote such a parser, but it's only parsing
> the XMPP XML subset, not proper XML.
>   
Actually, this very problem was the one I discussed with Bernd last
spring during ApacheCon, as I was looking for an XML parser that supports
stops in the middle of an XML tag. We need an XML parser that supports
this kind of partial data and can recover from it. Not simple ...

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: [mina] xml codec IoFilter?

Posted by Fernando Padilla <fe...@alum.mit.edu>.
ok cool. I'll try my hand at it :)


On 10/14/09 11:34 AM, Niklas Gustavsson wrote:
> On Wed, Oct 14, 2009 at 5:58 PM, Bernd Fondermann
> <be...@googlemail.com>  wrote:
>    
>> In the end, for Vysper I wrote such a parser, but it's only parsing
>> the XMPP XML subset, not proper XML.
>>      
> I'm hoping we will be able to turn the parser in Vysper into one that
> supports full XML + XML namespaces one day. But, as Bernd points out,
> we're not there yet.
>
> /niklas
>    

Re: [mina] xml codec IoFilter?

Posted by Niklas Gustavsson <ni...@protocol7.com>.
On Wed, Oct 14, 2009 at 5:58 PM, Bernd Fondermann
<be...@googlemail.com> wrote:
> In the end, for Vysper I wrote such a parser, but it's only parsing
> the XMPP XML subset, not proper XML.

I'm hoping we will be able to turn the parser in Vysper into one that
supports full XML + XML namespaces one day. But, as Bernd points out,
we're not there yet.

/niklas

Re: [mina] xml codec IoFilter?

Posted by Bernd Fondermann <be...@googlemail.com>.
On Wed, Oct 14, 2009 at 16:25, Fernando Padilla <fe...@alum.mit.edu> wrote:
> Hi.  I would love to have an IoFilter that would take in a stream and be
> able to spit out xml? (kind of like sax?)
>
> Does anyone have any leads for pre-built ones?  Or should I dream about
> making one? :)

That would be a nightmare then ;-)
Seriously, XML is very complex. Namespaces, entities, tree-like structures.

For Vysper, I tried to hook MINA into a streaming XML parser ("StAX").
But the parsers I looked at were unable to re-parse partial XML and
had issues as their buffers ran dry (stopped working).
Maybe I just didn't get it right and you will.

In the end, for Vysper I wrote such a parser, but it's only parsing
the XMPP XML subset, not proper XML.

  Bernd