
Posted to dev@shindig.apache.org by Paul Lindner <pl...@linkedin.com> on 2009/07/08 12:46:39 UTC

dependency cleanup for java

I filed https://issues.apache.org/jira/browse/SHINDIG-1107
Does anyone have any opinion about cleaning up those dependencies?  We were
pulling in json-lib which seems unnecessary since we have a native json
serializer in place now.

Another simplification is deprecating nekohtml for htmlparser, which is used
by caja.  I asked the caja folks about using neko and this was their
response:

htmlparser was recommended by Ian Hickson, author of large chunks of the
HTML5 spec, as conforming closely to the spec.  Nekohtml is indeed quite
fast, but htmlparser does a better job of accurately producing the kind of
DOM that you would get in an actual browser (which is what we're trying to
codify) when parsing tag soup.

Mike Samuel looked at nekohtml more recently (primarily to see if we could
benefit from faster parsing by neko) and improved our own parsing speed to a
point where it is comparable to neko.  I am not sure I fully follow the
benefit of removing the dependency on icu4j.

Re: dependency cleanup for java

Posted by Vincent Siveton <vs...@apache.org>.

Hi Paul,

+1

Probably for some dependencies we will need to add:
<optional/>
or
<exclusions/>
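
As a rough sketch of what those two elements look like in a pom.xml: the
fragment below marks one dependency as optional and excludes a transitive
dependency from another. The coordinates and the EXAMPLE-VERSION
placeholders are assumptions for illustration only, not taken from Shindig's
actual POMs, and whether icu4j really arrives transitively through caja is
also an assumption here.

```xml
<!-- Illustrative pom.xml fragment; coordinates and versions are assumptions -->
<dependencies>
  <!-- optional: available when building this module, but not passed on
       transitively to projects that depend on it -->
  <dependency>
    <groupId>net.sf.json-lib</groupId>
    <artifactId>json-lib</artifactId>
    <version>EXAMPLE-VERSION</version>
    <optional>true</optional>
  </dependency>

  <!-- exclusions: keep this dependency but drop one of the artifacts it
       would otherwise pull in transitively -->
  <dependency>
    <groupId>caja</groupId>
    <artifactId>caja</artifactId>
    <version>EXAMPLE-VERSION</version>
    <exclusions>
      <exclusion>
        <groupId>com.ibm.icu</groupId>
        <artifactId>icu4j</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
```

Note that Maven expects a value in the optional element
(<optional>true</optional>), and each exclusion names only a groupId and
artifactId, not a version.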

Cheers,

Vincent


Re: dependency cleanup for java

Posted by Adam Winer <aw...@gmail.com>.

On Wed, Jul 8, 2009 at 3:46 AM, Paul Lindner <pl...@linkedin.com> wrote:

> I filed https://issues.apache.org/jira/browse/SHINDIG-1107
> Does anyone have any opinion about cleaning up those dependencies?  We were
> pulling in json-lib which seems unnecessary since we have a native json
> serializer in place now.
>
> Another simplification is deprecating nekohtml for htmlparser, which is
> used
> by caja.  I asked the caja folks about using neko and this was their
> response:
>
> htmlparser was recommended by Ian Hickson, author of large chunks of
> the HTML5 spec
> as conforming closely to the spec.  Nekohtml is indeed quite fast but
> htmlparser does
> a better job of more accurately producing the kind of DOM that you
> would get in an
> actual browser (which is what we're trying to codify) when parsing tag
> soup.


The only non-obvious feature requirement I know is supporting DOM parsing of
<script> elements for @type = text/os-data and text/os-templates, including
support for namespaces.  If HTMLParser can do that, I'm all for switching to
it.

> Mike Samuel looked at nekohtml more recently (primarily to see if we
> could benefit
> from faster parsing by neko) and improved our own parsing speed to a
> point where it
> is comparable to neko.  I am not sure I fully follow the benefit of
> removing
> dependency on icu4j.
>

Re: dependency cleanup for java

Posted by Paul Lindner <li...@inuus.com>.

Okay, sounds good.  I'll commit the other cleanups then.

On Wed, Jul 8, 2009 at 12:28 PM, Louis Ryan <lr...@google.com> wrote:

> The switch to use htmlparser is something I've been planning to do for
> quite
> a while. We're currently waiting for Mike et al to fix some issues in their
> CSS DOM before I go ahead and make the switch, which has significant
> benefits for our sanitization and cajoling pipelines. I believe there is a
> CL out for review to fix this on Caja.

Re: dependency cleanup for java

Posted by Louis Ryan <lr...@google.com>.

The switch to use htmlparser is something I've been planning to do for quite
a while. We're currently waiting for Mike et al to fix some issues in their
CSS DOM before I go ahead and make the switch, which has significant
benefits for our sanitization and cajoling pipelines. I believe there is a
CL out for review to fix this on Caja.
