Posted to dev@httpd.apache.org by Jon Travis <jt...@covalent.net> on 2002/09/03 20:35:10 UTC

Re: El-Kabong -- HTML Parser

Any word on this?  (take 2)

-- Jon


On Mon, Aug 26, 2002 at 08:32:16PM -0700, Jon Travis wrote:
> Hi all...
> Jon Travis here...
> 
> Covalent has written a pretty keen HTML parser (called el-kabong) 
> which we'd like to offer to the ASF for inclusion in APR-util (or
> whichever other umbrella it fits under.)  It's faster than 
> anything I can find, provides a SAX stylee interface, uses
> APR for most of its operations (hash tables, etc.), and has a
> pretty nice testsuite.  We use it in our code to re-write HTML on 
> the fly.  I would be the initial maintainer of the code.
> 
> Please voice any interest, thanks.
> 
> -- Jon
> 

Re: El-Kabong -- HTML Parser

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 02:00 PM 9/3/2002, Jon Travis wrote:
>Either one is fine to me.  Integrating the code into apr-util is probably
>an easier setup, but will require more work to adapt to the build system
>and change the symbols (and of course I'm quite liking the name
>'el-kabong' ;-)).

That's sort of the consensus.  It seems the majority is pro-apr-util but
we will let Greg announce that.  He already has the sources, so it shouldn't
be any trouble for him to import them once we have the signed contributors
agreement on file.

Others may still be interested in reviewing the code.  To those folks,
don't hesitate to email Jon requesting the evaluation sources.  But those
who've reviewed the code are reasonably happy with it.  Once the code
is part of the ASF process, the usual banter can erupt over how best to
improve the code.  I, for one, am much happier with the current streamed
implementation than a read-in-the-whole-danged-file into a tree approach :-)
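Bill's preference for a streamed parser over reading the whole file into a tree is easier to see in code. The following is a minimal C sketch of a SAX-style (callback-driven) HTML scanner, assuming a hypothetical `sax_handlers` callback table and `sax_parse` function; these names are illustrative only and are not el-kabong's or APR's actual API. Each tag and text run is reported as soon as it is scanned and nothing is retained afterward, so the parser's own state stays constant regardless of document size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table (illustrative, NOT el-kabong's real API). */
typedef struct {
    void (*start_tag)(void *ctx, const char *name, size_t len);
    void (*end_tag)(void *ctx, const char *name, size_t len);
    void (*text)(void *ctx, const char *data, size_t len);
} sax_handlers;

/* Scan buf once, left to right, firing callbacks as markup is recognized.
 * No document tree is built.  Simplifications (assumptions of this sketch):
 * attributes are skipped, comments/doctype are ignored (a '>' inside a
 * comment is not handled), and the input is one complete buffer; a real
 * streaming parser would also carry state across chunk boundaries. */
static void sax_parse(const char *buf, size_t n,
                      const sax_handlers *h, void *ctx)
{
    size_t i = 0;
    assert(buf != NULL && h != NULL);
    while (i < n) {
        if (buf[i] != '<') {
            /* Text run: everything up to the next '<' (or end of input). */
            const char *lt = memchr(buf + i, '<', n - i);
            size_t end = lt ? (size_t)(lt - buf) : n;
            if (h->text)
                h->text(ctx, buf + i, end - i);
            i = end;
            continue;
        }
        i++;                              /* skip '<' */
        int closing = 0;
        if (i < n && buf[i] == '/') {     /* end tag such as </b> */
            closing = 1;
            i++;
        }
        size_t name = i;
        while (i < n && isalnum((unsigned char)buf[i]))
            i++;
        size_t name_len = i - name;
        while (i < n && buf[i] != '>')    /* skip attributes up to '>' */
            i++;
        if (i < n)
            i++;                          /* skip '>' */
        if (name_len == 0)
            continue;                     /* <!-- ... -->, <!DOCTYPE ...> */
        if (closing) {
            if (h->end_tag)
                h->end_tag(ctx, buf + name, name_len);
        } else if (h->start_tag) {
            h->start_tag(ctx, buf + name, name_len);
        }
    }
}

/* Example handlers: count events without storing the document. */
typedef struct { int starts, ends; size_t text_bytes; } counts;

static void on_start(void *c, const char *s, size_t n)
{ (void)s; (void)n; ((counts *)c)->starts++; }
static void on_end(void *c, const char *s, size_t n)
{ (void)s; (void)n; ((counts *)c)->ends++; }
static void on_text(void *c, const char *s, size_t n)
{ (void)s; ((counts *)c)->text_bytes += n; }
```

Feeding `<p class="x">Hi <b>there</b></p>` through these handlers yields two start tags, two end tags and 8 bytes of text. An on-the-fly rewriting filter of the kind Jon describes could, under the same assumptions, echo each event to an output buffer instead of counting it.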

As for el-kabong, we should definitely keep that colorful history note in
the sources :-)  But I have a major issue with oddball names when 'html'
or 'html-parser' is more than sufficient [inside of a bigger library.]  Things
need to be obvious to new adopters.  This is my pet peeve with the Jakarta
project.  [Hundreds of names, who can keep track of what they all do :-?]

>I'm not in a rush, I just like to know where things stand.  Since this
>discussion is seemingly happening off-list, I can't differentiate between
>no discussion or a heated one.  I'd prefer this to be on-list, as I
>think it does affect the users of APR, and it would allow me to monitor
>the progress here.

Sorry.  No cloak-and-dagger star chamber here.  We've been polling all the
members of the ASF about where we are headed.  Does the ASF become
yet another Sourceforge?  If not (and the near-unanimous consensus is no,
we don't want to be a Sourceforge, they do that just fine themselves), then
how do we qualify and distinguish our projects?  How do we keep active
communities?  How do we group the newer client-side elements, when the
httpd project is focused on the server-side?

All of which makes for entertaining discussion, but some of this debate
is probably just off-topic for developer lists.  The httpd and apr lists are
busy enough without all of the "Why el-kabong belongs here" rants on
every side of the coin.  Rest assured, that's the debate, not whether or
not we accept the code.  Most everyone seems pleased that including
this code [somewhere] would be a "Good Thing"(R)(sm) :-)

Bill




Re: El-Kabong -- HTML Parser

Posted by Jon Travis <jt...@covalent.net>.
My comments inline:


On Tue, Sep 03, 2002 at 02:53:03PM -0400, rbb@apache.org wrote:
> 
> There are currently two possible avenues.
> 
> 1)  The code goes into apr-util.
> 2)  The code goes into a sandbox project.
> 
> The APR option is faster, but there are some misgivings about whether it
> belongs in apr-util.  The vote was done, and it seems to be accepted, but
> Greg was keeping tally, so I don't have the exact numbers about where it
> would go.  I _think_, and I could be wrong, that it would be put in
> apr-util/html as a separate piece of apr-util.
> 
> The second option will take a bit longer, because the sandbox project will
> need to be created first.
> 
> I have tried to answer without letting any of my personal opinions show in
> the message, because that has caused some problems before.  The real
> question now is: given those two options, which would you prefer?  Not
> saying that your preference is the only factor in the decision, but it
> should be taken into account.

Either one is fine to me.  Integrating the code into apr-util is probably
an easier setup, but will require more work to adapt to the build system
and change the symbols (and of course I'm quite liking the name 
'el-kabong' ;-)).

> There are also some people questioning why we are moving so quickly on
> this.  The general feeling is that we should find the best fit before
> taking the code.  If you are in a rush, then that would change things, but
> the understanding was just that you wanted to be kept in the loop about
> what is happening.
> 
> Keep pinging, but the conversation is on-going, and very active, so there
> is little chance that it won't happen.  It is really just a matter of time
> now.
> 
> Ryan

I'm not in a rush, I just like to know where things stand.  Since this
discussion is seemingly happening off-list, I can't differentiate between
no discussion or a heated one.  I'd prefer this to be on-list, as I 
think it does affect the users of APR, and it would allow me to monitor
the progress here.

-- Jon


Re: El-Kabong -- HTML Parser

Posted by Pier Fumagalli <pi...@betaversion.org>.
"rbb@apache.org" <rb...@apache.org> wrote:

> 
> There are currently two possible avenues.
> 
> 1)  The code goes into apr-util.
> 2)  The code goes into a sandbox project.

It also makes a lot of sense to have it in XML, together with
XERCES-C...

    Pier


Re: El-Kabong -- HTML Parser

Posted by rb...@apache.org.
There are currently two possible avenues.

1)  The code goes into apr-util.
2)  The code goes into a sandbox project.

The APR option is faster, but there are some misgivings about whether it
belongs in apr-util.  The vote was done, and it seems to be accepted, but
Greg was keeping tally, so I don't have the exact numbers about where it
would go.  I _think_, and I could be wrong, that it would be put in
apr-util/html as a separate piece of apr-util.

The second option will take a bit longer, because the sandbox project will
need to be created first.

I have tried to answer without letting any of my personal opinions show in
the message, because that has caused some problems before.  The real
question now is: given those two options, which would you prefer?  Not
saying that your preference is the only factor in the decision, but it
should be taken into account.

There are also some people questioning why we are moving so quickly on
this.  The general feeling is that we should find the best fit before
taking the code.  If you are in a rush, then that would change things, but
the understanding was just that you wanted to be kept in the loop about
what is happening.

Keep pinging, but the conversation is on-going, and very active, so there
is little chance that it won't happen.  It is really just a matter of time
now.

Ryan

On Tue, 3 Sep 2002, Jon Travis wrote:

> Any word on this?  (take 2)
> 
> -- Jon
> 
> 
> On Mon, Aug 26, 2002 at 08:32:16PM -0700, Jon Travis wrote:
> > Hi all...
> > Jon Travis here...
> > 
> > Covalent has written a pretty keen HTML parser (called el-kabong) 
> > which we'd like to offer to the ASF for inclusion in APR-util (or
> > whichever other umbrella it fits under.)  It's faster than 
> > anything I can find, provides a SAX stylee interface, uses
> > APR for most of its operations (hash tables, etc.), and has a
> > pretty nice testsuite.  We use it in our code to re-write HTML on 
> > the fly.  I would be the initial maintainer of the code.
> > 
> > Please voice any interest, thanks.
> > 
> > -- Jon
> > 
> 

-- 

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
550 Jean St
Oakland CA 94610
-------------------------------------------------------------------------------
