Posted to j-users@xerces.apache.org by Takumi Fujiwara <tr...@yahoo.com> on 2004/03/08 18:40:45 UTC

Form Correction in NekoHTML parser

When the NekoHTML parser corrects an ill-formed <form>,
does it build some data structure to record which
form elements are part of the form?

Consider the example at the end of this message.

In IE and Mozilla, the <form> element ends after the
<table>; however, forms[].elements.length reports only
**2** elements instead of 3.

Could someone please tell me how the NekoHTML parser
handles a situation like this? That is, if I parse this
HTML using the NekoHTML parser, how can I know that I only
need to submit the values of the first 2 form elements
during form submission?


<html>
<body>
<form name="form1" action="http://www.google.com">
<table>
<tr><td>1 <input type=submit name="a"></td>
</tr>
<tr><td>2 <input type="text" name="b"></td>
</tr>
</form>
<tr><td>3 <input type="text" name="c"></td>
</tr>
</table>

<script>
alert("document.forms[0].name: " +
document.forms[0].name );
alert("document.forms[0].elements.length: " +
document.forms[0].elements.length );


</script>

</body>
</html> 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Form Correction in NekoHTML parser

Posted by Andy Clark <an...@apache.org>.
Takumi Fujiwara wrote:
> When the NekoHTML parser corrects an ill-formed <form>,
> does it build some data structure to record which
> form elements are part of the form?

NekoHTML operates in a streaming manner so it has no
memory of what happened previously in the document.
Therefore, once it has parsed something and sent that
information through the pipeline, it's too late to fix
up that element or content later when it has seen more
of the document.

> Could someone please tell me how the NekoHTML parser
> handles a situation like this? That is, if I parse this
> HTML using the NekoHTML parser, how can I know that I only
> need to submit the values of the first 2 form elements
> during form submission?

However, there are always things that you can do. For
example, in this particular case, you could write a
filter that ignores form controls (e.g. <input>) that
appear outside of a <form>...</form> pair. Then you can
just insert that filter before the tag-balancer in the
parsing pipeline.

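For example: (I am writing this from memory, so there may
be errors. Check the names against your NekoHTML version
before relying on it.)

import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.NamespaceContext;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XMLLocator;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.html.HTMLElements;
import org.cyberneko.html.filters.DefaultFilter;

public class IgnoreFormChildren
   extends DefaultFilter {

   // true between <form> and </form>
   boolean inForm;

   // NOTE: It's safest to override *both* startDocument
   //       methods in order to work with *all* versions
   //       of Xerces2.
   public void startDocument(XMLLocator locator, String encoding,
                             Augmentations augs) throws XNIException {
     inForm = false;
     super.startDocument(locator, encoding, augs);
   }

   public void startDocument(XMLLocator locator, String encoding,
                             NamespaceContext nscontext,
                             Augmentations augs) throws XNIException {
     inForm = false;
     super.startDocument(locator, encoding, nscontext, augs);
   }

   public void startElement(QName element, XMLAttributes attrs,
                            Augmentations augs) throws XNIException {
     HTMLElements.Element elem = HTMLElements.getElement(element.rawname);
     if (elem.code == HTMLElements.FORM) {
       inForm = true;
     }

     // Ignore elements whose preferred parent is <form> when
     // they show up outside of a form.
     // NOTE: the "parent" field is written from memory; its
     //       name and type may differ in your version.
     boolean ignore = false;
     if (!inForm) {
       if (elem.parent != null && elem.parent[0].code == HTMLElements.FORM) {
         ignore = true;
       }
     }

     if (!ignore) {
       super.startElement(element, attrs, augs);
     }
   }

   public void endElement(QName element, Augmentations augs)
       throws XNIException {
     HTMLElements.Element elem = HTMLElements.getElement(element.rawname);
     if (elem.code == HTMLElements.FORM) {
       inForm = false;
     }
     super.endElement(element, augs);
   }
}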
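
The flag logic in the filter above can be tried out without
NekoHTML. The following is only a sketch in plain Java: the
FormFlagModel class, its Event type, and the hard-coded set of
form-control tag names are all made up for illustration and are
not part of the NekoHTML API. It feeds a trimmed list of the
start and end tags from the example page through the same
inForm test, one event at a time, which is all a streaming
filter gets to see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model of the inForm logic; not NekoHTML code.
final class FormFlagModel {

    // One scanner event: a start tag (start == true) or an end tag.
    static final class Event {
        final boolean start;
        final String tag;
        final String name; // value of the name attribute, or null

        Event(boolean start, String tag, String name) {
            this.start = start;
            this.tag = tag;
            this.name = name;
        }
    }

    // Assumption: these are the tags treated as form controls.
    private static final Set<String> CONTROLS =
        new HashSet<>(Arrays.asList("input", "select", "textarea", "button"));

    // Returns the names of the controls seen while inForm was true.
    static List<String> keptControls(List<Event> events) {
        boolean inForm = false;
        List<String> kept = new ArrayList<>();
        for (Event e : events) {
            if (e.tag.equals("form")) {
                inForm = e.start; // true on <form>, false on </form>
                continue;
            }
            if (e.start && inForm && CONTROLS.contains(e.tag)) {
                kept.add(e.name);
            }
        }
        return kept;
    }

    // Events for the page in the question, in document order
    // (row and cell tags left out for brevity).
    static List<Event> exampleEvents() {
        return Arrays.asList(
            new Event(true, "form", "form1"),
            new Event(true, "table", null),
            new Event(true, "input", "a"),
            new Event(true, "input", "b"),
            new Event(false, "form", null),
            new Event(true, "input", "c"),
            new Event(false, "table", null));
    }

    public static void main(String[] args) {
        System.out.println(keptControls(exampleEvents())); // prints [a, b]
    }
}
```

The only state carried from one event to the next is the
boolean flag, so the decision to keep or drop a control has to
be made at the moment it is seen.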

Then...

DOMParser parser = new DOMParser();

XMLDocumentFilter[] filters = {
   new IgnoreFormChildren(),
   new HTMLTagBalancer(),
};

parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
parser.setProperty("http://cyberneko.org/html/properties/filters", filters);

parser.parse("index.html");
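
Once the document has been parsed with that filter in place,
the controls to submit should be the ones that ended up under
the form element in the DOM. Here is a rough sketch of that
walk using only the standard org.w3c.dom API. To keep it
runnable without NekoHTML, it parses a small hand-written,
well-formed XML string as a stand-in for the parser's output;
the FormControls class and the sample markup are made up for
illustration and are not actual NekoHTML output. Element names
are upper case in the stand-in on the assumption that the
parser is left at its default of upper-casing element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of NekoHTML or Xerces.
final class FormControls {

    // Collects the name attribute of every INPUT below the given FORM element.
    static List<String> inputNames(Element form) {
        List<String> names = new ArrayList<>();
        NodeList inputs = form.getElementsByTagName("INPUT");
        for (int i = 0; i < inputs.getLength(); i++) {
            names.add(((Element) inputs.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Hand-written stand-in for a parsed page (an assumption, not real output).
    static Document sampleDocument() {
        String xml = "<HTML><BODY><FORM name='form1'><TABLE>"
            + "<TR><TD><INPUT type='submit' name='a'/></TD></TR>"
            + "<TR><TD><INPUT type='text' name='b'/></TD></TR>"
            + "</TABLE></FORM></BODY></HTML>";
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("sample markup failed to parse", e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        Element form = (Element) doc.getElementsByTagName("FORM").item(0);
        System.out.println(inputNames(form)); // prints [a, b]
    }
}
```

With the real parser you would take the document from
parser.getDocument() instead of sampleDocument(), and you would
probably want to collect SELECT, TEXTAREA and BUTTON elements
in the same way.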

Hope this helps...

By the way, NekoHTML is not an Apache project so questions
regarding NekoHTML should not be posted here. You can send
your questions and comments directly to me.

yoroshiku...

-- 
Andy Clark * andyc@apache.org


