Posted to dev@commons.apache.org by Tim Reilly <ti...@consultant.com> on 2004/08/15 23:34:01 UTC

[validator] UrlValidator

I'd posted this to commons-user back in July, never got an answer and went
on with my reqs using a regex instead.
I saw Martin's post today on a 1.1.3 release and figured I should forward
this to see if it is a bug, or if only fully 'resolved' urls are valid
or something and I was using it incorrectly.

> -----Original Message-----
> From: Tim Reilly
> Sent: Wednesday, July 28, 2004 11:53 AM
> To: commons-user@jakarta.apache.org
> Subject: [validator] UrlValidator
>
>
> First, I'm a newbie with validator, so maybe I'm just not using
> it correctly.
> I was hoping to use validator library to validate url strings
> from a form. I'm finding the validation too strict for my use
> case. I have a form and want to validate that the user entered a
> valid url in 'simple user' terms of valid.
>
> But...I added this test case/method to
> org.apache.commons.validator.UrlTest; the url is valid from my POV.
>
>    public void testSanity() {
>        UrlValidator v = new UrlValidator();
>        assertTrue(v.isValid("http://www.google.com"));
>    }
>
> But it fails.
>
> Also testing http://www.google.com with:
> isValidScheme : false
> isValidAuthority : false
> isValidPath : false
> isValidQuery : true
> isValidFragment: true
>
> Can anyone explain the UrlValidator.isValidXXXXX ?
> Why would query and fragment be valid (I guess because they
> aren't specified. I'm good with that.)
> But can't port and path be optional (not specified) as well?
> Also why is the scheme (http://) not valid?
>
> Oddly, the url http://www.google.com:80/test passes isValid(),
> but fails isValidScheme(), etc..
>
> Thanks,
> -TR
>
> (btw: Testing against HEAD / 1.1.3)



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [validator] UrlValidator

Posted by Martin Cooper <mf...@gmail.com>.
On Sun, 15 Aug 2004 23:23:07 -0400, Tim Reilly
<ti...@consultant.com> wrote:
> Thanks Martin!
> Then it appears I was using isValidScheme etc. incorrectly, and that there
> is a bug as well. I've opened a bugzilla issue for the optional port and
> path issues.
> 
> I have started a patch for this.
> 
> Starting with the unit test:
> The patch seems easy enough. In the arrays of test parts I'm adding empty
> strings such as
> 
> -   TestPair[] testPath = {new TestPair("/test1", true),
> +   TestPair[] testPath = {new TestPair("", true),
> +                          new TestPair("/test1", true),

Actually, if you look a half dozen lines further down, you'll see that
the empty string is already there, but with a 'false' validity. So
someone seemed to think that an empty path should not be valid. The
RFC says it should be, though.

> Question 1)
> Which should be valid according to the rfc:
> http://www.google.com
> or
> http://wwww.google.com/
> or both perhaps?
> I'll adjust the test data accordingly.

Both. The RFC specifically allows for an empty path (the former) and a
path of "/" is a perfectly valid path (the latter).
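
For illustration only, a minimal sketch of how the two forms differ at the path level. It uses java.net.URI from the JDK, not UrlValidator, so it shows how one RFC-based parser sees the two URLs rather than what the validator currently does. The class name EmptyPathDemo and the helper pathOf are made up for this example.

```java
import java.net.URI;

// Hypothetical demo class; it does not touch commons-validator.
class EmptyPathDemo {

    // Returns the raw path component of the URL as parsed by java.net.URI.
    static String pathOf(String url) {
        return URI.create(url).getRawPath();
    }

    public static void main(String[] args) {
        // Brackets make an empty path visible in the output.
        System.out.println("[" + pathOf("http://www.google.com") + "]");
        System.out.println("[" + pathOf("http://www.google.com/") + "]");
    }
}
```

Both URLs parse without error; the first has an empty path and the second has the path "/", which matches the two cases described above.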

> Question 2)
> On line 206 of UrlTest I'm confused by
> new TestPair("", true)};
> seems to be constructing a case where scheme may be EMPTY_STRING as valid.
> I'd assumed scheme is always required of a url, but admit I've not read the
> rfc you mentioned.

The RFC does in fact state that the scheme is *not* optional. I'm
guessing that this test might be trying to test relative URLs - but
that is just a guess, since I haven't been involved with this class up
to now.

> Question 3)
> Changing the UrlValidator should be a matter of getting the regular
> expression correct I believe(?) for the empty path parts.

I'm not sure if it's that easy or not. The convoluted (IMHO) nature of
the tests makes it a bit difficult to figure out just what is being
tested and how. Specifically, it's hard to see what URLs are being
tested, and so hard to know whether or not the tests are correct.
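
One way to make the tested URLs visible, sketched under assumptions: build each URL by concatenating one entry from every part array, expect it to be valid only when all of its parts are valid, and print the result. The TestPair class below is a stand-in with assumed fields (item, valid), not the real one from UrlTest, and the sample data is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; TestPair here only mimics the (String, boolean) constructor seen in UrlTest.
class TestPairDemo {

    static final class TestPair {
        final String item;
        final boolean valid;

        TestPair(String item, boolean valid) {
            this.item = item;
            this.valid = valid;
        }
    }

    // Builds every combination of parts, in order; a URL is expected valid only if all parts are.
    static List<TestPair> combine(TestPair[]... parts) {
        List<TestPair> out = new ArrayList<>();
        out.add(new TestPair("", true));
        for (TestPair[] part : parts) {
            List<TestPair> next = new ArrayList<>();
            for (TestPair prefix : out) {
                for (TestPair p : part) {
                    next.add(new TestPair(prefix.item + p.item, prefix.valid && p.valid));
                }
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample data: empty scheme marked invalid, empty path marked valid.
        TestPair[] scheme = { new TestPair("http://", true), new TestPair("", false) };
        TestPair[] authority = { new TestPair("www.google.com", true) };
        TestPair[] path = { new TestPair("", true), new TestPair("/test1", true) };

        for (TestPair t : combine(scheme, authority, path)) {
            System.out.println(t.valid + "  " + t.item);
        }
    }
}
```

Printing the assembled URLs next to their expected validity would at least show which complete URLs a given set of part arrays implies.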

> One thought I had on port validation was to attempt to construct a
> java.net.URL and do something like
> 
> if (u.getPort() < 0 && u.getDefaultPort() < 0) {
>    return false;
> }
> which I think would mean the protocol handler for the scheme has a known
> default port. Does this sound like a good approach? or is there a case where
> someone has a protocol handler with default port that java.net.URL wouldn't
> know about? Or a case where java.net.URL is going to throw
> MalformedURLException where UrlValidator.isValid would otherwise return
> true?

I'm going to leave this to someone who knows more about this code than I do...

> NP on the 1.1.4 status for me. If getting patches in can get it to 1.1.3
> that'd be good too, either way.

We've already voted 1.1.3 to GA, so I'm reluctant to open it up again
for more changes. Theoretically, a 1.1.4 can happen before too long in
any case.

--
Martin Cooper





RE: [validator] UrlValidator

Posted by Tim Reilly <ti...@consultant.com>.
Thanks Martin!
Then it appears I was using isValidScheme etc. incorrectly, and that there
is a bug as well. I've opened a bugzilla issue for the optional port and
path issues.

I have started a patch for this.

Starting with the unit test:
The patch seems easy enough. In the arrays of test parts I'm adding empty
strings such as

-   TestPair[] testPath = {new TestPair("/test1", true),
+   TestPair[] testPath = {new TestPair("", true),
+                          new TestPair("/test1", true),

Question 1)
Which should be valid according to the rfc:
http://www.google.com
or
http://wwww.google.com/
or both perhaps?
I'll adjust the test data accordingly.

Question 2)
On line 206 of UrlTest I'm confused by
new TestPair("", true)};
seems to be constructing a case where scheme may be EMPTY_STRING as valid.
I'd assumed scheme is always required of a url, but admit I've not read the
rfc you mentioned.

Question 3)
Changing the UrlValidator should be a matter of getting the regular
expression correct I believe(?) for the empty path parts.
One thought I had on port validation was to attempt to construct a
java.net.URL and do something like

if (u.getPort() < 0 && u.getDefaultPort() < 0) {
    return false;
}
which I think would mean the protocol handler for the scheme has a known
default port. Does this sound like a good approach? or is there a case where
someone has a protocol handler with default port that java.net.URL wouldn't
know about? Or a case where java.net.URL is going to throw
MalformedURLException where UrlValidator.isValid would otherwise return
true?
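
To make the idea concrete, here is a rough, self-contained sketch of the port check. The class name PortCheck and the method hasUsablePort are made up for this example; nothing here is part of UrlValidator. It relies only on java.net.URL, so a scheme with no registered protocol handler is rejected outright, which is exactly the open question above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch of the proposed check; not part of commons-validator.
class PortCheck {

    // True when the URL names a port explicitly or its protocol handler knows a default one.
    static boolean hasUsablePort(String spec) {
        try {
            URL u = URI.create(spec).toURL();
            return u.getPort() >= 0 || u.getDefaultPort() >= 0;
        } catch (MalformedURLException | IllegalArgumentException e) {
            // Unknown protocol, relative URL, or a string java.net cannot parse.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasUsablePort("http://www.google.com"));         // default port known
        System.out.println(hasUsablePort("http://www.google.com:80/test")); // explicit port
        System.out.println(hasUsablePort("foo://example.com"));             // no handler for "foo"
    }
}
```

The last call shows the trade-off: a URL whose scheme the JVM has no handler for fails here regardless of whether it is syntactically fine.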

NP on the 1.1.4 status for me. If getting patches in can get it to 1.1.3
that'd be good too, either way.
Thanks,
-TR





Re: [validator] UrlValidator

Posted by Martin Cooper <mf...@gmail.com>.
On Sun, 15 Aug 2004 17:34:01 -0400, Tim Reilly
<ti...@consultant.com> wrote:
> I'd posted this to commons-user back in July, never got an answer and went
> on with my reqs using a regex instead.
> I saw Martin's post today on a 1.1.3 release and figured I should forward
> this to see if it is a bug, or if only fully 'resolved' urls are valid
> or something and I was using it incorrectly.

This looks like a bug to me. Interestingly, the test code treats an
empty path in a URL as invalid, whereas RFC 2396 allows it. This is
obviously part of the problem.

If you could file a bug for this in Bugzilla, that will make sure it
doesn't get lost. Most likely, a fix will show up in 1.1.4.

By the way, your comments below suggest that you're passing the entire
URL to isValidScheme() et al. You should note that these methods take
only the part of the URL corresponding to their name, not the entire
URL. In other words, calling isValidScheme("http://www.google.com")
*should* fail, whereas isValidScheme("http") will succeed.
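
As a rough illustration of what "the part of the URL corresponding to their name" means, the sketch below splits a URL into its components with java.net.URI from the JDK. The class UrlParts is invented for this example and does not call UrlValidator; URI's own parsing rules may also differ from the validator's, so treat the output as a guide to which substring belongs to which component, nothing more.

```java
import java.net.URI;

// Hypothetical helper; shows which substring each per-component check would receive.
class UrlParts {
    final String scheme;
    final String authority;
    final String path;
    final String query;
    final String fragment;

    UrlParts(String url) {
        URI u = URI.create(url);
        scheme = u.getScheme();
        authority = u.getRawAuthority();
        path = u.getRawPath();
        query = u.getRawQuery();       // null when the URL has no query
        fragment = u.getRawFragment(); // null when the URL has no fragment
    }

    public static void main(String[] args) {
        UrlParts p = new UrlParts("http://www.google.com:80/test");
        System.out.println("scheme    = " + p.scheme);
        System.out.println("authority = " + p.authority);
        System.out.println("path      = " + p.path);
        System.out.println("query     = " + p.query);
        System.out.println("fragment  = " + p.fragment);
    }
}
```

For this URL the scheme is "http" (without "://"), the authority is "www.google.com:80" and the path is "/test"; those substrings, not the whole URL, are what the per-component methods are meant to receive.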

--
Martin Cooper


