Posted to users@httpd.apache.org by Rick Bragg <li...@gmnet.net> on 2008/11/14 20:16:52 UTC

[users@httpd] encodings getting decoded...

Hi,

I have a problem with file name encodings in URLs, and I don't know if
it is a browser or a server issue.

I have a file like this:

http://test.com/Testfile_%22Test%27s%22.html

but when I try to browse to it, it gets re-written to the following and
causes a 404.

http://test.com/Testfile_"Test's".html

Is this something that Apache is doing? If so, what should I do to make
sure that it does not get decoded like this?
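One plausible explanation, assuming the application percent-encoded the title *before* using it as the filename (so the name on disk literally contains `%22` and `%27`): the server decodes `%XX` escapes in the request path before mapping it to a file, so it looks for `Testfile_"Test's".html`, which does not exist. The sketch below only uses Python's standard `urllib.parse` to show how the two forms relate; it is an illustration, not a reproduction of Apache's internals, and the variable names are hypothetical.

```python
from urllib.parse import quote, unquote

# Assumption: the file on disk is literally named with "%22"/"%27" in it,
# i.e. the title was percent-encoded before the file was saved.
on_disk = "Testfile_%22Test%27s%22.html"

# A server percent-decodes the request path before looking up the file,
# so requesting that name verbatim asks for this (non-existent) file:
looked_up = unquote(on_disk)
print(looked_up)  # Testfile_"Test's".html

# To reach a file whose name contains a literal '%', the '%' itself
# must be encoded as %25 in the URL path.
url_path = "/" + quote(on_disk)
print(url_path)  # /Testfile_%2522Test%2527s%2522.html
assert unquote(url_path) == "/" + on_disk
```

In other words, encoding the title once for the URL is fine, but using the already-encoded string as the filename means every request needs a second layer of encoding.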

Thanks!
Rick



-- 
This message has been scanned for viruses and
dangerous content by Green Mountain Network, and is
believed to be clean.


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] encodings getting decoded...

Posted by André Warnier <aw...@ice-sa.com>.
Hi.

In my opinion, it is a *very* bad idea to use the user's input to create 
a filename on your system, even if you filter out some characters.  You 
will always forget something, and some malicious user will always find a 
way to make your application write to "../../../../etc/passwd" or " > 
/etc/passwd" or something like that. Do you get the idea?
There are a lot of bad people out there.

Even if all your users are only nice people, you will still end up with 
filenames that will give you headaches when it comes to copying them, 
backing them up, or even opening them.  Filenames with spaces in them 
are evil, because sooner or later you will get the quoting wrong somewhere. 
Filenames with non-US-ASCII characters in them can cause problems when you 
move them from one system to another. Filenames with symbols like < > | & = 
: [ ] { } ; etc. can be legal on one system and not on another, and 
can cause all kinds of strange things to happen.

It would be much safer to generate a filename yourself, and then somehow 
"associate" that file with the title that your users input.
For example, generate a file name like "yyyymmdd-hhmmss-nnnnnn.ext"
(where yyyymmdd is the date, hhmmss the time, and nnnnnn some random 
number 000000-999999), or use some other scheme that is unlikely to generate 
the same name twice, but without giving the user control over it.
Do yourself a favor. Even if it makes your application a bit more 
complicated at first, in the end you will be glad you did it.
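A minimal Python sketch of the naming scheme suggested above, assuming an in-memory dict stands in for wherever the application really stores the filename-to-title association (most likely a database table). The names `generate_filename`, `save_title`, and `titles` are hypothetical. The user's title is only ever stored as data; it never becomes part of a path.

```python
import secrets
from datetime import datetime
from typing import Dict, Optional


def generate_filename(ext: str, now: Optional[datetime] = None) -> str:
    """Build a name like yyyymmdd-hhmmss-nnnnnn.ext that the user never controls."""
    now = now or datetime.now()
    n = secrets.randbelow(1_000_000)  # random number 000000-999999
    return f"{now:%Y%m%d-%H%M%S}-{n:06d}.{ext}"


# Hypothetical stand-in for a database table: generated filename -> raw title.
titles: Dict[str, str] = {}


def save_title(title: str, ext: str = "html") -> str:
    """Pick an unused generated name and associate it with the user's title."""
    name = generate_filename(ext)
    while name in titles:  # collisions are unlikely, but cheap to rule out
        name = generate_filename(ext)
    titles[name] = title
    return name


name = save_title('Testfile "Test\'s"')
print(name)          # a name of the form yyyymmdd-hhmmss-nnnnnn.html
print(titles[name])  # Testfile "Test's"
```

The page can still display the original title (HTML-escaped), while the URL and the file on disk only ever contain digits, dashes, and the extension.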



Rick Bragg wrote:
> I'm using the latest Firefox on Linux.  I am building a web application
> where these URLs are created dynamically based on a user's input in a
> "title" field.  I am filtering their input and encoding it. I guess it
> would be best to just strip such characters out with the filter.  Is there a list of
> all these characters somewhere that I can look at to build my filter?
> 
> Thanks!
> Rick
> 
> 




Re: [users@httpd] encodings getting decoded...

Posted by Dan Udey <da...@communicate.com>.
For safety's sake, I generally restrict URLs to numbers, letters,  
underscores, and dashes. Remove apostrophes and quotes, replace any  
other characters with underscores, and compress consecutive  
underscores. Lowercasing everything is not a bad idea either.

So http://test.com/Testfile_"Test's".htm would become http://test.com/testfile_tests.htm

Your implementation may vary, but explicitly allowing characters  
generally works better than explicitly disallowing.
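A minimal Python sketch of the whitelist rules described above. The function name `slugify` is hypothetical, and handling the extension separately is an assumption made here, since the dot would otherwise be turned into an underscore.

```python
import re


def slugify(title: str) -> str:
    """Keep only letters, digits, '_' and '-', following the rules above."""
    s = title.lower()                   # lowercase everything
    s = re.sub(r"[\"']", "", s)         # remove quotes and apostrophes
    s = re.sub(r"[^a-z0-9_-]", "_", s)  # any other character -> underscore
    s = re.sub(r"_+", "_", s)           # compress consecutive underscores
    return s


print(slugify('Testfile_"Test\'s"') + ".htm")  # testfile_tests.htm
```

Because the third substitution lists what is *allowed*, any character nobody thought of (non-ASCII letters, control characters, `..`, `/`) falls into the replace-with-underscore branch instead of slipping through.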

On 14-Nov-08, at 12:45 PM, Rick Bragg wrote:

> I'm using the latest Firefox on Linux.  I am building a web application
> where these URLs are created dynamically based on a user's input in a
> "title" field.  I am filtering their input and encoding it. I guess it
> would be best to just strip such characters out with the filter.  Is there a list of
> all these characters somewhere that I can look at to build my filter?
>
> Thanks!
> Rick



Re: [users@httpd] encodings getting decoded...

Posted by Rick Bragg <li...@gmnet.net>.
I'm using the latest Firefox on Linux.  I am building a web application
where these URLs are created dynamically based on a user's input in a
"title" field.  I am filtering their input and encoding it. I guess it
would be best to just strip such characters out with the filter.  Is there a list of
all these characters somewhere that I can look at to build my filter?
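For reference, RFC 3986 (section 2.3) defines the "unreserved" characters, which never need percent-encoding anywhere in a URL: ASCII letters, digits, and `-`, `.`, `_`, `~`. A minimal Python sketch of a check against that set follows; `UNRESERVED` and `disallowed_chars` are hypothetical names, and whether to strip or replace the flagged characters is left to the application.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def disallowed_chars(name: str) -> set:
    """Return the characters of `name` that fall outside the unreserved set."""
    return {c for c in name if c not in UNRESERVED}


print(sorted(disallowed_chars('Testfile_"Test\'s".html')))  # ['"', "'"]
```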

Thanks!
Rick


On Fri, 2008-11-14 at 11:37 -0800, Dan Udey wrote:
> Sounds like a browser issue. My browser (Safari) doesn't do that. Are  
> you using Internet Explorer? Try another browser.
> 
> Also, it's bad practice to have non-URL-safe characters in the URL,  
> encoded or not.




Re: [users@httpd] encodings getting decoded...

Posted by Dan Udey <da...@communicate.com>.
Sounds like a browser issue. My browser (Safari) doesn't do that. Are  
you using Internet Explorer? Try another browser.

Also, it's bad practice to have non-URL-safe characters in the URL,  
encoded or not.

On 14-Nov-08, at 11:16 AM, Rick Bragg wrote:

> Hi,
>
> I have a problem with file name encodings in URLs, and I don't know if
> it is a browser or a server issue.
>
> I have a file like this:
>
> http://test.com/Testfile_%22Test%27s%22.html
>
> but when I try to browse to it, it gets re-written to the following and
> causes a 404.
>
> http://test.com/Testfile_"Test's".html
>
> Is this something that Apache is doing? If so, what should I do to make
> sure that it does not get decoded like this?
>
> Thanks!
> Rick
