Posted to modperl@perl.apache.org by Mark Stosberg <ma...@summersault.com> on 2008/04/15 21:19:09 UTC

Best practices for returning 404/file-not-found pages inside and outside of mod_perl

It seems that when using CGI, it is too late to return a true 404 once the 
script is processing the request. It's still possible to send output 
that shows "page not found" text, but the HTTP status code will be 200.

More recently, I learned that with mod_perl I can get the system to
return a true 404, so I updated my CGI::Application logic to do that
when possible:

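  if (exists $ENV{MOD_PERL}) {
    # Under mod_perl: set the status only and send no body.
    $self->header_add( -status => 404 );
    return '';
  }
  else {
    # Plain CGI: fall back to an application error page.
    return $self->error(title => 'Page not found');
  }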

However, I don't think I'm doing the ideal thing in mod_perl, because it
behaves strangely in some cases. Two specific cases:

If I use GET on the command line, instead of 404, I'll get back this:
"500 EOF when chunk header expected"

Unless I fall back to HTTP 1.0:

PERL_LWP_USE_HTTP_10=1 GET ...

But for some reason, setting this environment variable did not work
with Test::WWW::Mechanize.

More troubling is the behavior I see in the browser: The first time I
access the script that would throw this 404 in mod_perl, it works.
Then attempts 2 through 6 return internal server errors complaining
about "can't locate modules". Starting on load 7, the pages are returned
reliably with the 404 error. WTF?

( This is with Apache 1.3x and mod_perl 1.x )

CGI::Application::Dispatch also returns the "404" status code, but it
sends body content along with it. In my case, I'm hoping to trigger
the internal ErrorDocument 404 page instead of re-inventing that
wheel.

What am I missing?

Thanks!

    Mark


Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Graham TerMarsch <mo...@howlingfrog.com>.
On Thursday 01 May 2008 7:06 am, André Warnier wrote:
> Mark Stosberg wrote:
> >> +    my $dir = dirname(__FILE__);
> >> +    use lib $dir.'/../config';
> >> +    use lib $dir.'/../perllib';
> >
> > Actually, for some reason that syntax didn't work either, but this did
> > work on my modperl-startup.pl:
> >
> >  use lib dirname(__FILE__).'/../config';
> >  use lib dirname(__FILE__).'/../perllib';
> >
> >     Mark
>
> this is a question to the perl gurus here :
>
> In the first part above (what does not work), is it not because the "use
> lib" instructions are actually "executed" at the perl *compile* time, at
> which time the $dir variable does not have any value yet ?

Yes.

Crazier yet, if you do it as:

    my $dir;
    BEGIN { $dir=dirname(__FILE__) };
    use lib $dir.'/../config';
    use lib $dir.'/../perllib';

and put the assignment in a BEGIN block, it still doesn't work (at least, I 
haven't been able to get it to work).

I cheated, and when I needed this idiom I created a module that exported the 
value, and which assigned it in its "import()" method.  Gave me something 
like:

  use FindMe qw($ME);
  use lib $ME.'/../config';
  use lib $ME.'/../perllib';

and that worked fine.  The "import()" routine gets called and sets the value 
before it's used in the following lines.
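
A rough sketch of what such a module might look like. FindMe and $ME are the names used above, but the body of import() here is a guess at the approach, not the actual module, and the package is defined inline only so that the example runs as a single file:

```perl
use strict;
use warnings;

# Hypothetical stand-in for FindMe.pm, defined inline for this sketch.
BEGIN {
    package FindMe;
    use File::Basename qw(dirname);
    use File::Spec;

    our $ME;

    # Called at compile time by "use FindMe ...". The import list is
    # ignored here; $ME is always exported.
    sub import {
        my ($caller_pkg, $caller_file) = caller;
        $ME = dirname(File::Spec->rel2abs($caller_file));
        no strict 'refs';
        *{"${caller_pkg}::ME"} = \$ME;    # alias $ME into the calling package
    }

    # Mark the module as loaded so "use FindMe" does not look for a file.
    $INC{'FindMe.pm'} = __FILE__;
}

use FindMe qw($ME);
use lib $ME . '/../config';
use lib $ME . '/../perllib';

print "script directory: $ME\n";
```

Because "use" runs import() immediately after loading the module, $ME already holds a value when the two "use lib" lines are compiled.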

Now... why the BEGIN block doesn't do it, I've no idea and didn't poke too 
much further into it to figure out why.
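
For comparison, here is a minimal sketch of the BEGIN-block form as it behaves under a plain perl interpreter, assuming File::Basename is loaded with "use" (so dirname() exists at compile time) before the BEGIN block runs. Whether the same holds inside a mod_perl startup file is not something this sketch confirms:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);    # "use", not "require", so dirname() exists at compile time

my $dir;                              # declared at compile time, still undefined here
BEGIN { $dir = dirname(__FILE__) }    # assigned before the following "use" lines are compiled
use lib $dir . '/../config';
use lib $dir . '/../perllib';

# Show the entries that were added.
print "$_\n" for grep { index($_, $dir) == 0 } @INC;
```

If dirname() is only loaded at run time, or $dir is assigned outside the BEGIN block, "use lib" sees an undefined value, which matches the failure described earlier in the thread.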

-- 
Graham TerMarsch
Howling Frog Internet Development, Inc.

Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by André Warnier <aw...@ice-sa.com>.

Mark Stosberg wrote:
>> +    my $dir = dirname(__FILE__);
>> +    use lib $dir.'/../config';
>> +    use lib $dir.'/../perllib';
> 
> Actually, for some reason that syntax didn't work either, but this did work on my modperl-startup.pl:
> 
>  use lib dirname(__FILE__).'/../config';
>  use lib dirname(__FILE__).'/../perllib';
> 
>     Mark
> 
> 

this is a question to the perl gurus here :

In the first part above (what does not work), is it not because the "use 
lib" instructions are actually "executed" at the perl *compile* time, at 
which time the $dir variable does not have any value yet ?


Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Mark Stosberg <ma...@summersault.com>.
> +    my $dir = dirname(__FILE__);
> +    use lib $dir.'/../config';
> +    use lib $dir.'/../perllib';

Actually, for some reason that syntax didn't work either, but this did work on my modperl-startup.pl:

 use lib dirname(__FILE__).'/../config';
 use lib dirname(__FILE__).'/../perllib';

    Mark



Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Mark Stosberg <ma...@summersault.com>.
On Fri, 25 Apr 2008 16:38:36 +0200
André Warnier <aw...@ice-sa.com> wrote:

> Mark Stosberg wrote:
> >> Sorry Mark, I misread your mail.  I thought you were using
> >> PerlSetVar. What are you doing exactly?  PerlSetEnv PERL5LIB?
> > 
> > Exactly.
> > 
> Hi guys,
> sorry to butt in, particularly since much higher-grade specialists
> have been in this thread before, but ..
> I seem to recall an earlier thread talking about the same kind of
> thing. Isn't it too late, once the Apache server has started and
> initialised Perl interpreters and so on, to set the environment var.
> PERL5LIB via PerlSetEnv ?
> Would that not explain the curious behaviour seen ?

Possibly. As I reviewed other environments that are working for us,
I see that we use "SetEnv" there. I just introduced "PerlSetEnv" recently,
as part of trying to debug mod_perl problems. 

I did find and change this section in my startup script which seems like 
it could be related:

-    chdir dirname(__FILE__);
-    use lib '../config', '../perllib';

+    my $dir = dirname(__FILE__);
+    use lib $dir.'/../config';
+    use lib $dir.'/../perllib';

####

So, I was adding relative paths to @INC before, and now I'm adding absolute ones. 

   Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by André Warnier <aw...@ice-sa.com>.
Mark Stosberg wrote:
>> Sorry Mark, I misread your mail.  I thought you were using PerlSetVar.
>>  What are you doing exactly?  PerlSetEnv PERL5LIB?
> 
> Exactly.
> 
Hi guys,
sorry to butt in, particularly since much higher-grade specialists have 
been in this thread before, but ..
I seem to recall an earlier thread talking about the same kind of thing.
Isn't it too late, once the Apache server has started and initialised 
Perl interpreters and so on, to set the environment var. PERL5LIB via 
PerlSetEnv ?
Would that not explain the curious behaviour seen ?

André




Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Mark Stosberg <ma...@summersault.com>.
> Sorry Mark, I misread your mail.  I thought you were using PerlSetVar.
>  What are you doing exactly?  PerlSetEnv PERL5LIB?

Exactly.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .

Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Perrin Harkins <pe...@elem.com>.
On Thu, Apr 24, 2008 at 4:16 PM, Mark Stosberg <ma...@summersault.com> wrote:
>  No, I'm not. Are there debugging techniques that help me confirm
>  this?

Sorry Mark, I misread your mail.  I thought you were using PerlSetVar.
 What are you doing exactly?  PerlSetEnv PERL5LIB?

- Perrin

Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Mark Stosberg <ma...@summersault.com>.
> On Tue, Apr 22, 2008 at 2:27 PM, Mark Stosberg <ma...@summersault.com>
> wrote:
> >   A. If I just set "status => 404" with CGI.pm / Apache::Registry
> > and return nothing, it works the first time, and then after that I
> >    get a lot of these errors:
> >
> >    "[Tue Apr 22 13:47:07 2008] [error] Can't locate
> > SAP/QuickSearch.pm in @INC" And indeed, the path that should be set
> > via PerlSetEnv is missing.
> 
> Hmm, PerlSetEnv depends on being inside a specific
> Location/File/Directory block.  Are you sure that apache has resolved
> to the block you think it's in when you have this problem?

No, I'm not. Are there debugging techniques that help me confirm 
this?

Thanks again for your help, Perrin.

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .



Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Perrin Harkins <pe...@elem.com>.
On Tue, Apr 22, 2008 at 2:27 PM, Mark Stosberg <ma...@summersault.com> wrote:
>   A. If I just set "status => 404" with CGI.pm / Apache::Registry and
>    return nothing, it works the first time, and then after that I
>    get a lot of these errors:
>
>    "[Tue Apr 22 13:47:07 2008] [error] Can't locate SAP/QuickSearch.pm
>  in @INC" And indeed, the path that should be set via PerlSetEnv is
>  missing.

Hmm, PerlSetEnv depends on being inside a specific
Location/File/Directory block.  Are you sure that apache has resolved
to the block you think it's in when you have this problem?
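
One low-tech way to check this, sketched under the assumption that the script's STDERR ends up in the Apache error log: log the process id, PERL5LIB and @INC from the code path that fails, then compare the entries from a request that worked with one that did not. The helper name inc_report is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: summarize what this process currently sees.
sub inc_report {
    my $perl5lib = defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(unset)';
    return join "\n",
        "pid=$$",
        "PERL5LIB=$perl5lib",
        map("INC: $_", @INC);
}

# warn() writes to STDERR.
warn inc_report(), "\n";
```

Including the pid makes it possible to tell whether the failing requests all come from particular child processes.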

>   B. If I set "status => 404" *and* send content, the result is that
>  two pages are displayed: One is the ErrorDocument for 404, and the
>  other is the content I sent.

Yeah, it can't recall content you've already sent.  I think mod_cgi
avoids this issue by buffering everything.

- Perrin

Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Mark Stosberg <ma...@summersault.com>.
I've come to understand my 404 handling case better. Here's what I know:

 A. If I just set "status => 404" with CGI.pm / Apache::Registry and
    return nothing, it works the first time, and then after that I 
    get a lot of these errors:

    "[Tue Apr 22 13:47:07 2008] [error] Can't locate SAP/QuickSearch.pm
    in @INC"

    And indeed, the path that should be set via PerlSetEnv is missing.

 B. If I set "status => 404" *and* send content, the result is that
    two pages are displayed: One is the ErrorDocument for 404, and the
    other is the content I sent.

 C. If I don't set the status code but just send "file not found"
content, that looks right to users, but the "200" code is returned,
which is inaccurate for any automated tools using the site. 

###

At this point, I'm ready to go with "C" as being "good enough". However, I'm
interested to know why the environment variable would be missing when 
the 404 page is called, especially when it works on the first load.

Thanks!

    Mark

-- 
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   mark@summersault.com     Summersault, LLC     
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .





Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by Perrin Harkins <pe...@elem.com>.
On Tue, Apr 15, 2008 at 3:19 PM, Mark Stosberg <ma...@summersault.com> wrote:
>  It seems that using CGI, it is too late to return a true 404 once the script
> is processing the request.

I thought mod_cgi would handle this, actually.  It parses your header
output.  Apache::Registry has trouble emulating that, as discussed on
this list in the past.
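
For the plain CGI case, a minimal sketch of what that looks like: the script emits a Status header, which mod_cgi reads and turns into the real HTTP status line. The function name not_found_response and the HTML body are invented for the example; a CGI.pm-based script would produce the same header through header(-status => ...):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response whose Status header asks the server for a 404.
sub not_found_response {
    my ($title) = @_;
    my $body = "<html><head><title>$title</title></head>"
             . "<body><h1>$title</h1></body></html>\n";
    return "Status: 404 Not Found\r\n"
         . "Content-Type: text/html\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

print not_found_response('Page not found');
```

Note that this sends its own body, so it does not reuse the server's ErrorDocument page.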

>  However, I don't think I'm doing the ideal thing in mod_perl, because it
>  behaves strangely in some cases. Two specific cases:
>
>  If I use GET on the command line, instead of 404, I'll get back this:
>  "500 EOF when chunk header expected"

You're not using Registry here, right?  Is it possible that something
is using your status header as a return code from a mod_perl handler?
Those don't always match.

The best source for examples of how to do this correctly is probably
the mod_perl Developer's Cookbook.  I don't have mine handy, but
that's where I'd look first if you have it.

>  More troubling is the behavior I see in the browser: The first time I
>  access the script that would throw this 404 in mod_perl, it works.
>  Then attempts 2 through 6 return internal server errors complaining
>  about "can't locate modules". Starting on load 7, the pages are returned
>  reliably with the 404 error. WTF?

I'm not familiar with that one.  What's the full text of the error message?

>  In my case, I'm
>  hoping to trigger the internal ErrorDocument 404 page instead of
>  re-inventing that wheel.

I'm not sure you can do that.  I know you can set the ErrorDocument
for a specific block ($r->custom_response), but I don't think you can
just hand off to ErrorDocument because it's tied into the default
handler.  I don't remember this well, so checking one of the books or
the list archive is your best bet.
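
For what it's worth, my understanding is that a mod_perl 1 content handler which sends nothing and returns NOT_FOUND leaves the response to Apache, which then applies whatever ErrorDocument 404 is configured. A sketch of that shape follows. The package name, the page_exists() lookup and its hard-coded table are invented, and the constants are defined locally (with the numeric values Apache::Constants uses) so the file compiles without mod_perl; a real handler would import OK and NOT_FOUND from Apache::Constants instead:

```perl
package My::PageHandler;    # hypothetical package name
use strict;
use warnings;

# Stand-ins for: use Apache::Constants qw(OK NOT_FOUND);
use constant OK        => 0;
use constant NOT_FOUND => 404;

# Hypothetical lookup; a real application would consult its own routing.
my %pages = ('/index' => 'Welcome');
sub page_exists { my ($uri) = @_; return exists $pages{$uri} }

sub handler {
    my ($r) = @_;                 # the Apache request object
    my $uri = $r->uri;

    # Return before printing anything, so there is no partial body to recall.
    return NOT_FOUND unless page_exists($uri);

    $r->content_type('text/html');
    $r->send_http_header;         # mod_perl 1 API
    $r->print("<h1>$pages{$uri}</h1>\n");
    return OK;
}

1;
```

The key point is ordering: the decision to return 404 happens before any headers or content go out.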

- Perrin

Re: Best practices for returning 404/file-not-found pages inside and outside of mod_perl

Posted by David Nicol <da...@gmail.com>.
On Tue, Apr 15, 2008 at 2:19 PM, Mark Stosberg <ma...@summersault.com> wrote:
>
>   return a true 404

Since MP already replaces the C<exit> function, it shouldn't be too tricky to
abstract 404 and other error codes by letting exit take arguments -- then
you could do what you want with C<<exit(STATUS => 404)>> for instance.

I don't know how hard that feature would be to add; adding it to mod_perl might
drive similar features being added to other CGI systems. For instance, the Apache
project could add a mapping of specific non-zero exit codes from CGI programs
to things other than "500 internal server error."

Of course, there's always redirecting to a location that really doesn't exist,
but that isn't "true."

Dave the idea guy