Posted to users@tomcat.apache.org by Scott Purcell <pu...@charter.net> on 2006/02/11 02:40:01 UTC

Access log to see where robots go.

I have had trouble getting search engines to see my site. I built it with Struts, and the index.html page uses some tags to run business logic before finally reaching my real page. The URL is http://www.theuniquepear.com

After I talked to some co-workers, they suggested I watch my access log to see which files the search engines are indexing. I have the access log turned on for the site and can see when someone hits it, but as far as the search bots go, I only see this in my logs daily:

$ cat  localhost_access_log.2006-02-07.txt | less
67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227


I see the entry for robots.txt, but I have no idea where they are going, or what they are doing.

I turned on the access log in server.xml like so:
        <Valve className="org.apache.catalina.valves.AccessLogValve"
                 directory="logs"  prefix="localhost_access_log." suffix=".txt"
                 pattern="common" resolveHosts="false"/>

The log snippet above comes from that configuration.

Does anyone know how to get more detailed log output, or can anyone tell me what the robots.txt requests above are doing?


Thanks,
Scott
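
A note on reading this log: the "common" pattern records only the client address, the request, the status and the size, so robots cannot be identified by name. Tomcat's AccessLogValve also accepts pattern="combined", which adds the Referer and User-Agent headers to each line. With the "common" lines shown above, one rough approach is to group requests by client address and treat any address that asked for /robots.txt as a probable crawler. The following is only a minimal sketch of that idea; the function names are made up for illustration, and the sample data is copied from the post (the truncated line is left truncated and is skipped by the parser).

```python
import re
from collections import defaultdict

# One complete line of the "common" access log format:
# host ident user [time] "method path protocol" status size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$')

SAMPLE_LOG = [
    '67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985',
    '67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844',
    '67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985',
    # Truncated in the original post; it does not match LINE_RE and is skipped.
    '62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home',
    '62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402',
    '62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227',
]


def requests_by_host(lines):
    """Map each client address to the (path, status) pairs it requested."""
    visits = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # incomplete or malformed line
        host, _time, _method, path, status, _size = match.groups()
        visits[host].append((path, int(status)))
    return dict(visits)


def probable_robots(visits):
    """Hosts that asked for /robots.txt, a heuristic sign of a crawler."""
    return sorted(
        host for host, hits in visits.items()
        if any(path == "/robots.txt" for path, _status in hits)
    )


if __name__ == "__main__":
    visits = requests_by_host(SAMPLE_LOG)
    for host in probable_robots(visits):
        print(host, visits[host])
```

Read this way, the one address that fetched /robots.txt went on to request only "/", which suggests the crawler found nothing to follow from the home page.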

Re: Access log to see where robots go.

Posted by Leon Rosenberg <ro...@googlemail.com>.

On 2/11/06, Tim Funk <fu...@joedog.org> wrote:
> The problem is your home page, not robots.txt. When / is requested, the
> following is served back. Notice the JavaScript redirect (the full file is
> below):
>
> ----
>    function invokeWebApp() {
>      top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
>    }
> ----
> Search engines do not execute JavaScript, and there are no links on the page,
> so search engines have nowhere to go. (Except someone else's site).

That's not quite true; in fact, crawlers like Googlebot do execute
JavaScript, and their JavaScript engine seems to be very powerful, though
they use it mostly to detect cloaking and not for indexing.
Back to the original poster's problem: don't even try to deliver
something different based on the user agent. That will get you dropped
from the index.
Still, for best crawlability, add a link to your real start page to
the body (where your hello is):
<a href="http://www.theuniquepear.com/unique/index.jsp">Follow me, robot</a> :-)

regards
Leon
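
To illustrate the point about crawlability, here is a rough sketch of what a crawler that only follows <a href> links would find on the home page, before and after adding the link suggested above. It uses Python's standard html.parser module; the LinkCollector class and find_links function are hypothetical names, and the page text is abbreviated from the file Tim posted. Real crawlers are far more elaborate, so treat this only as an approximation.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags; script bodies are not treated as markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Abbreviated copy of the page served for "/", as quoted in this thread.
HOME_PAGE = """<html><body bgcolor="#FFFFFF">
<script language="javascript">
  function invokeWebApp() {
    top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
  }
  invokeWebApp();
</script>
hello
</body></html>"""

# The same page with the plain link added to the body.
WITH_LINK = HOME_PAGE.replace(
    "hello",
    'hello <a href="http://www.theuniquepear.com/unique/index.jsp">Follow me, robot</a>',
    1,
)

if __name__ == "__main__":
    print(find_links(HOME_PAGE))   # no followable links
    print(find_links(WITH_LINK))   # the real start page
```

The URL inside the script is invisible to this kind of link extraction, which is why a plain anchor in the body helps.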


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: Access log to see where robots go.

Posted by Tim Funk <fu...@joedog.org>.

The problem is your home page, not robots.txt. When / is requested, the
following is served back. Notice the JavaScript redirect (the full file is
below):

----
   function invokeWebApp() {
     top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
   }
----
Search engines do not execute JavaScript, and there are no links on the page,
so search engines have nowhere to go. (Except someone else's site).

As much as I detest SEO companies, you might find it helpful to search for 
one for some assistance.

<html>
<head>
   <head>
     <title>The Unique Pear | Unique Home Decor & Accessories</title>
     <meta name="description" content="The Unique Pear is an online boutique specializing in home decor & accessories. Products include clocks, candles, wall decor, garden, lighting, bath and more.">
     <meta name="keywords" content="The Unique Pear Timework clocks, lamps, lamp shades, candles, aroma, aroma difuser, wall decor, wall scounces, wrought iron, pitchers, bookstands, jaqua bath products, candleholders">
                 <meta name="description" content="">
<meta name="keywords" content="">
  </head>
<body bgcolor="#FFFFFF">

<script language = "javascript">
   //<!--
   function invokeWebApp() {
     top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
   }
   invokeWebApp();
   // -->
</script>

hello
</body>
</html>

-Tim


Re: Access log to see where robots go.

Posted by Mark Hagger <ma...@m-spatial.com>.

robots.txt is a standard file that search engines should request before trying
to index your site. It allows you to block the indexer from your site, either
completely or partially. Try a Google search for "robots.txt" for more
details.

Not having one is the same as saying "feel free to index my entire site", so
in your case that's not causing any problems.

Mark
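
As a small illustration of the above, Python's standard urllib.robotparser module can evaluate robots.txt rules. This is only a sketch: the allowed helper, the "ExampleBot" agent name and the two rule sets are assumptions made up for the example, and an empty rule list stands in for a site that has no robots.txt at all.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="ExampleBot"):
    """Return True if the given robots.txt lines let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


# Stand-in for a missing robots.txt: no rules at all.
NO_RULES = []

# Hypothetical rule sets, not taken from the site discussed here.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]
BLOCK_PART = ["User-agent: *", "Disallow: /unique/includes/"]

if __name__ == "__main__":
    print(allowed(NO_RULES, "/unique/index.jsp"))                  # nothing is blocked
    print(allowed(BLOCK_ALL, "/unique/index.jsp"))                 # whole site blocked
    print(allowed(BLOCK_PART, "/unique/includes/siteWide.css"))    # blocked directory
    print(allowed(BLOCK_PART, "/unique/index.jsp"))                # still crawlable
```

With no rules every path is permitted, which matches the point that the 404 for /robots.txt is not what keeps the site out of the index.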


On Saturday 11 February 2006 16:57, Ed Bicker wrote:
> Hello Scott,
> I have had similar problem. Can you let me know if this is resolved on your
> end. Sometimes the email response coming back to me gets buried in another
> folder and I never get to see the resolutions.
> I can't seem to get search engines to see my site, as well. I do not know
> how to resolve this....
>
> Thanks
> Ed
> guru@travelin.com

RE: Access log to see where robots go.

Posted by Ed Bicker <gu...@travelin.com>.

Hello Scott,
I have had a similar problem. Can you let me know if this is resolved on your
end? Sometimes the email responses coming back to me get buried in another
folder and I never get to see the resolutions.
I can't seem to get search engines to see my site either. I do not know
how to resolve this....

Thanks
Ed
guru@travelin.com

