Posted to users@httpd.apache.org by Abu Hurayrah <ab...@almaghrib.org> on 2005/05/03 01:27:30 UTC

[users@httpd] MSN-Bot doesn't finish a download?

Greets to all!

I've just noticed that when the MSNbot crawls my website and hits some 
of my downloads, it doesn't download the whole file.

The request is HTTP 1.0, rather than the usual 1.1, and the amount 
downloaded changes with the chunk size my script uses (described below).

I manage downloads via my own PHP script, which reads in a chunk of a 
file, sends that chunk to the browser, and then continues in a loop 
until the entire file has been sent.
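
The script itself was not posted, so the following is only a rough sketch
of that kind of loop, written in Python rather than PHP. The names
read_chunks and chunk_size are illustrative assumptions, not taken from
the original script.

```python
import io


def read_chunks(fileobj, chunk_size):
    """Yield successive blocks of at most chunk_size bytes until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        yield chunk


# Usage: walk an in-memory "file" in 50,000-byte pieces.
data = bytes(120_000)
sizes = [len(chunk) for chunk in read_chunks(io.BytesIO(data), 50_000)]
```

Only one chunk is held in memory at a time, which is the point of sending
the file this way instead of reading it all at once.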

MSN seems to only catch ONE chunk, no matter what size I make it, which 
I find very strange, because I cannot think of why my implementation 
would matter to MSN or not.

Ironically, before this change, when I used another technique for 
delivering files, MSN was one of the chief eaters of my bandwidth.

Can anyone explain this odd behavior, and/or why an HTTP 1.0 GET 
request doesn't finish the download with this particular implementation?

Thanks in advance!

Re: [users@httpd] MSN-Bot doesn't finish a download?

Posted by Craig Dunigan <cd...@doit.wisc.edu>.
On Tue, 3 May 2005, Joshua Slive wrote:

> On 5/3/05, Abu Hurayrah <ab...@almaghrib.org> wrote:
> > I've tried sizes ranging from 50,000 bytes to 500,000 bytes, and always,
> > MSN gets only that much.
> 
> It's very possible that the bot is interested in less than 50,000 bytes.
> 
> > 
> > Previously, MSN would download the ENTIRE files, when I was sending
> > these files all at once.  I cannot understand the mechanism that
> > prevents it from continuing downloading the entire file, despite the
> > fact that I partition the download into these discrete chunks.  I am not
> > mangling the data in any way I know, I am simply sending it down in
> > chunks to reduce the memory footprint of each of my download script's
> > instances.
> 
> It is possible that the bot was always dropping the connection in the
> same place, but you didn't know about it before because the rest of
> the content would just get blindly sent.
> 
> Joshua.
> 

Joshua's explanations seem the most likely, but I thought I'd toss out
that, in my experience, MS products are often intolerant of latency.  It
comes from a corporate culture that began when everything ran on a LAN.  
Perhaps the MSN bot just can't deal with the split-second pause between
chunks, assumes the connection went bad, and drops it.  Perhaps it's not
even a bug, and they deliberately built it that way as a safeguard against
remote site slowdowns.  At any rate, it is most certainly a client issue,
not an Apache issue.  If the dropped connections cause you trouble, you'll
probably have to practice some defensive programming and handle that agent
ID differently in your download script.
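
As a rough illustration of treating that agent differently, here is a
minimal Python sketch. It assumes the bot identifies itself with a
User-Agent header containing "msnbot"; the function names and the two
policy labels are hypothetical, and what to actually do for the bot is a
site decision this thread does not settle.

```python
def is_msnbot(user_agent):
    """Return True if the User-Agent header contains "msnbot" (case-insensitive)."""
    return "msnbot" in (user_agent or "").lower()


def pick_delivery(user_agent):
    """Choose a delivery policy label for this request."""
    # "bot" and "chunked" are placeholder labels for whatever the
    # download script decides to do in each case.
    return "bot" if is_msnbot(user_agent) else "chunked"
```

A substring match is easy to spoof, so this is only suitable for tuning
behavior, not for access control.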




---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] MSN-Bot doesn't finish a download?

Posted by Joshua Slive <js...@gmail.com>.
On 5/3/05, Abu Hurayrah <ab...@almaghrib.org> wrote:
> I've tried sizes ranging from 50,000 bytes to 500,000 bytes, and always,
> MSN gets only that much.

It's very possible that the bot is interested in less than 50,000 bytes.

> 
> Previously, MSN would download the ENTIRE files, when I was sending
> these files all at once.  I cannot understand the mechanism that
> prevents it from continuing downloading the entire file, despite the
> fact that I partition the download into these discrete chunks.  I am not
> mangling the data in any way I know, I am simply sending it down in
> chunks to reduce the memory footprint of each of my download script's
> instances.

It is possible that the bot was always dropping the connection in the
same place, but you didn't know about it before because the rest of
the content would just get blindly sent.

Joshua.



Re: [users@httpd] MSN-Bot doesn't finish a download?

Posted by Abu Hurayrah <ab...@almaghrib.org>.
Thank you for your reply!

Joshua Slive wrote:

>On 5/2/05, Abu Hurayrah <ab...@almaghrib.org> wrote:
>
>>Greets to all!
>>
>>I've just noticed that when the MSNbot crawls my website and hits some of my
>>downloads, it doesn't download the whole file.
>
>Most search engines are only interested in the first x bytes of the
>file, so the bot may simply be dropping the connection after it gets
>what it wants.

That's a fair enough assumption; however, the size of the chunk that is 
downloaded is ALWAYS the same size as my $chunk_size value in my 
download script.

>>MSN seems to only catch ONE chunk, no matter what size I make it, which I
>>find very strange, because I cannot think of why my implementation would
>>matter to MSN or not.
>
>What is the smallest "chunk" size you have tried?  You are probably
>just not detecting the dropped connection until you have sent a chunk,
>so you don't really know what the bot is accepting.
>
>Joshua.

I've tried sizes ranging from 50,000 bytes to 500,000 bytes, and always, 
MSN gets only that much.

Previously, MSN would download the ENTIRE files, when I was sending 
these files all at once.  I cannot understand the mechanism that 
prevents it from continuing downloading the entire file, despite the 
fact that I partition the download into these discrete chunks.  I am not 
mangling the data in any way I know, I am simply sending it down in 
chunks to reduce the memory footprint of each of my download script's 
instances.



Re: [users@httpd] MSN-Bot doesn't finish a download?

Posted by Joshua Slive <js...@gmail.com>.
On 5/2/05, Abu Hurayrah <ab...@almaghrib.org> wrote:
> Greets to all!
> 
> I've just noticed that when the MSNbot crawls my website and hits some of my
> downloads, it doesn't download the whole file.

Most search engines are only interested in the first x bytes of the
file, so the bot may simply be dropping the connection after it gets
what it wants.

> MSN seems to only catch ONE chunk, no matter what size I make it, which I
> find very strange, because I cannot think of why my implementation would
> matter to MSN or not.

What is the smallest "chunk" size you have tried?  You are probably
just not detecting the dropped connection until you have sent a chunk,
so you don't really know what the bot is accepting.
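
To make that concrete, here is a simplified Python model, not real TCP
behavior. FakeClient and send_in_chunks are hypothetical names. The fake
client accepts a write as long as it has not yet received the bytes it
wants, and refuses the next one, so the sender only learns about the drop
at a chunk boundary.

```python
import io


class FakeClient:
    """Hypothetical client that hangs up once it has `wanted` bytes."""

    def __init__(self, wanted):
        self.wanted = wanted
        self.received = 0

    def write(self, data):
        if self.received >= self.wanted:
            raise BrokenPipeError("client closed the connection")
        # The whole chunk is accepted (buffered), even if the client
        # only cares about the first part of it.
        self.received += len(data)


def send_in_chunks(source, client, chunk_size):
    """Send source to client chunk by chunk; return the bytes counted as sent."""
    sent = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        try:
            client.write(chunk)
        except BrokenPipeError:
            break  # the drop is only noticed on the write after it happened
        sent += len(chunk)
    return sent


data = bytes(1_000_000)
small = send_in_chunks(io.BytesIO(data), FakeClient(10_000), 50_000)
large = send_in_chunks(io.BytesIO(data), FakeClient(10_000), 500_000)
```

In this model a client that wants only 10,000 bytes is counted as having
received exactly one chunk, whatever the chunk size is, which would look
the same in the logs as a bot that "only catches one chunk".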

Joshua.
