Posted to user@manifoldcf.apache.org by Shinichiro Abe <sh...@gmail.com> on 2013/01/09 08:10:06 UTC

Http status code 302

Hi,

I'm using the trunk code and crawling a web site whose seeds include http://lucene.jugem.jp/?eid=39 (Koji's blog; I am not obeying robots.txt).
When I look at Simple History, it shows a 302 result code for the fetch activity, and the document is not ingested.

When I used MCF 1.0.1 in the same situation, Simple History showed a 200 result code and MCF ingested the document.

Why does trunk show a 302 status? Is it related to the httpclient upgrade?

Thanks in advance,
Shinichiro Abe

Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
I created CONNECTORS-604 to track this problem.

Karl


Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
There seem to be only two differences.  The Host header value is
different, and there is an Accept header in the one that works.
(Accept: */*)

I will experiment with curl this evening to see which of these is
causing the problem.  Or, if you don't want to wait, you can use curl
and explicitly set these headers to see which one causes it to fail.

Thanks,
Karl
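
To make the comparison above concrete, here is a small Python sketch that diffs the request headers copied from the traces posted in this thread (trunk, MCF 1.0.1, and curl). The helper names `parse_headers` and `diff_headers` are made up for illustration; they are not part of ManifoldCF or httpclient.

```python
def parse_headers(raw):
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(failing, working):
    """Return {name: (failing_value, working_value)} for headers that differ.

    A value of None means the header is absent from that request.
    """
    names = sorted(set(failing) | set(working))
    return {
        n: (failing.get(n), working.get(n))
        for n in names
        if failing.get(n) != working.get(n)
    }


# Header values below are copied from the logs quoted in this thread.
CRAWLER_UA = "Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)"

TRUNK = parse_headers(f"""
User-Agent: {CRAWLER_UA}
From: shinichiro.abe.1@gmail.com
Host: lucene.jugem.jp:80
Connection: Keep-Alive
""")

MCF_101 = parse_headers(f"""
User-Agent: {CRAWLER_UA}
From: shinichiro.abe.1@gmail.com
Host: lucene.jugem.jp
""")

CURL = parse_headers("""
User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
Host: lucene.jugem.jp
Accept: */*
""")

if __name__ == "__main__":
    for label, working in (("MCF 1.0.1", MCF_101), ("curl", CURL)):
        print(f"trunk (302) vs {label} (200):")
        for name, (bad, good) in diff_headers(TRUNK, working).items():
            print(f"  {name}: {bad!r} -> {good!r}")
```

On this data, the trunk request differs from the working MCF 1.0.1 request only in the Host value (explicit :80) and the extra Connection: Keep-Alive header. That narrows the candidates but does not by itself show which one triggers the redirect; replaying the request with curl and changing one header at a time would still be needed.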



Re: Http status code 302

Posted by Shinichiro Abe <sh...@gmail.com>.
Thank you for your guidance.
I got a log from MCF 1.0.1.

A) a log from curl

curl -vvv "http://lucene.jugem.jp/?eid=39"
* About to connect() to lucene.jugem.jp port 80 (#0)
*   Trying 210.172.160.170... connected
* Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
> GET /?eid=39 HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
> Host: lucene.jugem.jp
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Wed, 09 Jan 2013 13:23:15 GMT
< Server: Apache/2.0.59 (Unix)
< Vary: User-Agent,Host,Accept-Encoding
< Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
< Accept-Ranges: bytes
< Content-Length: 22594
< Cache-Control: private
< Pragma: no-cache
< Connection: close
< Content-Type: text/html


B) a log from MCF 1.0.1

DEBUG 2013-01-09 23:40:11,313 (Thread-472) - Open connection to 210.172.160.170:80
DEBUG 2013-01-09 23:40:11,436 (Thread-472) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"
DEBUG 2013-01-09 23:40:11,437 (Thread-472) - Using virtual host name: lucene.jugem.jp
DEBUG 2013-01-09 23:40:11,437 (Thread-472) - Adding Host request header
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)[\r][\n]"
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "From: shinichiro.abe.1@gmail.com[\r][\n]"
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "Host: lucene.jugem.jp[\r][\n]"
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "[\r][\n]"
DEBUG 2013-01-09 23:40:11,629 (Thread-472) - << "HTTP/1.1 200 OK[\r][\n]"
DEBUG 2013-01-09 23:40:11,632 (Thread-472) - << "Date: Wed, 09 Jan 2013 14:39:24 GMT[\r][\n]"
DEBUG 2013-01-09 23:40:11,632 (Thread-472) - << "Server: Apache/2.0.59 (Unix)[\r][\n]"
DEBUG 2013-01-09 23:40:11,632 (Thread-472) - << "Vary: User-Agent,Host,Accept-Encoding[\r][\n]"
DEBUG 2013-01-09 23:40:11,632 (Thread-472) - << "Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Accept-Ranges: bytes[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Content-Length: 22594[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Cache-Control: private[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Pragma: no-cache[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Connection: close[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "Content-Type: text/html[\r][\n]"
DEBUG 2013-01-09 23:40:11,633 (Thread-472) - << "[\r][\n]"
DEBUG 2013-01-09 23:40:12,054 (Worker thread '0') - Should close connection in response to directive: close

Is this enough to diagnose the problem?

Thank you very much,
Shinichiro
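
For anyone repeating this comparison, wire logs like the one above can be reduced to the raw request and response lines before comparing them. The following is only a sketch: the regular expression is inferred from the log lines shown in this thread, and `split_wire_log` is a hypothetical helper name, not something ManifoldCF provides.

```python
import re

# Matches wire-log lines such as:
#   DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "Host: lucene.jugem.jp[\r][\n]"
# and captures the direction (>> sent, << received) plus the payload.
WIRE_LINE = re.compile(r'- (>>|<<) "(.*?)(?:\[\\r\])?\[\\n\]"\s*$')


def split_wire_log(text):
    """Return (sent, received) lists of wire lines with the CR/LF markers removed."""
    sent, received = [], []
    for line in text.splitlines():
        match = WIRE_LINE.search(line)
        if not match:
            continue  # skip non-wire lines such as "Adding Host request header"
        direction, payload = match.groups()
        (sent if direction == ">>" else received).append(payload)
    return sent, received


# A few lines taken from the MCF 1.0.1 log above.
SAMPLE = r'''
DEBUG 2013-01-09 23:40:11,436 (Thread-472) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"
DEBUG 2013-01-09 23:40:11,437 (Thread-472) - Adding Host request header
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "Host: lucene.jugem.jp[\r][\n]"
DEBUG 2013-01-09 23:40:11,447 (Thread-472) - >> "[\r][\n]"
DEBUG 2013-01-09 23:40:11,629 (Thread-472) - << "HTTP/1.1 200 OK[\r][\n]"
'''

if __name__ == "__main__":
    sent, received = split_wire_log(SAMPLE)
    print("sent:", sent)
    print("received:", received)
```

Only the quoted wire lines are matched; the unquoted header lines that the trunk log prints in addition are ignored, so each header is counted once.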




On 2013/01/09, at 23:12, Karl Wright wrote:

> Wire debugging with MCF 1.0.1 requires different logging.ini
> parameters, because it uses commons-httpclient instead.  That's
> described here:
> 
> http://hc.apache.org/httpclient-3.x/logging.html
> 
> I will need a working comparison to diagnose what is happening, so
> please either get a log from curl, or better yet from MCF 1.0.1.
> 
> Thanks!
> Karl
> 
> 
> On Wed, Jan 9, 2013 at 9:04 AM, Shinichiro Abe
> <sh...@gmail.com> wrote:
>> Hi,
>> 
>> I did wire debugging:
>> curl yielded a 200 and ManifoldCF 1.0.1 got a 200, while ManifoldCF trunk got a 302.
>> 
>> The manifoldcf.log from trunk showed the wire log [1], but the one from 1.0.1 showed nothing.
>> 
>> [1]
>> DEBUG 2013-01-09 22:07:26,494 (Thread-474) - Sending request: GET /?eid=39 HTTP/1.1
>> DEBUG 2013-01-09 22:07:26,495 (Thread-474) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,496 (Thread-474) - >> "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "From: shinichiro.abe.1@gmail.com[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Host: lucene.jugem.jp:80[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Connection: Keep-Alive[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> GET /?eid=39 HTTP/1.1
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> From: shinichiro.abe.1@gmail.com
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Host: lucene.jugem.jp:80
>> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Connection: Keep-Alive
>> DEBUG 2013-01-09 22:07:26,556 (Thread-474) - << "HTTP/1.1 302 Found[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Date: Wed, 09 Jan 2013 13:06:39 GMT[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Server: Apache/2.0.59 (Unix)[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Location: http://error.jugem.jp/[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Length: 285[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Connection: close[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Type: text/html; charset=iso-8859-1[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "[\r][\n]"
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - Receiving response: HTTP/1.1 302 Found
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << HTTP/1.1 302 Found
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Date: Wed, 09 Jan 2013 13:06:39 GMT
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Server: Apache/2.0.59 (Unix)
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Location: http://error.jugem.jp/
>> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Content-Length: 285
>> DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Connection: close
>> DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Content-Type: text/html; charset=iso-8859-1
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<html><head>[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<title>302 Found</title>[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "</head><body>[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<h1>Found</h1>[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<p>The document has moved <a href="http://error.jugem.jp/">here</a>.</p>[\n]"
>> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<hr>[\n]"
>> DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "<address>Apache/2.0.59 (Unix) Server at lucene.jugem.jp Port 80</address>[\n]"
>> DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "</body></html>[\n]"
>> DEBUG 2013-01-09 22:07:26,618 (Thread-474) - Connection 0.0.0.0:56784<->210.172.160.170:80 closed
>> 
>> 
>> 
>> Hmm, it looks like it is being redirected to the error location anyway.
>> 
>> Thanks,
>> Shinichiro Abe
>> 
>> 
>> On 2013/01/09, at 21:08, Karl Wright wrote:
>> 
>>> Odd that curl would yield a 200 while ManifoldCF gets a 302.  Maybe
>>> Koji's blog site does not like one of the headers, crawler-agent
>>> perhaps?
>>> 
>>> I am behind a firewall now but I will explore this later today.  In
>>> the meantime, if you want to research the problem, could you turn on
>>> wire debugging?  You do this in the logging.ini file following these
>>> instructions:
>>> 
>>> http://hc.apache.org/httpcomponents-client-ga/logging.html
>>> 
>>> You should see everything happening in the log then, and you can then
>>> compare against curl using -vvv.  Please let me know what you find.
>>> 
>>> Thanks!
>>> Karl
>>> 
>>> On Wed, Jan 9, 2013 at 4:29 AM, Shinichiro Abe
>>> <sh...@gmail.com> wrote:
>>>> I'm using the web connector.
>>>> 
>>>>> Are you trying to crawl through a proxy?
>>>> No. I just set that URL as a seed, without a proxy.
>>>> (Also, I am not obeying robots.txt.)
>>>> 
>>>> With curl, I get the same result as you.
>>>> 
>>>> Could you reproduce that?
>>>> 
>>>> Shinichiro
>>>> 
>>>> On 2013/01/09, at 17:49, Karl Wright wrote:
>>>> 
>>>>> When I try the URL you gave using curl and no special arguments, I get this:
>>>>> 
>>>>> 
>>>>> C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
>>>>> * About to connect() to lucene.jugem.jp port 80 (#0)
>>>>> *   Trying 210.172.160.170... connected
>>>>> * Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
>>>>>> GET /?eid=39 HTTP/1.1
>>>>>> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
>>>>>> Host: lucene.jugem.jp
>>>>>> Accept: */*
>>>>>> 
>>>>> < HTTP/1.1 200 OK
>>>>> < Date: Wed, 09 Jan 2013 08:47:52 GMT
>>>>> < Server: Apache/2.0.59 (Unix)
>>>>> < Vary: User-Agent,Host,Accept-Encoding
>>>>> < Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
>>>>> < Accept-Ranges: bytes
>>>>> < Content-Length: 22594
>>>>> < Cache-Control: private
>>>>> < Pragma: no-cache
>>>>> < Connection: close
>>>>> < Content-Type: text/html
>>>>> 
>>>>> There's no 302 from here.
>>>>> 
>>>>> Are you trying to crawl through a proxy?  If so, that might be where
>>>>> the problem lies.
>>>>> 
>>>>> Karl
>>>>> 
>>>>> On Wed, Jan 9, 2013 at 3:40 AM, Karl Wright <da...@gmail.com> wrote:
>>>>>> It sounds like the httpclient upgrade definitely broke something.  We
>>>>>> should open a ticket.
>>>>>> 
>>>>>> But first, can you confirm what connector this is?  Is it the web
>>>>>> connector?  If so, I am puzzled because the web connector has always
>>>>>> logged any 302 return, but then queued a second document which it
>>>>>> subsequently fetches.
>>>>>> 
>>>>>> Karl
>>>>>> 
>>>>>> On Wed, Jan 9, 2013 at 2:10 AM, Shinichiro Abe
>>>>>> <sh...@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm using trunk code and crawling web site with seeds which have http://lucene.jugem.jp/?eid=39 (koji's blog --I don't obey robots.txt).
>>>>>>> As I'm look at Simple History, it shows 302 result code at fetch activity and doesn't ingest document.
>>>>>>> 
>>>>>>> When I used MCF 1.0.1 in the same situation, Simple History showed 200 result code and MCF could ingest documents.
>>>>>>> 
>>>>>>> Why does the trunk shows 302 status? Is it relevant to upgrading httpclient?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> Shinichiro Abe
>>>> 
>> 


Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
Wire debugging with MCF 1.0.1 requires different logging.ini
parameters, because it uses commons-httpclient instead.  That's
described here:

http://hc.apache.org/httpclient-3.x/logging.html

I will need a working comparison to diagnose what is happening, so
please either get a log from curl, or better yet from MCF 1.0.1.

Thanks!
Karl
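
As a rough illustration of the comparison being asked for here, the request headers from the curl trace (200) and from the trunk wire log (302) posted in this thread can be diffed mechanically. This is only a minimal Python sketch: the helper names `parse_headers` and `diff_headers` are made up for illustration and are not part of ManifoldCF, HttpClient, or curl, and the header lines are copied by hand from the traces.

```python
def parse_headers(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in lines:
        # Split on the first colon only, so "Host: name:80" keeps its port.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(working, failing):
    """Return (only_in_working, only_in_failing, changed) header names."""
    only_working = sorted(set(working) - set(failing))
    only_failing = sorted(set(failing) - set(working))
    changed = sorted(
        name for name in set(working) & set(failing)
        if working[name] != failing[name]
    )
    return only_working, only_failing, changed


# Request headers copied from the curl trace in this thread (got 200 OK).
curl_request = parse_headers([
    "User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3",
    "Host: lucene.jugem.jp",
    "Accept: */*",
])

# Request headers copied from the ManifoldCF trunk wire log (got 302 Found).
mcf_request = parse_headers([
    "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)",
    "From: shinichiro.abe.1@gmail.com",
    "Host: lucene.jugem.jp:80",
    "Connection: Keep-Alive",
])

only_curl, only_mcf, changed = diff_headers(curl_request, mcf_request)
print("only in curl:", only_curl)
print("only in MCF: ", only_mcf)
print("different:   ", changed)
```

The diff only narrows down the candidates (here the Host value, the missing Accept header, and the User-Agent); which of them actually triggers the redirect still has to be confirmed by replaying the request with one header changed at a time.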


On Wed, Jan 9, 2013 at 9:04 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> Hi,
>
> I did wire debugging:
> curl yielded a 200 and ManifoldCF 1.0.1 got a 200, while ManifoldCF trunk got a 302.
>
> The manifoldcf.log from trunk showed the output below [1], but the log from 1.0.1 showed nothing.
>
> [1]
> DEBUG 2013-01-09 22:07:26,494 (Thread-474) - Sending request: GET /?eid=39 HTTP/1.1
> DEBUG 2013-01-09 22:07:26,495 (Thread-474) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"
> DEBUG 2013-01-09 22:07:26,496 (Thread-474) - >> "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)[\r][\n]"
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "From: shinichiro.abe.1@gmail.com[\r][\n]"
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Host: lucene.jugem.jp:80[\r][\n]"
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Connection: Keep-Alive[\r][\n]"
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "[\r][\n]"
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> GET /?eid=39 HTTP/1.1
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> From: shinichiro.abe.1@gmail.com
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Host: lucene.jugem.jp:80
> DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Connection: Keep-Alive
> DEBUG 2013-01-09 22:07:26,556 (Thread-474) - << "HTTP/1.1 302 Found[\r][\n]"
> DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Date: Wed, 09 Jan 2013 13:06:39 GMT[\r][\n]"
> DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Server: Apache/2.0.59 (Unix)[\r][\n]"
> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Location: http://error.jugem.jp/[\r][\n]"
> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Length: 285[\r][\n]"
> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Connection: close[\r][\n]"
> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Type: text/html; charset=iso-8859-1[\r][\n]"
> DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "[\r][\n]"
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - Receiving response: HTTP/1.1 302 Found
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << HTTP/1.1 302 Found
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Date: Wed, 09 Jan 2013 13:06:39 GMT
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Server: Apache/2.0.59 (Unix)
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Location: http://error.jugem.jp/
> DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Content-Length: 285
> DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Connection: close
> DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Content-Type: text/html; charset=iso-8859-1
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<html><head>[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<title>302 Found</title>[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "</head><body>[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<h1>Found</h1>[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<p>The document has moved <a href="http://error.jugem.jp/">here</a>.</p>[\n]"
> DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<hr>[\n]"
> DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "<address>Apache/2.0.59 (Unix) Server at lucene.jugem.jp Port 80</address>[\n]"
> DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "</body></html>[\n]"
> DEBUG 2013-01-09 22:07:26,618 (Thread-474) - Connection 0.0.0.0:56784<->210.172.160.170:80 closed
>
>
>
> Hmm... It looks like the request is being redirected to the error location anyway.
>
> Thanks,
> Shinichiro Abe

Re: Http status code 302

Posted by Shinichiro Abe <sh...@gmail.com>.
Hi,

I did wire debugging:
curl yielded a 200 and ManifoldCF 1.0.1 got a 200, while ManifoldCF trunk got a 302.

The manifoldcf.log from trunk showed the output below [1], but the log from 1.0.1 showed nothing.

[1]
DEBUG 2013-01-09 22:07:26,494 (Thread-474) - Sending request: GET /?eid=39 HTTP/1.1
DEBUG 2013-01-09 22:07:26,495 (Thread-474) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"
DEBUG 2013-01-09 22:07:26,496 (Thread-474) - >> "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)[\r][\n]"
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "From: shinichiro.abe.1@gmail.com[\r][\n]"
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Host: lucene.jugem.jp:80[\r][\n]"
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Connection: Keep-Alive[\r][\n]"
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "[\r][\n]"
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> GET /?eid=39 HTTP/1.1
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; shinichiro.abe.1@gmail.com)
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> From: shinichiro.abe.1@gmail.com
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Host: lucene.jugem.jp:80
DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Connection: Keep-Alive
DEBUG 2013-01-09 22:07:26,556 (Thread-474) - << "HTTP/1.1 302 Found[\r][\n]"
DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Date: Wed, 09 Jan 2013 13:06:39 GMT[\r][\n]"
DEBUG 2013-01-09 22:07:26,561 (Thread-474) - << "Server: Apache/2.0.59 (Unix)[\r][\n]"
DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Location: http://error.jugem.jp/[\r][\n]"
DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Length: 285[\r][\n]"
DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Connection: close[\r][\n]"
DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Content-Type: text/html; charset=iso-8859-1[\r][\n]"
DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "[\r][\n]"
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - Receiving response: HTTP/1.1 302 Found
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << HTTP/1.1 302 Found
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Date: Wed, 09 Jan 2013 13:06:39 GMT
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Server: Apache/2.0.59 (Unix)
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Location: http://error.jugem.jp/
DEBUG 2013-01-09 22:07:26,563 (Thread-474) - << Content-Length: 285
DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Connection: close
DEBUG 2013-01-09 22:07:26,564 (Thread-474) - << Content-Type: text/html; charset=iso-8859-1
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<html><head>[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<title>302 Found</title>[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "</head><body>[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<h1>Found</h1>[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<p>The document has moved <a href="http://error.jugem.jp/">here</a>.</p>[\n]"
DEBUG 2013-01-09 22:07:26,575 (Thread-474) - << "<hr>[\n]"
DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "<address>Apache/2.0.59 (Unix) Server at lucene.jugem.jp Port 80</address>[\n]"
DEBUG 2013-01-09 22:07:26,576 (Thread-474) - << "</body></html>[\n]"
DEBUG 2013-01-09 22:07:26,618 (Thread-474) - Connection 0.0.0.0:56784<->210.172.160.170:80 closed



Hmm... It looks like the request is being redirected to the error location anyway.

Thanks,
Shinichiro Abe
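
For readers who want to pull the essentials out of a wire log like [1] without reading it line by line, here is a minimal Python sketch. It assumes the line layout shown above (quoted payloads after `>>` or `<<`, with `[\r][\n]` markers); the function names `parse_wire_log` and `response_summary` are hypothetical and not part of ManifoldCF or HttpClient.

```python
import re

# Matches the quoted wire lines, e.g.  ... - << "HTTP/1.1 302 Found[\r][\n]"
WIRE_LINE = re.compile(r' - (<<|>>) "(.*)"$')


def parse_wire_log(lines):
    """Split quoted wire-log lines into sent ('>>') and received ('<<') text."""
    sent, received = [], []
    for line in lines:
        match = WIRE_LINE.search(line.rstrip())
        if not match:
            continue  # skip context lines and the unquoted header echoes
        direction, payload = match.groups()
        # Drop the literal end-of-line markers that the wire logger prints.
        payload = payload.replace(r"[\r]", "").replace(r"[\n]", "")
        (sent if direction == ">>" else received).append(payload)
    return sent, received


def response_summary(received):
    """Return (status_code, location) from the received header block."""
    status = int(received[0].split()[1])
    location = None
    for line in received[1:]:
        if line == "":
            break  # a blank line ends the response headers
        name, _, value = line.partition(":")
        if name.lower() == "location":
            location = value.strip()
    return status, location


# A few lines copied from the trunk log above.
sample = [
    r'DEBUG 2013-01-09 22:07:26,495 (Thread-474) - >> "GET /?eid=39 HTTP/1.1[\r][\n]"',
    r'DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> "Host: lucene.jugem.jp:80[\r][\n]"',
    r'DEBUG 2013-01-09 22:07:26,497 (Thread-474) - >> Host: lucene.jugem.jp:80',
    r'DEBUG 2013-01-09 22:07:26,556 (Thread-474) - << "HTTP/1.1 302 Found[\r][\n]"',
    r'DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "Location: http://error.jugem.jp/[\r][\n]"',
    r'DEBUG 2013-01-09 22:07:26,562 (Thread-474) - << "[\r][\n]"',
]

sent, received = parse_wire_log(sample)
status, location = response_summary(received)
print(status, location)
```

Only the quoted lines are parsed because the logger prints each header twice, once on the wire channel and once on the header channel, and the quoted form is the one that preserves the exact bytes.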


On 2013/01/09, at 21:08, Karl Wright wrote:

> Odd that curl would yield a 200 while ManifoldCF gets a 302.  Maybe
> Koji's blog site does not like one of the headers, crawler-agent
> perhaps?
> 
> I am behind a firewall now but I will explore this later today.  In
> the meantime, if you want to research the problem, could you turn on
> wire debugging?  You do this in the logging.ini file following these
> instructions:
> 
> http://hc.apache.org/httpcomponents-client-ga/logging.html
> 
> You should see everything happening in the log then, and you can then
> compare against curl using -vvv.  Please let me know what you find.
> 
> Thanks!
> Karl


Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
Odd that curl would yield a 200 while ManifoldCF gets a 302.  Maybe
Koji's blog site does not like one of the headers, crawler-agent
perhaps?

I am behind a firewall now but I will explore this later today.  In
the meantime, if you want to research the problem, could you turn on
wire debugging?  You do this in the logging.ini file following these
instructions:

http://hc.apache.org/httpcomponents-client-ga/logging.html

You should see everything happening in the log then, and you can then
compare against curl using -vvv.  Please let me know what you find.

Thanks!
Karl

On Wed, Jan 9, 2013 at 4:29 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> I'm using the web connector.
>
>> Are you trying to crawl through a proxy?
> No. I just set that URL as a seed, without a proxy.
> (Also, I didn't obey robots.txt.)
>
> Using curl, I get the same result as you.
>
> Could you reproduce that?
>
> Shinichiro

Re: Http status code 302

Posted by Shinichiro Abe <sh...@gmail.com>.
I'm using the web connector.

> Are you trying to crawl through a proxy?
No. I just set that URL as a seed, without a proxy.
(Also, I didn't obey robots.txt.)

Using curl, I get the same result as you.

Could you reproduce that?

Shinichiro

On 2013/01/09, at 17:49, Karl Wright wrote:

> When I try the URL you gave using curl and no special arguments, I get this:
> 
> 
> C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
> * About to connect() to lucene.jugem.jp port 80 (#0)
> *   Trying 210.172.160.170... connected
> * Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
>> GET /?eid=39 HTTP/1.1
>> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
>> Host: lucene.jugem.jp
>> Accept: */*
>> 
> < HTTP/1.1 200 OK
> < Date: Wed, 09 Jan 2013 08:47:52 GMT
> < Server: Apache/2.0.59 (Unix)
> < Vary: User-Agent,Host,Accept-Encoding
> < Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
> < Accept-Ranges: bytes
> < Content-Length: 22594
> < Cache-Control: private
> < Pragma: no-cache
> < Connection: close
> < Content-Type: text/html
> 
> There's no 302 from here.
> 
> Are you trying to crawl through a proxy?  If so, that might be where
> the problem lies.
> 
> Karl


Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
When I try the URL you gave using curl and no special arguments, I get this:


C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
* About to connect() to lucene.jugem.jp port 80 (#0)
*   Trying 210.172.160.170... connected
* Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
> GET /?eid=39 HTTP/1.1
> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
> Host: lucene.jugem.jp
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 09 Jan 2013 08:47:52 GMT
< Server: Apache/2.0.59 (Unix)
< Vary: User-Agent,Host,Accept-Encoding
< Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
< Accept-Ranges: bytes
< Content-Length: 22594
< Cache-Control: private
< Pragma: no-cache
< Connection: close
< Content-Type: text/html

There's no 302 from here.

Are you trying to crawl through a proxy?  If so, that might be where
the problem lies.

Karl

On Wed, Jan 9, 2013 at 3:40 AM, Karl Wright <da...@gmail.com> wrote:
> It sounds like the httpclient upgrade definitely broke something.  We
> should open a ticket.
>
> But first, can you confirm what connector this is?  Is it the web
> connector?  If so, I am puzzled because the web connector has always
> logged any 302 return, but then queued a second document which it
> subsequently fetches.
>
> Karl

Re: Http status code 302

Posted by Karl Wright <da...@gmail.com>.
It sounds like the httpclient upgrade definitely broke something.  We
should open a ticket.

But first, can you confirm what connector this is?  Is it the web
connector?  If so, I am puzzled because the web connector has always
logged any 302 return, but then queued a second document which it
subsequently fetches.

Karl
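
The behavior described above (log the 302, then queue the redirect target as a separate document to fetch) can be sketched roughly as follows. This is a hypothetical Python illustration of the idea only, not the web connector's actual code; `process_fetch`, `REDIRECT_CODES`, and the in-memory queue and history are assumptions made for the example.

```python
from collections import deque
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307}


def process_fetch(url, status, headers, queue, history):
    """Record the fetch result; on a redirect, queue the target instead of ingesting."""
    history.append((url, status))  # every result code is recorded, including 302
    if status in REDIRECT_CODES and "Location" in headers:
        # Resolve the Location value against the request URL in case it is relative.
        target = urljoin(url, headers["Location"])
        queue.append(target)  # fetched later as its own document
        return None
    if status == 200:
        return url  # the caller would ingest this document
    return None


queue, history = deque(), []
ingested = process_fetch(
    "http://lucene.jugem.jp/?eid=39",
    302,
    {"Location": "http://error.jugem.jp/"},
    queue,
    history,
)
print(ingested, list(queue), history)
```

Under this model a 302 entry in the history is expected and harmless on its own; the open question in this thread is why the server answers trunk with a redirect to an error page when the same URL returns 200 to curl and to MCF 1.0.1.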

On Wed, Jan 9, 2013 at 2:10 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> Hi,
>
> I'm using trunk code and crawling a web site whose seeds include http://lucene.jugem.jp/?eid=39 (Koji's blog; I don't obey robots.txt).
> When I look at Simple History, it shows a 302 result code for the fetch activity and the document is not ingested.
>
> When I used MCF 1.0.1 in the same situation, Simple History showed a 200 result code and MCF could ingest the documents.
>
> Why does trunk show a 302 status? Is it related to the httpclient upgrade?
>
> Thanks in advance,
> Shinichiro Abe