Posted to user@manifoldcf.apache.org by Erlend Garåsen <e....@usit.uio.no> on 2018/12/12 16:12:20 UTC

Unexpected job status encountered

Hello list,

I have upgraded from MCF 1.7.1 to the latest release, 2.11. Importing the
old configuration data into the fresh installation did not fully succeed:
the Solr and web connectors were installed, but I had to recreate the
jobs manually.

When I start the jobs, all of them quit with the following error:
"Unexpected job status encountered".

This is what I can see in my log after a job restart:
cat /var/log/mcf/manifoldcf.log | grep 1544625651939

--8<--
DEBUG 2018-12-12T16:50:04,586 (Startup thread) - Done adding initial
seed documents for job 1544625651939.
 INFO 2018-12-12T16:50:04,589 (Startup thread) - Aborting job
1544625651939 due to error 'Unexpected job status encountered: 23'
 INFO 2018-12-12T16:50:04,599 (Startup thread) - Job 1544625651939 abort
signal successfully sent
 INFO 2018-12-12T16:50:13,410 (Job reset thread) - Stopped job 1544625651939
 INFO 2018-12-12T16:50:22,604 (Job notification thread) - Found job
1544625651939 in need of notification
 INFO 2018-12-12T16:50:24,995 (Job notification thread) - Found job
1544625651939 in need of notification
DEBUG 2018-12-12T16:50:34,003 (Job start thread) - Checking if job
1544625651939 needs to be started; it was last checked at
1544629813391, and now it is 1544629833992
--8<--
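The two numbers at the end of the Job start thread line look like Unix timestamps in milliseconds. Assuming that, here is a minimal Java sketch (java.time only; the class name is made up) that decodes them in UTC and in Oslo time and computes the gap between the two checks:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

public class JobCheckTimes {
    // Assumption: these log values are milliseconds since 1970-01-01T00:00Z.
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        // Values copied from the "Job start thread" DEBUG line.
        Instant lastChecked = fromEpochMillis(1544629813391L);
        Instant now = fromEpochMillis(1544629833992L);

        System.out.println("last checked (UTC): " + lastChecked);
        System.out.println("now (UTC):          " + now);
        // The log timestamps appear to be local time, so show Oslo time too.
        System.out.println("now (Oslo):         "
                + now.atZone(ZoneId.of("Europe/Oslo")).toLocalDateTime());
        System.out.println("gap (ms):           "
                + Duration.between(lastChecked, now).toMillis());
    }
}
```

Run as is, it should print 2018-12-12T15:50:13.391Z and 2018-12-12T15:50:33.992Z in UTC, 2018-12-12T16:50:33.992 in Oslo time, and a gap of 20601 ms. The Oslo value lines up with the 16:50:34,003 timestamp on the same log line.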

Any suggestions on how I can debug this further?
BTW, I deleted the old MCF tables before I reinstalled version 2.11
and ran the following in order to recreate the tables:

org.apache.manifoldcf.agents.Install
org.apache.manifoldcf.agents.Register org.apache.manifoldcf.crawler.system.CrawlerAgent
org.apache.manifoldcf.crawler.Register org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector WebCrawler
org.apache.manifoldcf.agents.RegisterOutput org.apache.manifoldcf.agents.output.solr.SolrConnector SolrConnector

Environment
uname -a:
3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64
x86_64 GNU/Linux
Tomcat: 7.0.76
PostgreSQL: 11.1

Erlend

Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
Please do create a ticket with a patch.  I'm extremely curious.

Depending on what you're proposing, I think a valid approach might need to
be to propose appropriate changes to the HttpComponents/HttpClient library.

Karl


On Thu, Jan 3, 2019 at 7:52 AM Erlend Garåsen <e....@usit.uio.no>
wrote:


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
It works now because I have implemented preemptive authentication. I'll
create a ticket, because this is something I think we should support.

I have analyzed the logs once again. MCF never actually authenticates.
Well, it tries, but it cannot repeat the request entity. That's why I
suggested preemptive authentication as a solution. With it, we only
need to post to Solr once, avoiding this unnecessary two-step
authentication process:
1. Try to post
2. Solr server sends a 401 response
3. Try to post once again using the header: "Authorization: Basic ******"

That's not very efficient if you have to post, say, 100,000 documents.
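A minimal sketch of preemptive Basic authentication, assuming Java 11+ and the JDK's own java.net.http client rather than the HttpClient 4.x library that HttpPoster uses (in HttpClient 4.x the usual route is to pre-populate an AuthCache with a BasicScheme in the HttpClientContext). The Authorization header goes out with the first POST, so the server has no reason to answer 401 and nothing has to be resent. The class, host and credentials are hypothetical, and UTF-8 for the user:password bytes is an assumption the server must share.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveBasicAuth {
    // "Basic " + base64(user ":" password), as in RFC 7617.
    // UTF-8 is an assumption; the server has to decode with the same charset.
    static String basicAuthHeader(String user, String password) {
        byte[] pair = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(pair);
    }

    // Builds (does not send) a POST that carries the credentials on the
    // first attempt instead of waiting for a 401 challenge.
    static HttpRequest buildUpdateRequest(URI updateUri, String user,
                                          String password, String xmlBody) {
        return HttpRequest.newBuilder(updateUri)
                .header("Content-Type", "text/xml")
                .header("Authorization", basicAuthHeader(user, password))
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; the credentials are RFC 7617's example pair.
        HttpRequest request = buildUpdateRequest(
                URI.create("https://solr.example.org/solr/uio/update"),
                "Aladdin", "open sesame", "<xml/>");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Authorization: "
                + request.headers().firstValue("Authorization").orElse("(none)"));
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

With the example pair, the header value is "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the value given in RFC 7617.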

This is actually what happens:
1. http-outgoing-200 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
2. http-outgoing-200 << "HTTP/1.1 401 Unauthorized[\r][\n]"
3. IO exception during indexing
https://www.journals.uio.no/index.php/bioimpedance/article/view/3350: null
org.apache.http.client.ClientProtocolException
(Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity.)
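Why the retry in step 3 fails: HttpClient's InputStreamEntity reports itself as non-repeatable, because once the first attempt has written the stream to the socket, there is nothing left to send. A minimal Java sketch of that effect using only java.io. sendBody is a hypothetical stand-in for the client writing the request body, and the ByteArrayInputStream stands in for a streamed document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class NonRepeatableBody {
    // Hypothetical stand-in for the client writing a request body to the
    // socket: it drains the stream and reports how many bytes went out.
    static int sendBody(InputStream body) {
        try {
            return body.readAllBytes().length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Plays the role of a document streamed from the crawler.
        InputStream streamed = new ByteArrayInputStream(
                "<add/>".getBytes(StandardCharsets.UTF_8));

        int firstAttempt = sendBody(streamed); // body sent, server answers 401
        int retry = sendBody(streamed);        // stream already drained
        System.out.println("first attempt: " + firstAttempt
                + " bytes, retry: " + retry + " bytes");
    }
}
```

It prints "first attempt: 6 bytes, retry: 0 bytes". Buffering each document in memory would make the body repeatable at the cost of memory for large files. Sending the credentials preemptively avoids the retry altogether.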

By using preemptive authentication, the following is now being sent to
Solr in the first request:
http-outgoing-30 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
http-outgoing-30 >> "Authorization: Basic **************[\r][\n]"

Preemptive authentication is also suggested as a solution to other
developers facing the same problem:
https://developer.ibm.com/answers/questions/266117/im-getting-this-exception-trying-to-add-doc-to-wat/

I can create a patch or PR. It's very easy to implement, and we have
done it for all the other Solr connectors we have developed.

Erlend

Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
Thanks for looking harder into this!

The credential encoding in httpcomponents/httpclient has been problem free
as far as I have seen, so if you determine that that's the issue I am sure
it will be news to a lot of people.  But by using the wire logging you
should be able to see the headers, including the encoded credentials, and
compare/contrast what's working and what's not pretty easily.
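If the wire log shows the Authorization header unmasked, one way to compare a working and a failing request is to decode the header back to user:password. A minimal Java sketch using java.util.Base64; the class and method names are hypothetical, and the sample value is the example from RFC 7617, not a real credential.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeBasicHeader {
    // Turns an "Authorization: Basic ..." value back into "user:password".
    static String decode(String headerValue, Charset charset) {
        String scheme = "Basic ";
        // The scheme name is case-insensitive.
        if (!headerValue.regionMatches(true, 0, scheme, 0, scheme.length())) {
            throw new IllegalArgumentException("not a Basic credential");
        }
        String token = headerValue.substring(scheme.length()).trim();
        return new String(Base64.getDecoder().decode(token), charset);
    }

    public static void main(String[] args) {
        // RFC 7617's example value, standing in for a header from the wire log.
        String fromWireLog = "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==";
        String decoded = decode(fromWireLog, StandardCharsets.UTF_8);
        System.out.println(decoded);                               // Aladdin:open sesame
        System.out.println(decoded.equals("Aladdin:open sesame")); // true
    }
}
```

The decoded value is the plain password, so it is better kept out of anything posted to the list.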

Karl


On Thu, Dec 27, 2018 at 5:42 AM Erlend Garåsen <e....@usit.uio.no>
wrote:


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
It wasn't necessary to deal with tools like tcpdump etc. Adding the
following to the logging.xml did the trick:
<Logger name="org.apache.http.wire" level="debug" additivity="false">
  <AppenderRef ref="MyFile" />
</Logger>

So now I know what's going on. Bad credentials:

DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"HTTP/1.1 401 Unauthorized[\r][\n]"

Strange, because the Solr Output Connector reports that the connection
is working. I'll double-check whether the Solr server uses a different
password for index writing (path "/solr/uio/update/extract"). Or maybe
we have an encoding issue with the password, since it's long and
contains special characters.
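On the encoding theory: Basic credentials are the Base64 of the user:password bytes, so client and server must agree on the charset. ASCII punctuation encodes identically in UTF-8 and ISO-8859-1. Only non-ASCII characters (such as å or æ) produce different tokens. RFC 7617 lets a server advertise charset="UTF-8" in its WWW-Authenticate challenge, and the 401 response in the log carries only realm="Solr". A minimal Java sketch with hypothetical credentials:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CredentialCharsets {
    // Base64 token for "user:password" encoded with the given charset.
    static String basicToken(String user, String password, Charset charset) {
        return Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(charset));
    }

    public static void main(String[] args) {
        // Hypothetical credentials: one password with non-ASCII letters,
        // one with ASCII punctuation only.
        String user = "indexer";
        String nonAscii = "bl\u00e5b\u00e6r";
        String asciiSpecials = "p@ss!#%&/()=?";

        System.out.println("non-ASCII, same token in both charsets: "
                + basicToken(user, nonAscii, StandardCharsets.UTF_8)
                .equals(basicToken(user, nonAscii, StandardCharsets.ISO_8859_1)));
        System.out.println("ASCII specials, same token in both charsets: "
                + basicToken(user, asciiSpecials, StandardCharsets.UTF_8)
                .equals(basicToken(user, asciiSpecials, StandardCharsets.ISO_8859_1)));
    }
}
```

It prints false for the non-ASCII password and true for the ASCII one, so a charset mismatch can only explain the 401 if the real password contains non-ASCII characters.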

--8<--

DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
"</div><!-- container --> [\n]"
DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
"</body> [\n]"
DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
"</html>[\n]"
DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >> "[\n]"
DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >>
"2f[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >>
"******************[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "0[\r][\n]"
DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"HTTP/1.1 401 Unauthorized[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 << "Date:
Thu, 27 Dec 2018 10:18:41 GMT[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"Server: Apache/2.4.6 (Red Hat Enterprise Linux)
OpenSSL/1.0.2k-fips[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"WWW-Authenticate: Basic realm="Solr"[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"Content-Length: 381[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"Keep-Alive: timeout=10, max=100[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"Connection: Keep-Alive[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
"Content-Type: text/html; charset=iso-8859-1[\r][\n]"
DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 << "[\r][\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"<html><head>[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"<title>401 Unauthorized</title>[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"</head><body>[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"<h1>Unauthorized</h1>[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"<p>This server could not verify that you[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << "are
authorized to access the document[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"requested.  Either you supplied the wrong[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"credentials (e.g., bad password), or your[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"browser doesn't understand how to supply[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << "the
credentials required.</p>[\n]"
DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
"</body></html>[\n]"
 WARN 2018-12-27T11:18:41,599 (Worker thread '48') - IO exception during
indexing https://www.journals.uio.no/index.php/Dialogia: null
org.apache.http.client.ClientProtocolException
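The request body in this wire log uses HTTP/1.1 chunked transfer coding: "2f" is a chunk size in hex (47 bytes), and the "0" line followed by an empty line ends the body. Clients typically use chunked coding when the body length is not known up front, as with a streamed entity, which would be consistent with the NonRepeatableRequestException reported in this thread. A minimal Java 11+ sketch of the framing; ChunkedFraming is a hypothetical name and the 47 asterisks are placeholder data, not the masked content.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkedFraming {
    // HTTP/1.1 chunked transfer coding: each chunk is "<size in hex>\r\n",
    // the data, then "\r\n"; a zero-size chunk plus an empty line ends it.
    static byte[] frame(byte[][] chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            if (chunk.length == 0) {
                continue; // an empty chunk would be read as the end of the body
            }
            out.writeBytes((Integer.toHexString(chunk.length) + "\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.writeBytes(chunk);
            out.writeBytes("\r\n".getBytes(StandardCharsets.US_ASCII));
        }
        out.writeBytes("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[47]; // same size as the "2f" chunk in the log
        Arrays.fill(data, (byte) '*');
        String wire = new String(frame(new byte[][] {data}), StandardCharsets.US_ASCII);
        System.out.println(wire.startsWith("2f\r\n"));       // true
        System.out.println(wire.endsWith("\r\n0\r\n\r\n")); // true
    }
}
```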

Erlend

On 21/12/2018 16:39, Karl Wright wrote:


Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
I'll have a look as time permits, but it won't be for a couple of weeks.
Usually it's best to debug with http wire debugging, not packet captures.
I'm not an HTTP expert so it may be better to bring this up to the
HttpComponents/HttpClient list, not ManifoldCF.

Karl


On Fri, Dec 21, 2018 at 8:54 AM Erlend Garåsen <e....@usit.uio.no>
wrote:


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
I tried to configure preemptive authentication by modifying the
HttpPoster class, but I still get the same errors. Then I ran the
following command in order to analyze the traffic:
tcpdump -w /tmp/chatter.dmp -s 0 -i ens192 -X host solr-test01.uio.no

I'm not an expert at reading such output, but I can send you the dump
file, Karl. I'd rather not make the file available to others, even
though it's from our test environment.

Erlend

On 13/12/2018 15:09, Karl Wright wrote:


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
Thanks,

Yes, the connection is working, according to the status of the Solr
output connection.

Maybe this is related to HttpClient, InputStream and non-repeatable
request entities. I had similar problems in 2013:
https://issues.apache.org/jira/browse/CONNECTORS-661

But I'm not sure, since earlier versions of MCF work.

Erlend

On 13/12/2018 15:09, Karl Wright wrote:


Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
Ok, thanks, I misunderstood where the SSL error was coming from.  The Solr
connection is what is complaining.  Do you see "Connection working" for
your output connection?  Please forgive me if you already answered this; I
didn't note it in your response.  If you see that, then the connector was
able to talk to your Solr "ping" handler, which would mean that the SSL
configuration is right but there's something else about the connection that
we would have to figure out and deal with.

Karl


On Thu, Dec 13, 2018 at 8:58 AM Erlend Garåsen <e....@usit.uio.no>
wrote:


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
On 13/12/2018 14:26, Karl Wright wrote:
> This is SSL.  Did you add the server's cert to the web connector's
> keystore?  Or, if not, add a "trust all" rule?

Thanks for the reply, Karl.

Yes, the root certificate was added both in the repository connection
and for the Solr connector.

I checked the "trust all" option and restarted the job. Same problem.

BTW, I get a lot of 200s, so MCF is able to fetch the URLs.

I have attached a screenshot of what I can see in the simple history report.

Erlend

Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
This is SSL.  Did you add the server's cert to the web connector's
keystore?  Or, if not, add a "trust all" rule?

Karl


On Thu, Dec 13, 2018 at 8:08 AM Erlend Garåsen <e....@usit.uio.no>
wrote:

>
> I'm able to run the jobs now, but I think the problem was related to
> Tomcat. The Solr and Web connectors were still available/showing up in
> the web interface (MCF) *EVEN THOUGH* I deleted all tables in pgAdmin.
> They disappeared after I restarted Tomcat.
>
> Now I'm getting an error I reported back in 2013:
> IOException occured when talking to server at:
> https://solr-test03.uio.no:443/solr/uio: null
>
> The Solr Output Connector tells me that the connection is working. I
> have double-checked that the realm, username and password are correct.
> And I'm able to connect to the Solr server by using Curl. Notice the
> "null" at the end of the url/error. It seems to be related to HttpClient
> (NonRepeatableRequestException).
>
> I'm afraid that this is related to something similar described in the
> following issue:
> https://issues.apache.org/jira/browse/CONNECTORS-661
>
> But this time I get only one HTTP response:
> # curl -X POST -H "Content-Type: text/xml" -H "Expect: 100-continue" \
>   --data-binary "<xml/>" -k -i https://solr-test03.uio.no:443/solr/uio/update
> HTTP/1.1 401 Unauthorized
>
> This is what I can see in the logs:
> WARN 2018-12-13T13:20:40,977 (Worker thread '21') - IO exception during
> indexing
> https://www.journals.uio.no/index.php/bioimpedance/article/view/4443: null
> org.apache.http.client.ClientProtocolException
>         at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) ~[?:?]
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:968) ~[?:?]
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
>         at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:226) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
>         at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
>         ... 8 more
>
> Erlend
>
> On 12/12/2018 20:07, Erlend Garåsen wrote:
> > On 12/12/2018 18:12, Karl Wright wrote:
> >> Did you import any data directly into new tables?
> >
> > I used the ImportConfiguration command class and noticed some errors,
> > but it seems that the Solr and Web connectors have been successfully
> > imported.
> >
> >> The schema has changed significantly from 1.7 until now.  I doubt very
> >> much you could get away with an import of the old table data, and that
> >> could well cause the effect you're seeing.
> >
> > Then I will delete all the tables once again, reimport the tables and
> > add all the data manually. I can try to do that tomorrow and try again. :)
> >
> > Erlend
> >
>
>

Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
I'm able to run the jobs now, but I think the problem was related to
Tomcat. The Solr and Web connectors were still showing up in the MCF web
interface *EVEN THOUGH* I had deleted all tables in pgAdmin. They
disappeared after I restarted Tomcat.

Now I'm getting an error I reported back in 2013:
IOException occured when talking to server at:
https://solr-test03.uio.no:443/solr/uio: null

The Solr Output Connector tells me that the connection is working. I
have double-checked that the realm, username and password are correct,
and I'm able to connect to the Solr server using curl. Note the "null"
at the end of the URL in the error message. It seems to be related to
HttpClient (NonRepeatableRequestException).

I'm afraid this is related to the problem described in the following
issue:
https://issues.apache.org/jira/browse/CONNECTORS-661

But this time I get only one HTTP response:
# curl -X POST -H "Content-Type: text/xml" -H "Expect: 100-continue" \
  --data-binary "<xml/>" -k -i https://solr-test03.uio.no:443/solr/uio/update
HTTP/1.1 401 Unauthorized
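The single 401 above means the server challenges any request that arrives without credentials. One commonly suggested way to avoid a challenge-and-retry round trip is preemptive Basic authentication: the client sends the Authorization header on the very first request. A minimal sketch of how that header value is built, assuming the Solr realm uses HTTP Basic auth; the class name and the credentials are hypothetical placeholders:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: builds the header value a client would send preemptively,
// i.e. on the first request, so the server has no reason to answer 401.
public class PreemptiveBasicAuth {

    /** Returns an HTTP Basic "Authorization" header value (RFC 7617 encoding). */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder credentials, not the real Solr realm user.
        System.out.println("Authorization: " + basicAuthHeader("solruser", "secret"));
    }
}
```

For comparison on the command line, adding `-u user:password` to the curl test makes curl send a Basic Authorization header on the first request.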

This is what I can see in the logs:
WARN 2018-12-13T13:20:40,977 (Worker thread '21') - IO exception during indexing https://www.journals.uio.no/index.php/bioimpedance/article/view/4443: null
org.apache.http.client.ClientProtocolException
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) ~[?:?]
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:968) ~[?:?]
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:226) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
	... 8 more
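The "Caused by" line appears to be the key: when HttpClient gets an authentication challenge, it tries to resend the request with credentials, but a body that was already streamed once cannot be replayed, so MainClientExec gives up with NonRepeatableRequestException. The following is a toy model of that interaction, written for illustration only; OneShotBody, server and post are hypothetical names, not HttpClient, SolrJ or ManifoldCF classes:

```java
import java.nio.charset.StandardCharsets;

// Toy model (not HttpClient code) of a 401 challenge meeting a one-shot body.
public class NonRepeatableDemo {

    /** A request body that can be sent only once, like a document streamed to Solr. */
    static final class OneShotBody {
        private byte[] data;

        OneShotBody(byte[] data) { this.data = data; }

        boolean isConsumed() { return data == null; }

        byte[] consume() {
            if (data == null) throw new IllegalStateException("body already consumed");
            byte[] out = data;
            data = null; // the content is gone after the first send
            return out;
        }
    }

    /** Hypothetical server: answers 401 unless credentials are present. */
    static int server(boolean hasCredentials, byte[] body) {
        return hasCredentials ? 200 : 401;
    }

    /** Hypothetical client: on a 401 it retries once with credentials. */
    static String post(OneShotBody body, boolean preemptiveAuth) {
        boolean withAuth = preemptiveAuth;
        for (int attempt = 1; attempt <= 2; attempt++) {
            if (attempt > 1 && body.isConsumed()) {
                // Same wording as the exception message in the log above.
                return "Cannot retry request with a non-repeatable request entity.";
            }
            int status = server(withAuth, body.consume());
            if (status != 401) return "HTTP " + status;
            withAuth = true; // challenged: try again with credentials
        }
        return "HTTP 401";
    }

    public static void main(String[] args) {
        byte[] doc = "<add><doc/></add>".getBytes(StandardCharsets.UTF_8);
        System.out.println(post(new OneShotBody(doc), false));
        System.out.println(post(new OneShotBody(doc), true));
    }
}
```

Run as is, main prints the retry failure for the first call and "HTTP 200" for the second, where credentials are sent up front and no challenge occurs.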

Erlend

On 12/12/2018 20:07, Erlend Garåsen wrote:
> On 12/12/2018 18:12, Karl Wright wrote:
>> Did you import any data directly into new tables?
> 
> I used the ImportConfiguration command class and noticed some errors,
> but it seems that the Solr and Web connectors have been successfully
> imported.
> 
>> The schema has changed significantly from 1.7 until now.  I doubt very
>> much you could get away with an import of the old table data, and that
>> could well cause the effect you're seeing.
> 
> Then I will delete all the tables once again, reimport the tables and
> add all the data manually. I can try to do that tomorrow and try again. :)
> 
> Erlend
> 


Re: Unexpected job status encountered

Posted by Erlend Garåsen <e....@usit.uio.no>.
On 12/12/2018 18:12, Karl Wright wrote:
> Did you import any data directly into new tables?

I used the ImportConfiguration command class and noticed some errors,
but it seems that the Solr and Web connectors have been successfully
imported.

> The schema has changed significantly from 1.7 until now.  I doubt very
> much you could get away with an import of the old table data, and that
> could well cause the effect you're seeing.

Then I will delete all the tables once again, reimport the tables and
add all the data manually. I can try to do that tomorrow and try again. :)

Erlend

Re: Unexpected job status encountered

Posted by Karl Wright <da...@gmail.com>.
Did you import any data directly into new tables?
The schema has changed significantly from 1.7 until now.  I doubt very much
you could get away with an import of the old table data, and that could
well cause the effect you're seeing.

Karl


On Wed, Dec 12, 2018 at 11:12 AM Erlend Garåsen <e....@usit.uio.no>
wrote:

> Hello list,
>
> I have upgraded from MCF 1.7.1 to the latest 2.11 release. Importing the
> old configuration data to my fresh new version did not succeed
> completely, but the Solr and the web connectors were installed. So I had
> to recreate the jobs manually.
>
> When I start up the jobs, all of them quit with the following error:
> "Unexpected job status encountered".
>
> This is what I can see in my log after a job restart:
> cat /var/log/mcf/manifoldcf.log | grep 1544625651939
>
> --8<--
> DEBUG 2018-12-12T16:50:04,586 (Startup thread) - Done adding initial
> seed documents for job 1544625651939.
>  INFO 2018-12-12T16:50:04,589 (Startup thread) - Aborting job
> 1544625651939 due to error 'Unexpected job status encountered: 23'
>  INFO 2018-12-12T16:50:04,599 (Startup thread) - Job 1544625651939 abort
> signal successfully sent
>  INFO 2018-12-12T16:50:13,410 (Job reset thread) - Stopped job
> 1544625651939
>  INFO 2018-12-12T16:50:22,604 (Job notification thread) - Found job
> 1544625651939 in need of notification
>  INFO 2018-12-12T16:50:24,995 (Job notification thread) - Found job
> 1544625651939 in need of notification
> DEBUG 2018-12-12T16:50:34,003 (Job start thread) - Checking if job
> 1544625651939 needs to be started; it was last checked at
> 1544629813391, and now it is 1544629833992
> --8<--
>
> Any suggestions how I can debug this further?
> BTW, I deleted the old MCF tables before I reinstalled the version 2.11
> and executed the following in order to recreate the tables:
>
> org.apache.manifoldcf.agents.Install
> org.apache.manifoldcf.agents.Register
> org.apache.manifoldcf.crawler.system.CrawlerAgent
> org.apache.manifoldcf.crawler.Register
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector WebCrawler
> org.apache.manifoldcf.agents.RegisterOutput
> org.apache.manifoldcf.agents.output.solr.SolrConnector SolrConnector
>
> Environment
> uname -a:
> 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64
> x86_64 GNU/Linux
> Tomcat: 7.0.76
> PostgreSQL: 11.1
>
> Erlend
>
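The two numbers at the end of the "Job start thread" DEBUG line look like Unix epoch milliseconds. A small sketch that converts them, assuming that format and the Europe/Oslo time zone (neither is stated in the thread), shows they line up with the log's own timestamps: the "last checked" value falls at 16:50:13.391, within a few milliseconds of the "Stopped job" entry, and the next check came about 20.6 seconds later. JobCheckTimes, toLocal and gap are hypothetical helper names:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

// Converts the values from the "Job start thread" log line, assuming they are
// Unix epoch milliseconds and the server runs in Europe/Oslo time.
public class JobCheckTimes {

    /** Epoch millis rendered as a zoned local timestamp. */
    static String toLocal(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of("Europe/Oslo"))
                .toString();
    }

    /** Elapsed time between two epoch-millisecond values. */
    static Duration gap(long earlierMillis, long laterMillis) {
        return Duration.ofMillis(laterMillis - earlierMillis);
    }

    public static void main(String[] args) {
        long lastChecked = 1544629813391L;
        long now = 1544629833992L;
        System.out.println("last checked: " + toLocal(lastChecked));
        System.out.println("now:          " + toLocal(now));
        System.out.println("gap:          " + gap(lastChecked, now));
    }
}
```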