Posted to solr-user@lucene.apache.org by Bruce Campbell <br...@peopletec.com> on 2020/11/05 15:27:30 UTC

Trouble with post.jar

I'm just getting my feet wet with Solr. I am having trouble posting a web crawl. I get the following:

C:\Users\bruce.campbell\Downloads\solr-8.6.3\solr-8.6.3\example\exampledocs>  java -Ddata=web -Dc=solr -jar post.jar http://www.lucene.apache.org/
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/solr/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: IOException when reading page from url http://www.lucene.apache.org: www.lucene.apache.org
SimplePostTool: WARNING: The URL http://www.lucene.apache.org returned a HTTP result status of 404
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/solr/update/extract...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solr/update/extract?commit=true
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/solr/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>


Thank you in advance.

Re: Trouble with post.jar

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Are you sure you have the request handler for /update/extract defined
in your solrconfig.xml?
Not all the update request handlers are defined explicitly (you can
check with Config API - /solr/hadoopDocs/config/requestHandler), but I
am 99% sure that the /update/extract would be explicit because it
needs Tika, which means a library statement to load the jar as well.
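
The Config API check mentioned above can be sketched in Python. This is a
minimal sketch, not part of the original reply: the function names are
hypothetical, and the response shape (a "config" object holding a
"requestHandler" map keyed by handler path) is an assumption that should be
compared against what your Solr instance actually returns.

```python
import json
from urllib.request import urlopen


def handler_names(config_response):
    """Return the sorted handler paths found in a Config API response.

    Assumes the shape {"config": {"requestHandler": {"/path": {...}}}}.
    """
    return sorted(config_response.get("config", {}).get("requestHandler", {}))


def has_extract_handler(config_response, path="/update/extract"):
    """True if the given handler path is listed in the response."""
    return path in handler_names(config_response)


def fetch_request_handlers(base_url, collection):
    """GET <base_url>/<collection>/config/requestHandler and decode the JSON."""
    url = f"{base_url}/{collection}/config/requestHandler"
    with urlopen(url) as resp:
        return json.load(resp)
```

Against a running instance, has_extract_handler(fetch_request_handlers("http://localhost:8983/solr", "hadoopDocs"))
would report whether the handler is listed for that collection.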

The latest Solr does not have this handler in the default
configuration, so if you bootstrapped from that, this is the most
likely cause. But the non-default techproducts one does. So, you could
copy the lib directive (contrib/extraction) and the request handler
(/update/extract) to your config's solrconfig.xml and - after
restarting the core - it may work.

Regards,
   Alex.
P.S. The relevant solrconfig.xml is in
solr-8.6.1/server/solr/configsets/sample_techproducts_configs/conf ,
but make sure not to modify anything in that path; just copy
from it.


RE: Trouble with post.jar

Posted by Bruce Campbell <br...@peopletec.com>.
Thanks for your reply. I am using the Solr (or Lucene) web site as a test site, so my collection name is "solr". I think the first "solr" is the part of the URL that the Solr application uses, while the second one is the name of the collection. Here is the same message when I tried to use a collection called hadoopDocs:

SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/hadoopDocs/update/extract?commit=true

If I am wrong, please correct me.  
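
The reading of the URL described above (application path first, then the
collection name, then the handler) can be illustrated with a small Python
sketch. The function name is hypothetical, and it only assumes that the URL
follows the /solr/<collection>/<handler> layout seen in the post.jar output.

```python
from urllib.parse import urlsplit


def split_solr_url(url):
    """Split a Solr URL into application context, collection and handler."""
    # Drop the query string, then break the path into its segments.
    segments = urlsplit(url).path.strip("/").split("/")
    return {
        "context": segments[0],      # the Solr web application path
        "collection": segments[1],   # the collection (or core) name
        "handler": "/" + "/".join(segments[2:]),
    }
```

For http://localhost:8983/solr/solr/update/extract?commit=true this gives
context "solr", collection "solr" and handler "/update/extract", so the
repeated "solr" is consistent with a collection that happens to be named "solr".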

Thanks again for your reply, 
Bruce

Re: Trouble with post.jar

Posted by Vincenzo D'Amore <v....@gmail.com>.
I see there are two "solr" segments in the URL path; very likely you typed the wrong Solr host parameter.

http://localhost:8983/solr/solr/update/extract?commit=true

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

> On 5 Nov 2020, at 16:27, Bruce Campbell <br...@peopletec.com> wrote:
> 
> http://localhost:8983/solr/solr/update/extract?commit=true