Posted to users@solr.apache.org by jorge hernandez <jo...@carousel.nyc> on 2022/07/21 19:12:38 UTC

RE: Help with new install

Hello,

I just downloaded Solr, copied the config set in _default to my new core,
copied post.jar to the folder with the files I want to index, and created
the new core using the web GUI. Everything seems right, but when I ran:

java -Dauto -Dc=mynewcore -jar post.jar *.html

It keeps saying:

SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
http://localhost:8983/solr/mynescore/update/extract?resource.name=%3cpath_of_the_files>

I’m new to Solr, so I’m pretty sure I missed something. Can anybody
tell me what it was?

Thanks.

Re: Help with new install

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/21/22 13:12, jorge hernandez wrote:
> SimplePostTool: WARNING: IOException while reading response:
> java.io.FileNotFoundException:
> http://localhost:8983/solr/mynescore/update/extract?resource.name=%3cpath_of_the_files>

The problem here is that the _default configset does NOT create the
/update/extract handler, which you need to extract data from document
types like HTML, Word, PDF, etc.

This feature requires loading additional jars, because the feature (also
called SolrCell) is not included in the webapp.  It ships in the download
as a module.

Note that the following document is for Solr 9.0 ... earlier versions 
will be slightly different.

https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
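
One way to confirm whether a core has the handler is to ask Solr's Config
API which request handlers are registered. Java's HttpURLConnection reports
an HTTP 404 as java.io.FileNotFoundException, which would be consistent with
the warning quoted above if /update/extract does not exist. What follows is
a minimal sketch in Python, not an official tool: the function names, the
default base URL, and the assumed response shape (a "config" object holding
a "requestHandler" map keyed by handler path) are assumptions to verify
against your Solr version.

```python
import json
import urllib.request


def handler_paths(config_response):
    """Return the handler paths found in a Config API response (assumed shape)."""
    handlers = config_response.get("config", {}).get("requestHandler", {})
    return set(handlers)


def has_extract_handler(config_response):
    """True if /update/extract is among the registered request handlers."""
    return "/update/extract" in handler_paths(config_response)


def fetch_request_handlers(core, base_url="http://localhost:8983/solr"):
    """Fetch the request handler section of a core's configuration as a dict."""
    url = f"{base_url}/{core}/config/requestHandler"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

With a running Solr, has_extract_handler(fetch_request_handlers("mynewcore"))
should come back False on a core built from an unmodified _default
configset, if the assumptions above hold.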

One final note ... we STRONGLY recommend not using SolrCell in 
production.  Tika can be unstable -- some documents can cause it to 
consume huge amounts of memory, and even crash.  If Tika is running 
inside Solr when that happens, then Solr itself will suffer the 
effects.  Instead, you should run Tika in a separate process with crash 
handling, so that Solr remains operational if there is a problem with 
extraction.
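
As a rough illustration of running extraction outside Solr, the Python
sketch below runs an extractor as a child process with a timeout, so a
crash or runaway document only costs that one child, and then builds a
plain JSON update request for Solr. It is a sketch under assumptions, not a
recommended implementation: the helper names, the local tika-app.jar path,
and the content_txt field are hypothetical, and it assumes the Tika app's
--text option for plain-text output.

```python
import json
import subprocess
import urllib.request


def extract_text(cmd, timeout_s=60):
    """Run an extractor command in a child process.

    Returns the child's stdout, or None if it could not be started,
    exited with a non-zero status, or exceeded the timeout.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout if done.returncode == 0 else None


def tika_command(path, tika_jar="tika-app.jar"):
    # Assumption: a tika-app jar sits at this (hypothetical) location.
    return ["java", "-jar", tika_jar, "--text", path]


def build_update_request(core, docs, base_url="http://localhost:8983/solr"):
    """Build a POST to the core's regular /update handler with JSON documents."""
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller could run extract_text(tika_command("page.html")), skip the file
when the result is None, and otherwise send
build_update_request("mynewcore", [{"id": "page.html", "content_txt": text}])
with urllib.request.urlopen. Solr then only ever receives plain text.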

Thanks,
Shawn


RE: Help with new install

Posted by jorge hernandez <jo...@carousel.nyc>.
I triple-checked it, and the name of the core is correct. What I don't
understand is why it is looking for the files in the core's folder when the
files are somewhere else. The post.jar command was run from the folder with
the files.


-----Original Message-----
From: Eric Pugh <ep...@opensourceconnections.com>
Sent: Thursday, July 21, 2022 3:43 PM
To: users@solr.apache.org
Subject: Re: Help with new install

Looks like your core name is wrong in your command, or at least in what is
coming back in the message...


Re: Help with new install

Posted by Eric Pugh <ep...@opensourceconnections.com>.
Looks like your core name is wrong in your command, or at least in what is coming back in the message...
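
A quick way to compare the two names is to pull the core segment out of the URL in the warning and check it against the value passed with -Dc. This is only a small Python sketch; the function name is made up for illustration, and it assumes the usual /solr/<core>/... path layout.

```python
from urllib.parse import urlsplit


def core_from_solr_url(url):
    """Return the core name from a URL shaped like http://host:port/solr/<core>/..."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "solr":
        raise ValueError(f"not a /solr/<core>/... URL: {url}")
    return parts[1]


reported = core_from_solr_url(
    "http://localhost:8983/solr/mynescore/update/extract"
    "?resource.name=%3cpath_of_the_files>"
)
requested = "mynewcore"  # the value given with -Dc in the command
print(reported, requested, reported == requested)
```

For the URL and command quoted in this thread, the two names differ by one letter.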

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	