Posted to solr-user@lucene.apache.org by 周建二 <zh...@ict.ac.cn> on 2015/12/19 02:16:16 UTC

Some problems when upload data to index in cloud environment

Hello everyone:


I am building a SolrCloud production environment. My Solr version is 5.3.1, and the environment consists of three nodes running CentOS 6.5. First I built a ZooKeeper ensemble on the three nodes, then started Solr on all three nodes, and finally created a collection consisting of three shards, each with two replicas. After that, we can see the cloud structure on the Solr Admin page.
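
For readers reproducing this layout: a collection with three shards and two replicas per shard can be created through the Collections API. The Java sketch below only assembles the CREATE request URL and sends nothing. Its assumptions are the base URL http://localhost:8987/solr, taken from the log below, and the hypothetical config name cloud-test-conf, which must already be uploaded to ZooKeeper under that name. The sketch also computes maxShardsPerNode, because six cores on three nodes need two cores per node, while Solr 5.x defaults maxShardsPerNode to 1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CreateCollectionUrl {
    // Assembles a Collections API CREATE URL. Nothing is sent; paste the result
    // into a browser or curl. The argument values used in main() are assumptions.
    static String buildCreateUrl(String solrBaseUrl, String collection, String configName,
                                 int numShards, int replicationFactor, int liveNodes) {
        int totalCores = numShards * replicationFactor;
        // Ceiling division: how many cores each live node must be allowed to host.
        int maxShardsPerNode = (totalCores + liveNodes - 1) / liveNodes;
        return solrBaseUrl + "/admin/collections?action=CREATE"
                + "&name=" + enc(collection)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&maxShardsPerNode=" + maxShardsPerNode
                + "&collection.configName=" + enc(configName);
    }

    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // 3 shards x 2 replicas on 3 live nodes, as described in the post.
        System.out.println(buildCreateUrl("http://localhost:8987/solr", "cloud-test",
                "cloud-test-conf", 3, 2, 3));
    }
}
```

Running it prints a URL that ends in &maxShardsPerNode=2&collection.configName=cloud-test-conf.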


However, when I try to add data to the index with this command:
bin/post -c cloud-test example/exampledocs/sample.html -p 8987
the following error occurs:

java -classpath /usr/local/solr-5.3.1/dist/solr-core-5.3.1.jar -Dauto=yes -Dport=8987 -Dc=cloud-test -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/sample.html

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8987/solr/cloud-test/update...
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8987/solr/cloud-test/update/extract?resource.name=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/cloud-test/update/extract. Reason:
<pre>    Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8987/solr/cloud-test/update/extract?resource.name=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html
1 files indexed.
COMMITting Solr index changes to http://localhost:8987/solr/cloud-test/update...
Time spent: 0:00:00.050

Could anyone help me solve this problem? Thanks.
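
The log shows why the request ends in a 404. In auto mode the tool does not post an .html file to [base] itself. It sends the file to [base]/extract instead. The Java sketch below is a simplified reconstruction of that routing, written only to reproduce the URL in the log. It is not SimplePostTool's actual source. The set of extensions it treats as native update formats (xml, json, csv) is an assumption about the tool's auto-mode behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ExtractUrlSketch {
    // Assumed: file types that the plain update handler parses directly.
    private static final List<String> NATIVE_UPDATE_TYPES = Arrays.asList("xml", "json", "csv");

    // Returns the URL a file would be posted to, given the collection's update URL.
    static String postUrlFor(String updateUrl, String absolutePath) {
        String name = absolutePath.substring(absolutePath.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (NATIVE_UPDATE_TYPES.contains(ext)) {
            return updateUrl; // structured data goes straight to the update handler
        }
        // Everything else goes to the extracting handler (Solr Cell / Tika),
        // with the file path reused as both the resource name and the document id.
        String encoded = enc(absolutePath);
        return updateUrl + "/extract?resource.name=" + encoded + "&literal.id=" + encoded;
    }

    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(postUrlFor("http://localhost:8987/solr/cloud-test/update",
                "/usr/local/solr-5.3.1/example/exampledocs/sample.html"));
    }
}
```

The printed URL matches the one in the 404 warning. That means the collection's configuration must define a handler at /update/extract for this post to succeed.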


Re: Re: Re: Some problems when upload data to index in cloud environment

Posted by 周建二 <zh...@ict.ac.cn>.
Erick:


Thank you so much for your advice. We are not indexing a large number of files right now, but we may in the future. I will pay more attention to ExtractingRequestHandler. Thanks again.


Best regards,
Jianer


> -----Original Message-----
> From: "Erick Erickson" <er...@gmail.com>
> Sent: Tuesday, December 22, 2015
> To: solr-user <so...@lucene.apache.org>
> Cc:
> Subject: Re: Re: Some problems when upload data to index in cloud environment


Re: Re: Some problems when upload data to index in cloud environment

Posted by Erick Erickson <er...@gmail.com>.
Jianer:

Getting your head around the configs is, indeed, "exciting" at times.

I just wanted to caution you that using ExtractingRequestHandler
puts the Tika parsing load on the Solr server. That doesn't scale,
because the same machine that's serving queries and indexing
is _also_ parsing potentially very large files. It may not matter
if you don't do it often, but if you're going to index a large number
of files and/or do this continuously, you probably
want to move the parsing off Solr. Here's an example that also pulls
from a database, but the DB bits can be removed easily.

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
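
Here is a minimal sketch of that split, assuming the client sends plain JSON documents, as an array of documents in Solr's JSON update format, to the collection's /update handler. The TextExtractor interface is a hypothetical hook where a real client would run Tika or another parser. A trivial UTF-8 decoder stands in for it here so the example runs with the JDK alone. The field names id and content are assumptions and must match your schema.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ClientSideIndexer {
    // Hypothetical hook: a real client would run Tika (or another parser) here,
    // so the parsing cost lands on the client instead of on a Solr node.
    interface TextExtractor {
        String extract(byte[] rawFile);
    }

    // Builds a JSON update body holding one document. The field names "id" and
    // "content" are assumptions; use the fields your schema defines.
    static String toJsonUpdate(String id, String content) {
        return "[{\"id\":" + quote(id) + ",\"content\":" + quote(content) + "}]";
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // POSTs the body to a plain update URL such as
    // http://localhost:8987/solr/cloud-test/update?commit=true and returns the HTTP status.
    static int post(String updateUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) {
        // Stand-in extractor that only handles plain UTF-8 text files.
        TextExtractor plainText = raw -> new String(raw, StandardCharsets.UTF_8);
        String text = plainText.extract("Hello, Solr".getBytes(StandardCharsets.UTF_8));
        // Prints the body; call post(...) against a running collection to index it.
        System.out.println(toJsonUpdate("sample.txt", text));
    }
}
```

Running main prints [{"id":"sample.txt","content":"Hello, Solr"}]. With this split, Solr only does ordinary indexing through /update, and the expensive extraction step can be spread across client machines.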

Best,
Erick

On Sun, Dec 20, 2015 at 9:29 PM, 周建二 <zh...@ict.ac.cn> wrote:
> Hi Shawn, thanks for your reply. :)

Re: Re: Some problems when upload data to index in cloud environment

Posted by 周建二 <zh...@ict.ac.cn>.
Hi Shawn, thanks for your reply. :)


The cause was that the /update/extract handler is not defined in my collection's solrconfig.xml, because I had uploaded basic_configs/conf to ZooKeeper. When I upload sample_techproducts_configs to ZooKeeper instead, everything works.


I am new to Solr. Now I am going to learn schema.xml and solrconfig.xml, and try to build my own config for my dataset based on basic_configs.


Thanks again.
Jianer


> -----Original Message-----
> From: "Shawn Heisey" <ap...@elyograg.org>
> Sent: Sunday, December 20, 2015
> To: solr-user@lucene.apache.org
> Cc:
> Subject: Re: Some problems when upload data to index in cloud environment


Re: Some problems when upload data to index in cloud environment

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/18/2015 6:16 PM, 周建二 wrote:
> I am building a solr cloud production environment. My solr version is 5.3.1. The environment consists three nodes running CentOS 6.5. First I build the zookeeper environment by the three nodes, and then run solr on the three nodes, and at last build a collection consists of three shards and each shard has two replicas. After that we can see that cloud structure on the Solr Admin page.

<snip>

> <body><h2>HTTP ERROR 404</h2>
> 
> <p>Problem accessing /solr/cloud-test/update/extract. Reason:

One of two problems is likely: either there is no collection named
"cloud-test" in your cloud, or the /update/extract handler is not
defined in that collection's solrconfig.xml file. When you're running
SolrCloud, the active version of this file lives in ZooKeeper.

If you're sure a collection with this name exists, how exactly did you
create it?  Was it built with one of the sample configs or with a config
that you built yourself?

Of the three configsets included with the Solr download,
data_driven_schema_configs and sample_techproducts_configs contain the
/update/extract handler. The configset named basic_configs does NOT
contain the handler.
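
One way to check for the second cause is to inspect the solrconfig.xml that the collection actually uses. The Java sketch below parses a local copy of that file with the JDK's DOM parser and looks for a requestHandler element with a given name. Because it parses the XML rather than searching the text, a handler that appears only inside an XML comment does not count. It assumes you have already fetched the copy that lives in ZooKeeper. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ExtractHandlerCheck {
    // Returns true if the config declares a requestHandler element with the given name.
    static boolean declaresHandler(String solrconfigXml, String handlerName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            // Commented-out handlers are comment nodes, so they never show up here.
            NodeList handlers = doc.getElementsByTagName("requestHandler");
            for (int i = 0; i < handlers.getLength(); i++) {
                Element handler = (Element) handlers.item(i);
                if (handlerName.equals(handler.getAttribute("name"))) {
                    return true;
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse solrconfig.xml", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of the collection's solrconfig.xml.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("/update/extract declared: " + declaresHandler(xml, "/update/extract"));
    }
}
```

If it prints false for the configset you uploaded, such as basic_configs, that explains the 404. Switching to sample_techproducts_configs resolved it, as Jianer reports in this thread.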

Thanks,
Shawn