Posted to solr-user@lucene.apache.org by 周建二 <zh...@ict.ac.cn> on 2015/12/19 02:16:16 UTC
Some problems when upload data to index in cloud environment
Hello everyone:
I am building a SolrCloud production environment. My Solr version is 5.3.1. The environment consists of three nodes running CentOS 6.5. First I set up ZooKeeper on the three nodes, then started Solr on each of them, and finally created a collection with three shards, each with two replicas. After that, the cloud structure is visible on the Solr Admin page.
However, when I try to add data to index using this command:
bin/post -c cloud-test example/exampledocs/sample.html -p 8987
an error occurs:
java -classpath /usr/local/solr-5.3.1/dist/solr-core-5.3.1.jar -Dauto=yes -Dport=8987 -Dc=cloud-test -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/sample.html
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8987/solr/cloud-test/update...
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8987/solr/cloud-test/update/extract?resource.name=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/cloud-test/update/extract. Reason:
<pre> Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8987/solr/cloud-test/update/extract?resource.name=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2Fusr%2Flocal%2Fsolr-5.3.1%2Fexample%2Fexampledocs%2Fsample.html
1 files indexed.
COMMITting Solr index changes to http://localhost:8987/solr/cloud-test/update...
Time spent: 0:00:00.050
Could anyone help me solve this problem? Thanks.
Re: Re: Re: Some problems when upload data to index in cloud environment
Posted by 周建二 <zh...@ict.ac.cn>.
Erick:
Thank you so much for your advice. We do not index a large number of files now, but we may in the future. I will pay more attention to ExtractingRequestHandler. Thanks again.
Best regards,
Jianer
> -----Original Message-----
> From: "Erick Erickson" <er...@gmail.com>
> Sent: Tuesday, December 22, 2015
> To: solr-user <so...@lucene.apache.org>
> Cc:
> Subject: Re: Re: Some problems when upload data to index in cloud environment
>
> Jianer:
>
> Getting your head around the configs is, indeed, "exciting" at times.
>
> I just wanted to caution you that using ExtractingRequestHandler
> puts the Tika parsing load on the Solr server, which doesn't
> scale well because the same machine that's serving queries and indexing
> is _also_ parsing potentially very large files. It may not matter
> if you don't do it often, but if you're going to index a large number
> of files and/or you're going to do this continuously, you probably
> want to move the parsing off Solr. Here's an example with DB
> as well, but the DB bits can be removed easily.
>
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
Re: Re: Some problems when upload data to index in cloud environment
Posted by Erick Erickson <er...@gmail.com>.
Jianer:
Getting your head around the configs is, indeed, "exciting" at times.
I just wanted to caution you that using ExtractingRequestHandler
puts the Tika parsing load on the Solr server, which doesn't
scale well because the same machine that's serving queries and indexing
is _also_ parsing potentially very large files. It may not matter
if you don't do it often, but if you're going to index a large number
of files and/or you're going to do this continuously, you probably
want to move the parsing off Solr. Here's an example with DB
as well, but the DB bits can be removed easily.
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
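To make the "parse outside Solr" idea concrete, here is a minimal sketch in Python. It is not the approach from the linked post, which uses SolrJ; the standard-library HTML parser below is only a stand-in for Tika, and the field name content_txt is a hypothetical one that has to exist in your schema. The host, port and collection name are taken from the original command. The client extracts the text itself and sends Solr an ordinary JSON document on /update, so /update/extract is never involved.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Client-side stand-in for Tika: return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_update_request(base_url, collection, docs):
    """Build (but do not send) a JSON POST to the collection's /update handler."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    html = "<html><head><title>Sample</title></head><body><p>Hello Solr</p></body></html>"
    # "content_txt" is a hypothetical field name; use one defined in your schema.
    doc = {"id": "example/exampledocs/sample.html", "content_txt": extract_text(html)}
    request = build_update_request("http://localhost:8987/solr", "cloud-test", [doc])
    print(request.full_url)
    # urllib.request.urlopen(request) would actually send the document.
```

The point of the split is that extract_text can run on any machine, or on many of them, while the Solr nodes only receive plain fields to index.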
Best,
Erick
On Sun, Dec 20, 2015 at 9:29 PM, 周建二 <zh...@ict.ac.cn> wrote:
> Hi Shawn, thanks for your reply. :)
>
>
> It is because the /update/extract handler was not defined in my collection's solrconfig.xml file, since I had uploaded basic_configs/conf to ZooKeeper. After I uploaded sample_techproducts_configs to ZooKeeper, everything worked.
>
>
> I am new to Solr. Next I am going to learn schema.xml and solrconfig.xml, and try to build my own config for my dataset based on basic_configs.
>
>
> Thanks again.
> Jianer
>
Re: Re: Some problems when upload data to index in cloud environment
Posted by 周建二 <zh...@ict.ac.cn>.
Hi Shawn, thanks for your reply. :)
It is because the /update/extract handler was not defined in my collection's solrconfig.xml file, since I had uploaded basic_configs/conf to ZooKeeper. After I uploaded sample_techproducts_configs to ZooKeeper, everything worked.
I am new to Solr. Next I am going to learn schema.xml and solrconfig.xml, and try to build my own config for my dataset based on basic_configs.
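For anyone repeating this setup, here is a rough sketch of the Collections API CREATE request that ties a collection to a config set already uploaded to ZooKeeper. The config name techproducts_conf is a hypothetical placeholder for whatever name was used at upload time; host, port and collection name come from the original command. The helper also works out maxShardsPerNode, which as far as I know defaults to 1 in Solr 5.x and so has to be raised when three shards with two replicas each are placed on only three nodes.

```python
from urllib.parse import urlencode


def min_shards_per_node(num_shards, replication_factor, num_nodes):
    """Smallest maxShardsPerNode that lets every replica be placed (ceiling division)."""
    total_cores = num_shards * replication_factor
    return -(-total_cores // num_nodes)


def create_collection_url(base_url, name, config_name,
                          num_shards, replication_factor, num_nodes):
    """Build the Collections API CREATE URL; nothing is sent to Solr here."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": min_shards_per_node(
            num_shards, replication_factor, num_nodes
        ),
        # Must match the name the config set was uploaded under in ZooKeeper.
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # "techproducts_conf" is a hypothetical config name used for illustration.
    print(create_collection_url(
        "http://localhost:8987/solr", "cloud-test", "techproducts_conf",
        num_shards=3, replication_factor=2, num_nodes=3,
    ))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, would issue the request against a running node.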
Thanks again.
Jianer
> -----Original Message-----
> From: "Shawn Heisey" <ap...@elyograg.org>
> Sent: Sunday, December 20, 2015
> To: solr-user@lucene.apache.org
> Cc:
> Subject: Re: Some problems when upload data to index in cloud environment
>
> On 12/18/2015 6:16 PM, 周建二 wrote:
> > I am building a SolrCloud production environment. My Solr version is 5.3.1. The environment consists of three nodes running CentOS 6.5. First I set up ZooKeeper on the three nodes, then started Solr on each of them, and finally created a collection with three shards, each with two replicas. After that, the cloud structure is visible on the Solr Admin page.
>
> <snip>
>
> > <body><h2>HTTP ERROR 404</h2>
> >
> > <p>Problem accessing /solr/cloud-test/update/extract. Reason:
>
> One of two problems is likely: Either there is no collection named
> "cloud-test" on your cloud, or the /update/extract handler is not
> defined in that collection's solrconfig.xml file. The active version of
> this file lives in zookeeper when you're running SolrCloud.
>
> If you're sure a collection with this name exists, how exactly did you
> create it? Was it built with one of the sample configs or with a config
> that you built yourself?
>
> Of the three configsets included with the Solr download,
> data_driven_schema_configs and sample_techproducts_configs contain the
> /update/extract handler. The configset named basic_configs does NOT
> contain the handler.
>
> Thanks,
> Shawn
>
Re: Some problems when upload data to index in cloud environment
Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/18/2015 6:16 PM, 周建二 wrote:
> I am building a SolrCloud production environment. My Solr version is 5.3.1. The environment consists of three nodes running CentOS 6.5. First I set up ZooKeeper on the three nodes, then started Solr on each of them, and finally created a collection with three shards, each with two replicas. After that, the cloud structure is visible on the Solr Admin page.
<snip>
> <body><h2>HTTP ERROR 404</h2>
>
> <p>Problem accessing /solr/cloud-test/update/extract. Reason:
One of two problems is likely: Either there is no collection named
"cloud-test" on your cloud, or the /update/extract handler is not
defined in that collection's solrconfig.xml file. The active version of
this file lives in zookeeper when you're running SolrCloud.
If you're sure a collection with this name exists, how exactly did you
create it? Was it built with one of the sample configs or with a config
that you built yourself?
Of the three configsets included with the Solr download,
data_driven_schema_configs and sample_techproducts_configs contain the
/update/extract handler. The configset named basic_configs does NOT
contain the handler.
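One way to check a config before uploading it is to look for the handler definition directly. The following is a minimal sketch in Python, not a Solr tool: the two XML fragments are trimmed, hypothetical stand-ins for real solrconfig.xml files, and the function names are made up for illustration. The shipped configs that define the handler also load the extraction contrib jars through lib directives, which this check does not look at.

```python
import xml.etree.ElementTree as ET


def defined_handlers(solrconfig_xml):
    """Return the names of all <requestHandler> elements in a solrconfig.xml string."""
    root = ET.fromstring(solrconfig_xml.strip())
    return {h.get("name") for h in root.iter("requestHandler") if h.get("name")}


def has_extract_handler(solrconfig_xml):
    """True if the config defines the /update/extract request handler."""
    return "/update/extract" in defined_handlers(solrconfig_xml)


# Trimmed, hypothetical fragments standing in for real configset files.
WITH_EXTRACT = """
<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/update/extract" startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler"/>
</config>
"""

WITHOUT_EXTRACT = """
<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
</config>
"""

if __name__ == "__main__":
    print(has_extract_handler(WITH_EXTRACT))
    print(has_extract_handler(WITHOUT_EXTRACT))
```

To run it against a real config, read the file's contents and pass the string to has_extract_handler.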
Thanks,
Shawn