Posted to solr-user@lucene.apache.org by Nirmal <ni...@yahoo.com> on 2014/09/30 00:53:08 UTC

Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Tom Chen <tomchen1000 <at> gmail.com> writes:

> 
> Hi,
> 
> In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr
> instances. The request has an indexDir parameter with an HDFS path to the
> index generated on HDFS, as shown in the MRIT log:
> 
> 2014-07-02 15:03:55,123 DEBUG
> org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET
> /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2
> HTTP/1.1
> 
> So it's up to the Solr instance to know how to read an index from HDFS
> (rather than up to the MRIT to find the local disk to copy to from HDFS).
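
[Editor's note: decoded, the logged request above is an ordinary CoreAdmin MERGEINDEXES call. A minimal sketch of building such a URL; the host name and index path below are placeholders, not the poster's actual cluster:]

```python
from urllib.parse import urlencode

solr_base = "http://localhost:8983/solr"  # placeholder Solr instance
params = {
    "action": "MERGEINDEXES",
    "core": "collection1",
    # indexDir points at the MRIT output on HDFS; the receiving core must
    # be able to read hdfs:// paths for the merge to succeed.
    "indexDir": "hdfs://namenode:9000/outdir/results/part-00000/data/index",
    "wt": "json",
}
url = solr_base + "/admin/cores?" + urlencode(params)
print(url)
```

Note how urlencode percent-escapes the hdfs:// URI, producing the indexDir=hdfs%3A%2F%2F... form seen in the MRIT debug log.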
> 
> The go-live option is very convenient for merging the generated index into
> the live index. It's preferable to use go-live rather than copying indexes
> to the local file system and then merging.
> 
> I tried starting the Solr instance with these properties, to let it write
> to the local file system while still being able to read an index on HDFS
> when doing MERGEINDEXES:
> 
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
> 
> i.e. the full command:
> java -DnumShards=2 \
>   -Dbootstrap_confdir=./solr/collection1/conf \
>   -Dcollection.configName=myconf \
>   -DzkHost=<zookeeper>:2181 \
>   -Dhost=<node1> \
>   -DSTOP.PORT=7983 -DSTOP.KEY=key \
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
>   -jar start.jar
> 
> With that, go-live works fine.
> 
> Any comment on this approach?
> 
> Tom
> 
> On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerickson <at> gmail.com>
> wrote:
> 
> > How would the MapReduceIndexerTool (MRIT for short)
> > find the local disk to write from HDFS to for each shard?
> > All it has is the information in the Solr configs, which are
> > usually relative paths on the local Solr machines, relative
> > to SOLR_HOME. Which could be different on each node
> > (that would be screwy, but possible).
> >
> > Permissions would also be a royal pain to get right....
> >
> > You _can_ forego the --go-live option and copy from
> > the HDFS nodes to your local drive and then execute
> > the "mergeIndexes" command, see:
> > https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
> > Note that there is the MergeIndexTool, but there's also
> > the Core Admin command.
> >
> > The sub-indexes are in a partition in HDFS and numbered
> > sequentially.
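> >

[Editor's note: the manual alternative Erick describes (copy each sequentially numbered sub-index out of HDFS, then issue a Core Admin MERGEINDEXES for it) can be sketched as follows. The output directory, staging path, and core names are assumptions for illustration, not values from this thread:]

```python
def merge_plan(hdfs_outdir, staging_dir, num_shards):
    """Pair each sequentially numbered MRIT partition with a local
    staging path and a target core (core names are hypothetical)."""
    steps = []
    for i in range(num_shards):
        part = "part-%05d" % i
        steps.append({
            "hdfs_src": "%s/%s/data/index" % (hdfs_outdir, part),
            "local_dst": "%s/%s" % (staging_dir, part),
            "core": "collection1_shard%d_replica1" % (i + 1),
        })
    return steps

plan = merge_plan("hdfs://namenode:9000/outdir/results", "/tmp/staging", 2)
for step in plan:
    # For each step one would run, e.g.:
    #   hdfs dfs -copyToLocal <hdfs_src> <local_dst>
    # and then call
    #   /solr/admin/cores?action=MERGEINDEXES&core=<core>&indexDir=<local_dst>
    print(step["core"], step["hdfs_src"], "->", step["local_dst"])
```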
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1000 <at> gmail.com> wrote:
> > > Hi,
> > >
> > >
> > > When we run the Solr Map Reduce Indexer Tool (
> > > https://github.com/markrmiller/solr-map-reduce-example), it generates
> > > indexes on HDFS.
> > >
> > > The last stage is Go Live, which merges the generated index into the live
> > > SolrCloud index.
> > >
> > > If the live SolrCloud writes its index to the local file system (rather
> > > than HDFS), Go Live gives an error like this:
> > >
> > > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
> > > hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
> > > into http://bdvs087.test.com:8983/solr
> > > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error
> > > sending live merge command
> > > java.util.concurrent.ExecutionException:
> > > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> > > does not exist
> > > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:94)
> > > at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
> > > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
> > > at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
> > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> > > at java.lang.reflect.Method.invoke(Method.java:611)
> > > at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
> > > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > at java.security.AccessController.doPrivileged(AccessController.java:310)
> > > at javax.security.auth.Subject.doAs(Subject.java:573)
> > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> > > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > Caused by:
> > > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > > directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> > > does not exist
> > > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> > > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> > > at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
> > > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
> > > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> > > at java.lang.Thread.run(Thread.java:738)
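
[Editor's note: the mangled directory in that error ('/opt/testdir/solr/node/hdfs:/...') is itself telling. A core backed by a local directory factory does not recognize the hdfs:// URL as an absolute location, so it resolves it relative to its own directory. A rough illustration of that failure mode; the paths are illustrative and this is not Solr's actual resolution code:]

```python
import posixpath

# Illustrative only: joining a non-local "path" onto a core directory
# reproduces the shape of the nonsensical path in the error above.
core_dir = "/opt/testdir/solr/node"
index_dir = "hdfs://bdvs086.test.com:9000/outdir/results/part-00001/data/index"

# index_dir does not start with '/', so it is treated as relative;
# normpath then collapses 'hdfs://' to 'hdfs:/' (a single slash).
resolved = posixpath.normpath(posixpath.join(core_dir, index_dir))
print(resolved)
```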
> > >
> > > Is there any way to set up SolrCloud to write its index to the local file
> > > system, while allowing the Solr MapReduceIndexerTool's GoLive to merge an
> > > index generated on HDFS into the SolrCloud?
> > >
> > > Thanks,
> > > Tom
> >
> 


Tom, 

Is your SolrCloud instance running on a remote host? Live merge with a local 
SolrCloud instance works fine with the default arguments. It doesn't work for 
a remote host, because the remote Solr cannot access the index in HDFS (remote 
to SolrCloud) when merging with the GoLive option.

This is the exception I get:

java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/hduser/east-02/solr-4.8.1/node1/hdfs:/127.0.0.1:8020/outdir/results/part-00000/data/index'
does not exist

Has anyone tried live merge or the GoLive option on a remote SolrCloud 
instance? Your response is highly appreciated.

Thanks,
Nirmal