Posted to solr-user@lucene.apache.org by Tom Chen <to...@gmail.com> on 2014/07/03 00:23:29 UTC

Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Hi,


When we run the Solr MapReduce Indexer Tool
(https://github.com/markrmiller/solr-map-reduce-example), it generates
indexes on HDFS.

The last stage is Go Live, which merges the generated index into the live
SolrCloud index.

If the live SolrCloud writes its index to the local file system (rather
than HDFS), Go Live fails with an error like this:

2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000 into http://bdvs087.test.com:8983/solr
2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index' does not exist
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
    at java.util.concurrent.FutureTask.get(FutureTask.java:94)
    at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
    at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
    at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(AccessController.java:310)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index' does not exist
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
    at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
    at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
    at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
    at java.util.concurrent.FutureTask.run(FutureTask.java:149)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
    at java.util.concurrent.FutureTask.run(FutureTask.java:149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
    at java.lang.Thread.run(Thread.java:738)

Is there any way to set up SolrCloud to write its index to the local file
system, while still allowing the MapReduceIndexerTool's Go Live stage to
merge the index generated on HDFS into the SolrCloud index?

Thanks,
Tom

Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, interesting; I actually hadn't thought of doing it that way.
I don't know the internals well enough to comment on it,
but I do know someone who does. I'll check with them....

Erick


Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Posted by Erick Erickson <er...@gmail.com>.
Ok, I asked some folks who know, and the response is that "that should
work, but it's not supported/tested". IOW, you're in somewhat
uncharted territory. The people who wrote the code don't have this
use case on their priority list and probably won't be expending energy
in this direction any time soon.

So feel free! It'd be great if you reported, or supplied patches for, any
problems you run across; this has been a recurring theme with
HdfsDirectoryFactory and Solr replicas: "Why should three replicas
have nine copies of the index lying around?"

Do note, however, that disk space is cheap, and considerable work has
been done to minimize any performance issues with HDFS.

Best,
Erick


Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Posted by Nirmal <ni...@yahoo.com>.

Tom,

Is your SolrCloud instance running on a remote host? Live merge with a
local SolrCloud instance works fine with the default arguments. It doesn't
work for a remote host, because the remote Solr cannot access the index in
HDFS (remote to SolrCloud) while merging with the go-live option.

This is the exception I get:

java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/hduser/east-02/solr-4.8.1/node1/hdfs:/127.0.0.1:8020/outdir/results/part-00000/data/index'
does not exist

Has anyone tried live merge or the go-live option against a remote
SolrCloud instance? Your response is highly appreciated.

Thanks,
Nirmal




Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Posted by Tom Chen <to...@gmail.com>.
Hi,

In the GoLive stage, MRIT sends MERGEINDEXES requests to the Solr
instances. Each request has an indexDir parameter containing an HDFS path
to the index generated on HDFS, as shown in the MRIT log:

2014-07-02 15:03:55,123 DEBUG org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2 HTTP/1.1

So it's up to the Solr instance to know how to read an index from HDFS
(rather than for MRIT to find the local disk to write to from HDFS).
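For anyone reproducing this by hand, the same MERGEINDEXES call can be issued directly. The sketch below just rebuilds the URL the way it appears in the log above; the host, core name, and HDFS path are placeholders, not the real cluster:

```shell
#!/bin/sh
# Placeholders -- substitute your own Solr URL, core, and HDFS index path.
SOLR_URL="http://localhost:8983/solr"
CORE="collection1"
INDEX_DIR="hdfs://namenode:9000/outdir/results/part-00000/data/index"

# URL-encode the indexDir value the way MRIT's log shows it (':' and '/' escaped).
ENCODED=$(printf '%s' "$INDEX_DIR" | sed -e 's|:|%3A|g' -e 's|/|%2F|g')

MERGE_URL="${SOLR_URL}/admin/cores?action=MERGEINDEXES&core=${CORE}&indexDir=${ENCODED}"
echo "$MERGE_URL"
# To actually send it:  curl "$MERGE_URL"
```

Whether the request succeeds then depends entirely on the receiving Solr core being able to resolve that hdfs:// path, which is the crux of this thread.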

The go-live option is very convenient for merging the generated index into
the live index. It's preferable to use go-live rather than copying indexes
to the local file system and then merging.

I tried starting the Solr instance with these properties, to allow it to
write to the local file system while still being able to read an index on
HDFS when doing MERGEINDEXES:

  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
  -Dsolr.lock.type=hdfs \
  -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \

i.e. the full command:
java -DnumShards=2 \
  -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf \
  -DzkHost=<zookeeper>:2181 \
  -Dhost=<node1> \
  -DSTOP.PORT=7983 -DSTOP.KEY=key \
  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
  -Dsolr.lock.type=hdfs \
  -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
  -jar start.jar


With that, go-live works fine.
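For anyone who prefers configuring this in solrconfig.xml rather than -D flags, the same settings should map to something like the following. This is an untested sketch mirroring the properties above; the paths are examples, and the unusual part is solr.hdfs.home pointing at a file:// URI:

```xml
<!-- Sketch only: mirrors the -D properties from the command above. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Local file:// home, so the live index is written to local disk... -->
  <str name="solr.hdfs.home">file:///opt/test/solr/node/solr</str>
  <!-- ...while the Hadoop conf dir lets MERGEINDEXES resolve hdfs:// paths. -->
  <str name="solr.hdfs.confdir">/opt/hadoop/hadoop-conf</str>
</directoryFactory>

<indexConfig>
  <lockType>hdfs</lockType>
</indexConfig>
```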

Any comment on this approach?



Tom


Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

Posted by Erick Erickson <er...@gmail.com>.
How would the MapReduceIndexerTool (MRIT for short)
find the local disk on each shard to copy the index to
from HDFS? All it has is the information in the Solr
configs, and the data directories there are usually paths
relative to SOLR_HOME on each local Solr machine.
SOLR_HOME could even differ from node to node
(that would be screwy, but possible).

Permissions would also be a royal pain to get right....

You _can_ forgo the --go-live option, copy the indexes
from the HDFS nodes to your local drives yourself, and
then execute the "mergeindexes" command, see:
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
Note that there is the standalone IndexMergeTool, but
there's also the Core Admin "mergeindexes" command.

The sub-indexes sit under the MRIT output directory in
HDFS, one per shard, numbered sequentially
(part-00000, part-00001, ...).
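As a rough sketch of that manual route (every path, URL, and core name below is a hypothetical placeholder, not taken from this thread):

```shell
#!/bin/sh
# Sketch only: substitute your MRIT --output-dir, a local staging
# directory, your Solr base URL, and the target core name.
OUTPUT_DIR="hdfs://namenode:9000/tmp/mrit-output/results"
LOCAL_DIR="/tmp/mrit-merge"
SOLR_URL="http://localhost:8983/solr"
CORE="collection1_shard1_replica1"

mkdir -p "$LOCAL_DIR"

# 1. Copy a numbered sub-index (part-00000, part-00001, ...) out of
#    HDFS; repeat for each shard's part directory.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -copyToLocal "$OUTPUT_DIR/part-00000/data/index" "$LOCAL_DIR/part-00000" \
    || echo "copy failed -- check the HDFS path"
fi

# 2. Merge the copied index into the live core via the Core Admin API.
if command -v curl >/dev/null 2>&1; then
  curl -sS --max-time 10 \
    "$SOLR_URL/admin/cores?action=mergeindexes&core=$CORE&indexDir=$LOCAL_DIR/part-00000" \
    || echo "merge request failed -- is Solr running at $SOLR_URL?"
fi
```

After the merge, issue a commit on the collection so the merged segments become visible to searchers.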

Best,
Erick

On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <to...@gmail.com> wrote:
> Hi,
>
>
> When we run the Solr MapReduceIndexerTool (
> https://github.com/markrmiller/solr-map-reduce-example), it generates
> indexes on HDFS.
>
> The last stage is Go Live, which merges the generated index into the live
> SolrCloud index.
>
> If the live SolrCloud writes its index to the local file system (rather than
> HDFS), Go Live fails with an error like this:
>
> 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
> hdfs://
> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
> into http://bdvs087.test.com:8983/solr
> 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending
> live merge command
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> directory '/opt/testdir/solr/node/hdfs:/
> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> does not exist
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
> at java.util.concurrent.FutureTask.get(FutureTask.java:94)
> at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
> at
> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
> at
> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> at java.lang.reflect.Method.invoke(Method.java:611)
> at
> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> directory '/opt/testdir/solr/node/hdfs:/
> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
> does not exist
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
> at
> org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
> at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
> at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
> at java.util.concurrent.FutureTask.run(FutureTask.java:149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> at java.lang.Thread.run(Thread.java:738)
>
> Is there any way to set up SolrCloud to write its index to the local file
> system, while still allowing the Solr MapReduceIndexerTool's Go Live to
> merge the index generated on HDFS into SolrCloud?
>
> Thanks,
> Tom