Posted to solr-user@lucene.apache.org by Dmitriy Shvadskiy <ds...@gmail.com> on 2013/08/09 21:50:29 UTC

Problem running Solr indexing in Amazon EMR

Hello,
We are trying to use Amazon Elastic MapReduce to build Solr indexes. We
are using embedded Solr in the Reduce phase to create the actual index.
However, we run into the following error and are not sure what is causing it.
The Solr version is 4.4. The job runs fine locally in a Cloudera CDH 4.3 VM.

Thanks,
Dmitriy


2013-08-09 14:52:02,602 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.VerifyError: (class: org/apache/lucene/codecs/lucene40/Lucene40FieldInfosReader, method: read signature: (Lorg/apache/lucene/store/Directory;Ljava/lang/String;Lorg/apache/lucene/store/IOContext;)Lorg/apache/lucene/index/FieldInfos;) Incompatible argument to function
        at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.<init>(Lucene40FieldInfosFormat.java:99)
        at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:49)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
        at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
        at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
        at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
        at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:185)
        at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:121)
        at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:235)
        at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:149)
        at org.finra.ss.solr.SolrIndexingReducer.getEmbeddedSolrServer(SolrIndexingReducer.java:195)
        at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:94)
        at org.finra.ss.solr.SolrIndexingReducer.reduce(SolrIndexingReducer.java:33)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)




--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem running Solr indexing in Amazon EMR

Posted by Michael Della Bitta <mi...@appinions.com>.
If you do end up figuring it out, would you mind letting me know? Right
now, our solution is to use an older version of SolrJ, but that means we
miss out on some of the improvements/bugfixes around aliases.

Thanks,

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Aug 12, 2013 at 7:21 PM, Dmitriy Shvadskiy <ds...@gmail.com> wrote:

> Michael,
> We replaced Lucene jars but run into a problem with incompatible version of
> Apache HttpComponents. Still figuring it out.
>
> Dmitriy

Re: Problem running Solr indexing in Amazon EMR

Posted by Dmitriy Shvadskiy <ds...@gmail.com>.
Michael,
We replaced the Lucene jars but ran into a problem with an incompatible
version of Apache HttpComponents. We are still figuring it out.

Dmitriy 




Re: Problem running Solr indexing in Amazon EMR

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi Dmitriy,

Just out of curiosity, have you tried replacing the Lucene jars with a
bootstrap action?

Michael Della Bitta



On Mon, Aug 12, 2013 at 5:32 PM, Dmitriy Shvadskiy <ds...@gmail.com> wrote:

> Michael,
>
> Amazon Hadoop distribution has Lucene 2.9.4 jars in /lib directory and they
> conflict with Solr 4.4 we are using. Once we pass that problem we run into
> conflict with Apache HttpComponents you describe. I think the best bet would
> be for us to build our own AMI to avoid these dependencies.
>
> Dmitriy

Re: Problem running Solr indexing in Amazon EMR

Posted by Dmitriy Shvadskiy <ds...@gmail.com>.
Michael,

The Amazon Hadoop distribution has Lucene 2.9.4 jars in its /lib directory,
and they conflict with the Solr 4.4 we are using. Once we get past that
problem, we run into the conflict with Apache HttpComponents that you
describe. I think the best bet would be for us to build our own AMI to avoid
these dependencies.

Dmitriy
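
A conflict like the one described here (an old Lucene jar in the distribution's /lib next to the Lucene bundled in the job jar) can be made visible by listing every classpath location that provides the same class. The following is only a sketch under stated assumptions: the class `ClasspathCopies` and its method `copiesOf` are hypothetical names, not part of Solr, Lucene, or Hadoop, and the code would have to run inside the task JVM (for example, from the reducer's setup code) to see the classpath that actually matters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from the thread): lists every classpath location
// that provides a given class, so duplicate copies become visible.
final class ClasspathCopies {

    // Returns the URLs of all copies of the class file visible to the
    // context class loader (or the system class loader if none is set).
    static List<URL> copiesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resource)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // org.apache.lucene.store.Directory appears in the original error message.
        List<URL> copies = copiesOf("org.apache.lucene.store.Directory");
        for (URL url : copies) {
            System.out.println(url);
        }
        if (copies.size() > 1) {
            System.out.println("More than one copy is visible on the classpath.");
        }
    }
}
```

If more than one URL is printed, two jars provide the same class, and which one is used depends on class loader ordering.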




Re: Problem running Solr indexing in Amazon EMR

Posted by Michael Della Bitta <mi...@appinions.com>.
Dmitriy,

I don't believe that Amazon includes Solr or Lucene in its EMR AMIs. But
there was a recent AMI update that broke some things for us. Have you
tried using an older AMI?

One headache for us has been that the EMR AMI uses an older version of
Apache HttpComponents than that of Solr 4.3, and they're not compatible. So
we're having to use an older SolrJ to compensate.

I haven't had problems with conflicting Lucene deps in EMR, however.

Michael Della Bitta
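
For a version mismatch like the HttpComponents one mentioned here, it can help to print the version that the running JVM actually sees. This is a rough sketch, not a tested recipe for EMR: `LibraryVersion` and `describe` are hypothetical names, and it assumes the jar records an Implementation-Version in its manifest, which not every jar does.

```java
// Hypothetical helper (not from the thread): reports the manifest version of
// the jar that supplied a class, as seen by the running JVM.
final class LibraryVersion {

    // Returns the Implementation-Version of the class's package, or a short
    // note when the class is missing or the manifest carries no version.
    static String describe(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> cls = Class.forName(className, false, LibraryVersion.class.getClassLoader());
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return version == null ? "(version not recorded in manifest)" : version;
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Well-known entry points of Apache HttpClient and SolrJ 4.x.
        String[] names = {
            "org.apache.http.client.HttpClient",
            "org.apache.solr.client.solrj.SolrServer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```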



On Sun, Aug 11, 2013 at 2:05 PM, Dmitriy Shvadskiy <ds...@gmail.com> wrote:

> Erick,
>
> Thank you for the reply. Cloudera image includes Solr 4.3. I'm not sure what
> version Amazon EMR includes. We are not directly referencing or using their
> version of Solr but instead build our jar against Solr 4.4 and include all
> dependencies in our jar file.  Also error occurs not while reading existing
> index but simply creating an instance of EmbeddedSolrServer. I think there
> is a conflict between jars that EMR process loads and that our map/reduce
> job requires but I can't figure out what it is.
>
> Dmitriy

Re: Problem running Solr indexing in Amazon EMR

Posted by Dmitriy Shvadskiy <ds...@gmail.com>.
Erick,

There is actually supposed to be just one version of Solr, the one bundled
with our map/reduce jar. To be clear: the Map/Reduce job is generating a new
index, not reading an existing one. But it fails even before that, while the
instance of EmbeddedSolrServer is being created, at the first line of the
following code.

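import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// The VerifyError is thrown from this constructor call (CoreContainer.<init> in the stack trace).
CoreContainer coreContainer = new CoreContainer(solrhomedir);
coreContainer.load();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "collection1");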

Dmitriy




Re: Problem running Solr indexing in Amazon EMR

Posted by Erick Erickson <er...@gmail.com>.
Have you checked the luceneMatchVersion in all your solrconfig.xml
files? I'm guessing it's set to 40 somewhere in the process, as
evidenced by the line:
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.<init>(Lucene40FieldInfosFormat.java:99)
so it looks like somehow a Lucene 4.0 codec is being used to try to read
a more recent format.

You have three different Solr versions you're trying to mix and match, so
getting them all coordinated is "interesting". I'm guessing that
when you instantiate the embedded Solr, you're pointing it at a
pre-existing index, but that's only a guess.

Best
Erick
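
The luceneMatchVersion check suggested here can be done by hand, or with a few lines of code when there are many config files. The sketch below is an illustration only: `MatchVersionCheck` is a hypothetical name, the XML string stands in for the contents of a real solrconfig.xml, and the value LUCENE_44 is an assumed example rather than something quoted in this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the thread): extracts <luceneMatchVersion>
// from the text of a solrconfig.xml file.
final class MatchVersionCheck {

    // Returns the text of the first <luceneMatchVersion> element,
    // or null if the element is absent.
    static String luceneMatchVersion(String solrConfigXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(solrConfigXml)));
            NodeList nodes = doc.getElementsByTagName("luceneMatchVersion");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse solrconfig.xml content", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative content only; the version value is an assumed example.
        String xml = "<config><luceneMatchVersion>LUCENE_44</luceneMatchVersion></config>";
        System.out.println("luceneMatchVersion = " + luceneMatchVersion(xml));
    }
}
```

In practice the string would come from each solrconfig.xml shipped with the job, and the values would be compared against the Solr version the job is built with.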



Re: Problem running Solr indexing in Amazon EMR

Posted by Dmitriy Shvadskiy <ds...@gmail.com>.
Erick,

Thank you for the reply. The Cloudera image includes Solr 4.3. I'm not sure
what version Amazon EMR includes. We are not directly referencing or using
their version of Solr; instead, we build our jar against Solr 4.4 and include
all dependencies in our jar file. Also, the error occurs not while reading an
existing index but simply while creating an instance of EmbeddedSolrServer. I
think there is a conflict between the jars that the EMR process loads and
those that our map/reduce job requires, but I can't figure out what it is.

Dmitriy
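
One way to test a suspected jar conflict like this is to ask the JVM where a class was actually loaded from. What follows is a minimal sketch, not something posted in the thread: the class `WhichJar` and its method `locate` are hypothetical names, and the code assumes it is run inside the same task JVM as the failing reducer so that it sees the same classpath.

```java
import java.security.CodeSource;

// Hypothetical helper (not from the thread): reports which jar or directory
// a class was loaded from, to spot an unexpected copy on the classpath.
final class WhichJar {

    // Returns the code source location of the named class, or a short
    // explanation when the class is missing or has no code source.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(no code source; typically a JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error message and stack trace in this thread.
        String[] names = {
            "org.apache.lucene.codecs.Codec",
            "org.apache.lucene.store.Directory",
            "org.apache.lucene.index.FieldInfos"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the printed locations point at different jars for classes that should all come from the same Lucene release, that would support the conflict theory.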




Re: Problem running Solr indexing in Amazon EMR

Posted by Erick Erickson <er...@gmail.com>.
What version of Solr is Cloudera's CDH built on? Looks to me like
the Solr you're using to read the M/R-produced index is different
from the one used to build it. Or the version specified in the
Solr configs is different, as evidenced by the Lucene40 in the error
message. See <luceneMatchVersion> in solrconfig.xml.

But probably a better question to ask Cloudera...

Erick

