Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/10/12 10:07:00 UTC

[jira] [Commented] (NUTCH-2650) -addBinaryContent -base64 flags are causing "String length must be a multiple of four" error in IndexingJob

    [ https://issues.apache.org/jira/browse/NUTCH-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647736#comment-16647736 ] 

Sebastian Nagel commented on NUTCH-2650:
----------------------------------------

Hi [~asm123], I tried to reproduce the problem with Solr 6.6.0 (server type "http", not "cloud"), using both Nutch 1.14 and the current master, but without success: indexing binary content ({{-addBinaryContent -base64}}) works. I also successfully indexed one long document (11 MB).
Could you possibly share the document that causes the failure?
Does the log really contain a line break within the binaryContent value after the trailing {{==}}?
{noformat}
'binaryContent'='.......==
' msg=String length must be a multiple of four.
{noformat}
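The Java sketch below illustrates why a line break after the trailing {{==}} would produce this error. It assumes that this line break is the cause, which is only a hypothesis so far. The class and helper names ({{Base64LengthCheck}}, {{hasValidLength}}, {{decodes}}) are hypothetical. The length check only mirrors the Solr error message and is not Solr's own {{Base64}} implementation. Decoding uses the JDK's {{java.util.Base64}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration, not Nutch or Solr code.
public class Base64LengthCheck {

    // Padded Base64 maps every 3 input bytes to 4 output characters,
    // so a well-formed padded string always has a length divisible by 4.
    // This only mirrors the Solr error message; it is not Solr's implementation.
    static boolean hasValidLength(String encoded) {
        return encoded.length() % 4 == 0;
    }

    // Returns true if the JDK's basic (non-MIME) decoder accepts the string.
    static boolean decodes(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "test" (4 bytes) encodes to "dGVzdA==", ending in "==" padding.
        String clean = Base64.getEncoder()
                .encodeToString("test".getBytes(StandardCharsets.US_ASCII));
        // Assumed failure mode: a line break follows the trailing "==".
        String withLineBreak = clean + "\n";
        // trim() drops the trailing line break again.
        String trimmed = withLineBreak.trim();

        System.out.println(clean.length() + " " + hasValidLength(clean) + " " + decodes(clean));
        System.out.println(withLineBreak.length() + " " + hasValidLength(withLineBreak) + " " + decodes(withLineBreak));
        System.out.println(trimmed.length() + " " + hasValidLength(trimmed) + " " + decodes(trimmed));
    }
}
```

Running {{main}} should print {{8 true true}}, {{9 false false}} and {{8 true true}}. The clean 8-character value is valid. The extra line break makes it 9 characters long, which breaks the multiple-of-four rule. Trimming restores a valid value.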

> -addBinaryContent -base64 flags are causing "String length must be a multiple of four" error in IndexingJob
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2650
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2650
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.14
>            Reporter: asmita
>            Priority: Major
>
> I am running the Nutch crawl command as follows, in distributed mode:
> {code:java}
> runtime/deploy/bin/crawl -i -D solr.server.url=http://my-solr:8983/solr/my-collection -D solr.server.type=cloud -D solr.zookeeper.url=http://my-solr:9983  -s /user/my-user/seed  /user/my-user/crawl 1
> {code}
> The IndexingJob fails with the following error:
>  
> {code:java}
> org.apache.solr.common.SolrException: ERROR: [doc=3b9a9fb7fd92d32287da1b2f3df5f8a1] Error adding field 'binaryContent'='.......==
> ' msg=String length must be a multiple of four.
> 	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:208)
> 	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:101)
> 	at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:963)
> 	at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:954)
> 	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:334)
> 	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:271)
> 	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
> 	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950)
> 	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1163)
> 	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
> 	at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 	at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
> 	at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
> 	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
> 	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
> 	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
> 	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
> 	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
> 	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
> 	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
> 	at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
> 	at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> 	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> 	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> 	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
> 	at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
> 	at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
> 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> 	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:530)
> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
> 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
> 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
> 	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
> 	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: String length must be a multiple of four.
> 	at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:102)
> 	at org.apache.solr.schema.BinaryField.createField(BinaryField.java:101)
> 	at org.apache.solr.schema.FieldType.createFields(FieldType.java:317)
> 	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:66)
> 	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:159)
> 	... 84 more
> {code}
> Solr version: 7.3.1
> Indexing command from logs:
> {code:java}
> runtime/deploy/bin/nutch index -Dsolr.zookeeper.url=http://my-solr:9983 -Dsolr.server.type=cloud -Dsolr.server.url=http://my-solr:8983/solr/my-collection /user/my-user/crawl/crawldb -linkdb /user/my-user/crawl/linkdb /user/my-user/crawl/segments/20181011040457 -addBinaryContent -base64
> {code}
>  
> (removed huge binary content from the log)