Posted to dev@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2018/09/21 13:46:50 UTC

[CANCEL][VOTE] Release ManifoldCF 2.11, RC1

Canceling due to problems with the Solr connector.
Karl


On Fri, Sep 21, 2018 at 9:35 AM Julien Massiera <julien.massiera@francelabs.com> wrote:

> Hi Karl,
>
> I understand that the piece of code involved is exactly the same as the
> one in the SolrJ API, which is the "reference" way of coding.
>
> Let me explain again the steps of my test:
>
> 1) I configured a job to crawl a Windows share repository containing 3 files
> and ingest them into a Solr 7.4.0 instance.
>
> 2) The job ran and ended with a 'Done' status and the number of
> processed documents was correct.
>
> 3) I checked the number of documents in my Solr instance and noticed
> that it was 0.
>
> 4) I checked the Simple History report in MCF and found the following error
> for each of my 3 documents:
>
> 09-21-2018 11:49:09.362         document ingest (Solr)         file://///localhost/OCR/subfolder/test_file.txt
>         400     61      118749  Error from server at http://localhost:8983/solr/FileShare: missing content stream
>
>
> 5) I then checked the Solr logs and found the following error for
> each document ingestion:
>
> ERROR 2018-09-21T11:51:04,100 (qtp952486988-21) -
> Solr|Solr|solr.handler.RequestHandlerBase|[c:FileShare s:shard1 r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: missing content stream
>      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
>      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>      at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
>      at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>      at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>      at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
>      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>      at org.eclipse.jetty.server.Server.handle(Server.java:531)
>      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>      at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>      at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>      at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>      at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>      at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
>      at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
>      at java.lang.Thread.run(Thread.java:748)
>
> 6) I ran a new crawl to debug the code and found that, after the
> following lines (in
> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient, line 108):
>      SolrParams params = request.getParams();
>      RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
>      Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
>
>      the 'streams' object is null.
>
>      I then checked the value of the contentWriter object and found that
> it was not null. That explains why the conditional expression assigned null
> to 'streams' instead of the result of requestWriter.getContentStreams(request),
> which, when I checked it, correctly returns a ContentStream collection
> containing the input stream of the incoming file.
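A minimal sketch of the symptom in step 6, assuming (as the report suggests but does not confirm) that the code following the ternary serializes only `streams` and never invokes the content writer. `ContentWriter` below is a hypothetical stand-in for SolrJ's `RequestWriter.ContentWriter`, request bodies are plain byte arrays, and `bodyFromStreamsOnly` / `bodyFromEitherSource` are illustrative names, not ManifoldCF or SolrJ methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;

public class ContentWriterPitfall {

    // Hypothetical stand-in for RequestWriter.ContentWriter:
    // an object that serializes the request body itself.
    public interface ContentWriter {
        void write(OutputStream os) throws IOException;
    }

    // Mirrors the ternary from step 6: streams are only fetched when there is
    // no content writer, and (by assumption) only the streams are ever sent.
    public static byte[] bodyFromStreamsOnly(ContentWriter contentWriter, Collection<byte[]> allStreams) {
        Collection<byte[]> streams = contentWriter == null ? allStreams : null;
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        if (streams != null) {
            for (byte[] s : streams) {
                body.write(s, 0, s.length);
            }
        }
        // When contentWriter != null nothing is written: the server receives an empty body.
        return body.toByteArray();
    }

    // Handles both branches: use the content writer when present, else the streams.
    public static byte[] bodyFromEitherSource(ContentWriter contentWriter, Collection<byte[]> allStreams) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try {
            if (contentWriter != null) {
                contentWriter.write(body);
            } else if (allStreams != null) {
                for (byte[] s : allStreams) {
                    body.write(s, 0, s.length);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = "file contents".getBytes(StandardCharsets.UTF_8);
        ContentWriter writer = os -> os.write(file);
        List<byte[]> streams = List.of(file);

        System.out.println(bodyFromStreamsOnly(writer, streams).length);  // 0
        System.out.println(bodyFromEitherSource(writer, streams).length); // 13
    }
}
```

With a non-null writer, the streams-only path sends an empty body, which is consistent with Solr's "missing content stream" response; this thread does not say whether the eventual fix for CONNECTORS-1533 takes the second approach.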
>
>
> In conclusion, I am as confused as you are. Since you used the same piece
> of code as the SolrJ API, should we ask the SolrJ developers for an
> explanation?
>
> Julien
>
> On 21/09/2018 15:04, Karl Wright wrote:
> > Hi Julien,
> >
> > I verified that the integration test in question confirms the following:
> > (a) that the right number of documents were processed, and (b) that no
> > errors were reported during processing.  So unless the failure is indeed
> > a silent one, with documents simply never being transmitted to Solr, that
> > test should be valid.
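To make the "silent failure" gap concrete, here is a hypothetical sketch of the two checks described above, plus a third check (c) that compares against the number of documents actually present in Solr (for example, `numFound` from a `q=*:*` query). `IngestCheck`, `CrawlResult`, its fields, and both methods are illustrative names, not part of the ManifoldCF test harness.

```java
public class IngestCheck {

    // Hypothetical summary of one test crawl. "indexed" would come from a
    // separate query against Solr, not from ManifoldCF's own bookkeeping.
    public static final class CrawlResult {
        final int processed;
        final int errors;
        final long indexed;

        public CrawlResult(int processed, int errors, long indexed) {
            this.processed = processed;
            this.errors = errors;
            this.indexed = indexed;
        }
    }

    // Checks (a) and (b) only, as described for the existing integration test.
    public static boolean passesCountAndErrorChecks(CrawlResult r, int expected) {
        return r.processed == expected && r.errors == 0;
    }

    // Adds (c): the documents must actually be present in the Solr index.
    public static boolean passesWithIndexCheck(CrawlResult r, int expected) {
        return passesCountAndErrorChecks(r, expected) && r.indexed == expected;
    }

    public static void main(String[] args) {
        // A crawl that processes 3 documents, reports no error to the test, and leaves Solr empty.
        CrawlResult silent = new CrawlResult(3, 0, 0);
        System.out.println(passesCountAndErrorChecks(silent, 3)); // true
        System.out.println(passesWithIndexCheck(silent, 3));      // false
    }
}
```

A result like Julien's step 3 (correct processed count, job status 'Done', but 0 documents in Solr) passes (a) and (b) yet fails (c), which is the case Karl describes as a silent failure.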
> >
> > Can you describe the actual failure that you are seeing please?
> >
> > Karl
> >
> >
> > On Fri, Sep 21, 2018 at 8:52 AM Karl Wright <da...@gmail.com> wrote:
> >
> >> Julien,
> >>
> >> Integration tests do cover indexing via SolrJ, and they do succeed.
> >> (That's how I found the deletion bug, FWIW.)  I therefore need more
> >> information about the specific failure symptom you are seeing before I'll
> >> withdraw the candidate.  If it's a silent failure, that's one thing, but if
> >> you are seeing a ManifoldCF exception, then something is different
> >> between your setup and mine.
> >>
> >> Karl
> >>
> >>
> >> On Fri, Sep 21, 2018 at 8:09 AM Julien Massiera <julien.massiera@francelabs.com> wrote:
> >>
> >>> -1, ref: https://issues.apache.org/jira/browse/CONNECTORS-1533
> >>>
> >>> Julien
> >>>
> >>>
> >>> On 20/09/2018 10:38, Karl Wright wrote:
> >>>> All tests pass, artifacts look good.
> >>>>
> >>>> +1 from me.
> >>>>
> >>>> Karl
> >>>>
> >>>>
> >>>> On Wed, Sep 19, 2018 at 9:57 PM Karl Wright <da...@gmail.com> wrote:
> >>>>
> >>>>> Please vote on whether to release ManifoldCF 2.11, RC1.  This release
> >>>>> contains a number of fixes/improvements/additions, described in the
> >>>>> CHANGES.txt file.  In addition, it includes Tika 1.19, which has a number
> >>>>> of fixes for classpath issues specifically requested by ManifoldCF.
> >>>>>
> >>>>> This fixes a SolrJ-related problem with the Solr Connector found in RC1.
> >>>>> All tests pass.
> >>>>>
> >>>>> The release artifact can be found at:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
> >>>>>
> >>>>> There is also a tag at:
> >>>>>
> >>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC1
> >>>>>
> >>>>> Thanks again,
> >>>>> Karl Wright
> >>>>>
> >>>>>
> >>> --
> >>> Julien MASSIERA
> >>> Product Development Director
> >>> France Labs – The Search Experts
> >>> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> >>> www.francelabs.com
> >>>
> >>>
>
>
>