Posted to solr-user@lucene.apache.org by Peter Markey <su...@gmail.com> on 2012/04/21 01:42:09 UTC

null pointer error with solr deduplication

Hello,

I have been trying out deduplication in Solr by following
http://wiki.apache.org/solr/Deduplication. I have defined a signature field
to hold the signature computed from a few other fields in a document, and
this works like a charm on a single Solr instance. But when I have multiple
cores and try a distributed search (
http://localhost:8080/solr/core0/select?q=*&shards=localhost:8080/solr/dedupe,localhost:8080/solr/dedupe2&facet=true&facet.field=doc_id)
I get the error pasted below. A normal search (with just q) works fine;
the facet/stats queries seem to be the culprit. The doc_id field contains
duplicate ids because I'm testing with the same set of documents indexed in
both cores (dedupe, dedupe2). Any insights would be highly appreciated.

Thanks



20-Apr-2012 11:39:35 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:887)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:633)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
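
For anyone trying to reproduce this, the distributed request from the post
can be assembled programmatically. The following is only a minimal Python
sketch: the helper name build_distributed_query is made up for illustration,
and the host, core names and facet field are simply the ones quoted in the
post. Note that urlencode percent-encodes characters such as ":" and "/" in
the shards value, which a servlet container would normally decode again.

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query="*", facet_field=None):
    """Build a select URL that fans a query out to the given shards.

    Hypothetical helper for illustration; it only assembles the URL and
    does not contact any server.
    """
    params = [("q", query), ("shards", ",".join(shards))]
    if facet_field is not None:
        # Faceting is the part that triggered the NPE in the report above.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return base_url + "?" + urlencode(params)


url = build_distributed_query(
    "http://localhost:8080/solr/core0/select",
    ["localhost:8080/solr/dedupe", "localhost:8080/solr/dedupe2"],
    facet_field="doc_id",
)
print(url)
```

Dropping the facet_field argument gives the plain distributed query, which
makes it easy to compare the failing and the working request side by side.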

Re: null pointer error with solr deduplication

Posted by Peter Markey <su...@gmail.com>.
Thanks for the response. Yes, I agree that I have to ensure the doc ids are
unique, but our requirement is such that we need to send the documents to
Solr. I know that Solr discards duplicate documents, and that it does not
work well when we create the unique id manually. I just wanted to report
the error, since in this scenario (I guess the deduplication components are
pretty new) it would probably help the devs make the behavior towards
duplicate documents more deterministic.

On Sat, Apr 21, 2012 at 2:21 AM, Alexander Aristov <
alexander.aristov@gmail.com> wrote:

> Hi
>
> I might be wrong, but it's your responsibility to keep doc IDs unique
> across shards.
>
> Read this page:
>
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
>
> particularly
>
>   - Documents must have a unique key and the unique key must be stored
>     (stored="true" in schema.xml)
>   - *The unique key field must be unique across all shards.* If docs with
>     duplicate unique keys are encountered, Solr will make an attempt to
>     return valid results, but the behavior may be non-deterministic.
>
> So Solr behaves as it should :) _unexpectedly_
>
> But I agree in the sense that there should be no error, especially not an
> NPE.
>
> Best Regards
> Alexander Aristov

Re: null pointer error with solr deduplication

Posted by Mark Miller <ma...@gmail.com>.
A better error would be nice.

In the past, when I have had docs with the same id on multiple shards, I
never saw an NPE. A lot has changed since then, though. To me, checking
whether the id field is stored stands out a bit more. Going roughly by the
stack trace, it looks like it's not finding an id value, and that is
causing the NPE.

If it's a legit problem, we should probably make a JIRA issue about
improving the error message you end up getting.

-- 
- Mark

http://www.lucidimagination.com
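
To illustrate the kind of clearer error suggested above, here is a minimal
Python sketch of merging per-shard results by unique key with an explicit
check for a missing id. It is not Solr's mergeIds code; the function name,
the exception class, the response layout and the "id" field name are all
assumptions made for the example.

```python
class MissingUniqueKeyError(ValueError):
    """Raised when a shard returns a document without the unique key."""


def merge_shard_docs(responses, unique_key="id"):
    """Merge per-shard document lists, keeping the first doc seen per id.

    `responses` maps a shard name to a list of dict documents
    (a hypothetical layout chosen for this sketch).
    """
    merged = {}
    for shard, docs in responses.items():
        for doc in docs:
            doc_id = doc.get(unique_key)
            if doc_id is None:
                # Fail with a descriptive message instead of letting a
                # missing value surface later as an opaque null error.
                raise MissingUniqueKeyError(
                    f"shard {shard!r} returned a document without "
                    f"{unique_key!r}; is the field stored?"
                )
            # Duplicate ids across shards: the first shard seen wins.
            merged.setdefault(doc_id, (shard, doc))
    return merged
```

With this shape, a document that lacks the key produces a message naming
the shard and the field, which is the sort of hint that would have pointed
at a non-stored id field straight away.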


Re: null pointer error with solr deduplication

Posted by Alexander Aristov <al...@gmail.com>.
Hi

I might be wrong, but it's your responsibility to keep doc IDs unique
across shards.

Read this page:
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

particularly

   - Documents must have a unique key and the unique key must be stored
     (stored="true" in schema.xml)
   - *The unique key field must be unique across all shards.* If docs with
     duplicate unique keys are encountered, Solr will make an attempt to
     return valid results, but the behavior may be non-deterministic.

So Solr behaves as it should :) _unexpectedly_

But I agree in the sense that there should be no error, especially not an
NPE.

Best Regards
Alexander Aristov
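
The uniqueness requirement quoted above can be checked before running a
distributed query. The following is a minimal Python sketch, assuming the
unique-key values of each core have already been exported into plain lists;
the function name and the sample ids are hypothetical, and only the core
names dedupe and dedupe2 come from the original post.

```python
from collections import defaultdict


def find_cross_shard_duplicates(ids_by_shard):
    """Return {doc_id: sorted shard names} for ids found on several shards."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    # Keep only ids that appear on more than one distinct shard.
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


# Sample ids are made up; in the reported setup both cores held the same
# documents, so every id would show up as a duplicate.
ids_by_shard = {
    "dedupe": ["a1", "a2", "a3"],
    "dedupe2": ["a2", "a3", "b7"],
}
print(find_cross_shard_duplicates(ids_by_shard))
```

An empty result means the exported ids satisfy the limitation; any entry
lists the shards that share a key and would need to be cleaned up.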

