Posted to solr-user@lucene.apache.org by Nick Dimiduk <nd...@gmail.com> on 2009/07/13 22:30:19 UTC

Sharded Index Creation Magic?

Hello!

I'm working with Solr 1.3.0, using a sharded index for distributed,
aggregated search. I've successfully run through the example described on
the DistributedSearch wiki page. I have built an index from a corpus of some
50 million documents in an HBase table and created 7 shards using
org.apache.hadoop.hbase.mapred.BuildTableIndex. I can deploy any one of
these shards to a single Solr instance and happily search the index after
tweaking the schema appropriately. However, when I search across all
deployed shards using the &shards= query parameter (
http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm),
I get a NullPointerException:

java.lang.NullPointerException
	at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)

Debugging into the QueryComponent.mergeIds() method reveals the instance
sreq.responses (line 356) contains one response for each shard specified,
each with the number of results received by the independent queries. The
problems begin down at line 370 because the SolrDocument instance has only a
score field -- which proves problematic in the following line where the id
is requested. The SolrDocument, only containing a score, lacks the
designated ID field (from my schema) and thus the document cannot be added
to the results queue.
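
The failure mode described above can be illustrated with a small Python
sketch. This is not Solr's mergeIds() code, only a simplified stand-in under
stated assumptions: the function name merge_shard_responses, the constant
UNIQUE_KEY, and the field name "id" are all hypothetical, and keeping the
first copy of a duplicate id is a choice made for this sketch. It merges
per-shard result lists keyed on the unique key, and fails as soon as a shard
returns a document that carries only a score.

```python
UNIQUE_KEY = "id"  # assumed uniqueKey field name (hypothetical)


def merge_shard_responses(shard_responses, rows=10):
    """Merge per-shard result lists into one list ranked by score.

    shard_responses maps a shard name to a list of dicts, each expected
    to hold the unique key and a score.
    """
    merged = {}
    for shard, docs in shard_responses.items():
        for doc in docs:
            doc_id = doc.get(UNIQUE_KEY)
            if doc_id is None:
                # A document with only a score cannot be merged: there is
                # no key to identify it with across shards.
                raise ValueError(
                    f"document from {shard} has no '{UNIQUE_KEY}' field: {doc}"
                )
            if doc_id not in merged:
                # Sketch choice: the first copy of a duplicate id wins.
                merged[doc_id] = {
                    UNIQUE_KEY: doc_id,
                    "score": doc["score"],
                    "shard": shard,
                }
    ranked = sorted(merged.values(), key=lambda d: d["score"], reverse=True)
    return ranked[:rows]
```

Passing a shard response such as [{"score": 1.5}] raises the ValueError,
which roughly corresponds to the point where the real merge hits the
NullPointerException.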

Because the example on the wiki works by loading the documents directly into
Solr for indexing, I have come to the conclusion that there is some extra
magic happening in this index generation process which my process lacks.

Thanks for the help!

Re: Sharded Index Creation Magic?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Jul 14, 2009 at 10:30 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> I do, but you raise an interesting point. I had named the field
> incorrectly.
> I'm a little puzzled as to why individual search worked with the broken
> field name, but now all is well!
>
>
A single Solr instance uses the uniqueKey only for replacing documents during
indexing. During a search, the uniqueKey is used only for associating certain
pieces of information with documents, e.g. highlighting info is written in
the response per uniqueKey. Solr will complain only if you don't specify a
uniqueKey during indexing.

If you forgot to include uniqueKeys in some documents, changed the schema to
add a uniqueKey, and then didn't reindex the whole bunch, there will be some
documents in the index without a value in the uniqueKey field. In such a
case, distributed search will blow up because it expects all
documents to have a value for the uniqueKey field. These values are used to
merge responses from the shards.
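
One way to look for such documents is to ask each shard, on its own, how many
documents lack a value in the uniqueKey field. The Python sketch below only
builds the query URLs. It assumes the uniqueKey field is named "id" and that
each shard exposes the standard /select handler; the helper name
missing_key_query_url is hypothetical. The query "*:* -id:[* TO *]" matches
every document and then excludes those that have any value in id.

```python
from urllib.parse import urlencode


def missing_key_query_url(shard, unique_key="id"):
    """Build a URL that counts documents on one shard lacking the unique key.

    shard is a host:port/path string such as "host00:8080/solr".
    rows=0 asks for the hit count only, not the documents themselves.
    """
    params = {"q": f"*:* -{unique_key}:[* TO *]", "rows": 0}
    return f"http://{shard}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Shard addresses taken from the query earlier in this thread.
    for shard in ["host00:8080/solr", "host01:8080/solr"]:
        print(missing_key_query_url(shard))
```

A non-zero numFound in the response from any shard would suggest that the
shard holds documents the distributed merge cannot handle.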

-- 
Regards,
Shalin Shekhar Mangar.

Re: Sharded Index Creation Magic?

Posted by Nick Dimiduk <nd...@gmail.com>.

I do, but you raise an interesting point. I had named the field incorrectly.
I'm a little puzzled as to why individual search worked with the broken
field name, but now all is well!

On Tue, Jul 14, 2009 at 12:03 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > However, when I search across all
> > deployed shards using the &shards= query parameter (
> > http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm),
> > I get a NullPointerException:
> >
> > java.lang.NullPointerException
> >        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
> >        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
> >        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >
> > Debugging into the QueryComponent.mergeIds() method reveals the instance
> > sreq.responses (line 356) contains one response for each shard specified,
> > each with the number of results received by the independent queries. The
> > problems begin down at line 370 because the SolrDocument instance has only a
> > score field -- which proves problematic in the following line where the id
> > is requested. The SolrDocument, only containing a score, lacks the
> > designated ID field (from my schema) and thus the document cannot be added
> > to the results queue.
> >
> > Because the example on the wiki works by loading the documents directly into
> > Solr for indexing, I have come to the conclusion that there is some extra
> > magic happening in this index generation process which my process lacks.
> >
>
>
> Do you have a uniqueKey defined in your schema.xml?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Sharded Index Creation Magic?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> However, when I search across all
> deployed shards using the &shards= query parameter (
> http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm),
> I get a NullPointerException:
>
> java.lang.NullPointerException
>        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
>        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>
> Debugging into the QueryComponent.mergeIds() method reveals the instance
> sreq.responses (line 356) contains one response for each shard specified,
> each with the number of results received by the independent queries. The
> problems begin down at line 370 because the SolrDocument instance has only a
> score field -- which proves problematic in the following line where the id
> is requested. The SolrDocument, only containing a score, lacks the
> designated ID field (from my schema) and thus the document cannot be added
> to the results queue.
>
> Because the example on the wiki works by loading the documents directly into
> Solr for indexing, I have come to the conclusion that there is some extra
> magic happening in this index generation process which my process lacks.
>


Do you have a uniqueKey defined in your schema.xml?

-- 
Regards,
Shalin Shekhar Mangar.