Posted to solr-user@lucene.apache.org by Sandy Ding <sa...@gmail.com> on 2015/01/04 04:48:30 UTC

Re: solr export get wrong results

Thanks a lot for your help, Joel.
Just wondering, why does "export" have such limitations? It uses the same
query handler as "select", doesn't it?

2014-12-31 10:28 GMT+08:00 Joel Bernstein <jo...@gmail.com>:

> For the initial release only JSON output format is supported with the
> /export feature. Also there is no built-in distributed support yet. Both of
> these features are likely to follow in future releases.
>
> For the initial release you'll need a client that can handle the JSON
> format and distributed logic. The Heliosearch project includes a client
> called CloudSolrStream that you can use for this purpose. Here are two
> links to get started with CloudSolrStream:
>
>
> https://github.com/Heliosearch/heliosearch/blob/helio_4_10/solr/solrj/src/java/org/apache/solr/client/solrj/streaming/CloudSolrStream.java
> http://heliosearch.org/streaming-aggregation-for-solrcloud/
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Mon, Dec 29, 2014 at 2:20 AM, Sandy Ding <sa...@gmail.com>
> wrote:
>
> > Hi, Joel
> >
> > Thanks for your reply.
> > It seems that the weird export results are because I removed the
> > "<str name>xsort</str>" invariant of the export request handler in the
> > default solrconfig.xml to get csv-format output.
> > I don't quite understand the meaning of "xsort", but I removed it because
> > I always get a json response (as you said) with the xsort invariant.
> > Is there a way to get csv output using export?
> > And also, can I get full results from all shards? (I tried to set
> > "distrib=true" but got "SyntaxError:xport RankQuery is required for xsort:
> > rq={!xport}", and I do have rq={!xport} in the export invariants)
> >
> >
> > 2014-12-27 3:21 GMT+08:00 Joel Bernstein <jo...@gmail.com>:
> >
> > > Hi Sandy,
> > >
> > > I pulled Solr 4.10.3 to see if I could recreate the issue you are seeing
> > > with export, and I wasn't able to reproduce it. For
> > > example the following query:
> > >
> > > http://localhost:8983/solr/collection1/export?q=join_i:[500000 TO 500010]&wt=json&indent=true&sort=join_i+asc&fl=join_i,ShopId_i
> > >
> > >
> > > Brings back the following result:
> > >
> > >
> > > {"responseHeader": {"status": 0}, "response":{"numFound":11,
> > > "docs":[{"join_i":500000,"ShopId_i":578917},{"join_i":500001,"ShopId_i":294217},{"join_i":500002,"ShopId_i":199805},{"join_i":500003,"ShopId_i":633461},{"join_i":500004,"ShopId_i":472995},{"join_i":500005,"ShopId_i":672122},{"join_i":500006,"ShopId_i":394637},{"join_i":500007,"ShopId_i":446443},{"join_i":500008,"ShopId_i":697329},{"join_i":500009,"ShopId_i":166988},{"join_i":500010,"ShopId_i":191261}]}}
> > >
> > >
> > > Notice the join_i values are all within the correct range.
> > >
> > > If you can post the export handler configuration we should be able to
> > > see the issue.
> > >
> > >
> > > Joel Bernstein
> > > Search Engineer at Heliosearch
> > >
> > > On Fri, Dec 26, 2014 at 1:50 PM, Joel Bernstein <jo...@gmail.com>
> > > wrote:
> > >
> > > > Hi Sandy,
> > > >
> > > > The export handler should only return documents in JSON format. The
> > > > results in your second example are in XML format, so something looks
> > > > to be wrong in the configuration. Can you post what your solrconfig
> > > > looks like?
> > > >
> > > > Joel
> > > >
> > > > Joel Bernstein
> > > > Search Engineer at Heliosearch
> > > >
> > > > On Fri, Dec 26, 2014 at 12:43 PM, Erick Erickson
> > > > <erickerickson@gmail.com> wrote:
> > > >
> > > >> I think you missed a very important part of Jack's reply:
> > > >>
> > > >> bq: I notice that you don't have distrib=false on your select, which
> > > >> would make your select be from all nodes, while export would only be
> > > >> docs from the specific node you sent the request to.
> > > >>
> > > >> And from the Reference Guide on export
> > > >>
> > > >> bq: The initial release treats all queries as non-distributed
> > > >> requests. So the client is responsible for making the calls to each
> > > >> Solr instance and merging the results.
> > > >>
> > > > >> So the export statement you're sending is _only_ exporting the results
> > > > >> from the shard on 8983 and completely ignoring the other (6?) shards,
> > > > >> whereas the query you're sending is getting the results from all the
> > > > >> shards.
> > > >>
> > > >> As Jack said, add &distrib=false to the query, send it to the same
> > > >> shard you send the export command to and the results should match.
> > > >>
> > > > >> Also, be sure your configuration for the /select handler doesn't have
> > > > >> any additional default parameters that might alter the results, but I
> > > > >> doubt that's really a problem here.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > > >> On Fri, Dec 26, 2014 at 7:02 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> > > > >> wrote:
> > > >> > Hi,
> > > >> >
> > > > >> > Do you have any custom Solr components deployed? Maybe a custom
> > > > >> > response writer?
> > > >> >
> > > >> > Ahmet
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > > >> > On Friday, December 26, 2014 3:26 PM, Sandy Ding
> > > > >> > <sandy.dingxin@gmail.com> wrote:
> > > >> > Hi, Ahmet,
> > > >> >
> > > > >> > I use libuuid for the unique id, so I guess there shouldn't be
> > > > >> > duplicate ids.
> > > > >> > Also, the results are not just incomplete, they are screwed up.
> > > >> >
> > > >> >
> > > >> > 2014-12-26 20:19 GMT+08:00 Ahmet Arslan <iorixxx@yahoo.com.invalid
> > >:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > > >> >> Two different things:
> > > > >> >>
> > > > >> >> If you have a unique key defined, documents with the same id overwrite
> > > > >> >> each other within a single shard.
> > > > >> >>
> > > > >> >> Plus, unique ids are expected to be unique across shards.
> > > >> >>
> > > >> >> Ahmet
> > > >> >>
> > > >> >>
> > > >> >>
> > > > >> >> On Friday, December 26, 2014 11:00 AM, Sandy Ding
> > > > >> >> <sandy.dingxin@gmail.com> wrote:
> > > >> >> Hi, all
> > > >> >>
> > > > >> >> I've recently set up a Solr cluster and found that "export" returns
> > > > >> >> different results from "select".
> > > > >> >> I confirmed that the "export" results are wrong by manually querying
> > > > >> >> the results.
> > > > >> >> Even simple queries like the following get different results:
> > > >> >>
> > > > >> >> curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc":
> > > > >> >>
> > > > >> >>     <response><lst name="responseHeader"><int name="status">0</int><int
> > > > >> >> name="QTime">11</int><lst name="params"><str name="sort">id desc</str><str
> > > > >> >> name="fl">id</str><str name="q">*:*</str></lst></lst><result
> > > > >> >> name="response" *numFound="1197"* start="0"><doc>...</doc></result>
> > > > >> >>
> > > > >> >> curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc":
> > > > >> >>     {*"numFound":172*, "docs":[..]
> > > >> >>
> > > > >> >> Don't have a clue why this happens! Can anyone help?
> > > >> >>
> > > >> >> Best,
> > > >> >> Sandy
> > > >> >>
> > > >>
> > > >
> > > >
> > >
> >
>
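
The thread says that, in this release, /export is not distributed, so the
client has to call each Solr instance and merge the results itself. A minimal
Python sketch of that idea follows. It is not the CloudSolrStream client and
not an official API: the function names (export_url, fetch_docs, merge_sorted)
and the second core URL in the usage comment are made up for illustration. It
assumes each core returns its docs already sorted on the requested field, and
that Python's comparison of the sort values orders them the same way Solr does,
which may not hold for every field type.

```python
import heapq
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def export_url(core_url, query, fields, sort_field, descending=False):
    """Build an /export URL for a single core, e.g. http://host:8983/solr/pa_info."""
    direction = "desc" if descending else "asc"
    params = {
        "q": query,
        "fl": ",".join(fields),
        "sort": f"{sort_field} {direction}",
    }
    return f"{core_url}/export?{urlencode(params)}"


def fetch_docs(url):
    """Fetch one core's JSON export response and return its list of docs."""
    with urlopen(url) as resp:
        body = json.load(resp)
    # Both shapes quoted in the thread are accepted: docs nested under
    # "response", or "numFound"/"docs" at the top level.
    return body.get("response", body)["docs"]


def merge_sorted(per_core_docs, sort_field, descending=False):
    """Merge per-core doc lists, each already sorted on sort_field, into one list."""
    return list(
        heapq.merge(
            *per_core_docs,
            key=lambda doc: doc[sort_field],
            reverse=descending,
        )
    )


# Usage (hypothetical core URLs, one per shard; replace with your own):
#   cores = ["http://localhost:8983/solr/pa_info",
#            "http://localhost:7574/solr/pa_info"]
#   urls = [export_url(c, "*:*", ["id"], "id", descending=True) for c in cores]
#   docs = merge_sorted([fetch_docs(u) for u in urls], "id", descending=True)
```

With one URL per shard, the length of the merged list should be comparable to
the numFound that a distributed /select reports for the same query, which is
the comparison the original question was about.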