Posted to solr-user@lucene.apache.org by Arun Rangarajan <ar...@gmail.com> on 2014/03/01 00:57:15 UTC

Re: Date query not returning results only some time

Thanks, Jack.

> How is first_publish_date defined?

<field name="first_publish_date" type="date" indexed="true" stored="true" />

with "date" being

<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0" />


Yes, we need to fix the Boolean operators AND, OR and NOT as mentioned in
http://searchhub.org/2011/12/28/why-not-and-or-and-not/ but I believe that
is not the issue here, because the same query returned results until a few
minutes before the full import started.
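
For reference, the rewrite Jack suggests in the quoted message below can be sketched in Python. This is only an illustration: the helper name `exclude` is made up, the field names and values are copied from the query in the original post, and the sort parameter is left out for brevity.

```python
from urllib.parse import urlencode, parse_qs

def exclude(field, value):
    """Return a negative clause anchored on all documents (*:*)."""
    return f"(*:* -{field}:{value})"

# Main query: date range, minus one tag.
q = ("first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND "
     + exclude("tag_id", 268702))

# Filter query: score range, minus another tag.
fq = "burial_score:[* TO 0.49] AND " + exclude("tag_id", 286006)

# URL-encode the parameters for a /select request.
params = urlencode({"q": q, "fq": fq, "rows": 1, "fl": "id"})
print(params)
```

Building the clause in one helper keeps every purely negative term anchored on `*:*`, so it cannot silently match nothing.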



On Fri, Feb 28, 2014 at 8:39 AM, Jack Krupansky <ja...@basetechnology.com>wrote:

> How is first_publish_date defined?
>
> After queries start failing, do an explicit query of some of the document
> IDs that you think should be present and see what the first_publish_date
> field contains.
>
> Also, Solr and Lucene queries are not strict Boolean, so ANDing of a
> purely negative term requires explicitly referring to all documents before
> applying the negation.
>
> So,
>
> AND -tag_id:268702
>
> should be:
>
> AND (*:* -tag_id:268702)
>
> Or, maybe you actually wanted this:
>
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] -tag_id:268702
>
> -- Jack Krupansky
>
> -----Original Message----- From: Arun Rangarajan
> Sent: Friday, February 28, 2014 11:15 AM
> To: solr-user@lucene.apache.org
> Subject: Date query not returning results only some time
>
>
> Solr server version 4.2.1
>
> I am facing a strange issue with a date query like this:
>
> q=first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND
> -tag_id:268702&fq=(burial_score:[* TO 0.49] AND
> -tag_id:286006)&rows=1&sort=random_906313237 asc&fl=id
>
> The only process by which we add documents to the core on which this query
> executes is via data import handler full import. We do indexing on master
> and queries are executed against a slave.
>
> This query returns results till the time full import starts (1 AM PST
> daily). But the moment full import starts, it does not return any results.
> Other queries return results.
>
> Our auto commit settings in solrconfig have openSearcher set to false as
> shown below:
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxDocs>25000</maxDocs>
>     <maxTime>600000</maxTime> <!-- millis -->
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
>   <updateLog>
>     <str name="dir">${solr.updatelog.dir:}</str>
>   </updateLog>
> </updateHandler>
>
> It starts returning results after the full import finishes and issues a
> commit, which takes about 1.5 hrs. The pollInterval for slave is set for
> every hour:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">startup</str>
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">optimize</str>
>     <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
>   </lst>
>   <lst name="slave">
>     <str name="enable">${enable.slave:false}</str>
>     <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
>     <str name="pollInterval">01:00:00</str>
>   </lst>
> </requestHandler>
>
> What am I doing wrong? Please let me know if you need any more details to
> help me debug this.
>

Re: Date query not returning results only some time

Posted by Arun Rangarajan <ar...@gmail.com>.
Erick,
Thanks a lot for the detailed explanation. That clarified things for me
better.


On Sun, Mar 2, 2014 at 10:04 AM, Erick Erickson <er...@gmail.com>wrote:

> Well, in M/S setups the master shouldn't be searching at all,
> but that's a nit.
>
> That aside, whether the master has opened a new
> searcher or not is irrelevant to what the slave replicates.
> What _is_ relevant is whether any of the files on disk that
> comprise the index (i.e. the segment files) have been
> changed. Really, if any of them have been closed/merged
> whatever since the last sync. Imagine it like this (this isn't
> quite what happens, but it's a useful model). The slave
> says "here's a list of my segments, is it the same as the
> list of closed segments on the master?" If the answer
> is no, a replication is performed. Actually, this is done
> much more efficiently, but that's the idea.
>
> You seem to be really asking about the whole issue of whether
> searches on the various nodes (master + slaves) are
> consistent. This is one of the problems with M/S setups: they
> can differ by whatever has happened in the polling interval.
>
> The state of the master's searchers just doesn't enter the picture.
>
> Glad the problem is solved no matter what.
>
> Erick

Re: Date query not returning results only some time

Posted by Erick Erickson <er...@gmail.com>.
Well, in M/S setups the master shouldn't be searching at all,
but that's a nit.

That aside, whether the master has opened a new
searcher or not is irrelevant to what the slave replicates.
What _is_ relevant is whether any of the files on disk that
comprise the index (i.e. the segment files) have been
changed. Really, if any of them have been closed/merged
whatever since the last sync. Imagine it like this (this isn't
quite what happens, but it's a useful model). The slave
says "here's a list of my segments, is it the same as the
list of closed segments on the master?" If the answer
is no, a replication is performed. Actually, this is done
much more efficiently, but that's the idea.

You seem to be really asking about the whole issue of whether
searches on the various nodes (master + slaves) are
consistent. This is one of the problems with M/S setups: they
can differ by whatever has happened in the polling interval.

The state of the master's searchers just doesn't enter the picture.

Glad the problem is solved no matter what.

Erick
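
The mental model above can be written down as a small Python sketch. As the message says, this is not quite what really happens, only the idea: the function name and the segment names are made up, and the real replication check is done more efficiently.

```python
def needs_replication(slave_segments, master_closed_segments):
    """Simplified model: replicate whenever the two segment lists differ."""
    return set(slave_segments) != set(master_closed_segments)

# Before the nightly import: slave and master hold the same segments.
slave = ["_a", "_b"]
master_closed = ["_a", "_b"]
in_sync = needs_replication(slave, master_closed)

# After the delete-all and the first auto commit (openSearcher=false),
# the master has different closed segments on disk, even though its own
# searcher still serves the old view of the index.
master_closed = ["_c"]
after_autocommit = needs_replication(slave, master_closed)

print(in_sync, after_autocommit)  # prints: False True
```

Note that nothing in the comparison looks at the master's searcher, which is the point being made: only the files on disk matter.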

On Sat, Mar 1, 2014 at 10:26 PM, Arun Rangarajan
<ar...@gmail.com> wrote:
>> The slave is polling the master after the interval specified in
>> solrconfig.xml. The slave essentially asks "has anything changed?" If so, the
>> changes are brought down to the slave.
> Yes, I understand this, but if master does not open a new searcher after
> auto commits (which would indicate that the new index is not quite ready
> yet) and if master is still using the old index to serve search requests, I
> would expect the slave to do the same as well. Or the slave should at least
> not replicate or not open a new searcher, until the master opened a new
> searcher. But that is just the way I see it and it may be wrong.
>
>> What's your polling interval on the slave anyway? Sounds like it's quite
>> frequent if you notice this immediately after the DIH starts.
> No, polling interval is set to 1 hour, but the full import was set to run
> at 1 AM. I believe a delete followed by few docs got replicated after the
> first few auto commits when the slave probably polled around 1:10 AM and
> slave index had few docs for an hour before the next polling happened,
> which is why the date query was returning empty results for exactly that
> one hour. (The full index takes about 1.5 hours to finish.)
>
> Anyway the problem is now solved by specifying "clean=false" in the DIH
> full import command.

Re: Date query not returning results only some time

Posted by Arun Rangarajan <ar...@gmail.com>.
> The slave is polling the master after the interval specified in
> solrconfig.xml. The slave essentially asks "has anything changed?" If so, the
> changes are brought down to the slave.
Yes, I understand this, but if master does not open a new searcher after
auto commits (which would indicate that the new index is not quite ready
yet) and if master is still using the old index to serve search requests, I
would expect the slave to do the same as well. Or the slave should at least
not replicate or not open a new searcher, until the master opened a new
searcher. But that is just the way I see it and it may be wrong.

> What's your polling interval on the slave anyway? Sounds like it's quite
> frequent if you notice this immediately after the DIH starts.
No, the polling interval is set to 1 hour, but the full import was set to run
at 1 AM. I believe a delete followed by a few docs got replicated after the
first few auto commits, when the slave probably polled around 1:10 AM, and the
slave index had only a few docs for an hour before the next poll happened,
which is why the date query was returning empty results for exactly that
one hour. (The full import takes about 1.5 hours to finish.)

Anyway the problem is now solved by specifying "clean=false" in the DIH
full import command.


On Sat, Mar 1, 2014 at 9:12 AM, Erick Erickson <er...@gmail.com>wrote:

> bq: the slave anyway replicates the index after auto commits! (Is this
> desired behavior?)
>
> Absolutely it's desired behavior. The slave is polling the master
> after the interval
> specified in solrconfig.xml. The slave essentially asks "has anything
> changed?" If so,
> the changes are brought down to the slave. And by definition, commits
> change the index,
> especially if all docs have been deleted....
>
> What's your polling interval on the slave anyway? Sounds like it's
> quite frequent if you
> notice this immediately after the DIH starts.
>
> Best,
> Erick

Re: Date query not returning results only some time

Posted by Erick Erickson <er...@gmail.com>.
bq: the slave anyway replicates the index after auto commits! (Is this
desired behavior?)

Absolutely it's desired behavior. The slave is polling the master after the
interval specified in solrconfig.xml. The slave essentially asks "has anything
changed?" If so, the changes are brought down to the slave. And by definition,
commits change the index, especially if all docs have been deleted....

What's your polling interval on the slave anyway? Sounds like it's quite
frequent if you notice this immediately after the DIH starts.

Best,
Erick

On Fri, Feb 28, 2014 at 9:04 PM, Arun Rangarajan
<ar...@gmail.com> wrote:
> I believe I figured out what the issue is. Even though we do not open a new
> searcher on master during full import, the slave anyway replicates the
> index after auto commits! (Is this desired behavior?) Since "clean=true"
> this meant all the docs were deleted on slave and a partial index got
> replicated! The reason only the date query did not return any results is
> because recently created docs have higher doc IDs and we index by ascending
> order of IDs!
>
> I believe I have two options:
> - as Chris suggested I have to use "clean=false" so the existing docs are
> not deleted first on the slave. Since we have primary keys, newly added
> docs will overwrite old docs as they get added.
> - disable replication after commits. Replicate only after optimize.
>
> Thx all for your help.

Re: Date query not returning results only some time

Posted by Arun Rangarajan <ar...@gmail.com>.
I believe I figured out what the issue is. Even though we do not open a new
searcher on master during full import, the slave anyway replicates the
index after auto commits! (Is this desired behavior?) Since "clean=true"
this meant all the docs were deleted on slave and a partial index got
replicated! The reason only the date query did not return any results is
because recently created docs have higher doc IDs and we index by ascending
order of IDs!

I believe I have two options:
- as Chris suggested I have to use "clean=false" so the existing docs are
not deleted first on the slave. Since we have primary keys, newly added
docs will overwrite old docs as they get added.
- disable replication after commits. Replicate only after optimize.

Thx all for your help.





On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
<ar...@gmail.com>wrote:

> Thx, Erick and Chris.
>
> This is indeed very strange. Other queries which do not restrict by the
> date field are returning results, so the index is definitely not empty. Has
> it got something to do with the date query part, with NOW/DAY or something
> in here?
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
>
> For now, I have set up a script to just log the number of docs on the
> slave every minute. Will monitor and report the findings.

Re: Date query not returning results only some time

Posted by Erick Erickson <er...@gmail.com>.
Well, I'd certainly try removing parts of the query to see
what was actually in the index.

I don't see anything obvious though...

Erick


On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
<ar...@gmail.com>wrote:

> Thx, Erick and Chris.
>
> This is indeed very strange. Other queries which do not restrict by the
> date field are returning results, so the index is definitely not empty. Has
> it got something to do with the date query part, with NOW/DAY or something
> in here?
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
>
> For now, I have set up a script to just log the number of docs on the slave
> every minute. Will monitor and report the findings.

Re: Date query not returning results only some time

Posted by Arun Rangarajan <ar...@gmail.com>.
Thx, Erick and Chris.

This is indeed very strange. Other queries which do not restrict by the
date field are returning results, so the index is definitely not empty. Has
it got something to do with the date query part, with NOW/DAY or something
in here?
first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]

For now, I have set up a script to just log the number of docs on the slave
every minute. Will monitor and report the findings.
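
A monitoring script of the kind described here might look roughly like the sketch below. It is not the actual script: the function names, host and core name are placeholders, and it assumes the slave's /select handler returns JSON when asked with wt=json.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

def count_url(select_url):
    """URL for a match-all query that returns only the hit count."""
    return select_url + "?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})

def parse_num_found(body):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]

def log_doc_counts(select_url, interval_s=60):
    """Print a timestamped document count once per interval, forever."""
    while True:
        with urlopen(count_url(select_url)) as resp:
            num_found = parse_num_found(resp.read().decode("utf-8"))
        print(time.strftime("%Y-%m-%d %H:%M:%S"), num_found)
        time.sleep(interval_s)

# To run against a slave (hypothetical host and core name):
# log_doc_counts("http://slave-host:8983/solr/mycore/select")
```

A sudden drop in the logged count shortly after the import starts would point at a replicated partial index rather than at the date query itself.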


On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : This is odd. The full import, I think, deletes the
> : docs in the index when it starts.
>
> Yeah, if you are doing a full-import every day, and you don't want it to
> delete all docs when it starts, you need to specify "clean=false"
>
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
>
>
>
> -Hoss
> http://www.lucidworks.com/
>

Re: Date query not returning results only some time

Posted by Chris Hostetter <ho...@fucit.org>.
: This is odd. The full import, I think, deletes the
: docs in the index when it starts.

Yeah, if you are doing a full-import every day, and you don't want it to
delete all docs when it starts, you need to specify "clean=false"

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand



-Hoss
http://www.lucidworks.com/
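
As a rough illustration of the suggestion above, a full-import request with clean=false could be built like this. The base URL, core name and the /dataimport handler path are assumptions (the path depends on how the handler is registered in solrconfig.xml); only command=full-import and clean=false come from the thread and the linked documentation.

```python
from urllib.parse import urlencode

def full_import_url(base_url, core, clean=False):
    """Build a DIH full-import URL; clean=False keeps existing docs."""
    params = {"command": "full-import", "clean": str(clean).lower()}
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"

# Hypothetical master host and core name.
url = full_import_url("http://localhost:8983/solr", "mycore")
print(url)
# prints: http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=false
```

With clean=false the import relies on the unique key to overwrite old documents as new ones arrive, so documents deleted at the source are not removed from the index by the import itself.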

Re: Date query not returning results only some time

Posted by Erick Erickson <er...@gmail.com>.
This is odd. The full import, I think, deletes the
docs in the index when it starts.

If you check your index directory on the slave, is it empty
after the full import starts? If so, check your Solr log
on the slave... does it show a replication?

Shooting in the dark...

Erick


On Fri, Feb 28, 2014 at 3:57 PM, Arun Rangarajan
<ar...@gmail.com>wrote:

> Thanks, Jack.
>
> > How is first_publish_date defined?
>
> <field name="first_publish_date" type="date" indexed="true" stored="true" />
>
> with "date" being
>
> <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
> positionIncrementGap="0" />
>
>
> Yes, we need to fix the Boolean operators AND, OR and NOT as mentioned in
> http://searchhub.org/2011/12/28/why-not-and-or-and-not/ but I believe that
> is not an issue here, because the same query returns results few mins
> before the full index started.