Posted to solr-user@lucene.apache.org by Solr User <so...@gmail.com> on 2017/06/05 18:18:25 UTC

Re: Work-around for "indexed without position data"

Sorry for the delay.  I was able to reproduce this easily with my setup,
but reproducing it on a Solr example proved challenging.  Hopefully the
work I did to isolate the conditions that trigger it will help in
resolving the problem.  The driving factor appears to be how updates are
sent to Solr.  When batches of updates are sent with a commit after each
batch, the problem is reproduced.  If the commit is held until after all
updates are sent, the problem does not occur.  This leads me to believe
that the issue has something to do with overlapping commits or index
merges.  It was reproducible with both classic and managed schema, and
with both a standalone Solr core and SolrCloud.

There are not many steps to reproduce this, but you will need a way to send
the updates.  I have included create.sh and create.pl scripts inline to
generate the data and send the updates.  You can index a lastModified field
or something similar to convince yourself that everything has been
re-indexed.  I left that out to keep the steps lean.  Also, for simplicity
this test issues commits from the client sending the updates, even though
that is not good practice.  My normal setup uses SolrJ with commitWithin to
let Solr manage when the commits take place, but the same error is produced
either way.
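
For comparison, here is a minimal sketch of the variant that does not show the problem: every batch is sent without a commit and one commit is issued at the end.  It assumes the batch files create.xml1 .. create.xmlN have already been written by create.sh below.  The names run, send_deferred, DRY_RUN and SOLR_UPDATE_URL are my own for illustration, and DRY_RUN defaults to printing the curl commands rather than running them.

```shell
#!/bin/bash
# Sketch only: deferred-commit variant of the batch upload.
# Assumes the techproducts core and batch files ./create.xml1 .. ./create.xmlN.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/techproducts/update}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the curl commands (default); 0 = run them

# Either print a command or execute it, depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Send batches 1..N with no commit parameter, then commit once.
send_deferred() {
  local batches="$1" i
  for ((i = 1; i <= batches; i++)); do
    # No commit=true here: the batch is added but not committed.
    run curl -s "$SOLR_UPDATE_URL" -H "Content-Type: text/xml" \
      --data-binary "@./create.xml$i"
  done
  # A single explicit commit after everything has been sent.
  run curl -s "$SOLR_UPDATE_URL" -H "Content-Type: text/xml" \
    --data-binary "<commit/>"
}
```

Calling `send_deferred 100` prints the 101 curl commands; `DRY_RUN=0 send_deferred 100` would actually send them to a running Solr.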


*STEPS TO REPRODUCE*

   1. Install Solr 5.5.3 and change to that working directory
   2. bin/solr -e techproducts
   3. bin/solr stop
      [Why steps 3-5?  They start the index completely fresh, without the
      32 example documents, instead of using a delete query.  The example
      documents are not posted when the core is detected the second time.]
   4. rm -rf ./example/techproducts/solr/techproducts/data/
   5. bin/solr -e techproducts
   6. ./create.sh
   7. curl -X POST -H 'Content-type:application/json' --data-binary '{
      "replace-field":{ "name":"cat", "type":"text_en_splitting",
      "indexed":true, "multiValued":true, "stored":true } }' \
      http://localhost:8983/solr/techproducts/schema
   8. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
      [error]
   9. ./create.sh
   10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
      [error even though all documents have been re-indexed]
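
The phrase query in steps 8 and 10 can also be run from a script.  The sketch below builds the same URL and looks for the error text in the response.  The helper names phrase_query_url and has_position_error are hypothetical, and matching on the message text is an assumption about how the error shows up in the response body.

```shell
#!/bin/bash
# Sketch only: build the phrase-query URL from steps 8 and 10 and check the
# response for the position-data error.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/techproducts}"

# Encode only what this query needs: the double quotes around the phrase
# become %22 and the spaces inside it become %20.
phrase_query_url() {
  local field="$1" phrase="$2"
  printf '%s/select?q=%s:%%22%s%%22\n' \
    "$SOLR_CORE_URL" "$field" "${phrase// /%20}"
}

# Returns 0 if the response mentions the error, non-zero otherwise.
# Assumption: the exception message appears verbatim in the response body.
has_position_error() {
  curl -s "$(phrase_query_url "$1" "$2")" |
    grep -q 'indexed without position data'
}
```

For example, `has_position_error cat "hard drive" && echo "still broken"` after step 9 would show whether re-indexing changed anything.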

*create.sh*
#!/bin/bash
# Generate 100 batches of 100 documents and post each batch with its own commit.
for i in {1..100}; do
  echo "$i"
  ./create.pl "$i" > "./create.xml$i"
  curl "http://localhost:8983/solr/techproducts/update?commit=true" \
    -H "Content-Type: text/xml" --data-binary "@./create.xml$i"
done

*create.pl*
#!/usr/bin/perl
use strict;
use warnings;

# Print one <add> batch of $I documents; the batch number is the first argument.
my $S = $ARGV[0];
my $I = 100;
my $N = $S*$I + $I;
print "<add>\n";
for (my $i = $S*$I; $i < $N; $i++) {
   print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">cat hard drive ${i}</field></doc>\n";
}
print "</add>\n";


On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote:

> Can you reproduce this error? What are the steps you take to reproduce it?
> ( simple is better).
>
> cheers -- Rick
>
>
>
> On 2017-05-25 05:46 PM, Solr User wrote:
>
>> This is in regard to changing a field type from string to
>> text_en_splitting, re-indexing all documents, even optimizing to give the
>> index a chance to merge segments and rewrite itself entirely, and then
>> getting this error when running a phrase query:
>> java.lang.IllegalStateException: field "blah" was indexed without position
>> data; cannot run PhraseQuery
>>
>> I have encountered this issue before and have always done one of the
>> following as a work-around:
>> 1.  Instead of changing the field type on an existing field just create a
>> new field and retire the old one.
>> 2.  Delete the index directory and start from scratch.
>>
>> These work-arounds are not always ideal.  Does anyone know what is holding
>> onto that old field type definition?  What thinks it is still a string?
>> Every document has been re-indexed and I am sure of this because I have a
>> time stamp indexed.  Is there any other way to get this to work?
>>
>> For what it is worth, I am running this in SolrCloud mode but I remember
>> seeing this issue before SolrCloud was released as well.
>>
>>
>

Re: Work-around for "indexed without position data"

Posted by Susheel Kumar <su...@gmail.com>.
Did you try to reproduce this on the latest Solr (6.6), just to rule out a
bug specific to that version (though that is less likely)?  Please download
it and do a quick test.

On Mon, Jul 3, 2017 at 5:01 PM, Solr User <so...@gmail.com> wrote:

> Not sure if it helps beyond the steps to reproduce that I supplied above,
> but I also see that "Omit Term Frequencies & Positions" is still set on the
> field according to the LukeRequestHandler:
>
> <str name="flags">ITS------OF------</str>

Re: Work-around for "indexed without position data"

Posted by Solr User <so...@gmail.com>.
Not sure if it helps beyond the steps to reproduce that I supplied above,
but I also see that "Omit Term Frequencies & Positions" is still set on the
field according to the LukeRequestHandler:

<str name="flags">ITS------OF------</str>
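
A small sketch for checking that flag from a script.  It pulls the flags string out of a line like the one above and tests for the letter F.  The helper names extract_flags and omits_positions are hypothetical, and reading F as "Omit Term Frequencies & Positions" is taken from the key that Luke prints, so please check the key in your own version's output before relying on it.

```shell
#!/bin/bash
# Sketch only: read a Luke flags string and test for the "F" flag.

# Read XML on stdin and print the contents of any <str name="flags"> element.
extract_flags() {
  sed -n 's/.*<str name="flags">\([^<]*\)<\/str>.*/\1/p'
}

# Returns 0 if the flags string contains "F".
# Assumption: "F" means "Omit Term Frequencies & Positions" in the Luke key.
omits_positions() {
  case "$1" in
    *F*) return 0 ;;
    *)   return 1 ;;
  esac
}
```

With the output above saved to a file, `omits_positions "$(extract_flags < luke.xml)" && echo "positions still omitted"` would flag the field.  The file name luke.xml is only a placeholder.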


