
Posted to solr-user@lucene.apache.org by darniz <rn...@edmunds.com> on 2010/02/12 19:22:05 UTC

Re: Deleting spelll checker index

Hi guys,
Opening this thread again.
I need to get around this issue.
I have a spellcheck field defined and I am copying two fields, make and model,
to this field:
<copyField source="make" dest="spellText"/>
<copyField source="model" dest="spellText"/>
I have buildOnCommit and buildOnOptimize set to true, so when I index data
and search for the word accod I get back the suggestion accord, since model
is also being copied.
I stopped the Solr server and removed the copyField for model, so now I only
copy make to the spellText field, and started the Solr server again.
I refreshed the dictionary by issuing the following command:
spellcheck.build=true&spellcheck.dictionary=default
I hoped it would rebuild my dictionary, but the strange thing is that it
still gives a suggestion for accrd.
I have to reindex the data, and only then does it stop offering the
suggestion, which is the correct behaviour.

How can I recreate the dictionary by changing my schema and issuing the
command
spellcheck.build=true&spellcheck.dictionary=default

I can't afford to reindex the data every time.

Any answer ASAP will be appreciated.

Thanks
darniz

darniz wrote:
> 
> Then I assume the easiest way is to delete the directory itself.
> 
> darniz
> 
> 
> hossman wrote:
>> 
>> 
>> : We are using the index-based spell checker.
>> : I was wondering whether, with the help of any URL parameters, we can
>> : delete the spell check index directory.
>> 
>> I don't think so.
>> 
>> You might be able to configure two different spell check components that 
>> point at the same directory -- one that builds off of a real field, and
>> one that builds off of an (empty) text field (using FileBasedSpellChecker) .. 
>> then you could trigger a rebuild of an empty spell checking index using 
>> the second component.
>> 
>> But I've never tried it so I have no idea if it would work.
>> 
>> 
>> -Hoss
>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27567465.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Deleting spelll checker index

Posted by Lance Norskog <go...@gmail.com>.

More precisely, remnant terms from deleted documents slowly disappear
as you add new documents or when you optimize the index.
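
A minimal sketch of the two requests this implies: optimize the main index
first, then rebuild the spellcheck dictionary. The host, port and /solr path
are assumptions (the stock example setup), as is selecting the handler with
qt=global-search, the handler name used elsewhere in this thread. The code
only builds the URLs; it does not send anything.

```python
from urllib.parse import urlencode

# Assumption: default example host/port and path; adjust for your install.
SOLR_BASE = "http://localhost:8983/solr"


def optimize_url(base=SOLR_BASE):
    """URL asking the update handler to optimize (merge) the main index."""
    query = urlencode({"optimize": "true"})
    return f"{base}/update?{query}"


def spellcheck_build_url(base=SOLR_BASE, handler="global-search",
                         dictionary="default", query="accrd"):
    """URL that triggers a rebuild of the named spellcheck dictionary."""
    params = {
        "qt": handler,  # assumed: handler that has the spellcheck component
        "q": query,
        "rows": "0",
        "spellcheck": "true",
        "spellcheck.build": "true",
        "spellcheck.dictionary": dictionary,
    }
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Order matters: expunge remnant terms first, then rebuild from them.
    print(optimize_url())
    print(spellcheck_build_url())
```

Requesting the first URL and then the second should leave the dictionary
built only from terms that still belong to live documents.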

On Thu, Feb 18, 2010 at 11:09 AM, darniz <rn...@edmunds.com> wrote:
>
> Thanks
> If this is really the case: I declared a new field called mySpellTextDup and
> retired the original field.
> Now I have a new field which powers my dictionary, with no words in it, and
> I am free to index whichever terms I want.
>
> This is not the best solution, but I can't think of a reasonable workaround.
>
> Thanks
> darniz



-- 
Lance Norskog
goksron@gmail.com

Re: Deleting spelll checker index

Posted by darniz <rn...@edmunds.com>.

Thanks
If this is really the case: I declared a new field called mySpellTextDup and
retired the original field.
Now I have a new field which powers my dictionary, with no words in it, and
I am free to index whichever terms I want.

This is not the best solution, but I can't think of a reasonable workaround.

Thanks
darniz




Re: Deleting spelll checker index

Posted by Lance Norskog <go...@gmail.com>.

This is a quirk of Lucene - when you delete a document, the indexed
terms for the document are not deleted. That is, if 2 documents have
the word 'frampton' in an indexed field, the term dictionary contains
the entry 'frampton' and pointers to those two documents. When you
delete those two documents, the index contains the entry 'frampton'
with an empty list of pointers. So, the terms are still there even
when you delete all of the documents.

Facets and the spellchecking dictionary are built from this term
dictionary, not from the text strings that are 'stored' and returned
when you search for the documents.

The <optimize> command throws away these remnant terms.

http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
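
The behaviour described above can be sketched with a toy model. This is not
Lucene code: ToySegment and its methods are hypothetical names, and the model
only illustrates that a delete marks documents while the term entries stay
visible until a merge/optimize rewrites the segment.

```python
from collections import defaultdict


class ToySegment:
    """Toy stand-in for an index segment: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.deleted = set()

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # A delete only marks the document; term entries are left in place.
        self.deleted.add(doc_id)

    def terms(self):
        # What a term-dictionary consumer (facets, spellcheck build) sees.
        return sorted(self.postings)

    def live_doc_count(self, term):
        return len(self.postings.get(term, set()) - self.deleted)

    def optimize(self):
        # Rewrite the segment, keeping only postings for live documents.
        merged = defaultdict(set)
        for term, docs in self.postings.items():
            live = docs - self.deleted
            if live:
                merged[term] = live
        self.postings = merged
        self.deleted = set()


seg = ToySegment()
seg.add(1, "Peter Frampton")
seg.add(2, "Frampton Comes Alive")
seg.delete(1)
seg.delete(2)
terms_after_delete = seg.terms()    # 'frampton' is still listed
seg.optimize()
terms_after_optimize = seg.terms()  # nothing left once remnants are dropped
```

In this model a term with zero live documents is what shows up as a facet
value with count 0, or as a word the spellchecker can still suggest.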

On Wed, Feb 17, 2010 at 12:24 PM, darniz <rn...@edmunds.com> wrote:
>
> Please bear with my limited understanding.
> I deleted all documents and rebuilt my spell checker using the
> command
> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>
> After this I went to the schema browser and saw that mySpellText still has
> around 2000 values.
> How can I make sure that I clean up that field?
> We had the same issue with facets too: even after we delete all the
> documents, if we facet on make we still see facet values, but we can
> filter them out by setting facet.mincount=1.
>
> Coming back to my question: how can I make the mySpellText field get rid
> of all previous terms?
>
> Thanks a lot
> darniz



-- 
Lance Norskog
goksron@gmail.com

Re: Deleting spelll checker index

Posted by darniz <rn...@edmunds.com>.

Please bear with my limited understanding.
I deleted all documents and rebuilt my spell checker using the
command
spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default

After this I went to the schema browser and saw that mySpellText still has
around 2000 values.
How can I make sure that I clean up that field?
We had the same issue with facets too: even after we delete all the
documents, if we facet on make we still see facet values, but we can
filter them out by setting facet.mincount=1.

Coming back to my question: how can I make the mySpellText field get rid
of all previous terms?

Thanks a lot
darniz





Re: Deleting spelll checker index

Posted by Chris Hostetter <ho...@fucit.org>.

: But still I can't stop thinking about this.
: I deleted my entire index and now I have 0 documents.
: 
: Now if I make a query with accrd I still get a suggestion of accord, even
: though no documents are returned since I deleted my entire index. I
: hoped it would also clear the spell check index field.

There are two Lucene indexes when you use spell checking.

There is the "main" index, which is governed by your schema.xml and is what 
you add your own documents to, and what searches are run against for the 
result section of Solr responses.  

There is also the "spell" index, which has only two kinds of fields: in it 
each "document" corresponds to a "word" that might be returned as a 
spelling suggestion, and the other fields contain various start/end/middle 
ngrams that represent possible misspellings.

When you use the spellchecker component it builds the "spell" index, 
making a document out of every word it finds in whatever field name you 
configure it to use.

Deleting your entire "main" index won't automatically delete the "spell" 
index (although you should be able to rebuild the "spell" index using the 
*empty* "main" index, that should work).

: i am copying both fields to a field called 
: <copyField source="make" dest="mySpellText"/>
: <copyField source="model" dest="mySpellText"/>

...at this point your "main" index has a field named mySpellText, and for 
every document it contains a copy of make and model.

:         <lst name="spellchecker">
:             <str name="name">default</str>
:             <str name="field">mySpellText</str>
:             <str name="buildOnOptimize">true</str>
:             <str name="buildOnCommit">true</str>

...so whenever you commit or optimize your "main" index it will take every 
word from the mySpellText field and use them all as individual documents in 
the "spell" index.

In your previous email you said you changed the copyField declaration, and 
then triggered a commit -- that rebuilt your "spell" index, but the data 
was still all there in the mySpellText field of the "main" index, so the 
rebuilt "spell" index was exactly the same.

: I have buildOnOptimize and buildOnCommit set to true, so when I index a new
: document I want my dictionary to be rebuilt, but how can I make sure I
: remove the previously indexed terms?

Every time the spellchecker component "builds" it will create a completely 
new "spell" index .. but if the old data is still in the "main" index then 
it will also be in the "spell" index.

The only reason I can think of why you'd be seeing words in your "spell" 
index after deleting documents from your "main" index is that even if you 
delete documents, the Terms are still there in the underlying index until 
the segments are merged ... so if you do an optimize that will force them 
to be expunged --- but I honestly have no idea if that is what's causing 
your problem, because quite frankly I really don't understand what your 
problem is ... you have to provide specifics: reproducible steps anyone 
can take using a clean install of Solr to see the behavior you are 
seeing that seems incorrect.  (i.e.: modifications to the example schema, 
and commands to execute against the demo port to see the bug)

If you can provide details like that then it's possible to understand what 
is going wrong for you -- which is a prereq to providing useful help.



-Hoss

Re: Deleting spelll checker index

Posted by darniz <rn...@edmunds.com>.

Thanks Hoss.
Apologies for flooding the list.

But still I can't stop thinking about this.
I deleted my entire index and now I have 0 documents.

Now if I make a query with accrd I still get a suggestion of accord, even
though no documents are returned since I deleted my entire index. I
hoped it would also clear the spell check index field.

Let me give some history about what I am doing.
I want my spellchecker to be built from the make and model names;
both fields are of type string.
I am copying both fields to a field called mySpellText:
<copyField source="make" dest="mySpellText"/>
<copyField source="model" dest="mySpellText"/>

The definition of the field and field type is:

<field name="mySpellText" type="textSpell" indexed="true" stored="false" multiValued="true" />
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
</fieldType>

In the request handler:
<requestHandler name="global-search" class="solr.SearchHandler" >
        <lst name="defaults">
            <str name="defType">dismax</str>
            <str name="qf">text</str>
            <str name="pf"></str>
            <str name="bf"></str>
            <str name="mm"></str>
            <int name="ps">100</int>
        </lst>
        <arr name="last-components">
            <str>spellcheck</str>
        </arr>
</requestHandler>

And here is my default spellcheck component declaration:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">textSpell</str>
        <lst name="spellchecker">
            <str name="name">default</str>
            <str name="field">mySpellText</str>
            <str name="buildOnOptimize">true</str>
            <str name="buildOnCommit">true</str>
            <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
            <str name="spellcheckIndexDir">./mySpellcheckerDataIndex</str>
        </lst>
</searchComponent>
I have buildOnOptimize and buildOnCommit set to true, so when I index a new
document I want my dictionary to be rebuilt, but how can I make sure I
remove the previously indexed terms?
Thanks
darniz




How can I reset my dictionary? Is there a way to do it?




-- 
View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27615354.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Deleting spelll checker index

Posted by Chris Hostetter <ho...@fucit.org>.

: Any update on this

Patience my friend ... 5 hours after you send an email isn't long enough 
to wait before asking for "any update on this" -- it's just increasing the 
volume of mail everyone gets and distracting people from actual 
bugs/issues.

FWIW: this doesn't really seem directly related to the thread you
initially started about Deleting the spell checker index -- what you're
asking about now is rebuilding the spellchecker index...

: > I stopped the Solr server and removed the copyField for model. Now I
: > only copy make to the spellText field, and I restarted the Solr server.
: > I refreshed the dictionary by issuing the following command:
: > spellcheck.build=true&spellcheck.dictionary=default
: > I expected this to rebuild my dictionary, but the strange thing is
: > that it still gives a suggestion for accrd.

that's because removing the copyField declaration doesn't change anything
about the values that have already been copied to the "spellText" field
-- rebuilding your spellchecker index is just re-reading the same
indexed values from that field.
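
A minimal sketch of why this happens, written as a toy model in Python
rather than actual Solr code. Every name in it (index_documents,
build_dictionary, the "Honda" sample value) is hypothetical and chosen only
for illustration. The point is that the copy happens at index time, and a
rebuild only reads what is already stored in the field.

```python
# Toy model only: this is NOT Solr code, and all names are hypothetical.

def index_documents(docs, copy_sources, dest="spellText"):
    """Simulate index time: copy each listed source field's value into `dest`."""
    indexed = []
    for doc in docs:
        stored = dict(doc)
        stored[dest] = [doc[f] for f in copy_sources if f in doc]
        indexed.append(stored)
    return indexed


def build_dictionary(indexed, field="spellText"):
    """Simulate spellcheck.build=true: read the terms already stored in `field`."""
    return {term.lower() for doc in indexed for term in doc.get(field, [])}


# Sample data (assumed for illustration).
docs = [{"make": "Honda", "model": "Accord"}]

# Index while both copyField rules are in place.
index = index_documents(docs, copy_sources=["make", "model"])

# Dropping the copyField for "model" from the schema does not touch `index`,
# so a rebuild still sees the old "accord" term.
stale = build_dictionary(index)

# Only reindexing under the new schema removes the previously copied values.
index = index_documents(docs, copy_sources=["make"])
fresh = build_dictionary(index)
```

Here `stale` still contains "accord" even though the copy rule for model is
gone, while `fresh`, built after reindexing, does not.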

: > How can I rebuild the dictionary after changing my schema, just by
: > issuing the command
: > spellcheck.build=true&spellcheck.dictionary=default

it's just not possible.  a schema change like that doesn't magically 
undo all of the values that were already copied.
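
For reference, the rebuild request discussed above could be assembled
roughly like this. The host, port, and /select handler path are assumptions,
not details from this thread, and the spellcheck component has to be
attached to whichever handler is used. The request only rebuilds the
dictionary from whatever is currently in the indexed field, so it helps
after a reindex, not instead of one.

```python
from urllib.parse import urlencode


def rebuild_url(base_url, dictionary="default", query="accrd"):
    """Build a query URL that asks Solr to rebuild a spellcheck dictionary.

    `base_url` is an assumed handler URL; the spellcheck.* parameters are
    the ones quoted in the thread, plus spellcheck=true to enable the
    component for this request.
    """
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.build": "true",
        "spellcheck.dictionary": dictionary,
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance; adjust to the real host and handler.
url = rebuild_url("http://localhost:8983/solr/select")
```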



-Hoss

Re: Deleting spelll checker index

Posted by darniz <rn...@edmunds.com>.

Any update on this?
Do you want me to rephrase my question if it's not clear?

Thanks
darniz


darniz wrote:
> 
> Hi guys,
> Opening this thread again.
> I need to get around this issue.
> I have a spellcheck field defined and I am copying two fields, make and
> model, to this field:
> <copyField source="make" dest="spellText"/>
> <copyField source="model" dest="spellText"/>
> I have buildOnCommit and buildOnOptimize set to true, so when I index
> data and search for the word accod I get back the suggestion accord, since
> model is also being copied.
> I stopped the Solr server and removed the copyField for model. Now I only
> copy make to the spellText field, and I restarted the Solr server.
> I refreshed the dictionary by issuing the following command:
> spellcheck.build=true&spellcheck.dictionary=default
> I expected this to rebuild my dictionary, but the strange thing is that it
> still gives a suggestion for accrd.
> I have to reindex the data again, and only then does it stop offering the
> suggestion, which is the correct behaviour.
> 
> How can I rebuild the dictionary after changing my schema, just by issuing
> the command
> spellcheck.build=true&spellcheck.dictionary=default
> 
> I can't afford to reindex the data every time.
> 
> Any answer ASAP will be appreciated.
> 
> Thanks
> darniz
> 
> 
> 
> 
> 
> 
> 
> 
> 
> darniz wrote:
>> 
>> Then I assume the easiest way is to delete the directory itself.
>> 
>> darniz
>> 
>> 
>> hossman wrote:
>>> 
>>> 
>>> : We are using the index-based spell checker.
>>> : I was wondering whether, with the help of any URL parameters, we can
>>> : delete the spell check index directory.
>>> 
>>> I don't think so.
>>> 
>>> You might be able to configure two different spell check components that 
>>> point at the same directory -- one that builds off of a real field, and
>>> one that builds off of an (empty) text field (using
>>> FileBasedSpellChecker) .. then you could trigger a rebuild of an empty
>>> spell checking index using the second component.
>>> 
>>> But i've never tried it so i have no idea if it would work.
>>> 
>>> 
>>> -Hoss
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27570613.html