Posted to solr-user@lucene.apache.org by Christopher Baird <cb...@cardinalcommerce.com> on 2009/09/01 16:30:04 UTC

Adding new docs, but duplicating instead of updating

Hi All,

I'm running Solr in a multicore setup.  I've set one of the cores to use a
specific field as the unique key (declared as the uniqueKey in the schema,
and the field is defined as required).  I'm sending an <add> command with
all the docs using a multipart post.  After posting the add file, I send
<commit/> and then <optimize/>.  This works fine.  But when I resend the
same file (and commit and optimize again), my document count doubles, and
a query by unique key returns two documents.

I've confirmed using the admin UI (schema browser) that my document count
has doubled.  I've also confirmed, again using the schema browser, that the
unique key is the field I specified.  The unique key field has type
textTight.

Thanks for any help

-Chris
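The workflow described above (an <add> message, then <commit/>, then <optimize/>) can be sketched in Python as below. This is a minimal sketch that only builds the XML message bodies; the field names `sku` and `name` are hypothetical, and actually POSTing the messages to a core's update handler (at a URL such as http://localhost:8983/solr/core0/update, also an assumption) is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Control messages sent after the <add>, in this order, in the workflow above.
COMMIT_MESSAGE = "<commit/>"
OPTIMIZE_MESSAGE = "<optimize/>"

if __name__ == "__main__":
    # Hypothetical document; "sku" stands in for the uniqueKey field.
    docs = [{"sku": "L49-4251", "name": "Sample product"}]
    for message in (build_add_message(docs), COMMIT_MESSAGE, OPTIMIZE_MESSAGE):
        print(message)
```

Posting the same <add> body twice is exactly the resend scenario in the question; with a working uniqueKey, the second post should replace the earlier documents rather than add new ones.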


Re: Adding new docs, but duplicating instead of updating

Posted by Chris Hostetter <ho...@fucit.org>.
: specified (again, using schema browser).  The unique key field is marked as
: type textTight.

your uniqueKey field needs to be something where every doc is only going to
produce a single token.  if you are using textTight and sending product
sku type data (as mentioned in another message in this thread), you are
probably getting multiple tokens.

use copyField to put this same sku value into a string field.


-Hoss
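Why a tokenized key field can lead to duplicates is illustrated by the toy in-memory model below. It is a sketch under stated assumptions, not Solr's code. It assumes that on each add, the update handler deletes existing documents whose indexed uniqueKey *term* equals the raw incoming value. `tight_like_analyzer` is a hypothetical stand-in for an analyzed field such as textTight (it splits on letter/digit boundaries and punctuation, then lowercases); it is not textTight's real analysis chain. `ToyIndex` and both analyzer names are made up for illustration.

```python
import re
from collections import defaultdict


def string_analyzer(value):
    """Like a string field: the whole value is indexed as one unchanged term."""
    return [value]


def tight_like_analyzer(value):
    """Hypothetical stand-in for an analyzed text field (not Solr's textTight):
    split into letter runs and digit runs, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+|[0-9]+", value)]


class ToyIndex:
    """Toy model of an index whose uniqueKey field uses `key_analyzer`."""

    def __init__(self, key_analyzer):
        self.key_analyzer = key_analyzer
        self.docs = {}                    # internal doc id -> stored key value
        self.postings = defaultdict(set)  # indexed key term -> internal doc ids
        self.next_id = 0

    def add(self, key):
        # Assumed overwrite rule: delete docs whose indexed key term equals
        # the raw, unanalyzed incoming value.
        for doc_id in list(self.postings.get(key, ())):
            for term in self.key_analyzer(self.docs.pop(doc_id)):
                self.postings[term].discard(doc_id)
        # Index the new doc under every term the analyzer produces.
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = key
        for term in self.key_analyzer(key):
            self.postings[term].add(doc_id)

    def count(self):
        return len(self.docs)


if __name__ == "__main__":
    print(tight_like_analyzer("L49-4251"))  # ['l', '49', '4251']
    for analyzer in (string_analyzer, tight_like_analyzer):
        index = ToyIndex(analyzer)
        for _ in range(2):  # send the same document twice
            index.add("L49-4251")
        print(analyzer.__name__, index.count())
```

Under the model's assumption, the one-term string analyzer lets the second add replace the first document (count 1). With the tokenizing stand-in, no indexed term equals `L49-4251`, so nothing is deleted and the count doubles (count 2). This matches the symptom in the original post.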


RE: Adding new docs, but duplicating instead of updating

Posted by Christopher Baird <cb...@cardinalcommerce.com>.
Hi Tim,

The value I'm using is a product SKU.  A sample would be like:  L49-4251.

Thanks
-Chris
-----Original Message-----
From: Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
[mailto:timothy.j.harsch@nasa.gov] 
Sent: Tuesday, September 01, 2009 12:52 PM
To: solr-user@lucene.apache.org; cbaird@cardinalcommerce.com
Subject: RE: Adding new docs, but duplicating instead of updating

What is the value of your uniqueKey?

-----Original Message-----
From: Christopher Baird [mailto:cbaird@cardinalcommerce.com] 
Sent: Tuesday, September 01, 2009 8:20 AM
To: solr-user@lucene.apache.org
Subject: RE: Adding new docs, but duplicating instead of updating

Hi Tim,

I appreciate the suggestions.  I can tell you that the document I ran the
second time was the same document I ran the first time -- so differences in
field values shouldn't be a concern.

Thanks
-Chris

-----Original Message-----
From: Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
[mailto:timothy.j.harsch@nasa.gov] 
Sent: Tuesday, September 01, 2009 10:45 AM
To: solr-user@lucene.apache.org; cbaird@cardinalcommerce.com
Subject: RE: Adding new docs, but duplicating instead of updating

I could be off base here; maybe using textTight as a unique key is a common
Solr practice, I don't know.  But it would seem to me that using any field
type that transforms a value (even if it is just whitespace removal) could
be problematic.  Maybe it's not the source of your issue here, but I'd be
worried about collisions.  For instance, what if you sent "xyz" as a key and
"XYZ" as a key?  The doc would be overwritten.  You may also end up with
unexpected results when you get the record back...  Maybe with your use case
this is OK, but have you considered using string instead?

Tim
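Tim's collision concern can be checked before indexing with the small sketch below. It groups the raw keys by what a normalizing step would turn them into and reports any group with more than one member. The two normalizers (lowercasing and whitespace removal) are assumptions standing in for whatever a transforming field type actually does. `find_collisions` is a hypothetical helper, not part of Solr, and the sample keys are made up.

```python
def find_collisions(keys, normalize):
    """Return {normalized key: [raw keys]} for raw keys that normalize alike."""
    groups = {}
    for key in keys:
        groups.setdefault(normalize(key), []).append(key)
    return {norm: raws for norm, raws in groups.items() if len(raws) > 1}


def lowercase(key):
    # Stand-in for a lowercasing analysis step.
    return key.lower()


def strip_whitespace(key):
    # Stand-in for a whitespace-removing analysis step.
    return "".join(key.split())


if __name__ == "__main__":
    keys = ["xyz", "XYZ", "L49-4251", "L49 -4251"]
    print(find_collisions(keys, lowercase))         # {'xyz': ['xyz', 'XYZ']}
    print(find_collisions(keys, strip_whitespace))  # {'L49-4251': ['L49-4251', 'L49 -4251']}
```

An empty result means no two distinct raw keys would collapse to the same value under that normalizer. Using a string field for the key avoids the question entirely, because it stores and matches the value unchanged.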


RE: Adding new docs, but duplicating instead of updating

Posted by "Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]" <ti...@nasa.gov>.
What is the value of your uniqueKey?


RE: Adding new docs, but duplicating instead of updating

Posted by Christopher Baird <cb...@cardinalcommerce.com>.
Hi Tim,

I appreciate the suggestions.  I can tell you that the document I ran the
second time was the same document I ran the first time -- so differences in
field values shouldn't be a concern.

Thanks
-Chris


RE: Adding new docs, but duplicating instead of updating

Posted by "Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]" <ti...@nasa.gov>.
I could be off base here; maybe using textTight as a unique key is a common Solr practice, I don't know.  But it would seem to me that using any field type that transforms a value (even if it is just whitespace removal) could be problematic.  Maybe it's not the source of your issue here, but I'd be worried about collisions.  For instance, what if you sent "xyz" as a key and "XYZ" as a key?  The doc would be overwritten.  You may also end up with unexpected results when you get the record back...  Maybe with your use case this is OK, but have you considered using string instead?

Tim
