Posted to solr-user@lucene.apache.org by Noor <no...@opentechindia.com> on 2009/07/21 08:33:34 UTC

solr indexing on same set of records with different value of unique field...not working...

Hi,

I need to index around 10 million records with Solr.
I have nearly 2 lakh (200,000) records, so I wrote a program that loops 
over them until it reaches 10 million.
I specified 20 fields in the schema.xml file, and the unique field I set 
is the currentTimeStamp field.
So when I run the loader program (which loads XML data into Solr), it 
creates a currentTimeStamp value for each record and loads it into Solr.

To test this, I stopped the loader program after 100 records had been 
indexed into Solr.
Then I ran the loader program again for the SAME 100 records, and Solr 
reports 100 documents rather than 200.

Because I set the currentTimeStamp field as the unique field, I expect 
the result to be 200 if I run the same 100 records again.

Any suggestions, please?

regards,
Noor



Re: solr indexing on same set of records with different value of unique field, not working fine.

Posted by Chris Hostetter <ho...@fucit.org>.
: Sorry, schema.xml file is here in this mail...

in the schema.xml file you attached, the uniqueKey field is "evid"

you only provided one example of the type of input you are indexing, and 
in that example...

: >         <field name="evid">501</field>

...but in your original email (see below) you said you were using a 
timestamp field as the uniqueKey, and you didn't understand why reindexing 
the same 100 docs twice didn't give you 200 docs. that example uniqueKey 
value isn't a timestamp, so i don't really understand what you're talking 
about.  if you index that doc over and over with the schema.xml you sent, 
then it's constantly going to replace itself over and over again because 
the uniqueKey field (evid) is the same (501) every time.
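
The replace-on-same-key behaviour described above can be modelled in a few lines of Python. This is only a toy sketch, not Solr code: `ToyIndex`, `load_batch`, and the counter standing in for the timestamp are hypothetical names made up for illustration, and a plain dict plays the role of the index.

```python
import itertools


class ToyIndex:
    """Toy model of uniqueKey semantics: at most one stored doc per key value."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A doc whose key value already exists replaces the earlier doc.
        self.docs[doc[self.unique_key]] = doc

    def num_docs(self):
        return len(self.docs)


# Stand-in for a timestamp that is different on every call (assumption:
# the real loader never produces the same value twice).
fake_clock = itertools.count(1247905648000)


def load_batch(index, n):
    # Hypothetical loader: evid is fixed per record, curts changes on every run.
    for i in range(n):
        index.add({"evid": str(501 + i), "curts": next(fake_clock)})


by_evid = ToyIndex("evid")
load_batch(by_evid, 100)
load_batch(by_evid, 100)   # same 100 evid values: every doc is replaced

by_curts = ToyIndex("curts")
load_batch(by_curts, 100)
load_batch(by_curts, 100)  # 200 distinct curts values: nothing is replaced

print(by_evid.num_docs(), by_curts.num_docs())
```

With evid as the key the second run replaces the first 100 docs, so the count stays at 100; only when the key really is the changing timestamp field does the count reach 200.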

: > > : Here, i specified 20 fields in schema.xml file. the unique field i set was,
: > > : currentTimeStamp field.
: > > : So, when i run the loader program (which loads xml data into solr) it creates
: > > : currentTimestamp value...and loads into solr.
: > > :
: > > : For this situation,
: > > : i stopped the loader program, after 100 records indexed into solr.
: > > : Then again, i run the loader program for the SAME 100 records to indexed
: > > : means,
: > > : the solr results 100, rather than 200.
: > > :
: > > : Because, i set currentTimeStamp field as uniqueField. So i expect the result
: > > : as 200, if i run again the same 100 records...
: > > :
: > > : Any suggestions please...




-Hoss


Re: solr indexing on same set of records with different value of unique field, not working fine.

Posted by noor <no...@opentechindia.com>.
Sorry, schema.xml file is here in this mail...



Re: solr indexing on same set of records with different value of unique field, not working fine.

Posted by noor <no...@opentechindia.com>.
FYI, the schema.xml file is attached.
A snippet of the add-doc XML I post is:
<add>
   <doc>
         <field name="evid">501</field>
         <field name="ssid">ESQ.VISION.A72</field>
         <field name="evnum">201</field>
         <field name="evtext">CpuLoopEnd Process=$Z4B1 CpuPin=0,992 
Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM 
CpuBusy=0 MemPage=24 User=50,10</field>
         <field name="proc">\VEGAS.$QQDS</field>
         <field name="layer">PLGOVNPM</field>
         <field name="evtime">2008-10-07T03:00:30.0Z</field>
         <field name="logtime">2008-10-07T10:02:27.95Z</field>
         <field name="curts">1247905648000</field>
    </doc>
    .....
</add>

I just put the current timestamp, as a long value, into the add-doc XML 
before loading it into Solr.
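
For readers who want to reproduce this step, here is a minimal Python sketch of how such an add message could be generated. It is not the loader program from this thread: `build_add_xml` and its `now_ms` parameter are hypothetical, and it assumes that curts is the field carrying the load-time timestamp in milliseconds, which is inferred from the example value above rather than stated.

```python
import time
import xml.etree.ElementTree as ET


def build_add_xml(records, now_ms=None):
    """Return an <add> message (as a string) for a list of field dicts."""
    root = ET.Element("add")
    for record in records:
        doc = ET.SubElement(root, "doc")
        fields = dict(record)
        # Assumption: curts holds the load-time timestamp in milliseconds.
        stamp = now_ms if now_ms is not None else int(time.time() * 1000)
        fields["curts"] = str(stamp)
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Fixed timestamp so the output is reproducible.
xml_message = build_add_xml(
    [{"evid": "501", "ssid": "ESQ.VISION.A72", "evnum": "201"}],
    now_ms=1247905648000,
)
print(xml_message)
```

Note that only curts changes from one run to the next in this sketch; evid and the other fields come straight from the source record and stay the same on every run.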




Re: solr indexing on same set of records with different value of unique field...not working...

Posted by Chris Hostetter <ho...@fucit.org>.
I'm not really understanding how you could get the situation you describe 
... which suggests that one (or both) of us don't understand exactly what 
happened.

if you can post the actual schema.xml file you used and an example of the 
input you indexed perhaps we can spot the discrepancy.

FWIW: using a timestamp as a uniqueKey doesn't make much sense ...

 1) if you have heavy parallelization two docs indexed at the exact same 
    time might overwrite each other.
 2) you have no way of ever replacing an existing doc (unless you roll the 
    clock back) in which case there's no advantage to using a uniqueKey -- 
    so you might as well leave it out of your schema (which makes indexing 
    slightly faster) 
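
Point 1 can be illustrated with a small Python sketch. Nothing here is Solr code: `distinct_keys` and `make_coarse_clock` are hypothetical helpers, and the clock is a fake one that deliberately hands the same millisecond value to several docs in a row, standing in for parallel indexing.

```python
def make_coarse_clock(start_ms, docs_per_tick):
    """Fake millisecond clock that advances once every docs_per_tick calls."""
    calls = 0

    def clock_ms():
        nonlocal calls
        value = start_ms + calls // docs_per_tick
        calls += 1
        return value

    return clock_ms


def distinct_keys(clock_ms, n_docs):
    """Count distinct key values when each doc takes its key from clock_ms()."""
    return len({clock_ms() for _ in range(n_docs)})


# 100 docs, 4 of them landing in each millisecond (an assumed rate).
surviving = distinct_keys(make_coarse_clock(1247905648000, 4), 100)
print(surviving)
```

Docs that share a millisecond share a key, so under these assumptions only one doc per millisecond would survive in the index.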

: I need to run around 10 million records to index, by solr.
: I has nearly 2lakh records, so i made a program to looping it till 10 million.
: Here, i specified 20 fields in schema.xml file. the unique field i set was,
: currentTimeStamp field.
: So, when i run the loader program (which loads xml data into solr) it creates
: currentTimestamp value...and loads into solr.
: 
: For this situation,
: i stopped the loader program, after 100 records indexed into solr.
: Then again, i run the loader program for the SAME 100 records to indexed
: means,
: the solr results 100, rather than 200.
: 
: Because, i set currentTimeStamp field as uniqueField. So i expect the result
: as 200, if i run again the same 100 records...
: 
: Any suggestions please...



-Hoss