Posted to solr-user@lucene.apache.org by Noor <no...@opentechindia.com> on 2009/07/21 08:33:34 UTC
solr indexing on same set of records with different value of unique
field...not working...
hi,
I need to index around 10 million records with Solr.
I have nearly 2 lakh (200,000) records, so I wrote a program that loops
over them until it reaches 10 million.
I specified 20 fields in the schema.xml file. The unique field I set
was the currentTimeStamp field.
So, when I run the loader program (which loads XML data into Solr), it
creates a currentTimeStamp value and loads it into Solr.
To test this, I stopped the loader program after 100 records had been
indexed into Solr.
Then I ran the loader program again for the SAME 100 records, and Solr
reports 100 documents rather than 200.
Because I set the currentTimeStamp field as the unique field, I expect
the result to be 200 if I run the same 100 records again...
Any suggestions please...
regards,
Noor
Re: solr indexing on same set of records with different value of
unique field, not working fine.
Posted by Chris Hostetter <ho...@fucit.org>.
: Sorry, schema.xml file is here in this mail...
in the schema.xml file you attached, the uniqueKey field is "evid"
you only provided one example of the type of input you are indexing, and
in that example...
: > <field name="evid">501</field>
...but in your original email (see below) you said you were using a
timestamp field as the uniqueKey, and you didn't understand why reindexing
the same 100 docs twice didn't give you 200 docs. that example uniqueKey
value isn't a timestamp, so i don't really understand what you're talking
about. if you index that doc over and over with the schema.xml you sent,
then it's constantly going to replace itself over and over again because
the uniqueKey field (evid) is the same (501) every time.
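A minimal sketch of the replace-on-same-key behaviour described above. It is a toy model built on a plain Python dict, not Solr's API: `ToyIndex` and `load` are hypothetical names, and the second `curts` value is made up for illustration. It only shows why loading the same 100 `evid` values twice leaves 100 docs, even when the timestamp field differs between runs.

```python
class ToyIndex:
    """Toy stand-in for an index keyed on a uniqueKey field (not Solr itself)."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A doc whose key value already exists replaces the stored doc.
        self.docs[doc[self.unique_key]] = doc

    def num_docs(self):
        return len(self.docs)


def load(index, first_evid, count, curts):
    """Add `count` docs with sequential evid values and the given curts."""
    for evid in range(first_evid, first_evid + count):
        index.add({"evid": evid, "curts": curts})


index = ToyIndex(unique_key="evid")
load(index, first_evid=501, count=100, curts=1247905648000)
# Second run: same evid values, different (illustrative) timestamp.
load(index, first_evid=501, count=100, curts=1247905649000)
print(index.num_docs())  # 100: the second run replaced the first
```

The stored doc for evid 501 ends up carrying the second run's `curts`, which is the "replace itself" effect: the timestamp changes, the document count does not.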
: > > : Here, i specified 20 fields in schema.xml file. the unique field i set was,
: > > : currentTimeStamp field.
: > > : So, when i run the loader program (which loads xml data into solr) it creates
: > > : currentTimestamp value...and loads into solr.
: > > :
: > > : For this situation,
: > > : i stopped the loader program, after 100 records indexed into solr.
: > > : Then again, i run the loader program for the SAME 100 records to indexed
: > > : means,
: > > : the solr results 100, rather than 200.
: > > :
: > > : Because, i set currentTimeStamp field as uniqueField. So i expect the result
: > > : as 200, if i run again the same 100 records...
: > > :
: > > : Any suggestions please...
-Hoss
Re: solr indexing on same set of records with different value of
unique field, not working fine.
Posted by noor <no...@opentechindia.com>.
Sorry, schema.xml file is here in this mail...
Re: solr indexing on same set of records with different value of
unique field, not working fine.
Posted by noor <no...@opentechindia.com>.
FYI, the schema.xml file is attached.
And the add doc XML snippet is:
<add>
<doc>
<field name="evid">501</field>
<field name="ssid">ESQ.VISION.A72</field>
<field name="evnum">201</field>
<field name="evtext">CpuLoopEnd Process=$Z4B1 CpuPin=0,992
Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM
CpuBusy=0 MemPage=24 User=50,10</field>
<field name="proc">\VEGAS.$QQDS</field>
<field name="layer">PLGOVNPM</field>
<field name="evtime">2008-10-07T03:00:30.0Z</field>
<field name="logtime">2008-10-07T10:02:27.95Z</field>
<field name="curts">1247905648000</field>
</doc>
.....
</add>
I just put the current timestamp's long value into the add doc XML (the
curts field) before loading it into Solr.
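If the goal is for each pass over the same records to add new documents, the uniqueKey value itself (evid in the attached schema, per the reply in this thread) has to differ per pass. A rough sketch of one way a loader could do that, under assumptions: `build_add_xml`, `pass_number` and `id_stride` are hypothetical names, evid is assumed to be an integer, and `id_stride` is assumed to be larger than any original evid. It only builds the `<add>` message; posting it to Solr is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(records, pass_number, id_stride):
    """Build an <add> message, shifting evid by pass so each pass gets new keys."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            if name == "evid":
                # Assumption: evid is an integer smaller than id_stride.
                value = int(value) + pass_number * id_stride
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# One record with a few of the fields from the snippet above.
records = [{"evid": "501", "ssid": "ESQ.VISION.A72", "evnum": "201",
            "curts": "1247905648000"}]

first_pass = build_add_xml(records, pass_number=0, id_stride=1_000_000)
second_pass = build_add_xml(records, pass_number=1, id_stride=1_000_000)
print(first_pass)
print(second_pass)
```

With these inputs the first pass keeps evid 501 and the second pass emits evid 1000501, so the two passes no longer share a key.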
Re: solr indexing on same set of records with different value of
unique field...not working...
Posted by Chris Hostetter <ho...@fucit.org>.
I'm not really understanding how you could get the situation you describe
... which suggests that one (or both) of us doesn't understand exactly what
happened.
if you can post the actual schema.xml file you used and an example of the
input you indexed perhaps we can spot the discrepancy.
FWIW: using a timestamp as a uniqueKey doesn't make much sense ...
1) if you have heavy parallelization two docs indexed at the exact same
time might overwrite each other.
2) you have no way of ever replacing an existing doc (unless you roll the
clock back) in which case there's no advantage to using a uniqueKey --
so you might as well leave it out of your schema (which makes indexing
slightly faster)
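Point 1 can be illustrated with a small sketch, assuming the key is a millisecond timestamp like the long value a loader might generate. `millis_now` and `surviving_docs` are hypothetical helpers, and a single tight loop stands in for parallel indexing; the dict models "last doc with a given key wins", it is not Solr itself.

```python
import time


def millis_now():
    """Current time as a millisecond timestamp (the assumed key format)."""
    return int(time.time() * 1000)


def surviving_docs(keys):
    """Count docs left when a later doc with the same key replaces an earlier one."""
    index = {}
    for position, key in enumerate(keys):
        index[key] = position
    return len(index)


# Generate many keys quickly; several land in the same millisecond.
keys = [millis_now() for _ in range(10_000)]
survivors = surviving_docs(keys)
print(f"generated {len(keys)} keys, {survivors} docs would survive")
```

The exact survivor count depends on the machine, but it will be far below the number of keys generated, since many keys are produced within each millisecond.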
-Hoss