Posted to solr-user@lucene.apache.org by Giovanni De Stefano <gi...@gmail.com> on 2009/03/17 16:18:03 UTC
Solr: delta-import, help needed
Hello all,
I have a table TEST in an Oracle DB with the following columns: URI
(varchar), CONTENT (varchar), CREATION_TIME (date).
The primary key in both the DB and Solr is URI.
Here is my data-config.xml:
<dataConfig>
  <dataSource
      driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@localhost:1521/XE"
      user="username"
      password="password"
  />
  <document name="Test">
    <entity
        name="test_item"
        pk="URI"
        query="select URI,CONTENT from TEST"
        deltaQuery="select URI,CONTENT from TEST where
          TO_CHAR(CREATION_TIME,'YYYY-MM-DD HH:MI:SS') >
          '${dataimporter.last_index_time}'"
    >
      <field column="URI" name="uri"/>
      <field column="CONTENT" name="content"/>
    </entity>
  </document>
</dataConfig>
The problem is that any time I perform a delta-import, the index keeps being
populated as if new documents were added. In other words, I am not able to
UPDATE an existing document or REMOVE a document that is no longer in the
DB.
What am I missing? How should I specify my deltaQuery?
Thanks a lot in advance!
Giovanni
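A minimal sketch of how the entity could be written, assuming a DIH version that supports `deltaImportQuery` and the `${dataimporter.delta.<column>}` variables (Solr 1.4 and later). Here `deltaQuery` returns only the primary keys of rows changed since the last import, and `deltaImportQuery` fetches each of those rows by key. The timestamp comparison is also done on DATE values rather than strings: DIH records `last_index_time` with a 24-hour clock (`yyyy-MM-dd HH:mm:ss`), while Oracle's `HH` format element is a 12-hour clock, so the sketch parses the value with `HH24`. Existing documents are only replaced, not duplicated, if `uri` is declared as the `<uniqueKey>` in schema.xml.

```xml
<dataConfig>
  <dataSource
      driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@localhost:1521/XE"
      user="username"
      password="password"/>
  <document name="Test">
    <!-- deltaQuery: only the primary keys of rows changed since the last import.
         deltaImportQuery: fetches one changed row by its primary key.
         TO_DATE with HH24 matches the 24-hour format of last_index_time. -->
    <entity
        name="test_item"
        pk="URI"
        query="select URI, CONTENT from TEST"
        deltaQuery="select URI from TEST
                    where CREATION_TIME > TO_DATE('${dataimporter.last_index_time}',
                                                  'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="select URI, CONTENT from TEST
                          where URI = '${dataimporter.delta.URI}'">
      <field column="URI" name="uri"/>
      <field column="CONTENT" name="content"/>
    </entity>
  </document>
</dataConfig>
```

A modified row is only picked up if the column checked by `deltaQuery` (here CREATION_TIME) is changed whenever the row is updated.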
Re: Solr: delta-import, help needed
Posted by Giovanni De Stefano <gi...@gmail.com>.
Hello Paul,
thank you for your feedback. I will ask for an expiration date to be added to
the DB and run a process that updates the index accordingly.
Cheers,
Giovanni
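A sketch of the index-side step of such a process, assuming the expiration date is also indexed into a hypothetical Solr date field named `expiration_time`. Solr's XML update format accepts a delete-by-query message, which can be posted to the `/update` handler and followed by a separate `<commit/>` message so the removals become visible to searches.

```xml
<!-- Message 1: delete every document whose (hypothetical) expiration_time
     field lies in the past; NOW is Solr's date-math keyword. -->
<delete>
  <query>expiration_time:[* TO NOW]</query>
</delete>

<!-- Message 2, posted separately: make the deletions visible. -->
<commit/>
```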
On 3/18/09, Noble Paul നോബിള് नोब्ळ् <no...@gmail.com> wrote:
>
> It is not possible, using DIH, to query Solr for details and find out
> which items were deleted.
>
> You must keep the IDs of deleted rows in the DB, or just flag the rows as
> deleted.
>
> --Noble
Re: Solr: delta-import, help needed
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
It is not possible, using DIH, to query Solr for details and find out
which items were deleted.
You must keep the IDs of deleted rows in the DB, or just flag the rows as deleted.
--Noble
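One way to apply the "flag them as deleted" approach, sketched with a hypothetical `IS_DELETED` column (0 = live, 1 = deleted) that is not part of the original TEST table. Flagged rows are excluded from the import queries, and their keys are reported through `deletedPkQuery`, which DIH uses during delta-import to remove the matching documents from the index.

```xml
<!-- Hypothetical soft-delete column: IS_DELETED = 1 marks a row as removed.
     Flagged rows are skipped by the import queries, and deletedPkQuery
     returns their keys so DIH removes them from the index. -->
<entity
    name="test_item"
    pk="URI"
    query="select URI, CONTENT from TEST where IS_DELETED = 0"
    deltaQuery="select URI from TEST
                where IS_DELETED = 0
                  and CREATION_TIME > TO_DATE('${dataimporter.last_index_time}',
                                              'YYYY-MM-DD HH24:MI:SS')"
    deltaImportQuery="select URI, CONTENT from TEST
                      where URI = '${dataimporter.delta.URI}'"
    deletedPkQuery="select URI from TEST where IS_DELETED = 1">
  <field column="URI" name="uri"/>
  <field column="CONTENT" name="content"/>
</entity>
```

This `deletedPkQuery` returns every flagged row on each run. Deleting a key that is already gone from the index is harmless, but recording when a row was flagged would let the query return only recent deletions.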
On Wed, Mar 18, 2009 at 2:46 PM, Giovanni De Stefano
<gi...@gmail.com> wrote:
> Hello Paul,
>
> thank you for your reply.
>
> The UPDATE in fact works fine: I only had to update the CREATION_TIME on the
> DB :-)
>
> Regarding the deletedPkQuery, I understand it has to return the primary keys
> that should be removed from the index (because they have been removed from
> the DB) but I don't have any "deleted" flag on the DB.
>
> Basically the deletedPkQuery should be something like "select URI
> *from_the_current_index* where URI not in (select URI from TEST)".
>
> That is, it should return the subset of primary keys that are currently in
> the index but no longer in the DB. Is this possible?
>
> I am no DB expert, so ANY tip is very welcome!
>
> Thanks,
> Giovanni
--
--Noble Paul
Re: Solr: delta-import, help needed
Posted by Giovanni De Stefano <gi...@gmail.com>.
Hello Paul,
thank you for your reply.
The UPDATE in fact works fine: I only had to update the CREATION_TIME on the
DB :-)
Regarding the deletedPkQuery, I understand it has to return the primary keys
that should be removed from the index (because they have been removed from
the DB) but I don't have any "deleted" flag on the DB.
Basically the deletedPkQuery should be something like "select URI
*from_the_current_index* where URI not in (select URI from TEST)".
That is, it should return the subset of primary keys that are currently in
the index but no longer in the DB. Is this possible?
I am no DB expert, so ANY tip is very welcome!
Thanks,
Giovanni
On 3/18/09, Noble Paul നോബിള് नोब्ळ् <no...@gmail.com> wrote:
>
> Are you sure your schema.xml has a <uniqueKey> field? You need one to UPDATE
> docs.
>
> To remove deleted docs, you must have a deletedPkQuery attribute in the root
> entity.
Re: Solr: delta-import, help needed
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
Are you sure your schema.xml has a <uniqueKey> field? You need one to UPDATE docs.
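For reference, a schema.xml fragment declaring the Solr field that receives the URI column as the unique key, assuming the `string` and `text` field types from Solr's example schema are defined. With a `<uniqueKey>`, adding a document whose `uri` already exists replaces the old document instead of creating a duplicate.

```xml
<!-- schema.xml fragment; field types "string" and "text" are assumed to be
     defined in <types>, as in Solr's example schema. -->
<fields>
  <!-- Receives the URI column: one Solr document per DB row. -->
  <field name="uri" type="string" indexed="true" stored="true" required="true"/>
  <field name="content" type="text" indexed="true" stored="true"/>
</fields>

<!-- Documents with the same uri overwrite each other instead of piling up. -->
<uniqueKey>uri</uniqueKey>
```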
To remove deleted docs, you must have a deletedPkQuery attribute in the root entity.
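A sketch of `deletedPkQuery` on the root entity (the `<entity>` directly under `<document>`), assuming a hypothetical tombstone table `TEST_DELETED(URI, DELETED_TIME)` that the application, or a DB trigger, fills whenever a row is removed from TEST. Only keys deleted since the last import are returned, and DIH removes the matching documents during delta-import. The `deltaImportQuery` attribute assumes Solr 1.4 or later.

```xml
<document name="Test">
  <!-- Root entity: sits directly under <document>, so its deletedPkQuery
       is used. TEST_DELETED is a hypothetical table of removed URIs. -->
  <entity
      name="test_item"
      pk="URI"
      query="select URI, CONTENT from TEST"
      deltaQuery="select URI from TEST
                  where CREATION_TIME > TO_DATE('${dataimporter.last_index_time}',
                                                'YYYY-MM-DD HH24:MI:SS')"
      deltaImportQuery="select URI, CONTENT from TEST
                        where URI = '${dataimporter.delta.URI}'"
      deletedPkQuery="select URI from TEST_DELETED
                      where DELETED_TIME > TO_DATE('${dataimporter.last_index_time}',
                                                   'YYYY-MM-DD HH24:MI:SS')">
    <field column="URI" name="uri"/>
    <field column="CONTENT" name="content"/>
  </entity>
</document>
```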
--
--Noble Paul