Posted to solr-user@lucene.apache.org by "Justin L." <jt...@gmail.com> on 2013/03/19 00:49:51 UTC

DIH silently ignoring a record

Every time I do an import, DataImportHandler is not importing 1 row from my
database.

I have 3 entities each defined with a single query. I have confirmed, by
looking at totals from solr as well as comparing a "*:*" query to direct db
queries-- exactly 1 row is missing every time. And it's the same row- the
first row of one of my entities when sorted by primary key. The other two
entities are fully imported without trouble.

There are no errors in the log- even when DIH logging is turned up to FINE.
When I alter the query to retrieve only the mysterious record, it shows up
as "Fetched: 1 Skipped: 0 Processed: 1". But when I do a query for *:* it
returns 0 documents.

Ready for a twist? The DIH query for this entity does not have an ORDER BY
clause- when I add one to sort by primary key DESC it imports all of the
rows for that entity, including the mysterious record.

Ready to have your mind blown? I am using the alternative method for doing
delta imports (see query below). When I make clean=false, and update the
timestamp on the mysterious record- yup- it gets imported properly.



Because I have the ORDER BY DESC hack, I can get by and live to fight
another day. But I thought someone might like to know this because I think
I am hitting a bug in DIH- specifically, something after the querying but
before the posting to solr. If someone familiar with DIH innards wants to
suggest where I should look or how to step through it, I'd be willing to
take a look.

xoxo,
Justin


* Fun facts:
Solr 4.0
Oracle 11g
The mysterious record's id is "000001"
I use field elements to rename the columns rather than in-the-sql aliases
because of a problem I had with them earlier. But I will try changing that.


* Alternative delta import method:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport


* DIH query that should import mysterious record:

select organization_name, organization_id, address
from organization o
join rolodex r on r.rolodex_id = o.contact_address_id
and r.sponsor_address_flag = 'N'
and r.actv_ind = 'Y'
where '${dataimporter.request.clean}' = 'true'
or to_char(o.update_timestamp,'YYYY-MM-DD HH24:MI:SS') >
'${dataimporter.organization.last_index_time

Re: DIH silently ignoring a record

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Okay, thanks for clarifying.





-- 
Regards,
Shalin Shekhar Mangar.

Re: DIH silently ignoring a record

Posted by "Justin L." <jt...@gmail.com>.
Shalin,

Thanks for your questions- the mystery is solved this morning. My "unique"
key was only unique within an entity and not between them. There was only
one instance of overlap- the no-longer mysterious record and its
doppelganger.
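
To illustrate why one record vanished without any error, here is a toy sketch of an index keyed on a unique key. It is not DIH or Solr code; it only models the documented behaviour that adding a document whose unique key already exists replaces the earlier one. The entity name "other_entity" and all row values are made up for illustration.

```python
# Toy model of an index keyed on a unique key: adding a document whose key
# already exists replaces the earlier document (last write wins).
def index_documents(docs, unique_key="id"):
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc  # silently overwrites any earlier doc
    return index


# Hypothetical rows. "organization" comes from the query in this thread;
# "other_entity" and the field values are invented for the example.
organization_rows = [
    {"id": "000001", "entity": "organization", "name": "Org A"},
    {"id": "000002", "entity": "organization", "name": "Org B"},
]
other_rows = [
    {"id": "000001", "entity": "other_entity", "name": "Doppelganger"},
    {"id": "000101", "entity": "other_entity", "name": "Something else"},
]

# Four rows go in, but only three documents survive, and nothing is
# reported as skipped or failed.
index = index_documents(organization_rows + other_rows)
print(len(index))
print(index["000001"]["entity"])
```

Which of the two rows survives depends only on which one is indexed last, which would be consistent with the record appearing when its entity was imported on its own.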

All the other symptoms were side effects from how I was troubleshooting.
For example, if I did a full import, the doppelganger record (which I didn't
know about) would be imported- but my test query was only looking for the
one that didn't make it in. However, if I imported only that entity, it
would, as expected, update the index record and things would appear fine to
me.

So, no bug. Just plain old bad/narrow troubleshooting combined with
coincidence (only record not getting imported is first row, etc).
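
For anyone hitting the same symptom, a check along these lines might surface the overlap sooner. This is only a sketch under assumptions: the rows are hypothetical, and find_key_collisions and namespaced_id are illustrative helpers, not part of DIH or Solr. It looks for ids shared by more than one entity, and shows one possible way to make ids unique across entities by prefixing them with the entity name.

```python
from collections import defaultdict


def find_key_collisions(rows_by_entity, key="id"):
    """Return {key value: [entities]} for values used by more than one entity."""
    seen = defaultdict(set)
    for entity, rows in rows_by_entity.items():
        for row in rows:
            seen[row[key]].add(entity)
    return {value: sorted(entities)
            for value, entities in seen.items()
            if len(entities) > 1}


def namespaced_id(entity, raw_id):
    """Hypothetical fix: make the id unique across entities with a prefix."""
    return f"{entity}-{raw_id}"


# Hypothetical data; only "organization" appears in the original query.
rows_by_entity = {
    "organization": [{"id": "000001"}, {"id": "000002"}],
    "other_entity": [{"id": "000001"}, {"id": "000101"}],
}

collisions = find_key_collisions(rows_by_entity)
print(collisions)

# After prefixing, no id is shared between entities.
prefixed = {
    entity: [{"id": namespaced_id(entity, row["id"])} for row in rows]
    for entity, rows in rows_by_entity.items()
}
print(find_key_collisions(prefixed))
```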

-justin



Re: DIH silently ignoring a record

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
That does sound perplexing.

Justin, can you tell us which field in the query is your record id? What is
the record id's type in database and in solr schema? What is your unique
key and its type in solr schema?





-- 
Regards,
Shalin Shekhar Mangar.