Posted to solr-user@lucene.apache.org by 5ton3 <oy...@gmail.com> on 2014/10/31 13:11:02 UTC

The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

Hi!

Not sure if this is a problem or if I just don't understand the debug
response, but it seems somewhat odd to me.
The "main" entity can have multiple BLOB documents. I'm using Tika Entity
Processor to retrieve the body (plaintext) from these documents and put the
result in a multivalued field, "filedata".  The data-config looks like this:


It seems to work properly, but when I debug the data import, the query
against TABLE2 for the BLOB column ("FILEDATA_BIN") appears to be executed
once for document #1, which is correct, but twice for document #2, three
times for document #3, and so on.
I.e. for document #1:

And for document #2:

The result seems correct, i.e. it doesn't duplicate the filedata. But why
does it query the DB twice for document #2? Any ideas? Is something
wrong in my config?



--
View this message in context: http://lucene.472066.n3.nabble.com/The-exact-same-query-gets-executed-n-times-for-the-nth-row-when-retrieving-body-plaintext-from-BLOB-r-tp4166822.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

Posted by Erick Erickson <er...@gmail.com>.
Your message looks like it's missing something (snapshots?). The
e-mail for this list generally strips attachments, so you'll
have to put them somewhere else and link to them if you
want us to see them.

Best,
Erick

On Fri, Oct 31, 2014 at 5:11 AM, 5ton3 <oy...@gmail.com> wrote:
> Hi!
>
> Not sure if this is a problem or if I just don't understand the debug
> response, but it seems somewhat odd to me.
> The "main" entity can have multiple BLOB documents. I'm using Tika Entity
> Processor to retrieve the body (plaintext) from these documents and put the
> result in a multivalued field, "filedata".  The data-config looks like this:
>
>
> It seems to work properly, but when I debug the data import, it seems that
> the query on TABLE2 on the BLOB column ("FILEDATA_BIN") gets executed 1 time
> for document #1, which is correct, but 2 times for document #2, 3 times for
> document #3, and so on.
> I.e. for document #1:
>
> And for document #2:
>
> The result seems correct, ie. it doesn't duplicate the filedata. But why
> does it query the DB two times for document #2? Any ideas? Maybe something
> wrong in my config?