Posted to user@manifoldcf.apache.org by Bert van Hoesel <bh...@scamander.com> on 2013/02/11 09:57:38 UTC

Oracle jdbc queried documents not read and not ingested into solr

Hi,

I have set up an Oracle JDBC repository and a Solr output. Obviously I want
to get the Oracle documents ingested into the Solr output. But when
running a job to get this done, nothing gets read or ingested into the
Solr output.

The Solr output is working, since I can get documents ingested into the
Solr output when using a file system repository.

The Oracle queries are correct as far as I can check. The seeding and
data queries work when issued directly to Oracle.

But the Oracle - Solr combination is not working. The Simple History
shows the job is started and the two external queries execute without
error. But no 'read' and 'ingest' actions show up in the history. The
document count in the job status shows the number of documents and the
number processed. These are the expected numbers.

As far as I can see in the Oracle database, the queries are indeed
executed by the database. When I copy the external queries from the
history, including the bind data shown, and execute them (after a little
editing) in a direct Oracle interface, they do retrieve the expected rows.

What am I missing to get the documents (rows) from Oracle ingested into 
Solr?

Thanks in advance.

Regards,

Bert van Hoesel.

PS:
Please find below the seeding and data queries, both from the job
definition and from the history.

SEED:
=====
select mi.menu_id as "$(IDCOLUMN)"
from   sn_menu_items mi
where  1=1
and    mi.menu_type     = 'W'
and    greatest(mi.dt_created, nvl(mi.dt_updated, mi.dt_created)) >
       to_date( '1970/01/01:00:00:00', 'yyyy/mm/dd:hh24:mi:ss') + round($(STARTTIME)/86400000)
and    greatest(mi.dt_created, nvl(mi.dt_updated, mi.dt_created)) <=
       to_date( '1970/01/01:00:00:00', 'yyyy/mm/dd:hh24:mi:ss') + round($(ENDTIME)/86400000)
and    greatest(mi.dt_created, nvl(mi.dt_updated, mi.dt_created)) >
       sysdate - 200 /* just for testing */
connect by prior mi.menu_id = mi.top_menu_id
start with mi.menu_id in (185837,275)

DATA:
=====
select mi.menu_id "$(IDCOLUMN)"
,      'Thiz iz id: ' || to_char(mi.menu_id) "$(DATACOLUMN)"
,      '<a href="http://www.somesite.com/xx/xx_wiki.ht_show?p_id='||mi.wiki_id||chr(38)||'p_fld='||mi.top_menu_id||'">'||mi.display_text||'</a>' "$(URLCOLUMN)"
from   sn_menu_items mi
,      sn_wiki_item  wi
where  mi.menu_id    in $(IDLIST)
and    mi.wiki_id     = wi.id

The external queries from the history:

SEED:
=====
select mi.menu_id as "lcf__id" from sn_menu_items mi wher...
e 1=1 and mi.menu_type = 'W' and greatest(mi.dt_...
created, nvl(mi.dt_updated, mi.dt_created)) > to_date( '1970...
/01/01:00:00:00', 'yyyy/mm/dd:hh24:mi:ss') + round(?/86400000...
) and greatest(mi.dt_created, nvl(mi.dt_updated, mi.dt_cr...
eated)) <= to_date( '1970/01/01:00:00:00', 'yyyy/mm/dd:hh24:m...
i:ss') + round(?/86400000) and greatest(mi.dt_created, nv...
l(mi.dt_updated, mi.dt_created)) > sysdate - 200 connect by...
prior mi.menu_id = mi.top_menu_id start with mi.menu_id in ...
(185837,275); arguments = (0,1360570171797)

DATA:
=====
select mi.menu_id "lcf__id" , 'Thiz iz id: ' || to_char...
(mi.menu_id) "lcf__data" , '<a href="http://www.some...
site.com/xx/xx_wiki.ht_show?p_id='||mi.wiki_id||chr(38)||'p...
_fld='||mi.top_menu_id||'">'||mi.display_text||'</a>' "lcf__u...
rl" from sn_menu_items mi , sn_wiki_item wi where ...
mi.menu_id in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?...
,?,?,?,?,?,?,?,?,?,?,?,?) and mi.wiki_id = wi.id; ar...
guments = ('185904','185641','185885','188488','184738','1856...
12','185853','185889','186158','185117','184723','185901','18...
6249','185263','185886','185229','190366','185103','185900','...
185892','184696','185613','185104','185903','184822','185116'...
,'185979','185866','185896','157508','185893','186185','185902')

Re: next step in implementing manifold: user authentication

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi Karl,

Thanks. That was the missing link I was looking for. So far I had not come across that argument name. The way I checked that it works was the 'negation' way (not sure if the term is used correctly): I did not know what was needed, so I presumed that if it is not set, it will not authorize. And that seemed to work ;-) .

Thanks again. On to the next step.

Regards,

Bert.

Re: next step in implementing manifold: user authentication

Posted by Karl Wright <da...@gmail.com>.

Do you mean, in which URL argument does the Apache Solr 4.x Plugin
expect to see the authenticated user ID?  I would have thought you'd
already need that to confirm that everything works.  But in case you
didn't find it anywhere, it's "AuthenticatedUserName".

Karl
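
A minimal sketch of what passing that argument could look like from a front end, assuming the search request is a plain HTTP GET against Solr. The endpoint (host, port, core name) and the sample user name are placeholders that do not come from this thread; only the argument name "AuthenticatedUserName" does.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port, and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_search_url(query, authenticated_user):
    """Return a Solr select URL that carries the authenticated user name."""
    if not authenticated_user:
        # Refuse to build a request that silently drops the user name.
        raise ValueError("authenticated user name is required")
    params = {
        "q": query,
        # Argument name as given in the thread.
        "AuthenticatedUserName": authenticated_user,
    }
    # urlencode percent-encodes reserved characters such as '@'.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("wiki", "bert@EXAMPLE.COM")
print(url)
```

The user name is percent-encoded on the way out, so a name in user@REALM form arrives at the plugin intact after URL decoding.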

Re: next step in implementing manifold: user authentication

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi Karl,

The construct this way is clear. I hoped it would be more 'transparent' to the underlying processes.

The next question that arises is: what is the name of the (environment) variable in which ManifoldCF expects the authenticated username? This is for me the 'missing' link in the setup. I have no clue what (as an example) to 'append' to the URL to convey the username to ManifoldCF. Or is this configurable? If so, where can I find it? So far it has escaped my attention.

Regards,

Bert.

Re: next step in implementing manifold: user authentication

Posted by Karl Wright <da...@gmail.com>.

Hi Bert,

Typically the authenticated user name would get passed from
mod-auth-kerb to Tomcat (or whatever the app server is you are running
solr under) as an argument, maybe appended to the url.  It's going to
be up to you to figure out how to do that.  Others may have more
concrete suggestions.

Karl
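
A minimal sketch of the handover step described above, assuming the front end runs as a CGI or WSGI application behind Apache and that the authentication module exposes the user through the standard REMOTE_USER variable. How Blacklight itself reads that variable is outside this sketch, and the function names are hypothetical.

```python
def authenticated_user(environ):
    """Return the user name set by the web server's auth module, or None."""
    user = environ.get("REMOTE_USER", "").strip()
    return user or None


def search_arguments(environ, query):
    """Build the arguments for the downstream search request.

    When no user is authenticated, no user argument is added, so the
    downstream service sees an anonymous request.
    """
    args = {"q": query}
    user = authenticated_user(environ)
    if user is not None:
        # Argument name as given elsewhere in this thread.
        args["AuthenticatedUserName"] = user
    return args


print(search_arguments({"REMOTE_USER": "bert@EXAMPLE.COM"}, "wiki"))
print(search_arguments({}, "wiki"))
```

Leaving the argument out for anonymous requests matches the observation made later in the thread that an unset user name results in no authorized documents being returned.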

Re: next step in implementing manifold: user authentication

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi Karl,

To be more precise: we are trying to get a 'slightly' customized Blacklight front end to connect to Solr via ManifoldCF with authorization (obviously). Blacklight is running from within Apache, so that would be a plus for mod-auth-kerb. But ManifoldCF is running from within a Tomcat instance. In this construct it is still not clear to me whether and how this is going to work.
Technically, I am still missing the link between the login on Apache and the authentication / user 'handover' to the Tomcat environment for ManifoldCF.

So if anyone can pitch in and describe their solution, it would be much appreciated.

Regards,

Bert.

Re: next step in implementing manifold: user authentication

Posted by Karl Wright <da...@gmail.com>.

Hi Bert,

Others, I hope, will chime in on this thread and let you know what
precise solutions they have adopted.  But, in general, the solution
you use will depend on the environment you intend to run in.  As you
point out, JAAS authentication is an option, should you be able to
find an appropriate JAAS plugin that does what you want.  If you want
to do things via the Apache web server, I'd look at mod-auth-kerb
rather than mod-authz.  Others, no doubt, have less generic
suggestions.

Karl

next step in implementing manifold: user authentication

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi,

At the moment it is for the most part clear how to install, configure and populate ManifoldCF and Solr with authorized data. Using the added ManifoldCF 'search' URL I can see that I do not have access to any 'authorized' documents. Indeed, I only see the non-authorized documents.

Thus the next step would be an authentication mechanism on top of this. I have been looking 'around' but was not able to find enough pointers on how to accomplish this. Two 'obvious' paths seem to be available: JAAS or Apache mod_authz. But maybe other solutions exist. The most preferable options are those with minimal (Java) programming.

The biggest issue at the moment is that I cannot figure out how authentication data is propagated into ManifoldCF.

Can anybody point me to some how-tos or documentation of some kind on how to accomplish this authentication on top of ManifoldCF?

Thanks in advance.

Regards,

Bert.

Re: Oracle jdbc queried documents not read and not ingested into solr

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi Karl,

To answer my own question (as usual :-X just after the fact).

I was looking at manifoldcf.log in the wrong directory.

Solved the problem, which was a wrongly formed URL.

Thanks.

Regards,

Bert.
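
The thread does not say what exactly was wrong with the URL. One plausible reading, and only an assumption, is that the "$(URLCOLUMN)" expression in the DATA query builds an HTML anchor element rather than a bare URL. A minimal sketch of a check that tells the two apart, using a hypothetical helper name:

```python
from urllib.parse import urlsplit


def looks_like_url(value):
    """Return True if value is a bare http(s) URL, not markup around one."""
    value = value.strip()
    # Markup characters and embedded spaces do not belong in a bare URL.
    if any(ch in value for ch in ' <>"'):
        return False
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Shape of the value the DATA query above would produce (ids are made up).
anchor = '<a href="http://www.somesite.com/xx/xx_wiki.ht_show?p_id=1&p_fld=2">text</a>'
# The same target as a bare URL.
bare = "http://www.somesite.com/xx/xx_wiki.ht_show?p_id=1&p_fld=2"

print(looks_like_url(anchor))
print(looks_like_url(bare))
```

Running a few sample rows of the data query through a check like this, outside ManifoldCF, is a quick way to see what the URL column really contains.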

Re: Oracle jdbc queried documents not read and not ingested into solr

Posted by Bert van Hoesel <bh...@scamander.com>.

Hi Karl,

Thanks for the response.

Nothing in the manifoldcf.log file; it's completely empty. I even tried setting several properties in properties.xml (org.apache.manifoldcf.misc, org agents, jobs, perf and cache) to INFO or DEBUG. But the log file stays completely empty at all times.

Am I missing another setting to activate the log?

As for the data query: as far as I could tell from the docs, it should be all right. The query was shown at the end of the original email (see below).  What columns other than the described minimal ones (IDCOLUMN, DATACOLUMN, URLCOLUMN and IDLIST) should be present in the query?

Regards,

Bert

Re: Oracle jdbc queried documents not read and not ingested into solr

Posted by Karl Wright <da...@gmail.com>.

Is there anything in the ManifoldCF log?

The other point is that the queries may run but unless you return the
right information in the right columns it isn't much use to
ManifoldCF.  For the seeding query there isn't much that can go wrong,
except maybe for the start/end time clauses.  Might want to look into
that.

Karl
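
A minimal sketch of the arithmetic in the start/end time clauses, assuming $(STARTTIME) and $(ENDTIME) are milliseconds since 1970-01-01 UTC, which is what the bind values in the history (0 and 1360570171797) suggest. It is not the cause reported later in the thread; it only shows that round(ms/86400000) moves each bound to a whole day. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bound_as_written(ms):
    """Mirror to_date('1970/01/01...') + round(ms/86400000) from the query.

    Python's round() and Oracle's ROUND differ only on exact .5 values,
    which does not matter for the sample value used here.
    """
    return EPOCH + timedelta(days=round(ms / MS_PER_DAY))


def bound_exact(ms):
    """Same conversion without rounding to whole days."""
    return EPOCH + timedelta(milliseconds=ms)


end_ms = 1360570171797  # ENDTIME bind value shown in the history
print(bound_as_written(end_ms))
print(bound_exact(end_ms))
```

With this ENDTIME the rounded bound is midnight at the start of 2013-02-11, roughly eight hours before the exact time, so rows changed in between fall outside that run's window. Dropping round() from the SQL expression keeps the fractional day. An Oracle DATE carries no time zone, so whether the stored dates are in UTC is a separate question this sketch does not address.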
