Posted to solr-user@lucene.apache.org by abhayd <aj...@hotmail.com> on 2011/09/18 07:05:42 UTC

DIH error when nested db datasource and file data source

Hi,
I have a requirement where I fetch some data from a database and, based on that
data, pull details from an XML file to index into Solr.

When I try to import, I get the following error.
------------------------------------------------------------------------------------------------------
SEVERE: Exception while processing: topic_tree document : SolrInputDocument[{topic_id=topic_id(1.0)={9000034}, category_level_1=category_level_1(1.0)={Internet Services}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: C:\Projects\att\solr\catalogSOLRSearch.ear\SOLR-HOME\live_meta.xml Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:201)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:594)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:620)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:620)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:266)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:358)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:426)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:407)

-------------------------------------------------------------------------------------------------------

Here is my DIH config file:
<dataConfig>
    <dataSource name="esupport_db"
                type="JdbcDataSource"
                driver="oracle.jdbc.driver.OracleDriver"
                url="xxxxx"
                user="xxxx"
                password="yyyy"
                convertType="true" />

    <dataSource type="FileDataSource" name="video_datasource"/>

    <document>
        <entity name="topic_tree" datasource="esupport_db"
                query="SELECT topic_id, parent_id,
                       REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (display_name, ';'), '[^;]+', 1, 1) AS category_level_1,
                       FROM src_topic
                       START WITH parent_id = 9000033
                       CONNECT BY parent_id = PRIOR topic_id"
                deltaQuery=""
                pk="TOPIC_ID">
            <field column="topic_id" name="topic_id" />
            <field column="category_level_1" name="CATEGORY_LEVEL_1" />

            <entity name="f" processor="FileListEntityProcessor"
                    baseDir="${solr.solr.home}" fileName=".xml"
                    recursive="false" rootEntity="true"
                    dataSource="video_datasource">
                <entity name="x" processor="XPathEntityProcessor"
                        forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
                        url="${f.fileAbsolutePath}">
                    <field column="media_details"
                           xpath="/gvpVideoMetaData/mediaItem/media_details"/>
                </entity>
            </entity>
        </entity>
    </document>
</dataConfig>

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-error-when-nested-db-datasource-and-file-data-source-tp3345664p3345664.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH error when nested db datasource and file data source

Posted by abhayd <aj...@hotmail.com>.
Any help?



Re: DIH error when nested db datasource and file data source

Posted by abhayd <aj...@hotmail.com>.
Hi,
Thanks for the details. I will look into the XSL suggestion.

Any idea how I would send a parameter to the script?
As I understand it, this is the syntax for the script transformer:
<entity name="e" pk="id" transformer="script:f1" query="select * from table1">
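
One possible way to get the parent value into the script, sketched under assumptions and not run against the config in this thread: the script function is called with the current row, so a TemplateTransformer field can first copy ${topic_tree.topic_id} into a column of x's row. The script can then compare that column with the media_id attribute and flag non-matching rows with the $skipRow special command. The function name keepMatching and the column names wanted_topic_id and media_id are hypothetical, and dataSource="video_datasource" on x is an assumption about which data source should read the file.

```xml
<!-- Placed directly under <dataConfig>. JavaScript is the default script language. -->
<script><![CDATA[
    // Hypothetical helper: keep a row only when its media_id equals the
    // topic_id that the TemplateTransformer copied in from the parent entity.
    function keepMatching(row) {
        var wanted = String(row.get('wanted_topic_id'));
        var actual = String(row.get('media_id'));
        if (wanted != actual) {
            row.put('$skipRow', 'true');   // DIH special command: drop this row
        }
        return row;
    }
]]></script>

<!-- Replaces the inner entity "x". forEach no longer carries the attribute
     predicate; the filtering is done by the script instead. -->
<entity name="x" processor="XPathEntityProcessor"
        dataSource="video_datasource"
        url="${f.fileAbsolutePath}"
        forEach="/gvpVideoMetaData/mediaItem"
        transformer="TemplateTransformer,script:keepMatching">
    <field column="media_id" xpath="/gvpVideoMetaData/mediaItem/@media_id"/>
    <field column="media_details" xpath="/gvpVideoMetaData/mediaItem/media_details"/>
    <!-- Filled in by TemplateTransformer before the script runs -->
    <field column="wanted_topic_id" template="${topic_tree.topic_id}"/>
</entity>
```

Transformers run in the order listed, which is why TemplateTransformer comes before script:keepMatching. Note that this reads every mediaItem in the file for each database row, so it may be slow for large files.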


Re: DIH error when nested db datasource and file data source

Posted by Pulkit Singhal <pu...@gmail.com>.
A few thoughts:

1) If you place the script transformer method on the entity named "x"
and then pass the ${topic_tree.topic_id} to that as an argument, then
shouldn't you have everything you need to work with x's row? Even if
you can't look up at the parent, all you needed to know was the
topic_id and based on that you can edit or not edit x's row ...
shouldn't that be sufficient to get you what you need to do?

2) Regarding the manner in which you are trying to use the following
xpath syntax:
forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
There are two other closely related threads that I've come across:
(a) http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
(b) http://lucene.472066.n3.nabble.com/using-DIH-with-mets-alto-file-sets-td1926642.html

They both seemed to want to use the full power of XPath like you do,
and I think that in a roundabout way they were told to use the xsl
attribute to make up for what the XPath support was lacking by default.

Here are some choice words by Lance that I've extracted out for you:
====
"XPathEntityProcessor parses a very limited XPath syntax. However, you
can add an XSL script as an attribute, and this somehow gets called
instead."
====
- Lance

====
There is an option somewhere to use the full XML DOM implementation
for using xpaths. The purpose of the XPathEP is to be as simple and
dumb as possible and handle most cases: RSS feeds and other open
standards.
Search for xsl(optional)
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
====
- Lance

I hope you can make some sense of this, I'm no expert, but just
thought I'd offer my 2 cts.

On Fri, Sep 23, 2011 at 9:21 AM, abhayd <aj...@hotmail.com> wrote:
> Hi,
> I am not getting the exception anymore; I had an issue with the database.
>
> But now for the real problem I have had all along:
> now that I can fetch IDs from the database, how would I fetch the
> corresponding data for each ID from the XML file?
>
> So after getting the DB info from the JDBC source I use the XPath processor
> like this, but it does not work.
> <entity name="f" processor="FileListEntityProcessor"
>         baseDir="${solr.solr.home}" fileName=".xml"
>         recursive="false" rootEntity="true"
>         dataSource="video_datasource">
>     <entity name="x" processor="XPathEntityProcessor"
>             forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
>             url="${f.fileAbsolutePath}">
>
> I even tried using a script transformer, but "row" in the script transformer
> has its scope limited to entity "f". If this is nested under another entity,
> you can't access top-level variables with "row".

Re: DIH error when nested db datasource and file data source

Posted by abhayd <aj...@hotmail.com>.
Hi,
I am not getting the exception anymore; I had an issue with the database.

But now for the real problem I have had all along:
now that I can fetch IDs from the database, how would I fetch the
corresponding data for each ID from the XML file?

So after getting the DB info from the JDBC source I use the XPath processor
like this, but it does not work.
<entity name="f" processor="FileListEntityProcessor"
        baseDir="${solr.solr.home}" fileName=".xml"
        recursive="false" rootEntity="true"
        dataSource="video_datasource">
    <entity name="x" processor="XPathEntityProcessor"
            forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
            url="${f.fileAbsolutePath}">

I even tried using a script transformer, but "row" in the script transformer
has its scope limited to entity "f". If this is nested under another entity,
you can't access top-level variables with "row".




Re: DIH error when nested db datasource and file data source

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sun, Sep 18, 2011 at 11:47 AM, abhayd <aj...@hotmail.com> wrote:

> Hi Gora,
> The query works, and if I remove the XML data load, indexing works fine too.
>
> The problem seems to be with this:
>
> <entity name="f" processor="FileListEntityProcessor"
>         baseDir="${solr.solr.home}" fileName=".xml"
>         recursive="false" rootEntity="true"
>         dataSource="video_datasource">
>     <entity name="x" processor="XPathEntityProcessor"
>             forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
>             url="${f.fileAbsolutePath}">
>
> Basically, how would I get the details for an ID fetched from the DB, using
> XPath on an XML file?
>
>
Is the following path mentioned in the error message correct?
C:\Projects\att\solr\catalogSOLRSearch.ear\SOLR-HOME\live_meta.xml

The actual cause of the exception will also be in the logs. Can you
paste the complete stack trace?
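
One reading of the partial trace, offered as an assumption and not a confirmed diagnosis: XPathEntityProcessor.initQuery is calling JdbcDataSource.getData, and the "query" it reports is the file path. That suggests entity "x", which names no dataSource of its own, is reading through the JDBC data source and not through video_datasource. A minimal sketch of the nested entities with the data source named explicitly on "x" follows. The dataSource="null" on "f" mirrors the file-listing examples in the DIH wiki, since FileListEntityProcessor lists files itself; everything else is copied from the original config.

```xml
<entity name="f" processor="FileListEntityProcessor"
        baseDir="${solr.solr.home}" fileName=".xml"
        recursive="false" rootEntity="true"
        dataSource="null">
    <!-- Assumption: the XML file should be read via the FileDataSource,
         so it is named here explicitly (attribute spelled dataSource). -->
    <entity name="x" processor="XPathEntityProcessor"
            dataSource="video_datasource"
            forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
            url="${f.fileAbsolutePath}">
        <field column="media_details"
               xpath="/gvpVideoMetaData/mediaItem/media_details"/>
    </entity>
</entity>
```

The same spelling caveat may apply to the outer topic_tree entity, which uses datasource in lower case; the DIH documentation writes the attribute as dataSource.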

-- 
Regards,
Shalin Shekhar Mangar.

Re: DIH error when nested db datasource and file data source

Posted by abhayd <aj...@hotmail.com>.
Hi Gora,
The query works, and if I remove the XML data load, indexing works fine too.

The problem seems to be with this:

<entity name="f" processor="FileListEntityProcessor"
        baseDir="${solr.solr.home}" fileName=".xml"
        recursive="false" rootEntity="true"
        dataSource="video_datasource">
    <entity name="x" processor="XPathEntityProcessor"
            forEach="/gvpVideoMetaData/mediaItem[@media_id='${topic_tree.topic_id}']"
            url="${f.fileAbsolutePath}">

Basically, how would I get the details for an ID fetched from the DB, using
XPath on an XML file?




Re: DIH error when nested db datasource and file data source

Posted by Gora Mohanty <go...@mimirtech.com>.
On Sun, Sep 18, 2011 at 10:35 AM, abhayd <aj...@hotmail.com> wrote:
> Hi,
> I have a requirement where I fetch some data from a database and, based on
> that data, pull details from an XML file to index into Solr.
>
> When I try to import, I get the following error.
> ------------------------------------------------------------------------------------------------------
> SEVERE: Exception while processing: topic_tree document :
> SolrInputDocument[{topic_id=topic_id(1.0)={9000034},
> category_level_1=category_level_1(1.0)={Internet Services}}]:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query:

^^^^^^^^^^^^^^^^^^^^^^^

This often indicates a problem with the query syntax. The first thing to check
would be to run exactly the same SELECT directly against the database. One
thing that I see in your query is ...AS category_level_1, FROM src_topic...,
i.e., there is a spurious comma before the FROM.
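
For reference, a sketch of the outer entity with that comma removed. The SQL is otherwise unchanged from the original post; the only other difference is that the data source attribute is spelled dataSource, as the DIH documentation writes it. This has not been run against the poster's schema.

```xml
<entity name="topic_tree" dataSource="esupport_db"
        query="SELECT topic_id, parent_id,
               REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (display_name, ';'), '[^;]+', 1, 1) AS category_level_1
               FROM src_topic
               START WITH parent_id = 9000033
               CONNECT BY parent_id = PRIOR topic_id"
        deltaQuery=""
        pk="TOPIC_ID">
    <!-- No comma after the last select-list item, directly before FROM -->
    <field column="topic_id" name="topic_id" />
    <field column="category_level_1" name="CATEGORY_LEVEL_1" />
</entity>
```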

Regards,
Gora