You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by "O. Olson" <ol...@yahoo.it> on 2013/05/21 16:21:18 UTC

How do I use CachedSqlEntityProcessor?

I am using the DataImportHandler to Query a SQL Server and populate Solr with
data that has hierarchical relationships. 

The following is an outline of my table structure: 


PROD_TABLE 
-> SKU (Primary Key) 
-> Title  (varchar) 
-> Descr (varchar) 

CAT_TABLE 
-> SKU (Foreign Key) 
->  CategoryLevel (int i.e. 1, 2, 3 …) 
-> CategoryName  (varchar) 

I specify the SQL Query in the db-data-config.xml file – a snippet of which
looks like: 

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://localhost\...."/>
    <document>
        <entity name="Product" 
                                query="SELECT SKU, Title, Descr FROM
PROD_TABLE">
            <field column="SKU" name="SKU" />
                        <field column="Title" name="Title" />
            <field column="Descr" name="Descr" />

                        <entity name="Cat1"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1">
                                <field column="CategoryName"
name="Category1" />  
                        </entity>
                        <entity name="Cat2"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2">
                                <field column="CategoryName"
name="Category2" />  
                        </entity>
                        <entity name="Cat3"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3">
                                <field column="CategoryName"
name="Category3" />  
                        </entity>
                        
        </entity>
    </document>
</dataConfig>


Unfortunately this is a bit slow, and it was recommended to me to use the
CachedSqlEntityProcessor
(http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor).
Hence I modified my db-data-config.xml to look like: 

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://localhost\...."/>
    <document>
        <entity name="Product" 
                                query="SELECT SKU, Title, Descr FROM
PROD_TABLE">
            <field column="SKU" name="SKU" />
                        <field column="Title" name="Title" />
            <field column="Descr" name="Descr" />

                        <entity name="Cat1"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category1" />  
                        </entity>
                        <entity name="Cat2"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category2" />  
                        </entity>
                        <entity name="Cat3"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category3" />  
                        </entity>
                        
        </entity>
    </document>
</dataConfig>

The import works really quickly, but there are no Categories e.g. Category1,
Category2 etc. in the imported documents. Any clue’s on how to debug this
problem? 

I should mention that I don’t change my schema.xml or any other file in the
config. All I do is switch between the first db-data-config.xml – where I
get the Categories as part of the document, and the second, where I do not.
I went back and re-verified this result. 

Thank you all for your help. 
O. O.




--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How do I use CachedSqlEntityProcessor?

Posted by bbarani <bb...@gmail.com>.
Try like this...

<entity x query="select * from x"> 
<entity y query="select * from y" processor="cachedSqlEntityprocessor"
cachekey=y.id  cachevalue=x.id>  



--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065030.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by "O. Olson" <ol...@yahoo.it>.
Thank you bbarani. Unfortunately, this does not work. I do not get any
exception, and the documents import OK. However there is no Category1,
Category2 … etc. when I retrieve the documents.

I don’t think I am using the Alpha or Beta of 4.0. I think I downloaded the
plain vanilla release version. 
O. O.



bbarani wrote
> Try this..
> <entity name="Cat1"  
>                     query="SELECT CategoryName,SKU from CAT_TABLE WHERE
> CategoryLevel=1" cacheKey="Cat1.SKU" cacheLookup="Product.SKU"
> processor="CachedSqlEntityProcessor">
>                                 
> <field column="CategoryName" name="Category1" />
>  
>                         
> </entity>
> sample data import config:
> 
> 		
> <entity name="property" query="select UID,name as name, value as value
> from opTable where type='${dataimporter.request.type}' and indexed='Y' "
> processor="CachedSqlEntityProcessor" cacheKey="UID"
> cacheLookup="object.uid"
> transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
> 			
> 			
> <field column="value" name="${property.name}"/>
>  //dynamic column
> 		
> </entity>
> 
> Also not sure if you are using Alpha / Beta release of SOLR 4.0.
> 
> In Solr 3.6, 3.6.1, 4.0-Alpha & 4.0-Beta, the "cacheKey" parameter was
> re-named "cachePk". This is renamed back for 4.0 (& 3.6.2, if released).
> See SOLR-3850





--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065309.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by bbarani <bb...@gmail.com>.
Try this..

<entity name=&quot;Cat1&quot;  
                    query=&quot;SELECT CategoryName,&lt;b>SKU* from
CAT_TABLE WHERE CategoryLevel=1" cacheKey=*"Cat1.SKU"*
cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category1" /> 
                        </entity>

Also not sure if you are using Alpha / Beta release of SOLR 4.0.

In Solr 3.6, 3.6.1, 4.0-Alpha & 4.0-Beta, the "cacheKey" parameter was
re-named "cachePk". This is renamed back for 4.0 (& 3.6.2, if released). See
SOLR-3850





--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065116.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by chuotlac <bi...@gmail.com>.
The conversation helps me understand Cached processor a lot. I'm working on
DIH cache using MapDB as backed engine instead of default
CachedSqlEntityProcessor



--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4198037.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by "O. Olson" <ol...@yahoo.it>.
Thank you guys, particularly James, very much. I just imported 200K documents
in a little more than 2 mins – which is great for me :-). Thank you Stefan.
I did not realize that it was not a syntax error and hence no error. Thank
you for clearing that up. 
O. O.




--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065392.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
That would be a worthy enhancement to do.  Always nice to give the user a warning when something is going to fail so they can troubleshoot better...

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Stefan Matheis [mailto:matheis.stefan@gmail.com] 
Sent: Wednesday, May 22, 2013 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: How do I use CachedSqlEntityProcessor?

> I am curious why I did not get any errors before.
Because there was no (syntax) error before - the fact that you didn't include a SKU (but using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents.

Perhaps we should add an ticket as improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James?

Stefan 


On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote:

> Thank you very much James. Your suggestion worked exactly! I am curious why I
> did not get any errors before. For others, the following worked for me: 
> 
> <entity name="Cat1" 
> query="SELECT CategoryName, SKU from CAT_TABLE WHERE
> CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU"
> processor="CachedSqlEntityProcessor">
> <field column="CategoryName" name="Category1" /> 
> </entity>
> 
> Similarly for other Categories i.e. Category2, Category3, etc. 
> 
> I am now going to try this for a larger dataset. I hope this works.
> O.O.
> 
> 
> Dyer, James-2 wrote
> > There was a mistake in my last reply. Your child entities need to SELECT
> > on the join key so DIH has it to do the join. So use "SELECT SKU,
> > CategoryName..."
> > 
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> > 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html
> Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
> 
> 


Re: How do I use CachedSqlEntityProcessor?

Posted by Stefan Matheis <ma...@gmail.com>.
> I am curious why I did not get any errors before.
Because there was no (syntax) error before - the fact that you didn't include a SKU (but using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents.

Perhaps we should add an ticket as improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James?

Stefan 


On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote:

> Thank you very much James. Your suggestion worked exactly! I am curious why I
> did not get any errors before. For others, the following worked for me: 
> 
> <entity name="Cat1" 
> query="SELECT CategoryName, SKU from CAT_TABLE WHERE
> CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU"
> processor="CachedSqlEntityProcessor">
> <field column="CategoryName" name="Category1" /> 
> </entity>
> 
> Similarly for other Categories i.e. Category2, Category3, etc. 
> 
> I am now going to try this for a larger dataset. I hope this works.
> O.O.
> 
> 
> Dyer, James-2 wrote
> > There was a mistake in my last reply. Your child entities need to SELECT
> > on the join key so DIH has it to do the join. So use "SELECT SKU,
> > CategoryName..."
> > 
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> > 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html
> Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
> 
> 



RE: How do I use CachedSqlEntityProcessor?

Posted by "O. Olson" <ol...@yahoo.it>.
Thank you very much James. Your suggestion worked exactly! I am curious why I
did not get any errors before. For others, the following worked for me: 

<entity name="Cat1"  
                    query="SELECT CategoryName, SKU from CAT_TABLE WHERE
CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU"
processor="CachedSqlEntityProcessor">
				<field column="CategoryName" name="Category1" /> 
			</entity>

Similarly for other Categories i.e. Category2, Category3, etc. 

I am now going to try this for a larger dataset. I hope this works.
O.O.


Dyer, James-2 wrote
> There was a mistake in my last reply.  Your child entities need to SELECT
> on the join key so DIH has it to do the join.  So use "SELECT SKU,
> CategoryName..."
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311





--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
There was a mistake in my last reply.  Your child entities need to SELECT on the join key so DIH has it to do the join.  So use "SELECT SKU, CategoryName..."

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: O. Olson [mailto:olson_ord@yahoo.it] 
Sent: Tuesday, May 21, 2013 5:06 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I use CachedSqlEntityProcessor?

Thank you James & bbarani. 

This worked in the sense that there was no error or exception in the data
import. Unfortunately, I do not see any of my Category1, Category2 etc. when
I retrieve the documents. If I use the first configuration of the
db-data-config.xml posted in my original post, I see these fields in each
document. Doing an import with your suggestion of  

<entity name="Cat1"  
                    query="SELECT CategoryName from CAT_TABLE WHERE
CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU"
processor="CachedSqlEntityProcessor">
				<field column="CategoryName" name="Category1" /> 
			</entity>

I do not see Category1. 

I have not changed my schema.xml, so I don’t think this should affect the
results. For e.g. Category1 is declared as: 

<field name="Category1" type="string" indexed="true" stored="true"
multiValued="true"/>

I am curious to what I am doing wrong. I should mention that I am using Solr
4.0.0. I know a more recent version is out – but I don’t think it should
make a difference.
Thank you again for your help.
O. O.





Dyer, James-2 wrote
> First remove the "where" condition from the child entities, then use the
> "cacheKey" and "cacheLookup" parameters to instruct DIH how to do the
> join.
> 
> Example:
> <entity 
>  name="Cat1" 
>  cacheKey="SKU"
>  cacheLookup="Product.SKU" 
>  query="SELECT CategoryName from CAT_TABLE where CategoryLevel=1" 
> />
> See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
> , particularly the 3rd configuration option.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311





--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065091.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How do I use CachedSqlEntityProcessor?

Posted by "O. Olson" <ol...@yahoo.it>.
Thank you James & bbarani. 

This worked in the sense that there was no error or exception in the data
import. Unfortunately, I do not see any of my Category1, Category2 etc. when
I retrieve the documents. If I use the first configuration of the
db-data-config.xml posted in my original post, I see these fields in each
document. Doing an import with your suggestion of  

<entity name="Cat1"  
                    query="SELECT CategoryName from CAT_TABLE WHERE
CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU"
processor="CachedSqlEntityProcessor">
				<field column="CategoryName" name="Category1" /> 
			</entity>

I do not see Category1. 

I have not changed my schema.xml, so I don’t think this should affect the
results. For e.g. Category1 is declared as: 

<field name="Category1" type="string" indexed="true" stored="true"
multiValued="true"/>

I am curious to what I am doing wrong. I should mention that I am using Solr
4.0.0. I know a more recent version is out – but I don’t think it should
make a difference.
Thank you again for your help.
O. O.





Dyer, James-2 wrote
> First remove the "where" condition from the child entities, then use the
> "cacheKey" and "cacheLookup" parameters to instruct DIH how to do the
> join.
> 
> Example:
> <entity 
>  name="Cat1" 
>  cacheKey="SKU"
>  cacheLookup="Product.SKU" 
>  query="SELECT CategoryName from CAT_TABLE where CategoryLevel=1" 
> />
> See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
> , particularly the 3rd configuration option.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311





--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065091.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I use CachedSqlEntityProcessor?

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
First remove the "where" condition from the child entities, then use the "cacheKey" and "cacheLookup" parameters to instruct DIH how to do the join.

Example:

<entity 
 name="Cat1" 
 cacheKey="SKU"
 cacheLookup="Product.SKU" 
 query="SELECT CategoryName from CAT_TABLE where CategoryLevel=1" 
/>

See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor , particularly the 3rd configuration option.

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: O. Olson [mailto:olson_ord@yahoo.it] 
Sent: Tuesday, May 21, 2013 9:21 AM
To: solr-user@lucene.apache.org
Subject: How do I use CachedSqlEntityProcessor?

I am using the DataImportHandler to Query a SQL Server and populate Solr with
data that has hierarchical relationships. 

The following is an outline of my table structure: 


PROD_TABLE 
-> SKU (Primary Key) 
-> Title  (varchar) 
-> Descr (varchar) 

CAT_TABLE 
-> SKU (Foreign Key) 
->  CategoryLevel (int i.e. 1, 2, 3 …) 
-> CategoryName  (varchar) 

I specify the SQL Query in the db-data-config.xml file – a snippet of which
looks like: 

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://localhost\...."/>
    <document>
        <entity name="Product" 
                                query="SELECT SKU, Title, Descr FROM
PROD_TABLE">
            <field column="SKU" name="SKU" />
                        <field column="Title" name="Title" />
            <field column="Descr" name="Descr" />

                        <entity name="Cat1"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1">
                                <field column="CategoryName"
name="Category1" />  
                        </entity>
                        <entity name="Cat2"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2">
                                <field column="CategoryName"
name="Category2" />  
                        </entity>
                        <entity name="Cat3"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3">
                                <field column="CategoryName"
name="Category3" />  
                        </entity>
                        
        </entity>
    </document>
</dataConfig>


Unfortunately this is a bit slow, and it was recommended to me to use the
CachedSqlEntityProcessor
(http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor).
Hence I modified my db-data-config.xml to look like: 

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://localhost\...."/>
    <document>
        <entity name="Product" 
                                query="SELECT SKU, Title, Descr FROM
PROD_TABLE">
            <field column="SKU" name="SKU" />
                        <field column="Title" name="Title" />
            <field column="Descr" name="Descr" />

                        <entity name="Cat1"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category1" />  
                        </entity>
                        <entity name="Cat2"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category2" />  
                        </entity>
                        <entity name="Cat3"   
                    query="SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3"
processor="CachedSqlEntityProcessor">
                                <field column="CategoryName"
name="Category3" />  
                        </entity>
                        
        </entity>
    </document>
</dataConfig>

The import works really quickly, but there are no Categories e.g. Category1,
Category2 etc. in the imported documents. Any clue’s on how to debug this
problem? 

I should mention that I don’t change my schema.xml or any other file in the
config. All I do is switch between the first db-data-config.xml – where I
get the Categories as part of the document, and the second, where I do not.
I went back and re-verified this result. 

Thank you all for your help. 
O. O.




--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919.html
Sent from the Solr - User mailing list archive at Nabble.com.