Posted to solr-user@lucene.apache.org by "solr.searcher" <so...@gmail.com> on 2009/10/09 19:57:01 UTC

Dynamic Data Import from multiple identical tables

Hi all,

First of all, please accept my apologies if this has been asked and answered
before. I tried my best to search and couldn't find anything on this.

The problem I am trying to solve is as follows. I have multiple tables with
an identical schema - table_a, table_b, table_c ... - and I am trying to
create one big index from the data in each of these tables. The idea was to
programmatically create the data-config file (just changing the table name)
and do a reload-config followed by a full-import with clean set to false. In
other words:

1. publish the data-config file
2. do a reload-config
3. do a full-import with clean = false
4. commit, optimize
5. repeat with new table name
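
For reference, the actual requests behind steps 2 and 3 look roughly like
this (host, port, and core path are placeholders; I am using the stock
/dataimport handler, and commit/optimize here are the standard DIH
parameters):

   http://localhost:8983/solr/dataimport?command=reload-config
   http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true&optimize=true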

I then wanted to follow the same procedure for delta imports. The problem is
that after I do a reload-config followed by a full-import, the old data in
the index is lost.

What am I missing here? Please note that I am new to Solr.

INFO: [] webapp=/solr path=/dataimport params={command=reload-config&clean=false} status=0 QTime=4
Oct 9, 2009 10:17:30 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import&clean=false} status=0 QTime=1
Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity <blah blah blah>
Oct 9, 2009 10:17:30 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 12
Oct 9, 2009 10:17:31 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
        commit{dir=/blah/blah/index,segFN=segments_1z,version=1255032607825,generation=71,filenames=[segments_1z, _cl.cfs]
Oct 9, 2009 10:17:31 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1255032607825

Any help will be greatly appreciated. Is there any other way to automatically
slurp data from multiple, identical tables?

Thanks a lot.



Re: Dynamic Data Import from multiple identical tables

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
There is another option: pass the table name as a request parameter and
templatize your SQL query.

For example:

query="select * from ${dataimporter.request.table}"

and pass the value of table as a request parameter on the import URL.
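
A minimal data-config sketch (the driver, url, and field here are just
placeholders):

   <dataConfig>
     <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"/>
     <document>
       <entity name="row" query="select * from ${dataimporter.request.table}">
         <field column="id" name="id"/>
       </entity>
     </document>
   </dataConfig>

Then run one import per table, no reload-config needed:

   http://localhost:8983/solr/dataimport?command=full-import&clean=false&table=table_a
   http://localhost:8983/solr/dataimport?command=full-import&clean=false&table=table_b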

-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Dynamic Data Import from multiple identical tables

Posted by "solr.searcher" <so...@gmail.com>.
Hmmm. Interesting line of thought. Thanks a lot Jay. Will explore this
approach. There are a lot of duplicate tables though :).

I was about to try a different approach - set up two Solr cores, keep
reloading config and updating one, then merge it into the bigger index ...
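
The merge step I had in mind was something like the CoreAdmin mergeindexes
action (assuming a Solr build new enough to have it; the core name and path
below are made up):

   http://localhost:8983/solr/admin/cores?action=mergeindexes&core=main&indexDir=/path/to/staging/data/index

with the staging core getting wiped and re-imported between merges.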

But your approach is worth exploring. Thanks.





Re: Dynamic Data Import from multiple identical tables

Posted by Jay Hill <ja...@gmail.com>.
You could use separate DIH config files for each of your three tables. This
might be overkill, but it would keep them separate. The DIH is not limited
to one request handler setup, so you could create a unique handler for each
case with a unique name:

   <requestHandler name="/indexer/table1"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">table1-config.xml</str>
     </lst>
   </requestHandler>

   <requestHandler name="/indexer/table2"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">table2-config.xml</str>
     </lst>
   </requestHandler>

   <requestHandler name="/indexer/table3"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">table3-config.xml</str>
     </lst>
   </requestHandler>

When you go to ...solr/admin/dataimport.jsp you should see a list of all the
DataImportHandlers that are configured, and you can select them individually,
if that works for your needs.
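
You can also kick each one off directly over HTTP; assuming the default port
and context, something like:

   http://localhost:8983/solr/indexer/table1?command=full-import&clean=false
   http://localhost:8983/solr/indexer/table2?command=full-import&clean=false
   http://localhost:8983/solr/indexer/table3?command=full-import&clean=false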

-Jay
http://www.lucidimagination.com
