Posted to solr-user@lucene.apache.org by amitj <am...@ieee.org> on 2009/11/16 13:55:26 UTC

Re: DataImportHandler Questions-Load data in parallel and temp tables

Is there also a way to include some kind of annotation on a schema field and
send the data retrieved for that field to an external application? We have a
requirement where some of the data fields (out of the fields for an entity
defined in data-config.xml) need to act as entities for entity extraction and
auto-complete purposes, and we are handling that in an external application.


Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> Writing to a remote Solr through SolrJ is in the cards. I may even
> take it up after the 1.4 release. For now your best bet is to subclass
> SolrWriter and override the corresponding methods for add/delete.
> 
>>> 2009/4/27 Amit Nithian <an...@gmail.com>:
>>> > All,
>>> > I have a few questions regarding the data import handler. We have some
>>> > pretty gnarly SQL queries to load our indices, and our current loader
>>> > implementation is extremely fragile. I am looking to migrate over to the
>>> > DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom
>>> > stuff to remotely load the indices so that my index loader and main
>>> > search engine are separated.
>>> > Currently, unless I am missing something, the data gathering from the
>>> > entity and the data processing (i.e. conversion to a Solr Document) are
>>> > done sequentially, and I was looking to make this execute in parallel
>>> > so that I can have multiple threads processing different parts of the
>>> > result set and loading documents into Solr. Secondly, I need to create
>>> > temporary tables to store the results of a few queries and use them
>>> > later for inner joins, and I was wondering how best to go about this.
>>> >
>>> > I am thinking to add support in DIH for the following:
>>> > 1) Temporary tables (maybe call them temporary entities)? -- Specific
>>> > only to SQL, unless it can be generalized to other sources.
>>> > 2) Parallel support
>>> >  - Including some mechanism to get the number of records (whether it
>>> > be a count or MAX(custom_id) - MIN(custom_id))
>>> > 3) Support in DIH or Solr to post documents to a remote index (i.e.
>>> > create a new UpdateHandler instead of DirectUpdateHandler2).
>>> >
>>> > If any of these exist or anyone else is working on this (or you have
>>> > better suggestions), please let me know.
>>> >
>>> > Thanks!
>>> > Amit
>>> >
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: http://old.nabble.com/DataImportHandler-Questions-Load-data-in-parallel-and-temp-tables-tp23266396p26371403.html
Sent from the Solr - User mailing list archive at Nabble.com.
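
For anyone finding this thread later, below is a rough sketch of the approach
Noble suggests above: subclassing DIH's SolrWriter so that adds and deletes go
to a remote Solr over HTTP via SolrJ instead of the local index. The
constructor and the upload/deleteDoc hooks shown here are assumptions based on
the 1.4-era SolrWriter code, and the class name RemoteSolrWriter is made up
for illustration; check the actual signatures in the SolrWriter source of your
Solr version before wiring this in.

import java.io.IOException;
import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.handler.dataimport.SolrWriter;
import org.apache.solr.update.processor.UpdateRequestProcessor;

/**
 * Sketch only: forwards DIH adds/deletes to a remote Solr instead of the
 * local DirectUpdateHandler2. The method names below are assumptions based
 * on the 1.4-era SolrWriter and must be checked against your Solr version.
 */
public class RemoteSolrWriter extends SolrWriter {

    private final CommonsHttpSolrServer remote;

    public RemoteSolrWriter(UpdateRequestProcessor processor, String confDir,
                            String remoteSolrUrl) throws MalformedURLException {
        super(processor, confDir);              // assumed 1.4 constructor
        this.remote = new CommonsHttpSolrServer(remoteSolrUrl);
    }

    // Assumed "add" hook: called by DIH for every document it builds.
    public boolean upload(SolrInputDocument doc) {
        try {
            remote.add(doc);
            return true;
        } catch (SolrServerException e) {
            return false;
        } catch (IOException e) {
            return false;
        }
    }

    // Assumed "delete" hook: called by DIH for deleted rows / deltas.
    public void deleteDoc(Object id) {
        try {
            remote.deleteById(String.valueOf(id));
        } catch (Exception e) {
            // a real implementation would log this and decide whether to abort
        }
    }
}

The SolrJ side of this (CommonsHttpSolrServer, add, deleteById) is the
standard 1.4 client API; the open question is how DIH is told to use this
writer instead of the default one, which is why Noble suggests overriding the
class rather than plugging one in.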


Re: DataImportHandler Questions-Load data in parallel and temp tables

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
On Mon, Nov 16, 2009 at 6:25 PM, amitj <am...@ieee.org> wrote:
>
> Is there also a way to include some kind of annotation on a schema field and
> send the data retrieved for that field to an external application? We have a
> requirement where some of the data fields (out of the fields for an entity
> defined in data-config.xml) need to act as entities for entity extraction and
> auto-complete purposes, and we are handling that in an external application.
No, it is not possible in Solr right now.



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
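
As a footnote on item 2 of the original question (parallel loading driven by
the key range), here is a rough standalone sketch of the idea outside DIH:
read MIN/MAX of the key once, split that range across a fixed thread pool, and
let each worker query its own slice and post the documents to Solr with SolrJ.
The table documents, the columns custom_id/title/body, and both URLs are
placeholders invented for the example, and the error handling is deliberately
minimal.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelLoader {

    public static void main(String[] args) throws Exception {
        final String jdbcUrl = "jdbc:mysql://db-host/mydb";   // placeholder
        final String solrUrl = "http://solr-host:8983/solr";  // placeholder
        final int threads = 4;
        // Older JDBC drivers may need an explicit Class.forName(driverClass).

        // 1) Find the id range once, as in the MAX(custom_id)-MIN(custom_id) idea.
        Connection c = DriverManager.getConnection(jdbcUrl);
        Statement st = c.createStatement();
        ResultSet rs = st.executeQuery(
                "SELECT MIN(custom_id), MAX(custom_id) FROM documents");
        rs.next();
        final long min = rs.getLong(1);
        final long max = rs.getLong(2);
        c.close();

        // 2) Split the id range into one slice per worker thread.
        final long sliceSize = (max - min) / threads + 1;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final long lo = min + i * sliceSize;
            final long hi = Math.min(lo + sliceSize - 1, max);
            pool.submit(new Runnable() {
                public void run() {
                    loadSlice(jdbcUrl, solrUrl, lo, hi);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);

        // 3) One commit at the end rather than per document or per slice.
        new CommonsHttpSolrServer(solrUrl).commit();
    }

    // Each worker reads its own slice of the result set and posts the
    // documents to the remote Solr over HTTP.
    static void loadSlice(String jdbcUrl, String solrUrl, long lo, long hi) {
        Connection c = null;
        try {
            c = DriverManager.getConnection(jdbcUrl);
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer(solrUrl);
            PreparedStatement ps = c.prepareStatement(
                    "SELECT custom_id, title, body FROM documents "
                    + "WHERE custom_id BETWEEN ? AND ?");
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getLong("custom_id"));
                doc.addField("title", rs.getString("title"));
                doc.addField("body", rs.getString("body"));
                solr.add(doc);
            }
        } catch (Exception e) {
            e.printStackTrace();  // a real loader would log and retry or abort
        } finally {
            try { if (c != null) c.close(); } catch (Exception ignored) {}
        }
    }
}

Whether an id-range split balances the work depends on how densely the ids
are populated; a COUNT(*)-based split or a modulo on the id are the usual
alternatives when the range is sparse.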