Posted to solr-user@lucene.apache.org by Mark Beeby <mb...@cambridge.org> on 2012/03/08 17:06:43 UTC

Importing dynamicField data on the fly

Hello Everyone, 

I'm trying to work out whether dynamicFields can be imported from a 
dynamic data source through the DataImportHandler configuration and, if 
so, how. Currently the DataImportHandler configuration file requires me 
to name every field I want to map in advance, but I do not necessarily 
know the full set of dynamic fields at that stage. 

Here's my example schema.xml dynamic field definition:

    <dynamicField name="*_sortable" type="alphaOnlySort" indexed="true" stored="true"/>

My DataImportHandler import configuration file looks like this:

    <dataConfig>
        <dataSource name="Gateway1Source" type="HttpDataSource"
                    baseUrl="http://acproplatforms.internal/feeds.xml"
                    encoding="UTF-8" connectionTimeout="15000"
                    readTimeout="15000"/>
        <document name="feeds">
            <entity name="feed" processor="XPathEntityProcessor"
                    stream="true" forEach="/gateway/feedItem/" url="">
                <field column="type" xpath="/gateway/feedItem/type"/>
                ...
            </entity>
        </document>
    </dataConfig>

I have looked, rather optimistically, at ScriptTransformers 
(transformer="script:importDynamics"), hoping that the row passed to the 
transformer function would hold the dynamic field content. That was 
wishful thinking, of course: if those fields had made it as far as the 
row, they would already be falling through to the index. 

Has anyone managed to import into dynamic fields without knowing in 
advance which ones the data source will contain?

To give you an idea of why I want this: we have an application that 
aggregates web services from many sources. Some of them contain patterns 
of fields that I know we'll want, and whose data types I know, but new 
fields matching those patterns are added quite frequently. Apart from the 
field mappings here, it seems the hard work needed to support this has 
already been done in Solr!

Kindest Regards,
Mark 




From:   Shawn Heisey <so...@elyograg.org>
To:     solr-user@lucene.apache.org, 
Date:   08/03/2012 14:58
Subject:        Re: Understanding update handler statistics



On 3/8/2012 7:02 AM, stetogias wrote:
> Hi,
>
> Trying to understand the update handler statistics
> so I have this:
>
> commits : 2824
> autocommit maxDocs : 10000
> autocommit maxTime : 1000ms
> autocommits : 41
> optimizes : 822
> rollbacks : 0
> expungeDeletes : 0
> docsPending : 0
> adds : 0
> deletesById : 0
> deletesByQuery : 0
> errors : 0
> cumulative_adds : 17457
> cumulative_deletesById : 1959
> cumulative_deletesByQuery : 0
> cumulative_errors : 0
>
> my problem is with the cumulative part.
>
> If for instance I am doing a commit after each add and delete operation then
> the sum of cumulative_adds plus
> cumulative_deletes plus cumulative_errors should match the commit number.
> is that right?
> And another question, these stats are since SOLR instance startup or since
> update handler startup, these
> can differ as far as I understand...
>
> and from this part:
> docsPending : 0
> adds : 0
> deletesById : 0
> deletesByQuery : 0
> errors : 0
>
> I understand that if I had docsPending I should have adds(pending)
> deletes*(pending) but how could I have errors...

I'm fairly sure that adds and deletes refer to the number of documents 
added or deleted.  You can have many documents added and/or deleted for 
each commit.  I would not expect the sums to match, unless you are 
adding or deleting only one document at a time and doing a commit after 
every one.  I hope you're not doing that, unless you're using trunk with 
the near-realtime feature and doing soft commits, with which I have no 
experience.  Normally doing a commit after every document would be too 
much of a load for good performance, unless there is a relatively long 
time period between each add or delete.
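
As a rough illustration of why the counters need not line up, here is a 
small Python sketch. The batch sizes are made up, and this only mimics 
the bookkeeping as I understand it; it is not how Solr computes its 
statistics internally.

```python
# Hypothetical update batches: (documents added, documents deleted by id).
# Each batch is followed by exactly one commit.
batches = [(500, 20), (1200, 0), (0, 75), (1, 0)]

commits = 0
cumulative_adds = 0
cumulative_deletes_by_id = 0

for added, deleted in batches:
    cumulative_adds += added
    cumulative_deletes_by_id += deleted
    commits += 1  # one commit per batch, however many documents it holds

print("commits:", commits)
print("cumulative_adds:", cumulative_adds)
print("cumulative_deletesById:", cumulative_deletes_by_id)
```

Only the last batch (a single add followed by a commit) moves the 
document counters and the commit counter in step; the others add or 
delete many documents for one commit.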

Your question about errors - that probably tracks the number of times 
that the update handler returned an error response, though I don't 
really know.  If I'm right, then that number, like commits, has little 
to do with the number of documents.

Thanks,
Shawn