Posted to solr-user@lucene.apache.org by "helder.sepulveda" <he...@homes.com> on 2014/05/12 18:11:09 UTC

URLDataSource : indexing from other Solr servers

I have been trying to index data from other Solr servers, but the import always
shows:
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Fetched: 0, Skipped: 0, Processed

My data config looks like this:



Any help will be greatly appreciated



--
View this message in context: http://lucene.472066.n3.nabble.com/URLDataSource-indexing-from-other-Solr-servers-tp4135321.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: URLDataSource : indexing from other Solr servers

Posted by "helder.sepulveda" <he...@homes.com>.
I tested calling the URL using curl directly on the server, and I get a valid
response with the correct content.





Re: URLDataSource : indexing from other Solr servers

Posted by "helder.sepulveda" <he...@homes.com>.
Here is the relevant portion of the schema:

<fields>
    <field name="sz_id"         type="string" indexed="true"  stored="true"/>
    <field name="batch_address" type="string" indexed="false" stored="true"/>
    <field name="batch_city"    type="string" indexed="false" stored="true"/>
    <field name="batch_state"   type="string" indexed="false" stored="true"/>
    <field name="batch_zip"     type="string" indexed="false" stored="true"/>
    <field name="timestamp"     type="date"   indexed="false" stored="true"/>
</fields>

Yes, the URL is accessible. As I mentioned in my previous comment, I tested
calling the URL using curl directly on the server, and I get a valid response
with the correct content.




Re: URLDataSource : indexing from other Solr servers

Posted by Gora Mohanty <go...@mimirtech.com>.
On 12 May 2014 22:52, helder.sepulveda <he...@homes.com> wrote:
> Here is the data config:
>
> <dataConfig>
>     <dataSource type="URLDataSource" />
>
>     <document name="listingcore">
>         <entity name="listing" pk="link"
>                 url="http://slszip11.as.homes.com/solr/select?q=*:*"
>                 processor="XPathEntityProcessor"
>                 forEach="/response/result/doc"
>                 transformer="DateFormatTransformer">
>             <field column="batch_address"
> xpath="/response/result/doc/str[@name='batch_address']"/>
>             <field column="batch_state"
> xpath="/response/result/doc/str[@name='batch_state']"/>
>             <field column="batch_city"
> xpath="/response/result/doc/str[@name='batch_city']"/>
>             <field column="batch_zip"
> xpath="/response/result/doc/str[@name='batch_zip']"/>
>             <field column="sz_id"
> xpath="/response/result/doc/long[@name='sz_id']"/>
>         </entity>
>     </document>
> </dataConfig>

Hmm, I see no issues here. Can you also share your Solr schema?
Is the URL accessible, and do the results from Solr show properly when
loaded in a browser window? I cannot seem to reach slszip11.as.homes.com,
but that could be because it is restricted to certain IPs.

Regards,
Gora

Re: URLDataSource : indexing from other Solr servers

Posted by "helder.sepulveda" <he...@homes.com>.
Here is the data config:

<dataConfig>
    <dataSource type="URLDataSource" />

    <document name="listingcore">
        <entity name="listing" pk="link"
                url="http://slszip11.as.homes.com/solr/select?q=*:*"
                processor="XPathEntityProcessor"
                forEach="/response/result/doc"
                transformer="DateFormatTransformer">
            <field column="batch_address" xpath="/response/result/doc/str[@name='batch_address']"/>
            <field column="batch_state"   xpath="/response/result/doc/str[@name='batch_state']"/>
            <field column="batch_city"    xpath="/response/result/doc/str[@name='batch_city']"/>
            <field column="batch_zip"     xpath="/response/result/doc/str[@name='batch_zip']"/>
            <field column="sz_id"         xpath="/response/result/doc/long[@name='sz_id']"/>
        </entity>
    </document>
</dataConfig>







Re: URLDataSource : indexing from other Solr servers

Posted by "helder.sepulveda" <he...@homes.com>.
Just in case the URL is not available from outside my network, here is what
the URL response looks like:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1007</int>
<lst name="params">
<str name="q">*:*</str>
</lst>
</lst>
<result name="response" numFound="89993613" start="0">

<doc>
<str name="address_combo">1518 INDIANA CT, IRVING, TX</str>
<str name="air_conditioning">Central</str>
<int name="avm">200600</int>
<float name="avm_confidence">0.31</float>
<int name="avm_high">230690</int>
<int name="avm_low">170510</int>
<str name="basement">No Basement</str>
<str name="batch_address">1518 INDIANA CT</str>
<str name="batch_city">IRVING</str>
<str name="batch_country">US</str>
<str name="batch_state">TX</str>
<str name="batch_zip">75060</str>
<float name="bath">2.0</float>
<int name="bed">4</int>
<str name="cbsa_label">Dallas-Fort Worth-Arlington</str>
<str name="city">IRVING</str>
<str name="construction_type">Frame</str>
<str name="county_label">Dallas</str>
<int name="delta_avm">38300</int>
<date name="delta_avm_timestamp">2014-03-11T00:00:01Z</date>
<int name="delta_home_score">-6</int>
<date name="delta_home_score_timestamp">2014-03-11T00:00:01Z</date>
<int name="delta_investor_score">-12</int>
<date name="delta_investor_score_timestamp">2014-03-11T00:00:01Z</date>
<float name="delta_tax_rate">-3.0E-4</float>
<date name="delta_tax_rate_timestamp">2013-07-10T00:00:01Z</date>
<int name="estimated_rent">1550</int>
<str name="exterior_wall">Brick veneer</str>
<str name="fireplace">1</str>
<str name="foundation">Slab</str>
<int name="garage">4</int>
<str name="heating">Central</str>
<int name="home_score">29</int>
<float name="hpi">146.7849</float>
<int name="investor_score">38</int>
<date name="last_tran_date">2010-01-13T00:00:01Z</date>
<int name="last_tran_price">0</int>
<double name="lat">32.79920959472656</double>
<str name="latlng_combo">32.799209594726562,-96.926918029785156</str>
<double name="lng">-96.92691802978516</double>
<str name="place_label">IRVING</str>
<str name="property_type">SFH</str>
<int name="sqft">2348</int>
<long name="sqft_lot">0</long>
<str name="state">TX</str>
<str name="state_label">Texas</str>
<str name="street_address">1518 INDIANA CT</str>
<str name="street_name">INDIANA CT</str>
<str name="street_name_sz">INDIANA CT</str>
<str name="street_no_sz">1518</str>
<long name="sz_id">500018666323</long>
<float name="tax_rate">0.0178</float>
<float name="taxes">3893.0</float>
<str name="tract_label">IRVING 015000</str>
<int name="ttl_assessed">0</int>
<int name="year_built">2002</int>
<str name="zip">75060</str>
<date name="timestamp">2014-04-20T16:28:52.467Z</date>
</doc>

<doc>
<str name="address_combo">2600 ASH CRK, MESQUITE, TX</str>
<str name="air_conditioning">Central</str>
<int name="avm">144200</int>
<float name="avm_confidence">0.28</float>
<int name="avm_high">165830</int>
<int name="avm_low">122570</int>
<str name="basement">No Basement</str>
<str name="batch_address">2600 ASH CREEK</str>
<str name="batch_city">MESQUITE</str>
<str name="batch_country">US</str>
<str name="batch_state">TX</str>
<str name="batch_zip">75181</str>
<float name="bath">2.0</float>
<int name="bed">4</int>
<str name="cbsa_label">Dallas-Fort Worth-Arlington</str>
<str name="city">MESQUITE</str>
<str name="construction_type">Frame</str>
<str name="county_label">Dallas</str>
<int name="delta_avm">100</int>
<date name="delta_avm_timestamp">2014-04-11T00:00:01Z</date>
<int name="delta_home_score">-1</int>
<date name="delta_home_score_timestamp">2014-03-11T00:00:01Z</date>
<int name="delta_investor_score">-1</int>
<date name="delta_investor_score_timestamp">2014-04-11T00:00:01Z</date>
<float name="delta_tax_rate">-3.0E-4</float>
<date name="delta_tax_rate_timestamp">2013-07-10T00:00:01Z</date>
<int name="estimated_rent">1470</int>
<str name="exterior_wall">Brick veneer</str>
<str name="fireplace">1</str>
<str name="foundation">Slab</str>
<int name="garage">1</int>
<str name="heating">Central</str>
<int name="home_score">35</int>
<float name="hpi">153.4116</float>
<int name="investor_score">54</int>
<date name="last_tran_date">2006-01-20T00:00:01Z</date>
<int name="last_tran_price">0</int>
<double name="lat">32.7484283447266</double>
<str name="latlng_combo">32.7484283447266,-96.5575180053711</str>
<double name="lng">-96.5575180053711</double>
<str name="place_label">MESQUITE</str>
<str name="property_type">SFH</str>
<int name="sqft">2189</int>
<long name="sqft_lot">0</long>
<str name="state">TX</str>
<str name="state_label">Texas</str>
<str name="street_address">2600 ASH CRK</str>
<str name="street_name">ASH CRK</str>
<str name="street_name_sz">ASH CRK</str>
<str name="street_no_sz">2600</str>
<long name="sz_id">500018666324</long>
<float name="tax_rate">0.0178</float>
<float name="taxes">3345.0</float>
<str name="tract_label">MESQUITE 017304</str>
<int name="ttl_assessed">0</int>
<int name="year_built">1996</int>
<str name="zip">75181</str>
<date name="timestamp">2014-04-20T16:28:52.467Z</date>
</doc>

</result>
</response>






Re: URLDataSource : indexing from other Solr servers

Posted by Gora Mohanty <go...@mimirtech.com>.
On 12 May 2014 21:41, helder.sepulveda <he...@homes.com> wrote:
>
> I been trying to index data from other solr servers but the import always
> shows:
> Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
> Requests: 1, Fetched: 0, Skipped: 0, Processed
>
> My data config looks like this:

Nothing came through for your data config. Please send it again, or put it up
somewhere online. From the DIH message, it seems that it is not even fetching
anything, so make sure that your URLs are correct.

Regards,
Gora

Re: URLDataSource : indexing from other Solr servers

Posted by "helder.sepulveda" <he...@homes.com>.
I will try the SolrEntityProcessor, but I'm still interested to know why it
does not work with the XPathEntityProcessor.




Re: URLDataSource : indexing from other Solr servers

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/12/2014 10:11 AM, helder.sepulveda wrote:
> I been trying to index data from other solr servers but the import always
> shows:
> Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
> Requests: 1, Fetched: 0, Skipped: 0, Processed

I'm wondering why you're using the XPathEntityProcessor instead of
SolrEntityProcessor.

http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

Solr (as of version 3.6) comes with the capability to fully understand
the output from another Solr server, so you should probably be using
that instead of trying to parse XML.
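
For illustration, a data config along these lines might look like the sketch
below. It is a minimal, untested example, not a verified fix. It assumes that
the base URL of the source core is http://slszip11.as.homes.com/solr (the
select URL from the posted config with the /select?q=*:* part removed), that
the listed fields are stored on the source server, and that the target schema
uses the same field names. The rows value of 500 is an arbitrary batch size
chosen for the example.

```xml
<dataConfig>
    <document name="listingcore">
        <!-- url: base URL of the source Solr core (assumed), not the /select handler -->
        <!-- query: selects the documents to copy; fl: limits the fields fetched -->
        <!-- rows: number of documents requested per batch (arbitrary example value) -->
        <entity name="listing"
                processor="SolrEntityProcessor"
                url="http://slszip11.as.homes.com/solr"
                query="*:*"
                fl="sz_id,batch_address,batch_city,batch_state,batch_zip,timestamp"
                rows="500"/>
    </document>
</dataConfig>
```

With this processor no per-field xpath mappings should be needed, since the
fields returned by the source server are expected to be matched to target
schema fields by name.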

Thanks,
Shawn