Posted to solr-user@lucene.apache.org by Sebastian Aswin <sa...@fossil.com> on 2018/09/24 06:43:26 UTC

Solr Import

Hi Experts,
Good Day!

We have Solr 7.4 installed on our premises and we are planning to index an
XML file. I am using the Data Import Handler to do the indexing, but I have
a few questions about it.
1.  Within a doc tag there are multiple store elements, but the Solr
response contains only one *store* value. Solr does not accept the XML with
the structure below as-is, so I changed the XML structure and was then able
to import the file into Solr using the post tool, which returned *all* the
store values, comma separated.

*Snippet of the xml import file. *

<ProductFeedRetailToSOLR>
  <doc>
    <sku>FS4120</sku>
    <store>MCUS</store>
  </doc>
  <doc>
    <sku>FS4122</sku>
    <store>MCIN</store>
    <store>MFAU</store>
    <store>MCUS</store>
  </doc>
  <doc>
    <sku>FS4123</sku>
    <store>MFAU</store>
  </doc>
</ProductFeedRetailToSOLR>

*Snippet of the data-config.xml *

  <entity name="f" processor="FileListEntityProcessor"
          fileName="ProductFeed20180924-001434-719.xml$" recursive="true"
          rootEntity="false" dataSource="null"
          transformer="DateFormatTransformer"
          baseDir="/dataimport/ISR">

      <!-- this processor extracts content using Xpath from each file found -->
      <entity name="nested" processor="XPathEntityProcessor"
              forEach="/ProductFeedRetailToSOLR | ProductFeedRetailToSOLR/doc | /metadata"
              url="${f.fileAbsolutePath}">
          <field column="sku" xpath="/ProductFeedRetailToSOLR/doc/sku"/>
          <field column="store_s" xpath="/MT_ProductFeed/doc/store"/>
      </entity>

What changes need to be made to data-config.xml so that the response is
similar to the output we get when using the post script, that is, *all the
values of store*, comma separated, in the Solr response for each document?
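
For question 1, one possible direction is sketched below. It is not a confirmed fix. It rests on three assumptions: that the file being imported really has the ProductFeedRetailToSOLR root shown in the sample (the store xpath in the snippet above starts at /MT_ProductFeed, which does not match it), that each doc element should become one Solr document, and that the schema has a multi-valued string field available. The field name store_ss is an assumption based on the *_ss dynamic field in the default configset; as far as I know, XPathEntityProcessor keeps several matches for one xpath only when the target field is multiValued in the schema.

```xml
<dataConfig>
  <!-- Unnamed file data source, used by the inner XPath entity -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Lists the feed file(s); produces no Solr documents itself -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/dataimport/ISR"
            fileName="ProductFeed20180924-001434-719.xml$"
            recursive="true" rootEntity="false" dataSource="null">

      <!-- One Solr document per doc element -->
      <entity name="nested" processor="XPathEntityProcessor"
              url="${f.fileAbsolutePath}"
              forEach="/ProductFeedRetailToSOLR/doc">
        <field column="sku" xpath="/ProductFeedRetailToSOLR/doc/sku"/>
        <!-- store_ss is an assumed multiValued field; adjust to your schema -->
        <field column="store_ss" xpath="/ProductFeedRetailToSOLR/doc/store"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a single-valued target such as store_s, only one store value per document would be expected to survive, which would be consistent with the behaviour described above.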


2. Delta indexing of an XML file.
We will be provided with an XML file, which will be imported into Solr
using full-import the first time. Subsequently we will be provided with
the changes made to the XML file (as a delta file), and I need to import
just those changes into Solr using delta-import. When I click on
delta-import, I do not see any update in the Solr response.
Please guide us on how we can achieve delta-import for an *XML* file.

Thanks for the time and advice in advance.

-- 
Regards,
Ashwin

Re: Solr Import

Posted by Yasufumi Mizoguchi <ya...@gmail.com>.
Hi,

I do not have a good idea about No. 1, but No. 2 is clear.

> 2. Delta indexing of an XML file.
> We will be provided with an XML file, which will be imported into Solr
> using full-import the first time. Subsequently we will be provided with
> the changes made to the XML file (as a delta file), and I need to import
> just those changes into Solr using delta-import. When I click on
> delta-import, I do not see any update in the Solr response.
> Please guide us on how we can achieve delta-import for an *XML* file.

Solr itself cannot detect which documents have been updated since the last
import operation. Delta import is therefore supported only by
SqlEntityProcessor, where an appropriate SQL query can identify the rows
that have changed.

From the Solr Reference Guide:
> For incremental imports and change detection. Only the SqlEntityProcessor
> supports delta imports.
(https://lucene.apache.org/solr/guide/7_4/uploading-structured-data-store-data-with-the-data-import-handler.html#uploading-structured-data-store-data-with-the-data-import-handler)

So, if you want to use delta import, you should load the data into an RDB
(relational database) and use SqlEntityProcessor.
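
As a rough illustration of what that could look like, here is a minimal sketch of a data-config.xml using SqlEntityProcessor (the default processor when none is named). Everything database-specific is hypothetical: the JDBC driver, URL and credentials are placeholders, and the tables product(sku, last_modified) and product_store(sku, store) are assumed only for this example. The target field store_ss is likewise an assumed multiValued field.

```xml
<dataConfig>
  <!-- Placeholder connection details; replace with your own database -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/feed"
              user="solr" password="secret"/>
  <document>
    <!-- query:            used by full-import
         deltaQuery:       finds keys of rows changed since the last import
         deltaImportQuery: loads each changed row by its key -->
    <entity name="product" pk="sku"
            query="SELECT sku FROM product"
            deltaQuery="SELECT sku FROM product
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT sku FROM product
                              WHERE sku = '${dataimporter.delta.sku}'">
      <field column="sku" name="sku"/>

      <!-- One row per store; rows are collected into the multiValued field -->
      <entity name="store"
              query="SELECT store FROM product_store
                     WHERE sku = '${product.sku}'">
        <field column="store" name="store_ss"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If moving the data into a database is not practical, another option that may be worth trying is to run full-import against the delta file with the clean=false request parameter, so that the existing index is not cleared and documents with the same unique key are overwritten. Note that this would not handle documents that were deleted from the feed.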

Thanks,
Yasufumi
