Posted to solr-user@lucene.apache.org by "Miller, William K - Norman, OK - Contractor" <Wi...@usps.gov.INVALID> on 2017/06/12 13:39:08 UTC

DIH issue with streaming xml file

I am using Solr 6.5.1 and importing XML files with the DataImportHandler (DIH).  I want to fetch the files from a remote server, and they are spread across multiple folders.  My dataConfig uses a nested entity; the example below, adapted mostly from an online reference, shows my current setup.  In this example the XML files are read from a folder on the Solr server, but I want to read them from a remote server instead.  I have looked at the different entity processors for the DIH, but have not found one that seems to work.  Is there a way to configure the dataConfig below to do this?


<dataConfig>

    <dataSource name="hbk" encoding="UTF-8" type="FileDataSource" />
    <document name="hbk">
        <!--
            Pickupdir fetches all files matching the filename regex in the supplied directory
            and passes them to other entities which parse the file contents.
        -->
        <entity
            name="pickupdir"
            processor="FileListEntityProcessor"
            rootEntity="false"
            dataSource="null"
            fileName="^[\w\d-]+\.xml$"
            baseDir="/var/solr/data/hbk/data/xml/"
            recursive="true"
        >
            <!--
                Pickupxmlfile parses standard Solr update XML.
            -->
            <entity
                name="xml"
                pk="itemId"
                processor="XPathEntityProcessor"
                transformer="RegexTransformer,TemplateTransformer"
                datasource="pickupdir"
                stream="true"
                xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
                url="${pickupdir.fileAbsolutePath}"
                forEach="/eflow/section | /eflow/section/item"
            >

                <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
                <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
                <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
                <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
                <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />

                <field column="itemId" xpath="/eflow/section/item/@id" />
                <field column="itemTitle" xpath="/eflow/section/item/@title" />
                <field column="itemNo" xpath="/eflow/section/item/@mit" />
                <field column="itemFile" xpath="/eflow/section/item/@file" />
                <field column="itemType" xpath="/eflow/section/item/@type" />
            </entity>
        </entity>
    </document>
</dataConfig>





~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller
ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


RE: DIH issue with streaming xml file

Posted by "Miller, William K - Norman, OK - Contractor" <Wi...@usps.gov.INVALID>.
Please consider this issue closed; for now we are looking at moving our XML files onto the Solr server.




~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


RE: DIH issue with streaming xml file

Posted by "Miller, William K - Norman, OK - Contractor" <Wi...@usps.gov.INVALID>.
Thank you for your response.  I will look into this link.  Also, sorry I did not specify the file type.   I am working with XML files.




~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158



Re: DIH issue with streaming xml file

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
The Solr 6.5.1 DIH examples include a (somewhat broken) RSS example,
redone as an Atom example in 6.6, that shows how to fetch content from
an https URL. You can see the Atom example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml


The main issue, however, is that you have not said what format the
list of files on the server is in. Is it a plain list? Is it XML
listing the files? Are you relying on a directory listing?

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced
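
For reference, a minimal sketch of the pattern the Atom example illustrates, applied to the /eflow XML from the original post: a URLDataSource plus a single XPathEntityProcessor entity that streams one remote file over https. The URL https://example.com/hbk/sample.xml and the data source name "remote" are placeholders, not values from this thread, and the xsl and transformer attributes of the original config are left out for brevity.

```xml
<dataConfig>
    <!-- URLDataSource fetches over http/https; both timeouts are in milliseconds -->
    <dataSource name="remote" type="URLDataSource" encoding="UTF-8"
                connectionTimeout="5000" readTimeout="10000" />
    <document>
        <!-- Placeholder URL: replace with a real file on the remote server -->
        <entity
            name="xml"
            pk="itemId"
            processor="XPathEntityProcessor"
            dataSource="remote"
            stream="true"
            url="https://example.com/hbk/sample.xml"
            forEach="/eflow/section | /eflow/section/item"
        >
            <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
            <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
            <field column="itemId" xpath="/eflow/section/item/@id" />
            <field column="itemTitle" xpath="/eflow/section/item/@title" />
        </entity>
    </document>
</dataConfig>
```

This covers only a file whose URL is already known; how the list of URLs is obtained depends on what the remote server can provide.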



RE: DIH issue with streaming xml file

Posted by "Miller, William K - Norman, OK - Contractor" <Wi...@usps.gov.INVALID>.
Thank you for your response.  That is exactly the issue I am having: I cannot figure out how to get the list of files from the remote server.  I tried changing the parent entity processor to XPathEntityProcessor and pointing baseDir at an https URL.  This did not work, because the processor was looking for a "forEach" attribute.  Is there an entity processor that can get the list of files from an https source, or will I have to use SolrJ or write a custom entity processor?




~~~~~~~~~~~~~~~~~~~~~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158
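
A note on the error described above: XPathEntityProcessor reads one XML document from its url attribute and needs forEach to know which elements become rows, while baseDir is an attribute of FileListEntityProcessor. Below is a minimal sketch of a parent entity built that way, assuming the remote server can publish an XML manifest of the files. The manifest URL, its <files><file url="..."/></files> layout, and the names "remote", "filelist", and "fileUrl" are all hypothetical.

```xml
<dataConfig>
    <dataSource name="remote" type="URLDataSource" encoding="UTF-8" />
    <document>
        <!--
            Hypothetical manifest, e.g.:
            <files>
                <file url="https://example.com/hbk/a/one.xml" />
                <file url="https://example.com/hbk/b/two.xml" />
            </files>
            Each <file> element becomes one row with a fileUrl column.
        -->
        <entity
            name="filelist"
            processor="XPathEntityProcessor"
            dataSource="remote"
            rootEntity="false"
            url="https://example.com/hbk/manifest.xml"
            forEach="/files/file"
        >
            <field column="fileUrl" xpath="/files/file/@url" />

            <!-- The inner entity fetches and parses each listed file in turn -->
            <entity
                name="xml"
                pk="itemId"
                processor="XPathEntityProcessor"
                dataSource="remote"
                stream="true"
                url="${filelist.fileUrl}"
                forEach="/eflow/section | /eflow/section/item"
            >
                <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
                <field column="itemId" xpath="/eflow/section/item/@id" />
                <field column="itemTitle" xpath="/eflow/section/item/@title" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

Setting rootEntity="false" on the parent keeps the manifest rows from being indexed as documents; only rows from the inner entity become Solr documents.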



Re: DIH issue with streaming xml file

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
How do you get a list of URLs for the files on the remote server? That's
probably the first issue. Once you have the URLs in an outer entity or
two, you can feed them one by one into the inner entity.

Regards,
   Alex.

----
http://www.solr-start.com/ - Resources for Solr users, new and experienced
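
A minimal sketch of that outer/inner arrangement, assuming the remote server can serve a plain-text file with one XML file URL per line. LineEntityProcessor emits each line in a field named rawLine, which the inner entity then uses as its url. The list URL and the names "remote" and "filelist" are placeholders, not values from this thread.

```xml
<dataConfig>
    <dataSource name="remote" type="URLDataSource" encoding="UTF-8" />
    <document>
        <!--
            Outer entity: reads the hypothetical list file line by line.
            acceptLineRegex keeps only lines that look like https URLs ending in .xml.
        -->
        <entity
            name="filelist"
            processor="LineEntityProcessor"
            dataSource="remote"
            rootEntity="false"
            url="https://example.com/hbk/filelist.txt"
            acceptLineRegex="^https://.*\.xml$"
        >
            <!-- Inner entity: each accepted line is fetched and parsed as XML -->
            <entity
                name="xml"
                pk="itemId"
                processor="XPathEntityProcessor"
                dataSource="remote"
                stream="true"
                url="${filelist.rawLine}"
                forEach="/eflow/section | /eflow/section/item"
            >
                <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
                <field column="itemId" xpath="/eflow/section/item/@id" />
                <field column="itemTitle" xpath="/eflow/section/item/@title" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

If the server offers only an HTML directory index rather than a list like this, some external step would still be needed to produce the list.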
