Posted to solr-dev@lucene.apache.org by "Fergus McMenemie (JIRA)" <ji...@apache.org> on 2009/02/02 20:22:04 UTC

[jira] Created: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

using invariant request values from solrconfig.xml inside a data-config.xml regexp
----------------------------------------------------------------------------------

                 Key: SOLR-1001
                 URL: https://issues.apache.org/jira/browse/SOLR-1001
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 1.3
            Reporter: Fergus McMenemie
             Fix For: 1.4


As per several postings, I noted that I can define variables in the invariants list of the DIH handler in solrconfig.xml and reference them within data-config.xml. This works properly: the Solr field "test" is nicely populated. However, the same variable is not substituted into the regex attribute of my RegexTransformer field. Here is my data-config.xml, which gives a hint of the use case.

   <dataConfig>
     <dataSource name="myfilereader" type="FileDataSource"/>
     <document>
       <entity name="jc"
               processor="FileListEntityProcessor"
               fileName="^.*\.xml$"
               newerThan="'NOW-1000DAYS'"
               recursive="true"
               rootEntity="false"
               dataSource="null"
               baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
         <entity name="x"
                 dataSource="myfilereader"
                 processor="XPathEntityProcessor"
                 url="${jc.fileAbsolutePath}"
                 stream="false"
                 forEach="/record"
                 transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
           <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
           <field column="fileWebPath"      regex="${dataimporter.request.finstalldir}(.*)" replaceWith="$1" sourceColName="fileAbsolutePath" />
           <field column="test"             template="${dataimporter.request.finstalldir}" />
           <field column="title"            xpath="/record/title" />
           <field column="para"             xpath="/record/sect1/para" stripHTML="true" />
           <field column="date"             xpath="/record/metadata/date[@qualifier='Date']" dateTimeFormat="yyyyMMdd" />
         </entity>
       </entity>
     </document>
   </dataConfig>

Shalin has pointed out that we are creating the regex Pattern without first resolving the variable. So we need to call VariableResolver.resolve on the 'regex' attribute's value before creating the Pattern object.
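
A minimal sketch of the fix described above, not the actual DIH code: the ${...} placeholders in the attribute value are substituted first, and only the resulting string is handed to Pattern.compile. The class RegexAttributeSketch, the methods resolve and apply, the map-backed variable lookup, and the sample value of finstalldir are all assumptions made for illustration; the real VariableResolver may behave differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the transformer logic; these are not the real DIH classes.
class RegexAttributeSketch {
    // Matches placeholders of the form ${some.name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value from vars.
    // Unknown names are left untouched (an assumed policy, not confirmed DIH behaviour).
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Resolves the 'regex' attribute first, then compiles it and applies replaceWith.
    static String apply(String regexAttr, String replaceWith, String source, Map<String, String> vars) {
        Pattern p = Pattern.compile(resolve(regexAttr, vars));
        return p.matcher(source).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        // Assumed value for the invariant; the original post does not state it.
        vars.put("dataimporter.request.finstalldir", "/Volumes/spare/ts");
        String webPath = apply("${dataimporter.request.finstalldir}(.*)", "$1",
                "/Volumes/spare/ts/fords/dtd/fordsxml/data/a.xml", vars);
        System.out.println(webPath); // prints /fords/dtd/fordsxml/data/a.xml
    }
}
```

Note that the resolved value is interpreted as part of the regular expression, so a value containing regex metacharacters would need escaping (for example with Pattern.quote) if it is meant literally.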


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1001:
----------------------------------------

    Attachment: SOLR-1001.patch

I see your point, Noble. I don't really want to have a helper method in Context right now. Let's fix the issue at hand.

I have made the following changes:
# HTMLStripTransformer allows a variable in the stripHTML attribute
# NumberFormatTransformer allows variables in the formatStyle and locale attributes

I'll commit this shortly.

> using invariant request values from solrconfig.xml inside a data-config.xml regexp
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-1001
>                 URL: https://issues.apache.org/jira/browse/SOLR-1001
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Fergus McMenemie
>             Fix For: 1.4
>
>         Attachments: SOLR-1001.patch, SOLR-1001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h


[jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671000#action_12671000 ] 

Noble Paul commented on SOLR-1001:
----------------------------------

bq. Would it be alright if we resolve all entity attributes in ContextImpl.getEntityAttribute?

Then components would never be able to get the raw, unresolved string if they wish to. Moreover, it is not backward compatible. We could add extra methods instead:
# ContextImpl.getResolvedEntityAttribute
# ContextImpl.getAllResolvedEntityFields. This one is expensive, because the component may be interested in only one variable and we end up resolving all of them. If we can cache the result, it may be OK.
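
The trade-off described above can be sketched as follows. This is an illustration only: EntityContextSketch, its constructor, and the Function-based resolver are hypothetical, and only the method names getEntityAttribute and getResolvedEntityAttribute come from the discussion. The raw accessor keeps its existing behaviour, while the resolved accessor resolves just the one attribute asked for. Caching is deliberately left out, because a value such as ${jc.fileAbsolutePath} changes from row to row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical context; not the real ContextImpl.
class EntityContextSketch {
    private final Map<String, String> attributes;
    private final Function<String, String> resolver;

    EntityContextSketch(Map<String, String> attributes, Function<String, String> resolver) {
        this.attributes = attributes;
        this.resolver = resolver;
    }

    // Existing behaviour: returns the attribute exactly as written in the config.
    String getEntityAttribute(String name) {
        return attributes.get(name);
    }

    // Additional accessor: resolves only the requested attribute, on demand.
    String getResolvedEntityAttribute(String name) {
        String raw = attributes.get(name);
        return (raw == null) ? null : resolver.apply(raw);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("url", "${jc.fileAbsolutePath}");
        // Toy resolver that knows a single hard-coded variable (assumed value).
        Function<String, String> resolver =
                s -> s.replace("${jc.fileAbsolutePath}", "/data/a.xml");
        EntityContextSketch ctx = new EntityContextSketch(attrs, resolver);
        System.out.println(ctx.getEntityAttribute("url"));         // raw value
        System.out.println(ctx.getResolvedEntityAttribute("url")); // resolved value
    }
}
```

Keeping both accessors means existing components see no change, and a component that needs substitution opts in per attribute.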



[jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669705#action_12669705 ] 

Fergus McMenemie commented on SOLR-1001:
----------------------------------------

I could probably hack around this myself given Shalin's clue as to the cause. However, two possible issues come to mind.

* Is it possible that an equivalent change needs to be made to other transformers?

* Is the construct ${XXX} a valid part of a regular expression? I don't think so, but...?



[jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669714#action_12669714 ] 

Shalin Shekhar Mangar commented on SOLR-1001:
---------------------------------------------

bq. Is it possible that an equivalent change needs to be made to other transformers?

Yes, I think so. We need to review the other transformers and entity processors to make sure we attempt to resolve all attributes.

bq. Is the construct ${XXX} a valid part of a regular expression? I don't think so, but...?

Curly braces are used in regex to specify repetitions but they are not preceded by '$'. I think VariableResolver ignores any variables it is not able to resolve, so we should be fine; we will need to check this though.
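
The point about curly braces can be checked with a small sketch. It assumes a placeholder syntax of ${name} matched by the hypothetical pattern below, which is not taken from the DIH source. It only shows that a repetition such as a{2,3} contains no "${", so a substitution step keyed on "${" leaves it alone. What the real VariableResolver does with a variable it cannot resolve still needs to be verified separately.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of DIH.
class BraceCheckSketch {
    // Assumed placeholder syntax: '$' followed by a brace-delimited name.
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[^}]+\\}");

    // True if the attribute value contains at least one ${...} placeholder.
    static boolean hasPlaceholder(String attributeValue) {
        return PLACEHOLDER.matcher(attributeValue).find();
    }

    public static void main(String[] args) {
        // A plain repetition quantifier is not mistaken for a placeholder.
        System.out.println(hasPlaceholder("a{2,3}"));                                  // false
        // The attribute from the data-config.xml above does contain one.
        System.out.println(hasPlaceholder("${dataimporter.request.finstalldir}(.*)")); // true
        // The quantifier still works as a regex once left untouched.
        System.out.println(Pattern.matches("a{2,3}", "aaa"));                          // true
    }
}
```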



[jira] Updated: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1001:
-----------------------------

    Attachment: SOLR-1001.patch

This just fixes the RegexTransformer. We may take a look at the other Transformers and fix them too.



[jira] Resolved: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1001.
-----------------------------------------

    Resolution: Fixed
      Assignee: Shalin Shekhar Mangar

Committed revision 741435.

Thanks Fergus and Noble!



[jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670896#action_12670896 ] 

Shalin Shekhar Mangar commented on SOLR-1001:
---------------------------------------------

Would it be alright if we resolve all entity attributes in ContextImpl.getEntityAttribute? Similarly, we can change ContextImpl.getAllEntityFields to resolve field values just-in-time. This would keep the logic in one place and stop this problem from popping up in different places.



[jira] Commented: (SOLR-1001) using invariant request values from solrconfig.xml inside a data-config.xml regexp

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669926#action_12669926 ] 

Fergus McMenemie commented on SOLR-1001:
----------------------------------------

Downloaded and installed the patch. Works fine for me. Thanks very much.
