Posted to solr-dev@lucene.apache.org by "Fergus McMenemie (JIRA)" <ji...@apache.org> on 2009/02/21 14:32:02 UTC

[jira] Created: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

DIH transformers cannot reuse output from previous transformations
------------------------------------------------------------------

                 Key: SOLR-1033
                 URL: https://issues.apache.org/jira/browse/SOLR-1033
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 1.4
         Environment: All operating systems and software platforms
            Reporter: Fergus McMenemie
             Fix For: 1.4


It can be very useful to reuse the output from a DIH template in other templates and/or regex transformers. Currently this cannot be done. The resolver is initialized at the start of the transformer run with whatever values exist for each column name at that instant. As the transformer executes it may define new values for column names. My change is intended to update the hash used by the resolver after each successful transformation.

This only applies to the template and regex transformers.
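
To make the snapshot problem concrete, here is a minimal Java sketch. It is not Solr code: ResolverSketch, resolve and run are hypothetical names, the ${e.name} handling is a simplified stand-in for the DIH resolver, and treating an unresolved variable as an empty string is an assumption made only for this illustration. It applies two templates in order and shows how the second one behaves with and without the resolver's values being refreshed after the first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the DIH variable resolver; not Solr's real class.
final class ResolverSketch {
    // Matches placeholders of the form ${e.columnName}.
    private static final Pattern VAR = Pattern.compile("\\$\\{e\\.(\\w+)\\}");

    // Replaces each ${e.name} with the value in 'values'.
    // Assumption for this sketch: an unknown name resolves to "".
    static String resolve(String template, Map<String, Object> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object v = values.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(v == null ? "" : v.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies two templates in order: a = "hello", then b = "${e.a} world".
    // With refresh == false the resolver only sees the values captured at the start.
    static String run(boolean refresh) {
        Map<String, Object> row = new HashMap<>();
        Map<String, Object> resolverValues = new HashMap<>(row); // snapshot taken up front
        row.put("a", resolve("hello", resolverValues));
        if (refresh) {
            resolverValues.put("a", row.get("a")); // update the hash after the transformation
        }
        row.put("b", resolve("${e.a} world", resolverValues));
        return (String) row.get("b");
    }

    public static void main(String[] args) {
        System.out.println("[" + run(false) + "]"); // prints "[ world]"
        System.out.println("[" + run(true) + "]");  // prints "[hello world]"
    }
}
```

Without the refresh, column b cannot see the value just written to column a; with it, the second template picks up the first one's output.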

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675856#action_12675856 ] 

fergus edited comment on SOLR-1033 at 2/23/09 3:33 AM:
-----------------------------------------------------------------

OK here goes. My document contains references to embedded imagery. For each image there is the image itself along with a thumbnail and caption. The source document contains:

  <mediaObject vurl="1043130" imageType="graphic"/>

I have a search application that searches only the captions associated with a given image. It would be nice to populate Solr fields with the correct relative path to each image and thumbnail at index time. The problem is that although the thumbnail is named:

   s${e.vurl}.jpg

the name of the image itself varies with the first letter of its imageType attribute, which can be one of 'picture', 'graphic', 'lineDrawing' or 'map', i.e.:

   p${e.vurl}.jpg
   g${e.vurl}.jpg
   l${e.vurl}.jpg
   m${e.vurl}.jpg

My patch would allow the following sort of thing to be added to a data-config. I feel this considerably increases its power and usefulness.

{code}
<entity name="x" .... transformer="TemplateTransformer,RegexTransformer">
  <field column="fileWebPath"            template="${jc.fileAbsolutePath}" regex="${dataimporter.request.contentdir}(.*)" replaceWith="/ford$1" />
  <field column="vurl"                          xpath="/record/mediaBlock/mediaObject/@vurl" />
  <field column="imagetype"               xpath="/record/mediaBlock/mediaObject/@imageType" regex="^(\w).*"/>
  <field column="imgWebPathICON"  regex="(.*)/.*" replaceWith="$1/imagery/s${x.vurl}.jpg" sourceColName="fileWebPath"/>
  <field column="imgWebPathFULL"  regex="(.*)/.*" replaceWith="$1/imagery/${x.imagetype}${x.vurl}.jpg"  sourceColName="fileWebPath"/>
{code}
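
As a rough check of what the config above is meant to compute, the following Java sketch reproduces the two image-path fields with plain java.util.regex. It is only an illustration: ImagePathSketch and its methods are hypothetical, the fileWebPath value "/ford/docs/record.xml" is an assumed example, and the sketch assumes that a regex field without replaceWith yields the first capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; it does not call any Solr classes.
final class ImagePathSketch {
    // Assumed behaviour of a regex field without replaceWith:
    // return capture group 1, or null when the pattern does not match.
    static String firstGroup(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? m.group(1) : null;
    }

    // Thumbnail path: strip the file name and append imagery/s<vurl>.jpg.
    // vurl is concatenated into the replacement string, which is safe here
    // only because it is assumed to contain no '$' or '\' characters.
    static String iconPath(String fileWebPath, String vurl) {
        return fileWebPath.replaceAll("(.*)/.*", "$1/imagery/s" + vurl + ".jpg");
    }

    // Full image path: the prefix letter is the first character of imageType.
    static String fullPath(String fileWebPath, String imageType, String vurl) {
        String initial = firstGroup("^(\\w).*", imageType);
        return fileWebPath.replaceAll("(.*)/.*", "$1/imagery/" + initial + vurl + ".jpg");
    }

    public static void main(String[] args) {
        String fileWebPath = "/ford/docs/record.xml"; // assumed example value
        System.out.println(iconPath(fileWebPath, "1043130"));            // /ford/docs/imagery/s1043130.jpg
        System.out.println(fullPath(fileWebPath, "graphic", "1043130")); // /ford/docs/imagery/g1043130.jpg
    }
}
```

The point of the sketch is the dependency chain: the full image path needs the already-transformed imagetype value, which is exactly what the resolver cannot currently see.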


> DIH transformers cannot reuse output from previous transformations
> ------------------------------------------------------------------
>
>                 Key: SOLR-1033
>                 URL: https://issues.apache.org/jira/browse/SOLR-1033
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>         Environment: All operating systems and software platforms
>            Reporter: Fergus McMenemie
>             Fix For: 1.4
>
>         Attachments: SOLR-1033.patch, SOLR-1033.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It can be very useful to reuse the output from a DIH template in other templates and or regex transformers. Currently this cannot be done. The resolver is initialized at the start of the transformer run with what ever values exist for a column name at that instant. As the transformer executes it may define new values for column names. My change is intended to update the hash used by the resolver after each successful transformation.
> This only applies to the template and regex transformers.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675753#action_12675753 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

Fergus, the changes required for TemplateTransformer were clear and your fix is right.
Can you give the use case for RegexTransformer also?



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676498#action_12676498 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

Fergus, 
Looks good. Thanks



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675580#action_12675580 ] 

Fergus McMenemie commented on SOLR-1033:
----------------------------------------

Sorry. I was not as clear as I could have been. No, the use case is more like:

  <entity name="e" transformer="TemplateTransformer,RegexTransformer">
    <field column="a" template="hello"/>
    <field column="c" template="hello world"/>
    <field column="b" regex="${e.a}(.*)" sourceColName="c"/>
    </entity>
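
For illustration, the intended result of this entity can be traced by hand in Java. This is a sketch under stated assumptions rather than the actual RegexTransformer logic: RegexReuseSketch and columnB are hypothetical names, the resolved value of ${e.a} is substituted into the regex unescaped, and a regex field without replaceWith is assumed to yield the first capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical walk-through of the entity "e" above; not Solr code.
final class RegexReuseSketch {
    // Computes column b from the resolved value of column a and the source column c.
    static String columnB(String a, String c) {
        // Resolve the placeholder first; the value goes into the regex as-is,
        // so regex metacharacters in 'a' would be interpreted, not escaped.
        String regex = "${e.a}(.*)".replace("${e.a}", a);
        Matcher m = Pattern.compile(regex).matcher(c);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // With column a visible to the resolver, the regex becomes "hello(.*)".
        System.out.println("[" + columnB("hello", "hello world") + "]"); // prints "[ world]"
        // If column a were not visible (treated as empty here), the regex is "(.*)".
        System.out.println("[" + columnB("", "hello world") + "]");      // prints "[hello world]"
    }
}
```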





[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fergus McMenemie updated SOLR-1033:
-----------------------------------

    Attachment:     (was: SOLR-1033.patch)



[jira] Issue Comment Edited: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675953#action_12675953 ] 

noble.paul edited comment on SOLR-1033 at 2/23/09 7:54 AM:
-----------------------------------------------------------

OK, I see your point. You are constructing the regex replacements themselves with templates. I missed that.

I am wondering if the system can be modified to have the current entity's rows be available always to all transformers. It can be done as a simple change in EntityProcessorBase#applyTransformers.
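
One way to picture that suggestion is the Java sketch below. It is a guess at the shape of the idea, not the contents of any attached patch: TransformerChainSketch, RowTransformer and the two lambdas are hypothetical, and the real method in Solr may look quite different. The chain hands every transformer the live row as its resolver view, so each step sees the columns written by earlier steps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "the current row is always visible to all transformers".
final class TransformerChainSketch {
    interface RowTransformer {
        // 'resolverView' is what ${entity.column} lookups would read from.
        void transformRow(Map<String, Object> row, Map<String, Object> resolverView);
    }

    // Runs the chain in order, passing the same live map as the resolver view,
    // so there is no stale snapshot between transformers.
    static Map<String, Object> applyTransformers(Map<String, Object> row, List<RowTransformer> chain) {
        for (RowTransformer t : chain) {
            t.transformRow(row, row);
        }
        return row;
    }

    static Map<String, Object> demo() {
        Map<String, Object> row = new HashMap<>();
        row.put("vurl", "1043130");
        row.put("imageType", "graphic");
        // Step 1: derive the one-letter image type.
        RowTransformer firstLetter = (r, view) ->
                r.put("imagetype", String.valueOf(view.get("imageType")).substring(0, 1));
        // Step 2: reuse the output of step 1 to build the file name.
        RowTransformer fileName = (r, view) ->
                r.put("imageFile", String.valueOf(view.get("imagetype")) + view.get("vurl") + ".jpg");
        return applyTransformers(row, Arrays.asList(firstLetter, fileName));
    }

    public static void main(String[] args) {
        System.out.println(demo().get("imageFile")); // prints "g1043130.jpg"
    }
}
```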



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675902#action_12675902 ] 

Fergus McMenemie commented on SOLR-1033:
----------------------------------------

Not sure I am following what you say. If I number the different steps in my example entity as follows:-

{code}
<entity name="x" .... transformer="TemplateTransformer,RegexTransformer">
1  <field column="fileWebPath"     template="${jc.fileAbsolutePath}" regex="${dataimporter.request.contentdir}(.*)" replaceWith="/ford$1" />
2  <field column="vurl"            xpath="/record/mediaBlock/mediaObject/@vurl" />
3  <field column="imagetype"       xpath="/record/mediaBlock/mediaObject/@imageType" regex="^(\w).*"/>
4  <field column="imgWebPathICON"  regex="(.*)/.*" replaceWith="$1/imagery/s${x.vurl}.jpg" sourceColName="fileWebPath"/>
5  <field column="imgWebPathFULL"  regex="(.*)/.*" replaceWith="$1/imagery/${x.imagetype}${x.vurl}.jpg"  sourceColName="fileWebPath"/>
{code}

We see that column 5 involves a regex which in turn involves columns 3 and 2. Column 3 is itself a regex. We therefore have the output from one regex being used within another regex. So as far as I can see we need the fix made to both the TemplateTransformer and the RegexTransformer. 



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675860#action_12675860 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

If I am not wrong, the output of one transformation in RegexTransformer is available in the next transformation, because the value is added to the same row object. So it should work if the TemplateTransformer is fixed.



[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fergus McMenemie updated SOLR-1033:
-----------------------------------

    Attachment: SOLR-1033.patch

Following on from Noble's comments I realised that the test case for regex was not testing or highlighting the use case at all. This patch contains a new working regex JUnit test case.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675808#action_12675808 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

bq. Sure. However I need a little help. What is it I need to do?
A simple use case with an example that demonstrates the feature.

The TemplateTransformer example you provided was self-explanatory. If you can give a similar one, that is more than sufficient.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675856#action_12675856 ] 

Fergus McMenemie commented on SOLR-1033:
----------------------------------------

OK here goes. My document contains references to embedded imagery. For each image there is the image itself along with a thumbnail and caption. The source document contains:

  <mediaObject vurl="1043130" imageType="graphic"/>

I have a search application that searches only the captions associated with a given image. It would be nice to populate Solr fields with the correct relative path to each image and thumbnail at index time. The problem is that although the thumbnail is named:

   s${e.vurl}.jpg

the name of the image itself varies with the first letter of its imageType attribute, which can be one of 'picture', 'graphic', 'lineDrawing' or 'map', i.e.:

   p${e.vurl}.jpg
   g${e.vurl}.jpg
   l${e.vurl}.jpg
   m${e.vurl}.jpg

My patch would allow the following sort of thing to be added to a data-config. I feel this considerably increases its power and usefulness.

{code}
<entity name="x" .... transformer="TemplateTransformer,RegexTransformer">
  <field column="fileWebPath"            template="${jc.fileAbsolutePath}" regex="${dataimporter.request.contentdir}(.*)" replaceWith="/ford$1" />
  <field column="vurl"                          xpath="/record/mediaBlock/mediaObject/@vurl" />
  <field column="imagetype"               xpath="/record/mediaBlock/mediaObject/@imageType" regex="^(\w).*"/>
  <field column="imgWebPathICON"  regex="(.*)/.*" replaceWith="$1/imagery/s${x.vurl}.jpg" sourceColName="fileWebPath"/>
  <field column="imgWebPathFULL"  regex="(.*)/.*" replaceWith="$1/imagery/${x.imagetype}${x.vurl}.jpg"  sourceColName="fileWebPath"/>
{code}




[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fergus McMenemie updated SOLR-1033:
-----------------------------------

    Attachment: SOLR-1033.patch

A patch to address the issue.

Yet again, I cannot get one of the unit tests to work. I am hoping that folk better than me can point out where I am going wrong!



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675969#action_12675969 ] 

Fergus McMenemie commented on SOLR-1033:
----------------------------------------

Your comment about modifying the system "to have the current entities rows be available always to all transformers" is good and will produce the fastest, most efficient code.

But I need to be sure we are not using the term "template" in two different ways. You say "you are constructing the regex replacements themselves with templates", by which you mean using the ${XXX} syntax and not the output from a TemplateTransformer?

Also, is your patch a replacement for mine?



[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1033:
-----------------------------

    Attachment: SOLR-1033.patch

The complete patch. XPathEntityProcessor needed some rework.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675801#action_12675801 ] 

Fergus McMenemie commented on SOLR-1033:
----------------------------------------

Noble,

Sure. However I need a little help. What is it I need to do?

  1) reference the examples I posted to solr-user in JIRA?

  2) simplify/clarify what was posted to solr-user?

  3) include a snippet in JIRA?

  4) add example explicitly showing reuse of regex output in another regex?

  5) or details of the problem I am trying to solve right now?

I had thought the general case included below was sufficient!

Regards Fergus.


-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================


> DIH transformers cannot reuse output from previous transformations
> ------------------------------------------------------------------
>
>                 Key: SOLR-1033
>                 URL: https://issues.apache.org/jira/browse/SOLR-1033
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>         Environment: All operating systems and software platforms
>            Reporter: Fergus McMenemie
>             Fix For: 1.4
>
>         Attachments: SOLR-1033.patch, SOLR-1033.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It can be very useful to reuse the output from a DIH template in other templates and or regex transformers. Currently this cannot be done. The resolver is initialized at the start of the transformer run with what ever values exist for a column name at that instant. As the transformer executes it may define new values for column names. My change is intended to update the hash used by the resolver after each successful transformation.
> This only applies to the template and regex transformers.



[jira] Resolved: (SOLR-1033) DIH transformers should be able to access current entity's namespace

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1033.
-----------------------------------------

    Resolution: Fixed

Committed revision 747664.

Thanks Fergus and Noble!



[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1033:
-----------------------------

    Attachment: SOLR-1033.patch

This should help all other transformers implicitly support templating.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675564#action_12675564 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

The output of one transformer can be consumed by another.
Example:

{code}
<entity transformer="TemplateTransformer,RegexTransformer">
  <field column="a" template="hello"/>
  <field column="b" regex="(.*)" sourceColName="a"/>
</entity> 
{code}
In this case, the output of TemplateTransformer goes to 'a'. The RegexTransformer can read from column 'a' and put its result into column 'b'. It is still possible to have another transformer that reads from 'b' and puts the value into 'c'.
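
The longer chain described above could be sketched as follows. This is an illustration only: column 'c' and its field definition are assumptions, and the same RegexTransformer is reused for the second hop rather than a separate transformer.

{code}
<!-- Sketch only: column "c" and its regex are illustrative assumptions -->
<entity transformer="TemplateTransformer,RegexTransformer">
  <!-- TemplateTransformer writes the literal "hello" into column a -->
  <field column="a" template="hello"/>
  <!-- RegexTransformer reads column a and writes the captured group into b -->
  <field column="b" regex="(.*)" sourceColName="a"/>
  <!-- A further field reads column b, written just above, and writes into c -->
  <field column="c" regex="(.*)" sourceColName="b"/>
</entity>
{code}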

Is this the use case, or am I missing something?




[jira] Updated: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fergus McMenemie updated SOLR-1033:
-----------------------------------

    Attachment: SOLR-1033.patch

Hmmm, some thoughts and an enhanced patch for your consideration.

Surely the test cases should still be revised to test the new functionality.

Also, as the XPathEntityProcessor has been revised, I felt this might be the best time to sort out some formatting typos within the error messages.



[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676165#action_12676165 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

bq.You say "you are constructing the regex replacements themselves with templates" by which you mean using the ${XXX} syntax and not the output from a templatetransformer?

When I said 'template' I meant any string with ${xxx} content. The 'template' attribute is the only value TemplateTransformer is interested in.

Any attribute value in DIH is potentially a template. Some are honoured and some are not. I hope we can make it work consistently across all of them.





[jira] Updated: (SOLR-1033) DIH transformers should be able to access current entity's namespace

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1033:
----------------------------------------

    Assignee: Shalin Shekhar Mangar
     Summary: DIH transformers should be able to access current entity's namespace  (was: DIH transformers cannot reuse output from previous transformations)

Updating issue title per the final resolution.

Patch looks good, I'll commit shortly.



[jira] Issue Comment Edited: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675969#action_12675969 ] 

fergus edited comment on SOLR-1033 at 2/23/09 3:06 PM:
-----------------------------------------------------------------

Your comment about modifying the system "to have the current entities rows be available always to all transformers" is good and will produce the fastest, most efficient code.

But I need to be sure we are not using the term "template" in two different ways. You say "you are constructing the regex replacements themselves with templates", by which you mean using the ${XXX} syntax and not the output from a TemplateTransformer?

Anyway I have backed out my patch and applied yours. Everything seems fine, but I am still testing.

Thanks very much.

      was (Author: fergus):
    Your comment about modifying the system "to have the current entities rows be available always to all transformers" is good and will produce the fastest, most efficient code.

But I need to be sure we are not using the term "template" in two different ways. You say "you are constructing the regex replacements themselves with templates", by which you mean using the ${XXX} syntax and not the output from a TemplateTransformer?

Also, is your patch a replacement for mine?
  


[jira] Issue Comment Edited: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Fergus McMenemie (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675580#action_12675580 ] 

fergus edited comment on SOLR-1033 at 2/21/09 9:06 AM:
-----------------------------------------------------------------

Sorry, I was not as clear as I could have been. No, the use case is more like this:
{code}
  <entity name="e" transformer="TemplateTransformer,RegexTransformer">
    <field column="a" template="hello"/>
    <field column="c" template="hello world"/>
    <field column="b" regex="${e.a}(.*)" sourceColName="c"/>
  </entity>
{code}
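
As a reading aid, here is the same configuration annotated with the resolution it relies on. This is a sketch of the intent only: the comments describe how the regex attribute is meant to read once ${e.a} is resolved, which is an inference from the issue description rather than confirmed behaviour.

{code}
<entity name="e" transformer="TemplateTransformer,RegexTransformer">
  <!-- Columns a and c are produced by TemplateTransformer for the current row -->
  <field column="a" template="hello"/>
  <field column="c" template="hello world"/>
  <!-- ${e.a} refers to column a of this same entity. The intent is that
       it resolves to "hello", so the effective regex becomes hello(.*)
       and is applied to the value of column c. This only works if the
       resolver can see values written earlier in the same row. -->
  <field column="b" regex="${e.a}(.*)" sourceColName="c"/>
</entity>
{code}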


      was (Author: fergus):
    Sorry. I was not as clear as I could have been. No the use case is more

  <entity name="e" transformer="TemplateTransformer,RegexTransformer">
    <field column="a" template="hello"/>
    <field column="c" template="hello world"/>
    <field column="b" regex="${e.a}(.*)" sourceColName="c"/>
    </entity>


  


[jira] Commented: (SOLR-1033) DIH transformers cannot reuse output from previous transformations

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675953#action_12675953 ] 

Noble Paul commented on SOLR-1033:
----------------------------------

OK, I see your point. You are constructing the regex replacements themselves with templates. I missed that.
