You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-dev@lucene.apache.org by "Uri Boness (JIRA)" <ji...@apache.org> on 2010/01/18 21:34:54 UTC

[jira] Created: (SOLR-1725) Script based UpdateRequestProcessorFactory

Script based UpdateRequestProcessorFactory
------------------------------------------

                 Key: SOLR-1725
                 URL: https://issues.apache.org/jira/browse/SOLR-1725
             Project: Solr
          Issue Type: New Feature
          Components: update
    Affects Versions: 1.4
            Reporter: Uri Boness


A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.

The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).

The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.

Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.

The following variables are define as global variables for each script:
 * {{req}} - The SolrQueryRequest
 * {{rsp}}- The SolrQueryResponse
 * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802022#action_12802022 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Yes, it depends on Java 6. I guess the concern is mainly for the unit tests? (at runtime the it shouldn't really matter)


> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801971#action_12801971 ] 

Mark Miller commented on SOLR-1725:
-----------------------------------

This is a great idea.

bq. This patch also introduces a new Interface - SolrResourceLoaderAware

What about the existing ResourceLoaderAware?


> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805678#action_12805678 ] 

Yonik Seeley commented on SOLR-1725:
------------------------------------

As you pointed out, Java5 is EOL'd already and Sun/Oracle doesn't even let you download JDK5 anymore w/o registration.
Wouldn't hurt my feelings to move to Java6.  After all, the SolrCloud stuff we're working on uses zookeeper which requires 1.6.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805612#action_12805612 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

{quote}
Performance:

It looks like scripts are read from the resource loader and parsed again (eval) for every update request. This can be pretty expensive, esp for those scripting languages that generate java class files instead of using an interpreter. One way to combat this would be to cache and reuse them.
{quote}
Yes, indeed the scripts are evaluated per request but for a reason. One of the goals here is to keep the scripts as close as possible to the update processor interface, so the functions in the scripts has the same signature as the methods in the processor. But in order for the scripts to be flexible I decided to introduce some global scoped variables which are accessible in the functions. (currently the current solr request, response and a logger are there). The problem is that the API only defines 3 scopes where you can register variables and the lowest one is the engine itself. Since the evaluation of a script is done on the engine level as well, when using this API together with the global variables I don't think you can escape the need for creating an engine per request (thus, also evaluating the scripts).

But I agree with you that if there is a way around it, caching the evaluated/compiled scripts will definitely boost things up. I'll need to investigate this further and come up with alternatives (I already have some ideas using ThreadLocals).

bq. Should we have a way to specify a script in-line (in solrconfig.xml)?

Personally I prefer keeping the solrconfig.xml as clean as possible. I do however think that a standardization of Solr scripting support in general can be great. (for example, have a scripts folder under _solr.solr.home_ were all the scripts are placed, or come up with a standard configuration structure for the scripts... perhaps something in the direction Hoss suggested above).

bq. This seems to raise the visibility of the UpdateCommand classes, directly exposing them to users w/o plugins. We should perhaps consider interface cleanups on these classes at the same time as this issue.
+1

bq. Examples! Using javascript (since it's both fast and included in JDK6), let's see what the scripts are for some common usecases. This both helps improve the design as well as lets other people give feedback w/o having to read through code.
Yep.. that would probably be very helpful. basically I think anyone who's ever written an update processor can perhaps try to convert it to a script and see how it works. The usual use case for me is to just add a few fields which are derived from the other fields, but perhaps there are some other more interesting use cases out there. I guess these examples should be put in the Wiki, right?





> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802024#action_12802024 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Sorry... of course it matters for the build :-)

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Boness updated SOLR-1725:
-----------------------------

    Attachment: SOLR-1725.patch

A new patch, this time leverages the already existing ResourceLoaderAware interface

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805254#action_12805254 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

bq. 1) what is the value add in making ScriptUpdateProcessorFactory support multiple "scripts" ? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (that hte processor chain already executes in sequence) then to add sequential processing to the ScriptUpdateProcessor?

Well... to my taste it makes the configuration cleaner (no need to define several script processors). The thing is, you have the choice here - either specify several scripts (comma separated) or split them to several processors.

bq. 2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ...

That's definitely another option.

The only thing is that you'd probably want some way to define shared parameters (shared between the scripts that is) and not be forced to specify them several times for each script. I guess you can do something like this:

{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="sharedParams">
    <bool name="paramName">true</bool>
  </lst>
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOPtionsIfNeeded">
    ...
  </lst>
</processor>
{code}

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Jan Høydahl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805500#action_12805500 ] 

Jan Høydahl commented on SOLR-1725:
-----------------------------------

It looks logical and nice.

However, I'm leaning towards keeping it very simple. The simplest is one script per processor, since that will always work.

As more and more update processors are written, in Java, JS, Jython and more, it would be a clear benefit if Administrators don't need to care about the underlying implementation, but can use same way of configuring each one - That's why I opt for the top-level param structure as default.

I have years of experience with FAST document processing, which is really a killer feature, mainly because it's so dead simple. Drop in a python script with a deployment descriptor and start using it in your pipelines. You don't care if the implementation is pure Python, a C library wrapper or whatever, you just care about what parameters to give it. I see this patch as one big step towards the same simplicity with Solr!

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801983#action_12801983 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

bq. What about the existing ResourceLoaderAware?

Woops... missed that one out :-)... I'll check it out and update the patch

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801998#action_12801998 ] 

Mark Miller commented on SOLR-1725:
-----------------------------------

I believe the history was, on development of *aware - hey, here is a patch that restricts this to certain classes if we want? Anyone object? - and no one did.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805247#action_12805247 ] 

Hoss Man commented on SOLR-1725:
--------------------------------

Some random comments/questions from the peanut gallery...

1) what is the value add in making ScriptUpdateProcessorFactory support multiple "scripts" ? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (that hte processor chain already executes in sequence) then to add sequential processing to the ScriptUpdateProcessor?

2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ...

{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOPtionsIfNeeded">
    ...
  </lst>
</processor>
{code}

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Boness updated SOLR-1725:
-----------------------------

    Attachment: SOLR-1725.patch

Third try :-), this time Java 5 compatible

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801988#action_12801988 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Is there any reason for currently limiting the classes that can be ResourceLoaderAware? This limitation is explicit in SolrResourceLoader (line: 584):

{code}
awareCompatibility.put(
      ResourceLoaderAware.class, new Class[] {
        CharFilterFactory.class,
        TokenFilterFactory.class,
        TokenizerFactory.class,
        FieldType.class
      }
    );
{code}

If the type is not one of this classes an exception is thrown

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Jan Høydahl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804093#action_12804093 ] 

Jan Høydahl commented on SOLR-1725:
-----------------------------------

This is very much wanted!

Would it make more sense to execute the scripts in the order they are named in the scripts param? If I have two pipelines/chains, that need to use the same scripts but in different orders, I'm in trouble.

Isn't solr/lib/scripts a more natural location of code than solr/conf?

How to pass parameters to each script, to facilitate reusable scripts instead of hardcoded ones?

To overcome some of these limitations, why not reuse the existing pipeline mechanism to define even the chain, i.e. allow only one script at a time? Then the order of scripts are then dictated by the order of <processor > tags in the ProcessorChain and we reuse the parameter passing logic. A positive side effect is that you can compose a ProcessorChain with a mix and match of Java and Script based Processors. Class/script instantiation needs to be optimized of course.

Example use case: Say you have an XML input with structured data where one of the fields is the file name of a PDF. You want to convert the PDF usigng a Tika processor (hopefully to come) and then sanitize Author metadata from the parsed PDF. This could then look like this in solrconfig.xml:

<updateRequestProcessorChain name="xmlwithpdf">
    <processor class="solr.FileReaderProcessorFactory">
      <str name="filenameField">filename</str>
      <str name="outputField">binarydata</str>
    </processor>
    <processor class="solr.TikaProcessorFactory">
      <str name="inputField">binarydata</str>
      <str name="fmap.author">tikaauthor</str>
      <str name="fmap.content">text</str>
    </processor>
    <processor class="solr.ScriptUpdateProcessorFactory">
      <str name="script">author_samitizer.js</str>
      <str name="inputField">tikaauthor</str>
      <str name="outputField">author</str>
      <str name="discardRegex">Microsoft Word.*|Adobe Distiller.*|PDF995.*|Unknown</str>
      <bool name="overwriteExisting">false</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802100#action_12802100 ] 

Noble Paul commented on SOLR-1725:
----------------------------------


Why are we limiting this to javascript . Why can't somebody use groovy/Scala/JRuby or whatever is supported by the the java6 scripting engine?



> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Jan Høydahl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805051#action_12805051 ] 

Jan Høydahl commented on SOLR-1725:
-----------------------------------

Lance, what do you mean by "DIH language"?

In my example xml that you quoted, the two first processors, FileReaderProcessorFactory and TikaProcessorFactory, were supposed to be (imagined) ordinary Java processors, not scripting ones.

Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the "scripts" param itself).

Even better if the definition of a processor was in a separate xml section and then refer by name only in each chain, but that is a bigger change outside scope of this patch.


> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801991#action_12801991 ] 

Mark Miller commented on SOLR-1725:
-----------------------------------

I believe its somewhat arbitrary based on what makes sense - so adding something to it shouldn't be a problem.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805235#action_12805235 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Lance, I lost you a bit as well.

bq. Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the "scripts" param itself).

Hmm... personally I prefer configurations that clearly indicate their purpose. leaving out the _params_ list will make things a bit confusing - some parameters are available for the scripts, others are not... it's not really clear.

bq. manner of configuration was as similar as possible

The configuration are similar. All elements in solrconfig.xml have one standard way of configuration which can be anything from a _lst_, _bool_, _str_, etc.... Tomorrow a new processor will popup which will also require a _lst_ configuration... and that's fine. 

bq.Even better if the definition of a processor was in a separate xml section and then refer by name only in each chain, but that is a bigger change outside scope of this patch.

Well, indeed that's a bigger change. Like everything, this kind of configuration has it's pros&cons.

I guess it's best if people will just state their preferences regarding how they would like to see this processor configured and based on that I'll adjust the patch.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802014#action_12802014 ] 

Koji Sekiguchi commented on SOLR-1725:
--------------------------------------

I like the idea, Uri. I've not looked into the patch yet, it depends on Java 6? I think Solr support Java 5. There is ScriptTransformer in DIH which uses javax.script but it looks like ScriptEngineManager is loaded at runtime.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804725#action_12804725 ] 

Lance Norskog commented on SOLR-1725:
-------------------------------------

This is growing into the DataImportHandler :) 
Is it possible to just create a super-simple configuration language that turns into the DIH language? That way you get the benefit of this simplified format without having to support another execution engine. 

<updateRequestProcessorChain name="xmlwithpdf"></bq> 
<processor class="solr.FileReaderProcessorFactory"> 
<str name="filenameField">filename</str> 
<str name="outputField">binarydata</str>


> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1725:
-----------------------------

    Comment: was deleted

(was: 
Why are we limiting this to javascript . Why can't somebody use groovy/Scala/JRuby or whatever is supported by the the java6 scripting engine?

)

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802001#action_12802001 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Thanks for the reference

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801994#action_12801994 ] 

Uri Boness edited comment on SOLR-1725 at 1/18/10 11:52 PM:
------------------------------------------------------------

Right... ok... I'll add another class to this list (I just don't understand why would you want to limit the types that can be *Aware - in a way it defeats the whole idea of the *Aware abstraction).

      was (Author: uboness):
    Right... ok... I'll add another class to this list (I just don't understand why would you want to limit the types that can be *Aware)
  
> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Boness updated SOLR-1725:
-----------------------------

    Attachment: SOLR-1725.patch

Initial implementation. Includes a simple test (probably more tests are required). Builds a script engine per script file - each file has its own scope.

This patch also introduces a new Interface - {{SolrResourceLoaderAware}} which can be used by any plugin loaded by SolrCore. (Any plugin implementing this interface will be injected by the resource loader of the SolrCore). The ScriptUpdateRequestProcessorFactory uses the resource loader to load the scripts from solr home conf directory.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804094#action_12804094 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

bq. Would it make more sense to execute the scripts in the order they are named in the scripts param? If I have two pipelines/chains, that need to use the same scripts but in different orders, I'm in trouble.

Absolutely! The reason why it is currently lexicographically ordered is due to an initial (different) implementation that i had. I'll change it and add a patch for it.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802607#action_12802607 ] 

Mark Miller commented on SOLR-1725:
-----------------------------------

We might want to think about making this a contrib?

How do people feel about putting in core Solr code/functionality that requires Java 6?

So far you have been able to run all of Solr with just Java 5. Do we want to go down the road of some non contrib features only working with something higher than 5?

Should that only be allowed as a contrib?

We certainly want the functionality I think, but as far as I can tell this would break new ground in terms of Solr's jvm version requirements (noting that Uri has made it so that everything still builds and runs with 5, you just can't use this functionality unless you are on 6) 

Any opinions?

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805672#action_12805672 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Been looking more into it and I think there's a nice way in which we can cache the evaluated scripts. But... (and there's always a but) to make it work cleanly we need to be able to extend the scripting support, which means we need to be able to compile the code in Java 6.

And this brings us back to Mark's comment above on how do we want to do that.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802496#action_12802496 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

The DIH ScriptTransformer can really be cleaned up using this patch as well. I didn't add it to this patch as I didn't know whether it was a good idea to put too much into one patch. 

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805691#action_12805691 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Well then... I just hope others will not shed tears as well and we can make Solr 1.5 Java 6 compiled :-)

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805532#action_12805532 ] 

Yonik Seeley commented on SOLR-1725:
------------------------------------

Cool feature!

Performance:
 - It looks like scripts are read from the resource loader and parsed again (eval) for every update request. This can be pretty expensive, esp for those scripting languages that generate java class files instead of using an interpreter. One way to combat this would be to cache and reuse them.

Interface:
- Should we have a way to specify a script in-line (in solrconfig.xml)?
- Or even cooler... allow passing of scripts as parameters in the update request! Think about the power of pointing Solr to a CSV file and also providing document transformers & field manipulators on the fly!
- This seems to raise the visibility of the UpdateCommand classes, directly exposing them to users w/o plugins. We should perhaps consider interface cleanups on these classes at the same time as this issue.
- Examples! Using javascript (since it's both fast and included in JDK6), let's see what the scripts are for some common usecases. This both helps improve the design as well as lets other people give feedback w/o having to read through code.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Boness updated SOLR-1725:
-----------------------------

    Attachment: SOLR-1725.patch

A new patch which incorporates some of the ideas suggested above, in particular:

1. The order of script execution is based on the order in which they are configured in solrconfig.xml
2. It is now possible to configure global scope variables which will be available for scripts. for example:
{code}
<processor class="solr.ScriptUpdateProcessorFactory">
    <str name="scripts">updateProcessor.js</str>
    <lst name="params">
        <bool name="boolValue">true</bool>
        <int name="intValue">3</int>
    </lst>
</processor>
{code}

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804114#action_12804114 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

bq. oh and one other thing - I really like this patch, Uri! I'm looking to integrate it into a data processing project here at JPL. Great idea!
Thanks :-)

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804106#action_12804106 ] 

Chris A. Mattmann commented on SOLR-1725:
-----------------------------------------

oh and one other thing -- I _really_ like this patch, Uri! I'm looking to integrate it into a data processing project here at JPL. Great idea!

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801998#action_12801998 ] 

Mark Miller edited comment on SOLR-1725 at 1/19/10 12:09 AM:
-------------------------------------------------------------

I believe the history was, on development of *aware - hey, here is a patch that restricts this to certain classes if we want? Anyone object? - and no one did.

*edit*

Actually there was some further discussion - 

https://issues.apache.org/jira/browse/SOLR-414

      was (Author: markrmiller@gmail.com):
    I believe the history was, on development of *aware - hey, here is a patch that restricts this to certain classes if we want? Anyone object? - and no one did.
  
> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804105#action_12804105 ] 

Chris A. Mattmann commented on SOLR-1725:
-----------------------------------------

Is there any reason for SOLR _not_ to move to JDK 1.6?

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801994#action_12801994 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

Right... ok... I'll add another class to this list (I just don't understand why would you want to limit the types that can be *Aware)

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804097#action_12804097 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

bq. How to pass parameters to each script, to facilitate reusable scripts instead of hardcoded ones?

Another great idea... I'll put it in the same patch. 

{quote}
To overcome some of these limitations, why not reuse the existing pipeline mechanism to define even the chain, i.e. allow only one script at a time? Then the order of scripts are then dictated by the order of <processor > tags in the ProcessorChain and we reuse the parameter passing logic. A positive side effect is that you can compose a ProcessorChain with a mix and match of Java and Script based Processors. Class/script instantiation needs to be optimized of course.
{quote}

Well, once you'll have the parameters support than you'll be able to do it even if the processor supports multiple scripts.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Boness updated SOLR-1725:
-----------------------------

    Attachment: SOLR-1725.patch

This new patch cleans things up a bit. The main change here is the addition of the script engine abstraction. Supporting JDK 6 script engine at the moment is a bit tricky as Solr is built on JDK 5. So this support must be done using reflection at runtime which makes the code a bit of a mess. The abstraction helps to keep the code clean.

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

Posted by "Uri Boness (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804092#action_12804092 ] 

Uri Boness commented on SOLR-1725:
----------------------------------

If we move this to the contrib. we'll need to extract the script engine abstraction of a separate contrib. utils library (so the DIH will be able to utilize it). I believe this can create a bit of a mess just for this small (though useful) functionality. Is there are real reason for not keeping it in the core? 

I think either way, if people will want to use it they'll need to read somewhere how... I think it'd be nice to save them the extra effort of putting an extra jar file in the lib directory - the configuration (writing the script and configuring the update processors) they'll need to adjust anyway. The only thing that we must stress in the documentation (both in the schema and in the wiki) is that they can only use this feature in Java 6.

Two additional things to note:
1. JDK 5 has reached the end of service life (EOSL) already and is not actively supported by Sun (/Oracle).
2. The general recommendation is to run Solr on Java 6 anyways (due to some threading issues in Java 5).

> Script based UpdateRequestProcessorFactory
> ------------------------------------------
>
>                 Key: SOLR-1725
>                 URL: https://issues.apache.org/jira/browse/SOLR-1725
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.4
>            Reporter: Uri Boness
>         Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory.
> Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic.
> The following variables are define as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}}- The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.