You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-dev@lucene.apache.org by "Nicolas Dessaigne (JIRA)" <ji...@apache.org> on 2008/04/14 18:13:04 UTC

[jira] Created: (SOLR-538) CopyField maxLength property

CopyField maxLength property
----------------------------

                 Key: SOLR-538
                 URL: https://issues.apache.org/jira/browse/SOLR-538
             Project: Solr
          Issue Type: Improvement
          Components: update
            Reporter: Nicolas Dessaigne


As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.

This is particularly useful to avoid very slow highlighting when the index contains big documents.

Example :
<copyField source="text" dest="highlight" maxLength="30000" />

This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (SOLR-538) CopyField maxLength property

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645967#action_12645967 ] 

koji edited comment on SOLR-538 at 11/7/08 11:51 PM:
---------------------------------------------------------------

Thank you for the updated patch. I have some comments.

- I think this breaks back-compat:

{code}
Index: src/java/org/apache/solr/schema/IndexSchema.java
===================================================================
--- src/java/org/apache/solr/schema/IndexSchema.java	(revision 712067)
+++ src/java/org/apache/solr/schema/IndexSchema.java	(working copy)
  :
@@ -654,12 +664,12 @@
    * 
    * @see SolrCoreAware
    */
-  public void registerCopyField( String source, String dest )
+  public void registerCopyField( String source, String dest, Integer maxLength )
{code}

and the type of maxLength is int (not Integer) right?

- Can you use 2-space indentation instead of tab indentation?
- Can you change "@since 1.3" javadoc comments to "@since 1.4"?
- The new attribute "maxLength" may make users confused (number of chars or tokens?). How about using "maxChars" or something?



      was (Author: koji):
    Thank you for the updated patch. I have some comments.

# I think this breaks back-compat:

{code}
Index: src/java/org/apache/solr/schema/IndexSchema.java
===================================================================
--- src/java/org/apache/solr/schema/IndexSchema.java	(revision 712067)
+++ src/java/org/apache/solr/schema/IndexSchema.java	(working copy)
  :
@@ -654,12 +664,12 @@
    * 
    * @see SolrCoreAware
    */
-  public void registerCopyField( String source, String dest )
+  public void registerCopyField( String source, String dest, Integer maxLength )
{code}

and the type of maxLength is int (not Integer) right?

# Can you use 2-space indentation instead of tab indentation?
# Can you change "@since 1.3" javadoc comments to "@since 1.4"?
# The new attribute "maxLength" may make users confused (number of chars or tokens?). How about using "maxChars" or something?


  
> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-538:
-------------------------------

    Attachment: SOLR-538.patch

Attaching new patch which fixes tabs.

bq. Does the assertTrue message really accurately describe the exception? 

That's actually not the assertTrue message, that's asserting that the exception message contains that string.

bq. The previous/default copy-everything behaviour, right? Maybe I missed it, but I didn't see that in any javadocs or comments or the example schema.xml.

I've added a note to the schema.xml. It is also mentioned in the javadocs for the CopyField constructor.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-538) CopyField maxLength property

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-538.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

Committed revision 721758. Thanks everyone!

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-538:
-------------------------------

    Attachment: SOLR-538.patch

Attaching new patch which
 * adds method for backwards compatibility
 * replaces tabs with 2 spaces
 * fixes javadoc
 * renames maxLength to maxChars
as per Koji's suggestions.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Chris Harris (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Harris updated SOLR-538:
------------------------------

    Attachment: SOLR-538.patch

Syncing w/ trunk r712067

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-538:
-------------------------------

    Attachment: SOLR-538.patch

Syncing patch with trunk.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (SOLR-538) CopyField maxLength property

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reassigned SOLR-538:
-----------------------------------

    Assignee: Koji Sekiguchi

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Georgios Stamatis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Georgios Stamatis updated SOLR-538:
-----------------------------------

    Attachment: CopyFieldMaxLength.patch

Updated for latest trunk (tested on revision 669140)

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651238#action_12651238 ] 

Shalin Shekhar Mangar commented on SOLR-538:
--------------------------------------------

Looking at the watchers, voters and the comments, a lot of people seem to be interested in this issue. Is there any objection on committing it to trunk?

Koji, since you have used the patch yourself, would you like to take this up?

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-538:
-------------------------------

    Attachment: SOLR-538.patch

Attaching new patch which applies to current trunk and changes some minor things wrt error handling and testing.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628054#action_12628054 ] 

Lars Kotthoff commented on SOLR-538:
------------------------------------

Interesting idea, but this should probably be a separate issue. It would require more significant changes, for example the update handler should probably warn when the value for a field is truncated etc.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Lars Kotthoff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-538:
-------------------------------

    Attachment: SOLR-538.patch

bq. When I apply this patch, the last line of test-files/solr/conf/schema-copyfield-test.xml seems to get cut off.

Sorry, my bad. That's what you get for modifying diffs manually. I'm attaching a new version which should fix this problem.

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645967#action_12645967 ] 

Koji Sekiguchi commented on SOLR-538:
-------------------------------------

Thank you for the updated patch. I have some comments.

# I think this breaks back-compat:

{code}
Index: src/java/org/apache/solr/schema/IndexSchema.java
===================================================================
--- src/java/org/apache/solr/schema/IndexSchema.java	(revision 712067)
+++ src/java/org/apache/solr/schema/IndexSchema.java	(working copy)
  :
@@ -654,12 +664,12 @@
    * 
    * @see SolrCoreAware
    */
-  public void registerCopyField( String source, String dest )
+  public void registerCopyField( String source, String dest, Integer maxLength )
{code}

and the type of maxLength is int (not Integer) right?

# Can you use 2-space indentation instead of tab indentation?
# Can you change "@since 1.3" javadoc comments to "@since 1.4"?
# The new attribute "maxLength" may make users confused (number of chars or tokens?). How about using "maxChars" or something?



> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Georgios Stamatis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Georgios Stamatis updated SOLR-538:
-----------------------------------

    Attachment: CopyFieldMaxLength.patch

Should be applied on svn trunk (tested on revision 647855)

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>         Attachments: CopyFieldMaxLength.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Chris Harris (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Harris updated SOLR-538:
------------------------------

    Attachment: SOLR-538.patch

Same as prev SOLR-538.patch (2008-11-21 02:21 PM), except with some unneeded carriage return characters removed. (This may be overly cautious, but I don't want those to cause problems for anyone.)

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Chris Harris (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Harris updated SOLR-538:
------------------------------

    Attachment: SOLR-538.patch

Small change to bring Lars' 2008-11-08 version of SOLR-538.patch in sync with trunk r719187

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Chris Harris (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627558#action_12627558 ] 

Chris Harris commented on SOLR-538:
-----------------------------------

When I apply this patch, the last line of test-files/solr/conf/schema-copyfield-test.xml seems to get cut off. That is, rather than

[...]
 <!-- Similarity is the scoring routine for each document vs a query.
      A custom similarity may be specified here, but the default is fine
      for most applications.
 -->
 <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
{{blank line}}
</schema>
{{EOF}}

I get

[...]
 <!-- Similarity is the scoring routine for each document vs a query.
      A custom similarity may be specified here, but the default is fine
      for most applications.
 -->
 <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
{{blank line}}
{{EOF}}

This makes the XML invalid, and makes "ant test" fail.

I thought I was just being inept with Windows/TortoiseSVN, but now I've had the same thing happen when applying the patch with the patch command on the OS X command line. This makes me suspicious that there might be something wrong with the patch -- though I can't find anything wrong by looking at it manually. Any thoughts?


> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622879#action_12622879 ] 

Otis Gospodnetic commented on SOLR-538:
---------------------------------------

Lars, do you think you could change those tabs into 2 spaces to follow the code style convention?

In CopyFieldTest I see things like:

+              try {
+                       new CopyField(null,
+                                       new SchemaField("destination", new TextField()), 1000);
+                       fail("CopyField failed with null SchemaField argument.");
+               } catch (IllegalArgumentException e) {
+                       assertTrue(e.getLocalizedMessage().contains("can't be NULL"));

Does the assertTrue message really accurately describe the exception?  What is null in the above case?  What happens when maxLength="0"?  The previous/default copy-everything behaviour, right?  Maybe I missed it, but I didn't see that in any javadocs or comments or the example schema.xml.


> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-538:
----------------------------------

    Priority: Minor  (was: Major)

This no longer applies.  Could you bring it up to date?  I also noticed new code that introduces new coding style (e.g. tab indentation instead of 2-space indentation) - could you please make it look like the existing Solr code style?

$ patch -p1 -i CopyFieldMaxLength.patch --dry-run
patching file src/java/org/apache/solr/schema/CopyField.java
patching file src/java/org/apache/solr/schema/IndexSchema.java
Hunk #1 succeeded at 601 with fuzz 2 (offset 44 lines).
Hunk #2 FAILED at 615.
Hunk #3 FAILED at 636.
Hunk #4 FAILED at 650.
Hunk #5 succeeded at 859 (offset 67 lines).
Hunk #6 succeeded at 884 (offset 67 lines).
Hunk #7 succeeded at 1065 (offset 67 lines).
Hunk #8 FAILED at 1086.
Hunk #9 succeeded at 1099 (offset 67 lines).
Hunk #10 succeeded at 1112 (offset 67 lines).
4 out of 10 hunks FAILED -- saving rejects to file src/java/org/apache/solr/schema/IndexSchema.java.rej
patching file src/java/org/apache/solr/update/DocumentBuilder.java
Hunk #2 succeeded at 121 (offset 29 lines).
Hunk #3 succeeded at 129 (offset 29 lines).
Hunk #4 succeeded at 219 (offset 29 lines).
Hunk #5 succeeded at 247 (offset 29 lines).
Hunk #6 succeeded at 259 (offset 29 lines).
patching file src/test/org/apache/solr/schema/CopyFieldTest.java
patching file src/test/test-files/solr/conf/schema-copyfield-test.xml


> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-538) CopyField maxLength property

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-538:
--------------------------------

    Attachment: SOLR-538-for-1.3.patch

Attached a patch for branch-1.3. I have a performance issue on a highlight field and I'll see if this helps for my project in next week (I know that hl.maxAnalyzedChars helps, but I'd like to reduce index size with SOLR-538).

> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538-for-1.3.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-538) CopyField maxLength property

Posted by "Chris Harris (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628045#action_12628045 ] 

Chris Harris commented on SOLR-538:
-----------------------------------

Thanks, Lars; that was fast. I think this patch is going to be handy.

I'm wondering what people thought about an alternative approach to keeping stored fields from being too large, which would require mucking around with Lucene. In particular, the idea would be to allow field definitions like this:

<field name="body" type="text" indexed="true" stored="true"
     omitNorms="false" compressed="true"
     maxFieldLength="2000" storeOnlyAnalyzedText="true"
/>

Here we've made the normal Lucene maxFieldLength (i.e. # tokens to analyze) configurable a field-by-field basis. And in this declaration we've also made it so that what is stored is a function of what is analyzed. (Here if the first 2,000 tokens correspond to the first, say, 8,000 characters, then those 8,000 characters are what's going to be actually stored in the stored field.) This seems a little more natural than lopping off the text after a fixed number of characters.

If I could do the above, I'm thinking I would use that single field for both searching and highlighting. But if you wanted a separate field for highlighting (and were willing to have things run slower than with the current patch), then you could do this:

<field name="body" type="text" indexed="true" stored="false" omitNorms="false" />
<field name="highlighting" type="text" indexed="false" stored="true"
     compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" />
<copyField src="body" dest="highlighting" />


> CopyField maxLength property
> ----------------------------
>
>                 Key: SOLR-538
>                 URL: https://issues.apache.org/jira/browse/SOLR-538
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Nicolas Dessaigne
>            Priority: Minor
>         Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch
>
>
> As discussed shortly on the mailing list (http://www.mail-archive.com/solr-user@lucene.apache.org/msg09807.html), the objective of this task is to add a maxLength property to the CopyField "command". This property simply limits the number of characters that are copied.
> This is particularly useful to avoid very slow highlighting when the index contains big documents.
> Example :
> <copyField source="text" dest="highlight" maxLength="30000" />
> This approach has also the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.