Posted to solr-dev@lucene.apache.org by "Matt Weber (JIRA)" <ji...@apache.org> on 2009/05/20 17:50:46 UTC

[jira] Created: (SOLR-1177) Distributed TermsComponent

Distributed TermsComponent
--------------------------

                 Key: SOLR-1177
                 URL: https://issues.apache.org/jira/browse/SOLR-1177
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 1.4
            Reporter: Matt Weber
            Priority: Minor
             Fix For: 1.5


TermsComponent should be distributed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Matthew Woytowitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Woytowitz updated SOLR-1177:
------------------------------------

    Attachment: TermsComponent.java
                TermsComponent.patch

I got the previous patch working.  It was very close.  I attached the Java file and a patch for just the TermsComponent.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Commented: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789790#action_12789790 ] 

Shalin Shekhar Mangar commented on SOLR-1177:
---------------------------------------------

Thanks Matt. Can you please attach the relevant portions to SOLR-1139? We can commit SOLR-1139 first and then resolve this one.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Assigned: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar reassigned SOLR-1177:
-------------------------------------------

    Assignee: Shalin Shekhar Mangar  (was: Grant Ingersoll)

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated SOLR-1177:
-----------------------------

    Attachment: SOLR-1177.patch

I have cleaned up the patch, tested it against the latest version of trunk, and written some unit tests.  This patch invalidates SOLR-1156 because it also includes sorting support.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1177:
----------------------------------------

    Attachment: SOLR-1177.patch

{code}
if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) {
  fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue());
  cnt++;
}
{code}

I changed freqmin and freqmax to long and used Yonik's method to write int if possible or else switch to longs in the response.
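
As a rough illustration of the change described above, the filter can compare against long bounds and pass each count through a helper like the num method quoted elsewhere in this thread. This sketch is not the committed code: the class TermCountWriter, the method write, and the plain Map used in place of Solr's NamedList are assumptions made so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the component's output step; not the committed Solr code.
class TermCountWriter {

    // Same idea as the facet component helper quoted in this thread:
    // emit an Integer when the value fits, otherwise keep the Long.
    static Number num(long val) {
        if (val < Integer.MAX_VALUE) return (int) val;
        else return val;
    }

    // Keep only terms whose merged frequency lies in [freqmin, freqmax].
    static Map<String, Number> write(Map<String, Long> merged, long freqmin, long freqmax) {
        Map<String, Number> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : merged.entrySet()) {
            long freq = e.getValue();
            if (freq >= freqmin && freq <= freqmax) {
                out.put(e.getKey(), num(freq));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> merged = new LinkedHashMap<>();
        merged.put("apache", 12L);
        merged.put("solr", 3_000_000_000L);
        merged.put("rare", 1L);
        // "rare" is dropped by freqmin = 2; "solr" stays a Long because it does not fit in an int.
        System.out.println(write(merged, 2L, Long.MAX_VALUE));
    }
}
```

With long bounds, a merged count above the int range can still pass the freqmax check instead of being compared against a wrapped value.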

I'll commit this shortly.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated SOLR-1177:
-----------------------------

    Attachment: SOLR-1177.patch

Here is an updated patch that includes Shalin's suggestions:

- replace TermData with TermsResponse.Term
- updates TermsHelper to use the parsing code from TermsResponse

I also changed TermsResponse.Term#frequency to a long so that we don't overflow when calculating the frequency.  Then, to keep backward compatibility with existing code, I do the following when writing it to the NamedList:

if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) {
    fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue());
    cnt++;
}

Is this a good approach?
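
One thing worth checking about the cast above: Number.intValue() on a boxed long is a narrowing conversion, so a merged frequency above Integer.MAX_VALUE would wrap silently rather than fail. A small standalone illustration follows; the class name NarrowingDemo and the sample values are made up for the example.

```java
// Hypothetical demo class; it only shows what intValue() does to a large long.
class NarrowingDemo {

    // Box the long as a Number, then narrow to int, as the snippet above does.
    static int narrow(long frequency) {
        Number boxed = frequency;
        return boxed.intValue();
    }

    public static void main(String[] args) {
        System.out.println(narrow(42L));            // fits in an int: 42
        System.out.println(narrow(3_000_000_000L)); // wraps: -1294967296
    }
}
```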


This new patch includes SOLR-1139.



> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Commented: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789336#action_12789336 ] 

Matt Weber commented on SOLR-1177:
----------------------------------

Thanks for the update Yonik!  I will see if I can get this and SOLR-1139 using the same classes.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Assigned: (SOLR-1177) Distributed TermsComponent

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-1177:
-------------------------------------

    Assignee: Grant Ingersoll

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Commented: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789795#action_12789795 ] 

Shalin Shekhar Mangar commented on SOLR-1177:
---------------------------------------------

bq. The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them? 

Yes. I'll commit SOLR-1139 first so remove those classes from the current patch.

PS: I'm sorry if I am confusing you. It is 3AM here and I'm a little confused myself :)

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Resolved: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1177.
-----------------------------------------

    Resolution: Fixed

Committed revision 890199.

Thanks Matt!

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated SOLR-1177:
-----------------------------

    Attachment: SOLR-1177.patch

New patch that DOES NOT include the code for SOLR-1139.  Make sure you have SOLR-1139 applied before using this patch.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1177:
----------------------------------------

    Attachment: SOLR-1177.patch

# Based on Matt's patch
# Synced to trunk
# Uses BaseDistributedTestCase

All tests pass.

I had to change TermData#frequency to an int to match the output of distributed and non-distributed cases. It is theoretically possible for the sum of frequencies from all shards to exceed the range of an int, but I don't think that is a practical concern right now. The problem is that we represent frequency as an int everywhere in non-distributed responses. If we want longs in distributed search responses, then we must start using longs in non-distributed responses as well to maintain compatibility.
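
To make the overflow concern concrete, here is a minimal sketch of summing per-shard counts into a long. Everything in it is hypothetical (the class ShardTermMerger, the merge method, and the use of plain maps for shard responses); it is not taken from any of the attached patches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical merger: each shard response is modeled as a term -> int count map.
class ShardTermMerger {

    // Accumulate per-shard int counts into long totals so the sum cannot wrap at the int limit.
    static Map<String, Long> merge(List<Map<String, Integer>> shardResponses) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Integer> shard : shardResponses) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("solr", Integer.MAX_VALUE);
        Map<String, Integer> shard2 = new HashMap<>();
        shard2.put("solr", Integer.MAX_VALUE);

        // Each shard's count fits in an int, but the merged total does not.
        System.out.println(merge(Arrays.asList(shard1, shard2)));
    }
}
```

Each shard can keep reporting int counts; only the merged total needs the wider type.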

Matt -- There is an issue open for adding SolrJ support for TermsComponent - SOLR-1139. Is it possible to replace the TermsHelper and TermData classes by classes in SOLR-1139? I'd like to have the same classes parsing responses in SolrJ and distributed search.

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Updated: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated SOLR-1177:
-----------------------------

    Attachment: SOLR-1177.patch

Here is my first attempt at a patch; it is not working yet.  For some reason only the prepare and process methods are being called.  It seems that the shards parameter is not being honored as it is in the other distributed components, because rb.shards is always null.  I have looked at the other distributed components and did not notice them doing anything special with the shards parameter.  I have based this code on the information from http://wiki.apache.org/solr/WritingDistributedSearchComponents and on looking through the FacetComponent, DebugComponent, StatsComponent, and HighlightComponent code.  Any help figuring out why the other methods are not being called is greatly appreciated.  Please ignore the println statements; they are for debugging only and will be removed in the finalized, working patch.

Thanks!

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch
>
>
> TermsComponent should be distributed



[jira] Commented: (SOLR-1177) Distributed TermsComponent

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789334#action_12789334 ] 

Yonik Seeley commented on SOLR-1177:
------------------------------------

The facet component internally uses long to add up distributed facet counts, and then uses this code:

{code}
// use <int> tags for smaller facet counts (better back compatibility)
private Number num(long val) {
  if (val < Integer.MAX_VALUE) return (int)val;
  else return val;
}
{code}

Yes, it's not ideal to switch from <int> to <long> in a running application, but I think it's better than failing or overflowing the int.
Client code in SolrJ should be written to handle either via ((Number)x).longValue().
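
On the client side, the suggestion above amounts to never casting a count to a concrete boxed type. A minimal sketch, assuming the parsed response hands back each count as a plain Object; the class CountReader and the method readCount are hypothetical names, not SolrJ API.

```java
// Hypothetical client-side helper; not part of SolrJ.
class CountReader {

    // Works whether the server sent the count as an Integer or as a Long.
    static long readCount(Object value) {
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Object small = Integer.valueOf(7);          // what an <int> tag would parse to
        Object large = Long.valueOf(3_000_000_000L); // what a <long> tag would parse to
        System.out.println(readCount(small));
        System.out.println(readCount(large));
    }
}
```

By contrast, a direct cast such as (Integer) value would throw a ClassCastException as soon as the server switched to a Long.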

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed



[jira] Commented: (SOLR-1177) Distributed TermsComponent

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789794#action_12789794 ] 

Matt Weber commented on SOLR-1177:
----------------------------------

The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket.  Should I separate them?

> Distributed TermsComponent
> --------------------------
>
>                 Key: SOLR-1177
>                 URL: https://issues.apache.org/jira/browse/SOLR-1177
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Matt Weber
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch
>
>
> TermsComponent should be distributed
