Posted to solr-dev@lucene.apache.org by "Jason Cater (JIRA)" <ji...@apache.org> on 2007/04/26 21:16:15 UTC

[jira] Created: (SOLR-216) Improvements to solr.py

Improvements to solr.py
-----------------------

                 Key: SOLR-216
                 URL: https://issues.apache.org/jira/browse/SOLR-216
             Project: Solr
          Issue Type: Improvement
          Components: clients - python
    Affects Versions: 1.2
            Reporter: Jason Cater
            Priority: Trivial


I've taken the original solr.py code and extended it to include higher-level functions.

  * Requires Python 2.3+
  * Supports the SSL (https://) scheme
  * Conforms (mostly) to PEP 8 -- the Python Style Guide
  * Provides a high-level results object with implicit data type conversion
  * Supports batching of update commands
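
The thread does not show how solr.py implements batching, so the following is only a minimal sketch of one plausible meaning: queue several documents and render them as a single <add> message for /update, rather than sending one request per document. The UpdateBatch class and its method names are hypothetical and are not taken from the attached solr.py. The sketch is written for current Python 3, not the Python 2.3 mentioned above.

```python
from xml.sax.saxutils import escape, quoteattr


class UpdateBatch:
    """Hypothetical helper: collects documents and renders one <add> message."""

    def __init__(self):
        self.docs = []

    def add(self, **fields):
        # Queue the document instead of sending it immediately.
        self.docs.append(fields)

    def to_xml(self):
        # Render every queued document inside a single <add> element.
        parts = ["<add>"]
        for fields in self.docs:
            parts.append("<doc>")
            for name, value in fields.items():
                parts.append(
                    "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
                )
            parts.append("</doc>")
        parts.append("</add>")
        return "".join(parts)
```

With this shape, a caller would queue documents with add() and post the result of to_xml() once, which is the usual motivation for batching: fewer HTTP round trips per indexed document.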


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Dorneles Tremea (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638803#action_12638803 ] 

Dorneles Tremea commented on SOLR-216:
--------------------------------------

Looks like the above issues are now being addressed at http://code.google.com/p/solrpy/issues and so this ticket can be closed...

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Walter Underwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499923 ] 

Walter Underwood commented on SOLR-216:
---------------------------------------

GET is the right semantic for a query, since it doesn't change the resource. It also allows HTTP caching.

If Solr has URL length limits, that's a bug.
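
As a rough illustration of the GET approach, the sketch below encodes the query parameters into the URL so that the request has no body. The build_select_request helper is hypothetical, and the /select path and base URL are assumptions based on the example Solr install; none of this is taken from the attached solr.py.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(base_url, **params):
    """Hypothetical helper: build a GET request for a Solr query."""
    # Put the parameters in the query string so the request carries no body.
    url = "%s/select?%s" % (base_url.rstrip("/"), urlencode(params))
    # A Request created without data defaults to the GET method.
    return Request(url)
```

Because the whole query lives in the URL, an intermediate HTTP cache can key on it, which is the caching benefit mentioned above.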


> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



Re: [jira] Updated: (SOLR-216) Improvements to solr.py

Posted by Mike Klaas <mi...@gmail.com>.
Thanks Yonik!  I'll take a look at cleaning this up and committing it  
when I return from vacation.

-Mike

On 14-Aug-07, at 7:55 AM, Yonik Seeley (JIRA) wrote:

>
>      [ https://issues.apache.org/jira/browse/SOLR-216? 
> page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Yonik Seeley updated SOLR-216:
> ------------------------------
>
>     Attachment: solr.py
>
> Uploading a  slightly patched version that works with the new  
> update response format, and fixes some exception related issues  
> (http reason is now grabbed, exception trace prints httpcode and  
> reason)
>
>> Improvements to solr.py
>> -----------------------
>>
>>                 Key: SOLR-216
>>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>>             Project: Solr
>>          Issue Type: Improvement
>>          Components: clients - python
>>    Affects Versions: 1.2
>>            Reporter: Jason Cater
>>            Assignee: Mike Klaas
>>            Priority: Trivial
>>         Attachments: solr.py, solr.py
>>
>>
>> I've taken the original solr.py code and extended it to include  
>> higher-level functions.
>>   * Requires python 2.3+
>>   * Supports SSL (https://) schema
>>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>>   * Provides a high-level results object with implicit data type  
>> conversion
>>   * Supports batching of update commands
>


[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-216:
------------------------------

    Attachment: solr.py

Uploading a slightly patched version that works with the new update response format and fixes some exception-related issues (the HTTP reason is now grabbed, and the exception trace prints httpcode and reason).

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py, solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dariusz Suchojad updated SOLR-216:
----------------------------------

    Attachment: solr-solrpy-r5.patch

Patch against r5 of http://code.google.com/p/solrpy/ to make it work with Solr 1.2

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629976#action_12629976 ] 

Dariusz Suchojad commented on SOLR-216:
---------------------------------------

Hi Mike,

I've joined the solrpy (http://code.google.com/p/solrpy/) project where I'd
like to incorporate the changes I had made and to work on adding more
features to the Python client. I hope to get back to the discussion when,
like you said, it becomes more stable and popular.

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622391#action_12622391 ] 

Mike Klaas commented on SOLR-216:
---------------------------------

Hi Dariusz,

There will almost certainly be no more releases of Solr 1.2.  1.3 will likely be released in less than a month.  However, it is good that you published this code so that it can be found by other parties.

I'd be much more interested in working toward a client that is compatible with the upcoming 1.3 release (it is unlikely that it can be included, but it can be distributed separately).

cheers,
-Mike

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Jason Cater (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Cater updated SOLR-216:
-----------------------------

    Attachment: solr.py

Updated solr.py file

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519688 ] 

Yonik Seeley commented on SOLR-216:
-----------------------------------

I quickly tried the example in the comments and ran into some issues:

The examples should be made to work with the example Solr install, so instead of
   >>> c = SolrConnection('http://localhost:8983')
   >>> c.add(id='500', name='python test doc', active=True)

it should be
   >>> c = SolrConnection('http://localhost:8983/solr')    # need to specify the full path to Solr
   >>> c.add(id='500', name='python test doc')             # the active field doesn't exist

After that, a document is added, but I get an exception on the Python side.
I guess it's probably related to response parsing?  It should probably be updated to check the HTTP response status rather than parse the response body.
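
One way to read this suggestion is to decide success from the HTTP status alone. The sketch below assumes that any 2xx status means the update was accepted. The SolrUpdateError class and check_update_status function are hypothetical names, not part of the attached solr.py.

```python
class SolrUpdateError(Exception):
    """Hypothetical exception carrying the HTTP status code and reason."""

    def __init__(self, httpcode, reason):
        Exception.__init__(self, "HTTP %s: %s" % (httpcode, reason))
        self.httpcode = httpcode
        self.reason = reason


def check_update_status(httpcode, reason):
    # Assumption: any 2xx status means Solr accepted the update,
    # so the response body does not need to be parsed at all.
    if not 200 <= httpcode < 300:
        raise SolrUpdateError(httpcode, reason)
```

A client would call check_update_status(response.status, response.reason) on the object returned by http.client, so the check no longer depends on the exact layout of the response body.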


> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493835 ] 

Brian Whitman commented on SOLR-216:
------------------------------------

Hi Jason, this is really great. I had one small issue: highlighting did not seem to work. I looked into your code and found you were using hi.fl and hi, not hl.fl and hl. I'm not sure whether your Solr expects hi, but mine expects hl. Once I changed lines 453 and 457 to use hl instead of hi, it worked fine.
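
For reference, a minimal sketch of a query string that uses the hl and hl.fl parameter names described above. The highlight_params helper and the field names in the usage note are made up for illustration, and the code is not taken from the attached solr.py.

```python
from urllib.parse import urlencode


def highlight_params(query, fields):
    """Hypothetical helper: build query parameters with highlighting enabled."""
    params = {
        "q": query,
        "hl": "true",                # turn highlighting on
        "hl.fl": ",".join(fields),   # fields to highlight, comma-separated
    }
    return urlencode(params)
```

For example, highlight_params("python", ["name", "features"]) encodes both fields into a single hl.fl parameter.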


> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622470#action_12622470 ] 

Dariusz Suchojad commented on SOLR-216:
---------------------------------------

I just checked solr.py with 1.3 (r685786) and, except for the one mentioned
above, all tests pass correctly. However, I would like to add some more tests
before merging features from other places, replacing xml.sax with etree/lxml,
or using some other Python 2.5 niceties. One thing I still don't know what
to do about, though, is batch updating. I can't get it working right now,
and I'm still not sure what it is good for, so it's hard for me to come up
with good test cases before trying to fix it :-)

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Resolved: (SOLR-216) Improvements to solr.py

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-216.
-----------------------------------

    Resolution: Won't Fix

Closing per comment

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622390#action_12622390 ] 

Dariusz Suchojad commented on SOLR-216:
---------------------------------------

Hello,

There are a few Python clients available, which is quite confusing, so I took
the one with the richest API available (http://code.google.com/p/solrpy/, rev5)
and created a set of test cases, which you may find in the attached file test_all.py.
The tests are meant to be run with Solr 1.2 and only after applying the attached
solr-solrpy-r5.patch, which fixes a couple of issues while still trying not to
break the already exposed API. Also attached is a solr.py module with the
patch applied. One of the tests is failing deliberately; I'm simply not sure
whether doing batch updates is still feasible with Solr 1.2.

What I'd like to propose now is to integrate all the code available (solrpy
and pysolr from code.google.com, the crude & tiny one from Solr SVN at
http://svn.apache.org/viewvc/lucene/solr/trunk/client/python/, and various patches
from JIRA), replace the XML parser currently used with elementtree or, optionally,
lxml (which is *the* toolkit for any high-performance XML processing with Python),
add more tests, and have one officially blessed rich client for Python.
I'm willing to do all the dirty work if there's a consensus among people using
Solr with Python that doing so is a good idea.

What do you think?

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Jason Cater (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499915 ] 

Jason Cater commented on SOLR-216:
----------------------------------

Regarding Yonik Seeley's point about GET vs. POST for queries, is there any need for us to keep everything as POSTs? It would be fairly trivial to change my code to use GETs for queries.

(Mike Klaas: apologies for replying to the list earlier and not using JIRA ... I'm still learning.)



> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Assigned: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Klaas reassigned SOLR-216:
-------------------------------

    Assignee: Mike Klaas

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Work started: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SOLR-216 started by Mike Klaas.

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dariusz Suchojad updated SOLR-216:
----------------------------------

    Attachment: test_all.py

Tests for solr.py

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495393 ] 

Brian Whitman commented on SOLR-216:
------------------------------------

I'm noticing that you are looking for

        if not data.startswith('<result status="0"'):

when detecting whether /update didn't like something. With the latest Solr trunk, that doesn't come back anymore. The response now looks like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">40</int></lst>
</response>

I'm not familiar enough with Python to fix this properly at the moment.
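
A possible fix, sketched under the assumption that the response always has the shape shown above: parse the XML and read the status value from responseHeader instead of matching a string prefix. The update_status function is a hypothetical name, and it uses the standard-library ElementTree rather than the xml.sax parser that solr.py uses.

```python
import xml.etree.ElementTree as ET


def update_status(data):
    """Hypothetical helper: return the integer status from an update response.

    Returns None when no responseHeader status is present.
    """
    root = ET.fromstring(data)
    for lst in root.findall("lst"):
        if lst.get("name") == "responseHeader":
            for item in lst.findall("int"):
                if item.get("name") == "status":
                    return int(item.text)
    return None
```

The existing check could then become "if update_status(data) != 0:", which no longer depends on the exact bytes at the start of the response.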


> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Rob Young (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Young updated SOLR-216:
---------------------------

    Attachment: solr.py

Added support for boosted documents and fields. Added Document and Field classes.
Examples:
connection.add(id="mydoc", author=["you", Field("me", boost=2.0), Field("dupree", boost=0.1)])
document = Document(boost=1.5)
document.add(id="mydoc", author="me")
connection.add(document)

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py, solr.py, solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629981#action_12629981 ] 

Mike Klaas commented on SOLR-216:
---------------------------------

That's great!  Be sure to update http://wiki.apache.org/solr/SolPython as the project progresses.



> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py
>
>
> I've taken the original solr.py code and extended it to include higher-level functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands



Re: Commented: (SOLR-216) Improvements to solr.py

Posted by Mike Klaas <mi...@gmail.com>.
On 30-May-07, at 12:43 PM, Erik Hatcher wrote:

>
> On May 29, 2007, at 4:38 PM, Mike Klaas wrote:
>>> I agree. I was not aware of field boosts at the time. I'll code  
>>> this change.
>>
>> Unfortunately, it is still somewhat awkward.  In my python client  
>> I end up passing (<name>, <value>, <field boost or None>)  
>> everywhere, but that clutters up the api considerably.
>>
>> It might be worth taking a look at the ruby client to see what  
>> Erik's done for the api.
>
> In the ruby client, we have the ability to use a Ruby Hash as a  
> document when no boosts are needed:
>
>    doc = {:field => "value"}
>
> But also use the Field object when boosts are desired:
>
>    doc = Solr::Document.new
>    doc << Solr::Field.new(:field => "value", :boost => 3.0)
>
> The boost stuff was contributed by Coda Hale (credit where it's  
> due :) - I had punted on that initially.

I think I prefer the approach that I suggested for the python  
client.  The ruby approach makes sense if you want to define a proper  
Document class with Fields and such.  99% of the time a map is  
sufficient, and tuple fieldname keys allow maps to be slightly  
extended to support boost functionality.
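
To make the tuple-key idea concrete, here is a minimal sketch (not the code attached to this issue) of how a client might turn such maps into an XML update message. The helper names doc_to_xml and add_many_xml are hypothetical, and the boost attributes on <doc> and <field> are assumed to follow the XML update format used by Solr at the time.

```python
from xml.sax.saxutils import escape, quoteattr


def doc_to_xml(doc):
    """Render one document as a <doc> element.

    `doc` is either a field dict or a (field dict, doc boost) tuple.
    Each key is either a plain field name or a (name, field boost) tuple.
    """
    doc_boost = None
    if isinstance(doc, tuple):
        doc, doc_boost = doc

    if doc_boost is None:
        parts = ['<doc>']
    else:
        parts = ['<doc boost=%s>' % quoteattr(str(doc_boost))]

    for key, value in doc.items():
        # A tuple key carries the boost; a plain key means "no boost".
        name, boost = key if isinstance(key, tuple) else (key, None)
        attrs = 'name=%s' % quoteattr(name)
        if boost is not None:
            attrs += ' boost=%s' % quoteattr(str(boost))
        parts.append('<field %s>%s</field>' % (attrs, escape(str(value))))

    parts.append('</doc>')
    return ''.join(parts)


def add_many_xml(docs):
    """Wrap any number of documents in a single <add> message."""
    return '<add>%s</add>' % ''.join(doc_to_xml(d) for d in docs)


message = add_many_xml([{'id': 1, ('field2', 2.33): u"some text"}])
```

Plain dicts keep working unchanged, which is the appeal of this approach: only callers who need a boost have to know about the tuple form.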

-Mike

Re: Commented: (SOLR-216) Improvements to solr.py

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 29, 2007, at 4:38 PM, Mike Klaas wrote:
>> I agree. I was not aware of field boosts at the time. I'll code  
>> this change.
>
> Unfortunately, it is still somewhat awkward.  In my python client I  
> end up passing (<name>, <value>, <field boost or None>) everywhere,  
> but that clutters up the api considerably.
>
> It might be worth taking a look at the ruby client to see what  
> Erik's done for the api.

In the ruby client, we have the ability to use a Ruby Hash as a  
document when no boosts are needed:

    doc = {:field => "value"}

But also use the Field object when boosts are desired:

    doc = Solr::Document.new
    doc << Solr::Field.new(:field => "value", :boost => 3.0)

The boost stuff was contributed by Coda Hale (credit where it's  
due :) - I had punted on that initially.

	Erik



Re: Commented: (SOLR-216) Improvements to solr.py

Posted by Mike Klaas <mi...@gmail.com>.
[reposting to solr-dev as JIRA destroyed my quoting...]

On 29-May-07, at 12:41 PM, Jason Cater wrote:
> I've had my solr.py in production use internally for about a month  
> now. So, as you can imagine, I've worked through a few oddball bugs  
> that occasionally pop up.  It seems pretty stable for me.

Yes, I agree that it is looking good.  Since we would be replacing  
the existing implementation completely, I think that it is worth  
taking extra care and examining the api choices carefully so we won't  
have to replace it or deprecate things in the near future.

> I would prefer to have a complete directory structure (i.e.,  
> setup.py, unit tests, samples, etc) instead of just the solr.py  
> file.  Would anyone see a problem with this?

+1.  This would be great--a unittest that could be run against the  
solr example would be spectacular!

> Also, on some of your comments:
>
>>  - list comprehensions solely to perform looped execution are  
>> harder to parse and slower than explicitly writing a for loop
>>
>
> List comprehensions seem to be a matter of contention for some.   
> However, it's a battle I'm not interested in fighting, so changed  
> it to a for loop.

It is not a matter of contention for me for use in creating a list,  
but ISTM less clear and less efficient if the purpose is _solely_ to  
perform a loop:

$ python -m timeit '[i+i for i in xrange(10000)]'
100 loops, best of 3: 1.95 msec per loop

$ python -m timeit 'for i in xrange(10000): i+i'
100 loops, best of 3: 1.38 msec per loop
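
The same comparison can be reproduced from a script with the standard timeit module. This is only a sketch, written for Python 3 (range instead of xrange); the function names are made up for illustration, and the numbers will vary by interpreter and machine, so no particular result is implied.

```python
import timeit


def build_list(n=10000):
    # Builds a list only to throw it away: the pattern being criticised.
    [i + i for i in range(n)]


def plain_loop(n=10000):
    # Same work, without allocating a result list.
    for i in range(n):
        i + i


def compare(number=100, repeat=3):
    """Return the best time per call, in milliseconds, for each variant."""
    results = {}
    for func in (build_list, plain_loop):
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[func.__name__] = best / number * 1000.0
    return results


if __name__ == '__main__':
    for name, msec in compare().items():
        print('%s: %.3f msec per loop' % (name, msec))
```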

>>  - shadowing builtins is generally a bad idea
> Any shadowing of builtins was unintentional.  Did you see specific  
> examples?  I run the code through pychecker and pylint to try to  
> avoid such cases.

`id` is shadowed in a few places.
>>

>>  - all NamedList's appearing in the output are converted to dicts-- 
>> this loses information (in particular, it will be unnecessarily  
>> hard for the user to use highlighting/debug data).  Using the  
>> python/json response format would prevent this.  Not returning  
>> highlight/debug data in the standard response format (and yet  
>> providing said parameters in the query() method) seems odd.  Am I  
>> missing something?  Oh, they are set as dynamic attributes of  
>> Response, I see.  Definitely needs documentation.
>>
>
> Yes, this needs to be documented.  (Please see my question
> about allowing a complete directory structure.)
>
>>  - passing fields='' to query() will return all fields, when the  
>> desired return is likely no fields
>>
>
> I've changed the default for fields= to be '*', instead of None or  
> "".  This way, passing in 'fields=""' will result in 'fl=' being  
> passed to the backend.  However, I still don't see the point, as  
> passing both 'fl=' and 'fl=*' return the exact same set of fields  
> (i.e., "all") on my test setup.

Hmm, what if you pass fields='', score=True?  Ideally that would pass
fl=score to the backend, bypassing all stored fields.
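
A rough sketch of the behaviour being asked for, assuming a hypothetical helper build_fl() that query() could call to produce the fl parameter (the real solr.py may structure this differently):

```python
def build_fl(fields='*', score=False):
    """Build the value of Solr's `fl` parameter.

    `fields` may be a string of comma- or space-separated names, or an
    iterable of names.  An empty value means "no stored fields", so that
    fields='' with score=True yields just 'score'.
    """
    if isinstance(fields, str):
        names = fields.replace(',', ' ').split()
    else:
        names = list(fields)
    if score and 'score' not in names:
        names.append('score')
    return ','.join(names)
```

With this, build_fl() gives '*', build_fl('', score=True) gives 'score', and build_fl('id name', score=True) gives 'id,name,score'. Whether the backend treats an empty fl as "all fields" is a separate question, as noted above.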

>>  - it might be better to settle on an api that permits doc/field  
>> boosts.  How about using a tuple as the field name in the field dict?
>>
>> conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])
>>
>> doc boosts could be handled by optionally providing the fielddict  
>> as a (<fielddict>, boost) tuple.
>>
>
> I agree. I was not aware of field boosts at the time. I'll code  
> this change.

Unfortunately, it is still somewhat awkward.  In my python client I  
end up passing (<name>, <value>, <field boost or None>) everywhere,  
but that clutters up the api considerably.

It might be worth taking a look at the ruby client to see what Erik's
done for the api.

>> - for 2.5+, a cool addition might be:
>>
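>> if sys.version_info >= (2, 5):
>>     # contextlib (and the with statement) arrived in Python 2.5
>>     import contextlib
>>
>>     def batched(solrconn):
>>         solrconn.begin_batch()
>>         yield solrconn
>>         solrconn.end_batch()
>>     batched = contextlib.contextmanager(batched)
>>
>> Use as:
>>
>> # on 2.5 itself this needs: from __future__ import with_statement
>> with batched(solrconn):
>>     solrconn.add(...)
>>     solrconn.add(...)
>>     solrconn.add(...)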
>>
>
> Adding...

Unfortunately, it does push the required python version to 2.4.   
Personally, I think that requiring 2.4 is not unreasonable, but I'm  
somewhat of a bleeding edge guy...

-Mike

Re: Commented: (SOLR-216) Improvements to solr.py

Posted by Jason Cater <ja...@ncsmags.com>.
Mike, 

I've had my solr.py in production use internally for about a month now. 
So, as you can imagine, I've worked through a few oddball bugs that 
occasionally pop up.  It seems pretty stable for me.

I'm planning to upload a new file attachment to this issue containing my 
changes, plus fixing the bug reports that were filed against my open 
ticket.  But first, a few quick questions....

I would prefer to have a complete directory structure (i.e., setup.py, 
unit tests, samples, etc) instead of just the solr.py file.  Would 
anyone see a problem with this?

Also, on some of your comments:

>  - list comprehensions solely to perform looped execution are harder to parse and slower than explicitly writing a for loop
>   

List comprehensions seem to be a matter of contention for some.  
However, it's a battle I'm not interested in fighting, so changed it to 
a for loop.

>  - shadowing builtins is generally a bad idea
>   

Any shadowing of builtins was unintentional.  Did you see specific 
examples?  I run the code through pychecker and pylint to try to avoid 
such cases.

>  - SolrConnection is an old-style class, but Response is new-style 
>   

This was a holdover from the old SolrConnection class I copied from. I'm 
fixing this.

> functionality:
>
>  - why are 'status'/'QTime' returned as floats?
>   

This was just a misunderstanding on my part of what QTime was actually 
returning.  Fixing.

>  - all NamedList's appearing in the output are converted to dicts--this loses information (in particular, it will be unnecessarily hard for the user to use highlighting/debug data).  Using the python/json response format would prevent this.  Not returning highlight/debug data in the standard response format (and yet providing said parameters in the query() method) seems odd.  Am I missing something?  Oh, they are set as dynamic attributes of Response, I see.  Definitely needs documentation.
>   

Yes, this needs to be documented.  (Please see my question about
allowing a complete directory structure.)

>  - passing fields='' to query() will return all fields, when the desired return is likely no fields
>   

I've changed the default for fields= to be '*', instead of None or "".  
This way, passing in 'fields=""' will result in 'fl=' being passed to 
the backend.  However, I still don't see the point, as passing both 
'fl=' and 'fl=*' return the exact same set of fields (i.e., "all") on my 
test setup.

>  - it might be better to settle on an api that permits doc/field boosts.  How about using a tuple as the field name in the field dict?
>
> conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])
>
> doc boosts could be handled by optionally providing the fielddict as a (<fielddict>, boost) tuple.
>   

I agree. I was not aware of field boosts at the time. I'll code this change.

> - for 2.5+, a cool addition might be:
>
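> if sys.version_info >= (2, 5):
>     # contextlib (and the with statement) arrived in Python 2.5
>     import contextlib
>
>     def batched(solrconn):
>         solrconn.begin_batch()
>         yield solrconn
>         solrconn.end_batch()
>     batched = contextlib.contextmanager(batched)
>
> Use as:
>
> # on 2.5 itself this needs: from __future__ import with_statement
> with batched(solrconn):
>     solrconn.add(...)
>     solrconn.add(...)
>     solrconn.add(...)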
>   

Adding...

[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499273 ] 

Mike Klaas commented on SOLR-216:
---------------------------------

Thanks for your contribution!  Some comments:

style:

 - list comprehensions solely to perform looped execution are harder to parse and slower than explicitly writing a for loop

 - shadowing builtins is generally a bad idea

 - SolrConnection is an old-style class, but Response is new-style 

functionality:

 - why are 'status'/'QTime' returned as floats?

 - all NamedList's appearing in the output are converted to dicts--this loses information (in particular, it will be unnecessarily hard for the user to use highlighting/debug data).  Using the python/json response format would prevent this.  Not returning highlight/debug data in the standard response format (and yet providing said parameters in the query() method) seems odd.  Am I missing something?  Oh, they are set as dynamic attributes of Response, I see.  Definitely needs documentation.

 - passing fields='' to query() will return all fields, when the desired return is likely no fields

 - it might be better to settle on an api that permits doc/field boosts.  How about using a tuple as the field name in the field dict?

conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])

doc boosts could be handled by optionally providing the fielddict as a (<fielddict>, boost) tuple.

- for 2.5+, a cool addition might be:

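if sys.version_info >= (2, 5):
    # contextlib (and the with statement) arrived in Python 2.5
    import contextlib

    def batched(solrconn):
        solrconn.begin_batch()
        yield solrconn
        solrconn.end_batch()
    batched = contextlib.contextmanager(batched)

Use as:

# on 2.5 itself this needs: from __future__ import with_statement
with batched(solrconn):
    solrconn.add(...)
    solrconn.add(...)
    solrconn.add(...)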
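
For anyone who wants to try the pattern without a running Solr, here is a self-contained sketch. FakeConnection is a hypothetical stand-in that only records calls; the begin_batch/end_batch names are the ones used above. The try/finally is a design choice of this sketch, not part of the suggestion: it closes the batch even if an add() raises, which may or may not be the behaviour the real client wants.

```python
import contextlib


class FakeConnection(object):
    """Hypothetical stand-in for the client; records calls instead of
    talking to Solr."""

    def __init__(self):
        self.log = []

    def begin_batch(self):
        self.log.append('begin')

    def add(self, **fields):
        self.log.append(('add', fields))

    def end_batch(self):
        self.log.append('end')


@contextlib.contextmanager
def batched(solrconn):
    solrconn.begin_batch()
    try:
        yield solrconn
    finally:
        # Always close the batch, even if the body of the with block fails.
        solrconn.end_batch()


conn = FakeConnection()
with batched(conn) as c:
    c.add(id=1)
    c.add(id=2)
```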





[jira] Updated: (SOLR-216) Improvements to solr.py

Posted by "Dariusz Suchojad (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dariusz Suchojad updated SOLR-216:
----------------------------------

    Attachment: solr.py

solr.py intended to work with Solr 1.2




[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Ian Holsman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527319 ] 

Ian Holsman commented on SOLR-216:
----------------------------------

Hi.

I had a minor issue with UTF-8 encoded strings (for example שיווק )

my 'fix' for this was to encode the query string ala
        params['q'] = q.encode('utf-8')
in the 2-3 places, and it fixed it.
I'm no python expert, so I'm not sure if this is the right thing to do or not.
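
For reference, a small sketch of the same idea applied to all parameters at once, written against Python 3's urllib.parse (in the Python 2 of this thread the equivalent call is urllib.urlencode). The helper name encode_params is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def encode_params(params, encoding='utf-8'):
    """URL-encode query parameters, encoding text values explicitly."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, str):
            # Turn text into bytes up front so the percent-encoding is
            # always done on UTF-8, whatever the platform default is.
            value = value.encode(encoding)
        encoded[key] = value
    return urlencode(encoded)


term = 'שיווק'
query_string = encode_params({'q': term, 'rows': 10})
```

The resulting query string is plain ASCII, so it can be appended to the select URL or sent as a form-encoded POST body without further encoding.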




[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499907 ] 

Yonik Seeley commented on SOLR-216:
-----------------------------------

FYI, I ran into an issue with httplib in python.
It sends the headers and body separately when doing a POST, and this triggered some pretty bad performance on Linux systems using persistent connections (due to triggering Nagle's algorithm, I think)... 40 times slower than non-persistent connections.

GET could probably normally be used for queries at least, but POST is still needed for updates.
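
One common workaround for this kind of delay is to disable Nagle's algorithm on the client socket with TCP_NODELAY. The sketch below assumes Python 3's http.client (the successor to httplib); the class name NoDelayHTTPConnection is hypothetical, and recent Python versions may already set this option themselves, so treat it as an illustration rather than a required fix.

```python
import http.client
import socket


class NoDelayHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that turns off Nagle's algorithm after connecting."""

    def connect(self):
        super().connect()
        # Send small writes (e.g. headers, then body) immediately instead
        # of waiting for the previous segment to be acknowledged.
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

A client would use it exactly like HTTPConnection, e.g. NoDelayHTTPConnection(host, port), and keep the connection open across requests.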





[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499922 ] 

Mike Klaas commented on SOLR-216:
---------------------------------

I just noticed that I changed my own python query code to POST from GET two months ago, but I can't remember why at the moment.  It is possibly due to url length limitations (occasionally I was passing a lot of data in the query args), but that doesn't seem quite right now.  Changing to GET makes sense to me (though rapid updates are still a potential problem--perhaps it would be worth recommending against persistent connections on linux).
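
If URL length really is the concern, one possible compromise is to use GET by default and fall back to POST only for long queries. This is purely a sketch: the function name plan_query_request and the 2000-character threshold are assumptions for illustration, not anything taken from solr.py or from Solr itself.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # assumed threshold; tune for the servlet container


def plan_query_request(path, params, max_url_length=MAX_URL_LENGTH):
    """Return (method, url, body) for a query.

    Short queries go out as GET with the parameters in the URL; long ones
    as POST with a form-encoded body.
    """
    query = urlencode(params, doseq=True)
    url = '%s?%s' % (path, query)
    if len(url) <= max_url_length:
        return 'GET', url, None
    return 'POST', path, query
```

A POST built this way would also need a Content-Type of application/x-www-form-urlencoded.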




[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Ed Summers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525590 ] 

Ed Summers commented on SOLR-216:
---------------------------------

Thanks guys for this really nice update. I was wrestling with the current solr.py in svn and solr v1.2, and then remembered seeing something on the list about an updated solr client for python.

Anyhow, I was running some unicode through add() in the latest solr.py attached to this issue and noticed that the body isn't being encoded before sending off to httplib.HTTPConnection.request() on line 711:

  self.conn.request('POST', url, body, headers)

This can result in a stack trace like this when sending utf8 over the wire:

Traceback (most recent call last):
  File "./solr_index.py", line 25, in <module>
    solr.add_many([solr_doc])
  File "../../../wdl/solr.py", line 587, in add_many
    return self._update(xstr)
  File "../../../wdl/solr.py", line 653, in _update
    request, self.xmlheaders)
  File "../../../wdl/solr.py", line 711, in _post
    self.conn.request('POST', url, body, headers)
  File "/usr/lib/python2.5/httplib.py", line 862, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.5/httplib.py", line 888, in _send_request
    self.send(body)
  File "/usr/lib/python2.5/httplib.py", line 707, in send
    self.sock.sendall(str)
  File "<string>", line 1, in sendall
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0302' in position 85: ordinal not in range(128)

Changing line 711 to:

  self.conn.request('POST', url, body.encode('utf-8'), headers)

makes the problem go away...
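
A slightly more defensive variant of the same fix is to encode only when the body is still text, and to compute Content-Length from the encoded bytes. This is a sketch, not the attached solr.py; the helper name to_wire is hypothetical and the sample document is made up.

```python
def to_wire(body, encoding='utf-8'):
    """Return `body` as bytes suitable for writing to a socket."""
    if isinstance(body, bytes):
        # Already encoded: do not encode twice.
        return body
    return body.encode(encoding)


xml = u'<add><doc><field name="title">a\u0302</field></doc></add>'
payload = to_wire(xml)
headers = {
    'Content-Type': 'text/xml; charset=utf-8',
    # Length in bytes, which differs from len(xml) for non-ASCII text.
    'Content-Length': str(len(payload)),
}
```

The request call then becomes self.conn.request('POST', url, payload, headers), with the charset in the header matching the encoding actually used.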




[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629556#action_12629556 ] 

Mike Klaas commented on SOLR-216:
---------------------------------

Dariusz, 

Have you thought about publishing the updated python client as a third-party/standalone package?  If it becomes popular and stable, it could be folded back in to the Solr distro, but for the time being I suspect that it will be difficult to work on in trunk (since there don't seem to be core devs who use it).

cheers,
-Mike




[jira] Commented: (SOLR-216) Improvements to solr.py

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499913 ] 

Mike Klaas commented on SOLR-216:
---------------------------------


On 29-May-07, at 12:41 PM, Jason Cater wrote:
> I've had my solr.py in production use internally for about a month now. So, as you can imagine, I've worked through a few oddball bugs that occasionally pop up.  It seems pretty stable for me.

Yes, I agree that it is looking good.  Since we would be replacing the existing implementation completely, I think that it is worth taking extra care and examining the api choices carefully so we won't have to replace it or deprecate things in the near future.

> I would prefer to have a complete directory structure (i.e., setup.py, unit tests, samples, etc) instead of just the solr.py file.  Would anyone see a problem with this?

+1.  This would be great--a unittest that could be run against the solr example would be spectacular!

> Also, on some of your comments:
>
>>  - list comprehensions solely to perform looped execution are harder to parse and slower than explicitly writing a for loop
>
> List comprehensions seem to be a matter of contention for some.  However, it's a battle I'm not interested in fighting, so changed it to a for loop.

It is not a matter of contention for me for use in creating a list, but ISTM less clear and less efficient if the purpose is _solely_ to perform a loop:

$ python -m timeit '[i+i for i in xrange(10000)]'
100 loops, best of 3: 1.95 msec per loop

$ python -m timeit 'for i in xrange(10000): i+i'
100 loops, best of 3: 1.38 msec per loop

>>  - shadowing builtins is generally a bad idea
>
> Any shadowing of builtins was unintentional.  Did you see specific examples?  I run the code through pychecker and pylint to try to avoid such cases.

`id` is shadowed in a few places.

>>  - all NamedList's appearing in the output are converted to dicts--this loses information (in particular, it will be unnecessarily hard for the user to use highlighting/debug data).  Using the python/json response format would prevent this.  Not returning highlight/debug data in the standard response format (and yet providing said parameters in the query() method) seems odd.  Am I missing something?  Oh, they are set as dynamic attributes of Response, I see.  Definitely needs documentation.
>
> Yes, this needs to be documented.  (Please see my question about allowing a complete directory structure.)

>>  - passing fields='' to query() will return all fields, when the desired return is likely no fields
>
> I've changed the default for fields= to be '*', instead of None or "".  This way, passing in 'fields=""' will result in 'fl=' being passed to the backend.  However, I still don't see the point, as passing both 'fl=' and 'fl=*' return the exact same set of fields (i.e., "all") on my test setup.

Hmm, what if you pass fields='', score=True?  Ideally that would pass fl=score to the backend, bypassing all stored fields.

>>  - it might be better to settle on an api that permits doc/field boosts.  How about using a tuple as the field name in the field dict?
>>
>> conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])
>>
>> doc boosts could be handled by optionally providing the fielddict as a (<fielddict>, boost) tuple.
>
> I agree. I was not aware of field boosts at the time. I'll code this change.

Unfortunately, it is still somewhat awkward.  In my python client I end up passing (<name>, <value>, <field boost or None>) everywhere, but that clutters up the api considerably.

It might be worth taking a look at the ruby client to see what Erik's done for the api.

>> - for 2.5+, a cool addition might be:
>>
>> if sys.version_info >= (2, 5):
>>     # contextlib (and the with statement) arrived in Python 2.5
>>     import contextlib
>>
>>     def batched(solrconn):
>>         solrconn.begin_batch()
>>         yield solrconn
>>         solrconn.end_batch()
>>     batched = contextlib.contextmanager(batched)
>>
>> Use as:
>>
>> # on 2.5 itself this needs: from __future__ import with_statement
>> with batched(solrconn):
>>     solrconn.add(...)
>>     solrconn.add(...)
>>     solrconn.add(...)
>
> Adding...

Unfortunately, it does push the required python version to 2.4.  Personally, I think that requiring 2.4 is not unreasonable, but I'm somewhat of a bleeding edge guy...

[incidentally, it would be best to keep comments in JIRA, for posterity]

