Posted to commits@bloodhound.apache.org by Apache Bloodhound <de...@bloodhound.apache.org> on 2014/03/21 19:01:02 UTC

[Apache Bloodhound] Proposals/BEP-0014 modified

Page "Proposals/BEP-0014" was changed by antonia.horincar
Diff URL: <https://issues.apache.org/bloodhound/wiki/Proposals/BEP-0014?action=diff&version=2>
Revision 2
Changes:
-------8<------8<------8<------8<------8<------8<------8<------8<--------
Index: Proposals/BEP-0014
=========================================================================
--- Proposals/BEP-0014 (version: 1)
+++ Proposals/BEP-0014 (version: 2)
@@ -1,69 +1,91 @@
-
-= BEP <BEP number> : <BEP title> #overview
+= BEP 14 : Add Apache Solr to Bloodhound #overview
 
 [[PageOutline]]
 
-|| '''BEP''' || <BEP number> ||
-|| '''Title''' || <BEP title> ||
-|| '''Version''' || <leave blank> ||
-|| '''Last-Modified''' || <leave blank> ||
-|| '''Author''' || Author With Email <us...@dom.ain>, Author Name Only, or The Bloodhound project (see [wiki:/Proposals#bep-header-preamble BEP preamble explained]) ||
+|| '''BEP''' || 14 ||
+|| '''Title''' || Add Apache Solr to Bloodhound ||
+|| '''Version''' ||  ||
+|| '''Last-Modified''' ||  ||
+|| '''Author''' || Antonia Horincar <an...@gmail.com> ||
 || '''Status''' || Draft ||
-|| '''Type''' || <BEP type (see [wiki:/Proposals#bep-types BEP types explained])> ||
+|| '''Type''' || Standards Track ||
 || '''Content-Type''' || [wiki:PageTemplates/Proposals text/x-trac-wiki] ||
-|| '''Created''' || <leave blank> ||
-|| '''Post-History''' || <leave blank> ||
+|| '''Created''' ||  ||
+|| '''Post-History''' ||  ||
 
 ----
 
 == Abstract #abstract
 
-<Delete text in this section and add a short (~200 word) description of the technical issue being addressed. Take a look at sample abstract below>
+The Bloodhound Search plugin supports different search backends, but only Whoosh has been implemented so far[1]. Even though Whoosh (a search backend implemented in Python [2]) is a great solution for a small amount of data, it does not scale well when the number of items to search grows. Apache Solr is a search platform focused on delivering enterprise-class, high-performance search functionality[3]. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages[4]. It scales well and performs well under heavy usage, which makes it a good fit for services with a large number of users and a large amount of data.
 
-This template provides a boilerplate or sample template for creating your
-own BEPs.  In conjunction with the [wiki:/Proposals general content guidelines] and the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines]  
-, this should make it easy for you to conform your own
-BEPs to the format outlined below. See [#howto How to Use This Template] for further instructions.
+== Rationale #rationale
 
-**Note**: if you are reading this template via the web, you should first try to create a new wiki page by selecting `ProposalsRst` |page template guide|.  **DO NOT EDIT THIS WIKI PAGE IN ORDER TO CREATE A NEW BEP! **
+When adding Solr support to an existing Python service, there are several options, since multiple Python libraries provide RESTful interaction with the Solr server.
 
-If you would prefer not to use WikiFormatting markup in your BEP, please see  [wiki:/Proposals/Formats/RestructuredText reStructuredText BEP guidelines].
+Two of the most popular Python libraries for working with Solr are Sunburnt and Solrpy. They are both open source, well documented libraries that make use of Solr’s REST-like JSON API. There are a lot of similarities between the two libraries, in terms of speed and performance.
 
-== Motivation ==
+Sunburnt is more extensible, providing the possibility of implementing a wider variety of operations, while solrpy is more restrictive from this point of view. Sunburnt allows query chaining like:
+query.find_by_phrase(title="Bloodhound").paginate(4, 40).execute()
+It also supports more complex queries than solrpy, which makes it a better choice for the long-term development of Bloodhound. Important features such as pagination and facets can be implemented with the methods Sunburnt provides. One way of achieving spell checking and suggestions is to extend the default Sunburnt functionality, as shown in this example [4].
 
-<The motivation is critical for BEPs that want to change the copy of ''Trac'' patched using vendor branch . It should clearly explain why the existing ''Bloodhound'' solution is inadequate to address the problem that the ''BEP'' solves. ''BEP'' submissions without sufficient motivation may be rejected outright. >
+On the other hand, solrpy has no dependencies and works out of the box. Sunburnt needs httplib2 [5] (or requests [6]) and lxml [7]. Using the mx.DateTime [8] and pytz [9] libraries is also strongly recommended for a better experience.
+
+Since Sunburnt provides a more extensive way of using the Solr search service features, it will be the one used for the project, but switching from Sunburnt to Solrpy and vice-versa is easy in the early development stages.
 
 == Proposal #proposal
 
-<The technical specification should describe any new features , detail its impact on the components architecture , mention what plugins will be included as a result , whether they are hosted by ​[http://trac-hacks.org trac-hacks.org] or not , and any other relevant technical subject . The specification should be detailed enough to allow competing, interoperable implementations for any of the current supported database platforms (e.g. ''SQLite'', ''Postgres'', ''MySQL'') and server technologies (e.g. ''Apache HTTPD server'', ''nginx'', ''mod_wsgi'', ''CGI'').. >
+The first step in integrating the Solr backend service with an application is installing Solr. Solr must run in a Java servlet container, such as Apache Tomcat or Jetty. Installing a Tomcat server is very straightforward on Linux and Mac machines [10] [11], as is deploying Solr on a Tomcat instance.
 
-== Rationale #rationale
+Once the Tomcat server is up and running, the next step is creating Solr schemas for the Bloodhound classes that are searchable (e.g. Tickets, Comments, Milestones). This is done through the schema.xml files in the Solr configuration folders. These files must contain information about all the fields of the classes that must be indexed. This feature of Solr (which distinguishes it from Lucene, which does not require a schema) makes searching very flexible, by allowing only the defined fields to be searchable.
 
-<The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other issue trackers or ''Trac'' hacks . The rationale should provide evidence of consensus within the community and discuss important objections or concerns raised during discussion. Take a look at sample rationale below>
+The Solr schema will be generated using a trac-admin task, by populating a pre-defined XML template with the attributes of a specified resource.
 
-''BEP'' submissions come in a wide variety of forms, not all adhering to the format guidelines set forth below. Use this template, in conjunction with the [wiki:/Proposals general content guidelines] and the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines], to ensure that your ''BEP'' submission is easy to read and understand.
+Once Solr is configured and running, the next step is making the Bloodhound instance interact with the Solr service for the searching and indexing operations. This will be done using the Sunburnt library, which provides methods for interacting with the Solr JSON API in a RESTful way, so calls to a Sunburnt instance are required in the Bloodhound code. Next, the Solr service will be pre-populated with the existing data in the Bloodhound database, by a script that iterates through all the fields and adds their values to Solr.
 
-This template allows to create BEPs and is very similar to [http://www.python.org/dev/peps/pep-0012 PEP 12] . However it has been optimized by moving long explanations to the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines] . If you are interested take a look at the  [?action=diff&old_version=1 differences]. The goal is to redact new BEPs just by following in-line instructions between angle brackets (i.e. **<** **>**) . Even if this will allow to write BEPs faster , it is highly recommended to read the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines] at least once in your lifetime to be aware of good practices and expected style rules . 
+In order to interact with Solr from the Bloodhound instance, the existing callbacks [12] for the main Bloodhound searchable objects will be used:
+- on creation, insert a new entry in the Solr service
+- on deletion, find the corresponding entry in Solr and delete it
+- on editing, find the corresponding entry in Solr, delete it, and add the entry with the new, correct information
 
-== How to Use This Template #howto
+Placing the Solr interactions in the callbacks ensures that every successful operation triggers the corresponding operation with Solr. 
 
-<BEPs may include further sections. This is an example.>
+When the user performs a search, the results returned are the ones provided by the Solr instance for the search query. All these operations can be done in the existing Bloodhound implementation, using the ISearchBackend [13] interface, which provides methods for all the operations described above (adding, editing, deleting, and querying).
 
-Quick edits will consist in following the instructions inside angle brackets (i.e. **<** **>**) . That should be everything needed to write new BEPs. To be more informed about advanced considerations please read the [wiki:/Proposals general content guidelines] and the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines] . If there is no point in including one of the sections in this document then feel free to remove it.
+The code will be structured as a plugin that can be enabled or disabled, so that the administrator can switch between multiple backend engines (for example, if they decide that Solr is not suitable for their project).
 
-== Backwards Compatibility #backwards-compatibility
+== Deliverables #deliverables
+- Solr configuration files required for running Solr
+- Schema files for mapping the Bloodhound objects to Solr 
+- A Bloodhound plugin containing the code that establishes the interaction between the Bloodhound server and the Solr server
+- Unit tests for the Bloodhound plugin
+- Documentation
+- Further work on Bloodhound’s libraries and packages
 
-<All BEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The ''BEP'' must explain how to deal with these incompatibilities. ''BEP'' submissions without a sufficient backwards compatibility treatise may be rejected outright. >
+== Timeframe #timeframe
 
-== Reference Implementation #reference-implementation
+''April 21 - May 18''
+- Research Apache Solr and Sunburnt
+- Collaborate with the Bloodhound community to get a more in-depth understanding of all the modules and classes that currently deal with searching in Bloodhound
+- Draft an implementation design for the plugin
 
-< The reference implementation **must** be completed before any ''BEP'' is given status **Final**, but it need not be completed before the ''BEP'' is accepted. It is better to finish the specification and rationale first and reach consensus on it before writing code. The final implementation **must** include test code and documentation appropriate for either the wiki pages in ''Bloodhound'' users guide or an specific wiki page in the [http://issues.apache.org/bloodhound issue tracker] . >
+''May 19 - June 22''
+- Generate the schema file for Solr
+- Use Sunburnt to create a script for pre-populating Solr with existing data in the Bloodhound database
+- Add methods to process and introduce the new data into Solr. 
 
-<In order to list tickets related to a given proposal edit sample text provided below by including the appropriate **<BEP number>**. Target tickets have to be tagged with `bep-<BEP number>` keyword. Do not forget to remove curly braces so that the tickets list will be actually rendered.>
+''June 23 - June 26''
+- Midterm evaluation
 
-{{{
-[[Widget(TicketQuery, query="keywords=~bep-<BEP number>&col=id&col=summary&col=status&col=priority&col=milestone", title=BEP <BEP number> ticket summary)]]
-}}}
+''June 27 - August 10''
+- Create methods to create queries for Solr, and bind them to the existing searching infrastructure 
+- Develop unit tests for the API
+
+''August 11 - August 17''
+- Refactor code, finish tests and documentation
+
+''August 18 - August 21''
+- Final evaluation
 
 == Resources #resources
 
@@ -73,15 +95,19 @@
 
 == References #references
 
-<List the references included in BEP body>
-
-  1. PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
-     http://www.python.org/dev/peps/pep-0001/
-  2. PEP 9, Sample Plaintext PEP Template, Warsaw
-     http://www.python.org/dev/peps/pep-0009
-  2. PEP 12, Sample reStructuredText ''PEP'' Template, Goodger, Warsaw
-     http://www.python.org/dev/peps/pep-0012/
-  3. http://www.opencontent.org/openpub/
+1. http://stackoverflow.com/questions/3226596/full-text-search-whoosh-vs-solr
+2. http://deeson-online.co.uk/blog/when-use-apache-solr-drupal
+3. http://en.wikipedia.org/wiki/Apache_Solr
+4. https://groups.google.com/forum/#!topic/python-sunburnt/rcbd2yLLUaQ
+5. https://code.google.com/p/httplib2/
+6. http://requests.readthedocs.org/en/latest/
+7. http://lxml.de/
+8. http://www.egenix.com/products/python/mxBase/mxDateTime/
+9. http://pytz.sourceforge.net/
+10. http://wolfpaulus.com/jounal/mac/tomcat7/
+11. https://www.digitalocean.com/community/articles/how-to-install-apache-tomcat-on-ubuntu-12-04
+12. https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/search_resources/ticket_search.py#L45
+13. https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/api.py#L94
 
 == Copyright #copyright
 
-------8<------8<------8<------8<------8<------8<------8<------8<--------
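
Illustrative note on the Proposal section of the diff above: the callback-driven synchronisation it describes (insert on creation, delete on deletion, delete and re-add on editing) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the plugin's implementation. `InMemoryIndex` is a hypothetical stand-in for a Solr connection so that the example runs without a server, and `TicketIndexer` and its method names are invented for illustration; they are not Bloodhound, Trac, or Sunburnt APIs.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a Solr connection: keeps documents in a dict."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Store a copy so later changes to the caller's dict do not leak in.
        self.docs[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        # Deleting an id that is not indexed is a no-op.
        self.docs.pop(doc_id, None)

    def search(self, text):
        # Naive case-insensitive substring match over every field except "id".
        needle = text.lower()
        return sorted(
            doc_id
            for doc_id, doc in self.docs.items()
            if any(needle in str(value).lower()
                   for key, value in doc.items() if key != "id")
        )


class TicketIndexer:
    """Hypothetical listener that mirrors ticket changes into the index."""

    def __init__(self, index):
        self.index = index

    def ticket_created(self, ticket):
        # On creation: insert a new entry.
        self.index.add(ticket)

    def ticket_changed(self, ticket):
        # On editing: delete the stale entry, then add the updated one.
        self.index.delete(ticket["id"])
        self.index.add(ticket)

    def ticket_deleted(self, ticket):
        # On deletion: remove the corresponding entry.
        self.index.delete(ticket["id"])
```

In a real plugin the index object would wrap a Solr client rather than a dict, but the three callbacks would keep the same shape, which is also what would let an administrator swap one backend for another.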

--
Page URL: <https://issues.apache.org/bloodhound/wiki/Proposals/BEP-0014>
Apache Bloodhound <https://issues.apache.org/bloodhound/>
The Apache Bloodhound issue tracker

This is an automated message. Someone added your email address to be
notified of changes on 'Proposals/BEP-0014' page.
If it was not you, please report to .