Posted to dev@bloodhound.apache.org by Antonia Horincar <an...@gmail.com> on 2014/07/23 02:06:40 UTC

[GSoC COMDEV-108] Solr plugin questions

Hi devs,

I have a question regarding the location of the Solr schema file. Should it be placed in the plugin’s directory (as it currently is), or should it be placed in the Solr config directories? 

Also, what other Solr features would you like to be added to the plugin?

Thanks,
Antonia

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Antonia Horincar <an...@gmail.com>.
On 12 August 2014 at 22:43:30, Anže Starič (anze.staric@gmail.com) wrote:
Sounds like something that needs to be done when Solr is configured  
for the first time so I would put it in the installation guide. What  
happens if the default search field is not defined? Will search crash,  
return no results or something else entirely?  


To be honest, it behaves inconsistently: it doesn’t crash every time. I just tried using the Solr backend (and also generating a schema) without modifying the default option in the solrconfig.xml file (i.e. leaving the default “text” field in place), and it worked. At other times, however, it crashed and threw this error[1]. That’s why I thought it would be good to change the field anyway, as there is no “text” field in the Bloodhound schema.

[1] http://stackoverflow.com/questions/10130163/solr-query-http-error-404-undefined-field-text
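
A minimal sketch of how the mismatch behind the "undefined field text" error could be detected before running a search. It assumes the default field is set through a <str name="df"> entry in solrconfig.xml, as in the Solr 4 example configuration; the two XML snippets and the field names "id" and "summary" are hypothetical stand-ins, not the real Bloodhound files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-ins for solrconfig.xml and schema.xml.
SOLRCONFIG = """<config>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </requestHandler>
</config>"""

SCHEMA = """<schema name="bloodhound" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="summary" type="text_general" indexed="true" stored="true"/>
  </fields>
</schema>"""


def default_field(solrconfig_xml):
    """Return the value of the first <str name="df"> element, or None."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find(".//str[@name='df']")
    return node.text if node is not None else None


def schema_fields(schema_xml):
    """Return the set of field names declared in the schema."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}


df = default_field(SOLRCONFIG)
missing = df not in schema_fields(SCHEMA)
print("default field %r defined in schema: %s" % (df, not missing))
```

A check like this could also back the "print a message when the schema is generated" idea, by warning when the configured default field is not part of the generated schema.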

Anze  

On Mon, Aug 11, 2014 at 9:44 PM, Antonia Horincar  
<an...@gmail.com> wrote:  
> The issue is the following: Solr needs to be told which field to search on when using the default search. By default, this is set to the ‘text’ field (which doesn’t exist in the BH schema). So basically, this only needs to be changed once. Should I then specify this as a step in the installation guide? On one hand, this would allow the user to easily choose the default field to be searched, but on the other hand this means one additional installation step.  
>  
> On 11 August 2014 at 10:02:34, Anže Starič (anze.staric@gmail.com) wrote:  
>  
> Do we have to make the change each time the schema is generated, or  
> just the first time? If it is a one time process, it would be easier  
> to just include a step in the installation guide (or printing a  
> message when schema is generated).  
>  
>  
> Anze  

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Anže Starič <an...@gmail.com>.
Sounds like something that needs to be done when Solr is configured
for the first time so I would put it in the installation guide. What
happens if the default search field is not defined? Will search crash,
return no results or something else entirely?


Anze

On Mon, Aug 11, 2014 at 9:44 PM, Antonia Horincar
<an...@gmail.com> wrote:
> The issue is the following: Solr needs to be told which field to search on when using the default search. By default, this is set to the ‘text’ field (which doesn’t exist in the BH schema). So basically, this only needs to be changed once. Should I then specify this as a step in the installation guide? On one hand, this would allow the user to easily choose the default field to be searched, but on the other hand this means one additional installation step.
>
> On 11 August 2014 at 10:02:34, Anže Starič (anze.staric@gmail.com) wrote:
>
> Do we have to make the change each time the schema is generated, or
> just the first time? If it is a one time process, it would be easier
> to just include a step in the installation guide (or printing a
> message when schema is generated).
>
>
> Anze

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Antonia Horincar <an...@gmail.com>.
The issue is the following: Solr needs to be told which field to search on when using the default search. By default, this is set to the ‘text’ field (which doesn’t exist in the BH schema). So basically, this only needs to be changed once. Should I then specify this as a step in the installation guide? On one hand, this would allow the user to easily choose the default field to be searched, but on the other hand this means one additional installation step.

On 11 August 2014 at 10:02:34, Anže Starič (anze.staric@gmail.com) wrote:

Do we have to make the change each time the schema is generated, or  
just the first time? If it is a one time process, it would be easier  
to just include a step in the installation guide (or printing a  
message when schema is generated).  


Anze  

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Anže Starič <an...@gmail.com>.
Do we have to make the change each time the schema is generated, or
just the first time? If it is a one time process, it would be easier
to just include a step in the installation guide (or printing a
message when schema is generated).


Anze

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Antonia Horincar <an...@gmail.com>.

On 8 August 2014 at 11:44:03, Anže Starič (anze.staric@gmail.com) wrote:

> Sorry for the delay in my reply. I have been working on the features yet to be added to the plugin and this is the current status of the project: 
> I managed to fix the pagination of results, in order to retrieve a specified number of results per page. Previously, the plugin retrieved all Solr results that matched the query, and paginated them afterwards (which wasn’t very efficient). 
> 
> There was a problem with wildcard searching (Solr allows it by default, however due to the fact that I am using the existing DefaultQueryParser, not all wildcards worked). But I managed to fix this, and Solr wildcards now work. 

Nice. I'll take a look at it. 

> There is now a trac-admin command to generate a Solr schema at the desired location (I followed your advice and allowed the user to provide a path via the Terminal). This works and the schema can be generated, but there is a problem which I am hoping to fix tomorrow. The user can provide a schema path when using the command, but I also need the path when instantiating the Sunburnt instance, and at the moment the path is not updated in the module that deals with Sunburnt. Once I fix this, the plugin should be able to use the generated schema for Solr requests.

Based on the sunburnt docs[1]: 

schemadoc: By default, sunburnt will query the solr instance for its 
currently active schema. If you want to use a different schema for any 
reason, pass in a file object here which yields a schema document. 

Thanks for pointing this out. I forgot that the schema path isn’t necessarily needed when creating the Sunburnt instance. When I began working on the project, I thought I would need to store the schema in the plugin directory, so I used the schemadoc parameter to provide a schema path, and I hadn’t realised that this parameter is optional. This fixed my problem. However, I realised today that the Bloodhound Solr plugin also needs to make a small change in one of the Solr config files (when generating the schema), so I am writing some methods to search for a specific tag in an XML file and change its value.



Is there a specific reason why you need the path to schema file when 
instantiating sunburnt instance? 


Anze 

[1] http://opensource.timetric.com/sunburnt/connectionconfiguration.html 
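
The config change mentioned in the reply above (finding a specific tag in a Solr XML file and changing its value) could look roughly like the sketch below. It is not the plugin's code: it uses the standard library's ElementTree, assumes the tag in question is the <str name="df"> default-field entry from the Solr 4 example solrconfig.xml, and uses "content" as a hypothetical replacement field name.

```python
import xml.etree.ElementTree as ET


def set_default_field(solrconfig_xml, field_name):
    """Return the config XML with every <str name="df"> set to field_name."""
    root = ET.fromstring(solrconfig_xml)
    changed = 0
    for node in root.iter("str"):
        if node.get("name") == "df":
            node.text = field_name
            changed += 1
    if changed == 0:
        # Nothing to update; make the problem visible to the caller.
        raise ValueError("no <str name='df'> element found")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, trimmed-down solrconfig.xml content.
sample = (
    '<config><requestHandler name="/select" class="solr.SearchHandler">'
    '<lst name="defaults"><str name="df">text</str></lst>'
    '</requestHandler></config>'
)
updated = set_default_field(sample, "content")
print(updated)
```

One caveat with this approach: ElementTree drops XML comments when parsing by default, so rewriting a hand-edited solrconfig.xml this way would lose them.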

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Anže Starič <an...@gmail.com>.
> Sorry for the delay in my reply. I have been working on the features yet to be added to the plugin and this is the current status of the project:
> I managed to fix the pagination of results, in order to retrieve a specified number of results per page. Previously, the plugin retrieved all Solr results that matched the query, and paginated them afterwards (which wasn’t very efficient).
>
> There was a problem with wildcard searching (Solr allows it by default, however due to the fact that I am using the existing DefaultQueryParser, not all wildcards worked). But I managed to fix this, and Solr wildcards now work.

Nice. I'll take a look at it.

> There is now a trac-admin command to generate a Solr schema at the desired location (I followed your advice and allowed the user to provide a path via the Terminal). This works and the schema can be generated, but there is a problem which I am hoping to fix tomorrow. The user can provide a schema path when using the command, but I also need the path when instantiating the Sunburnt instance, and at the moment the path is not updated in the module that deals with Sunburnt. Once I fix this, the plugin should be able to use the generated schema for Solr requests.

Based on the sunburnt docs[1]:

schemadoc: By default, sunburnt will query the solr instance for its
currently active schema. If you want to use a different schema for any
reason, pass in a file object here which yields a schema document.

Is there a specific reason why you need the path to schema file when
instantiating sunburnt instance?


Anze

[1] http://opensource.timetric.com/sunburnt/connectionconfiguration.html

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Antonia Horincar <an...@gmail.com>.
Hi,

Sorry for the delay in my reply. I have been working on the features yet to be added to the plugin and this is the current status of the project:
I managed to fix the pagination of results, in order to retrieve a specified number of results per page. Previously, the plugin retrieved all Solr results that matched the query, and paginated them afterwards (which wasn’t very efficient).
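
A minimal sketch of the server-side pagination described above, assuming the page number is 1-based. Solr's standard `start` and `rows` query parameters select the slice of results, and the `numFound` value in the response gives the total number of hits; the helper names are hypothetical and not taken from the plugin.

```python
def page_params(page, per_page=10):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"start": (page - 1) * per_page, "rows": per_page}


def page_count(num_found, per_page=10):
    """Number of pages needed for num_found hits (ceiling division)."""
    return (num_found + per_page - 1) // per_page


# Parameters to merge into the Solr request for the third page of 20 hits each.
params = page_params(3, per_page=20)
```

With this, only one page of documents is requested per search, instead of fetching every match and slicing the list afterwards.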

There was a problem with wildcard searching (Solr allows it by default, however due to the fact that I am using the existing DefaultQueryParser, not all wildcards worked). But I managed to fix this, and Solr wildcards now work.

There is now a trac-admin command to generate a Solr schema at the desired location (I followed your advice and allowed the user to provide a path via the Terminal). This works and the schema can be generated, but there is a problem which I am hoping to fix tomorrow. The user can provide a schema path when using the command, but I also need the path when instantiating the Sunburnt instance, and at the moment the path is not updated in the module that deals with Sunburnt. Once I fix this, the plugin should be able to use the generated schema for Solr requests.

I am also working on displaying the More Like This results in the interface. For now, I added a More Like This button (by implementing the ITemplateStreamFilter interface) to each result retrieved for the original query, but I’m not sure how the similar results should be shown in the interface. What do you think? 

I will also do some refactoring while working on these remaining issues. 

Thanks,
Antonia

On 30 July 2014 at 12:01:05, Anže Starič (anze.staric@gmail.com) wrote:

On Wed, Jul 30, 2014 at 7:11 AM, Antonia Horincar  
<an...@gmail.com> wrote:  
> I’ve begun to work on generating a Solr schema rather than storing it. I am
> using the lxml library for writing to the xml file. I am currently trying
> to map fields from the Whoosh schema to the fields to be written in the
> schema.xml file. Should the user provide the path to their current Solr  
> installation in the trac.ini file (in order to know where the schema.xml  
> file should be generated)?  

Storing the path in trac.ini would be convenient, but only when solr
and bloodhound run on the same server. What about passing it as a  
command line parameter? (And creating schema.xml in current directory  
if no path is specified).  


Anze  

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Anže Starič <an...@gmail.com>.
On Wed, Jul 30, 2014 at 7:11 AM, Antonia Horincar
<an...@gmail.com> wrote:
> I’ve begun to work on generating a Solr schema rather than storing it. I am
> using the lxml library for writing to the xml file. I am currently trying
> to map fields from the Whoosh schema to the fields to be written in the
> schema.xml file. Should the user provide the path to their current Solr
> installation in the trac.ini file (in order to know where the schema.xml
> file should be generated)?

Storing the path in trac.ini would be convenient, but only when solr
and bloodhound run on the same server. What about passing it as a
command line parameter? (And creating schema.xml in current directory
if no path is specified).


Anze
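
The suggestion above (take the path as a command line parameter and fall back to the current directory) might be sketched as follows. The function name and the handling of a directory argument are assumptions made for illustration, not part of the plugin or of trac-admin.

```python
import os


def resolve_schema_path(path=None, filename="schema.xml"):
    """Decide where the generated schema file should be written.

    No path: use <current directory>/schema.xml.
    Existing directory: write schema.xml inside it.
    Anything else: treat the argument as the target file itself.
    """
    if not path:
        return os.path.join(os.getcwd(), filename)
    path = os.path.abspath(path)
    if os.path.isdir(path):
        return os.path.join(path, filename)
    return path


# No argument given on the command line: fall back to the current directory.
default_target = resolve_schema_path()
```

Keeping the path out of trac.ini also avoids implying that Solr and Bloodhound run on the same server.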

[GSoC COMDEV-108] Solr plugin questions

Posted by Antonia Horincar <an...@gmail.com>.
I’ve begun to work on generating a Solr schema rather than storing it. I am
using the lxml library for writing to the xml file. I am currently trying
to map fields from the Whoosh schema to the fields to be written in the
schema.xml file. Should the user provide the path to their current Solr
installation in the trac.ini file (in order to know where the schema.xml
file should be generated)?
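
A rough sketch of generating schema.xml from a field mapping, as described above. It makes several assumptions: the standard library's ElementTree is used in place of lxml (lxml.etree offers a similar Element/SubElement API), a plain dict stands in for the Whoosh schema, and the field names and Solr types are hypothetical. A real Solr schema also needs the field type definitions, which are omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the fields read from the Whoosh schema:
# field name -> (Solr field type, stored?)
FIELDS = {
    "id": ("string", True),
    "summary": ("text_general", True),
    "content": ("text_general", False),
}


def build_schema(fields, name="bloodhound", unique_key="id"):
    """Build a minimal Solr schema document and return it as a string."""
    schema = ET.Element("schema", name=name, version="1.5")
    fields_el = ET.SubElement(schema, "fields")
    # Sort by name so the generated file is stable between runs.
    for field_name, (solr_type, stored) in sorted(fields.items()):
        ET.SubElement(
            fields_el, "field",
            name=field_name, type=solr_type,
            indexed="true", stored="true" if stored else "false",
        )
    ET.SubElement(schema, "uniqueKey").text = unique_key
    return ET.tostring(schema, encoding="unicode")


schema_xml = build_schema(FIELDS)
print(schema_xml)
```

Driving the output from a single mapping keeps field definitions in one place, so a new field only has to be declared once.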

Regarding the instructions, I am going to write documentation for the
plugin, so I’ll include instructions to install Solr and use it in
Bloodhound.

Thanks,
Antonia

On 24 July 2014 at 09:54:04, Anže Starič (anze.staric@gmail.com) wrote:

Hi.

When solr is configured, schema.xml should be put in solr's
configuration directory.

Is it possible to generate schema.xml using a trac-admin command
instead of storing it in a repository? It can be generated from whoosh
schema, or if that is not possible, you can create a new schema
definition format that can then be used to generate whoosh and solr
schemas. This way, there is only one place in code that needs to be
modified when new fields are defined.

I am also missing some instructions on how to set everything up (a basic
overview of a minimal solr install and steps needed to replace whoosh
backend with solr).

Anze

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Anže Starič <an...@gmail.com>.
Hi.

When solr is configured, schema.xml should be put in solr's
configuration directory.

Is it possible to generate schema.xml using a trac-admin command
instead of storing it in a repository? It can be generated from whoosh
schema, or if that is not possible, you can create a new schema
definition format that can then be used to generate whoosh and solr
schemas. This way, there is only one place in code that needs to be
modified when new fields are defined.

I am also missing some instructions on how to set everything up (a basic
overview of a minimal solr install and steps needed to replace whoosh
backend with solr).

Anze

Re: [GSoC COMDEV-108] Solr plugin questions

Posted by Gary Martin <ga...@wandisco.com>.
On 23/07/14 01:06, Antonia Horincar wrote:
> Hi devs,
>
> I have a question regarding the location of the Solr schema file. Should it be placed in the plugin’s directory (as it currently is), or should it be placed in the Solr config directories? 
>
> Also, what other Solr features would you like to be added to the plugin?
>
> Thanks,
> Antonia
>

Hi Antonia,

I guess I would expect configuration to end up in the environment's
directories somewhere rather than just in the plugin, particularly if
there is a good reason to adjust them (I don't know if there is, of course).

I take it from your request for suggestions for features that you have
already got feature parity with whoosh. I don't have any particular
ideas yet for more to do so hopefully others can come up with some
suggestions but I'll have a think.

Cheers,
    Gary