Posted to solr-user@lucene.apache.org by ph...@free.fr on 2015/03/13 09:16:18 UTC

Word frequency

Hello,

is it possible to create dynamic facets with SOLR 5.0.0?

For instance, I would like to display the most-frequently occurring words in the left-hand side of my Velocity SOLR GUI (facet_fields.vm).

Facet_fields.vm currently looks like this:


-----------------------------------------------
#**
 *  Display facets based on field values
 *  e.g.: fields specified by &facet.field=
 *#

#if($response.facetFields)
  <h2 #annTitle("Facets generated by adding &facet.field= to the request")>
    ##Field Facets
    Results
  </h2>
  #foreach($field in $response.facetFields)
    ## Hide facets without value
    #if($field.values.size() > 0)
      <span class="facet-field">$field.name</span>
      <ul>
        #foreach($facet in $field.values)
          <li>
            <a href="#url_for_facet_filter($field.name, $facet.name)">$facet.name</a> ($facet.count)
          </li>
        #end
      </ul>
    #end  ## end if > 0
  #end    ## end for each facet field
#end      ## end if response has facet fields

------------------------------------------

Many thanks.

Philippe

Re: Word frequency

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
The usual recommendation is to treat Solr like a database: keep it
internal, with a separate user-facing app in a different container. Solr
is not really easy to secure, so it is best to use OS-level protection,
e.g. listening on localhost only or only on a secure IP address.

This separate client also gives you more flexibility with scaling/etc later.

You could look at something like Spring Data Solr if your search needs
are simple and you want quick UI building support.
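
As a rough illustration of what such a separate server-side client might do, here is a minimal Python sketch that turns a 'keywords' input and two date selects into Solr request parameters on the server, so the browser never talks to Solr directly. The function names and the "date" field name are assumptions made for the example, not anything from Solr or from this thread. The escaping is similar in spirit to what client libraries do, and it does not neutralize the uppercase AND/OR/NOT operators.

```python
import re
from datetime import date
from urllib.parse import urlencode

# Characters that have special meaning in the Lucene/Solr query syntax.
_SPECIAL_CHARS = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in one user-supplied term."""
    return _SPECIAL_CHARS.sub(r"\\\1", term)


def build_query_string(keywords: str, start: date, end: date,
                       date_field: str = "date") -> str:
    """Build a URL-encoded parameter string from form input.

    `date_field` is a hypothetical field name; use whatever date field
    the schema actually defines.
    """
    terms = [escape_term(t) for t in keywords.split()]
    date_filter = (f"{date_field}:[{start.isoformat()}T00:00:00Z"
                   f" TO {end.isoformat()}T23:59:59Z]")
    return urlencode([
        ("q", " ".join(terms) or "*:*"),  # match everything if no keywords
        ("fq", date_filter),              # date range as a filter query
        ("wt", "json"),
    ])
```

The resulting string would be appended to the select handler URL by the server-side app, which then renders the results itself.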

Regards,
   Alex.


----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 13 March 2015 at 10:54,  <ph...@free.fr> wrote:
>
> If you are asking whether users have access to /browse, then the answer is yes.
>
> Currently, they can type keywords in the q input field to do searches.
>
> I plan to turn q into a hidden field and add a 'keywords' input field whose contents will be transferred to q when users press Search, using Javascript.
>
> I will also add date selects so that users don't have to type date queries.
>
> How do you secure the rest of SOLR (e.g., admin)?
>
> Would you recommend creating an alternative Search GUI with, say, Wicket, which queries SOLR using AJAX?
>
> Sounds hard, but I will try. Velocity is so much simpler.
>
> Cheers,
>
> Philippe

Re: Word frequency

Posted by ph...@free.fr.
Point taken, Shawn. Thanks for your input.


----- Original message -----
From: "Shawn Heisey" <ap...@elyograg.org>
To: solr-user@lucene.apache.org
Sent: Friday, 13 March 2015 16:12:46
Subject: Re: Word frequency

On 3/13/2015 8:54 AM, phiroc@free.fr wrote:
> 
> If you are asking whether users have access to /browse, then the answer is yes.
> 
> Currently, they can type keywords in the q input field to do searches.
> 
> I plan to turn q into a hidden field and add a 'keywords' input field whose contents will be transferred to q when users press Search, using Javascript.
> 
> I will also add date selects so that users don't have to type date queries.
> 
> How do you secure the rest of SOLR (e.g., admin)?
> 
> Would you recommend creating an alternative Search GUI with, say, Wicket, which queries SOLR using AJAX?
> 
> Sounds hard, but I will try. Velocity is so much simpler.

Anything that requires an end user to have direct access to Solr (which
includes both the /browse handler and AJAX) is a potential security
issue.  If that access is unfiltered, a user can completely erase your
index, or cause other problems.  Switching to a different input field
other than "q" won't be any kind of protection ... they will just have
to run their browser in debug mode and they'll be able to see the Solr
queries sent, and then they can completely bypass any javascript
protections you create.

The intent with Solr is that it will be completely firewalled from user
access and only queried by server-side programs (PHP, Java, Ruby, etc).

Securing a Solr server exposed to the public requires an intelligent
proxy server with a specific config tailored to only allowing certain
requests to work.  I had this discussion with someone else on a
javascript client for Solr, and they said they're using this for a
proxy, and that this code will protect the Solr server from malicious
activity:

https://github.com/adsabs/solr-service

I haven't looked deeper, so I don't know if that claim is valid.

Note that even with a proxy server, it is still usually possible to send
denial-of-service queries designed to keep the server too busy to handle
legitimate requests.  If the code that accesses Solr is server-side,
then you may be able to detect malicious queries created from user input
and stop them from being sent to Solr.

Thanks,
Shawn


Re: Word frequency

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/13/2015 8:54 AM, phiroc@free.fr wrote:
> 
> If you are asking whether users have access to /browse, then the answer is yes.
> 
> Currently, they can type keywords in the q input field to do searches.
> 
> I plan to turn q into a hidden field and add a 'keywords' input field whose contents will be transferred to q when users press Search, using Javascript.
> 
> I will also add date selects so that users don't have to type date queries.
> 
> How do you secure the rest of SOLR (e.g., admin)?
> 
> Would you recommend creating an alternative Search GUI with, say, Wicket, which queries SOLR using AJAX?
> 
> Sounds hard, but I will try. Velocity is so much simpler.

Anything that requires an end user to have direct access to Solr (which
includes both the /browse handler and AJAX) is a potential security
issue.  If that access is unfiltered, a user can completely erase your
index, or cause other problems.  Switching to a different input field
other than "q" won't be any kind of protection ... they will just have
to run their browser in debug mode and they'll be able to see the Solr
queries sent, and then they can completely bypass any javascript
protections you create.

The intent with Solr is that it will be completely firewalled from user
access and only queried by server-side programs (PHP, Java, Ruby, etc).

Securing a Solr server exposed to the public requires an intelligent
proxy server with a specific config tailored to only allowing certain
requests to work.  I had this discussion with someone else on a
javascript client for Solr, and they said they're using this for a
proxy, and that this code will protect the Solr server from malicious
activity:

https://github.com/adsabs/solr-service

I haven't looked deeper, so I don't know if that claim is valid.

Note that even with a proxy server, it is still usually possible to send
denial-of-service queries designed to keep the server too busy to handle
legitimate requests.  If the code that accesses Solr is server-side,
then you may be able to detect malicious queries created from user input
and stop them from being sent to Solr.
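
To make the idea of allowing only certain requests concrete, here is a minimal Python sketch of a server-side parameter allowlist. The set of permitted parameters, the row cap, and the function name are assumptions chosen for illustration; a real deployment would tailor them to its own handlers. It only filters parameter names and caps "rows", so it does nothing against expensive queries hidden inside an allowed parameter.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical allowlist: only these request parameters are forwarded.
ALLOWED_PARAMS = {"q", "fq", "start", "rows",
                  "facet", "facet.field", "facet.limit"}
MAX_ROWS = 100  # arbitrary cap on page size for this sketch


def sanitize_query_string(raw: str) -> str:
    """Return a query string containing only allowlisted parameters."""
    kept = []
    for name, value in parse_qsl(raw):
        if name not in ALLOWED_PARAMS:
            continue  # drop anything not explicitly allowed
        if name == "rows":
            try:
                value = str(max(0, min(int(value), MAX_ROWS)))
            except ValueError:
                continue  # drop a non-numeric rows value
        kept.append((name, value))
    return urlencode(kept)
```

The server-side program would pass the user's query string through this function before building the request it sends to Solr.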

Thanks,
Shawn


Re: Word frequency

Posted by ph...@free.fr.
If you are asking whether users have access to /browse, then the answer is yes.

Currently, they can type keywords in the q input field to do searches.

I plan to turn q into a hidden field and add a 'keywords' input field whose contents will be transferred to q when users press Search, using Javascript.

I will also add date selects so that users don't have to type date queries.

How do you secure the rest of SOLR (e.g., admin)?

Would you recommend creating an alternative Search GUI with, say, Wicket, which queries SOLR using AJAX?

Sounds hard, but I will try. Velocity is so much simpler.

Cheers,

Philippe

----- Original message -----
From: "Alexandre Rafalovitch" <ar...@gmail.com>
To: "solr-user" <so...@lucene.apache.org>
Sent: Friday, 13 March 2015 15:41:45
Subject: Re: Word frequency

On 13 March 2015 at 10:25,  <ph...@free.fr> wrote:
> I would like to:
>
> - loop through the documents in my core
> - extract the most-frequently-appearing words in each document's text field
> - generate a .vm which displays those words ranked by number of occurrences, or, ideally, automatically generate that .vm whenever users use SOLR.

That's what faceting does. You can fine-tune it further by telling it
how many of the top hits you want to get back. Have a look at those
parameters and play with them first in the Web Admin UI before trying to
apply them to the browse handler.

Regards,
   Alex.
P.S. You are not planning to expose the /browse handler directly to users,
are you? Because unless you REALLY know how to secure the rest of Solr,
you are asking for big trouble.

Re: Word frequency

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On 13 March 2015 at 10:25,  <ph...@free.fr> wrote:
> I would like to:
>
> - loop through the documents in my core
> - extract the most-frequently-appearing words in each document's text field
> - generate a .vm which displays those words ranked by number of occurrences, or, ideally, automatically generate that .vm whenever users use SOLR.

That's what faceting does. You can fine-tune it further by telling it
how many of the top hits you want to get back. Have a look at those
parameters and play with them first in the Web Admin UI before trying to
apply them to the browse handler.
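
For experimenting with those parameters outside the browser, a minimal Python sketch could look like the following. It builds a facet request limited to the top N terms and reads the term counts back out of a JSON response, assuming Solr's default flat layout where terms and counts alternate in one list. The helper names are made up for the example, and "_text" is simply the field name mentioned elsewhere in this thread.

```python
from urllib.parse import urlencode


def facet_params(field: str, limit: int = 20, mincount: int = 1) -> str:
    """URL-encoded parameters asking for the top `limit` terms of `field`."""
    return urlencode({
        "q": "*:*",
        "rows": 0,                  # facet counts only, no documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,       # how many top terms to return
        "facet.mincount": mincount, # skip terms with fewer occurrences
        "wt": "json",
    })


def top_terms(response: dict, field: str) -> list:
    """Pair up the alternating [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

Note that facet counts are the number of matching documents containing each term, not the total number of times the term occurs.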

Regards,
   Alex.
P.S. You are not planning to expose the /browse handler directly to users,
are you? Because unless you REALLY know how to secure the rest of Solr,
you are asking for big trouble.

Re: Word frequency

Posted by ph...@free.fr.
Yes.

Except that I don't want to facet the entire text field (as it can contain thousands of words).

I would like to:

- loop through the documents in my core
- extract the most-frequently-appearing words in each document's text field
- generate a .vm which displays those words ranked by number of occurrences, or, ideally, automatically generate that .vm whenever users use SOLR.
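
The first two steps above can be sketched outside Solr in a few lines of Python. This is only a client-side illustration of per-document word counting, not a Solr feature: it assumes the documents have already been fetched as dictionaries with hypothetical "id" and "text" keys, and it uses a naive tokenizer rather than the field's analyzer.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")  # naive tokenizer: runs of word characters


def top_words(text: str, n: int = 10, stopwords=frozenset()) -> list:
    """Return the n most frequent lowercased words as (word, count) pairs."""
    words = (w.lower() for w in WORD.findall(text))
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


def top_words_per_document(docs, field: str = "text", n: int = 10) -> dict:
    """Map each document id to its most frequent words."""
    return {doc["id"]: top_words(doc.get(field, ""), n) for doc in docs}
```

The ranked pairs could then be handed to whatever renders the left-hand list.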






----- Original message -----
From: "Erik Hatcher" <er...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Friday, 13 March 2015 15:05:21
Subject: Re: Word frequency

Do you mean faceting on one of your full-text fields? Something like /browse?facet.field=_text, or one of your other fields?


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>






Re: Word frequency

Posted by Erik Hatcher <er...@gmail.com>.
Do you mean faceting on one of your full-text fields? Something like /browse?facet.field=_text, or one of your other fields?


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>




> On Mar 13, 2015, at 4:16 AM, phiroc@free.fr wrote:
> 
> Hello,
> 
> is it possible to create dynamic facets with SOLR 5.0.0?
> 
> For instance, I would like to display the most-frequently occurring words in the left-hand side of my Velocity SOLR GUI (facet_fields.vm).
> 
> Facet_fields.vm currently looks like this:
> 
> 
> -----------------------------------------------
> #**
> *  Display facets based on field values
> *  e.g.: fields specified by &facet.field=
> *#
> 
> #if($response.facetFields)
>  <h2 #annTitle("Facets generated by adding &facet.field= to the request")>
>    ##Field Facets
>    Results
>  </h2>
>  #foreach($field in $response.facetFields)
>    ## Hide facets without value
>    #if($field.values.size() > 0)
>      <span class="facet-field">$field.name</span>
>      <ul>
>        #foreach($facet in $field.values)
>          <li>
>            <a href="#url_for_facet_filter($field.name, $facet.name)">$facet.name</a> ($facet.count)
>          </li>
>        #end
>      </ul>
>    #end  ## end if > 0
>  #end    ## end for each facet field
> #end      ## end if response has facet fields
> 
> ------------------------------------------
> 
> Many thanks.
> 
> Philippe