Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2008/11/16 23:12:24 UTC

Solr security

I'm pondering the viability of running Solr as effectively a UI
server... what I mean by that is having a public-facing browser-based
application hitting a Solr backend directly for JSON, XML, etc. data.

I know folks are doing this (I won't name names, in case this thread
comes up with any vulnerabilities that would affect such existing
environments).

Let's just assume a typical deployment environment... replicated
Solr instances behind a load balancer, maybe even a caching proxy.
What known vulnerabilities are there in Solr 1.3, for example?

What I think we can get out of this is a Solr deployment configuration
suitable for direct browser access, but we're not safely there yet, are
we?  Is this an absurd goal?  Must we always have a moving piece
between browser and data/search servers?

Thanks,
	Erik


Re: Solr security

Posted by Ian Holsman <li...@holsman.net>.
Ryan McKinley wrote:
>
> On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:
>
>> trouble is, you can also GET /solr/update, even all on the URL, no 
>> request body...
>>
>>   
>> <http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true> 
>>
>>
>> Solr is a bad RESTafarian.
>>
>
> but with Ian's options in the apache config, this would not work...  
> rather it would only work if stream.body was a POST
>
What about something like this?

<location /solr/update>
order deny,allow
deny from all
allow from 192.168.0.1
</location>

Or perhaps locationmatch... but you get the picture.

>
>
>
>> Getting warmer!
>>
>>     Erik
>>
>>
>> On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:
>>
>>> if thats the case putting apache in front of it would be handy.
>>>
>>> something like
>>> <limit  POST>
>>> order deny,allow
>>> deny from all
>>> allow from 192.168.0.1
>>> </limit>
>>>
>>> might be helpful.
>>>
>>> Sean Timm wrote:
>>>> I believe the Solr replication scripts require POSTing a commit to 
>>>> read in the new index--so at least limited POST capability is 
>>>> required in most scenarios.
>>>>
>>>> -Sean
>>>>
>>>> Lance Norskog wrote:
>>>>> About that "read-only" switch for Solr: one of the basic HTTP design
>>>>> guidelines is that GET should only return values, and should never 
>>>>> change
>>>>> the state of the data. All changes to the data should be made with 
>>>>> POST. (In
>>>>> REST style guidelines, PUT, POST, and DELETE.) This prevents you from
>>>>> passing around URLs in email that can destroy the index.  The 
>>>>> first role of
>>>>> security is to prevent accidents.
>>>>>
>>>>> I would suggest two layers of "read-only" switch. 1) Open the 
>>>>> Lucene index
>>>>> in read-only mode. 2) Allow only search servers to accept GET 
>>>>> requests.
>>>>>
>>>>> Lance
>>>>>
>>>>>
>>>>
>>
>
>


Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:

> trouble is, you can also GET /solr/update, even all on the URL, no  
> request body...
>
>   <http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true 
> >
>
> Solr is a bad RESTafarian.
>

but with Ian's options in the apache config, this would not work...   
rather it would only work if stream.body was a POST





> Getting warmer!
>
> 	Erik
>
>
> On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:
>
>> if thats the case putting apache in front of it would be handy.
>>
>> something like
>> <limit  POST>
>> order deny,allow
>> deny from all
>> allow from 192.168.0.1
>> </limit>
>>
>> might be helpful.
>>
>> Sean Timm wrote:
>>> I believe the Solr replication scripts require POSTing a commit to  
>>> read in the new index--so at least limited POST capability is  
>>> required in most scenarios.
>>>
>>> -Sean
>>>
>>> Lance Norskog wrote:
>>>> About that "read-only" switch for Solr: one of the basic HTTP  
>>>> design
>>>> guidelines is that GET should only return values, and should  
>>>> never change
>>>> the state of the data. All changes to the data should be made  
>>>> with POST. (In
>>>> REST style guidelines, PUT, POST, and DELETE.) This prevents you  
>>>> from
>>>> passing around URLs in email that can destroy the index.  The  
>>>> first role of
>>>> security is to prevent accidents.
>>>>
>>>> I would suggest two layers of "read-only" switch. 1) Open the  
>>>> Lucene index
>>>> in read-only mode. 2) Allow only search servers to accept GET  
>>>> requests.
>>>>
>>>> Lance
>>>>
>>>>
>>>
>


Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Trouble is, you can also GET /solr/update, with everything on the URL and
no request body...

    <http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true>

Solr is a bad RESTafarian.
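
To make the risk concrete, decoding the query string of the URL above shows that a plain GET carries a complete update message plus a commit. This is only an illustrative sketch using the Python standard library; the variable names are mine, and nothing here talks to a real Solr.

```python
from urllib.parse import urlsplit, parse_qs

# The URL from the message above: a GET request with no body.
url = (
    "http://localhost:8983/solr/update"
    "?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E"
    "STREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true"
)

parts = urlsplit(url)
# parse_qs percent-decodes each value and returns a list per parameter.
params = parse_qs(parts.query)

stream_body = params["stream.body"][0]
commit = params["commit"][0]

print(parts.path)   # /solr/update
print(stream_body)  # <add><doc><field name="id">STREAMED</field></doc></add>
print(commit)       # true
```

So a rule that only restricts POST leaves this path open; any front-end rule has to look at the path (and ideally the parameters) as well as the method.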

Getting warmer!

	Erik


On Nov 17, 2008, at 4:11 PM, Ian Holsman wrote:

> if thats the case putting apache in front of it would be handy.
>
> something like
> <limit  POST>
> order deny,allow
> deny from all
> allow from 192.168.0.1
> </limit>
>
> might be helpful.
>
> Sean Timm wrote:
>> I believe the Solr replication scripts require POSTing a commit to  
>> read in the new index--so at least limited POST capability is  
>> required in most scenarios.
>>
>> -Sean
>>
>> Lance Norskog wrote:
>>> About that "read-only" switch for Solr: one of the basic HTTP design
>>> guidelines is that GET should only return values, and should never  
>>> change
>>> the state of the data. All changes to the data should be made with  
>>> POST. (In
>>> REST style guidelines, PUT, POST, and DELETE.) This prevents you  
>>> from
>>> passing around URLs in email that can destroy the index.  The  
>>> first role of
>>> security is to prevent accidents.
>>>
>>> I would suggest two layers of "read-only" switch. 1) Open the  
>>> Lucene index
>>> in read-only mode. 2) Allow only search servers to accept GET  
>>> requests.
>>>
>>> Lance
>>>
>>>
>>


Re: Solr security

Posted by Ian Holsman <li...@holsman.net>.
If that's the case, putting Apache in front of it would be handy.

something like
<limit POST>
order deny,allow
deny from all
allow from 192.168.0.1
</limit>

might be helpful.

Sean Timm wrote:
> I believe the Solr replication scripts require POSTing a commit to 
> read in the new index--so at least limited POST capability is required 
> in most scenarios.
>
> -Sean
>
> Lance Norskog wrote:
>> About that "read-only" switch for Solr: one of the basic HTTP design
>> guidelines is that GET should only return values, and should never 
>> change
>> the state of the data. All changes to the data should be made with 
>> POST. (In
>> REST style guidelines, PUT, POST, and DELETE.) This prevents you from
>> passing around URLs in email that can destroy the index.  The first 
>> role of
>> security is to prevent accidents.
>>
>> I would suggest two layers of "read-only" switch. 1) Open the Lucene 
>> index
>> in read-only mode. 2) Allow only search servers to accept GET requests.
>>
>> Lance
>>
>>   
>


Re: Solr security

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
If the user is using the new Java-based Solr replication, then they can get
rid of the /update and /update/csv handlers altogether, so the slaves are
completely read-only.
--Noble



On Tue, Nov 18, 2008 at 2:14 AM, Sean Timm <ti...@aol.com> wrote:
> I believe the Solr replication scripts require POSTing a commit to read in
> the new index--so at least limited POST capability is required in most
> scenarios.
>
> -Sean
>
> Lance Norskog wrote:
>>
>> About that "read-only" switch for Solr: one of the basic HTTP design
>> guidelines is that GET should only return values, and should never change
>> the state of the data. All changes to the data should be made with POST.
>> (In
>> REST style guidelines, PUT, POST, and DELETE.) This prevents you from
>> passing around URLs in email that can destroy the index.  The first role
>> of
>> security is to prevent accidents.
>>
>> I would suggest two layers of "read-only" switch. 1) Open the Lucene index
>> in read-only mode. 2) Allow only search servers to accept GET requests.
>>
>> Lance
>>
>>
>



-- 
--Noble Paul

Re: Solr security

Posted by Sean Timm <ti...@aol.com>.
I believe the Solr replication scripts require POSTing a commit to read 
in the new index--so at least limited POST capability is required in 
most scenarios.

-Sean

Lance Norskog wrote:
> About that "read-only" switch for Solr: one of the basic HTTP design
> guidelines is that GET should only return values, and should never change
> the state of the data. All changes to the data should be made with POST. (In
> REST style guidelines, PUT, POST, and DELETE.) This prevents you from
> passing around URLs in email that can destroy the index.  The first role of
> security is to prevent accidents.
>
> I would suggest two layers of "read-only" switch. 1) Open the Lucene index
> in read-only mode. 2) Allow only search servers to accept GET requests.
>
> Lance
>
>   

RE: Solr security

Posted by Lance Norskog <go...@gmail.com>.
About that "read-only" switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never change
the state of the data. All changes to the data should be made with POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing around URLs in email that can destroy the index.  The first role of
security is to prevent accidents.

I would suggest two layers of "read-only" switch. 1) Open the Lucene index
in read-only mode. 2) Allow only search servers to accept GET requests.

Lance


Re: Solr security

Posted by Matthias Epheser <ma...@gmx.at>.
Erik Hatcher wrote:
> 
> On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
> 
>> my assumption with solrjs is that you are hitting "read-only" solr 
>> servers that you don't mind if people query directly.
> 
> Exactly the assumption I'm going with too.
> 
>>  It would not be appropriate for something where you don't want people 
>> (who really care) to know you are running solr and could execute 
>> arbitrary queries.
>>
>> Since it is an example, I don't mind leaving the /admin interface open 
>> on:
>> http://example.solrstuff.org/solrjs/admin/
>> but /update has a password:
>> http://example.solrstuff.org/solrjs/update
>>
>> I have said in the past I like the idea of a "read-only" flag in solr 
>> config that would throw an error if you try to do something with the 
>> UpdateHandler.  However there are other ways to do that also.
> 

As the thoughts and ideas of this thread are spread across several emails, let
me just drop my uncoordinated thoughts here:

For solrjs, what exactly is the required information solr has to provide 
"directly":

- We need data for several widgets. This data will be in 99% of the cases some 
facet information and/or result docs. The result docs will be in suitable 
ranges, no webpage will display 100000+ result items at the same time.

- So "potentially dangerous" request params like rows>1000 or some other 
handlers apart from StandardRequest may be blocked.

- update handlers and admin interface shouldn't be exposed.


Like others mentioned before, I'm not sure this is a task that *has* to be
solved inside Solr. As a standalone servlet, it is very likely that it is NOT
accessible directly in a production environment.

Hiding or password protecting update/admin is an easy task using a proxy like
Apache httpd. It could also be solved by a configurable ServletFilter delivered
with solr that is initialized inside solr's web.xml. To separate the concerns,
I think it should not be coded "deeper" inside the solr code. The idea of a
"read-only" server can be implemented like that. Optional update URLs that are
only accessible from inside a firewall may also be present.

This servlet filter may also check the request params for things that are not 
needed for solrjs and potentially dangerous. It even may check how frequently 
urls are accessed (thinking about DoS).
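
The checks described above can be sketched as a single allow/deny decision. This is not a ServletFilter and not anything shipped with Solr; it is a rough Python illustration of the logic only. The allowed path, the row limit, and the list of blocked parameters are all assumed policy choices, and rate limiting is left out.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical policy values, chosen for illustration only.
ALLOWED_PATHS = {"/solr/select"}
MAX_ROWS = 1000
BLOCKED_PARAMS = {"stream.body", "stream.url", "stream.file", "qt"}


def is_request_allowed(method: str, url: str) -> bool:
    """Return True only for read-only requests that pass every check."""
    if method != "GET":
        return False
    parts = urlsplit(url)
    if parts.path not in ALLOWED_PATHS:
        return False  # hides /update, /admin and anything else
    params = parse_qs(parts.query, keep_blank_values=True)
    if any(name in params for name in BLOCKED_PARAMS):
        return False
    for value in params.get("rows", []):
        try:
            rows = int(value)
        except ValueError:
            return False
        if rows < 0 or rows > MAX_ROWS:
            return False
    return True


print(is_request_allowed("GET", "http://localhost:8983/solr/select?q=city:vienna&rows=10"))
print(is_request_allowed("GET", "http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E"))
```

An allow-list of paths is used rather than a deny-list, so a handler added later is hidden by default until someone decides to expose it.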

I think even if it looks like a direct access, using solrjs doesn't have to be 
different to "common" solr webapps. Usually these apps take user input, a web 
application translates this input into a solr query and translates the result in 
a suitable client format. Other solr stuff is blocked indirectly because only 
this app has access to solr. Now the last 2 steps are done inside the client. 
But if we block stuff that isn't used by the client, we are in control of what 
may happen.

If that isn't secure enough, the more complicated solution would be to create
a stateful servlet that holds the query state of a client, so that solrjs only
performs /select/solrjs/?new_query=city:vienna or something. Then the query
generation and all solr related stuff happens on the server again.

I think it should be easy to deliver this SecuritySolrFilter with the
standard solr distribution, making it configurable so the user can decide which
URLs are blocked/password protected and which request parameters should be
checked for illegal values. On the other hand, existing firewalls and proxies of
the destination system may be used. Therefore some "best practices" may be
helpful in the solr wiki.

It would be fine by me to help implement a standard security filter for solr.

WDYT?

regards,
matthias

Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:

> my assumption with solrjs is that you are hitting "read-only" solr  
> servers that you don't mind if people query directly.

Exactly the assumption I'm going with too.

>  It would not be appropriate for something where you don't want  
> people (who really care) to know you are running solr and could  
> execute arbitrary queries.
>
> Since it is an example, I don't mind leaving the /admin interface  
> open on:
> http://example.solrstuff.org/solrjs/admin/
> but /update has a password:
> http://example.solrstuff.org/solrjs/update
>
> I have said in the past I like the idea of a "read-only" flag in  
> solr config that would throw an error if you try to do something  
> with the UpdateHandler.  However there are other ways to do that also.

Yes, I was asked about this elusive read-only switch at Solr Boot Camp  
at ApacheCon as well.

How are you password protecting the update handler?  This is the kind
of goody I'd like to distill out of this thread and wikify
<http://wiki.apache.org/solr/SolrSecurity>

What's it take to make a read-only Solr server now?  Can replication
still be made to work?  (I plead ignorance on the guts of the Java-based
replication feature.)  Does it require password protected handlers?
Shouldn't we bake some of this into the default example configuration
instead of update handlers being wide open by default?

	Erik



Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
my assumption with solrjs is that you are hitting "read-only" solr  
servers that you don't mind if people query directly.  It would not be  
appropriate for something where you don't want people (who really  
care) to know you are running solr and could execute arbitrary queries.

Since it is an example, I don't mind leaving the /admin interface open  
on:
http://example.solrstuff.org/solrjs/admin/
but /update has a password:
http://example.solrstuff.org/solrjs/update

I have said in the past I like the idea of a "read-only" flag in solr  
config that would throw an error if you try to do something with the  
UpdateHandler.  However there are other ways to do that also.

ryan


On Nov 16, 2008, at 6:03 PM, Erik Hatcher wrote:

> What about SolrJS?   Isn't it designed to hit a Solr directly?   
> (Sure, as long as the response looked like Solr response, it could  
> have come through some magic 'security' tier).
>
> 	Erik
>
> On Nov 16, 2008, at 5:54 PM, Ryan McKinley wrote:
>> I'm not totally sure what you are suggesting.  Is there a general  
>> way people deal with security and search?
>>
>> I'm assuming we already have good ways (better ways) to make sure  
>> people are authorized/logged in etc.  What do you imagine "solr  
>> security" would add?
>>
>> FYI, I used to have a custom RequstHandler that got the user  
>> principal from the HttpServletRequest (I have a custom  
>> SolrDispatchFilter that adds that to the context) and then augments  
>> the query with a filter that limits to stuff that user can see.  I  
>> replaced all that with a something that adds the filter to the  
>> Solrj query.
>>
>> Assuming it is "safe" and all that, what do you think we could add  
>> that would be general enough?
>>
>> ryan
>>
>>
>> On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:
>>
>>> I'm pondering the viability of running Solr as effectively a UI  
>>> server... what I mean by that is having a public facing browser- 
>>> based application hitting a Solr backend directly for JSON, XML,  
>>> etc data.
>>>
>>> I know folks are doing this (I won't name names, in case this  
>>> thread comes up with any vulnerabilities that would effect such  
>>> existing environments).
>>>
>>> Let's just assume a typical deployment environment... replicated  
>>> Solr's behind a load balancer, maybe even a caching proxy.
>>> What known vulnerabilities are there in Solr 1.3, for example?
>>>
>>> What I think we can get out this is a Solr deployment  
>>> configuration suitable for direct browser access, but we're not  
>>> safely there yet are we?  Is this an absurd goal?  Must we always  
>>> have a moving piece between browser and data/search servers?
>>>
>>> Thanks,
>>> 	Erik
>>>
>


Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
What about SolrJS?   Isn't it designed to hit a Solr directly?  (Sure,
as long as the response looked like a Solr response, it could have come
through some magic 'security' tier).

	Erik

On Nov 16, 2008, at 5:54 PM, Ryan McKinley wrote:
> I'm not totally sure what you are suggesting.  Is there a general  
> way people deal with security and search?
>
> I'm assuming we already have good ways (better ways) to make sure  
> people are authorized/logged in etc.  What do you imagine "solr  
> security" would add?
>
> FYI, I used to have a custom RequstHandler that got the user  
> principal from the HttpServletRequest (I have a custom  
> SolrDispatchFilter that adds that to the context) and then augments  
> the query with a filter that limits to stuff that user can see.  I  
> replaced all that with a something that adds the filter to the Solrj  
> query.
>
> Assuming it is "safe" and all that, what do you think we could add  
> that would be general enough?
>
> ryan
>
>
> On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:
>
>> I'm pondering the viability of running Solr as effectively a UI  
>> server... what I mean by that is having a public facing browser- 
>> based application hitting a Solr backend directly for JSON, XML,  
>> etc data.
>>
>> I know folks are doing this (I won't name names, in case this  
>> thread comes up with any vulnerabilities that would effect such  
>> existing environments).
>>
>> Let's just assume a typical deployment environment... replicated  
>> Solr's behind a load balancer, maybe even a caching proxy.
>> What known vulnerabilities are there in Solr 1.3, for example?
>>
>> What I think we can get out this is a Solr deployment configuration  
>> suitable for direct browser access, but we're not safely there yet  
>> are we?  Is this an absurd goal?  Must we always have a moving  
>> piece between browser and data/search servers?
>>
>> Thanks,
>> 	Erik
>>


Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
I'm not totally sure what you are suggesting.  Is there a general way  
people deal with security and search?

I'm assuming we already have good ways (better ways) to make sure  
people are authorized/logged in etc.  What do you imagine "solr  
security" would add?

FYI, I used to have a custom RequestHandler that got the user principal
from the HttpServletRequest (I have a custom SolrDispatchFilter that
adds that to the context) and then augmented the query with a filter
that limits to stuff that user can see.  I replaced all that with
something that adds the filter to the Solrj query.
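
The approach described above (adding a per-user filter on the server side, whatever the client sent) can be sketched roughly as follows. This is not the actual handler or Solrj code; the field name `allowed_users` and the stripped `user` parameter are hypothetical, since how access rights are encoded in the index depends entirely on the application.

```python
from typing import Dict, List

# Hypothetical names: how the auth model is stored in the index is app-specific.
ACL_FIELD = "allowed_users"
SERVER_ONLY_PARAMS = {"user"}  # never trusted when supplied by the client


def restrict_to_user(client_params: Dict[str, List[str]], principal: str) -> Dict[str, List[str]]:
    """Copy the client's parameters and append a server-side filter query."""
    params = {
        name: list(values)
        for name, values in client_params.items()
        if name not in SERVER_ONLY_PARAMS
    }
    # Escape backslashes and quotes so the principal stays inside the phrase.
    escaped = principal.replace("\\", "\\\\").replace('"', '\\"')
    params.setdefault("fq", []).append(f'{ACL_FIELD}:"{escaped}"')
    return params


print(restrict_to_user({"q": ["solr"], "user": ["admin"]}, "alice"))
```

Because every fq is intersected with the main query, any filter the client adds can only narrow the results further, while the appended one always applies.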

Assuming it is "safe" and all that, what do you think we could add  
that would be general enough?

ryan


On Nov 16, 2008, at 5:12 PM, Erik Hatcher wrote:

> I'm pondering the viability of running Solr as effectively a UI  
> server... what I mean by that is having a public facing browser- 
> based application hitting a Solr backend directly for JSON, XML, etc  
> data.
>
> I know folks are doing this (I won't name names, in case this thread  
> comes up with any vulnerabilities that would effect such existing  
> environments).
>
> Let's just assume a typical deployment environment... replicated  
> Solr's behind a load balancer, maybe even a caching proxy.
> What known vulnerabilities are there in Solr 1.3, for example?
>
> What I think we can get out this is a Solr deployment configuration  
> suitable for direct browser access, but we're not safely there yet  
> are we?  Is this an absurd goal?  Must we always have a moving piece  
> between browser and data/search servers?
>
> Thanks,
> 	Erik
>


Re: Solr security

Posted by Mark Miller <ma...@gmail.com>.
Plus, it's just too big a can of worms for solr to handle. You could
protect up to a small point, but a real DDoS attack is not going to be
defended against by solr. At best we could put in 'kiddie' protection.

- Mark


On Nov 16, 2008, at 5:51 PM, Erik Hatcher <er...@ehatchersolutions.com>  
wrote:

>
> On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
>> First thing I would look at is disabling write access, or writing a  
>> servlet that sits on top of the write handler to filter your data.
>
> We can turn off all the update handlers, but how does that affect  
> replication?  Can a Solr replicant be entirely read-only in the HTTP  
> request sense?
>
>> Second thing I would be concerned about is people writing DoS  
>> queries that bypass the cache.
>>
>>
>> so you may need to write your own custom request handler to filter  
>> out that kind of thing.
>
> Is this a concern that can be punted to what you'd naturally be  
> putting in front of Solr anyway or a proxy tier that can have DoS  
> blocking rules?  I mean, if you're deploying a Struts that hits Solr  
> under the covers, how do you prevent against DoS on that? A  
> malicious user could keep sending queries indirectly to a Solr  
> through a whole lot of public apps now.  In other words, another  
> tier in front of Solr doesn't add (much) to DoS protection to an  
> underlying Solr, no?
>
>    Erik
>

Re: Solr security

Posted by Mark Miller <ma...@gmail.com>.
Ryan McKinley wrote:
> solr.jar on the other hand lets you package what you want around 
> search features to build a setup for your needs.  Java already has so 
> many options for how to secure / authenticate that you can just plug 
> them into your own app.  (if that is appropriate).  In the past I have 
> used a filter based on:
> http://www.onjava.com/pub/a/onjava/2004/03/24/loadcontrol.html
> to limit load -- however I have found that in any site where 
> stability/load and uptime are a serious concern, this is better 
> handled in a tier in front of java -- typically the loadbalancer / 
> haproxy / whatever -- and managed by people more cautious then me.
>
> ryan
>
Couldn't agree more. Almost all security and protection belong outside
of solr. It can and will be done better there, and solr can stick to what
it's good at. Smaller things like limiting complex query attacks or something
seem more reasonable, but any real security should be provided
elsewhere. Wouldn't it be odd if a bunch of open source products
reimplemented network security layers and defenses on every project...


Re: Solr security

Posted by Sean Timm <ti...@aol.com>.
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only 
request handler) is pertinent to this discussion as well.

-Sean

Ian Holsman wrote:
> There was a patch by Sean Timm you should investigate as well.
>
> It limited a query so it would take a maximum of X seconds to execute, 
> and would just return the rows it had found in that time.
>
>
> Feak, Todd wrote:
>> I see value in this in the form of protecting the client from itself.
>>
>> For example, our Solr isn't accessible from the Internet. It's all
>> behind firewalls. But, the client applications can make programming
>> mistakes. I would love the ability to lock them down to a certain number
>> of rows, just in case someone typos and puts in 1000 instead of 100, or
>> the like.
>>
>> Admittedly, testing and QA should catch these things, but sometimes it's
>> nice to put in a few safeguards to stop the obvious mistakes from
>> occurring.
>>
>> -Todd Feak
>>
>> -----Original Message-----
>> From: Matthias Epheser [mailto:matthias.epheser@gmx.at] Sent: Monday, 
>> November 17, 2008 9:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr security
>>
>> Ryan McKinley schrieb:
>>   however I have found that in any site where
>>  
>>> stability/load and uptime are a serious concern, this is better
>>>     
>> handled  
>>> in a tier in front of java -- typically the loadbalancer / haproxy / 
>>> whatever -- and managed by people more cautious then me.
>>>     
>>
>> Full ack. What do you think about the only solr related thing "left",
>> the paramter filtering/blocking (eg. rows<1000). Is this suitable to 
>> do it
>> in a Filter delivered by solr? Of course as an optional alternative.
>>
>>  
>>> ryan
>>>
>>>
>>>     
>>
>>
>>
>>   
>

Re: Solr security

Posted by Ian Holsman <li...@holsman.net>.
There was a patch by Sean Timm you should investigate as well.

It limited a query so it would take a maximum of X seconds to execute, 
and would just return the rows it had found in that time.
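
The idea of a time budget with partial results can be illustrated with a small sketch. This is not the patch itself, just the general shape of the technique in Python; the function name and the injectable `clock` parameter are my own, added so the behaviour can be checked without waiting on a real timer.

```python
import time
from typing import Callable, Iterable, List, Tuple


def collect_with_time_limit(
    hits: Iterable[str],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Collect hits until the iterator is exhausted or the budget runs out.

    Returns (results, partial); partial is True when collection stopped early.
    """
    deadline = clock() + max_seconds
    results: List[str] = []
    for hit in hits:
        if clock() >= deadline:
            return results, True
        results.append(hit)
    return results, False


results, partial = collect_with_time_limit(["doc1", "doc2", "doc3"], max_seconds=5.0)
print(results, partial)
```

Returning the `partial` flag alongside the rows lets a client tell a truncated answer from a complete one.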


Feak, Todd wrote:
> I see value in this in the form of protecting the client from itself.
>
> For example, our Solr isn't accessible from the Internet. It's all
> behind firewalls. But, the client applications can make programming
> mistakes. I would love the ability to lock them down to a certain number
> of rows, just in case someone typos and puts in 1000 instead of 100, or
> the like.
>
> Admittedly, testing and QA should catch these things, but sometimes it's
> nice to put in a few safeguards to stop the obvious mistakes from
> occurring.
>
> -Todd Feak
>
> -----Original Message-----
> From: Matthias Epheser [mailto:matthias.epheser@gmx.at] 
> Sent: Monday, November 17, 2008 9:07 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr security
>
> Ryan McKinley schrieb:
>   however I have found that in any site where
>   
>> stability/load and uptime are a serious concern, this is better
>>     
> handled 
>   
>> in a tier in front of java -- typically the loadbalancer / haproxy / 
>> whatever -- and managed by people more cautious then me.
>>     
>
> Full ack. What do you think about the only solr related thing "left",
> the 
> paramter filtering/blocking (eg. rows<1000). Is this suitable to do it
> in a 
> Filter delivered by solr? Of course as an optional alternative.
>
>   
>> ryan
>>
>>
>>     
>
>
>
>   


RE: Solr security

Posted by "Feak, Todd" <To...@smss.sony.com>.
I see value in this in the form of protecting the client from itself.

For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in case someone typos and puts in 1000 instead of 100, or
the like.

Admittedly, testing and QA should catch these things, but sometimes it's
nice to put in a few safeguards to stop the obvious mistakes from
occurring.

-Todd Feak

-----Original Message-----
From: Matthias Epheser [mailto:matthias.epheser@gmx.at] 
Sent: Monday, November 17, 2008 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley wrote:
> however I have found that in any site where
> stability/load and uptime are a serious concern, this is better handled
> in a tier in front of java -- typically the loadbalancer / haproxy /
> whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only solr related thing "left", the
parameter filtering/blocking (e.g. rows<1000)? Would it be suitable to do this
in a Filter delivered with solr? Of course as an optional alternative.

> 
> ryan
> 
> 



Re: Solr security

Posted by Chris Hostetter <ho...@fucit.org>.
: > Full ack. What do you think about the only solr related thing "left", the
: > paramter filtering/blocking (eg. rows<1000). Is this suitable to do it in a
: > Filter delivered by solr? Of course as an optional alternative.

: As eric mentioned earlier, this could be done in a QueryComponent -- the
: prepare part could just make sure the query parameters are all within
: reasonable ranges.  This seems like something reasonable to add to solr.

I don't even see it requiring a new component -- the existing
QueryComponent could treat this similarly to the way the DismaxQParser deals
with q and q.alt ... add two new params, start.max and rows.max, that
default to some very large values; QueryComponent respects start & rows
only as long as they don't exceed the corresponding max; people who want
to lock down their ports can make them invariants for the handlers that
are exposed.
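
A rough sketch of how the proposed caps could behave. start.max and rows.max are the parameters suggested above, not existing Solr parameters, and clamping to the cap (rather than rejecting the request) is just one possible reading of "respects start & rows only as long as they don't exceed the max". The function name and DEFAULT_MAX are illustrative.

```python
DEFAULT_MAX = 2**31 - 1  # stand-in for "some very large value"


def effective_paging(params: dict) -> tuple:
    """Return (start, rows) after applying the proposed start.max and rows.max caps."""
    start_max = int(params.get("start.max", DEFAULT_MAX))
    rows_max = int(params.get("rows.max", DEFAULT_MAX))
    start = min(int(params.get("start", 0)), start_max)
    rows = min(int(params.get("rows", 10)), rows_max)
    return start, rows


# Invariants are applied last, so a client cannot override the cap.
invariants = {"rows.max": "100"}
client = {"q": "solr", "rows": "500", "rows.max": "999999"}
print(effective_paging({**client, **invariants}))
```

With no caps configured the defaults leave existing requests unchanged, which is what makes the proposal backwards compatible.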


-Hoss


Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
On Nov 17, 2008, at 12:06 PM, Matthias Epheser wrote:

> Ryan McKinley schrieb:
> however I have found that in any site where
>> stability/load and uptime are a serious concern, this is better  
>> handled in a tier in front of java -- typically the loadbalancer /  
>> haproxy / whatever -- and managed by people more cautious then me.
>
> Full ack. What do you think about the only solr related thing  
> "left", the paramter filtering/blocking (eg. rows<1000). Is this  
> suitable to do it in a Filter delivered by solr? Of course as an  
> optional alternative.
>

This could be done in a standard ServletFilter -- but that requires  
mucking with web.xml and may be more difficult if you are worried  
about it for some Handlers and not others.

As Erik mentioned earlier, this could be done in a QueryComponent --
the prepare part could just make sure the query parameters are all
within reasonable ranges.  This seems like something reasonable to add
to solr.

ryan

Re: Solr security

Posted by Matthias Epheser <ma...@gmx.at>.
Ryan McKinley wrote:
> however I have found that in any site where
> stability/load and uptime are a serious concern, this is better handled
> in a tier in front of java -- typically the loadbalancer / haproxy /
> whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only solr related thing "left", the
parameter filtering/blocking (e.g. rows<1000)? Would it be suitable to do this
in a Filter delivered with solr? Of course as an optional alternative.

> 
> ryan
> 
> 


Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
>>>
>> Say you do filtering by user - how would you enforce that the client
>> (if it's a browser) only send in the proper filter?
>
> Ryan already mentioned his technique... and here's how I'd do it  
> similarly...
>
>  Write a custom servlet Filter that grokked roles/authentication  
> (this piece you'd need in any Java application tier anyway) [or  
> plugin in an existing implementation through Spring or something  
> like that]  And then massaging of the request to Solr could happen  
> in that pipeline, or adding a query parameter to the Solr request  
> (ignoring anything sent by the client request for say, &user=...).   
> Perhaps plug in a custom SearchComponent that massaged a request  
> parameter into a Solr filter query or whatever.
>

right, but the question is still: is there anything general enough to  
be in solr core?

Everything I can think of requires a good sense of how the auth model  
is encoded in your data and how you want to expose it.  Nothing I have  
done is general enough to share with even my next project.

The only thing I could imagine is perhaps adding "getUserPrincipal()"
to the SolrRequest interface -- but this quickly explodes into also  
wanting the request method (POST vs GET) or the user-agent...  in the  
end I just add the HttpServletRequest to the context and grab stuff  
from there.  Perhaps the default RequestDispatcher could add the  
HttpServletRequest to the context...


>> Doesn't seem like
>> you can unless you put all the user authentication stuff and
>> application logic right in Solr.
>
>   ;)
>
> Exactly.  Sort of.
>
>> Now I guess you *could* stick everything in Solr that you would
>> normally stick in the middle tier, but it doesn't seem like a great
>> idea to me.
>
> Let's be clear about where we are drawing the boundaries of the  
> definition of "Solr".
>
> One could say that Solr is solr.war and the HTTP conventions.  Or is  
> it solr.jar?  Or is it the SolrJ API?
>

all of the above :)

In my view we need to be clear about who solr.war is packaged for.  I  
think we are pretty clear that solr.war should be thought of as similar
to a MySQL install -- that is a database server that unless you  
*really* know what you are doing should most likely be behind a  
firewall.

solr.jar on the other hand lets you package what you want around  
search features to build a setup for your needs.  Java already has so  
many options for how to secure / authenticate that you can just plug  
them into your own app.  (if that is appropriate).  In the past I have  
used a filter based on:
http://www.onjava.com/pub/a/onjava/2004/03/24/loadcontrol.html
to limit load -- however I have found that in any site where stability/ 
load and uptime are a serious concern, this is better handled in a  
tier in front of java -- typically the loadbalancer / haproxy /  
whatever -- and managed by people more cautious than me.
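
For readers who want to see what an in-JVM load limit can look like, here is
a minimal sketch using a semaphore. It is not the code from the linked
article; the class name and the fallback-value design are assumptions made
for illustration. A real servlet Filter would wrap the call to the rest of
the filter chain and answer with an HTTP 503 when no slot is free.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical load limiter: at most maxConcurrent requests run at once. */
final class LoadLimiter {
    private final Semaphore permits;

    LoadLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the work if a slot is free; otherwise returns the fallback
     * immediately instead of queueing the caller.
     */
    <T> T run(Supplier<T> work, T busyFallback) {
        if (!permits.tryAcquire()) {
            return busyFallback;
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always free the slot, even if work throws
        }
    }
}
```

Failing fast rather than queueing keeps request threads from piling up
behind slow queries, at the cost of turning some users away under load.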

ryan



Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 17, 2008, at 9:07 AM, Yonik Seeley wrote:
> On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
> <er...@ehatchersolutions.com> wrote:
>> Sounds like the perfect case for a query parser plugin... or use  
>> dismax as
>> Ryan mentioned.  Shouldn't Solr be hardened for these cases  
>> anyway?  Or at
>> least hardenable.
>
> Say you do filtering by user - how would you enforce that the client
> (if it's a browser) only send in the proper filter?

Ryan already mentioned his technique... and here's how I'd do it  
similarly...

   Write a custom servlet Filter that grokked roles/authentication  
(this piece you'd need in any Java application tier anyway) [or plug
in an existing implementation through Spring or something like that]
And then massaging of the request to Solr could happen in that  
pipeline, or adding a query parameter to the Solr request (ignoring  
anything sent by the client request for say, &user=...).  Perhaps plug  
in a custom SearchComponent that massaged a request parameter into a  
Solr filter query or whatever.
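
The request massaging described above could be sketched roughly as follows,
in plain Java without servlet or Solr classes. The class name, the "owner"
field, and the Map-of-lists parameter shape are assumptions for
illustration; the authenticated user name is presumed to come from whatever
authentication layer sits in the pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical request massaging: the filter is derived from the authenticated user only. */
final class UserFilterEnforcer {
    /**
     * Copies the client's parameters, drops anything the client sent under
     * "user", and appends a filter query built from the authenticated name.
     * The field name "owner" is an assumption about the schema.
     */
    static Map<String, List<String>> enforce(Map<String, List<String>> clientParams,
                                             String authenticatedUser) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : clientParams.entrySet()) {
            if (!e.getKey().equals("user")) {
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        out.computeIfAbsent("fq", k -> new ArrayList<>())
           .add("owner:\"" + escapeQuoted(authenticatedUser) + "\"");
        return out;
    }

    /** Escapes backslashes and double quotes so the name stays inside the quoted phrase. */
    static String escapeQuoted(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Client-supplied fq values are kept here because multiple filter queries are
intersected, so they can only narrow the result set further; the enforced
filter is always added on top.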

>  Doesn't seem like
> you can unless you put all the user authentication stuff and
> application logic right in Solr.

    ;)

Exactly.  Sort of.

> Now I guess you *could* stick everything in Solr that you would
> normally stick in the middle tier, but it doesn't seem like a great
> idea to me.

Let's be clear about where we are drawing the boundaries of the  
definition of "Solr".

One could say that Solr is solr.war and the HTTP conventions.  Or is  
it solr.jar?  Or is it the SolrJ API?

	Erik


Re: Solr security

Posted by Yonik Seeley <yo...@apache.org>.
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
<er...@ehatchersolutions.com> wrote:
> Sounds like the perfect case for a query parser plugin... or use dismax as
> Ryan mentioned.  Shouldn't Solr be hardened for these cases anyway?  Or at
> least hardenable.

Say you do filtering by user - how would you enforce that the client
(if it's a browser) only send in the proper filter?  Doesn't seem like
you can unless you put all the user authentication stuff and
application logic right in Solr.

Now I guess you *could* stick everything in Solr that you would
normally stick in the middle tier, but it doesn't seem like a great
idea to me.

-Yonik

Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 16, 2008, at 6:12 PM, Ian Holsman wrote:
> famous last words and all, but you shouldn't be just passing what a  
> user types directly into an application, should you?

LOL....

> I'd be parsing out wildcards, boosts, and fuzzy searches (or at  
> least thinking about the effects).
> I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a  
> regular query.

Sounds like the perfect case for a query parser plugin... or use  
dismax as Ryan mentioned.  Shouldn't Solr be hardened for these cases  
anyway?  Or at least hardenable.
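
As a rough illustration of "parsing out" the expensive syntax before the
query reaches Solr, here is a deliberately naive sketch. The class name is
hypothetical, and the regular expressions only cover the cases named in this
thread (wildcards, boosts, fuzzy and proximity suffixes); they do not
understand backslash escapes or the rest of the query syntax, so a real
query parser plugin would be more robust.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-filter that strips costly operators from raw user input. */
final class QuerySanitizer {
    // "~" or "^" followed by an optional number, e.g. ~1000, ~0.1, ^4
    private static final Pattern FUZZY_OR_BOOST = Pattern.compile("[~^]\\d*(\\.\\d+)?");
    // single- and multi-character wildcards
    private static final Pattern WILDCARD = Pattern.compile("[*?]");

    /** Removes fuzzy/proximity/boost suffixes and wildcards, then tidies whitespace. */
    static String sanitize(String userQuery) {
        String q = FUZZY_OR_BOOST.matcher(userQuery).replaceAll("");
        q = WILDCARD.matcher(q).replaceAll("");
        return q.trim().replaceAll("\\s+", " ");
    }
}
```

With this sketch, roam~0.1 becomes roam and "jakarta apache"~1000 becomes a
plain phrase query.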

> but they don't let me into design meetings any more ;(

Apparently they shouldn't let me into them either ;)

	Erik


Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 16, 2008, at 6:27 PM, Ryan McKinley wrote:
>> I'd be parsing out wildcards, boosts, and fuzzy searches (or at  
>> least thinking about the effects).
>> I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a  
>> regular query.
>>
>
> Even if you leave the solr instance public, you can still limit  
> grossly inefficient params by forcing things to use the dismax query
> parser.  You can use invariants to lock what options are available.
>
> I suppose we don't have a way to say the *maximum* number of rows  
> you can request is 100 (or something like that)

A LimitingRowsSearchComponent could easily do this as a plugin though.

	Erik


Re: Solr security

Posted by Walter Underwood <wu...@netflix.com>.
TCP-level attacks like SYN-flooding.

All kinds of HTTP breakage that Apache has fixed over the years.
You really want a bombproof TCP and HTTP implementation.

Very, very slow clients that keep a socket open for a long time
while the bits drool out to them.

We saw problems with all service threads being busy, and implemented
a deadman timer to reboot if no threads were in listen state for
two minutes.

We put in IP address checks for access to admin pages. You can do a
similar thing with Apache by only making the search pages available
and requiring admins to go directly to Solr on a different port.
That port can be blocked by a firewall.
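
A minimal sketch of an IP address check for admin pages, assuming the admin
pages live under /solr/admin and that the caller passes an already
normalized request path. The class name and the exact-match address list are
hypothetical; this is not the Ultraseek implementation.

```java
import java.util.Set;

/** Hypothetical check: admin paths are reachable only from listed addresses. */
final class AdminAccessCheck {
    private final Set<String> allowedAdminAddresses;

    AdminAccessCheck(Set<String> allowedAdminAddresses) {
        this.allowedAdminAddresses = Set.copyOf(allowedAdminAddresses);
    }

    /**
     * Non-admin paths are always allowed; admin paths require the remote
     * address to be in the configured set. The "/solr/admin" prefix is an
     * assumption about the deployment.
     */
    boolean isAllowed(String path, String remoteAddress) {
        boolean adminPath = path.equals("/solr/admin") || path.startsWith("/solr/admin/");
        return !adminPath || allowedAdminAddresses.contains(remoteAddress);
    }
}
```

If a proxy sits in front, the remote address seen by the servlet container
is the proxy's, so a check like this only makes sense where the real client
address is available.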

Finally, you get the years of experience and documentation in configuring
Apache for use exposed on the Internet.

wunder

On 11/17/08 7:28 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:

> 
> On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote:
>> It is possible to make it safe, but a lot of work. We did this for
>> Ultraseek. I would always, always front it with Apache, to get some
>> of Apache's protection.
> 
> What protections specifically are you speaking of with Apache in
> front?  Authentication?  Row limiting?
> 
> Erik
> 


Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote:
> It is possible to make it safe, but a lot of work. We did this for
> Ultraseek. I would always, always front it with Apache, to get some
> of Apache's protection.

What protections specifically are you speaking of with Apache in  
front?  Authentication?  Row limiting?

	Erik


Re: Solr security

Posted by Walter Underwood <wu...@netflix.com>.
Limiting the number of rows only handles one attack. The one I mentioned,
fetching one page deep in the result set, caused a big issue on prod at
our site. We needed to limit the max for "start" as well as "rows".
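
Limiting both values could look something like the sketch below. The class
name and the two limits (a per-request maximum and a maximum depth into the
result set) are assumptions for illustration, not what was deployed at any
site mentioned here.

```java
/** Hypothetical holder for a paging window that has been forced into safe bounds. */
final class PagingLimits {
    final int start;
    final int rows;

    private PagingLimits(int start, int rows) {
        this.start = start;
        this.rows = rows;
    }

    /**
     * Clamps the request so that rows <= maxRows and start + rows <= maxDepth.
     * Negative inputs are treated as zero; maxDepth is expected to be >= 0.
     */
    static PagingLimits clamp(int start, int rows, int maxRows, int maxDepth) {
        int safeRows = Math.max(0, Math.min(rows, Math.min(maxRows, maxDepth)));
        int safeStart = Math.max(0, Math.min(start, maxDepth - safeRows));
        return new PagingLimits(safeStart, safeRows);
    }
}
```

Clamping silently serves a different page than the one requested; rejecting
the request with an error is the stricter alternative.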

It is possible to make it safe, but a lot of work. We did this for
Ultraseek. I would always, always front it with Apache, to get some
of Apache's protection.

wunder

On 11/17/08 6:04 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:
> 
> On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
>> Limiting the maximum number of rows doesn't work, because
>> they can request rows 20000-20100. --wunder
> 
> But you could limit how many rows could be returned in a single
> request... that'd close off one DoS mechanism.
> 
> Erik



Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
> Limiting the maximum number of rows doesn't work, because
> they can request rows 20000-20100. --wunder

But you could limit how many rows could be returned in a single  
request... that'd close off one DoS mechanism.

	Erik


Re: Solr security

Posted by Walter Underwood <wu...@netflix.com>.
Limiting the maximum number of rows doesn't work, because
they can request rows 20000-20100. --wunder

On 11/16/08 3:27 PM, "Ryan McKinley" <ry...@gmail.com> wrote:

>> 
>> I'd be parsing out wildcards, boosts, and fuzzy searches (or at
>> least thinking about the effects).
>> I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a
>> regular query.
>> 
> 
> Even if you leave the solr instance public, you can still limit
> grossly inefficient params by forcing things to use the dismax query
> parser.  You can use invariants to lock what options are available.
> 
> I suppose we don't have a way to say the *maximum* number of rows you
> can request is 100 (or something like that)
> 
> ryan


Re: Solr security

Posted by Ryan McKinley <ry...@gmail.com>.
>
> I'd be parsing out wildcards, boosts, and fuzzy searches (or at  
> least thinking about the effects).
> I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a  
> regular query.
>

Even if you leave the solr instance public, you can still limit  
grossly inefficient params by forcing things to use the dismax query
parser.  You can use invariants to lock what options are available.

I suppose we don't have a way to say the *maximum* number of rows you  
can request is 100 (or something like that)

ryan

Re: Solr security

Posted by Walter Underwood <wu...@netflix.com>.
Agreed, it is pretty easy to create a large variety of denial
of service attacks with sorts, wildcards, requesting a large
number of results, or a page deep in the results.

We have protected against several different DoS problems
in our front-end code.

wunder

On 11/16/08 3:12 PM, "Ian Holsman" <li...@holsman.net> wrote:

> Erik Hatcher wrote:
>> 
>> On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
>>> First thing I would look at is disabling write access, or writing a
>>> servlet that sits on top of the write handler to filter your data.
>> 
>> We can turn off all the update handlers, but how does that affect
>> replication?  Can a Solr replicant be entirely read-only in the HTTP
>> request sense?
>> 
>>> Second thing I would be concerned about is people writing DoS queries
>>> that bypass the cache.
>>> 
>>> 
>>> so you may need to write your own custom request handler to filter
>>> out that kind of thing.
>> 
>> Is this a concern that can be punted to what you'd naturally be
>> putting in front of Solr anyway or a proxy tier that can have DoS
>> blocking rules?  I mean, if you're deploying a Struts that hits Solr
>> under the covers, how do you prevent against DoS on that?  A malicious
>> user could keep sending queries indirectly to a Solr through a whole
>> lot of public apps now.  In other words, another tier in front of Solr
>> doesn't add (much) to DoS protection to an underlying Solr, no?
> 
> famous last words and all, but you shouldn't be just passing what a user
> types directly into an application, should you?
> 
> I'd be parsing out wildcards, boosts, and fuzzy searches (or at least
> thinking about the effects).
> I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a
> regular query.
> 
> but they don't let me into design meetings any more ;(
>>     Erik
>> 
>> 
> 


Re: Solr security

Posted by Ian Holsman <li...@holsman.net>.
Erik Hatcher wrote:
>
> On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
>> First thing I would look at is disabling write access, or writing a 
>> servlet that sits on top of the write handler to filter your data.
>
> We can turn off all the update handlers, but how does that affect 
> replication?  Can a Solr replicant be entirely read-only in the HTTP 
> request sense?
>
>> Second thing I would be concerned about is people writing DoS queries 
>> that bypass the cache.
>>
>>
>> so you may need to write your own custom request handler to filter 
>> out that kind of thing.
>
> Is this a concern that can be punted to what you'd naturally be 
> putting in front of Solr anyway or a proxy tier that can have DoS 
> blocking rules?  I mean, if you're deploying a Struts that hits Solr 
> under the covers, how do you prevent against DoS on that?  A malicious 
> user could keep sending queries indirectly to a Solr through a whole 
> lot of public apps now.  In other words, another tier in front of Solr 
> doesn't add (much) to DoS protection to an underlying Solr, no?

famous last words and all, but you shouldn't be just passing what a user 
types directly into an application, should you?

I'd be parsing out wildcards, boosts, and fuzzy searches (or at least 
thinking about the effects).
I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a 
regular query.

but they don't let me into design meetings any more ;(
>     Erik
>
>


Re: Solr security

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 16, 2008, at 5:41 PM, Ian Holsman wrote:
> First thing I would look at is disabling write access, or writing a  
> servlet that sits on top of the write handler to filter your data.

We can turn off all the update handlers, but how does that affect  
replication?  Can a Solr replicant be entirely read-only in the HTTP  
request sense?

> Second thing I would be concerned about is people writing DoS  
> queries that bypass the cache.
>
>
> so you may need to write your own custom request handler to filter  
> out that kind of thing.

Is this a concern that can be punted to what you'd naturally be  
putting in front of Solr anyway or a proxy tier that can have DoS  
blocking rules?  I mean, if you're deploying a Struts that hits Solr  
under the covers, how do you prevent against DoS on that?  A malicious  
user could keep sending queries indirectly to a Solr through a whole  
lot of public apps now.  In other words, another tier in front of Solr  
doesn't add (much) to DoS protection to an underlying Solr, no?

	Erik


Re: Solr security

Posted by Ian Holsman <li...@holsman.net>.
Erik Hatcher wrote:
> I'm pondering the viability of running Solr as effectively a UI 
> server... what I mean by that is having a public facing browser-based 
> application hitting a Solr backend directly for JSON, XML, etc data.
>
> I know folks are doing this (I won't name names, in case this thread 
> comes up with any vulnerabilities that would affect such existing
> environments).
>
> Let's just assume a typical deployment environment... replicated 
> Solr's behind a load balancer, maybe even a caching proxy.
> What known vulnerabilities are there in Solr 1.3, for example?
>
> What I think we can get out of this is a Solr deployment configuration
> suitable for direct browser access, but we're not safely there yet are 
> we?  Is this an absurd goal?  Must we always have a moving piece 
> between browser and data/search servers?
>
> Thanks,
>     Erik
>
>

First thing I would look at is disabling write access, or writing a 
servlet that sits on top of the write handler to filter your data.

Second thing I would be concerned about is people writing DoS queries 
that bypass the cache.

so you may need to write your own custom request handler to filter out 
that kind of thing.
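
A minimal sketch of what the write-access filtering could check, written as
a plain Java predicate. The class name, the single allowed path
/solr/select, and the choice to block the "qt" parameter are assumptions
about the deployment; the "stream." prefix covers parameters such as
stream.body, which allow content to be supplied on the URL.

```java
import java.util.Set;

/** Hypothetical allow-list guard for a public, read-only Solr endpoint. */
final class ReadOnlyGuard {
    /**
     * Allows only GET requests to the search path, and only when no
     * parameter could redirect the request or smuggle in content.
     */
    static boolean isAllowed(String method, String path, Set<String> paramNames) {
        if (!method.equals("GET")) {
            return false; // no request bodies on the public side
        }
        if (!path.equals("/solr/select")) {
            return false; // everything else, including /solr/update, is closed
        }
        for (String name : paramNames) {
            if (name.startsWith("stream.") || name.equals("qt")) {
                return false; // content streams and handler selection are refused
            }
        }
        return true;
    }
}
```

An allow-list like this fails closed: a handler added later is unreachable
from outside until someone opens it deliberately.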