Posted to solr-user@lucene.apache.org by Saumitra Srivastav <sa...@gmail.com> on 2015/03/08 21:05:33 UTC

Solr TCP layer

Dear Solr Contributors,

I want to start working on adding a TCP layer for client to node and
inter-node communication.

I am not up to date on recent changes happening to Solr. So before I start
looking into code, I would like to know if there is already some work done
in this direction, which I can reuse. Are there any known
challenges/complexities?

I would appreciate any help to kick start this effort. Also, what would be
the best way to discuss and get feedback on design from contributors? Open a
JIRA??

Regards,
Saumitra





--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr TCP layer

Posted by Walter Underwood <wu...@wunderwood.org>.
I agree. I have some servers that use a TCP socket protocol and they are beastly hard to monitor, load-balance, all that stuff. HTTP rules. I need a really big advantage to recommend a non-HTTP server.

Understand that I helped design at least one socket protocol, HP JetDirect. This was designed before HTTP, so I have an excuse.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 8, 2015, at 8:15 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> [...]


Re: Solr TCP layer

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/8/2015 2:05 PM, Saumitra Srivastav wrote:
> I want to start working on adding a TCP layer for client to node and
> inter-node communication.
> 
> I am not up to date on recent changes happening to Solr. So before I start
> looking into code, I would like to know if there is already some work done
> in this direction, which I can reuse. Are there any known
> challenges/complexities?
> 
> I would appreciate any help to kick start this effort. Also, what would be
> the best way to discuss and get feedback on design from contributors? Open a
> JIRA??


What follows is mostly my opinion, interspersed with a few things
that might be loosely called facts:

I personally do not mind at all that Solr uses HTTP.

Some people view it as an inefficient protocol, but the overhead is low
on all but the slowest connections.  On a LAN, it is probably not even
worth discussing.  Most of the time, there will be a LAN connection
between the Solr client and the Solr server.

I think there are two primary advantages to HTTP:

*) Testing can be carried out in a browser with hand-typed URLs.
*) We don't have to worry about writing the code to handle the
underlying protocol.

The first point makes testing throughout the Solr deployment process
VERY easy.
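
As a rough illustration of the first point, the sketch below builds the kind of URL that could be typed into a browser to query Solr. The host, port, and core name (`localhost:8983`, `collection1`), along with the class and method names, are assumptions made for the example and do not come from this thread; `/select`, `q`, and `wt` are standard Solr request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class SolrUrlSketch {

    // Builds a /select URL that can be pasted into a browser address bar.
    // baseUrl and core are assumed values; adjust them for a real deployment.
    static String selectUrl(String baseUrl, String core, String query) {
        // Percent-encode the query so characters such as ':' and spaces are URL-safe.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(
            selectUrl("http://localhost:8983/solr", "collection1", "title:solr"));
    }
}
```

The same URL works from a browser, `curl`, or any HTTP client, which is the convenience a custom TCP protocol would give up.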

Expanding on the second point:  There are well-tested and mostly
bug-free HTTP client and server libraries available for us to use.  We
don't have to figure out how to write them, troubleshoot them, or
optimize them.  There are hundreds or thousands of other people using
those libraries that can find and report bugs and inefficiencies, often
including the fix in the bug report.

We might be able to make Solr a little more efficient if we use our own
TCP protocol instead of HTTP, but there are drawbacks.  It takes a lot
of experience to write a network protocol from scratch, and we are bound
to make mistakes, mistakes that will cause users a LOT of problems.  The
authors of those libraries that I mentioned have already gone through
that pain, often many times.
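
To give a sense of what "from scratch" involves, here is a minimal sketch of just one piece a custom TCP protocol needs: message framing, so the receiver knows where one message ends and the next begins. Everything in it (the class name, the 4-byte length prefix, the 1 MiB limit) is a hypothetical choice for illustration, not a proposed Solr wire format. Versioning, timeouts, authentication, compression, and error reporting would all still have to be designed on top of it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical length-prefixed framing, for illustration only.
final class FramingSketch {

    // Assumed upper bound so a corrupt length cannot trigger a huge allocation.
    static final int MAX_FRAME_BYTES = 1 << 20;

    // Writes one frame: a 4-byte big-endian length followed by the UTF-8 payload.
    static void writeFrame(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAX_FRAME_BYTES) {
            throw new IOException("frame too large: " + payload.length);
        }
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads one frame, blocking until the whole payload has arrived.
    static String readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // a plain read() may return fewer bytes than requested
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Encodes the messages into an in-memory buffer and decodes them again.
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (String message : messages) {
                writeFrame(out, message);
            }
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            List<String> decoded = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                decoded.add(readFrame(in));
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a real socket, the same `writeFrame` and `readFrame` calls would wrap the socket's output and input streams. HTTP libraries already handle this kind of bookkeeping, along with the many edge cases the sketch ignores.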

Solr must be a standalone application to realistically use a custom TCP
protocol directly.  Although there are already loose plans in place to
pull the HTTP and network layers into Solr and make it a standalone
application, we are not there yet.

If you want to work on a new protocol, feel free.  If there is anything
I can do to help in between my other duties, I will.  I do not know of
any existing work that you can re-use, although it's possible there
might be something.  You might look at the Zookeeper project for an
existing implementation of a custom network protocol in Java.

I suspect that it will take a very long time before any such work is as
stable as HttpClient and the Servlet API (implemented by Jetty).  Unless
you can demonstrate that stability, it will not become the default protocol.

Thanks,
Shawn


Re: Solr TCP layer

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/10/2015 12:13 PM, Saumitra Srivastav wrote:
> Now we want to do the same with Solr. I realize that this is going to be a
> lot of work, but if it's something that will pay off in the long run, then
> so be it. DataStax provides a Netty-based layer in their enterprise version
> which folks have reported to be faster.

Netty has been discussed as a replacement for the Servlet API, as one
pathway towards Solr becoming a standalone application.  I'm pretty sure
that the general thinking within the project is to keep using HTTP (that
is one of the protocols that Netty implements) but the hope is that it
would be more efficient than a servlet container.  There is a lot of
evidence that Netty implements network communication much more
efficiently than other libraries.

If you have the experience to do work like that, user contributions are
always welcome.

Thanks,
Shawn


Re: Solr TCP layer

Posted by Walter Underwood <wu...@wunderwood.org>.
I would strongly recommend taking a look at HTTP/2. It might not be fast enough for you, but it is fast enough for Google and there are already implementations.

http://http2.github.io/faq/
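
For readers who want to experiment, the sketch below shows one way a Java client can ask for HTTP/2. It uses the JDK's `java.net.http` client, which arrived in Java 11, well after this thread, so treat it as an illustration rather than anything Solr or SolrJ shipped at the time. The host, port, and core name are assumed values, and the class and method names are hypothetical. `/admin/ping` is Solr's standard ping handler path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper, for illustration only.
final class Http2Sketch {

    // Builds a client that prefers HTTP/2; the JDK client falls back to
    // HTTP/1.1 when the server does not support HTTP/2.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    // Builds a GET request against the ping handler of an assumed core.
    static HttpRequest pingRequest(String baseUrl, String core) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + core + "/admin/ping"))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }
}
```

Sending the request is a single call, `newClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the response's `version()` method reports which protocol version was actually negotiated. The application code stays ordinary HTTP either way.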

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 10, 2015, at 11:18 AM, Erick Erickson <er...@gmail.com> wrote:

> [...]


Re: Solr TCP layer

Posted by Erick Erickson <er...@gmail.com>.
Saumitra:

We certainly don't mean to be overly discouraging, so have at it!
There has been some talk of using Netty in the future as we pull the
war-file distribution out of the distro. Now, I have no technical clue
about its merits vs. TCP. But that's another possibility you might
want to put into your analysis.

Best,
Erick

On Tue, Mar 10, 2015 at 11:13 AM, Saumitra Srivastav
<sa...@gmail.com> wrote:
> [...]

Re: Solr TCP layer

Posted by Saumitra Srivastav <sa...@gmail.com>.
Thanks everyone for the responses.

My motivation for TCP comes from a very heavy indexing pipeline where even
the smallest optimization matters. I am working on a machine data parser
which feeds data into Cassandra and Solr, and we have SLAs based on how fast
we can make data available in both sources. We used to have issues with
Cassandra as well, but we optimized the s**t out of it.

Now we want to do the same with Solr. I realize that this is going to be a
lot of work, but if it's something that will pay off in the long run, then
so be it. DataStax provides a Netty-based layer in their enterprise version
which folks have reported to be faster. Just because a commercial vendor
ships it doesn't mean we will jump into it without thinking. We will
definitely do an effect-vs-effort analysis before committing to this.

For the majority of users, such high performance might not be a
requirement/priority, so I understand the reluctance to go down this path.

I think it would be best at this time that I start exploring this option and
get back with my analysis.

Thanks again.

Saumitra



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr TCP layer

Posted by Yago Riveiro <ya...@gmail.com>.
IMO each megabyte of memory saved has more impact than 0.001 less latency … an OOM is a killer; a lag of 2 seconds … is not catastrophic.

—
/Yago Riveiro

On Tue, Mar 10, 2015 at 4:03 PM, Erick Erickson <er...@gmail.com>
wrote:

> [...]

Re: Solr TCP layer

Posted by Erick Erickson <er...@gmail.com>.
Just to pile on:

I admire your bravery! I'll add to the other comments only by saying
that _before_ you start down this path, you really need to articulate
the benefit/cost analysis. "to gain a little more communications
efficiency" will be a pretty hard sell due to the reasons Shawn
outlined. This is hugely risky and would require a lot of work for
as-yet-unarticulated benefits.

There are lots and lots of other things to work on of significantly
greater impact IMO. How would you like to work on something to help
manage Solr's memory usage for instance ;)?

Best,
Erick

On Mon, Mar 9, 2015 at 9:24 AM, Reitzel, Charles
<Ch...@tiaa-cref.org> wrote:
> [...]

RE: Solr TCP layer

Posted by "Reitzel, Charles" <Ch...@tiaa-cref.org>.
A couple thoughts:
0. Interesting topic.
1. But perhaps better suited to the dev list.
2. Given the existing architecture, shouldn't we be looking to transport projects, e.g. Jetty, Apache HttpComponents, for support of new socket or even HTTP layer protocols?
3. To the extent such support exists, integration work is still needed at the Solr level.  Shalin, is this your intention?

Also, for those of us not tracking protocol standards in detail, can you describe the benefits of HTTP/2 to Solr users?

Do you expect HTTP/2 to be transparent at the application layer?

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com] 
Sent: Monday, March 09, 2015 6:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr TCP layer

[...]

Re: Solr TCP layer

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi Saumitra,

I've been thinking of adding HTTP/2 support to Solr, initially for inter-node
communication and then for client-server communication. There's a patch for
SPDY support, but now that SPDY is deprecated and HTTP/2 is the new standard,
we need to wait for Jetty 9.3 to be released. That will take care of many
bottlenecks in SolrCloud communication. The current trunk is already using
Jetty 9.2.x, which has support for the draft HTTP/2 spec.

A brand new async TCP layer based on Netty can be considered, but that's a
huge amount of work considering our need to still support simple HTTP, SSL,
etc. Frankly, for me that effort is better spent optimizing the routing
layer.
On 09-Mar-2015 1:37 am, "Saumitra Srivastav" <sa...@gmail.com>
wrote:

> [...]