You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by sschepens <gi...@git.apache.org> on 2017/06/23 13:45:51 UTC

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

GitHub user sschepens opened an issue:

    https://github.com/apache/incubator-pulsar/issues/523

    Max Payload size is validated before compression

    Max Message Payload size is validated by `MessageBuilderImpl` and that happens before producer gets to compress payload.
    This should be validated after compression to allow big payload that are highly compressible (json) to be sent.


----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by merlimat <gi...@git.apache.org>.
Github user merlimat commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    My only concern is that it might be tricky to know upfront whether the message would be accepted or rejected (If you send msg >5MB). 
    
    One other thing we could do is to make the Max to be configurable, since the 5MB was very arbitrary anyway.
    
    If you can configure it on broker and client, you can raise and be 100% sure it gets accepted, even though compression didn't reduce the size for a particular message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by rdhabalia <gi...@git.apache.org>.
Github user rdhabalia commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    yes, right now, client considers uncompressed message size while sending message to broker and I don't see any clear reason why can't we consider compressed-message size instead, because broker stores compressed data only to to BK which will be < 5MB only. So, I think we can add validation after compressing the payload.
    @merlimat do you see any issue for considering compressed size while sending msg to broker.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by rdhabalia <gi...@git.apache.org>.
Github user rdhabalia commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    > The configurable max message size could be a separate from this
    
    I will create a separate issue to provide a way to configure this size and will create a separate PR for both of them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by joefk <gi...@git.apache.org>.
Github user joefk commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    If there needs to be a hard limit (for 100% assurance), the application can enforce that itself without any changes in the system (Pulsar).
    
    My perspective is from what I think as the origin of this ask:   If there is a system(Pulsar)  limit, and if the system can honor a "violating" request by optimizing the request, it should.  Agreed that sometimes the system will not always succeed in this, but it should try.  That could reduce user impact by reducing the number of violations, and some use-cases might prefer that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by merlimat <gi...@git.apache.org>.
Github user merlimat commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    > If there is a system(Pulsar) limit, and if the system can honor a "violating" request by optimizing the request, it should. 
    
    Ok, I agree with that. The configurable max message size could be a separate from this, though something that on its own we should do at some point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by sschepens <gi...@git.apache.org>.
Github user sschepens commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    Isn't Bookkeeper max entry size 5mb?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by merlimat <gi...@git.apache.org>.
Github user merlimat commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    I am not saying we cant, just trying to scope all the implications :)
    
    If you check the size after compression you're not sure whether a particular message will go through or not.
    
    Example:
    I have a 6MB JSON document. Most probably it will compress to 600KB, but I cannot tell 100%. If the JSON is weird enough it might be 5.5MB. 
    
    So if I send this 6MB message it might get rejected by the client.
    
    Typically the safe-size buffer for compression is _bigger_ that the uncompressed content, to account for incompressible content plus the compressor overhead.
    
    In my view, compressing is an optimization to safe network/IO/disk space, but the max size should be enforced irrespectively of that. So if you have 6MB or 10MB entries, you should be able to configure the system to handle that, with 100% assurance that it will processed correctly, instead of relying on the assumption that compression will solve it, which might be true in 99.99% of the cases but fail at unexpected times.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by Matteo Merli <ma...@gmail.com>.
> I am curious about SLA vs. message size. If Pulsar has a 5ms SLA is that
going to be feasible if there are very large messages?

Of course it's not feasible :)

The numbers published were related to 1KB entries. Of course there can be a
huge number of different use cases with different trades off. Typically
most users don't care about latency of bigger messages, so we never really
tested it, but at least at lower transfer rate, the latency shouldn't be
too bad even for 1MB messages.



--
Matteo Merli
<ma...@gmail.com>

On Mon, Jun 26, 2017 at 1:42 PM, Sahaya Andrews <sa...@gmail.com>
wrote:

> As the message size grows, we won't be able to guarantee 5ms latency
> since the underlying storage itself will take longer to sync the data.
>
> Andrews.
>
> On Mon, Jun 26, 2017 at 1:31 PM, Dave Fisher <da...@comcast.net>
> wrote:
> > Hi -
> >
> > If you are negotiating the buffer size then you should also negotiate if
> this is a compressed or uncompressed size.
> >
> > I am curious about SLA vs. message size. If Pulsar has a 5ms SLA is that
> going to be feasible if there are very large messages?
> >
> > Regards,
> > Dave
> >
> >> On Jun 26, 2017, at 1:26 PM, merlimat <gi...@git.apache.org> wrote:
> >>
> >> Github user merlimat commented on the issue:
> >>
> >>    https://github.com/apache/incubator-pulsar/issues/523
> >>
> >>> If it's configurable then client may start sending msg with
> unreasonable size which will be rejected at broker due to
> frame-size/storage-limitation, So, client will not know up-front.
> >>
> >>    Max-frame size could be negotiated between client and broker in
> Connect/Connected handshake
> >>
> >>
> >> ---
> >> If your project is set up for it, you can reply to this email and have
> your
> >> reply appear on GitHub as well. If your project does not have this
> feature
> >> enabled and wishes so, or if the feature is enabled but not working,
> please
> >> contact infrastructure at infrastructure@apache.org or file a JIRA
> ticket
> >> with INFRA.
> >> ---
> >
>

Re: [GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by Sahaya Andrews <sa...@gmail.com>.
As the message size grows, we won't be able to guarantee 5ms latency
since the underlying storage itself will take longer to sync the data.

Andrews.

On Mon, Jun 26, 2017 at 1:31 PM, Dave Fisher <da...@comcast.net> wrote:
> Hi -
>
> If you are negotiating the buffer size then you should also negotiate if this is a compressed or uncompressed size.
>
> I am curious about SLA vs. message size. If Pulsar has a 5ms SLA is that going to be feasible if there are very large messages?
>
> Regards,
> Dave
>
>> On Jun 26, 2017, at 1:26 PM, merlimat <gi...@git.apache.org> wrote:
>>
>> Github user merlimat commented on the issue:
>>
>>    https://github.com/apache/incubator-pulsar/issues/523
>>
>>> If it's configurable then client may start sending msg with unreasonable size which will be rejected at broker due to frame-size/storage-limitation, So, client will not know up-front.
>>
>>    Max-frame size could be negotiated between client and broker in Connect/Connected handshake
>>
>>
>> ---
>> If your project is set up for it, you can reply to this email and have your
>> reply appear on GitHub as well. If your project does not have this feature
>> enabled and wishes so, or if the feature is enabled but not working, please
>> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>> with INFRA.
>> ---
>

Re: [GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by Dave Fisher <da...@comcast.net>.
Hi -

If you are negotiating the buffer size then you should also negotiate if this is a compressed or uncompressed size.

I am curious about SLA vs. message size. If Pulsar has a 5ms SLA is that going to be feasible if there are very large messages?

Regards,
Dave

> On Jun 26, 2017, at 1:26 PM, merlimat <gi...@git.apache.org> wrote:
> 
> Github user merlimat commented on the issue:
> 
>    https://github.com/apache/incubator-pulsar/issues/523
> 
>> If it's configurable then client may start sending msg with unreasonable size which will be rejected at broker due to frame-size/storage-limitation, So, client will not know up-front.
> 
>    Max-frame size could be negotiated between client and broker in Connect/Connected handshake
> 
> 
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---


[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by merlimat <gi...@git.apache.org>.
Github user merlimat commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    > If it's configurable then client may start sending msg with unreasonable size which will be rejected at broker due to frame-size/storage-limitation, So, client will not know up-front.
    
    Max-frame size could be negotiated between client and broker in Connect/Connected handshake


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by rdhabalia <gi...@git.apache.org>.
Github user rdhabalia commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    > One other thing we could do is to make the Max to be configurable, since the 5MB was very arbitrary anyway.
    
    If it's configurable then client may start sending msg with unreasonable size which will be rejected at broker due to  [frame-size/storage-limitation](https://github.com/apache/incubator-pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/api/PulsarDecoder.java#L60), So, client will not know up-front.
    
    > If you can configure it on broker and client, you can raise and be 100% sure it gets accepted
    
    If it's configurable at client then value might be different than what broker is allowing and it requires some type of sync when client connects to broker.? Also, it may not address payload which can be highly compressible (eg.xml,json)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pulsar issue #523: Max Payload size is validated before compressio...

Posted by joefk <gi...@git.apache.org>.
Github user joefk commented on the issue:

    https://github.com/apache/incubator-pulsar/issues/523
  
    @merlimat Why can't we check size post-compression? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---