You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by GitBox <gi...@apache.org> on 2020/12/10 22:46:43 UTC

[GitHub] [thrift] fishy opened a new pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

fishy opened a new pull request #2292:
URL: https://github.com/apache/thrift/pull/2292


   Client: go
   
   Currently we have a guard for go's TBinaryProtocol.ReadString, abstract
   it to also apply it to TBinaryProtocol.ReadBinary,
   TCompactProtocol.ReadString, and TCompactProtocol.ReadBinary as well.
   
   Also the current implementation for TBinaryProtocol.ReadString will only
   read up to twice of the read limit, remove that restriction. So if we
   have a malformed message with insanely large string length but
   moderately large size of bytes left in the message, we will only
   allocate up to the max of number of: bytes left + single read limit.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] fishy commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
fishy commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540634261



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).

Review comment:
       No. With `io.CopyN` + `bytes.Buffer` it stops when there's no more data to read. See this line in the benchmark result I attached to the commit message:
   
   ```
   BenchmarkSafeReadBytes/max-size-12                462532              2494 ns/op           14464 B/op          7 allocs/op
   ```
   
   The size in that call is max int32, but it only allocated 14KB of memory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] yurishkuro commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
yurishkuro commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540574786



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -482,35 +480,37 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 const readLimit = 32768
 
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
+	buf, err := safeReadBytes(size, p.trans, p.buffer[:], readLimit)
+	return buf.String(), NewTProtocolException(err)
+}
+
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+func safeReadBytes(size int32, trans io.Reader, buffer []byte, singleReadLimit int) (*bytes.Buffer, error) {

Review comment:
       I would rather return just the slice `[]byte`, which would be heap-allocated anyway, and keep the `bytes.Buffer` hidden and allocated on the stack, as it was previously.
   
   Also the semantics of this function is a bit confusing - what is the role of `buffer` argument?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] fishy commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
fishy commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540585807



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -482,35 +480,37 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 const readLimit = 32768
 
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
+	buf, err := safeReadBytes(size, p.trans, p.buffer[:], readLimit)
+	return buf.String(), NewTProtocolException(err)
+}
+
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+func safeReadBytes(size int32, trans io.Reader, buffer []byte, singleReadLimit int) (*bytes.Buffer, error) {

Review comment:
       Good point. Actually we don't need to specify the limit at all. Combining bytes.Buffer with io.CopyN is enough.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] fishy commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
fishy commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540629906



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).
+func safeReadBytes(size int32, trans io.Reader) ([]byte, error) {
+	if size < 0 {
+		return nil, nil
 	}
 
-	for size > 0 {
-		_, e = io.ReadFull(p.trans, b)
-		buf.Write(b)
-		if e != nil {
-			break
-		}
-		size -= readLimit
-		if size < readLimit && size > 0 {
-			b = b[:size]
-		}
-	}
-	return buf.String(), NewTProtocolException(e)
+	buf := new(bytes.Buffer)

Review comment:
       What you suggested IS the large allocation when size is insanely large.

##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -432,6 +432,15 @@ func (p *TBinaryProtocol) ReadString(ctx context.Context) (value string, err err
 		err = invalidDataLength
 		return
 	}
+	if size == 0 {
+		return "", nil
+	}
+	if size < int32(len(p.buffer)) {
+		// Avoid allocation on small reads
+		buf := p.buffer[:size]
+		read, e := io.ReadFull(p.trans, buf)
+		return string(buf[:read]), NewTProtocolException(e)
+	}

Review comment:
       This avoids the extra allocation (we reused `p.buffer`)
   
   `readStringBody` is a function attached to type `TBinaryProtocol`, so it cannot be used by `TCompactProtocol`. we cannot move it to top level function either as it's also used by other parts of `TBinaryProtocol`. I mean we can do that refactor, but consider it's just 4 lines it's not really worth it.

##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).

Review comment:
       It's in the details of the implementation of `io.CopyN` and `bytes.Buffer`. what they did is similar to the previous version, that they will do the copy chunk by chunk (and grow the buffer when necessary), instead of allocate the whole size up front.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] yurishkuro commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
yurishkuro commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540626101



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -432,6 +432,15 @@ func (p *TBinaryProtocol) ReadString(ctx context.Context) (value string, err err
 		err = invalidDataLength
 		return
 	}
+	if size == 0 {
+		return "", nil
+	}
+	if size < int32(len(p.buffer)) {
+		// Avoid allocation on small reads
+		buf := p.buffer[:size]
+		read, e := io.ReadFull(p.trans, buf)
+		return string(buf[:read]), NewTProtocolException(e)
+	}

Review comment:
       why not move this logic inside `readStringBody` and use that from both protocols?

##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).
+func safeReadBytes(size int32, trans io.Reader) ([]byte, error) {
+	if size < 0 {
+		return nil, nil
 	}
 
-	for size > 0 {
-		_, e = io.ReadFull(p.trans, b)
-		buf.Write(b)
-		if e != nil {
-			break
-		}
-		size -= readLimit
-		if size < readLimit && size > 0 {
-			b = b[:size]
-		}
-	}
-	return buf.String(), NewTProtocolException(e)
+	buf := new(bytes.Buffer)

Review comment:
       This is still two allocations, one for Buffer, another for Buffer.buf slice. Why not do just one?
   ```go
   buf := make([]byte, size)
   _, err := io.ReadFull(trans, buf)
   return buf, err
   ```

##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).

Review comment:
       How does it actually prevent large allocations? Previously there was readLimit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] yurishkuro commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
yurishkuro commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540638795



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).

Review comment:
       Got it. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] yurishkuro commented on a change in pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
yurishkuro commented on a change in pull request #2292:
URL: https://github.com/apache/thrift/pull/2292#discussion_r540633143



##########
File path: lib/go/thrift/binary_protocol.go
##########
@@ -479,38 +486,21 @@ func (p *TBinaryProtocol) readAll(ctx context.Context, buf []byte) (err error) {
 	return NewTProtocolException(err)
 }
 
-const readLimit = 32768
-
 func (p *TBinaryProtocol) readStringBody(size int32) (value string, err error) {
-	if size < 0 {
-		return "", nil
-	}
-
-	var (
-		buf bytes.Buffer
-		e   error
-		b   []byte
-	)
+	buf, err := safeReadBytes(size, p.trans)
+	return string(buf), NewTProtocolException(err)
+}
 
-	switch {
-	case int(size) <= len(p.buffer):
-		b = p.buffer[:size] // avoids allocation for small reads
-	case int(size) < readLimit:
-		b = make([]byte, size)
-	default:
-		b = make([]byte, readLimit)
+// This function is shared between TBinaryProtocol and TCompactProtocol.
+//
+// It tries to read size bytes from trans, in a way that prevents large
+// allocations when size is insanely large (mostly caused by malformed message).

Review comment:
       if `size` is large you end up with a huge allocated array no matter _how you copy data into it_. The main issue in the ticket is that the value of `size` could be corrupted and be greater than the total message size.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [thrift] fishy merged pull request #2292: THRIFT-5322: Guard against large string/binary lengths in Go

Posted by GitBox <gi...@apache.org>.
fishy merged pull request #2292:
URL: https://github.com/apache/thrift/pull/2292


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org