Posted to commits@pulsar.apache.org by "0x4500 (via GitHub)" <gi...@apache.org> on 2024/01/29 15:57:08 UTC

[I] SIGSEGV in 0.12.0 with zstd compression enabled and batching disabled [pulsar-client-go]

0x4500 opened a new issue, #1163:
URL: https://github.com/apache/pulsar-client-go/issues/1163

   #### Expected behavior
   
   In v0.11.0, sending unbatched messages with zstd compression enabled works. In v0.12.0, this appears to cause a segfault.
   
   #### Actual behavior
   
   Segfaults of the following form are observed:
   
   ```
   SIGSEGV: segmentation violation
   PC=0xd41627 m=14 sigcode=1
   signal arrived during cgo execution
   
   goroutine 62 [syscall]:
   runtime.cgocall(0xcf0dc0, 0xc0001ed318)
   	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0001ed2f0 sp=0xc0001ed2b8 pc=0x409d6b
   github.com/DataDog/zstd._Cfunc_ZSTD_compressCCtx(0x7f8b73d5e010, 0xc000fb3140, 0xbc, 0xc00103ad80, 0x7d, 0x9)
   	_cgo_gotypes.go:223 +0x4c fp=0xc0001ed318 sp=0xc0001ed2f0 pc=0x82e48c
   github.com/DataDog/zstd.(*ctx).CompressLevel.func2(0x18892e8?, 0xc0001ed3e0, 0xc0001ed3f8, 0x9)
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/!data!dog/zstd@v1.5.5/zstd_ctx.go:84 +0x127 fp=0xc0001ed388 sp=0xc0001ed318 pc=0x82f447
   github.com/DataDog/zstd.(*ctx).CompressLevel(0x40fc5a?, {0xc000fb3140, 0xbc, 0xbc}, {0xc00103ad80, 0x7d, 0x7d}, 0x0?)
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/!data!dog/zstd@v1.5.5/zstd_ctx.go:84 +0xd9 fp=0xc0001ed3d8 sp=0xc0001ed388 pc=0x82f219
   github.com/apache/pulsar-client-go/pulsar/internal/compression.(*zstdCGoProvider).Compress(0x0?, {0x0?, 0x413005?, 0x8?}, {0xc00103ad80?, 0x101?, 0xc0001ed4e8?})
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/internal/compression/zstd_cgo.go:64 +0x33 fp=0xc0001ed450 sp=0xc0001ed3d8 pc=0x83d1b3
   github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).updateChunkInfo(0xc0001a1680, 0xc000161b80)
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1155 +0x71 fp=0xc0001ed4f8 sp=0xc0001ed450 pc=0xb64ff1
   github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).internalSendAsync(0xc0001a1680, {0x117c200, 0x188dc80}, 0xc0002285b0, 0xc001015930, 0x0)
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1251 +0x517 fp=0xc0001ed718 sp=0xc0001ed4f8 pc=0xb65ad7
   github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).SendAsync(0xc00026cc80?, {0x117c200?, 0x188dc80?}, 0x413005?, 0xd0?)
   	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1024 +0x25 fp=0xc0001ed758 sp=0xc0001ed718 pc=0xb640a5
   github.com/apache/pulsar-client-go/pulsar.(*producer).SendAsync(0xe71300?, {0x117c200, 0x188dc80}, 0x20?, 0x0?)
   [...]
   ```
   
   #### Steps to reproduce
   
   I don't have test case code, but the configuration for the crashing producer is
   
   	pulsar.ProducerOptions{
   		Topic:              <topic>,
   		Name:               <instanceID>,
   		CompressionType:    pulsar.ZSTD,
   		CompressionLevel:   pulsar.Better,
   		DisableBatching:    true,
   		SendTimeout:        600*time.Second,
   		MaxPendingMessages: 5000,
   	}
   
   I have other producers in the system which have batching enabled and zstd compression also enabled. These are not crashing in 0.12.0.
   
   #### System configuration
   
   **Pulsar version**: 3.1.2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] SIGSEGV in 0.12.0 with batching disabled and zstd compression enabled [pulsar-client-go]

Posted by "RobertIndie (via GitHub)" <gi...@apache.org>.
RobertIndie commented on issue #1163:
URL: https://github.com/apache/pulsar-client-go/issues/1163#issuecomment-1916481731

   Could you provide code that reproduces this? I couldn't reproduce it. Could you also describe the OS environment in which you are running the Go client?
   
   > I have other producers in the system which have batching enabled and zstd compression also enabled. These are not crashing in 0.12.0.
   
   Do you mean that only this producer would crash?




Re: [I] SIGSEGV in 0.12.0 with zstd compression enabled [pulsar-client-go]

Posted by "0x4500 (via GitHub)" <gi...@apache.org>.
0x4500 commented on issue #1163:
URL: https://github.com/apache/pulsar-client-go/issues/1163#issuecomment-1916907487

   In the zstd code, I see this comment (in `zstd_ctx.go`):
   
       //  Note 2 : In multi-threaded environments,
       //         use one different context per thread for parallel execution.
   
   Our calling code uses multiple goroutines and will be multi-threaded at the OS level.
   
   In v0.11 of the pulsar-client-go library, compression always occurred in the function `internalSend()`, which is only ever called from `runEventsLoop()` and therefore runs in a single goroutine, so calls into the zstd context never overlap.
   
   In v0.12, compression occurs in the `internalSendAsync()` function, which runs in whatever goroutine the calling code is executing in. I think this means that the library is no longer fulfilling the requirement to use a unique zstd context per thread.
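
The "one context per thread" requirement quoted above can be illustrated with a minimal sketch. It is not the library's code and not necessarily how the issue was resolved. It assumes a hypothetical `scratchCompressor` type, built on the standard library's `compress/flate`, as a stand-in for a cgo-backed zstd context that reuses internal state. The hypothetical `pooledProvider` then draws a separate context from a `sync.Pool` for every call, so concurrent callers never share one.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// scratchCompressor is a hypothetical stand-in for a zstd context: it reuses
// an internal writer and buffer, so it is NOT safe for concurrent use.
type scratchCompressor struct {
	buf bytes.Buffer
	w   *flate.Writer
}

func newScratchCompressor() *scratchCompressor {
	c := &scratchCompressor{}
	// flate.NewWriter only fails for an invalid level; BestSpeed is valid.
	c.w, _ = flate.NewWriter(&c.buf, flate.BestSpeed)
	return c
}

// compress reuses the internal state and must not run concurrently
// on the same scratchCompressor.
func (c *scratchCompressor) compress(src []byte) ([]byte, error) {
	c.buf.Reset()
	c.w.Reset(&c.buf)
	if _, err := c.w.Write(src); err != nil {
		return nil, err
	}
	if err := c.w.Close(); err != nil {
		return nil, err
	}
	// Copy the result out, because buf is overwritten by the next call.
	return append([]byte(nil), c.buf.Bytes()...), nil
}

// pooledProvider (hypothetical) gives each call exclusive use of one context.
type pooledProvider struct {
	pool sync.Pool
}

func newPooledProvider() *pooledProvider {
	return &pooledProvider{pool: sync.Pool{
		New: func() interface{} { return newScratchCompressor() },
	}}
}

// Compress is safe to call from many goroutines at once.
func (p *pooledProvider) Compress(src []byte) ([]byte, error) {
	c := p.pool.Get().(*scratchCompressor)
	defer p.pool.Put(c)
	return c.compress(src)
}

// decompress inflates data produced by Compress; used to check round trips.
func decompress(data []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(data))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	p := newPooledProvider()
	var wg sync.WaitGroup
	errs := make(chan error, 8)
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			payload := bytes.Repeat([]byte{byte('a' + i)}, 256)
			out, err := p.Compress(payload)
			if err != nil {
				errs <- err
				return
			}
			back, err := decompress(out)
			if err != nil {
				errs <- err
				return
			}
			if !bytes.Equal(back, payload) {
				errs <- fmt.Errorf("goroutine %d: round trip mismatch", i)
			}
		}(i)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		fmt.Println("error:", err)
	}
	fmt.Println("done")
}
```

A mutex around a single shared context would also serialize access. The pool is shown because it avoids making every sender wait on one lock, at the cost of holding several contexts in memory.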




Re: [I] SIGSEGV in 0.12.0 with batching disabled and zstd compression enabled [pulsar-client-go]

Posted by "0x4500 (via GitHub)" <gi...@apache.org>.
0x4500 commented on issue #1163:
URL: https://github.com/apache/pulsar-client-go/issues/1163#issuecomment-1916859493

   I can reproduce this issue when running our code from the command line, using Ubuntu 23.10, so it does not seem like alpine/musl is the issue.
   
   It still isn't at all clear why this is happening. The only extra information I have is that sometimes, this message will be generated instead of a segfault:
   
       FATA[0001] Failed to compress                            error="Src size is incorrect"
   
   I tried changing the code in `updateChunkInfo()` to:
   
       payloadCopy := make([]byte, len(sr.uncompressedPayload))
       copy(payloadCopy, sr.uncompressedPayload)
       sr.compressedPayload = p.compressionProvider.Compress(nil, payloadCopy)
   
   This does not change the outcome at all. I also tried a similar copy in our code (just in case we were re-using something somewhere), and that did not help.




Re: [I] SIGSEGV in 0.12.0 with zstd compression enabled, when producer is shared between multiple goroutines [pulsar-client-go]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat closed issue #1163: SIGSEGV in 0.12.0 with zstd compression enabled, when producer is shared between multiple goroutines
URL: https://github.com/apache/pulsar-client-go/issues/1163




Re: [I] SIGSEGV in 0.12.0 with batching disabled and zstd compression enabled [pulsar-client-go]

Posted by "0x4500 (via GitHub)" <gi...@apache.org>.
0x4500 commented on issue #1163:
URL: https://github.com/apache/pulsar-client-go/issues/1163#issuecomment-1916728040

   Unfortunately I am not able to provide a minimal test case. This is a production system, so we rolled back to 0.11.0. If this means that there isn't enough information in this report to isolate the issue then I totally understand.
   
   Yes, only the producer with this particular configuration would crash. There are several other producers in the overall system which were not crashing when upgraded to 0.12.0. Those other producers all have batching enabled (and zstd compression also enabled, at the `Better` level).
   
   This code is running in a container based on the `alpine:3.19.1` OS image. Alpine uses the musl library instead of glibc, which could be relevant.
   
   Looking through the code myself, I cannot see how this crash could occur (but it does). The line which leads to the segfault is
   
       sr.compressedPayload = p.compressionProvider.Compress(nil, sr.uncompressedPayload)
   
   That code appears just to call the zstd compression function with `sr.uncompressedPayload`. Unless there is some way that `sr.uncompressedPayload` could be modified by another goroutine, or we are doing something strange in our code and reusing the payload before the call is complete, I don't understand how this could cause a crash.
   
   We are producing the payload simply as
   
       body, err := proto.Marshal(task)
       if err != nil {
       	return err
       }
   
   and then later calling
   
       pulsarProducer.SendAsync(prod.ctx, &pulsar.ProducerMessage{Payload: body}, callback)
   
   

