
Posted to jira@kafka.apache.org by "rvansa (via GitHub)" <gi...@apache.org> on 2023/04/20 06:28:35 UTC

[GitHub] [kafka] rvansa opened a new pull request, #13619: Initial support for OpenJDK CRaC snapshotting

rvansa opened a new pull request, #13619:
URL: https://github.com/apache/kafka/pull/13619

   This change intends to let an application using a Vertx GRPC server perform checkpoint and restore on a JVM that implements it, specifically [OpenJDK CRaC](https://github.com/openjdk/crac/tree/crac) or future versions of OpenJDK. The `org.crac` package is a facade that either forwards each invocation to the actual implementation or provides a no-op implementation.
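
To illustrate the facade idea described above, here is a minimal, self-contained sketch. All types in it (`CheckpointResource`, `CheckpointContext`, `NoOpContext`, `SimulatedContext`, `TrackedConnection`) are hypothetical stand-ins defined only for this example; they are not the real `org.crac` classes, and the real facade's signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface: a component that must react to checkpoint/restore.
interface CheckpointResource {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical facade: components register here regardless of the JVM they run on.
interface CheckpointContext {
    void register(CheckpointResource resource);
}

// Fallback for a JVM without checkpoint support: registration is accepted,
// but the callbacks are never invoked.
final class NoOpContext implements CheckpointContext {
    @Override
    public void register(CheckpointResource resource) {
        // intentionally empty
    }
}

// Stand-in for a checkpoint-capable JVM, used only to drive the callbacks in this sketch.
final class SimulatedContext implements CheckpointContext {
    private final List<CheckpointResource> resources = new ArrayList<>();

    @Override
    public void register(CheckpointResource resource) {
        resources.add(resource);
    }

    void simulateCheckpointAndRestore() {
        for (CheckpointResource r : resources) {
            r.beforeCheckpoint();
        }
        // A real implementation would snapshot the process here.
        for (CheckpointResource r : resources) {
            r.afterRestore();
        }
    }
}

// Example component: closes its connection before the checkpoint and reopens it after restore.
final class TrackedConnection implements CheckpointResource {
    private boolean open = true;
    private int reconnects = 0;

    @Override
    public void beforeCheckpoint() {
        open = false; // stands in for closing a socket
    }

    @Override
    public void afterRestore() {
        open = true; // stands in for re-establishing the connection
        reconnects++;
    }

    boolean isOpen() {
        return open;
    }

    int reconnects() {
        return reconnects;
    }
}
```

With `NoOpContext` the component behaves exactly as it would without any checkpoint support, which is the property the facade is meant to provide on an ordinary JVM.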
   
   A test of the actual behaviour is not provided for now: without running on a CRaC-enabled JVM nothing invokes the `Resource` methods, and making this part of the test suite would be complicated (probably requiring a containerized test). If needed, I could try to put together a test that does not involve an actual checkpoint but verifies that the code does not deadlock and that connections are eventually re-created.
   
   It is not entirely clear to me how public the `KafkaClient` and `Selectable` interfaces are and in which version this could land, or whether I should change only the implementation classes.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

RE: [GitHub] [kafka] github-actions[bot] commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

-----Original Message-----
From: github-actions[bot] (via GitHub) [mailto:git@apache.org] 
Sent: 21 July 2023 09:03
To: jira@kafka.apache.org
Subject: [GitHub] [kafka] github-actions[bot] commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting


github-actions[bot] commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1644936041

   This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.



[GitHub] [kafka] rvansa commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "rvansa (via GitHub)" <gi...@apache.org>.

rvansa commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1516168111

   Hi @Hangleton, in this PR I am addressing a client, not the broker. Since I am not that familiar with the whole project, I am following a whack-a-mole strategy: I am trying to demo checkpoint/restore (C/R) of the [Quarkus Super-Heroes](https://quarkus.io/quarkus-workshops/super-heroes/) example application, which uses Kafka Clients to report some data to another component.
   
   I can imagine that for the broker you don't see the need for frequent scaling, as it would be counter-productive. For applications that don't hold (Kafka) state, a startup time of seconds is quite long, especially for serverless architectures and similar. Quarkus in native mode strives for sub-second startup; with CRaC we can get to tens of milliseconds.



[GitHub] [kafka] Hangleton commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "Hangleton (via GitHub)" <gi...@apache.org>.

Hangleton commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1517679958

   Hi, Radim,
   
   Thank you for the follow-up and the clarification. I missed the fact that the targeted components are the Kafka clients.
   
   Some of the previous statements regarding state may still be valid. Typically, a Kafka client holds cluster and topic metadata, and one of its first operations on start-up, once a connection with a bootstrap broker is established, is to fetch this metadata to get an up-to-date view of the cluster (e.g. broker membership).
   
   But, I lack the background to fully understand this approach.



[GitHub] [kafka] github-actions[bot] commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1644936041

   This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.



[GitHub] [kafka] divijvaidya commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "divijvaidya (via GitHub)" <gi...@apache.org>.

divijvaidya commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1516093525

Hi @rvansa
Thank you for your first contribution to Apache Kafka! Thanks to you, today I learnt something new (CRaC).

For an effective discussion on this topic, could you start a discussion thread with the community at dev@kafka.apache.org (preferably accompanied by a KIP)? I am asking for this because we need to dig more into the problem that this solves for us, consider alternative solutions, and weigh the tradeoffs of using a non-GA feature of OpenJDK (I understand that you have tried to address this with the facade package). You can find the process for creating a KIP at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

My initial thoughts (based on limited understanding, please correct me if I am wrong):
- What is the downside of adding checkpoint & restore to the producer threads? What expense does it add?
- It looks like we are trying to save resources by suspending the threads (on the producer) that are not actively doing anything and restoring them when they are needed. Is that right?

Let's talk more on the mailing list.

> It is not entirely clear to me what level of API publicity are the KafkaClient and Selectable interface

In general, you can find all public APIs for Kafka at https://kafka.apache.org/34/javadoc/allclasses-index.html


Re: [PR] Initial support for OpenJDK CRaC snapshotting [kafka]

Posted by "tzvetkovg (via GitHub)" <gi...@apache.org>.

tzvetkovg commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1913574903

   Hi, any development on this? We've been using Spring Kafka reactive streams and recently started testing the Azul CRaC JDK. I wonder whether there is any way to gracefully pause the Kafka stream before checkpointing and then re-enable it afterwards? Thanks.



[GitHub] [kafka] Hangleton commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "Hangleton (via GitHub)" <gi...@apache.org>.

Hangleton commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1516151278

Hi, Radim,

Thanks for the PR. This would definitely need a KIP.

I am quite curious about the use cases driving this. Some problems I see with the approach:

- Apache Kafka brokers hold an extensive amount of state and cached metadata at any given time. This state is often not applicable upon application restart. For instance, after restoring the previous state of the application, replica states, leader epochs and consumer ids would be stale. This list is far from exhaustive.
- Active connections are used by the controller and quorum to determine the liveness of a broker. Once these connections are closed, the broker is reported as dead, and any new process of the broker comes with a new identity, either in the form of a Zookeeper session or a broker incarnation id. This conflicts with the re-use of the state held by a defunct broker process.
- A critical part of Apache Kafka's startup process is to verify the integrity of the data stored on the brokers. Based on the application checkpoints read upon start, Kafka can perform index and state reconstruction by scanning replica logs. Without this, we would lose the integrity guarantees that the recovery path currently provides at startup.

Unless the broker needs to take the recovery actions stated above, start-up of a broker is usually fairly fast, a matter of seconds in common use cases. What pushes the need for the feature involved in this PR?


Re: [PR] Initial support for OpenJDK CRaC snapshotting [kafka]

Posted by "rvansa (via GitHub)" <gi...@apache.org>.

rvansa commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1914437381

   Hi @tzvetkovg, there has not been much reaction from the community so far, so basically we need your feedback. You can use the artifacts at https://mvnrepository.com/artifact/io.github.crac.org.apache.kafka/kafka-clients/3.3.1.CRAC.0 as a kind of preview until Kafka decides to integrate things. This should be working for the sender; TBH I haven't integrated any hooks for the event receiver. Your feedback is welcome.



[GitHub] [kafka] rvansa commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "rvansa (via GitHub)" <gi...@apache.org>.

rvansa commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1517992031

   @Hangleton You're right, this process should be repeated; for example in the JDK itself we flush DNS caches before checkpoint. I was hoping that the code in https://github.com/apache/kafka/pull/13619/files#diff-dcc1af531d191de8da1e23ad6d878a3efc463ba4670dbcf2896295a9dacd1c18R658 would reload the cluster view; is that not the case?



[GitHub] [kafka] rvansa commented on pull request #13619: Initial support for OpenJDK CRaC snapshotting

Posted by "rvansa (via GitHub)" <gi...@apache.org>.

rvansa commented on PR #13619:
URL: https://github.com/apache/kafka/pull/13619#issuecomment-1516133197

   Sure, thanks for the pointers! I'll go through the docs and compose a proposal on the mailing list. If you don't mind I'll keep this PR open in the meantime.
   
   > What is the downside of adding checkpoint & restore to the producer threads? What expense does it add?
   
   My take on this is that unless the checkpoint itself is performed, there should be no performance overhead (or only a very minimal one). In this PR the sender performs a volatile read in the loop, which is cheap (unless contended with frequent writes). Also, components that need to handle the checkpoint process usually need a little bit of memory for tracking, but applications usually have only one or a few instances of each component. On the other hand, the cost of the checkpoint itself can be significant, since it happens in a controlled manner, sometimes even outside the production environment.
   
   > Looks like we are trying to save resource by suspending the threads (on the producer) that are not actively doing anything and restoring them when we they are needed? Is that right?
   
   The sender thread is paused, not to save resources but only to achieve correctness. Before performing the checkpoint we need to close all network connections, and we don't want to re-create them unexpectedly before the restore. From my understanding of the code, the affected components are used exclusively by the Sender thread (which processes the request queues), so the most natural and performant option was to block it entirely rather than trying to synchronize using locks (which would bring non-trivial overhead even without a checkpoint).
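
The pausing scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `PausableSender` and `SenderDemo` are hypothetical names, the `connected` flag stands in for real network connections, and the counter stands in for draining the request queue.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sender: one volatile read per loop iteration, blocks fully around a checkpoint.
final class PausableSender implements Runnable {
    private final Object lock = new Object();
    private volatile boolean checkpointRequested = false;
    private volatile boolean running = true;
    private boolean parked = false;    // guarded by lock
    private boolean connected = false; // sender thread only, or checkpoint thread while sender is parked
    final AtomicInteger connectCount = new AtomicInteger();
    final AtomicInteger sent = new AtomicInteger();

    @Override
    public void run() {
        while (running) {
            if (checkpointRequested) { // cheap volatile read on the hot path
                parkUntilRestored();
                continue;
            }
            if (!connected) { // (re-)create the connection lazily
                connected = true;
                connectCount.incrementAndGet();
            }
            sent.incrementAndGet(); // stands in for processing the request queue
            LockSupport.parkNanos(1_000_000L);
        }
    }

    private void parkUntilRestored() {
        synchronized (lock) {
            parked = true;
            lock.notifyAll(); // tell the checkpoint thread we are out of the way
            while (checkpointRequested && running) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    running = false; // give up instead of spinning on the interrupt
                }
            }
            parked = false;
            lock.notifyAll();
        }
    }

    // Called by the checkpoint thread: returns once the sender is parked and connections are closed.
    void beforeCheckpoint() {
        synchronized (lock) {
            checkpointRequested = true;
            while (!parked && running) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for sender", e);
                }
            }
            connected = false; // safe: the sender cannot touch it while parked
        }
    }

    // Called after restore: lets the sender continue and reconnect.
    void afterRestore() {
        synchronized (lock) {
            checkpointRequested = false;
            lock.notifyAll();
        }
    }

    void shutdown() {
        synchronized (lock) {
            running = false;
            lock.notifyAll();
        }
    }
}

final class SenderDemo {
    // Returns {connects before checkpoint, requests sent while paused, connects after restore}.
    static int[] run() {
        PausableSender sender = new PausableSender();
        Thread thread = new Thread(sender, "sender");
        thread.start();
        awaitAtLeast(sender.sent, 1);
        sender.beforeCheckpoint();
        int connectsBefore = sender.connectCount.get();
        int sentBefore = sender.sent.get();
        LockSupport.parkNanos(20_000_000L); // the checkpoint would happen here
        int sentWhilePaused = sender.sent.get() - sentBefore;
        sender.afterRestore();
        awaitAtLeast(sender.connectCount, connectsBefore + 1);
        int connectsAfter = sender.connectCount.get();
        sender.shutdown();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {connectsBefore, sentWhilePaused, connectsAfter};
    }

    private static void awaitAtLeast(AtomicInteger counter, int target) {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (counter.get() < target && System.nanoTime() < deadline) {
            LockSupport.parkNanos(1_000_000L);
        }
    }
}
```

The lock is taken only on the checkpoint path; during normal operation the sender pays for a single volatile read per iteration, which is the trade-off the comment above argues for. `SenderDemo.run()` also doubles as a small check, without a real checkpoint, that the handshake does not deadlock and that the connection is re-created after restore.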

