
Kubernetes liveness - reserving threads/memory for a specific endpoint with Spring Boot

Updated: 2023-09-01 21:08:40


The actuator health endpoint is very convenient with Spring Boot - almost too convenient in this context, as it performs deeper health checks than you necessarily want in a liveness probe. For readiness you want to do deeper checks, but not for liveness. The idea is that if the Pod is overwhelmed for a bit and fails readiness, it will be withdrawn from load balancing and get a breather. But if it fails liveness, it will be restarted. So you want only minimal checks in liveness (see "Should Health Checks call other App Health Checks"). By using actuator health for both, there is no way for your busy Pods to get a breather, as they get killed first. And Kubernetes periodically calls the HTTP endpoint when performing both probes, which contributes further to your thread usage problem (do consider the periodSeconds on the probes).
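As a minimal sketch, assuming Spring Boot 2.3+ (which can expose separate liveness and readiness health groups at /actuator/health/liveness and /actuator/health/readiness) and with illustrative port and timing values, the split could look like this in the Pod template:

```yaml
# Illustrative probe split for the Pod template of a Deployment.
# Assumes Spring Boot 2.3+ with the probe health groups enabled, so that
# /actuator/health/liveness and /actuator/health/readiness exist.
containers:
  - name: app
    image: registry.example.com/my-app:1.0   # hypothetical image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness   # the deeper checks belong here
        port: 8080
      periodSeconds: 10      # each call still consumes a request thread
      failureThreshold: 3    # failing readiness only removes the Pod from load balancing
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness    # minimal check only
        port: 8080
      periodSeconds: 20      # probe less often to ease thread pressure
      failureThreshold: 5    # failing liveness restarts the container
```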


For your case you could define a liveness command instead of an HTTP probe - https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-command. The command could just check that the Java process is running (so somewhat similar to your Go-based probe suggestion).
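A sketch of that, assuming your image ships pgrep (from the procps package) and with illustrative timing values:

```yaml
livenessProbe:
  exec:
    # Exits 0 as long as a java process is running in the container.
    # pgrep being available is an assumption about your base image.
    command:
      - pgrep
      - java
  initialDelaySeconds: 30   # give the JVM time to start
  periodSeconds: 20
```

Because this never goes through the HTTP stack, a busy request thread pool cannot fail the liveness probe.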


For many cases, using the actuator for liveness would be fine (think of apps that hit a different constraint before threads, which would be your case if you went async/non-blocking with the reactive stack). Yours is a case where it can cause problems - the actuator's probing of the availability of dependencies like message brokers can be another source of excessive restarts (in that case on first deploy).
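If you do stay with HTTP probes, Spring Boot's health groups let you keep such dependency indicators out of the probe endpoints. A sketch in application.yml, assuming Spring Boot 2.3+; db is just an example of an indicator you might include in readiness, and broker indicators (e.g. rabbit) are deliberately left out of both groups:

```yaml
# application.yml - illustrative configuration, not a drop-in.
management:
  health:
    probes:
      enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
  endpoint:
    health:
      group:
        liveness:
          include: livenessState       # minimal: only the app's own state
        readiness:
          include: readinessState,db   # deeper checks, but still no message broker
```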