SK-275 // feat: reschedule interrupted bare pods#256
Merged
Conversation
Contributor
Author
f3d2af9 to
43d9de1
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #256 +/- ##
==========================================
- Coverage 78.64% 78.63% -0.02%
==========================================
Files 62 62
Lines 3939 4011 +72
==========================================
+ Hits 3098 3154 +56
- Misses 841 857 +16 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
5ae368c to
5372d03
Compare
8033046 to
866b711
Compare
866b711 to
1bd9b4c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




Description and Rationale
How
reschedule_interrupted_bare_podswas added to the Simulation CRD that instructs simkube to reschedule interrupted bare pods; this field is also added toskctl; it defaults to offreschedule_interrupted_bare_podsis set to True, the mutating webhook also intercepts DELETE actions; the admission request for DELETES populates theold_objectfield, which we can use to recreate the pod (note this required creating two different webhooks inside the same webhook configuration).<old-name>-clone-N, whereNis the number of times it's been rescheduled. It is conceivable that this could fail if the new name is too long (probably unlikely, though)static.simkube.io/original-ownerannotation to the pod for easy lookupsanitize_objto take aT: kube::Resourceinstead of aDynamicObject, and moved the GVK-specific stuff out of sanitize_obj and into the watcher. We also strip all of the simkube-specific labels and annotations off the pod, as these will get recreated later (the notable exception is the original owner annotation, which we want to persist -- hence thestatic.simkube.ioprefix).Test Steps
sanitize_objinto the dyn obj watcher meant writing a test there, which was a much larger PITA than expectedhandleentrypoint instead of themutate_podsentry point, this provides more coverage, and enables itests for the reschedule flow--reschedule-interrupted-bare-podsflag).--reschedule-interrupted-bare-podsflag is not present, it behaves as before when a node is removed-clone-1,-clone-2, etc. Confirmed that the lifecycle annotations are applied to the clone pods as expected/desiredOther Notes
I noticed as I was moving around in this code that there is a
simkube.io/skip-local-volume-mountannotation that gets applied to the Simulation CR itself, not to any of the objects within. The code uses this to determine whether to create a "local" volume mount for the running pods. I think the original motivation for this was to have a way to control simulation setup behaviour without having to change the CRD API; I went back and forth for a while as to whether thereschedule_interrupted_bare_podsfield fell into this category, and I ultimately decided it does not. I think you could make an argument that we shouldn't use these annotations at all and that everything should just be stuffed into the simulation CRD, but I want to think about that some more and didn't want to put that into this PR. I created SK-277 to track.I noticed at the end of the simulation, when things are getting cleaned up, this invokes our webhook a whole bunch. I think this is fine but maybe slightly annoying, and may or may not cause a crash. I'm not worried about fixing that right now though.
Maybe we want to have
sanitize_objfilter out simkube-related labels and annotations? This could enable a "simulation of a simulation" style use cases? But I can't honestly think of why we would want to do that right now, we can do that later if it ever becomes relevant.I certify that this PR does not contain any code that has been generated with GitHub Copilot or any other AI-based code generation tool, in accordance with this project's policies.