SK-255 // update KWOK stages to include configurable delays #255
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #255      +/-   ##
==========================================
- Coverage   78.69%   78.64%   -0.05%
==========================================
  Files          62       62
  Lines        3919     3939      +20
==========================================
+ Hits         3084     3098      +14
- Misses        835      841       +6

Ok I'm writing this down for my benefit, because I was getting really confused looking at the stage files.
Also, just for reference, we use another label ( @ogorman89 it might (????) be nice to document this in a README in the kwok stages directory?
    kind: Pod
  selector:
    matchExpressions:
      - key: .metadata.deletionTimestamp
This is a dumb nitpick but can we not add quotes unless we need them?
(And can we also use double-quotes where needed, unless we have to use single-quotes?)
I'm adding linting rules for this!
Ok I added what we talked about to a yamllint config and put it in its own config file .config/.yamllint.yaml
N.B., we already have a .yamllint config in here: https://github.com/acrlabs/simkube/blob/main/.yamllint
However, stuffing this into .config seems nicer than having it at the top level. Can you merge the existing file into the one you created?
delay:
  durationMilliseconds: 1000
  durationFrom:
    expressionFrom: .metadata.annotations["pod-create.stage.kwok.x-k8s.io/delay"]
In pod-complete.yml we pull the value from simkube.kwok.io/stage-complete-time. I don't totally remember why I used this format (if I had to hazard a guess, it's probably because I thought this annotation format was ugly; which, I still think, lol), but I think we should be consistent.
Maybe this is simkube.kwok.io/stage-create-delay and simkube.kwok.io/stage-create-delay-jitter? And similar for stage-running? Or if you have a better idea that's fine too.
We should also add these values into constants.rs so we can reference them later.
That made sense to me with one minor change: instead of calling it running, I am using ready, since that maps directly to the kwok stage, e.g. simkube.kwok.io/stage-ready-delay -> pod-ready kwok stage.
OK I have one dumb extra request here; in #256, for "making my life easier" reasons I changed all the simkube annotation prefixes to be simkube.io; the labels/annotations got renamed to simkube.io/kwok-stage-complete-time, etc. Can you make the corresponding change here? I can rebase on top of your change once mine is ready and that way we don't have messy merge conflicts.
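For reference, here is a rough sketch of what the constants.rs entries discussed in this thread might look like after the rename. Only simkube.io/kwok-stage-complete-time is spelled out above; the delay and jitter keys, and every constant and function name below, are assumptions that follow the same pattern and would need to match whatever #256 settles on.

```rust
// Hypothetical additions to constants.rs. All identifiers here are
// illustrative; only the complete-time key is stated in the thread.

// Key named explicitly in the review thread (renamed in #256).
pub const KWOK_STAGE_COMPLETE_TIME_KEY: &str = "simkube.io/kwok-stage-complete-time";

// Assumed keys, extrapolated from the complete-time pattern.
pub const KWOK_STAGE_CREATE_DELAY_KEY: &str = "simkube.io/kwok-stage-create-delay";
pub const KWOK_STAGE_CREATE_DELAY_JITTER_KEY: &str = "simkube.io/kwok-stage-create-delay-jitter";
pub const KWOK_STAGE_READY_DELAY_KEY: &str = "simkube.io/kwok-stage-ready-delay";
pub const KWOK_STAGE_READY_DELAY_JITTER_KEY: &str = "simkube.io/kwok-stage-ready-delay-jitter";

/// Builds an annotation lookup expression in the same textual form the
/// stage files use for `expressionFrom`.
pub fn annotation_expression(key: &str) -> String {
    format!(".metadata.annotations[\"{}\"]", key)
}
```

Keeping the keys in one place would let the stage files and the Rust code stay in sync if the prefix changes again.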
pub remote_write_endpoint: Option<String>,

#[arg(long, long_help = "image pull delay in milliseconds", default_value = IMAGE_PULL_DELAY)]
pub image_pull_delay: i32,
nit: why i32 instead of u32? I don't think these can ever be negative.
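To illustrate the point about signedness: with an unsigned type, a negative delay is rejected at parse time instead of having to be checked later. This is a standard-library-only sketch; `parse_delay_ms` is a hypothetical helper, not something in the PR.

```rust
use std::num::ParseIntError;

/// Parses a delay given in milliseconds. Because the target type is `u32`,
/// negative input such as "-1" fails to parse rather than producing a
/// value that would need a separate range check.
pub fn parse_delay_ms(raw: &str) -> Result<u32, ParseIntError> {
    raw.trim().parse::<u32>()
}
```

The same string parses successfully as `i32`, which is how a negative delay could slip through with the signed type.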
pub const DEFAULT_METRICS_SVC_ACCOUNT: &str = "prometheus-k8s";
pub const DRIVER_ADMISSION_WEBHOOK_PORT: &str = "8888";
pub const SK_LEASE_NAME: &str = "sk-lease";
pub const POD_STARTUP_DELAY: &str = "0";
I don't think we should hard-code a bunch of 0 constants in here; it's fine to just write default_value = "0" above.
pub speed: Option<f64>,
pub paused_time: Option<DateTime<Utc>>,
pub hooks: Option<SimulationHooksConfig>,
pub image_pull_delay: Option<i32>,
Optional: it might make sense to pull these out into their own struct (SimulationDelayParameters or something like that)? Similar to how we have the Driver params and the metrics configs.
Yea that makes sense, I created a SimulationDelayParameters struct to house them.
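A minimal sketch of what such a grouping could look like, assuming two delay fields. The name `image_pull_delay` comes from the diff above; `pod_startup_delay` is inferred from the `POD_STARTUP_DELAY` constant, and the accessor methods and the choice of `u32` are illustrative rather than what the PR actually contains.

```rust
/// Sketch only: groups the delay settings into one struct, similar to how
/// the driver and metrics settings are grouped. Field names are assumptions.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SimulationDelayParameters {
    /// Image pull delay in milliseconds, if set.
    pub image_pull_delay: Option<u32>,
    /// Pod startup delay in milliseconds, if set.
    pub pod_startup_delay: Option<u32>,
}

impl SimulationDelayParameters {
    /// Image pull delay in milliseconds, defaulting to 0 when unset.
    pub fn image_pull_delay_ms(&self) -> u32 {
        self.image_pull_delay.unwrap_or(0)
    }

    /// Pod startup delay in milliseconds, defaulting to 0 when unset.
    pub fn pod_startup_delay_ms(&self) -> u32 {
        self.pod_startup_delay.unwrap_or(0)
    }
}
```

Deriving `Default` means the parent struct can hold this by value and callers can omit any field they do not care about.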
| )] | ||
| pub remote_write_endpoint: Option<String>, | ||
|
|
||
| #[arg(long, long_help = "image pull delay in milliseconds", default_value = IMAGE_PULL_DELAY)] |
Optional: it might make sense to put these in their own section/heading under the help output?
I put them in a section called Delays; it could use a better name.
"Lifecycle Configuration"?
pub speed: Option<f64>,
pub paused_time: Option<DateTime<Utc>>,
pub hooks: Option<SimulationHooksConfig>,
pub simulation_delay_parameters: SimulationDelayParameters,
Maybe also "lifecycle parameters" here?
I called this lifecycle_params to stay consistent; it looks like we typically use the shorter "_config" and "_params" suffixes to be concise.




Description and Rationale
How
pod-create that includes a delay and jitter to simulate container startup time
pod-ready to simulate image pull time

delay:
  durationMilliseconds: 1000 # indicates the default delay in milliseconds
  durationFrom: # if provided via annotations overrides the durationMilliseconds
    expressionFrom: .metadata.annotations["pod-create.stage.kwok.x-k8s.io/delay"]
  jitterDurationMilliseconds: 5000 # indicates the default jitter in milliseconds
  jitterDurationFrom: # if provided via annotations overrides the jitterDurationMilliseconds
    expressionFrom: .metadata.annotations["pod-create.stage.kwok.x-k8s.io/jitter-delay"]

Test Steps
Old Stages
New Stages
Other Notes
I think the next step for this feature is a PR to update the annotations that override the defaults set in this PR. The annotation receptors are ready on the KWOK stages, but pods will need to be assigned matching annotations to override the defaults.
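The override behavior described in the stage comments (an annotation, when present, takes precedence over the default in milliseconds) can be sketched roughly as follows. This is an illustration of the precedence rule only, not how KWOK implements it: `DelaySource` and `resolve_delay` are hypothetical names, and the annotation value is deliberately left as an unparsed string because its expected format is not stated in this PR.

```rust
use std::collections::BTreeMap;

/// Where a stage's delay value comes from.
#[derive(Debug, PartialEq)]
pub enum DelaySource {
    /// Raw value of the annotation found on the pod (left unparsed here).
    Annotation(String),
    /// The stage's default, in milliseconds.
    DefaultMs(u64),
}

/// Returns the annotation value if the pod carries the given key,
/// otherwise falls back to the stage default.
pub fn resolve_delay(
    annotations: &BTreeMap<String, String>,
    key: &str,
    default_ms: u64,
) -> DelaySource {
    match annotations.get(key) {
        Some(value) => DelaySource::Annotation(value.clone()),
        None => DelaySource::DefaultMs(default_ms),
    }
}
```

With no matching annotation on the pod, the result is the default (for example 1000 for the delay shown in the How section), which is the state of things until the follow-up PR assigns annotations.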
[ X ] I certify that this PR does not contain any code that has been generated with GitHub Copilot or any other AI-based code generation tool, in accordance with this project's policies.