Added documentation to support users #33

Merged
pendingintent merged 1 commit into main from pi-update-documents
Jun 5, 2026

@pendingintent

Collaborator

Added documents to help support users.

Copilot AI review requested due to automatic review settings June 5, 2026 18:35
@pendingintent self-assigned this Jun 5, 2026
@pendingintent added the documentation label (Improvements or additions to documentation) Jun 5, 2026
@pendingintent merged commit 6182d41 into main Jun 5, 2026
4 checks passed

Copilot AI left a comment

Pull request overview

This PR adds user-facing documentation for the synthetic NCT01797120 (PrE0102) dataset package under data/protocol/NCT01797120, including a concise top-level README and additional context inside the test_data folder.

Changes:

  • Added a test_data/README.md describing the synthetic dataset and summarizing the source trial.
  • Added a test_data/FEEDBACK.md capturing reviewer feedback that informed dataset/script adjustments.
  • Simplified data/protocol/NCT01797120/README.md to describe the directory contents and how synthetic data is generated.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| data/protocol/NCT01797120/test_data/README.md | New README describing the synthetic dataset and trial summary. |
| data/protocol/NCT01797120/test_data/FEEDBACK.md | New feedback log documenting issues/expectations for SDTM/ADaM-like outputs. |
| data/protocol/NCT01797120/README.md | Replaced the prior detailed dataset summary with a brief directory-level overview. |

@@ -0,0 +1,34 @@
# Recent feedback resulting in the new script generation for these datasets.

The `../scripts/cdisc_generation_functions.py` file has been changed in accordance with the feedback below. All CSV files in this directory have been generated with the latest `../scripts/cdisc_generation_functions.py`.

| `DSDECOD` value | Feedback |
|---------------|-----------|
| RANDOMIZED | This is a value in the Protocol Milestone codelist. The `DSCAT` for the record would be "PROTOCOL MILESTONE" |
| TREATMENT | I don't understand what a record with this value is supposed to mean. This is not a value in any of the codelists for `DSDECOD`. Dates are between those for "RANDOMIZED" records and records with other `DSDECOD` values, but are anywhere from a few weeks to a few months after the "RANDOMIZED" record. |
| PROGRESSIVE DISEASE | This is a value in the Completion/Reason for Non-Completion codelist. The `DSCAT` for the record would be "DISPOSITION EVENT". We would also expect a `DSSCAT` value, probably "STUDY TREATMENT", since in this study, subjects are followed (ideally) until death, even if they've stopped treatment. Most subjects seem to have two records for the same date with `DSDECOD = "PROGRESSIVE DISEASE"`, one with `EPOCH = "TREATMENT"` and one with `EPOCH = "FOLLOW-UP"`. This doesn't make sense, since any particular date falls into only one `EPOCH`, and there is only one disposition event of ending treatment for progressive disease. Actually, since there are two treatments in the study, if the treatments were stopped at different times, it would be possible to have two disposition events for ending treatment, one with `DSSCAT = "FULVESTRANT"` and one with `DSSCAT = "EVEROLIMUS"`. |
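
The duplicate-`EPOCH` problem described in the table can be detected mechanically. The sketch below is illustrative only and is not taken from `cdisc_generation_functions.py`; the function name `find_conflicting_epochs`, the use of `DSSTDTC` as the date variable, and the subject IDs are assumptions.

```python
from collections import defaultdict


def find_conflicting_epochs(ds_rows):
    """Return (USUBJID, DSDECOD, DSSTDTC) keys that occur with more than one EPOCH.

    ds_rows is a list of dicts, one per DS record. A given date can fall into
    only one EPOCH, so any key returned here points at a suspect record pair.
    """
    epochs = defaultdict(set)
    for row in ds_rows:
        key = (row["USUBJID"], row["DSDECOD"], row["DSSTDTC"])
        epochs[key].add(row["EPOCH"])
    return sorted(key for key, seen in epochs.items() if len(seen) > 1)


# Hypothetical rows reproducing the pattern described in the feedback.
example_rows = [
    {"USUBJID": "SUBJ-001", "DSDECOD": "PROGRESSIVE DISEASE",
     "DSSTDTC": "2026-03-01", "EPOCH": "TREATMENT"},
    {"USUBJID": "SUBJ-001", "DSDECOD": "PROGRESSIVE DISEASE",
     "DSSTDTC": "2026-03-01", "EPOCH": "FOLLOW-UP"},
    {"USUBJID": "SUBJ-002", "DSDECOD": "PROGRESSIVE DISEASE",
     "DSSTDTC": "2026-04-10", "EPOCH": "TREATMENT"},
]
conflicts = find_conflicting_epochs(example_rows)
```

A check like this could run against the generated `DS` CSV after each script change, so the same issue does not reappear.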

Reviewers would expect to have `DTHFL` included in the `DM` dataset and `DTHDTC` to be populated if `DTHFL = "Y"`. Admittedly, the fact that a patient died is usually collected in some other domain (probably `DS`), and added to `DM`.
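
As a rough sketch of that derivation, the death variables could be copied from `DS` into `DM` as follows. This assumes that death is recorded in `DS` with `DSDECOD = "DEATH"` and dated in `DSSTDTC` as a full ISO 8601 date; the function name `add_death_variables` is hypothetical and does not come from the generation script.

```python
def add_death_variables(dm_row, ds_rows):
    """Return a copy of dm_row with DTHFL and DTHDTC derived from DS.

    Assumption: a death is a DS record with DSDECOD == "DEATH" whose date is
    in DSSTDTC. Full ISO 8601 dates sort correctly as strings, so min() picks
    the earliest one if several death records exist.
    """
    death_dates = [
        row["DSSTDTC"]
        for row in ds_rows
        if row["USUBJID"] == dm_row["USUBJID"] and row["DSDECOD"] == "DEATH"
    ]
    out = dict(dm_row)
    out["DTHFL"] = "Y" if death_dates else ""
    out["DTHDTC"] = min(death_dates) if death_dates else ""
    return out
```

With this shape, `DTHDTC` is populated exactly when `DTHFL = "Y"`, which is the consistency the reviewers ask for.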

`TRT01A` is an ADaM variable, not an SDTM variable. The arm to which a subject was randomized would be represented in some combination of `ARMCD`, `ARM`, `ACTARMCD`, and `ACTARM`. `ARMCD` and `ARM` are code and text for an arm, as are `ACTARMCD` and `ACTARM`. `ARMCD/ARM` are the same as `ACTARMCD/ACTARM` unless a subject receives no treatment (in which case `ACTARMCD/ACTARM` are null) or a subject receives a treatment other than that to which they were randomized. I don't think we need to build the treated-wrong situation into the synthetic data, although I think the study included a couple of subjects who were never treated. What's currently in `TRT01A` would probably be in `ACTARM` and `ARM`.
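
A minimal sketch of that rule, covering only the two cases the synthetic data needs (treated as randomized, or never treated). The function name `arm_variables` and the arm code and label used in the example are placeholders, not values from the study.

```python
def arm_variables(armcd, arm, treated):
    """Build the planned and actual arm variables for one DM record.

    The planned arm (ARMCD/ARM) is always populated. The actual arm
    (ACTARMCD/ACTARM) matches the planned arm when the subject was treated
    and is null (empty string here) when the subject was never treated.
    The treated-with-the-wrong-arm case is deliberately not modelled.
    """
    return {
        "ARMCD": armcd,
        "ARM": arm,
        "ACTARMCD": armcd if treated else "",
        "ACTARM": arm if treated else "",
    }


# Placeholder arm code and label, for illustration only.
treated_subject = arm_variables("A", "ARM A", treated=True)
untreated_subject = arm_variables("A", "ARM A", treated=False)
```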


The `EX` dataset is missing `EXFREQ`.
`EXFREQ` for fulvestrant injections would likely be "ONCE" with a record for each injection, and `EXSTDTC = EXENDTC`. Given the dosing schedule (Cycle 1 Day 1, Cycle 1 Day 15, then Day 1 of every subsequent cycle), the minimum number of records would be two: one for the first two doses, given at a frequency of every 14 days, and a second for all the remaining injections, with a frequency of every 28 days. In practice, since patient visits drift off-schedule, the one-record-per-dose approach is probably more practical.
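
The one-record-per-dose approach could look roughly like the sketch below. It generates on-schedule dates only and assumes 28-day cycles (from the "every 28 days" frequency above); the function name `fulvestrant_ex_records`, the `EXTRT` value, and the subject ID are assumptions rather than details of the actual script.

```python
from datetime import date, timedelta


def fulvestrant_ex_records(usubjid, cycle1_day1, n_cycles, cycle_length_days=28):
    """Return one EX row per fulvestrant injection.

    Doses fall on Cycle 1 Day 1, Cycle 1 Day 15, then Day 1 of each later
    cycle. Each row has EXFREQ = "ONCE" and EXSTDTC equal to EXENDTC.
    """
    dose_dates = [cycle1_day1, cycle1_day1 + timedelta(days=14)]
    dose_dates += [
        cycle1_day1 + timedelta(days=cycle_length_days * cycle)
        for cycle in range(1, n_cycles)
    ]
    return [
        {
            "USUBJID": usubjid,
            "EXTRT": "FULVESTRANT",
            "EXFREQ": "ONCE",
            "EXSTDTC": d.isoformat(),
            "EXENDTC": d.isoformat(),
        }
        for d in dose_dates
    ]


# Three cycles give four injections: C1D1, C1D15, C2D1, C3D1.
ex_rows = fulvestrant_ex_records("SUBJ-001", date(2026, 1, 1), n_cycles=3)
```

Visit drift could be layered on top by perturbing each date before it is formatted, without changing the record structure.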