55 commits
43398e8
featÑ mdPatterns fct
ESCRI11 Oct 27, 2025
987fbcc
add mdPatterns to DATASHIELD
ESCRI11 Oct 27, 2025
16e22aa
Merge pull request #438 from ESCRI11/dev-task-14
StuartWheater Nov 2, 2025
eda4bdc
Initial 'mdPatternDS' tests
StuartWheater Nov 3, 2025
2e87305
Merge pull request #439 from StuartWheater/v6.3.5-dev
StuartWheater Nov 3, 2025
e499d8d
Additional mdPattern tests
StuartWheater Nov 3, 2025
3ba7b4c
Merge pull request #440 from StuartWheater/v6.3.5-dev
StuartWheater Nov 3, 2025
2485816
Increased data 'mdPatternDS' tests
StuartWheater Nov 3, 2025
cf61c2b
Added 'set.standard.disclosure.settings()'
StuartWheater Nov 4, 2025
e80c23f
Merge branch 'v6.3.5-dev' of github.com:StuartWheater/dsBase into v6.…
StuartWheater Nov 4, 2025
da36ab8
Updated 'mdPatternDS' tests
StuartWheater Nov 4, 2025
40e6425
Merge pull request #441 from StuartWheater/v6.3.5-dev
StuartWheater Nov 4, 2025
9b0fec1
Fix version
StuartWheater Nov 21, 2025
2a3b3fe
Update to docs
StuartWheater Nov 21, 2025
81f5f64
Remove nightly scheduled run and update call to parse_test_report.R
villegar Nov 26, 2025
e0d234f
Add session_info_*.txt as one of the log outputs and avoid storing du…
villegar Nov 26, 2025
42efdb6
Update 'perf' support
StuartWheater Nov 30, 2025
7c138f8
Merge branch 'v6.3.5-dev' of github.com:StuartWheater/dsBase into v6.…
StuartWheater Nov 30, 2025
27e5a1f
Merge pull request #446 from StuartWheater/v6.3.5-dev
StuartWheater Nov 30, 2025
998482c
Minor docs update
StuartWheater Nov 30, 2025
b335757
Merge branch 'v6.3.5-dev' of github.com:StuartWheater/dsBase into v6.…
StuartWheater Nov 30, 2025
5ae71ec
Merge pull request #447 from StuartWheater/v6.3.5-dev
StuartWheater Dec 1, 2025
bcfb6ae
Reworking of performance profiles
StuartWheater Jan 6, 2026
5af03c3
Merge branch 'v6.3.5-dev' of github.com:StuartWheater/dsBase into v6.…
StuartWheater Jan 6, 2026
2b201d6
Rework setting of variable
StuartWheater Jan 6, 2026
079a067
Merge pull request #449 from StuartWheater/v6.3.5-dev
StuartWheater Jan 6, 2026
9e3892d
Fixed Type
StuartWheater Jan 7, 2026
ee8d845
Merge branch 'datashield:v6.3.5-dev' into v6.3.5-dev
StuartWheater Jan 7, 2026
e21530a
Merge pull request #450 from StuartWheater/v6.3.5-dev
StuartWheater Jan 7, 2026
56ee2f1
Update test schedual
StuartWheater Feb 5, 2026
8665845
Merge branch 'v6.3.5-dev' of github.com:StuartWheater/dsBase into v6.…
StuartWheater Feb 5, 2026
05c8de2
Update test schedual
StuartWheater Feb 10, 2026
8c83782
Update version
StuartWheater Feb 20, 2026
69d4bb4
Update to glmSLMADS.assign
StuartWheater Feb 20, 2026
2a4a349
Update to documents
StuartWheater Feb 20, 2026
d79c7d2
Removed checking of 'opal'
StuartWheater Feb 20, 2026
1aa5c13
Merge pull request #458 from StuartWheater/v6.3.5-dev
StuartWheater Feb 22, 2026
2f6184e
Merge pull request #459 from datashield/v6.3.5-dev
StuartWheater Feb 23, 2026
fbdfa42
Permit perf test duration to be set, seconds, by environment variable…
StuartWheater Apr 10, 2026
1f12d53
Permit perf test duration to be set, seconds, by environment variable…
StuartWheater Apr 10, 2026
e75797b
Refactor perf test duration obtaining
StuartWheater Apr 13, 2026
6a1b21f
Merge pull request #468 from StuartWheater/v6.3.6-dev_feat-perf-support
StuartWheater Apr 14, 2026
c4ff561
feat: add server-side functions for ds.standardiseDf
timcadman Apr 21, 2026
dd6133f
Added libuv1-dev to deployment
StuartWheater Apr 21, 2026
3b7b9d4
export functions
timcadman Apr 22, 2026
9f94f63
Merge branch 'datashield:v6.3.6-dev' into v6.3.6-dev
StuartWheater Apr 22, 2026
90b6f86
Update to perf test suppoer and 'libuv1'
StuartWheater Apr 22, 2026
d40d899
Add 'libuv1'
StuartWheater Apr 22, 2026
b01c923
Updated version to 'v6.3.6-dev'
StuartWheater Apr 22, 2026
baab62b
Merge pull request #476 from StuartWheater/v6.3.6-dev
StuartWheater Apr 23, 2026
1ce10e8
Merge branch 'v6.3.6-dev' into v6.3.6-dev-feat/standardise-df
StuartWheater Apr 23, 2026
3c64965
Switched to 'RoxygenNote: 8.0.0'
StuartWheater May 13, 2026
a71a4c0
feat: standardise-df
timcadman May 14, 2026
15dd8c4
Merge branch 'datashield:v6.3.6-dev' into v6.3.6-dev
StuartWheater May 14, 2026
69a7c12
Merge pull request #481 from StuartWheater/v6.3.6-dev
StuartWheater May 14, 2026
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -22,7 +22,7 @@ jobs:
sudo apt-get install -y r-base-core cmake
- run:
command: |
sudo apt-get install -y libxml2-dev
sudo apt-get install -y libxml2-dev libuv1-dev
- run:
command: |
echo "options(Ncpus=4)" >> ~/.Rprofile
11 changes: 6 additions & 5 deletions .github/workflows/dsBase_test_suite.yaml
@@ -15,7 +15,6 @@ on:
push:
schedule:
- cron: '0 0 * * 0' # Weekly
- cron: '0 1 * * *' # Nightly

jobs:
dsBase_test_suite:
@@ -153,28 +152,30 @@ jobs:
echo "branch:${{ env.BRANCH_NAME }}" > ${{ env.WORKFLOW_ID }}.txt
echo "os:$(lsb_release -ds)" >> ${{ env.WORKFLOW_ID }}.txt
echo "R:$(R --version | head -n1)" >> ${{ env.WORKFLOW_ID }}.txt
Rscript --vanilla -e 'sessionInfo()' >> session_info_${{ env.WORKFLOW_ID }}.txt
working-directory: dsBase/logs

- name: Parse results from testthat and covr
run: |
Rscript --verbose --vanilla ../testStatus/source/parse_test_report.R logs/
Rscript --verbose --vanilla ../testStatus/source/parse_test_report.R logs/ logs/ https://github.com/datashield/${{ env.PROJECT_NAME }}/blob/${{ env.BRANCH_NAME }} '[^-:.]+' '(?<=::)[^:]+(?=::)'
working-directory: dsBase
env:
PROJECT_NAME: ${{ env.PROJECT_NAME }}
BRANCH_NAME: ${{ env.BRANCH_NAME }}

- name: Render report
run: |
cd testStatus

mkdir -p new/logs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/
mkdir -p new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/
mkdir -p new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/latest/

# Copy logs to new logs directory location
cp -rv ../${{ env.PROJECT_NAME }}/logs/* new/logs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/
cp -rv ../${{ env.PROJECT_NAME }}/logs/${{ env.WORKFLOW_ID }}.txt new/logs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/

R -e 'input_dir <- file.path("../new/logs", Sys.getenv("PROJECT_NAME"), Sys.getenv("BRANCH_NAME"), Sys.getenv("WORKFLOW_ID")); quarto::quarto_render("source/test_report.qmd", execute_params = list(input_dir = input_dir))'
mv source/test_report.html new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/index.html
cp -r new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/${{ env.WORKFLOW_ID }}/* new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/latest
mv source/test_report.html new/docs/${{ env.PROJECT_NAME }}/${{ env.BRANCH_NAME }}/latest/index.html

env:
PROJECT_NAME: ${{ env.PROJECT_NAME }}
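The updated call to parse_test_report.R passes two regular expressions, '[^-:.]+' and '(?<=::)[^:]+(?=::)'. How the script consumes them is not shown in this diff, so the following is only a minimal sketch of what each pattern matches on its own, assuming a hypothetical test identifier of the form "function::argument::case". The second pattern uses lookarounds, so it needs perl = TRUE in base R.

```r
# Hypothetical test identifier; the real input format used by
# parse_test_report.R is not shown in this diff.
x <- "meanDS::smk::basic"

# First pattern: longest leading run of characters that are not '-', ':' or '.'
first_token <- regmatches(x, regexpr("[^-:.]+", x, perl = TRUE))

# Second pattern: text enclosed between two '::' separators (lookbehind/lookahead)
middle_token <- regmatches(x, regexpr("(?<=::)[^:]+(?=::)", x, perl = TRUE))

print(first_token)   # "meanDS"
print(middle_token)  # "smk"
```

With regexpr() only the first match is returned; gregexpr() would return every match if the script needs all tokens.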
7 changes: 5 additions & 2 deletions DESCRIPTION
@@ -5,7 +5,7 @@ Description: Base 'DataSHIELD' functions for the server side. 'DataSHIELD' is a
been designed to only share non disclosive summary statistics, with built in automated output
checking based on statistical disclosure control. With data sites setting the threshold values for
the automated output checks. For more details, see 'citation("dsBase")'.
Version: 6.3.4
Version: 6.3.6.9000
Authors@R: c(person(given = "Paul",
family = "Burton",
role = c("aut"),
@@ -65,6 +65,9 @@ Imports:
stringr,
lme4,
dplyr,
tibble,
purrr,
tidyselect,
reshape2,
polycor (>= 0.8),
splines,
@@ -75,6 +78,6 @@ Imports:
Suggests:
spelling,
testthat
RoxygenNote: 7.3.3
RoxygenNote: 8.0.0
Encoding: UTF-8
Language: en-GB
16 changes: 16 additions & 0 deletions NAMESPACE
@@ -41,7 +41,12 @@ export(dmtC2SDS)
export(elsplineDS)
export(extractQuantilesDS1)
export(extractQuantilesDS2)
export(fixClassDS)
export(fixColsDS)
export(fixLevelsDS)
export(gamlssDS)
export(getAllLevelsDS)
export(getClassAllColsDS)
export(getWGSRDS)
export(glmDS1)
export(glmDS2)
@@ -82,6 +87,7 @@ export(matrixDimnamesDS)
export(matrixInvertDS)
export(matrixMultDS)
export(matrixTransposeDS)
export(mdPatternDS)
export(meanDS)
export(meanSdGpDS)
export(mergeDS)
@@ -138,5 +144,15 @@ import(dplyr)
import(gamlss)
import(gamlss.dist)
import(mice)
importFrom(dplyr,"%>%")
importFrom(dplyr,across)
importFrom(dplyr,mutate)
importFrom(dplyr,select)
importFrom(gamlss.dist,pST3)
importFrom(gamlss.dist,qST3)
importFrom(purrr,imap)
importFrom(purrr,map)
importFrom(purrr,set_names)
importFrom(tibble,as_tibble)
importFrom(tidyselect,all_of)
importFrom(tidyselect,peek_vars)
41 changes: 13 additions & 28 deletions R/glmSLMADS.assign.R
@@ -18,40 +18,25 @@
#' @export
glmSLMADS.assign <- function(formula, family, offsetName, weightsName, dataName){

#############################################################
#MODULE 1: CAPTURE THE nfilter SETTINGS #
thr <- dsBase::listDisclosureSettingsDS() #
nfilter.tab <- as.numeric(thr$nfilter.tab) #
nfilter.glm <- as.numeric(thr$nfilter.glm) #
#nfilter.subset<-as.numeric(thr$nfilter.subset) #
#nfilter.string<-as.numeric(thr$nfilter.string) #
#############################################################
# Convert transmittable text for special link/variance combinations back to full representation
if(family=="quasigamma.link_log")
{family<-"quasi(link=log,variance=mu^2)"}

########################################
############
#Convert transmitable text for special link variance combinations back to full representation
if(family=="quasigamma.link_log")
{family<-"quasi(link=log,variance=mu^2)"}
if(family=="Gamma.link_log")
{family<-"Gamma(link=log)"}

if(family=="Gamma.link_log")
{family<-"Gamma(link=log)"}
#############
# Correctly name offset, weights and data objects in function call
# (to allow glmPredict to work correctly later)
calltext <- paste0("mg<-glm(formula,family=",family,",offset=",
offsetName,",weights=",weightsName,",data=", dataName,",x=TRUE)")

#Activate family object (this may not be necessary as character string may already be OK
#but just checking
final.family.object<-eval(parse(text=family))
eval(parse(text=calltext))

# update the call object to include the actual formula
mg$call$formula <- formula

#Correctly name offset, weights and data objects in function call
#(to allow glmPredict to work correctly later)
calltext<-paste0("mg<-glm(formula,family=",family,",offset=",
offsetName,",weights=",weightsName,",data=", dataName,",x=TRUE)")

eval(parse(text=calltext))

return(mg)
return(mg)

}

# ASSIGN FUNCTION
# glmSLMADS.assign
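The reworked glmSLMADS.assign decodes the transmitted family string, builds the glm() call as text so that the offset, weights and data arguments keep their names, evaluates it, and then stores the actual formula in the call. A minimal sketch of that flow on made-up data follows; `decode_family`, the data frame `D` and its columns are illustrative assumptions, not part of dsBase.

```r
# Hypothetical helper mirroring the two family-string conversions in the diff.
decode_family <- function(family) {
  switch(family,
         "quasigamma.link_log" = "quasi(link=log,variance=mu^2)",
         "Gamma.link_log"      = "Gamma(link=log)",
         family)  # any other family string is passed through unchanged
}

# Made-up data standing in for a server-side data frame
set.seed(1)
D <- data.frame(x = runif(50))
D$y <- rgamma(50, shape = 2, rate = 2 / exp(0.5 + D$x))
D$w <- rep(1, 50)    # weights column
D$off <- rep(0, 50)  # offset column

formula <- y ~ x
family <- decode_family("Gamma.link_log")

# Build the call as text so the argument names survive in mg$call
calltext <- paste0("mg <- glm(formula, family=", family,
                   ", offset=", "off", ", weights=", "w",
                   ", data=", "D", ", x=TRUE)")
eval(parse(text = calltext))

# Replace the symbol 'formula' in the stored call with the actual formula
mg$call$formula <- formula
print(mg$call)
```

Before the last assignment, mg$call$formula holds only the symbol `formula`, so later code that re-reads the call could not recover the model terms from it.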
121 changes: 121 additions & 0 deletions R/mdPatternDS.R
@@ -0,0 +1,121 @@
#'
#' @title Missing data pattern with disclosure control
#' @description This function is a serverside aggregate function that computes the
#' missing data pattern using mice::md.pattern and applies disclosure control to
#' prevent revealing small cell counts.
#' @details This function calls the mice::md.pattern function to generate a matrix
#' showing the missing data patterns in the input data. To ensure disclosure control,
#' any pattern counts that are below the threshold (nfilter.tab, default=3) are
#' suppressed.
#'
#' \strong{Suppression Method:}
#'
#' When a pattern count is below threshold:
#' - Row name is changed to "suppressed(<N)" where N is the threshold
#' - All pattern values in that row are set to NA
#' - Summary row is also set to NA (prevents back-calculation)
#'
#' \strong{Output Matrix Structure:}
#'
#' - Rows represent different missing data patterns (plus a summary row at the bottom)
#' - Row names contain pattern counts (or "suppressed(<N)" for invalid patterns)
#' - Columns show 1 if variable is observed, 0 if missing
#' - Last column shows total number of missing values per pattern
#' - Last row shows total number of missing values per variable
#'
#' \strong{Note for Pooling:}
#'
#' When this function is called from ds.mdPattern with type='combine', suppressed
#' patterns are excluded from pooling to prevent disclosure through subtraction.
#' This means pooled counts may underestimate the true total when patterns are
#' suppressed in some studies.
#'
#' @param x a character string specifying the name of a data frame or matrix
#' containing the data to analyze for missing patterns.
#' @return A list containing:
#' \item{pattern}{The missing data pattern matrix with disclosure control applied}
#' \item{valid}{Logical indicating if all patterns meet disclosure requirements}
#' \item{message}{A message describing the validity status}
#' @author Xavier Escribà Montagut for DataSHIELD Development Team
#' @import mice
#' @export
#'
mdPatternDS <- function(x){

#############################################################
# MODULE 1: CAPTURE THE nfilter SETTINGS
thr <- dsBase::listDisclosureSettingsDS()
nfilter.tab <- as.numeric(thr$nfilter.tab)
#############################################################

# Parse the input data name with error handling
x.val <- tryCatch(
{
eval(parse(text=x), envir = parent.frame())
},
error = function(e) {
stop(paste0("Object '", x, "' does not exist on the server"), call. = FALSE)
}
)

# Check object class
typ <- class(x.val)

# Check that input is a data frame or matrix
if(!("data.frame" %in% typ || "matrix" %in% typ)){
stop(paste0("The input object must be of type 'data.frame' or 'matrix'. Current type: ",
paste(typ, collapse = ", ")), call. = FALSE)
}

# Use x.val for further processing
x <- x.val

# Call mice::md.pattern with plot=FALSE
pattern <- mice::md.pattern(x, plot = FALSE)

# Apply disclosure control
# Pattern counts are stored in row names (except last row which is empty/summary)
# The last row contains variable-level missing counts

validity <- "valid"
n_patterns <- nrow(pattern) - 1 # exclude the summary row

if(n_patterns > 0){
# Check pattern counts (stored in row names, excluding last row)
pattern_counts <- as.numeric(rownames(pattern)[1:n_patterns])

# Find patterns with counts below threshold
invalid_idx <- which(pattern_counts > 0 & pattern_counts < nfilter.tab)

if(length(invalid_idx) > 0){
validity <- "invalid"

# For invalid patterns, suppress by:
# - Setting the row name to "suppressed(<nfilter.tab)"
# - Setting all pattern values in that row to NA
rnames <- rownames(pattern)
for(idx in invalid_idx){
rnames[idx] <- paste0("suppressed(<", nfilter.tab, ")")
pattern[idx, ] <- NA
}
rownames(pattern) <- rnames

# The summary (last) row is not recalculated when patterns are suppressed;
# it is set to NA so that suppressed counts cannot be back-calculated
pattern[nrow(pattern), seq_len(ncol(pattern))] <- NA
}
}

# Return the pattern with validity information
return(list(
pattern = pattern,
valid = (validity == "valid"),
message = ifelse(validity == "valid",
"Valid: all pattern counts meet disclosure requirements",
paste0("Invalid: some pattern counts below threshold (",
nfilter.tab, ") have been suppressed"))
))
}

#AGGREGATE FUNCTION
# mdPatternDS
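The suppression step in mdPatternDS can be tried without a DataSHIELD server or the mice package. The sketch below applies the same rule to a hand-built matrix laid out like mice::md.pattern output (counts in the row names, summary row last). `suppress_small_patterns` and the toy counts are illustrative assumptions, not part of dsBase.

```r
# Hypothetical helper mirroring the suppression logic in mdPatternDS.
suppress_small_patterns <- function(pattern, nfilter.tab = 3) {
  n_patterns <- nrow(pattern) - 1  # last row is the per-variable summary
  if (n_patterns < 1) return(pattern)

  # Pattern counts live in the row names of the pattern rows
  counts <- as.numeric(rownames(pattern)[seq_len(n_patterns)])
  invalid_idx <- which(counts > 0 & counts < nfilter.tab)

  if (length(invalid_idx) > 0) {
    rnames <- rownames(pattern)
    rnames[invalid_idx] <- paste0("suppressed(<", nfilter.tab, ")")
    pattern[invalid_idx, ] <- NA       # hide the small patterns
    pattern[nrow(pattern), ] <- NA     # hide the summary row as well
    rownames(pattern) <- rnames
  }
  pattern
}

# Toy matrix with made-up counts: 10 complete rows, 2 rows missing 'bmi'
toy <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("10", "2", ""), c("age", "bmi", ""))
)

result <- suppress_small_patterns(toy, nfilter.tab = 3)
print(result)
```

The pattern observed in only 2 rows falls below the threshold of 3, so its row is renamed and blanked, and the summary row is blanked with it. The pattern with 10 rows is returned unchanged.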