Feat: Add Model RF-DETR by DerrickUnleashed · Pull Request #333 · mlverse/torchvision

DerrickUnleashed · 2026-06-16T21:36:07Z

This PR adds:

- Real-Time Detection Transformers Model (Object Detection Model)
- Test suite for the same

Closes #327

DerrickUnleashed · 2026-06-16T21:40:13Z

========================================
Running inference with: model_rfdetr_base
========================================
Model weights for <rfdetr_base> (~123 MB) will be downloaded and processed if
not already available.
Loaded pretrained weights for <rfdetr_base> (487/487 keys, 0 skipped).
Top detections:
  score=0.8337 label=17 box=[156,37,1440,1148]
  score=0.1323 label=18 box=[156,37,1440,1148]
  score=0.0865 label=23 box=[156,37,1440,1148]
  score=0.0846 label=20 box=[156,37,1440,1148]
  score=0.0698 label=18 box=[164,23,1440,1148]
Detections above 0.3: 1

========================================
Running inference with: model_rfdetr_base_2
========================================
Model weights for <rfdetr_base_2> (~123 MB) will be downloaded and processed
if not already available.
Loaded pretrained weights for <rfdetr_base_2> (487/487 keys, 0 skipped).
Top detections:
  score=0.6605 label=17 box=[156,38,1440,1152]
  score=0.2869 label=18 box=[156,38,1440,1152]
  score=0.1364 label=23 box=[156,38,1440,1152]
  score=0.0740 label=20 box=[156,38,1440,1152]
  score=0.0676 label=16 box=[156,38,1440,1152]
Detections above 0.3: 1

========================================
Running inference with: model_rfdetr_base_o365
========================================
Model weights for <rfdetr_base_o365> (~127 MB) will be downloaded and
processed if not already available.
Loaded pretrained weights for <rfdetr_base_o365> (487/487 keys, 0 skipped).
Top detections:
  score=0.5500 label=140 box=[158,34,1440,1144]
  score=0.4891 label=93 box=[158,34,1440,1144]
  score=0.1077 label=343 box=[158,34,1440,1144]
  score=0.0971 label=321 box=[158,34,1440,1144]
  score=0.0838 label=84 box=[158,34,1440,1144]
Detections above 0.3: 2

========================================
Running inference with: model_rfdetr_large
========================================
Model weights for <rfdetr_large> (~518 MB) will be downloaded and processed
if not already available.
Loaded pretrained weights for <rfdetr_large> (533/533 keys, 0 skipped).
Top detections:
  score=0.9274 label=18 box=[156,40,1444,1136]
  score=0.0538 label=18 box=[163,36,1440,1137]
  score=0.0460 label=19 box=[156,40,1444,1136]
  score=0.0403 label=17 box=[163,36,1440,1137]
  score=0.0336 label=11 box=[156,40,1444,1136]
Detections above 0.3: 1

========================================
Running inference with: model_rfdetr_medium
========================================
Model weights for <rfdetr_medium> (~116 MB) will be downloaded and processed
if not already available.
Loaded pretrained weights for <rfdetr_medium> (465/465 keys, 0 skipped).
Top detections:
  score=0.3381 label=17 box=[155,35,1460,1158]
  score=0.2565 label=18 box=[155,35,1460,1158]
  score=0.1775 label=23 box=[155,35,1460,1158]
  score=0.1350 label=23 box=[157,34,1461,1156]
  score=0.0925 label=17 box=[157,34,1461,1156]
Detections above 0.3: 1

========================================
Running inference with: model_rfdetr_nano
========================================
Model weights for <rfdetr_nano> (~116 MB) will be downloaded and processed if
not already available.
Loaded pretrained weights for <rfdetr_nano> (465/465 keys, 0 skipped).
Top detections:
  score=0.3381 label=17 box=[155,35,1460,1158]
  score=0.2565 label=18 box=[155,35,1460,1158]
  score=0.1775 label=23 box=[155,35,1460,1158]
  score=0.1350 label=23 box=[157,34,1461,1156]
  score=0.0925 label=17 box=[157,34,1461,1156]
Detections above 0.3: 1

========================================
Running inference with: model_rfdetr_small
========================================
Model weights for <rfdetr_small> (~116 MB) will be downloaded and processed
if not already available.
Loaded pretrained weights for <rfdetr_small> (465/465 keys, 0 skipped).
Top detections:
  score=0.5835 label=17 box=[154,34,1459,1155]
  score=0.4582 label=18 box=[154,34,1459,1155]
  score=0.3348 label=23 box=[154,34,1459,1155]
  score=0.0870 label=64 box=[154,34,1459,1155]
  score=0.0798 label=20 box=[154,34,1459,1155]
Detections above 0.3: 3

DerrickUnleashed · 2026-06-23T17:12:54Z

Can we implement a vignette for this perhaps? @cregouby

cregouby

praise: This is massive, thanks for it.
todo: See inline.

cregouby · 2026-06-22T06:47:50Z

improvement: Could you move the added line into the ## New models section?

cregouby · 2026-06-22T07:06:01Z

todo (missing): Please add a representative example to the model documentation.
todo (missing): Please mention the attribution in a code comment: # This code is modified from ...
todo: Please fix the merge conflicts.

cregouby · 2026-07-04T09:44:25Z

improvement: Could each test be more specific and go a little deeper than the output shape?
suggestion: You should use expect_coco_model_detects_cat(model) for each and every pretrained model. See the code in

torchvision/tests/testthat/helper-torchvision.R

Lines 69 to 88 in 536a80a

expect_coco_model_detects_cat <- function(model, min_score = 0.25) {
  input <- base_loader("assets/class/cat/cat.2.jpg") %>%
    transform_to_tensor() %>%
    transform_resize(c(640, 640)) %>%
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225)) %>%
    torch::torch_unsqueeze(1)
  model$eval()
  torch::with_no_grad({
    out <- model(input)
  })
  expect_named(out, "detections")
  expect_named(out$detections[[1]], c("boxes", "labels", "scores"), ignore.order = TRUE)
  expect_equal(out$detections[[1]]$boxes$shape[2], 4L)
  labels_vec <- as.integer(out$detections[[1]]$labels$cpu())
  scores_vec <- as.numeric(out$detections[[1]]$scores$cpu())
  expect_true(all(labels_vec >= 0 & labels_vec <= 90))
  top <- which.max(scores_vec)
  expect_equal(labels_vec[top], 17L)
  expect_gt(scores_vec[top], min_score)
}
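As a rough illustration of what this helper would require, the sketch below applies its two top-detection checks (top label equal to 17, top score above min_score) to the top detection of each model in the inference log posted earlier in this thread. It is plain R on hand-copied numbers, not a run of the helper itself: the logged run may have used a different image, preprocessing, or label indexing than the helper, so the outcome is only indicative. The names top_detections and would_pass are made up for this sketch.

```r
# Top detection per model, copied by hand from the inference log in this PR.
top_detections <- data.frame(
  model = c("rfdetr_base", "rfdetr_base_2", "rfdetr_base_o365", "rfdetr_large",
            "rfdetr_medium", "rfdetr_nano", "rfdetr_small"),
  label = c(17L, 17L, 140L, 18L, 17L, 17L, 17L),
  score = c(0.8337, 0.6605, 0.5500, 0.9274, 0.3381, 0.3381, 0.5835)
)

# Same two conditions as the last two expectations of the helper:
# the highest-scoring detection must be label 17 with score > min_score.
min_score <- 0.25
top_detections$would_pass <-
  top_detections$label == 17L & top_detections$score > min_score

print(top_detections)
```

Under these assumptions, the models whose logged top label is not 17 would not satisfy the check as written, so they may need a model-specific expected label or a dedicated helper.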

question: Could we also test with expect_bbox_is_xyxy() from

torchvision/tests/testthat/helper-torchvision.R

Lines 36 to 65 in 536a80a

expect_bbox_is_xyxy <- function(object, width, height) {
  expect_tensor(object)
  N <- object$shape[1]
  expect_tensor_shape(object, c(N, 4))

  x_min <- object[, 1]
  y_min <- object[, 2]
  x_max <- object[, 3]
  y_max <- object[, 4]

  ## bbox range checks
  expect_true((x_min >= 0)$all()$item(),
              info = "All x_min values must be >= 0.")
  expect_true((y_min >= 0)$all()$item(),
              info = "All y_min values must be >= 0.")
  expect_true((x_max <= torch_tensor(width))$all()$item(),
              info = sprintf("All x_max values must be <= width (%s).", width))
  expect_true((y_max <= torch_tensor(height))$all()$item(),
              info = sprintf("All y_max values must be <= height (%s).", height))
  expect_true((x_max > torch_tensor(1))$all()$item(),
              info = "x looks like a relative delta and shall be converted back to image width.")
  expect_true((y_max > torch_tensor(1))$all()$item(),
              info = "y looks like a relative delta and shall be converted back to image height.")
  ## bbox ordering checks
  expect_true((x_min <= x_max)$all()$item(),
              info = "Each x_min must be smaller than its x_max.")
  expect_true((y_min <= y_max)$all()$item(),
              info = "Each y_min must be smaller than its y_max.")

}
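The "relative delta" messages in this helper hint at the conversion the model output may need before it passes. A minimal sketch in plain R follows, assuming the raw boxes are normalized (cx, cy, w, h) in [0, 1] as is common in DETR-style models; whether that holds for this PR's raw output is an assumption. The function name cxcywh_to_xyxy and the 640 x 640 image size are hypothetical, and a real implementation would operate on torch tensors rather than matrices.

```r
# Convert normalized (cx, cy, w, h) boxes to absolute (x_min, y_min, x_max, y_max).
# boxes: numeric matrix with one box per row and 4 columns.
cxcywh_to_xyxy <- function(boxes, width, height) {
  cx <- boxes[, 1]
  cy <- boxes[, 2]
  w  <- boxes[, 3]
  h  <- boxes[, 4]
  cbind(
    x_min = (cx - w / 2) * width,
    y_min = (cy - h / 2) * height,
    x_max = (cx + w / 2) * width,
    y_max = (cy + h / 2) * height
  )
}

# A box centred in the image and covering half of each dimension.
rel_boxes <- matrix(c(0.5, 0.5, 0.5, 0.5), ncol = 4)
abs_boxes <- cxcywh_to_xyxy(rel_boxes, width = 640, height = 640)
print(abs_boxes)
```

After such a conversion, x_max and y_max are in pixels (greater than 1) and each minimum is no larger than its maximum, which is what the range and ordering checks above look for.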

cregouby · 2026-07-04T11:05:01Z

+  batch_size <- value$size(1)
+  n_heads <- value$size(2)
+  head_dim <- value$size(3)
+  len_query <- sampling_locations$size(2)
+  n_levels <- sampling_locations$size(4)
+  n_points <- sampling_locations$size(5)


suggestion: You could use zeallot's %<-% for readability.
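A minimal sketch of what this could look like, assuming the zeallot package is installed. Plain integer vectors stand in for value$shape and sampling_locations$shape so the snippet runs without torch; the sizes are made up for illustration.

```r
# Assumes zeallot is available: install.packages("zeallot")
library(zeallot)

# Hypothetical stand-ins for value$shape and sampling_locations$shape
value_shape <- c(2L, 8L, 32L, 400L)
sampling_shape <- c(2L, 300L, 8L, 4L, 4L, 2L)

# One destructuring assignment per tensor instead of three separate lines
c(batch_size, n_heads, head_dim) %<-% value_shape[1:3]
c(len_query, n_levels, n_points) %<-% sampling_shape[c(2, 4, 5)]

print(c(batch_size = batch_size, n_heads = n_heads, head_dim = head_dim))
print(c(len_query = len_query, n_levels = n_levels, n_points = n_points))
```

In the module itself, the right-hand sides would presumably be value$shape[1:3] and sampling_locations$shape[c(2, 4, 5)].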

cregouby · 2026-07-04T11:15:03Z

+  }
+)
+
+ms_deform_attn_core_pytorch <- function(value, spatial_shapes, sampling_locations, attention_weights,


thought: Very strange name, knowing that there is zero PyTorch in the game... OK, this comes from the code in the roboflow repo. Then please mention the attribution as a code comment.

cregouby · 2026-07-04T15:04:44Z

+ms_deform_attn <- nn_module(
+  "ms_deform_attn",
+  initialize = function(d_model = 256, n_levels = 4, n_heads = 8, n_points = 4) {
+    self$d_model <- d_model
+    self$n_levels <- n_levels
+    self$n_heads <- n_heads
+    self$n_points <- n_points
+    self$sampling_offsets <- nn_linear(d_model, n_heads * n_levels * n_points * 2)
+    self$attention_weights <- nn_linear(d_model, n_heads * n_levels * n_points)
+    self$value_proj <- nn_linear(d_model, d_model)
+    self$output_proj <- nn_linear(d_model, d_model)
+    self$reset_parameters()
+  },
+  reset_parameters = function() {
+    nn_init_constant_(self$sampling_offsets$weight, 0)
+    thetas <- torch_arange(0, self$n_heads - 1, dtype = torch_float32()) * (2 * pi / self$n_heads)
+    grid_init <- torch_stack(list(thetas$cos(), thetas$sin()), dim = -1)
+    grid_init <- grid_init / grid_init$abs()$max(dim = -1, keepdim = TRUE)[[1]]
+    grid_init <- grid_init$view(c(self$n_heads, 1, 1, 2))$'repeat'(c(1, self$n_levels, self$n_points, 1))
+    for (i in seq_len(self$n_points)) {
+      grid_init[, , i, ] <- grid_init[, , i, ] * i
+    }
+    self$sampling_offsets$bias <- nn_parameter(grid_init$view(-1))
+    nn_init_constant_(self$attention_weights$weight, 0)
+    nn_init_constant_(self$attention_weights$bias, 0)
+    nn_init_xavier_uniform_(self$value_proj$weight)
+    nn_init_constant_(self$value_proj$bias, 0)
+    nn_init_xavier_uniform_(self$output_proj$weight)
+    nn_init_constant_(self$output_proj$bias, 0)
+  },
+  forward = function(query, reference_points, input_flatten, input_spatial_shapes,
+                     input_level_start_index, input_padding_mask = NULL,
+                     input_spatial_shapes_hw = NULL) {
+    batch_size <- query$size(1)
+    len_query <- query$size(2)
+    value <- self$value_proj(input_flatten)
+    if (!is.null(input_padding_mask)) {
+      value <- value$masked_fill(input_padding_mask$unsqueeze(3), 0)
+    }
+    sampling_offsets <- self$sampling_offsets(query)$view(c(
+      batch_size, len_query, self$n_heads, self$n_levels, self$n_points, 2
+    ))
+    attention_weights <- self$attention_weights(query)$view(c(
+      batch_size, len_query, self$n_heads, self$n_levels * self$n_points
+    ))
+    if (reference_points$size(-1) == 2) {
+      offset_normalizer <- torch_stack(list(
+        input_spatial_shapes[, 2], input_spatial_shapes[, 1]
+      ), dim = -1)
+      sampling_locations <- reference_points$unsqueeze(3)$unsqueeze(5) +
+        sampling_offsets / offset_normalizer$unsqueeze(1)$unsqueeze(1)$unsqueeze(4)
+    } else {
+      sampling_locations <- reference_points[, , NULL, , NULL, 1:2] +
+        sampling_offsets / self$n_points * reference_points[, , NULL, , NULL, 3:4] * 0.5
+    }
+    attention_weights <- nnf_softmax(attention_weights, dim = -1)
+    value <- value$transpose(2, 3)$contiguous()$view(c(
+      batch_size, self$n_heads, self$d_model %/% self$n_heads, -1
+    ))
+    output <- ms_deform_attn_core_pytorch(
+      value, input_spatial_shapes, sampling_locations, attention_weights,
+      input_spatial_shapes_hw
+    )
+    self$output_proj(output)
+  }
+)


todo: Please factorize this code with

torchvision/R/models-lw_detr.R

Lines 315 to 404 in 8d19bac

lw_detr_ms_deform_attn <- torch::nn_module(
  initialize = function(d_model = 256L, n_levels = 1L, n_heads = 8L, n_points = 4L) {
    self$n_levels <- n_levels
    self$n_heads <- n_heads
    self$n_points <- n_points
    self$head_dim <- d_model %/% n_heads

    self$sampling_offsets <- torch::nn_linear(d_model, n_heads * n_levels * n_points * 2L)
    self$attention_weights <- torch::nn_linear(d_model, n_heads * n_levels * n_points)
    self$value_proj <- torch::nn_linear(d_model, d_model)
    self$output_proj <- torch::nn_linear(d_model, d_model)

    torch::with_no_grad({
      torch::nn_init_constant_(self$sampling_offsets$weight, 0)
      thetas <- torch::torch_arange(n_heads, dtype = torch::torch_float32()) * (2 * pi / n_heads)
      grid_init <- torch::torch_stack(list(thetas$cos(), thetas$sin()), dim = -1L)
      grid_init <- (grid_init / grid_init$abs()$amax(-1L, keepdim = TRUE))
      grid_init <- grid_init$reshape(c(n_heads, 1L, 1L, 2L))$`repeat`(c(1L, n_levels, n_points, 1L))
      for (i in seq_len(n_points)) {
        grid_init[,, i, ] <- grid_init[,, i, ] * i
      }
      self$sampling_offsets$bias <- torch::nn_parameter(grid_init$reshape(c(-1L)))

      torch::nn_init_constant_(self$attention_weights$weight, 0)
      torch::nn_init_constant_(self$attention_weights$bias, 0)
      torch::nn_init_xavier_uniform_(self$value_proj$weight)
      torch::nn_init_constant_(self$value_proj$bias, 0)
      torch::nn_init_xavier_uniform_(self$output_proj$weight)
      torch::nn_init_constant_(self$output_proj$bias, 0)
    })
  },
  forward = function(query, reference_points, input_flatten, spatial_shapes, level_start_index, mask = NULL) {
    bs <- query$size(1L)
    lenq <- query$size(2L)
    nh <- self$n_heads
    nl <- self$n_levels
    np <- self$n_points
    hd <- self$head_dim

    value <- self$value_proj(input_flatten)
    if (!is.null(mask)) {
      value <- value$masked_fill(mask$logical_not()$unsqueeze(-1L), 0)
    }
    offsets <- self$sampling_offsets(query)$reshape(c(bs, lenq, nh, nl, np, 2L))
    attn_w <- torch::nnf_softmax(
      self$attention_weights(query)$reshape(c(bs, lenq, nh, nl * np)),
      dim = -1L
    )

    ref_xy <- reference_points[,,, 1:2]
    ref_wh <- reference_points[,,, 3:4]
    ref_xy_exp <- ref_xy$unsqueeze(3L)$unsqueeze(5L)
    ref_wh_exp <- ref_wh$unsqueeze(3L)$unsqueeze(5L)
    sampling_locs <- ref_xy_exp + offsets / np * ref_wh_exp * 0.5

    val_split <- list()
    for (lvl in seq_len(nl)) {
      h_l <- as.integer(spatial_shapes[lvl, 1])
      w_l <- as.integer(spatial_shapes[lvl, 2])
      s <- level_start_index[lvl] + 1L
      e <- s + h_l * w_l - 1L
      val_l <- value[, s:e, ]$reshape(c(bs, h_l, w_l, nh, hd))
      val_l <- val_l$permute(c(1L, 4L, 5L, 2L, 3L))$reshape(c(bs * nh, hd, h_l, w_l))
      val_split[[lvl]] <- val_l
    }

    sampling_grids <- 2 * sampling_locs - 1

    out_list <- list()
    for (lvl in seq_len(nl)) {
      grid_l <- sampling_grids[,,, lvl, , ]
      grid_l <- grid_l$permute(c(1L, 3L, 2L, 4L, 5L))
      grid_l <- grid_l$reshape(c(bs * nh, lenq, np, 2L))

      sampled <- torch::nnf_grid_sample(
        val_split[[lvl]],
        grid_l,
        mode = "bilinear",
        padding_mode = "zeros",
        align_corners = FALSE
      )
      out_list[[lvl]] <- sampled
    }

    out_vals <- torch::torch_cat(out_list, dim = -1L)
    attn_w2 <- attn_w$permute(c(1L, 3L, 2L, 4L))$reshape(c(bs * nh, 1L, lenq, nl * np))
    output <- (out_vals * attn_w2)$sum(-1L)$reshape(c(bs, nh * hd, lenq))
    self$output_proj(output$permute(c(1L, 3L, 2L)))
  }
)

as the initialization is the same and the forward here only additionally handles 2D reference_points.
suggestion: Rename it detr_ms_deform_attn.
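To make the difference between the two forwards concrete, here is a scalar sketch in plain R of the branch a shared module would need: one reference point, one offset, no tensors. The function name sampling_location is hypothetical; the two formulas mirror the 2D and 4D branches of the diff above, with spatial_shape taken as (height, width) as in the lw_detr code.

```r
# Sampling location for one reference point and one (dx, dy) offset.
# ref: c(x, y) or c(cx, cy, w, h), in relative coordinates.
# spatial_shape: c(height, width) of the feature level.
sampling_location <- function(ref, offset, spatial_shape, n_points) {
  if (length(ref) == 2) {
    # 2D reference point: normalise the offset by (width, height)
    normalizer <- c(spatial_shape[2], spatial_shape[1])
    ref + offset / normalizer
  } else if (length(ref) == 4) {
    # 4D reference box: scale the offset by half the box size
    ref[1:2] + offset / n_points * ref[3:4] * 0.5
  } else {
    stop("reference point must have 2 or 4 coordinates")
  }
}

loc_2d <- sampling_location(c(0.5, 0.5), c(2, 4), spatial_shape = c(20, 40), n_points = 4)
loc_4d <- sampling_location(c(0.5, 0.5, 0.2, 0.4), c(2, 4), spatial_shape = c(20, 40), n_points = 4)
print(loc_2d)
print(loc_4d)
```

Only this step differs between the two modules, which suggests a single detr_ms_deform_attn with a branch on the last dimension of reference_points could serve both models.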

cregouby · 2026-07-04T16:31:55Z

+        out$pred_boxes <- ref_enc
+      }
+    }
+    out


todo: Please make the output data model identical to that of all other object detection models, so that it helps visualization: we expect out to have a $detections element, with each detection item having names c("boxes", "labels", "scores").
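A sketch of the expected structure, using plain R objects as stand-ins for the torch tensors the model would actually return. The single detection reuses the first line of the rfdetr_base log above; has_expected_names is a hypothetical helper written for this illustration only.

```r
# One list element per image in the batch; each holds boxes, labels and scores.
out <- list(
  detections = list(
    list(
      boxes  = matrix(c(156, 37, 1440, 1148), ncol = 4),  # one box per row
      labels = 17L,
      scores = 0.8337
    )
  )
)

# Mirrors the two expect_named() calls of expect_coco_model_detects_cat()
has_expected_names <- function(out) {
  identical(names(out), "detections") &&
    all(vapply(
      out$detections,
      function(d) setequal(names(d), c("boxes", "labels", "scores")),
      logical(1)
    ))
}

print(has_expected_names(out))
```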

cregouby · 2026-07-04T16:35:39Z

+mlp_module <- nn_module(
+  "mlp_module",
+  initialize = function(input_dim, hidden_dim, output_dim, num_layers) {
+    self$num_layers <- num_layers
+    h <- rep(hidden_dim, num_layers - 1)
+    dims <- c(input_dim, h, output_dim)
+    self$layers <- nn_module_list(lapply(seq_len(num_layers), function(i) {
+      nn_linear(dims[i], dims[i + 1])
+    }))
+  },
+  forward = function(x) {
+    for (i in seq_len(self$num_layers)) {
+      x <- self$layers[[i]](x)
+      if (i < self$num_layers) x <- nnf_relu(x)
+    }
+    x
+  }
+)
+


suggestion: Looks like this should be factorized with / reused from

torchvision/R/models-lw_detr.R

Lines 409 to 429 in 8d19bac

# MLP with $layers nn_module_list
.lw_detr_mlp_layers <- torch::nn_module(
  initialize = function(input_dim, hidden_dim, output_dim, num_layers) {
    dims_in <- c(input_dim, rep(hidden_dim, num_layers - 1L))
    dims_out <- c(rep(hidden_dim, num_layers - 1L), output_dim)
    self$layers <- torch::nn_module_list(mapply(
      function(di, do) torch::nn_linear(di, do),
      dims_in,
      dims_out,
      SIMPLIFY = FALSE
    ))
    self$n <- num_layers
  },
  forward = function(x) {
    for (i in seq_len(self$n)) {
      x <- self$layers[[i]](x)
      if (i < self$n) x <- torch::nnf_relu(x)
    }
    x
  }
)
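A quick way to see that the two MLPs build the same stack of linear layers is to compare the (input, output) dimensions each constructor derives. The sketch below does this in plain R, without torch; layer_dims_pr and layer_dims_lw are hypothetical names that reproduce only the dimension arithmetic of mlp_module and .lw_detr_mlp_layers, and the example sizes are made up.

```r
# Dimension arithmetic of mlp_module (this PR)
layer_dims_pr <- function(input_dim, hidden_dim, output_dim, num_layers) {
  dims <- c(input_dim, rep(hidden_dim, num_layers - 1), output_dim)
  idx <- seq_len(num_layers)
  cbind(d_in = dims[idx], d_out = dims[idx + 1])
}

# Dimension arithmetic of .lw_detr_mlp_layers (existing code)
layer_dims_lw <- function(input_dim, hidden_dim, output_dim, num_layers) {
  dims_in <- c(input_dim, rep(hidden_dim, num_layers - 1))
  dims_out <- c(rep(hidden_dim, num_layers - 1), output_dim)
  cbind(d_in = dims_in, d_out = dims_out)
}

# Example: a 3-layer box-regression head, 256 -> 256 -> 256 -> 4
dims_pr <- layer_dims_pr(256, 256, 4, 3)
dims_lw <- layer_dims_lw(256, 256, 4, 3)
print(dims_pr)
print(identical(dims_pr, dims_lw))
```

Since the forward passes are also the same (ReLU after every layer except the last), reusing the existing module should be possible as long as the parameter names line up for weight loading.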

DerrickUnleashed added 6 commits June 17, 2026 02:42

Feat: Add RF-DETR Model

1e752f7

Chore: Update documentation

9536fa0

Update doc integration

7e2616e

Chore Add tests

2a967e8

Chore Update NEWS.md

680ce36

Run document()

d565df7

DerrickUnleashed changed the title ~~Feat/model rfdetr~~ Feat: Add Model RF-DETR Jun 16, 2026

DerrickUnleashed and others added 3 commits June 17, 2026 03:44

Fix: rename bottleneck function to resolve conflicts

9fe3d73

revert removal of links in doc

7e91136

Retrigger workflow

2e60c9e

Update NEWS.md

60e5fbf

cregouby requested changes Jul 4, 2026

DerrickUnleashed wants to merge 10 commits into mlverse:main from DerrickUnleashed:feat/modelRfdetr


