Add studyDbIds, sampleDbIds, and germplasm filters to POST /search/calls; add studyDbIds to POST /search/allelematrix; formally document AND-intersection semantics
Summary
- Add studyDbIds to both POST /search/allelematrix and POST /search/calls
- Add sampleDbIds, germplasmDbIds, germplasmNames, and germplasmPUIs to POST /search/calls (these already exist on AlleleMatrixSearchRequest)
- Formally document AND-intersection semantics for all high-level filters on both endpoints
- Add dimensionColumnAggregation parameter to override default column aggregation granularity
Background
An allele matrix is a 2D structure where genotype calls are attached to CallSets:
- Row dimension (variants): Study → VariantSet → Variant → Call
- Column dimension (materials): Study → Germplasm → Sample → CallSet → Call
The column dimension can be entered at any level: callers may filter by germplasmDbIds, sampleDbIds, or callSetDbIds directly. Note that Germplasm objects do not carry a studyDbId, although germplasm can still be filtered by that field.
Currently, /search/calls only supports column selection via callSetDbIds, forcing clients to resolve the full Germplasm → Sample → CallSet chain themselves before querying. /search/allelematrix supports germplasm and sample filters but lacks studyDbIds. Intersection semantics for simultaneous use of multiple filters are undefined on both endpoints.
Proposed Changes
New fields on CallSearchRequest
studyDbIds:
  type: array
  items:
    type: string
  description: >
    Filter results to calls associated with the specified studies.
    The server resolves studyDbIds across both dimensions: to VariantSets on the row
    dimension, and to CallSets on the column dimension.
    Acts as an AND constraint alongside all other filters.
sampleDbIds:
  type: array
  items:
    type: string
  description: >
    Filter results to calls belonging to CallSets derived from the specified samples.
    Acts as an AND constraint alongside all other filters.
germplasmDbIds:
  type: array
  items:
    type: string
  description: >
    Filter results to calls belonging to CallSets derived from samples associated with
    the specified germplasm.
    Acts as an AND constraint alongside all other filters.
germplasmNames:
  type: array
  items:
    type: string
  description: As germplasmDbIds but matched against germplasm names.
germplasmPUIs:
  type: array
  items:
    type: string
  description: As germplasmDbIds but matched against germplasm PUIs.
dimensionColumnAggregation:
  type: string
  enum: [callSet, sample, germplasm]
  description: >
    Override the default column aggregation granularity (see Genotype aggregation level below).
    When provided, the server groups calls at the specified level regardless of which filter
    parameters were used to select the material. For example, filtering by sampleDbIds but
    setting dimensionColumnAggregation to "germplasm" will return one aggregated column per
    Germplasm rather than one per Sample.
New fields on AlleleMatrixSearchRequest
studyDbIds:
  type: array
  items:
    type: string
  description: >
    Filter the matrix to calls associated with the specified studies.
    The server resolves studyDbIds across both dimensions: to VariantSets on the row
    dimension, and to CallSets on the column dimension.
    Acts as an AND constraint alongside all other filters.
dimensionColumnAggregation:
  type: string
  enum: [callSet, sample, germplasm]
  description: >
    Override the default column aggregation granularity (see Genotype aggregation level below).
    When provided, the server groups calls at the specified level regardless of which filter
    parameters were used to select the material. For example, filtering by sampleDbIds but
    setting dimensionColumnAggregation to "germplasm" will return one aggregated column per
    Germplasm rather than one per Sample.
Genotype aggregation level
The biological entity type used to filter the column dimension determines the default granularity at which genotype calls are aggregated and returned:
- callSetDbIds: one column per CallSet (finest granularity, no merging).
- sampleDbIds: calls are grouped per Sample (aggregating all CallSets of that Sample).
- germplasmDbIds / germplasmNames / germplasmPUIs: calls are grouped per Germplasm (aggregating all Samples and their CallSets for that Germplasm).
- studyDbIds (column dimension): calls are grouped per Germplasm within the study.
When multiple filters from different tiers are provided simultaneously, the finest-grained tier governs the default aggregation. This default can be overridden using the dimensionColumnAggregation parameter.
Note: this proposal does not define the merging strategy for conflicting allele calls within an aggregation group (e.g. two CallSets of the same Sample carrying different genotypes). This is left to a follow-up discussion.
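The default-granularity rule above can be sketched in Python as follows. This is an illustrative reading of the proposal, not a reference implementation: the function name resolve_aggregation, the treatment of an empty filter array as "not provided", and the None result when no column filter is present are all assumptions that the proposal does not specify.

```python
from typing import Optional

# Column tiers ordered from finest to coarsest granularity.
TIER_ORDER = ["callSet", "sample", "germplasm"]

# Default aggregation tier implied by each column-dimension filter,
# as listed in the "Genotype aggregation level" section.
FILTER_TIER = {
    "callSetDbIds": "callSet",
    "sampleDbIds": "sample",
    "germplasmDbIds": "germplasm",
    "germplasmNames": "germplasm",
    "germplasmPUIs": "germplasm",
    "studyDbIds": "germplasm",
}


def resolve_aggregation(request: dict) -> Optional[str]:
    """Return the column aggregation tier for a search request body."""
    override = request.get("dimensionColumnAggregation")
    if override is not None:
        if override not in TIER_ORDER:
            raise ValueError(f"unknown aggregation level: {override!r}")
        # An explicit override wins regardless of which filters were used.
        return override

    # Assumption: an empty or missing array counts as "filter not provided".
    tiers = [tier for name, tier in FILTER_TIER.items() if request.get(name)]
    if not tiers:
        # Assumption: with no column filter the default is left undefined here.
        return None
    # The finest-grained tier among the provided filters governs the default.
    return min(tiers, key=TIER_ORDER.index)
```

For instance, under these assumptions a request with both germplasmDbIds and sampleDbIds resolves to "sample", while the same request with dimensionColumnAggregation set to "germplasm" resolves to "germplasm".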
Intersection semantics (AND logic)
All filters stack as AND constraints across and within dimensions:
- Providing only studyDbIds is sufficient to retrieve all calls for those studies (both dimensions implicitly resolved).
- Any additional filter further narrows the result within the study scope.
- Filters from different tiers of the same dimension are intersected: e.g. studyDbIds: ["S1"] + sampleDbIds: ["Samp1"] returns calls from CallSets of Samp1 only if Samp1 belongs to S1; otherwise the result is empty.
- Non-overlapping filter combinations return HTTP 200 with an empty data array (standard BrAPI empty result).
- All pre-existing fields (variantDbIds, variantSetDbIds, callSetDbIds, etc.) retain their current semantics.
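The column-dimension part of these rules can be illustrated with a small in-memory sketch in Python. The CallSetRecord type, the RECORDS sample data, and the select_callsets function are hypothetical and exist only for illustration; a real server would resolve the Study → Germplasm → Sample → CallSet chain against its own storage. The sketch also assumes that values inside a single filter array are alternatives (a record matches if its value appears in the array), while separate filters are combined with AND.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSetRecord:
    """Hypothetical flattened view of one CallSet and its ancestors."""
    callSetDbId: str
    sampleDbId: str
    germplasmDbId: str
    studyDbId: str


# Request filter name -> attribute it is matched against.
FILTER_FIELDS = {
    "callSetDbIds": "callSetDbId",
    "sampleDbIds": "sampleDbId",
    "germplasmDbIds": "germplasmDbId",
    "studyDbIds": "studyDbId",
}

# Illustrative data: two CallSets in study S1, one in study S2.
RECORDS = [
    CallSetRecord("cs1", "Samp1", "G1", "S1"),
    CallSetRecord("cs2", "Samp2", "G1", "S1"),
    CallSetRecord("cs3", "Samp3", "G2", "S2"),
]


def select_callsets(records, request):
    """Return the callSetDbIds that satisfy every provided filter (AND)."""
    # Only filters that are present and non-empty take part in the intersection.
    active = [(name, field) for name, field in FILTER_FIELDS.items()
              if request.get(name)]
    return [
        rec.callSetDbId
        for rec in records
        if all(getattr(rec, field) in request[name] for name, field in active)
    ]
```

With this data, filtering by studyDbIds ["S1"] alone selects cs1 and cs2, adding sampleDbIds ["Samp1"] narrows that to cs1, and combining studyDbIds ["S1"] with sampleDbIds ["Samp3"] selects nothing, which corresponds to the HTTP 200 response with an empty data array.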
Examples
Minimal study query — all calls for one study, grouped by germplasm (default):
{ "studyDbIds": ["study1"] }
Study query with explicit aggregation override — same query, but return one column per Sample instead of per Germplasm:
{ "studyDbIds": ["study1"], "dimensionColumnAggregation": "sample" }
Study + germplasm — calls restricted to the specified germplasm within the study:
{ "studyDbIds": ["study1"], "germplasmDbIds": ["germ1", "germ2"] }
Sample filter with germplasm-level aggregation — select by sample but aggregate up to germplasm:
{ "sampleDbIds": ["samp1", "samp2"], "dimensionColumnAggregation": "germplasm" }
Study + variantSet — row dimension restricted to variants in both the study and the specified VariantSet:
{ "studyDbIds": ["study1"], "variantSetDbIds": ["vs1"] }
Non-overlapping filters — returns 200 + empty data:
{ "studyDbIds": ["study1"], "sampleDbIds": ["samp_from_study2"] }
Affected endpoints
- POST /search/calls: add studyDbIds, sampleDbIds, germplasmDbIds, germplasmNames, germplasmPUIs, dimensionColumnAggregation
- POST /search/allelematrix: add studyDbIds, dimensionColumnAggregation
Notes
- Server implementations MAY return 202 Accepted with a search result URL for large result sets, consistent with existing BrAPI async search behaviour.
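A client therefore needs to handle both an immediate and a deferred response. The Python sketch below shows one way to branch on the status code without performing any network I/O. It assumes the existing BrAPI search convention in which a 202 response carries a searchResultsDbId under result and the results are later fetched from the same search path with that ID appended; the function name interpret_search_response and the returned tuple shape are illustrative only.

```python
def interpret_search_response(status_code: int, body: dict, endpoint: str) -> tuple:
    """Classify a POST /search/... response.

    Returns ("data", rows) when results are available immediately, or
    ("poll", path) when the server deferred the search with 202 Accepted.
    """
    result = body.get("result", {})
    if status_code == 200:
        # Immediate result; an empty list is a valid answer
        # (e.g. non-overlapping filters).
        return ("data", result.get("data", []))
    if status_code == 202:
        # Assumption: the deferred search ID is returned as result.searchResultsDbId.
        return ("poll", f"{endpoint}/{result['searchResultsDbId']}")
    raise ValueError(f"unexpected status code: {status_code}")
```

A caller would pass the endpoint it posted to, for example "/search/calls", and issue a follow-up GET on the returned path when the first element is "poll".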