[codex] Update Darwin Core export metadata #1200

Draft
karilint wants to merge 3 commits into main from metadata-darwincore-export

Conversation

@karilint

Collaborator

Summary

  • Enriches the production Darwin Core export metadata for the combined DwC-DP and DwC-A package.
  • Updates the DwC-DP datapackage.json generator with the production package name/title/version, DOI, CC BY 4.0 license metadata, contributors, keywords, resource descriptions, CSV media declarations, field descriptions, missing value metadata, and assertion/provenance-oriented descriptions.
  • Expands the generated EML for both the DwC-DP relational export and the DwC-A taxon trait archive with scientific scope, rights, citation guidance, coverage, methods, curation/provenance notes, update philosophy, and future interoperability notes.
  • Replaces missing CSV values with the standardized \N marker and documents that convention.
  • Keeps the DwC-A meta.xml generator minimal and unchanged so encoding, delimiter, rowType, core, and extension mappings remain tooling-focused.
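
The \N convention described in the summary can be illustrated with a minimal sketch. The names `serializeCsvValue` and `serializeCsvRow` are hypothetical and are not taken from backend/src/services/utils/dwcCsv.ts. The quoting rules are a generic RFC 4180-style assumption, and treating empty strings as missing is also an assumption.

```typescript
// Hypothetical sketch; not the actual dwcCsv.ts implementation.
const MISSING_VALUE = '\\N' // two characters: backslash followed by N

type CsvValue = string | number | boolean | null | undefined

// Serialize one cell. null, undefined, and (by assumption) the empty string
// become the \N marker; other values are quoted only when they contain a
// comma, a double quote, or a line break.
const serializeCsvValue = (value: CsvValue): string => {
  if (value === null || value === undefined || value === '') return MISSING_VALUE
  const text = String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Join serialized cells into one comma-delimited CSV line.
const serializeCsvRow = (values: CsvValue[]): string => values.map(serializeCsvValue).join(',')
```

In a Frictionless Table Schema, a marker like this would typically be declared through the `missingValues` property so that consumers can map it back to null; whether the generator in this PR declares it exactly that way is not shown on this page.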

Standards and compatibility notes

  • Preserves existing CSV schemas, column names, primary keys, foreign keys, identifiers, assertion tables, and the DwC-A Taxon + MeasurementOrFact structure.
  • Uses the DwC-DP profile URI http://rs.tdwg.org/dwc-dp/1.0/dwc-dp-profile.json, tabular-data-resource, format: csv, and mediatype: text/csv.
  • Clarifies that https://doi.org/10.5281/zenodo.4268068 describes the NOW database generally rather than a single frozen export version.
  • Leaves placeholders/descriptions for future ontology IRIs, semantic predicates, agents, protocols, and richer provenance structures without restructuring the current export.

Validation

  • npm run lint:backend
  • npm run tsc:backend
  • cd backend && npm run test:unit
  • Commit hook also passed broader npm run lint and npm run tsc during commit creation.

Notes

A root-level backend test wrapper was not used for final validation because it attempts Docker-backed API test setup; Docker daemon access was unavailable in the sandbox. No database reset or restore commands were run.

@karilint karilint force-pushed the metadata-darwincore-export branch from 6146b4f to ca0e007 Compare June 11, 2026 13:09

Collaborator Author

Follow-up update on the same branch:

  • Darwin Core export menu items now send the currently filtered table state instead of requesting unfiltered archives.
  • Species and locality DwC-A exports post the filtered idList from the table context.
  • Occurrence DwC-A, DwC-DP, and full Darwin Core exports post the cross-search columnFilters and sorting, then the backend resolves those to matching (lid, species_id) occurrence keys.
  • Backend builders keep their existing unfiltered behavior for GET/internal callers, but accept optional filtered IDs/keys for the new POST export flow.
  • Full Darwin Core filtered exports derive the taxon DwC-A subset from the filtered occurrence species IDs.
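
The last bullet, deriving the taxon subset from the filtered occurrences, might look roughly like the sketch below. The `OccurrenceKey` shape mirrors the `lid` / `speciesId` pair quoted later in the review, but the function name `speciesIdsFromOccurrenceKeys` is hypothetical and the actual backend code may differ.

```typescript
// Assumed key shape, modeled on the { lid, speciesId } objects quoted in the review.
type OccurrenceKey = { lid: number; speciesId: number }

// Collect the distinct species IDs referenced by the filtered occurrences.
// Sorting ascending keeps the resulting taxon subset deterministic.
const speciesIdsFromOccurrenceKeys = (keys: OccurrenceKey[]): number[] =>
  Array.from(new Set(keys.map(key => key.speciesId))).sort((a, b) => a - b)
```

The resulting ID list could then be passed to the taxon DwC-A builder in the same way the species export accepts an optional filtered ID list.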

Validation after this update:

  • npm run tsc:backend
  • npm run tsc:frontend
  • npm run lint:backend
  • npm run lint:frontend
  • cd backend && npm run test:unit -- --runTestsByPath src/unit-tests/dwcArchiveExport.test.ts src/unit-tests/dwcArchiveExportLocalities.test.ts src/unit-tests/dwcArchiveExportOccurrences.test.ts src/unit-tests/dwcDataPackageExport.test.ts
  • cd backend && npm run test:unit
  • Amend commit hook also passed full npm run lint and npm run tsc.

@karilint karilint force-pushed the metadata-darwincore-export branch from ca0e007 to b8f3fd7 Compare June 11, 2026 14:02

Collaborator Author

Follow-up filename cleanup:

  • Removed "test" from the browser/download ZIP filenames for all Darwin Core exports:
    • species/taxon DwC-A
    • locality DwC-A
    • occurrence DwC-A
    • occurrence DwC-DP
    • full Darwin Core package
  • Updated both backend Content-Disposition names and frontend download filenames so they stay aligned.
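
One way to keep the backend header and the frontend download name from drifting apart is to derive both from a single function. The sketch below is illustrative only: the export kinds, the filename pattern, and both function names are hypothetical, since the real filenames are not shown on this page.

```typescript
// Hypothetical export kinds and filename pattern; the real names are not shown here.
type DwcExportKind =
  | 'species-dwc-a'
  | 'locality-dwc-a'
  | 'occurrence-dwc-a'
  | 'occurrence-dwc-dp'
  | 'full-darwin-core'

// Single source of truth for the ZIP filename of each export kind.
const dwcExportFilename = (kind: DwcExportKind): string => `now_${kind}_export.zip`

// Header value a route could send so the browser saves the file under that name.
const contentDispositionFor = (kind: DwcExportKind): string =>
  `attachment; filename="${dwcExportFilename(kind)}"`
```

Sharing such a helper between backend and frontend would require a common module, which this PR does not claim to add.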

Validation:

  • npm run tsc:backend
  • npm run tsc:frontend
  • npm run lint:backend
  • npm run lint:frontend
  • Amend commit hook also passed full npm run lint and npm run tsc.

@karilint karilint force-pushed the metadata-darwincore-export branch from b8f3fd7 to 24d4248 Compare June 11, 2026 14:13

Collaborator Author

Follow-up taxonRank fix:

  • DwC-DP occurrence.csv now fills taxonRank using the same resolveTaxonRank logic as the DwC-A taxon export.
  • Added a unit assertion that the occurrence mapping emits taxonRank: species for the existing fixture.

Validation:

  • cd backend && npm run test:unit -- --runTestsByPath src/unit-tests/dwcArchiveExport.test.ts src/unit-tests/dwcDataPackageExport.test.ts
  • npm run tsc:backend
  • npm run lint:backend
  • Amend commit hook also passed full npm run lint and npm run tsc.

@karilint

Collaborator Author

Follow-up CI fix pushed in 7c4199d.

Root cause: API specs still expected the old Darwin Core attachment filenames containing test_export, while the routes now correctly emit production export filenames without test.

Changes:

  • Updated DwC-A, DwC-DP, and full Darwin Core API filename assertions.
  • Removed remaining test export wording from locality and occurrence DwC-A EML snippets, including package ids.

Validation:

  • npm run tsc:backend passed.
  • npm run lint:backend passed.
  • Commit hook passed full npm run lint and npm run tsc.
  • Targeted API specs were attempted locally against now_test / now_log_test; local login returned 500 before reaching export assertions, while CI showed login passed and only filename assertions failed.

Copilot AI left a comment

Contributor

Pull request overview

This PR upgrades the NOW Darwin Core export to production-ready metadata/content (DwC-DP + DwC-A bundle), adds standardized missing-value handling (\N), and introduces filtered export support from the UI by POSTing current table filters/IDs to new backend export endpoints.

Changes:

  • Frontend export menu items now POST selected/filtered IDs or CrossSearch filters when generating Darwin Core exports, and filenames drop the “test” prefix.
  • Backend export generators are enriched with production datapackage/EML metadata, standardized missing values (\N), and optional filtering of exported records.
  • Tests updated to reflect production filenames/metadata and new missing-value serialization.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| frontend/src/components/Species/SpeciesDwcExportMenuItem.tsx | POST filtered species IDs for export + updated filename |
| frontend/src/components/Locality/LocalityDwcExportMenuItem.tsx | POST filtered locality IDs for export + updated filename |
| frontend/src/components/Occurrence/OccurrenceDwcExportMenuItem.tsx | POST CrossSearch filters for occurrence export (with progress polling) |
| frontend/src/components/Occurrence/OccurrenceDwcDpExportMenuItem.tsx | POST CrossSearch filters for DwC-DP export |
| frontend/src/components/Occurrence/OccurrenceFullDarwinCoreExportMenuItem.tsx | POST CrossSearch filters for full bundle export |
| backend/src/services/utils/dwcCsv.ts | Serialize missing values as \N |
| backend/src/services/dwcDataPackageExport.ts | Production datapackage/EML metadata, missing values, filtering support |
| backend/src/services/dwcArchiveExport.ts | Production EML metadata, filtering species export |
| backend/src/services/dwcArchiveExportOccurrences.ts | Filtering support for occurrence archive export |
| backend/src/services/dwcArchiveExportLocalities.ts | Filtering support for locality archive export |
| backend/src/services/crossSearch.ts | New helper to derive occurrence keys from CrossSearch filters |
| backend/src/routes/species.ts | Add POST export route with optional ID filtering |
| backend/src/routes/locality.ts | Add POST export route with optional ID filtering |
| backend/src/routes/occurrence.ts | Add POST export routes with CrossSearch-filtered exports |
| backend/src/unit-tests/dwcCsv.test.ts | Tests for \N missing marker |
| backend/src/unit-tests/dwcDataPackageExport.test.ts | Tests for production datapackage metadata + missing values |
| backend/src/api-tests//dwcArchiveExport.test.ts | Filename/header expectation updates |

Comment on lines 21 to +41
@@ -27,7 +38,7 @@ export const OccurrenceDwcExportMenuItem = ({ handleClose }: { handleClose: () =
       url: `${BACKEND_URL}/occurrence/export/dwc-archive?${new URLSearchParams({ exportId })}`,
       progressUrl: `${BACKEND_URL}/occurrence/export/dwc-archive/progress/${exportId}`,
       filename,
-      fetchOptions,
+      fetchOptions: filteredFetchOptions,
Comment on lines +12 to +20
const parseNumericIds = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map(id => {
    const parsed = typeof id === 'number' ? id : typeof id === 'string' ? parseInt(id, 10) : NaN
    if (!Number.isInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
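
A note on the helper quoted above: `parseInt` accepts a numeric prefix, so a string such as '12abc' parses to 12 and passes the `Number.isInteger` check. A stricter variant is sketched below under the assumption that string IDs are plain non-negative decimal integers. The name `parseStrictIds` is hypothetical and this is not code from the PR.

```typescript
// Hypothetical stricter variant of the quoted parseNumericIds helper.
const parseStrictIds = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map((id: unknown) => {
    // Strings must consist of digits only, so '12abc', '1.5', and '' are rejected.
    // Numbers are passed through and checked below.
    const parsed =
      typeof id === 'number' ? id : typeof id === 'string' && /^\d+$/.test(id) ? Number(id) : NaN
    // isSafeInteger also rejects NaN, fractions, and values beyond 2^53 - 1.
    if (!Number.isSafeInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
```

With this variant, `parseStrictIds([1, '2'])` yields `[1, 2]`, while `parseStrictIds(['12abc'])` throws instead of silently returning `[12]`.
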
Comment on lines +43 to 49
router.post('/export/dwc-archive', requireOneOf([Role.Admin]), async (req, res) => {
  try {
    return await sendDwcArchive(parseNumericIds((req.body as { ids?: unknown }).ids), res)
  } catch (error) {
    return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
  }
})
Comment on lines +18 to +26
const parseNumericIds = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map(id => {
    const parsed = typeof id === 'number' ? id : typeof id === 'string' ? parseInt(id, 10) : NaN
    if (!Number.isInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
Comment on lines +44 to 50
router.post('/export/dwc-archive', requireOneOf([Role.Admin]), async (req, res) => {
  try {
    return await sendDwcArchive(parseNumericIds((req.body as { ids?: unknown }).ids), res)
  } catch (error) {
    return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
  }
})
Comment on lines +47 to +49
const handleExportFilterError = (error: unknown, res: Response) => {
  return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
}
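
The handlers quoted above answer every caught error with 403. In HTTP semantics 403 Forbidden signals an authorization refusal, while 400 Bad Request is the conventional status for malformed input. The sketch below shows one possible way to separate validation failures from other errors. `ExportFilterError`, `ResponseLike`, and `handleExportError` are hypothetical names; `ResponseLike` is a structural stand-in so the example does not depend on Express.

```typescript
// Hypothetical error type that validation helpers could throw for bad input.
class ExportFilterError extends Error {
  constructor(message: string) {
    super(message)
    // Keeps instanceof working when the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, ExportFilterError.prototype)
    this.name = 'ExportFilterError'
  }
}

// Minimal structural stand-in for the parts of an Express Response used here.
type ResponseLike = { status: (code: number) => { send: (body: unknown) => unknown } }

const handleExportError = (error: unknown, res: ResponseLike) => {
  // Invalid filters are the client's mistake: report them as 400 with the message.
  if (error instanceof ExportFilterError) {
    return res.status(400).send({ error: error.message })
  }
  // Anything else is unexpected: report 500 without leaking internal details.
  return res.status(500).send({ error: 'Export failed.' })
}
```

Under this split, the role check performed by `requireOneOf` would remain the only source of 403 responses.
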
Comment on lines +262 to +275
const resultPages = (await getCrossSearchRawSql(
  user,
  undefined,
  undefined,
  validatedColumnFilters,
  validatedSorting
)) as Array<Array<Partial<CrossSearch>>>

const keysById = new Map<string, { lid: number; speciesId: number }>()
for (const row of resultPages.flat()) {
  if (typeof row.lid_now_loc !== 'number' || typeof row.species_id_com_species !== 'number') continue
  const key = { lid: row.lid_now_loc, speciesId: row.species_id_com_species }
  keysById.set(`${key.lid}:${key.speciesId}`, key)
}
Comment on lines +911 to +930
const fetchOccurrencesForDwcDataPackageExport = async (
  occurrenceKeys?: DwcOccurrenceKey[]
): Promise<OccurrenceForDwcDpExport[]> => {
  if (occurrenceKeys && occurrenceKeys.length === 0) return []
  const { nowDb } = await import('../utils/db')
  const occurrences = await nowDb.now_ls.findMany({
    where: occurrenceKeys
      ? {
          OR: occurrenceKeys.map(key => ({
            lid: key.lid,
            species_id: key.speciesId,
          })),
        }
      : undefined,
    orderBy: [{ lid: 'asc' }, { species_id: 'asc' }],
    select: occurrenceSelect,
  })

  return occurrences as unknown as OccurrenceForDwcDpExport[]
}
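
The query quoted above builds one OR branch per occurrence key, so a large filtered result produces a correspondingly large WHERE clause. If that ever became a problem, one option would be to split the keys into batches and run one query per batch. The sketch below covers only the batching helper. `chunk` is a hypothetical name and nothing on this page indicates that the PR does this.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
const chunk = <T>(items: T[], size: number): T[][] => {
  if (!Number.isInteger(size) || size <= 0) throw new Error('size must be a positive integer.')
  const batches: T[][] = []
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size))
  }
  return batches
}
```

Each batch could then feed its own `findMany` call with the same `OR` shape, with the per-batch results concatenated and re-sorted by `lid` and `species_id` to preserve the existing ordering.
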
Comment on lines +334 to +338
taxonID: 'Stable NOW taxon identifier; this joins to dwc-a-taxa/taxon.csv in the full export.',
scientificName: 'Scientific name assembled from curated NOW taxonomic fields.',
scientificNameAuthorship: 'Scientific name authorship where curated.',
taxonRank: 'Taxonomic rank when available; currently reserved for future enrichment.',
identificationVerificationStatus: 'Curated identification status or qualifier.',
2 participants