[codex] Update Darwin Core export metadata #1200
Branch updated from 6146b4f to ca0e007.
Follow-up update on the same branch:
Validation after this update:
Branch updated from ca0e007 to b8f3fd7.
Follow-up filename cleanup:
Validation:
Branch updated from b8f3fd7 to 24d4248.
Follow-up taxonRank fix:
Validation:
Follow-up CI fix pushed in 7c4199d.

Root cause: API specs still expected the old Darwin Core attachment filenames containing
Pull request overview
This PR upgrades the NOW Darwin Core export to production-ready metadata/content (DwC-DP + DwC-A bundle), adds standardized missing-value handling (`\N`), and introduces filtered export support from the UI by POSTing current table filters/IDs to new backend export endpoints.
Changes:
- Frontend export menu items now POST selected/filtered IDs or CrossSearch filters when generating Darwin Core exports, and filenames drop the “test” prefix.
- Backend export generators are enriched with production datapackage/EML metadata, standardized missing values (`\N`), and optional filtering of exported records.
- Tests updated to reflect production filenames/metadata and new missing-value serialization.
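A minimal TypeScript sketch of the `\N` missing-value convention, assuming comma-delimited cells quoted RFC 4180-style. `toDwcCsvCell` and `toDwcCsvRow` are hypothetical names; the actual `dwcCsv.ts` code is not shown in this excerpt.

```typescript
// Marker written in place of a missing value (a backslash followed by N).
const MISSING_VALUE = '\\N'

// Quote a cell when it contains a comma, double quote, or line break; inner quotes are doubled.
const escapeCsvCell = (text: string): string =>
  /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text

// Assumption: only null and undefined count as missing; everything else is stringified.
const toDwcCsvCell = (value: unknown): string =>
  value === null || value === undefined ? MISSING_VALUE : escapeCsvCell(String(value))

const toDwcCsvRow = (values: unknown[]): string => values.map(toDwcCsvCell).join(',')
```

Because only `null` and `undefined` map to `\N` here, an empty string still serializes as an empty cell, so consumers can tell "no value recorded" apart from a recorded empty string.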
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| frontend/src/components/Species/SpeciesDwcExportMenuItem.tsx | POST filtered species IDs for export + updated filename |
| frontend/src/components/Locality/LocalityDwcExportMenuItem.tsx | POST filtered locality IDs for export + updated filename |
| frontend/src/components/Occurrence/OccurrenceDwcExportMenuItem.tsx | POST CrossSearch filters for occurrence export (with progress polling) |
| frontend/src/components/Occurrence/OccurrenceDwcDpExportMenuItem.tsx | POST CrossSearch filters for DwC-DP export |
| frontend/src/components/Occurrence/OccurrenceFullDarwinCoreExportMenuItem.tsx | POST CrossSearch filters for full bundle export |
| backend/src/services/utils/dwcCsv.ts | Serialize missing values as `\N` |
| backend/src/services/dwcDataPackageExport.ts | Production datapackage/EML metadata, missing values, filtering support |
| backend/src/services/dwcArchiveExport.ts | Production EML metadata, filtering species export |
| backend/src/services/dwcArchiveExportOccurrences.ts | Filtering support for occurrence archive export |
| backend/src/services/dwcArchiveExportLocalities.ts | Filtering support for locality archive export |
| backend/src/services/crossSearch.ts | New helper to derive occurrence keys from CrossSearch filters |
| backend/src/routes/species.ts | Add POST export route with optional ID filtering |
| backend/src/routes/locality.ts | Add POST export route with optional ID filtering |
| backend/src/routes/occurrence.ts | Add POST export routes with CrossSearch-filtered exports |
| backend/src/unit-tests/dwcCsv.test.ts | Tests for `\N` missing marker |
| backend/src/unit-tests/dwcDataPackageExport.test.ts | Tests for production datapackage metadata + missing values |
| backend/src/api-tests//dwcArchiveExport.test.ts | Filename/header expectation updates |
```diff
@@ -27,7 +38,7 @@ export const OccurrenceDwcExportMenuItem = ({ handleClose }: { handleClose: () =
   url: `${BACKEND_URL}/occurrence/export/dwc-archive?${new URLSearchParams({ exportId })}`,
   progressUrl: `${BACKEND_URL}/occurrence/export/dwc-archive/progress/${exportId}`,
   filename,
-  fetchOptions,
+  fetchOptions: filteredFetchOptions,
```
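One way `filteredFetchOptions` could be built, as a hedged sketch: the payload field names `columnFilters` and `sorting` are assumptions inferred from the backend's `validatedColumnFilters`/`validatedSorting` variables, and `buildFilteredFetchOptions` is a hypothetical helper rather than code from this PR.

```typescript
// Hypothetical payload shapes; the real CrossSearch filter types are not shown in this excerpt.
type ColumnFilter = { id: string; value: unknown }
type SortingRule = { id: string; desc: boolean }
type ExportFetchOptions = { method?: string; headers?: Record<string, string>; body?: string }

// Turn the existing fetch options into a JSON POST carrying the current table state.
// Existing headers (for example an Authorization header) are preserved.
const buildFilteredFetchOptions = (
  fetchOptions: ExportFetchOptions,
  columnFilters: ColumnFilter[],
  sorting: SortingRule[]
): ExportFetchOptions => ({
  ...fetchOptions,
  method: 'POST',
  headers: { ...fetchOptions.headers, 'Content-Type': 'application/json' },
  body: JSON.stringify({ columnFilters, sorting }),
})
```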
```typescript
const parseNumericIds = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map(id => {
    const parsed = typeof id === 'number' ? id : typeof id === 'string' ? parseInt(id, 10) : NaN
    if (!Number.isInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
```

```typescript
router.post('/export/dwc-archive', requireOneOf([Role.Admin]), async (req, res) => {
  try {
    return await sendDwcArchive(parseNumericIds((req.body as { ids?: unknown }).ids), res)
  } catch (error) {
    return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
  }
})
```
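As quoted, `parseInt(id, 10)` stops at the first non-digit, so a string such as `'12abc'` parses to `12` and passes the `Number.isInteger` check. A stricter variant, sketched here as a hypothetical `parseNumericIdsStrict`, only accepts strings that consist entirely of an optional minus sign and digits, and also rejects values beyond `Number.MAX_SAFE_INTEGER`.

```typescript
// Whole-string match: optional leading minus, then one or more digits, nothing else.
const INTEGER_STRING = /^-?\d+$/

const parseNumericIdsStrict = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map(id => {
    // Strings are converted only after the regex confirms they are plain integers.
    const parsed =
      typeof id === 'number' ? id : typeof id === 'string' && INTEGER_STRING.test(id) ? Number(id) : NaN
    // isSafeInteger rejects NaN, fractions, Infinity, and integers too large to represent exactly.
    if (!Number.isSafeInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
```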
```typescript
const parseNumericIds = (value: unknown): number[] | undefined => {
  if (value === undefined) return undefined
  if (!Array.isArray(value)) throw new Error('ids must be an array.')
  return value.map(id => {
    const parsed = typeof id === 'number' ? id : typeof id === 'string' ? parseInt(id, 10) : NaN
    if (!Number.isInteger(parsed)) throw new Error('ids must contain only integers.')
    return parsed
  })
}
```

```typescript
router.post('/export/dwc-archive', requireOneOf([Role.Admin]), async (req, res) => {
  try {
    return await sendDwcArchive(parseNumericIds((req.body as { ids?: unknown }).ids), res)
  } catch (error) {
    return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
  }
})
```
```typescript
const handleExportFilterError = (error: unknown, res: Response) => {
  return res.status(403).send({ error: error instanceof Error ? error.message : 'Invalid export filters.' })
}
```
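As quoted, any error passed to `handleExportFilterError`, and any error caught by the species/locality routes (whose `try` block also wraps `sendDwcArchive`), is answered with 403 Forbidden, a status conventionally reserved for authorization refusals. A hedged alternative separates malformed input (400 Bad Request) from unexpected failures (500). `ExportFilterError` and `exportErrorResponse` are hypothetical names; with this split, `parseNumericIds` would throw `ExportFilterError` instead of a plain `Error`.

```typescript
// Hypothetical error type for malformed export input (bad ids or CrossSearch filters).
class ExportFilterError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'ExportFilterError'
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

type ExportErrorResponse = { status: number; body: { error: string } }

// Client input problems become 400; anything else (e.g. a database failure) becomes 500.
const exportErrorResponse = (error: unknown): ExportErrorResponse =>
  error instanceof ExportFilterError
    ? { status: 400, body: { error: error.message } }
    : { status: 500, body: { error: 'Export failed.' } }
```

In a route's `catch` block this would be used as `const { status, body } = exportErrorResponse(error); return res.status(status).send(body)`, which also avoids echoing internal error messages to the client for unexpected failures.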
```typescript
const resultPages = (await getCrossSearchRawSql(
  user,
  undefined,
  undefined,
  validatedColumnFilters,
  validatedSorting
)) as Array<Array<Partial<CrossSearch>>>

const keysById = new Map<string, { lid: number; speciesId: number }>()
for (const row of resultPages.flat()) {
  if (typeof row.lid_now_loc !== 'number' || typeof row.species_id_com_species !== 'number') continue
  const key = { lid: row.lid_now_loc, speciesId: row.species_id_com_species }
  keysById.set(`${key.lid}:${key.speciesId}`, key)
}
```
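The key-collection loop above can be pulled out into a pure function and unit-tested without a database. This is a sketch under assumptions: `collectOccurrenceKeys` is a hypothetical name, `DwcOccurrenceKey` is assumed to match the `{ lid, speciesId }` shape stored in `keysById`, and the row type is reduced to the two key columns.

```typescript
// Minimal stand-in for a CrossSearch row; only the two key columns matter here.
type CrossSearchKeyRow = { lid_now_loc?: unknown; species_id_com_species?: unknown }
type DwcOccurrenceKey = { lid: number; speciesId: number }

// Flatten paged rows and keep one key per (locality, species) pair, in first-seen order.
const collectOccurrenceKeys = (pages: CrossSearchKeyRow[][]): DwcOccurrenceKey[] => {
  const keysById = new Map<string, DwcOccurrenceKey>()
  for (const row of pages.flat()) {
    // Skip rows where either key column is missing or not numeric.
    if (typeof row.lid_now_loc !== 'number' || typeof row.species_id_com_species !== 'number') continue
    const key = { lid: row.lid_now_loc, speciesId: row.species_id_com_species }
    keysById.set(`${key.lid}:${key.speciesId}`, key)
  }
  return Array.from(keysById.values())
}
```

Keying the map on `lid:speciesId` collapses rows that CrossSearch may return more than once for the same occurrence, so each occurrence is requested only once downstream.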
```typescript
const fetchOccurrencesForDwcDataPackageExport = async (
  occurrenceKeys?: DwcOccurrenceKey[]
): Promise<OccurrenceForDwcDpExport[]> => {
  if (occurrenceKeys && occurrenceKeys.length === 0) return []
  const { nowDb } = await import('../utils/db')
  const occurrences = await nowDb.now_ls.findMany({
    where: occurrenceKeys
      ? {
          OR: occurrenceKeys.map(key => ({
            lid: key.lid,
            species_id: key.speciesId,
          })),
        }
      : undefined,
    orderBy: [{ lid: 'asc' }, { species_id: 'asc' }],
    select: occurrenceSelect,
  })

  return occurrences as unknown as OccurrenceForDwcDpExport[]
}
```
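The quoted query sends one `OR` branch per occurrence key, so a broadly filtered export can turn into a single very large query. A hedged alternative is to batch the keys. `chunkKeys`, `toOccurrenceWhere`, `byLidThenSpecies`, and any batch size are assumptions for illustration; the `where` shape simply mirrors the quoted `findMany` call.

```typescript
type DwcOccurrenceKey = { lid: number; speciesId: number }
type OccurrenceRowKey = { lid: number; species_id: number }
type OccurrenceWhere = { OR: OccurrenceRowKey[] }

// Split items into consecutive batches of at most `size` elements.
function chunkKeys<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error('size must be a positive integer.')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size))
  return batches
}

// Build the same composite-key filter as the quoted query, for one batch of keys.
const toOccurrenceWhere = (keys: DwcOccurrenceKey[]): OccurrenceWhere => ({
  OR: keys.map(key => ({ lid: key.lid, species_id: key.speciesId })),
})

// Restore the single-query ordering (lid, then species_id) after concatenating batch results.
const byLidThenSpecies = (a: OccurrenceRowKey, b: OccurrenceRowKey): number =>
  a.lid - b.lid || a.species_id - b.species_id
```

Each batch's `where` would go to its own `findMany` call; because results then arrive batch by batch, sorting the concatenated rows with `byLidThenSpecies` keeps the output order identical to the original `orderBy`.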
```typescript
taxonID: 'Stable NOW taxon identifier; this joins to dwc-a-taxa/taxon.csv in the full export.',
scientificName: 'Scientific name assembled from curated NOW taxonomic fields.',
scientificNameAuthorship: 'Scientific name authorship where curated.',
taxonRank: 'Taxonomic rank when available; currently reserved for future enrichment.',
identificationVerificationStatus: 'Curated identification status or qualifier.',
```
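Field descriptions like these map onto Frictionless Table Schema `fields` entries, where `name` is required and `description` is optional. A minimal sketch, assuming the descriptions are kept in a plain record keyed by CSV column name; `toSchemaFields` is hypothetical, and the record copies two of the quoted entries.

```typescript
type SchemaField = { name: string; description?: string }

// Two of the quoted descriptions, keyed by CSV column name.
const fieldDescriptions: Record<string, string> = {
  taxonID: 'Stable NOW taxon identifier; this joins to dwc-a-taxa/taxon.csv in the full export.',
  taxonRank: 'Taxonomic rank when available; currently reserved for future enrichment.',
}

// Emit one schema field per CSV header, in header order; columns without a description get none.
const toSchemaFields = (headers: string[], descriptions: Record<string, string>): SchemaField[] =>
  headers.map((name): SchemaField => {
    // Own-property check avoids picking up inherited keys such as "constructor".
    const description = Object.prototype.hasOwnProperty.call(descriptions, name) ? descriptions[name] : undefined
    return description ? { name, description } : { name }
  })
```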
Summary
- … `datapackage.json` generator with the production package name/title/version, DOI, CC BY 4.0 license metadata, contributors, keywords, resource descriptions, CSV media declarations, field descriptions, missing value metadata, and assertion/provenance-oriented descriptions.
- … `\N` marker and documents that convention.
- … `meta.xml` generator minimal and unchanged so encoding, delimiter, `rowType`, core, and extension mappings remain tooling-focused.

Standards and compatibility notes

- … `http://rs.tdwg.org/dwc-dp/1.0/dwc-dp-profile.json`, `tabular-data-resource`, `format: csv`, and `mediatype: text/csv`.
- `https://doi.org/10.5281/zenodo.4268068` describes the NOW database generally rather than a single frozen export version.

Validation

- `npm run lint:backend`
- `npm run tsc:backend`
- `cd backend && npm run test:unit`
- … `npm run lint` and `npm run tsc` during commit creation.

Notes
A root-level backend test wrapper was not used for final validation because it attempts Docker-backed API test setup; Docker daemon access was unavailable in the sandbox. No database reset or restore commands were run.
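The declarations listed under "Standards and compatibility notes" can be pictured as a descriptor fragment. This is a hedged sketch, not the generator in `dwcDataPackageExport.ts`: the resource name and path are hypothetical, placing the DwC-DP profile URL in the package-level `profile` property is an assumption, and `missingValues` is the Frictionless Table Schema property of that name (its default is `[""]`, so `\N` has to be declared explicitly).

```typescript
type SchemaField = { name: string; description?: string }

// Subset of a Frictionless tabular data resource descriptor.
type TabularResource = {
  name: string
  path: string
  profile: 'tabular-data-resource'
  format: 'csv'
  mediatype: 'text/csv'
  schema: { fields: SchemaField[]; missingValues: string[] }
}

// Declare a CSV resource whose cells use \N for missing values.
const csvResource = (name: string, path: string, fields: SchemaField[]): TabularResource => ({
  name,
  path,
  profile: 'tabular-data-resource',
  format: 'csv',
  mediatype: 'text/csv',
  schema: { fields, missingValues: ['\\N'] },
})

// Hypothetical package with a single occurrence resource.
const datapackage = {
  profile: 'http://rs.tdwg.org/dwc-dp/1.0/dwc-dp-profile.json',
  resources: [csvResource('occurrence', 'occurrence.csv', [{ name: 'occurrenceID' }])],
}
```

Serialized with `JSON.stringify(datapackage, null, 2)`, the marker appears as `"\\N"` in the JSON text, which decodes back to the two-character string `\N` that the CSV writer emits.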