A FastAPI backend that accepts a GIS polygon or bounding box, queries locally stored Sentinel-2 satellite imagery and SoilGrids soil property data, and returns soil nutrient predictions using trained machine learning models.
Live API: https://soilscan-sentinel2-api-production.up.railway.app
Interactive docs: https://soilscan-sentinel2-api-production.up.railway.app/docs
| Target | Classes | Model |
|---|---|---|
| Nitrogen (N) | Low / Medium / High | Random Forest |
| Phosphorus (P) | Low / Medium / High | Random Forest |
| Potassium (K) | Low / Medium / High | SVM (RBF) |
| pH | 4.0 – 7.6 (11-class CPR scale) | Random Forest |
The input polygon (GeoJSON or bounding box) is projected to UTM and filled with a regular grid of points at 10 m spacing (matching Sentinel-2 native resolution). Only points that fall inside the polygon boundary are kept.
Polygon boundary
┌───────────────┐
│ · · · · · · · │
│ · · · · · · · │  ← each · is a (lon, lat) point 10 m apart
│ · · · · · · · │
└───────────────┘
A 1 hectare field produces ~100 sample points. The maximum is capped at 500 points per request (configurable via SOILSCAN_MAX_SAMPLE_POINTS).
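The grid-filling step can be sketched in plain Python. This is a minimal illustration, not the service's implementation: it assumes the polygon ring is already in projected metres (UTM), uses a ray-casting test in place of a geometry library, and caps the result by simple truncation. The names `point_in_polygon` and `grid_points` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a horizontal ray crosses an edge."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_points(ring: List[Point], spacing_m: float = 10.0,
                max_points: int = 500) -> List[Point]:
    """Fill the bounding box with a regular grid and keep the interior points."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    nx = int(math.floor((max(xs) - min(xs)) / spacing_m))
    ny = int(math.floor((max(ys) - min(ys)) / spacing_m))
    points = []
    for j in range(ny):
        for i in range(nx):
            # Sample at cell centres so no point sits exactly on the boundary.
            x = min(xs) + (i + 0.5) * spacing_m
            y = min(ys) + (j + 0.5) * spacing_m
            if point_in_polygon(x, y, ring):
                points.append((x, y))
    # Assumed capping strategy: plain truncation.
    return points[:max_points]


# A 100 m x 100 m square (1 ha) in projected metres.
field = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
print(len(grid_points(field)))  # 100
```

A 300 m × 300 m square would produce 900 candidate points and come back capped at 500.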
For every sample point the extractor performs a coordinate-to-pixel lookup against the local Sentinel-2 GeoTIFF:
- Transform `(lon, lat)` from WGS84 → raster CRS (UTM Zone 51N)
- Convert the UTM coordinate to a pixel `(row, col)` index using rasterio
- Read a 3×3 pixel window (30×30 m neighbourhood) centred on that pixel
- Take `nanmean` across the 9 pixels as the band value for that point
Sentinel-2 raster (10 m pixels)
┌───┬───┬───┬───┬───┐
│   │   │   │   │   │
├───┼───┼───┼───┼───┤
│   │ ░ │ ░ │ ░ │   │
├───┼───┼───┼───┼───┤   ← 3×3 window read around the matched pixel
│   │ ░ │ ✦ │ ░ │   │   ✦ = sample point projected to raster CRS
├───┼───┼───┼───┼───┤
│   │ ░ │ ░ │ ░ │   │
├───┼───┼───┼───┼───┤
│   │   │   │   │   │
└───┴───┴───┴───┴───┘
band_value = nanmean(9 pixels)
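The lookup and window average can be illustrated without any raster library. The sketch below assumes a north-up raster described only by its top-left corner and pixel size, and stores one band as a nested list. The service itself uses rasterio for this step, so `world_to_pixel` and `window_nanmean` are hypothetical stand-ins.

```python
import math
from typing import List, Tuple


def world_to_pixel(x: float, y: float, origin_x: float, origin_y: float,
                   pixel_size: float) -> Tuple[int, int]:
    """Map a projected coordinate to (row, col) for a north-up raster.

    origin_x / origin_y are the coordinates of the raster's top-left corner.
    """
    col = int(math.floor((x - origin_x) / pixel_size))
    row = int(math.floor((origin_y - y) / pixel_size))  # rows grow southward
    return row, col


def window_nanmean(band: List[List[float]], row: int, col: int,
                   half: int = 1) -> float:
    """Mean of the window around (row, col), skipping NaN and out-of-bounds cells."""
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(band) and 0 <= c < len(band[0]):
                v = band[r][c]
                if not math.isnan(v):
                    values.append(v)
    return sum(values) / len(values) if values else float("nan")


nan = float("nan")
band = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, nan, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
row, col = world_to_pixel(x=500015.0, y=1799985.0,
                          origin_x=500000.0, origin_y=1800000.0, pixel_size=10.0)
print(row, col)                        # 1 1
print(window_nanmean(band, row, col))  # 6.0
```

Skipping NaN cells means a masked pixel (cloud or nodata) shrinks the sample instead of poisoning the mean.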
This produces an (N, 12) array of band means and an (N, 12) array of temporal standard deviations across tiles: 24 spectral features in total.
The same coordinate-to-pixel lookup is applied to locally stored SoilGrids v2 GeoTIFFs (250 m resolution). Six soil properties at two depths (0–5 cm, 5–15 cm):

| Property | Unit | What it captures |
|---|---|---|
| `phh2o` | pH | Soil acidity / alkalinity |
| `soc` | dg/kg | Soil organic carbon |
| `nitrogen` | cg/kg | Total nitrogen stock |
| `clay` | g/kg | Clay particle fraction |
| `sand` | g/kg | Sand particle fraction |
| `cec` | mmol/kg | Cation exchange capacity |
This gives 12 SoilGrids features per point (sg_{property}_{depth}).
A local DEM GeoTIFF is sampled at each point to extract 7 terrain attributes via numpy gradients on an 11×11 pixel window. If `dem.tif` is absent, the API automatically downloads the SRTM 30 m tile from AWS public S3 and saves it to the Volume permanently. If that fails, it falls back to the Open-Elevation API for elevation only.

| Feature | Description |
|---|---|
| `elevation_m` | Elevation above sea level |
| `slope_deg` | Steepness of terrain |
| `aspect_deg` | Direction the slope faces (0=North, clockwise) |
| `twi` | Topographic Wetness Index: proxy for soil moisture accumulation |
| `curvature` | Surface concavity/convexity |
| `northness` | cos(aspect): how north-facing the slope is |
| `eastness` | sin(aspect): how east-facing the slope is |
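To make the aspect convention concrete, here is a simplified sketch that derives slope, aspect, northness and eastness from central differences on a 3×3 elevation window. It is an assumption-laden stand-in for the 11×11 numpy-gradient approach described above: the function name `terrain_attributes` is hypothetical, and TWI and curvature are left out.

```python
import math
from typing import Dict, List


def terrain_attributes(z: List[List[float]], cell_size_m: float) -> Dict[str, float]:
    """Terrain attributes at the centre of a 3x3 elevation window.

    Row 0 is the northern row; column 0 is the western column.
    """
    dz_dx = (z[1][2] - z[1][0]) / (2 * cell_size_m)  # change towards the east
    dz_dy = (z[0][1] - z[2][1]) / (2 * cell_size_m)  # change towards the north
    slope_rad = math.atan(math.hypot(dz_dx, dz_dy))
    # Aspect is the compass bearing of the downslope direction
    # (0 = north, clockwise). It is not meaningful on perfectly flat ground.
    aspect_deg = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    aspect_rad = math.radians(aspect_deg)
    return {
        "elevation_m": z[1][1],
        "slope_deg": math.degrees(slope_rad),
        "aspect_deg": aspect_deg,
        "northness": math.cos(aspect_rad),
        "eastness": math.sin(aspect_rad),
    }


# Ground that drops from west to east, so the slope faces east.
window = [
    [12.0, 11.0, 10.0],
    [12.0, 11.0, 10.0],
    [12.0, 11.0, 10.0],
]
attrs = terrain_attributes(window, cell_size_m=30.0)
print(round(attrs["aspect_deg"], 1), round(attrs["eastness"], 3))
```

Encoding aspect as a cosine/sine pair avoids the discontinuity between 359° and 0°, which is presumably why both `northness` and `eastness` appear alongside `aspect_deg`.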
Ten spectral indices are derived from the raw band values at each point:
| Index | Formula | Captures |
|---|---|---|
| NDVI | (B08−B04)/(B08+B04) | Vegetation density |
| EVI | 2.5×(B08−B04)/(B08+6×B04−7.5×B02+1) | Canopy greenness (soil-adjusted) |
| SAVI | 1.5×(B08−B04)/(B08+B04+0.5) | Vegetation with soil correction |
| MSAVI | (2×B08+1−√((2×B08+1)²−8×(B08−B04)))/2 | Modified soil adjustment |
| NDRE | (B8A−B05)/(B8A+B05) | Chlorophyll / nitrogen stress |
| CHL-re | (B8A/B05)−1 | Canopy chlorophyll content |
| BSI | ((B11+B04)−(B08+B02))/((B11+B04)+(B08+B02)) | Bare soil exposure |
| BI | √((B04²+B08²)/2) | Overall surface brightness |
| NDWI | (B03−B08)/(B03+B08) | Surface water / moisture |
| NDMI | (B08−B11)/(B08+B11) | Dry matter / canopy water |
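The table translates directly into code. The sketch below assumes band values are surface reflectances scaled to 0–1 (the constants in EVI and SAVI only make sense at that scale) and omits zero-division guards for brevity. `spectral_indices` is a hypothetical name.

```python
import math
from typing import Dict


def spectral_indices(b: Dict[str, float]) -> Dict[str, float]:
    """Compute the ten indices from reflectances keyed by band name."""
    nir, red, green, blue = b["B08"], b["B04"], b["B03"], b["B02"]
    red_edge, nir_narrow, swir1 = b["B05"], b["B8A"], b["B11"]
    return {
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "MSAVI": (2 * nir + 1
                  - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "NDRE": (nir_narrow - red_edge) / (nir_narrow + red_edge),
        "CHL-re": nir_narrow / red_edge - 1,
        "BSI": ((swir1 + red) - (nir + blue)) / ((swir1 + red) + (nir + blue)),
        "BI": math.sqrt((red ** 2 + nir ** 2) / 2),
        "NDWI": (green - nir) / (green + nir),
        "NDMI": (nir - swir1) / (nir + swir1),
    }


# Illustrative reflectances for a vegetated pixel (made-up values).
pixel = {"B02": 0.04, "B03": 0.08, "B04": 0.06, "B05": 0.12,
         "B08": 0.40, "B8A": 0.42, "B11": 0.20}
indices = spectral_indices(pixel)
print(round(indices["NDVI"], 3))
```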
[ B01…B12 (12) ] + [ B01_std…B12_std (12) ] + [ temp, humidity, altitude (3) ]
+ [ elevation…eastness (7) ] + [ sg_phh2o…sg_cec (12) ]
+ [ NDVI…NDMI (10) ] + [ crop_type (1, one-hot encoded inside pipeline) ]
= 57 input features
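As a sanity check on the count, the feature groups can be enumerated explicitly. The column names here are illustrative assumptions: the band list is the standard 12-band Sentinel-2 L2A set, and the depth labels `0-5cm` / `5-15cm` are hypothetical placeholders for the `{depth}` part of `sg_{property}_{depth}`.

```python
from typing import List

BANDS = ["B01", "B02", "B03", "B04", "B05", "B06",
         "B07", "B08", "B8A", "B09", "B11", "B12"]
CLIMATE = ["temp", "humidity", "altitude"]
TERRAIN = ["elevation_m", "slope_deg", "aspect_deg", "twi",
           "curvature", "northness", "eastness"]
SOIL_PROPERTIES = ["phh2o", "soc", "nitrogen", "clay", "sand", "cec"]
SOIL_DEPTHS = ["0-5cm", "5-15cm"]  # hypothetical depth labels
INDICES = ["NDVI", "EVI", "SAVI", "MSAVI", "NDRE",
           "CHL-re", "BSI", "BI", "NDWI", "NDMI"]


def feature_names() -> List[str]:
    """List the input columns in the order given by the feature summary."""
    names = list(BANDS)
    names += [f"{band}_std" for band in BANDS]
    names += CLIMATE
    names += TERRAIN
    names += [f"sg_{prop}_{depth}"
              for prop in SOIL_PROPERTIES for depth in SOIL_DEPTHS]
    names += INDICES
    names.append("crop_type")  # categorical, expanded inside the pipeline
    return names


print(len(feature_names()))  # 57
```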
The sklearn Pipeline embedded in each .joblib model handles StandardScaler normalisation and OneHotEncoding automatically, so no manual preprocessing is needed at inference time.
Each of the four models runs independently on all N sample points:
point_1 → Low N, Medium P, Low K, pH 6.4
point_2 → Low N, Medium P, Low K, pH 6.0
point_3 → Low N, High P, Low K, pH 6.4
...
──────────────────────────────────────────────────
polygon → dominant: Low N · Medium P · Low K · pH 6.4
distribution: N={Low:1.0} P={Low:0.0, Medium:0.67, High:0.33} ...
The response includes:
- `dominant_class`: majority prediction across all points
- `class_distribution`: fraction of points per class (spatial variability within the field)
- `mean_probability`: average model confidence per class
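The three response fields can be derived from per-point predictions roughly as follows. This is a sketch under assumptions: `aggregate` is a hypothetical helper, per-point probabilities are passed as dicts keyed by class name, and values are rounded to two decimals for display.

```python
from collections import Counter
from typing import Dict, List


def aggregate(labels: List[str],
              probabilities: List[Dict[str, float]]) -> Dict[str, object]:
    """Collapse per-point predictions into one polygon-level summary."""
    counts = Counter(labels)
    n = len(labels)
    classes = sorted({c for p in probabilities for c in p})
    return {
        # Most frequent predicted label across all sample points.
        "dominant_class": counts.most_common(1)[0][0],
        # Share of points assigned to each class.
        "class_distribution": {c: round(counts.get(c, 0) / n, 2) for c in classes},
        # Average predicted probability per class.
        "mean_probability": {
            c: round(sum(p.get(c, 0.0) for p in probabilities) / n, 2)
            for c in classes
        },
    }


# Phosphorus predictions for three sample points (made-up probabilities).
labels = ["Medium", "Medium", "High"]
probabilities = [
    {"Low": 0.1, "Medium": 0.6, "High": 0.3},
    {"Low": 0.2, "Medium": 0.5, "High": 0.3},
    {"Low": 0.0, "Medium": 0.4, "High": 0.6},
]
summary = aggregate(labels, probabilities)
print(summary["dominant_class"])      # Medium
print(summary["class_distribution"])  # {'High': 0.33, 'Low': 0.0, 'Medium': 0.67}
```

`class_distribution` and `mean_probability` answer different questions: the first counts hard votes, the second keeps the models' uncertainty, so a field can be 100% "Low" by vote while the mean probability for "Low" is only moderately high.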
GET /health
→ { "status": "ok" }
GET /predict?minlon=120.590&minlat=16.455&maxlon=120.600&maxlat=16.465&crop_type=cabbage
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `minlon` | float | yes | – | West boundary longitude |
| `minlat` | float | yes | – | South boundary latitude |
| `maxlon` | float | yes | – | East boundary longitude |
| `maxlat` | float | yes | – | North boundary latitude |
| `crop_type` | string | no | `"unknown"` | e.g. `cabbage`, `tomato`, `potato` |
| `temperature_c` | float | no | `18.0` | Air temperature in °C |
| `humidity_percent` | float | no | `80.0` | Relative humidity % |
| `sample_spacing_m` | float | no | `10.0` | Grid spacing in metres (5–100) |
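A client can assemble the query string with the standard library. The helper below is a sketch: `predict_url` is a hypothetical name, and the client-side checks simply mirror the table (the API itself reports invalid input with a 422).

```python
from urllib.parse import urlencode

BASE_URL = "https://soilscan-sentinel2-api-production.up.railway.app"


def predict_url(minlon: float, minlat: float, maxlon: float, maxlat: float,
                **optional) -> str:
    """Build a GET /predict URL; optional keys mirror the parameter table."""
    if not (minlon < maxlon and minlat < maxlat):
        raise ValueError("bounding box must satisfy minlon < maxlon and minlat < maxlat")
    spacing = optional.get("sample_spacing_m")
    if spacing is not None and not 5 <= spacing <= 100:
        raise ValueError("sample_spacing_m must be between 5 and 100 metres")
    params = {"minlon": minlon, "minlat": minlat,
              "maxlon": maxlon, "maxlat": maxlat}
    params.update(optional)
    return f"{BASE_URL}/predict?{urlencode(params)}"


url = predict_url(120.590, 16.455, 120.600, 16.465, crop_type="cabbage")
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` or any HTTP client; the sketch stops short of making the network call.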
{
"polygon": {
"type": "Polygon",
"coordinates": [
[[120.596, 16.462], [120.608, 16.462], [120.608, 16.471], [120.596, 16.471], [120.596, 16.462]]
]
},
"crop_type": "cabbage",
"temperature_c": 18.0,
"humidity_percent": 80.0,
"sample_spacing_m": 10.0
}

Response:

{
"nitrogen": { "dominant_class": "Low (<11 mg/kg)", "class_distribution": {...}, "mean_probability": {...} },
"phosphorus": { "dominant_class": "High (>25 mg/kg)", "class_distribution": {...}, "mean_probability": {...} },
"potassium": { "dominant_class": "Medium (78-156 mg/kg)", "class_distribution": {...}, "mean_probability": {...} },
"ph": { "dominant_class": "6.0", "class_distribution": {...}, "mean_probability": {...} },
"sample_count": 143,
"polygon_area_ha": 1.43,
"warnings": []
}

| Code | Meaning |
|---|---|
| `422` | Invalid polygon or bbox |
| `503` | Sentinel-2 data not found |
New Project → Deploy from GitHub repo → select this repo. Railway builds via Dockerfile.
New → Volume → mount path `/mnt/soilscan-data` → attach to service.
| Variable | Value |
|---|---|
| `SOILSCAN_SENTINEL2_DIR` | `/mnt/soilscan-data/sentinel2` |
| `SOILSCAN_SOILGRIDS_DIR` | `/mnt/soilscan-data/soilgrids` |
| `SOILSCAN_DEM_PATH` | `/mnt/soilscan-data/dem/dem.tif` |
| `SOILSCAN_ADMIN_TOKEN` | `<your-secret-token>` |
All admin endpoints require the X-Admin-Token header.
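For scripted use, a request carrying the header might be built like this with the standard library. The helper name `admin_request` and the base URL are placeholders; the sketch only constructs the request object and does not send it.

```python
import json
import urllib.request
from typing import Optional


def admin_request(base_url: str, path: str, token: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an admin request: GET without a payload, POST with a JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"X-Admin-Token": token}
    if data is not None:
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


# Placeholder host and token, for illustration only.
list_files = admin_request("http://localhost:8000", "/admin/files", "my-token")
download = admin_request(
    "http://localhost:8000", "/admin/download", "my-token",
    payload={"url": "<drive-link>", "target": "bands_mean"},
)
print(list_files.get_method(), download.get_method())  # GET POST
```

Passing either object to `urllib.request.urlopen` would send it.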
Upload preprocessed Sentinel-2 files (Google Drive or direct URL):
POST /admin/download
X-Admin-Token: <token>
{ "url": "<drive-link>", "target": "bands_mean" }
{ "url": "<drive-link>", "target": "bands_std" }

Upload SoilGrids as a zip:
POST /admin/unzip
X-Admin-Token: <token>
{ "url": "<drive-link>", "dest_dir": "soilgrids" }

Then fix any Windows path issues (if the zip was created on Windows):
POST /admin/fix-paths
X-Admin-Token: <token>

The DEM is auto-downloaded on the first predict request, so no manual upload is needed.
Check what's on the Volume:
GET /admin/files
GET /admin/ls

The raw .SAFE tiles (on the order of a gigabyte each) must be preprocessed into compact GeoTIFFs before upload:
python scripts/preprocess_sentinel2.py \
--safe-dir D:/path/to/SAFE/tiles \
--out-dir data/sentinel2 \
--aoi 120.3 16.2 120.85 16.85
python scripts/clip_sentinel2.py \
--in-dir data/sentinel2 \
--out-dir data/sentinel2_clipped

Upload `data/sentinel2_clipped/bands_mean.tif` and `bands_std.tif` to Google Drive, then use `POST /admin/download`.
pip install -r requirements.txt
hypercorn main:app --reload
# API docs: http://localhost:8000/docs

Place data files at `data/sentinel2/`, `data/soilgrids/`, `data/dem/` or set the `SOILSCAN_*` env vars.
| Variable | Default | Description |
|---|---|---|
| `SOILSCAN_SENTINEL2_DIR` | `data/sentinel2` | Path to preprocessed S2 GeoTIFFs |
| `SOILSCAN_SOILGRIDS_DIR` | `data/soilgrids` | Path to SoilGrids GeoTIFFs |
| `SOILSCAN_DEM_PATH` | `data/dem/dem.tif` | Path to DEM GeoTIFF |
| `SOILSCAN_MODELS_DIR` | `models` | Path to .joblib model files |
| `SOILSCAN_MAX_SAMPLE_POINTS` | `500` | Cap on grid points per request |
| `SOILSCAN_DEFAULT_TEMPERATURE_C` | `18.0` | Fallback air temperature (°C) |
| `SOILSCAN_DEFAULT_HUMIDITY_PERCENT` | `80.0` | Fallback relative humidity (%) |
| `SOILSCAN_ADMIN_TOKEN` | (unset) | Token for /admin/* endpoints |
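One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical and only illustrate the mapping; the application may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    sentinel2_dir: str
    soilgrids_dir: str
    dem_path: str
    models_dir: str
    max_sample_points: int
    default_temperature_c: float
    default_humidity_percent: float
    admin_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read SOILSCAN_* variables, falling back to the documented defaults."""
    return Settings(
        sentinel2_dir=env.get("SOILSCAN_SENTINEL2_DIR", "data/sentinel2"),
        soilgrids_dir=env.get("SOILSCAN_SOILGRIDS_DIR", "data/soilgrids"),
        dem_path=env.get("SOILSCAN_DEM_PATH", "data/dem/dem.tif"),
        models_dir=env.get("SOILSCAN_MODELS_DIR", "models"),
        max_sample_points=int(env.get("SOILSCAN_MAX_SAMPLE_POINTS", "500")),
        default_temperature_c=float(env.get("SOILSCAN_DEFAULT_TEMPERATURE_C", "18.0")),
        default_humidity_percent=float(env.get("SOILSCAN_DEFAULT_HUMIDITY_PERCENT", "80.0")),
        admin_token=env.get("SOILSCAN_ADMIN_TOKEN"),  # None when unset
    )


defaults = load_settings({})
print(defaults.max_sample_points, defaults.admin_token)  # 500 None
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"SOILSCAN_MAX_SAMPLE_POINTS": "200"})`.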