Pillar 02

Technologies

The data infrastructure, AI model suite, and public dashboards designed to take raw monitoring readings and turn them into operational decisions and open releases.

From Measurement to Decision

Data is only useful if the infrastructure can reason over it.

Pillar 01 is designed to produce a great deal of data: continuous sensor telemetry, robot survey imagery, lab analyses, satellite layers. The technology stack is what makes that data actionable. By design, it indexes everything spatially and temporally, runs the AI inference that turns sensor signals into early-warning alerts, and exposes the result through public APIs and dashboards anyone can use.

Three obligations shape every technology choice in this pillar.

Real-time enough to matter

Robot inference at 5 fps. Sensor anomaly detection at 15-minute resolution. The latency budget is set by the ecological response window, which is days at best for bleaching and minutes for storm-pulse contamination.

Open by default

Every dataset is designed to be queryable through a public API. Every dashboard layer is keyed to peer-reviewed thresholds. Model weights are licensed for release to the research community under an open license, with the single proprietary exception noted under AI Model Suite.

Transferable

Tooling is chosen so a partner site in Year 6 can run the same analysis pipeline on the same data formats without re-implementing the program from scratch. Standards-conformant APIs and well-understood open-source components only.

GIS Database

PostGIS as the spatial backbone.

Every reading, every survey image, and every model prediction is designed to be geolocated. The database is built to handle vector, raster, and time-series data in the same schema, expose it through a standards-conformant public API, and version-control every snapshot the AI models train on.

The database itself is PostGIS on PostgreSQL. The public API is an OGC API - Features endpoint, the open standard for serving geospatial feature data published by the Open Geospatial Consortium. Pollution data is structured under the EPA WQX (Water Quality Exchange) standard, biodiversity records under Darwin Core, and sensor QC under QARTOD. Every model checkpoint is designed to link to a hash-versioned snapshot of its training data, so any published result can be reproduced from its inputs.
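One way to picture the hash-versioned snapshot is as a content hash over the training records, stored next to a hash of the checkpoint weights. The sketch below is illustrative only: the function names, the manifest fields, and the sample records are assumptions, not the program's actual schema or tooling.

```python
import hashlib
import json


def snapshot_id(records):
    """Return a SHA-256 hex digest identifying a training-data snapshot.

    Each record is serialized as canonical JSON (sorted keys, fixed
    separators) and the serialized rows are sorted, so the digest depends
    on content only, not on key order or row order.
    """
    lines = sorted(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records
    )
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def checkpoint_manifest(model_name, weights_bytes, records):
    """Tie a model checkpoint to the snapshot it was trained on (hypothetical fields)."""
    return {
        "model": model_name,
        "weights_sha256": hashlib.sha256(weights_bytes).hexdigest(),
        "training_snapshot_sha256": snapshot_id(records),
    }


# Made-up sample records, for illustration only.
readings = [
    {"station": "A1", "time": "2025-01-01T00:00:00Z", "turbidity_ntu": 1.2},
    {"station": "A1", "time": "2025-01-01T00:15:00Z", "turbidity_ntu": 1.4},
]
manifest = checkpoint_manifest("stress-index", b"placeholder-weights", readings)
print(manifest["training_snapshot_sha256"])
```

Because the digest changes whenever any record changes, a published result that cites the manifest can be checked against the exact inputs it was trained on.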

AI Model Suite

Seven models, one shared dataset.

The AI stack is designed to train on the same data the program will publish openly. Vision models will run on the robot itself, doing benthic and species classification during the dive. Prediction and forecasting models will run server-side, fusing aggregated telemetry into early-warning alerts. The biorefinery process control model is the one exception, retained as proprietary because it is tied to the biorefinery and bioremediation formulations described under Pillar 03.

| Model | Task | Architecture |
| --- | --- | --- |
| Benthic segmentation | Coral, algae, and substrate classification from robot imagery | SegFormer-B2 |
| Species classification | Taxonomic identification at the colony scale | EfficientNet-V2-M |
| Stress index | Pre-bleaching early warning from multi-sensor fusion | XGBoost ensemble |
| Source attribution | Spatially-resolved pollution source apportionment | PMF 5.0 + spatial regression |
| Contaminant prediction | DIN and turbidity inferred from sensor proxies | PLS-R |
| Bloom prediction | Harmful algal bloom forecast at 48 to 72 hours | XGBoost |
| Biorefinery process control | Composition-driven extraction parameter control | PLS-R with online learning |

Benchmark

SegFormer-B2 reaches 78.3% mIoU on Indo-Pacific coral datasets (Jammalamadaka et al., 2023). The same architecture will be fine-tuned on Hawaiian benthic substrate, with results published as the training corpus grows.
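For readers unfamiliar with the metric, mIoU (mean intersection-over-union) averages, across classes, the ratio TP / (TP + FP + FN) computed from per-pixel predictions. The sketch below shows that calculation on a confusion matrix. The three-class toy counts are made up and are not program results.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[t][p] counts pixels whose true class is t and whose
    predicted class is p. Classes that appear in neither the truth nor
    the predictions are skipped rather than counted as zero.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true other
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with classes (coral, algae, substrate); counts are invented.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(round(mean_iou(toy), 3))
```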

Open Releases

Open by default.

The default position for every output of the program is open release under a permissive license. Four categories of work are licensed for publication, each under the license that best fits its use. A narrow retained set, scoped to bioremediation and biorefinery compositions, is described under Pillar 03.

Monitoring datasets

CC0

Continuous water quality telemetry, benthic survey imagery, lab results, and satellite layers, all designed to be queryable through the public API.

Code and APIs

Apache 2.0

The GIS database API, the model training pipelines, and the data ingestion code that will run program operations.

AI model weights

OpenRAIL

Fine-tuned weights for benthic segmentation, species classification, and the open environmental components of source attribution.

Publications and product designs

CC-BY 4.0

Peer-reviewed publications with manuscripts and data archives, plus the STL and design files for every 3D-printed product made from biorefinery filament.
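As an illustration of what "queryable through the public API" could look like for the monitoring datasets, the sketch below builds a request URL using the standard `bbox`, `datetime`, and `limit` query parameters of OGC API - Features, the standard named under GIS Database. The base URL and the collection name are hypothetical placeholders; the program's real endpoint and collection identifiers are not given here.

```python
from urllib.parse import urlencode


def items_url(base_url, collection, bbox=None, datetime_range=None, limit=100):
    """Build an OGC API - Features items request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); datetime_range is a
    (start, end) pair of RFC 3339 timestamps, sent as a closed interval.
    """
    params = {"limit": limit}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime_range is not None:
        start, end = datetime_range
        params["datetime"] = f"{start}/{end}"
    # Keep commas, slashes, and colons readable instead of percent-encoding them.
    query = urlencode(params, safe=",/:")
    return f"{base_url.rstrip('/')}/collections/{collection}/items?{query}"


# Hypothetical endpoint and collection name, for illustration only.
url = items_url(
    "https://api.example.org",
    "water_quality",
    bbox=(-158.3, 21.2, -157.6, 21.7),
    datetime_range=("2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"),
)
print(url)
```

A conformant server would answer such a request with a GeoJSON feature collection, which is what lets a partner site reuse generic geospatial clients instead of program-specific code.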

Threshold Layers

Dashboards keyed to WHO standards.

By design, every threshold on the public dashboard is keyed to the WHO Guidelines for Safe Recreational Water Environments and the WHO Guidelines for Drinking-water Quality. Federal and state regulatory thresholds are still rendered, but as secondary reference layers. The WHO standard is the design target throughout the program because, for almost every contaminant the program is designed to measure, it sits at the lower-concentration, more protective end of the available benchmarks.

The peer-reviewed reasoning behind that choice is laid out under Mission.
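The primary-versus-secondary layering described in this section could be represented along the following lines. This is a minimal sketch under stated assumptions: the class, its field names, and every number in it are hypothetical placeholders, not actual WHO, federal, or state values.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdLayer:
    parameter: str
    unit: str
    primary: float                                  # WHO-keyed design target
    secondary: dict = field(default_factory=dict)   # regulatory reference layers


def classify(layer, value):
    """Report which thresholds a reading exceeds, primary layer first."""
    return {
        "parameter": layer.parameter,
        "exceeds_primary": value > layer.primary,
        "exceeds_secondary": {
            name: value > limit for name, limit in layer.secondary.items()
        },
    }


# Placeholder numbers for illustration only; not real guideline values.
layer = ThresholdLayer(
    parameter="example_contaminant",
    unit="mg/L",
    primary=1.0,
    secondary={"federal": 2.0, "state": 1.5},
)
result = classify(layer, 1.7)
print(result)
```

Keeping the primary threshold as a distinct field, rather than one entry among many, mirrors the stated design choice that regulatory limits are reference layers and not the target.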