Environmental · LLM

Water Advisor AI

Consumer water-quality advisor for US households. An address resolves to the local utility's EPA Consumer Confidence Report, an LLM parses the report into structured records, and the user gets a plain-language summary with recommended treatment, with every parse audited end to end.

Tech Stack: Next.js, TypeScript, Tailwind, PostgreSQL, OpenAI

The Problem

Every US public water utility publishes a Consumer Confidence Report (CCR) once a year under EPA mandate. These reports cover contaminants, treatment, and source water — but they are long, inconsistently formatted PDFs written for regulators, not consumers.

For an ordinary household, two practical questions are hard to answer. Which utility actually serves my address? And what does this PDF mean for what comes out of my tap?

A ZIP code is not enough on its own. Many ZIPs straddle multiple municipalities, so the same five-digit input can map to two or more utilities with very different water profiles. Address-level resolution is needed before the report even comes into play.

On top of that, an LLM-driven parser is only useful if you can trust the output. The brief required structured, auditable extraction — not free-text summaries — so quality, cost, and model behaviour can be inspected per request.

The Solution

Water Advisor AI takes a US address (with ZIP fallback), resolves the responsible water utility, retrieves the utility's latest Consumer Confidence Report, and parses it into a structured record with an LLM. The user sees a plain-language summary plus recommended treatment for their water.
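The address-first flow with ZIP fallback can be sketched roughly as follows. This is a minimal illustration, not the project's code: `resolveUtility`, `Lookups`, and the `Resolution` shape are hypothetical names, and the two lookup functions stand in for whatever geocoding and service-area data the real system uses.

```typescript
// Hypothetical types: none of these names come from the project itself.
interface Utility {
  id: string;
  name: string;
}

type Resolution =
  | { kind: "resolved"; utility: Utility; via: "address" | "zip" }
  | { kind: "ambiguous"; candidates: Utility[] } // ZIP maps to several utilities
  | { kind: "not_found" };

interface Lookups {
  // Address-level lookup (assumed to return at most one utility).
  byAddress: (address: string) => Utility | undefined;
  // ZIP-level lookup; one ZIP may return several utilities.
  byZip: (zip: string) => Utility[];
}

function resolveUtility(
  input: { address?: string; zip?: string },
  lookups: Lookups,
): Resolution {
  // Prefer the address: it points at a single service area.
  if (input.address) {
    const utility = lookups.byAddress(input.address);
    if (utility) return { kind: "resolved", utility, via: "address" };
  }
  // Fall back to ZIP, but never guess when the ZIP spans several utilities.
  if (input.zip) {
    const candidates = lookups.byZip(input.zip);
    const [only] = candidates;
    if (candidates.length === 1 && only) {
      return { kind: "resolved", utility: only, via: "zip" };
    }
    if (candidates.length > 1) return { kind: "ambiguous", candidates };
  }
  return { kind: "not_found" };
}
```

Returning an explicit `ambiguous` result, rather than picking the first candidate, is one way to let the interface ask the user for a street address when a ZIP alone is not enough.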

The parser produces structured fields — provider, report year, source type (groundwater / surface / mixed), hardness, TDS, chlorine with units, top concerns, recommended treatments, summary, and parser notes — rather than free text.
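As a rough illustration, the structured record could be typed and validated along these lines. The field names, the `Measurement` type, and the use of `null` for missing readings are assumptions made for this sketch; only the list of fields comes from the description above.

```typescript
// Illustrative shape only: field names and the Measurement type are assumptions.
type SourceType = "groundwater" | "surface" | "mixed";

interface Measurement {
  value: number;
  unit: string; // kept with every value, e.g. "mg/L"
}

interface ParsedReport {
  provider: string;
  reportYear: number;
  sourceType: SourceType;
  hardness: Measurement | null; // null when the report omits the figure
  tds: Measurement | null;
  chlorine: Measurement | null;
  topConcerns: string[];
  recommendedTreatments: string[];
  summary: string;
  parserNotes: string;
}

function isMeasurementOrNull(v: unknown): v is Measurement | null {
  if (v === null) return true;
  if (typeof v !== "object") return false; // also rejects undefined
  const m = v as Record<string, unknown>;
  return (
    typeof m["value"] === "number" &&
    Number.isFinite(m["value"]) &&
    typeof m["unit"] === "string"
  );
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Runtime guard for JSON returned by the model: reject anything off-schema
// instead of storing a partially valid record.
function isParsedReport(v: unknown): v is ParsedReport {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  const source = r["sourceType"];
  return (
    typeof r["provider"] === "string" &&
    Number.isInteger(r["reportYear"]) &&
    (source === "groundwater" || source === "surface" || source === "mixed") &&
    isMeasurementOrNull(r["hardness"]) &&
    isMeasurementOrNull(r["tds"]) &&
    isMeasurementOrNull(r["chlorine"]) &&
    isStringArray(r["topConcerns"]) &&
    isStringArray(r["recommendedTreatments"]) &&
    typeof r["summary"] === "string" &&
    typeof r["parserNotes"] === "string"
  );
}
```

A guard like this would typically run on `JSON.parse` output before anything is written to the database, so that off-schema model output fails loudly rather than silently.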

Underneath, the data model is normalised: one row per parsed report, plus a side table holding one row per contaminant linked to the report. That keeps multi-provider and multi-year queries tractable as coverage grows.
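To show why the two-table shape keeps multi-year queries simple, here is a small in-memory sketch. The row types and column names (`reportId`, `reportYear`, and so on) are hypothetical; in the real system the same join would be expressed in SQL against PostgreSQL.

```typescript
// Hypothetical row shapes mirroring "one row per report, one row per contaminant".
interface ReportRow {
  id: number;
  provider: string;
  reportYear: number;
}

interface ContaminantRow {
  reportId: number; // foreign key to ReportRow.id
  name: string;
  value: number;
  unit: string;
}

interface HistoryPoint {
  year: number;
  value: number;
  unit: string;
}

// Year-by-year readings of one contaminant for one provider.
function contaminantHistory(
  reports: ReportRow[],
  contaminants: ContaminantRow[],
  provider: string,
  name: string,
): HistoryPoint[] {
  // Index the provider's reports by id so each contaminant row joins in O(1).
  const yearByReportId = new Map<number, number>();
  for (const r of reports) {
    if (r.provider === provider) yearByReportId.set(r.id, r.reportYear);
  }
  const history: HistoryPoint[] = [];
  for (const c of contaminants) {
    const year = yearByReportId.get(c.reportId);
    if (year !== undefined && c.name === name) {
      history.push({ year, value: c.value, unit: c.unit });
    }
  }
  return history.sort((a, b) => a.year - b.year);
}
```

Because the unit travels with each contaminant row, a change of unit between years stays visible in the result instead of being flattened into a single column.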

Every parse writes audit columns alongside the data: model used, input tokens, output tokens, and USD cost to six decimals, with a parsed-at timestamp. Quality and economics can be inspected per call.
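A sketch of how such an audit record might be assembled is shown below. The `Pricing` shape and the per-million-token rates are assumptions supplied by the caller, not published prices, and `buildAudit` is a hypothetical helper name.

```typescript
// Caller-supplied rates; the values are configuration, not real price data.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface ParseAudit {
  modelUsed: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: string; // fixed six decimals, suitable for a NUMERIC column
  parsedAt: string; // ISO 8601 timestamp
}

function buildAudit(
  modelUsed: string,
  usage: { inputTokens: number; outputTokens: number },
  pricing: Pricing,
  now: Date = new Date(),
): ParseAudit {
  const cost =
    (usage.inputTokens * pricing.inputPerMillionUsd +
      usage.outputTokens * pricing.outputPerMillionUsd) /
    1e6;
  return {
    modelUsed,
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    costUsd: cost.toFixed(6), // six decimals keeps sub-cent costs visible
    parsedAt: now.toISOString(),
  };
}
```

Six decimals matter here because a single parse can cost a fraction of a cent; rounding to two decimals would record most calls as zero.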

The stack is Next.js (App Router) + TypeScript + Tailwind on the application side, with PostgreSQL for data. The production LLM is OpenAI, and the parser is built to be model-agnostic: each row records `model_used`, so swapping providers is a configuration change, not a re-architecture.
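One way to get that kind of model-agnostic behaviour is a small provider interface selected by configuration. The sketch below is an assumption about how this could look, not the project's implementation: `LlmClient`, `ClientRegistry`, and `createParser` are hypothetical names, and no real provider SDK is called.

```typescript
// What every provider adapter must return, regardless of vendor.
interface ParseResult {
  json: string; // raw structured output from the model
  inputTokens: number;
  outputTokens: number;
}

interface LlmClient {
  parse(reportText: string): Promise<ParseResult>;
}

type ClientFactory = (model: string) => LlmClient;

// Provider name -> adapter factory. Each adapter would wrap one vendor SDK.
type ClientRegistry = { [provider: string]: ClientFactory | undefined };

interface ParserConfig {
  provider: string;
  model: string;
}

function createParser(registry: ClientRegistry, config: ParserConfig) {
  const factory = registry[config.provider];
  if (!factory) throw new Error(`Unknown LLM provider: ${config.provider}`);
  const client = factory(config.model);
  return async (reportText: string) => {
    const result = await client.parse(reportText);
    // Stamp the model onto the result so it can be stored per row.
    return { ...result, modelUsed: config.model };
  };
}
```

With this shape, switching providers means registering another adapter and changing `provider` and `model` in configuration; the calling code and the stored `model_used` column stay the same.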

Features

Address → utility → CCR lookup

Address-first resolution (with ZIP fallback) maps the user to the correct water utility and pulls the latest Consumer Confidence Report. This avoids the trap of a ZIP that spans multiple utilities.

LLM parsing of CCR PDFs

Long, inconsistent CCR PDFs are parsed into structured records — provider, year, source type, hardness, TDS, chlorine with units, concerns, treatments, and a plain-language summary.

Consumer-readable advice

The user sees a plain-language water-quality summary plus recommended treatment options, grounded in their utility's parsed report rather than generic guidance.

Normalised data model

PostgreSQL schema separates the report-level record from per-contaminant rows. Multi-provider and multi-year coverage stays clean as the dataset grows.

Per-parse audit columns

Every LLM parse records model used, input tokens, output tokens, and USD cost to six-decimal precision. Quality and economics are auditable per request.

Model-agnostic parser

OpenAI runs in production; the schema records `model_used` per row, so swapping in Anthropic, Gemini, or any other LLM is a configuration change rather than a rewrite.

Results / Impact

Address → advice live

Consumer pipeline shipped to production and in continuous development.

Per-parse cost & quality audit

Model, input/output tokens, and USD to six decimals on every LLM call.

Model-agnostic parser

OpenAI today; any LLM swappable via configuration, not rewrite.

FAQ

Water Advisor AI turns a US address into actionable water-quality advice. The system resolves the address to the responsible public water utility, retrieves the utility's latest EPA Consumer Confidence Report, parses it with an LLM into structured fields, and returns a plain-language summary plus recommended treatment.

ZIP works as a fallback, but address-level resolution is preferred since ZIPs can span multiple municipalities served by different utilities.

Every LLM parse writes audit columns alongside the parsed data: model used, input tokens, output tokens, USD cost to six decimals, and a parsed-at timestamp.

Quality and economics can be inspected per call — useful for catching regressions when models update, comparing providers, and forecasting unit cost as coverage grows.
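As an example of that kind of inspection, audit rows can be grouped by model to compare average cost per parse. The `AuditRow` shape and `summariseByModel` are hypothetical names used only for this sketch; in practice the same aggregation could be a `GROUP BY model_used` query.

```typescript
// Hypothetical in-memory view of the audit columns.
interface AuditRow {
  modelUsed: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface ModelSummary {
  parses: number;
  totalCostUsd: number;
  avgCostUsd: number;
}

// Group audit rows by model and compute count, total, and average cost.
function summariseByModel(rows: AuditRow[]): Map<string, ModelSummary> {
  const summary = new Map<string, ModelSummary>();
  for (const row of rows) {
    const s =
      summary.get(row.modelUsed) ?? { parses: 0, totalCostUsd: 0, avgCostUsd: 0 };
    s.parses += 1;
    s.totalCostUsd += row.costUsd;
    s.avgCostUsd = s.totalCostUsd / s.parses;
    summary.set(row.modelUsed, s);
  }
  return summary;
}

// Naive unit-cost forecast: average cost per parse times expected parse count.
function forecastCostUsd(avgCostUsd: number, expectedParses: number): number {
  return avgCostUsd * expectedParses;
}
```

A jump in average cost or token counts for the same model between two periods is the kind of signal that would prompt a closer look at parse quality.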

Reports cover varying sets of contaminants, and the same contaminant can be measured in different units across utilities and years. Forcing everything into one row turns into a sparse, fragile schema.

A normalised pattern — one row per report and one row per contaminant on that report — lets the system query across providers and years cleanly as coverage scales.

The stack is Next.js (App Router) + TypeScript + Tailwind on the application side, PostgreSQL for data, and OpenAI as the production LLM. The parser architecture is model-agnostic, so the model behind any given parse is a configuration choice.

The application is live, with continuous development extending coverage and reliability of the address → utility → CCR pipeline.
