Query Tool

Overview

The INTEGRATE query tool evaluates geological queries against a posterior ensemble, returning either a probability or a set of percentiles for each survey data point.

Two query types are supported:

Probability queries answer “what fraction of realizations satisfy condition X?”

  • Probability that cumulative clay thickness exceeds 10 m within 0–30 m

  • Probability that resistivity is below 100 Ω·m for at least 25 m

  • Probability that the water table is shallower than 5 m

Percentile queries answer “what is the p5/p50/p95 of metric X?”

  • P5/P50/P95 of cumulative Sand+Grus thickness within 0–30 m

  • Median thickness of Sand above the water table

Both query types can be written by hand as Python dicts / JSON files, or translated automatically from plain English using an LLM via ig.query_from_text().

Core Functions

  • ig.query() — dispatcher: routes to probability or percentile function based on dict structure

  • ig.query_probability() — probability query (fraction of realizations satisfying constraints)

  • ig.query_percentile() — percentile query (p5/p50/p95 of a metric across realizations)

  • ig.query_from_text() — translate a plain-English query to a query dict using an LLM

  • ig.title_from_json() — generate a plain-English title/description from an existing query dict using an LLM

  • ig.query_plot() — plot the probability map or single-point detail view from a probability query

  • ig.query_percentile_plot() — plot one map per percentile from a percentile query

  • ig.save_query() / ig.load_query() — persist a query dict to/from JSON

  • ig.get_prior_model_info() — inspect model names, types, depth ranges, and class labels for one model

  • ig.prior_describe() — print a human-readable summary of all models in a prior HDF5 file

  • ig.query_test_llm() — verify that an LLM model and API key are working

Query Dict Format

Top-level structure

A query dict has a single key "constraints" whose value is a list of constraint objects. All constraints are combined with logical AND: a realization is accepted only when it satisfies every constraint simultaneously.

query = {
    "constraints": [
        { ... },   # constraint 1
        { ... },   # constraint 2 — both must hold
    ]
}
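The AND rule and the "fraction of realizations" definition can be illustrated without the library. The sketch below is not the INTEGRATE implementation; it assumes each constraint has already been reduced to one True/False value per realization, and the helper name and_probability is hypothetical.

```python
# Illustrative only: each constraint is a list of per-realization booleans.
def and_probability(per_constraint_results):
    """Fraction of realizations for which every constraint is True."""
    n_real = len(per_constraint_results[0])
    accepted = [
        all(results[i] for results in per_constraint_results)
        for i in range(n_real)
    ]
    return sum(accepted) / n_real

# Four realizations at one location, two constraints (made-up outcomes).
clay_thick_enough = [True, True, False, True]
water_table_shallow = [True, False, False, True]

p = and_probability([clay_thick_enough, water_table_shallow])
print(p)  # 0.5: only realizations 0 and 3 satisfy both constraints
```

Adding a constraint can only keep or lower the probability, since every additional condition removes realizations from the accepted set.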

Constraint Fields

Field                 Type       Required             Valid values                        Description
--------------------  ---------  -------------------  ----------------------------------  ---------------------------------------------
im                    int        always               1, 2, 3, …                          Prior model index (see Available Models)
classes               list[int]  DISCRETE only        class IDs from the model            Match any of these class IDs
value_comparison      str        CONTINUOUS / SCALAR  "<" or ">"                          Compare model value against threshold
value_threshold       float      CONTINUOUS / SCALAR  any float                           Threshold for the value comparison
thickness_mode        str        depth models only    "cumulative" or "first_occurrence"  How to aggregate thickness of matching layers
thickness_comparison  str        depth models only    ">", "<", ">=" or "<="              Operator applied to the computed thickness
thickness_threshold   float      depth models only    any float (metres)                  Thickness threshold in metres
depth_min             float      optional             any float                           Upper boundary of depth interval [m]
depth_max             float      optional             any float                           Lower boundary of depth interval [m]
depth_max_im          int        optional             SCALAR model im                     Per-realization depth_max from a scalar model
depth_min_im          int        optional             SCALAR model im                     Per-realization depth_min from a scalar model
negate                bool       optional             true / false (default false)        If true, invert the constraint result

thickness_mode values:

"cumulative"

Sum the thickness of all matching layers within the depth interval.

"first_occurrence"

Thickness of the first contiguous run of matching layers.
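The difference between the two modes is easiest to see on a small layered profile. The sketch below is an illustration under stated assumptions, not the library's code: the profile is given as (thickness, class ID) pairs ordered from top to bottom, and matching_thickness is a hypothetical helper.

```python
def matching_thickness(layers, classes, mode):
    """layers: list of (thickness_m, class_id), ordered from top to bottom."""
    if mode == "cumulative":
        # Sum every matching layer, wherever it sits in the profile.
        return sum(t for t, c in layers if c in classes)
    if mode == "first_occurrence":
        total, in_run = 0.0, False
        for t, c in layers:
            if c in classes:
                total += t
                in_run = True
            elif in_run:
                break  # the first contiguous run has ended
        return total
    raise ValueError(f"unknown thickness_mode: {mode!r}")

# Made-up profile: 2 m sand, 3 m + 4 m clay, 5 m sand, 6 m clay.
profile = [(2.0, 1), (3.0, 3), (4.0, 3), (5.0, 1), (6.0, 3)]
cum = matching_thickness(profile, {3}, "cumulative")
first = matching_thickness(profile, {3}, "first_occurrence")
print(cum, first)  # 13.0 7.0
```

Here the deeper 6 m clay layer counts towards "cumulative" but not towards "first_occurrence", because the sand layer above it ends the first run.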

Model Types

DISCRETE models

Store integer class IDs at each depth layer (e.g. lithology). Use the classes field to specify which class IDs to match. Do not use value_comparison / value_threshold.

CONTINUOUS models

Store floating-point values at each depth layer (e.g. resistivity). Use value_comparison + value_threshold together with the thickness fields to express conditions such as “resistivity < 100 Ω·m for >= 25 m”.

SCALAR models (depth range = 0)

Store a single value per realization — no depth profile (e.g. a water table depth). Use value_comparison and value_threshold only. Omit all thickness and depth fields — they have no meaning here.
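The field rules for the three model types can be turned into a small pre-flight check for hand-written constraints. This is a sketch based only on the rules stated above; check_constraint is a hypothetical helper and is not part of the integrate package, which may validate differently.

```python
def check_constraint(constraint, kind):
    """Return a list of problems; kind is 'DISCRETE', 'CONTINUOUS' or 'SCALAR'."""
    keys = set(constraint)
    value_fields = {"value_comparison", "value_threshold"}
    depth_fields = {k for k in keys if k.startswith(("thickness_", "depth_"))}
    problems = []
    if "im" not in keys:
        problems.append("im is required")
    if kind == "DISCRETE":
        if "classes" not in keys:
            problems.append("DISCRETE needs classes")
        if keys & value_fields:
            problems.append("DISCRETE must not use value_* fields")
    else:
        if not value_fields <= keys:
            problems.append(kind + " needs value_comparison and value_threshold")
        if kind == "SCALAR" and depth_fields:
            problems.append("SCALAR must omit thickness and depth fields")
    return problems

ok = check_constraint({"im": 2, "classes": [3]}, "DISCRETE")
bad = check_constraint(
    {"im": 3, "value_comparison": "<", "value_threshold": 5.0, "depth_max": 30.0},
    "SCALAR",
)
print(ok)   # []
print(bad)  # ['SCALAR must omit thickness and depth fields']
```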

Cross-model depth bounds

depth_max_im and depth_min_im accept the im index of a SCALAR model. For each realization, the value of that scalar model is used as depth_max (the deeper boundary of the interval) or depth_min (the shallower boundary), respectively. This enables constraints like “Sand above the water table”, where the depth cutoff varies per realization. They may be combined with fixed depth_min / depth_max values.
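A per-realization depth bound can be pictured as follows. The sketch assumes layers are given as (top, bottom, class ID) intervals and that layers crossing a boundary are clipped to it; how the query engine treats partially covered layers is not stated here, so the clipping is an assumption. For brevity both realizations share one layer profile and differ only in the scalar water table value.

```python
def thickness_in_interval(layers, classes, depth_min, depth_max):
    """Cumulative thickness of matching layers clipped to [depth_min, depth_max].

    layers: list of (top_m, bottom_m, class_id).
    """
    total = 0.0
    for top, bottom, cid in layers:
        if cid in classes:
            overlap = min(bottom, depth_max) - max(top, depth_min)
            total += max(0.0, overlap)  # no contribution if outside the interval
    return total

layers = [(0.0, 4.0, 1), (4.0, 10.0, 3), (10.0, 30.0, 1)]  # made-up profile
water_table = [3.0, 12.0]  # one scalar value per realization (made up)

# depth_min is fixed at 0 m; depth_max comes from the scalar model.
thick = [thickness_in_interval(layers, {1}, 0.0, wt) for wt in water_table]
print(thick)  # [3.0, 6.0]
```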

Percentile Query Format

A percentile query has a "metric" key (instead of "constraints") and an optional "percentiles" key. The metric defines what to measure per realization — the same fields as a constraint, minus the comparison fields (thickness_comparison, thickness_threshold, negate).

query = {
    "metric": {
        "im": 2,
        "classes": [1, 2],        # Sand or Grus
        "thickness_mode": "cumulative",
        "depth_max": 30.0         # measure within 0–30 m
        # depth_max_im also supported for cross-model depth bounds
    },
    "percentiles": [5, 50, 95]    # optional; default [5, 50, 95]
}

ig.query() auto-detects the query type: dicts with "metric" are routed to ig.query_percentile(); dicts with "constraints" are routed to ig.query_probability().
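The routing rule amounts to a key check on the dict. The sketch below mirrors the rule as described; detect_query_type is a hypothetical name, and what the real dispatcher does when neither key (or both keys) are present is an assumption.

```python
def detect_query_type(query):
    """Classify a query dict by its top-level key."""
    if "metric" in query:
        return "percentile"
    if "constraints" in query:
        return "probability"
    # Assumption: the real dispatcher may report this case differently.
    raise ValueError("query needs a 'metric' or a 'constraints' key")

kind_a = detect_query_type({"metric": {"im": 2, "classes": [1, 2]}})
kind_b = detect_query_type({"constraints": [{"im": 2, "classes": [3]}]})
print(kind_a, kind_b)  # percentile probability
```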

Metric fields (same as constraint fields minus comparisons):

im, classes, value_comparison, value_threshold, thickness_mode, depth_min, depth_max, depth_max_im, depth_min_im. For SCALAR models, only im is needed (no thickness fields).
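For one survey location, a percentile query reduces the per-realization metric values to a few summary numbers. The sketch below uses linear interpolation between order statistics, which is a common convention; which interpolation rule ig.query_percentile() applies is not stated here, so treat that choice as an assumption.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# Made-up cumulative Sand+Grus thickness [m] in five realizations.
thickness = [4.0, 10.0, 6.0, 8.0, 12.0]
row = [percentile(thickness, q) for q in (5, 50, 95)]
print(row)  # approximately [4.4, 8.0, 11.6]
```

One such row per survey location gives the (N_data, n_percentiles) layout of the result array.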

Saving and Loading Queries

Query dicts can be saved to and loaded from JSON files for reuse without repeating an LLM call:

import integrate as ig

# Save
ig.save_query(query, 'clay_10m.json')

# Load and execute (dispatcher routes automatically)
query = ig.load_query('clay_10m.json')
result, meta = ig.query(f_post_h5, query)
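Because a query dict contains only JSON-compatible values, the round trip can be reproduced with the standard json module. This is a sketch of the idea, not a description of what ig.save_query() writes; the exact file layout it produces is not documented here.

```python
import json
import os
import tempfile

query = {
    "constraints": [
        {"im": 2, "classes": [3], "thickness_mode": "cumulative",
         "thickness_comparison": ">", "thickness_threshold": 10.0,
         "depth_min": 0.0, "depth_max": 30.0, "negate": False}
    ]
}

path = os.path.join(tempfile.mkdtemp(), "clay_10m.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(query, f, indent=2)   # Python False is written as JSON false

with open(path, encoding="utf-8") as f:
    restored = json.load(f)         # JSON false is read back as Python False

print(restored == query)  # True
```

This is also why the constraint table lists negate as true / false while the Python examples write True / False.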

Running Queries

Discovering Available Models

Before writing a query it is useful to inspect which models exist in the prior file, what type they are, their depth range, and (for discrete models) their class IDs:

import integrate as ig
import h5py

# Read prior file path from the posterior file
with h5py.File(f_post_h5, 'r') as f:
    f_prior_h5 = str(f.attrs['f5_prior'])

# List all models
with h5py.File(f_prior_h5, 'r') as f:
    model_keys = sorted([k for k in f.keys() if k.startswith('M') and k[1:].isdigit()])

for key in model_keys:
    im   = int(key[1:])
    info = ig.get_prior_model_info(f_prior_h5, im)
    z    = info['z']
    kind = 'DISCRETE' if info['is_discrete'] else 'CONTINUOUS'
    print(f"  im={im}: {info['name']}  ({kind})  depth {z[0]:.1f}–{z[-1]:.1f} m")
    if info['is_discrete'] and info['class_id'] is not None:
        for cid, cname in zip(info['class_id'].flatten(), info['class_name'].flatten()):
            print(f"    class {int(cid)} = {cname}")

Example output:

im=1: Resistivity  (CONTINUOUS)  depth 0.0–89.0 m
im=2: Lithology    (DISCRETE)    depth 0.0–89.0 m
    class 1 = Sand
    class 2 = Grus
    class 3 = Moræneler
    class 4 = Miocene sand
    class 5 = Miocene clay
im=3: Waterlevel   (CONTINUOUS)  depth 0.0–0.0 m

Executing a Probability Query

import integrate as ig

P, meta = ig.query_probability(f_post_h5, query)
# or equivalently, using the dispatcher:
P, meta = ig.query(f_post_h5, query)

print(f"N locations : {meta['N_data']}")
print(f"Mean P      : {P.mean():.3f}")

Returns:

P

ndarray of shape (N_data,) — probability [0, 1] for each survey data point.

meta

Dict with keys 'X', 'Y', 'N_data', 'N_post', 'i_use' (all posterior indices), 'i_use_query' (matching indices per location).

Executing a Percentile Query

pct_values, meta = ig.query_percentile(f_post_h5, query)
# or equivalently:
pct_values, meta = ig.query(f_post_h5, query)

print(f"Median Sand+Grus thickness: {pct_values[:, 1].mean():.1f} m")

Returns:

pct_values

ndarray of shape (N_data, n_percentiles) — one column per requested percentile, one row per survey location.

meta

Dict with keys 'X', 'Y', 'N_data', 'N_post', 'i_use', and 'percentiles' (the list of requested percentile values, e.g. [5, 50, 95]).
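Rather than hard-coding a column index such as 1 for the median, the column can be looked up from meta['percentiles']. The sketch uses plain lists with made-up numbers in place of the real ndarray; with NumPy the last step would be pct_values[:, col].

```python
# Stand-ins for the values returned by a percentile query (made-up numbers).
meta = {"percentiles": [5, 50, 95]}
pct_values = [
    [2.0, 6.5, 11.0],   # survey location 0
    [0.0, 3.0, 9.5],    # survey location 1
]

col = meta["percentiles"].index(50)          # column holding the median
median = [row[col] for row in pct_values]    # one value per location
print(col, median)  # 1 [6.5, 3.0]
```

This keeps downstream code correct if the query is later changed to request, say, [10, 50, 90] or a different number of percentiles.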

Visualising Results

ig.query_plot() produces exactly one figure; which one depends on whether ip is set:

  • No ip → XY probability map across all survey locations.

  • With ip → single-point detail view (all posterior realizations + query-matching subset) for that data-point index. The XY map is not shown.

This means hardcopy always saves exactly one figure regardless of which mode is used.

# XY probability map (no ip)
ig.query_plot(P, meta)

# With a custom title (auto-wrapped at 60 characters per line)
ig.query_plot(P, meta, title='My Query Title')

# With LLM-generated title from the query dict
title = ig.title_from_json(query, f_prior_h5=f_prior_h5)
ig.query_plot(P, meta, title=title, hardcopy='clay_query')

# With query text and LLM interpretation in a side panel
ig.query_plot(P, meta, query_text=text, interpretation=interp, text_panel=True)

# Single-point detail view — XY map is skipped
ig.query_plot(P, meta, ip=1000, query_dict=query, f_post_h5=f_post_h5,
              title=title, hardcopy='clay_query_ip1000')

# Percentile maps — one subplot per percentile
ig.query_percentile_plot(pct_values, meta)

# With text panel and hardcopy
ig.query_percentile_plot(pct_values, meta,
                         query_text=text,
                         interpretation=interp,
                         text_panel=True,
                         hardcopy='sand_percentiles')
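To preview how a long title breaks at 60 characters per line, the standard textwrap module can be used. This only approximates the behaviour: the wrapping routine that ig.query_plot() uses internally is not documented here, so the exact line breaks may differ.

```python
import textwrap

title = ("Probability that the cumulative thickness of Sand and Grus "
         "exceeds 5 m above the water table")

wrapped = textwrap.fill(title, width=60)  # break at word boundaries
print(wrapped)
```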

Examples

Example 1: Discrete Cumulative Constraint

Probability that the cumulative thickness of clay (class 3) exceeds 10 m within 0–30 m depth.

import integrate as ig

query = {
    "constraints": [
        {
            "im": 2,
            "classes": [3],
            "thickness_mode": "cumulative",
            "thickness_comparison": ">",
            "thickness_threshold": 10.0,
            "depth_min": 0.0,
            "depth_max": 30.0,
            "negate": False
        }
    ]
}

P, meta = ig.query(f_post_h5, query)
print(f"Mean P = {P.mean():.3f}")
ig.query_plot(P, meta)

To match any clay type (multiple class IDs), list them all:

"classes": [3, 5]   # Moræneler OR Miocene clay

Example 2: Continuous Cumulative Constraint

Probability that resistivity is below 100 Ω·m for a cumulative thickness of at least 25 m within 0–50 m depth.

query = {
    "constraints": [
        {
            "im": 1,
            "value_comparison": "<",
            "value_threshold": 100.0,
            "thickness_mode": "cumulative",
            "thickness_comparison": ">=",
            "thickness_threshold": 25.0,
            "depth_min": 0.0,
            "depth_max": 50.0,
            "negate": False
        }
    ]
}

P, meta = ig.query(f_post_h5, query)
ig.query_plot(P, meta)

Example 3: Multi-Constraint AND

Probability that Sand and Grus together exceed 20 m within 0–30 m depth AND the first non-sand/gravel layer at the top is less than 3 m thick.

Both constraints must hold simultaneously.

query = {
    "constraints": [
        {
            "im": 2,
            "classes": [1, 2],          # Sand or Grus
            "thickness_mode": "cumulative",
            "thickness_comparison": ">",
            "thickness_threshold": 20.0,
            "depth_min": 0.0,
            "depth_max": 30.0
        },
        {
            "im": 2,
            "classes": [1, 2],          # Sand or Grus — negated = "not sand/grus"
            "thickness_mode": "first_occurrence",
            "thickness_comparison": "<",
            "thickness_threshold": 3.0,
            "depth_min": 0.0,
            "depth_max": 30.0,
            "negate": True
        }
    ]
}

P, meta = ig.query(f_post_h5, query)
ig.query_plot(P, meta)

Example 4: Scalar Model Query

Probability that the water table (im=3) is shallower than 5 m.

The Waterlevel model has depth range 0–0 m, meaning it stores a single value per realization. Thickness fields are not applicable.

query = {
    "constraints": [
        {
            "im": 3,
            "value_comparison": "<",
            "value_threshold": 5.0,
            "negate": False
        }
    ]
}

P, meta = ig.query(f_post_h5, query)
ig.query_plot(P, meta)

Example 5: Cross-Model Depth Bound

Probability that Sand and Grus have a cumulative thickness exceeding 5 m in the zone above the water table.

depth_max_im: 3 instructs the query engine to use the Waterlevel value (im=3) of each realization as depth_max, the deepest depth counted, for that realization.

query = {
    "constraints": [
        {
            "im": 2,
            "classes": [1, 2],          # Sand or Grus
            "thickness_mode": "cumulative",
            "thickness_comparison": ">",
            "thickness_threshold": 5.0,
            "depth_min": 0.0,
            "depth_max_im": 3,          # use Waterlevel per realization
            "negate": False
        }
    ]
}

P, meta = ig.query(f_post_h5, query)
ig.query_plot(P, meta)

Use depth_min_im symmetrically to take depth_min, the shallowest depth counted, from a scalar model (e.g. “below the water table”).

Example 6: Percentile Query — Thickness Distribution

P5, P50, P95 of the cumulative thickness of Sand and Grus within 0–30 m.

query = {
    "metric": {
        "im": 2,
        "classes": [1, 2],          # Sand or Grus
        "thickness_mode": "cumulative",
        "depth_min": 0.0,
        "depth_max": 30.0
    },
    "percentiles": [5, 50, 95]
}

pct_values, meta = ig.query_percentile(f_post_h5, query)
# pct_values shape: (N_data, 3) — columns are P5, P50, P95

ig.query_percentile_plot(pct_values, meta)

# Access individual percentile maps
p50 = pct_values[:, 1]   # median cumulative thickness
print(f"Median Sand+Grus thickness — spatial mean: {p50.mean():.1f} m")

Example 7: Percentile Query — Cross-Model Depth Bound

P5, P50, P95 of the cumulative Sand+Grus thickness above the water table.

query = {
    "metric": {
        "im": 2,
        "classes": [1, 2],
        "thickness_mode": "cumulative",
        "depth_min": 0.0,
        "depth_max_im": 3           # per-realization upper bound = Waterlevel
    },
    "percentiles": [5, 50, 95]
}

pct_values, meta = ig.query_percentile(f_post_h5, query)
ig.query_percentile_plot(pct_values, meta,
                         query_text="Sand+Grus thickness above water table",
                         text_panel=True)

LLM-Powered Query Tools

Generating a Description from an Existing Query

ig.title_from_json() uses an LLM to produce a short plain-English sentence describing what an existing query dict computes. This is useful for automatically labelling figures or log output without writing titles by hand.

import integrate as ig

query = ig.load_query('clay_10m.json')

# From a file path
title = ig.title_from_json('clay_10m.json', f_prior_h5=f_prior_h5)

# From a dict (e.g. returned by ig.load_query())
title = ig.title_from_json(query, f_prior_h5=f_prior_h5)

# Use as a figure title
ig.query_plot(P, meta, title=title, hardcopy='clay_10m')

Parameters:

file_json

Path to a JSON file or a query dict directly (e.g. from ig.load_query()).

f_prior_h5 (optional)

Path to the prior HDF5 file. When supplied, real model names, depth ranges, and class labels are included in the LLM prompt so the description uses geological names (e.g. “clay”) rather than numeric IDs (e.g. “class 3”).

model, api_key

Same as ig.query_from_text().

showInfo (int, default 1)

Controls feedback when the LLM cannot be reached:

  • 0 — silent; empty string returned with no output.

  • 1 — one-line message including a hint to run ig.query_test_llm() (default).

  • 2 — message plus full exception detail.

If the LLM is unavailable for any reason (missing litellm package, no API key, network error) the function always returns an empty string — it never raises — so it is safe to use in a pipeline without extra error handling.
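Since an empty string signals failure, a pipeline can fall back to a deterministic label with a plain `or`. In the sketch below, fallback_title is a hypothetical helper (not part of integrate), and llm_title stands in for the value returned by ig.title_from_json() when the LLM cannot be reached.

```python
def fallback_title(query):
    """Deterministic label built from the dict alone (hypothetical helper)."""
    if "metric" in query:
        pcts = query.get("percentiles", [5, 50, 95])
        return f"Percentiles {pcts} of metric on im={query['metric']['im']}"
    ims = sorted({c["im"] for c in query.get("constraints", [])})
    return f"Probability query on im={ims}"

query = {"constraints": [{"im": 2, "classes": [3]}]}

llm_title = ""  # stand-in for ig.title_from_json(...) with the LLM unreachable
title = llm_title or fallback_title(query)
print(title)  # Probability query on im=[2]
```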

Translating Plain English to a Query Dict

ig.query_from_text() uses LiteLLM to translate a plain-English geological question into a valid query dict. The LLM receives a structured system prompt that describes:

  • both query types (probability and percentile) and when to use each

  • the constraint and metric field schemas

  • the available prior models for the specific prior file (names, types, depth ranges, class IDs)

  • worked examples covering all query types

The LLM auto-detects the query type from the text:

  • “What is the probability that …” → probability query (returns "constraints")

  • “What are the p5/p50/p95 of …” → percentile query (returns "metric" + "percentiles")

The returned query_dict is ready to pass directly to ig.query(), which dispatches to the correct function automatically.

Any LiteLLM-supported model works: Claude, GPT-4, or a locally running Ollama model.

Requirements

pip install litellm

For Claude, set the environment variable before running:

export ANTHROPIC_API_KEY=sk-ant-...

Testing the Connection

Before running queries, verify that the chosen model and key are working:

import os
import integrate as ig

# Claude
ig.query_test_llm(model='anthropic/claude-sonnet-4-6',
                  api_key=os.environ['ANTHROPIC_API_KEY'])

# Local Ollama
ig.query_test_llm(model='ollama_chat/qwen3:latest')

A successful test prints OK. A failed test prints the error message.

Translating a Query

import os
import integrate as ig, h5py

with h5py.File(f_post_h5, 'r') as f:
    f_prior_h5 = str(f.attrs['f5_prior'])

text = (
    "What is the probability that the cumulative thickness of any clay "
    "exceeds 10 m within 0 to 30 m depth?"
)

query_dict, interpretation, system_prompt = ig.query_from_text(
    text,
    f_prior_h5=f_prior_h5,
    model='anthropic/claude-sonnet-4-6',
    api_key=os.environ['ANTHROPIC_API_KEY'],
)

print("Interpretation:", interpretation)

Return values:

query_dict

A valid query dict ready to pass directly to ig.query().

interpretation

A 1–2 sentence plain-English confirmation of what the LLM understood the query to mean, including the specific classes and thresholds used. Always check this before running the query — it catches misunderstandings cheaply.

system_prompt

The full system prompt that was sent to the LLM. Useful for auditing or debugging. Can be saved to a file for inspection.

Full Workflow

import os, json
import integrate as ig, h5py

with h5py.File(f_post_h5, 'r') as f:
    f_prior_h5 = str(f.attrs['f5_prior'])

# 1. Translate
text = "Probability that sand and gravel above the water table exceed 5 m"
query_dict, interpretation, system_prompt = ig.query_from_text(
    text,
    f_prior_h5=f_prior_h5,
    model='anthropic/claude-sonnet-4-6',
)

# 2. Inspect the generated query
print("Interpretation:", interpretation)
print(json.dumps(query_dict, indent=2))

# 3. Execute
P, meta = ig.query(f_post_h5, query_dict)
print(f"Mean P = {P.mean():.3f}")

# 4. Visualise
ig.query_plot(P, meta,
              query_text=text,
              interpretation=interpretation,
              text_panel=True,
              hardcopy='sand_above_wl')

# 5. Save the query for reuse (no LLM call needed next time)
ig.save_query(query_dict, 'sand_above_wl.json')

Pass verbose=True to ig.query_from_text() to print the full system prompt and raw LLM response — useful for debugging unexpected translations.

Supported Models

Provider          Model string                   Notes
----------------  -----------------------------  -------------------------------------------------
Anthropic Claude  'anthropic/claude-sonnet-4-6'  Requires ANTHROPIC_API_KEY
OpenAI            'openai/gpt-4o'                Requires OPENAI_API_KEY
Ollama (local)    'ollama_chat/qwen3:latest'     Requires ollama serve running locally; no API key

Unsupported Queries

If the query cannot be expressed with the available constraint schema (for example, “What is the spatial correlation length of resistivity?”), the LLM responds with UNSUPPORTED: <reason> and ig.query_from_text() raises a ValueError:

try:
    query_dict, _, _ = ig.query_from_text(
        "What is the spatial correlation length of resistivity?",
        f_prior_h5=f_prior_h5,
    )
except ValueError as e:
    print(f"Unsupported query: {e}")
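The UNSUPPORTED convention can be pictured as a prefix check on the raw LLM reply. The sketch below is a guess at the general shape, not the library's parser: parse_llm_reply is a hypothetical name, and it assumes a supported reply is plain JSON, whereas the real response also carries the interpretation text.

```python
import json

PREFIX = "UNSUPPORTED:"

def parse_llm_reply(reply):
    """Return a query dict, or raise ValueError carrying the LLM's reason."""
    text = reply.strip()
    if text.startswith(PREFIX):
        raise ValueError(text[len(PREFIX):].strip())
    return json.loads(text)  # assumption: supported replies are plain JSON

parsed = parse_llm_reply('{"constraints": [{"im": 3, "value_comparison": "<", '
                         '"value_threshold": 5.0}]}')

try:
    parse_llm_reply("UNSUPPORTED: spatial statistics are not expressible")
    reason = None
except ValueError as e:
    reason = str(e)

print(reason)  # spatial statistics are not expressible
```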

API Reference

Quick Reference

from integrate import (
    query,                  # Dispatcher: routes to probability or percentile
    query_probability,      # Probability query (fraction satisfying constraints)
    query_percentile,       # Percentile query (p5/p50/p95 of a metric)
    query_from_text,        # Translate plain English to query dict via LLM
    title_from_json,        # Generate a plain-English description from a query dict via LLM
    query_plot,             # XY probability map or single-point detail view
    query_percentile_plot,  # Plot one map per percentile
    save_query,             # Save a query dict to a JSON file
    load_query,             # Load a query dict from a JSON file
    get_prior_model_info,   # Return metadata for one prior model
    prior_describe,         # Print a summary of all models in a prior file
    query_test_llm,         # Verify LLM model + API key connectivity
)

Key signatures:

# Probability query
P, meta = ig.query_probability(f_post_h5, query_dict)
# meta keys: 'X', 'Y', 'N_data', 'N_post', 'i_use', 'i_use_query'

# Percentile query
pct_values, meta = ig.query_percentile(f_post_h5, query_dict)
# pct_values shape: (N_data, n_percentiles)
# meta adds 'percentiles' key

# Dispatcher (auto-detects from dict structure)
result, meta = ig.query(f_post_h5, query_dict)

# LLM translation (auto-detects probability vs percentile from text)
query_dict, interpretation, system_prompt = ig.query_from_text(
    text, f_prior_h5,
    model='anthropic/claude-sonnet-4-6',
    api_key=None, verbose=False,
)

# Plain-English description of an existing query dict (returns '' on LLM failure)
description = ig.title_from_json(
    file_json,              # str path or dict (e.g. from ig.load_query())
    f_prior_h5=None,        # optional: adds real model/class names to prompt
    model='anthropic/claude-sonnet-4-6',
    api_key=None,
    showInfo=1,             # 0=silent, 1=warn on failure (default), 2=full detail
)

# query_plot: ip=None → XY probability map; ip=<int> → single-point detail only
ig.query_plot(P, meta,
              ip=None, query_dict=None,
              f_prior_h5=None, f_post_h5=None,
              title=None,            # auto-wrapped at 60 chars per line
              query_text=None, interpretation=None,
              text_panel=False, hardcopy=False)

ig.query_percentile_plot(pct_values, meta,
                         query_text=None, interpretation=None,
                         text_panel=False, hardcopy=False)

ig.save_query(query_dict, path)
query_dict = ig.load_query(path)

info = ig.get_prior_model_info(f_prior_h5, im)
# info keys: 'name', 'is_discrete', 'z', 'class_id', 'class_name'

# ig.query() returns (None, {}) with a printed message if f_post_h5 is missing
result, meta = ig.query(f_post_h5, query_dict)

result = ig.query_test_llm(model, api_key=None, verbose=1)
# result keys: 'ok', 'model', 'response', 'error'

See Also