integrate_query

Functions related to querying posterior realizations in the INTEGRATE module.

Query posterior realizations based on geophysical constraints.

This module provides tools to compute probabilities that posterior realizations from Bayesian inversion satisfy user-defined constraints (e.g., thickness of lithology classes, resistivity thresholds).

integrate.integrate_query.get_prior_model_info(f_prior_h5, im)

Return metadata for prior model im.

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.
im (int) – Model index.

Returns:

info – Keys: ‘name’, ‘is_discrete’, ‘z’, ‘class_id’, ‘class_name’.

Return type:

dict
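
As a minimal sketch of how the returned dict might be used, the snippet below pairs class IDs with class names for a discrete model. The `info` literal is a hand-written stand-in for the return value (class names are taken from the `prior_describe` example output, the `z` values are invented), and `class_lookup` is a hypothetical helper that is not part of the module.

```python
def class_lookup(info):
    """Map class IDs to class names; empty for non-discrete models."""
    if not info["is_discrete"]:
        return {}
    return dict(zip(info["class_id"], info["class_name"]))


# Stand-in for the dict returned for a discrete model; values are illustrative.
info = {
    "name": "Lithology",
    "is_discrete": True,
    "z": [0.0, 1.0, 2.0],
    "class_id": [1, 2],
    "class_name": ["Sand", "Grus"],
}

lookup = class_lookup(info)
print(lookup)
```

A lookup of this kind is convenient when building the "classes" list of a query from geological names rather than numeric IDs.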

integrate.integrate_query.load_query(path)

Load a query dict from a JSON file.

Parameters:

path (str) – Input JSON file path.

Returns:

query – Query definition dictionary.

Return type:

dict

integrate.integrate_query.prior_describe(f_prior_h5)

Print a human-readable summary of all models in a prior HDF5 file.

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.

Examples

>>> ig.prior_describe('prior.h5')
Prior file: prior.h5
N realizations: 1000000
  im=1  Resistivity   CONTINUOUS   depth 0–89 m  (89 layers)
  im=2  Lithology     DISCRETE     depth 0–89 m  (89 layers)
          class 1 = Sand
          class 2 = Grus
  im=3  Waterlevel    SCALAR

integrate.integrate_query.query(f_post_h5, query_dict)

Dispatcher: route to query_probability() or query_percentile() based on query_dict.

If query_dict contains a "metric" key, calls query_percentile(). Otherwise calls query_probability() (backward compatible with all existing "constraints"-based dicts).

Parameters:

f_post_h5 (str) – Path to the posterior HDF5 file.
query_dict (str or dict) – Query definition. See query_probability() and query_percentile() for the respective schemas.

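Returns:

result, meta – The pair returned by the delegated function: (P, meta) from query_probability(), or (percentile_values, meta) from query_percentile().

Return type:

tuple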
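
The routing rule described above can be sketched as follows. This is an illustration only: `dispatch` and the two stub handlers are hypothetical stand-ins for `query()`, `query_probability()` and `query_percentile()`, and the sketch handles dict input only (not a path to a JSON file).

```python
def dispatch(query_dict, on_probability, on_percentile):
    """Route on the presence of a "metric" key."""
    if "metric" in query_dict:
        return on_percentile(query_dict)
    return on_probability(query_dict)


# Stub handlers; the real functions read the posterior HDF5 file instead.
def fake_probability(q):
    return "probability", {"n_constraints": len(q["constraints"])}


def fake_percentile(q):
    return "percentile", {"percentiles": q.get("percentiles", [5, 50, 95])}


kind_a, meta_a = dispatch({"constraints": [{"im": 2}]}, fake_probability, fake_percentile)
kind_b, meta_b = dispatch({"metric": {"im": 2}}, fake_probability, fake_percentile)
print(kind_a, kind_b)
```

Because the fallback branch is the probability path, any dict without a "metric" key behaves exactly as it did before percentile queries existed.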

integrate.integrate_query.query_from_text(text, f_prior_h5, model='anthropic/claude-sonnet-4-6', api_key=None, max_tokens=4096, verbose=False)

Translate a natural-language query into a query dict using an LLM.

Uses LiteLLM to interpret the user’s text query in the context of the available prior models and the integrate query schema, returning a query dict and a plain-English interpretation of what the LLM understood.

Parameters:

text (str) – Natural language description of the query, e.g. “What is the probability that cumulative clay thickness exceeds 10 m?”.
f_prior_h5 (str) – Path to the prior HDF5 file. Model metadata (class names, depth ranges, discrete/continuous type) is read automatically and included in the LLM prompt so the model knows what constraints are valid.
model (str, optional) – LiteLLM model string (default: ‘anthropic/claude-sonnet-4-6’). Any LiteLLM-supported model works, e.g. ‘openai/gpt-4o’.
api_key (str, optional) – Provider API key. If None, the relevant environment variable (e.g. ANTHROPIC_API_KEY) is used.
max_tokens (int, optional) – Maximum number of tokens the LLM may generate in its response (default: 4096).
verbose (bool, optional) – If True, print the system prompt and LLM response for inspection.

Returns:

query_dict (dict) – Query dict ready to pass to ig.query(f_post_h5, query_dict).
interpretation (str) – Plain English confirmation of what the LLM understood the query to mean. Check this before running ig.query() to catch misunderstandings cheaply.
system_prompt (str) – The full system prompt sent to the LLM. Useful for inspection and debugging.

Raises:

ImportError – If the litellm package is not installed.
ValueError – If the LLM reports the query is unsupported, or if the response cannot be parsed as valid JSON.

Notes

Requires either the api_key parameter or the relevant provider environment variable to be set. Install the dependency with: pip install litellm

Examples

>>> import integrate as ig
>>> query_dict, interpretation, system_prompt = ig.query_from_text(
...     "Probability that cumulative clay thickness > 10 m within 0-30 m",
...     f_prior_h5='prior.h5',
...     api_key='sk-ant-...',
... )
>>> print(interpretation)
>>> P, meta = ig.query('posterior.h5', query_dict)
>>> ig.query_plot(P, meta)
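
The ValueError listed under Raises covers responses that are not valid JSON. A minimal sketch of that kind of check is shown below; `parse_llm_json` is a hypothetical helper, and the actual parsing inside `query_from_text()` may differ (for example in how it detects an unsupported query).

```python
import json


def parse_llm_json(raw):
    """Parse an LLM reply as a JSON object, raising ValueError otherwise."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # JSONDecodeError is itself a ValueError; re-raise with a clearer message.
        raise ValueError(f"LLM response is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


parsed_ok = parse_llm_json('{"constraints": []}')
print(parsed_ok)
```

Callers can wrap `query_from_text()` in a `try`/`except ValueError` block and ask the user to rephrase the query when parsing fails.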

integrate.integrate_query.query_percentile(f_post_h5, query_dict)

Compute per-data-point percentiles of a metric over posterior realizations.

Rather than asking “what fraction of realizations satisfy condition X?”, this asks “what is the p5/p50/p95 of metric X across realizations?”. The metric is defined by the same fields as a probability constraint, minus the comparison fields (thickness_comparison, thickness_threshold, negate).

Parameters:

f_post_h5 (str) – Path to the posterior HDF5 file.
query_dict (str or dict) – Path to a JSON file, or a dict with a "metric" key and an optional "percentiles" key (default [5, 50, 95]).

Returns:

percentile_values (ndarray (N_data, n_percentiles)) – Requested percentile values for each data location.
meta (dict) – Keys: ‘X’, ‘Y’, ‘N_data’, ‘N_post’, ‘i_use’, ‘percentiles’.

Examples

>>> query_def = {
...     "metric": {
...         "im": 2, "classes": [1, 2],
...         "thickness_mode": "cumulative",
...         "depth_max": 30.0
...     },
...     "percentiles": [5, 50, 95]
... }
>>> pct_values, meta = query_percentile('f_post.h5', query_def)
>>> # pct_values shape: (N_data, 3) — p5, p50, p95 per location
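
To make the shape of the result concrete, here is a small sketch that computes p5/p50/p95 per location from made-up metric values. It assumes linear interpolation between order statistics (the default rule of `numpy.percentile`); the interpolation rule actually used by `query_percentile()` is not stated here, and the `percentile` helper and the data are illustrative.

```python
def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


# metric[i][j]: metric value (e.g. cumulative thickness in m) at
# location i for posterior realization j. Values are invented.
metric = [
    [0.0, 2.0, 4.0, 6.0, 8.0],
    [10.0, 10.0, 10.0, 10.0, 10.0],
]
percentiles = [5, 50, 95]

# One row per location, one column per requested percentile.
pct_values = [[percentile(row, p) for p in percentiles] for row in metric]
print(pct_values)
```

The spread between the first and last column at a location gives a quick measure of how uncertain the metric is there.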

integrate.integrate_query.query_percentile_plot(percentile_values, meta, query_text=None, interpretation=None, text_panel=False, hardcopy=False, **kwargs)

Plot one map per requested percentile as side-by-side subplots.

Parameters:

percentile_values (ndarray (N_data, n_percentiles)) – Output of query_percentile().
meta (dict) – Metadata dict from query_percentile() containing ‘X’, ‘Y’, ‘percentiles’.
query_text (str, optional) – Original query string — shown as figure suptitle.
interpretation (str, optional) – LLM interpretation string — shown below query_text if provided.
text_panel (bool, optional) – If True, add a narrow text column to the right of the maps.
hardcopy (bool or str, optional) – Save figure to disk. True → ‘query_percentile_plot.png’; a string is used as the filename (.png appended if no extension).
**kwargs – All remaining keyword arguments are forwarded to plot_xy(), giving full control over cmap, clim, uselog, colorbar, colorbar_label, plotPoints, plotPoints_color, plotPoints_marker, s, etc. clim defaults to [percentile_values.min(), percentile_values.max()] so that all subplots share the same colour scale. cmap defaults to 'viridis'.

Returns:

fig

Return type:

matplotlib Figure
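
The filename rule for hardcopy can be sketched as below. `resolve_hardcopy` is a hypothetical helper written from the parameter description, not the module's own code.

```python
import os


def resolve_hardcopy(hardcopy, default_name="query_percentile_plot.png"):
    """Return the output filename, or None when nothing should be saved."""
    if hardcopy is True:
        return default_name
    if isinstance(hardcopy, str) and hardcopy:
        _, ext = os.path.splitext(hardcopy)
        # Append .png only when the caller gave no extension.
        return hardcopy if ext else hardcopy + ".png"
    return None


print(resolve_hardcopy(True))
print(resolve_hardcopy("clay_map"))
print(resolve_hardcopy("clay_map.pdf"))
```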

integrate.integrate_query.query_plot(P, meta, ip=None, query_dict=None, f_prior_h5=None, f_post_h5=None, title=None, query_text=None, interpretation=None, text_panel=False, hardcopy=False, **kwargs)

Plot query results and optionally detailed model visualization for a data point.

If ip is None, displays the XY probability map showing P(x, y). If ip is provided (together with query_dict and f_prior_h5/f_post_h5), skips the probability map and shows only the detailed single-point visualization of all posterior realizations and the query-matching subset.

Parameters:

P (ndarray (N_data,)) – Probability array from query().
meta (dict) – Metadata dict from query() containing ‘X’, ‘Y’, ‘i_use’, ‘i_use_query’.
ip (int, optional) – Data point index to visualize in detail. If None, only shows probability map.
query_dict (dict, optional) – Query dict used in query(). Required for detailed visualization.
f_prior_h5 (str, optional) – Path to prior HDF5 file. If not provided, will be extracted from f_post_h5.
f_post_h5 (str, optional) – Path to posterior HDF5 file. Used to automatically extract prior file path if f_prior_h5 is not provided.
title (str, optional) – Custom title for the probability map. If None, a title is built from query_text and interpretation (if provided), or ‘Query Probability Map’.
query_text (str, optional) – The original natural-language query string. Shown in the figure title, or in the text panel if text_panel=True.
interpretation (str, optional) – The LLM interpretation string returned by query_from_text(). Shown as a second line in the figure title, or in the text panel if text_panel=True.
text_panel (bool, optional) – If True and query_text or interpretation is provided, adds a narrow text column to the right of the probability map. The query text appears at the top and the interpretation below. Default False.
hardcopy (bool or str, optional) – Save the probability map figure. If True, saves as ‘query_plot.png’. If a string, uses that as the filename (a ‘.png’ extension is appended if the string has no extension). Default False.
**kwargs – All remaining keyword arguments are forwarded to plot_xy(), giving full control over cmap, clim, uselog, colorbar, colorbar_label, plotPoints, plotPoints_color, plotPoints_marker, s, etc. cmap defaults to 'hot_r' and clim defaults to [0, 1].

Examples

>>> P, meta = query(f_post_h5, query_def)
>>> query_plot(P, meta)  # Just probability map
>>> query_plot(P, meta, title='Custom Query Title')  # Custom title
>>> query_plot(P, meta, ip=1000, query_dict=query_def, f_post_h5='posterior.h5')
>>> query_plot(P, meta, ip=1000, query_dict=query_def, f_prior_h5='prior.h5')
>>> # With LLM query text and interpretation:
>>> query_dict, interp, _ = ig.query_from_text(text, f_prior_h5)
>>> P, meta = ig.query(f_post_h5, query_dict)
>>> ig.query_plot(P, meta, query_text=text, interpretation=interp)

integrate.integrate_query.query_probability(f_post_h5, query_dict)

Compute per-data-point probability that posterior realizations satisfy a query.

Parameters:

f_post_h5 (str) – Path to the posterior HDF5 file.
query_dict (str or dict) – Path to a JSON file, or a dict with a "constraints" key.

Returns:

P (ndarray (N_data,)) – Probability in [0, 1] for each data location.
meta (dict) – Keys: ‘X’, ‘Y’, ‘N_data’, ‘N_post’, ‘i_use’, ‘i_use_query’.

Examples

>>> query_def = {
...     "constraints": [{
...         "im": 2, "classes": [2],
...         "thickness_mode": "cumulative",
...         "thickness_comparison": ">",
...         "thickness_threshold": 10.0,
...         "depth_min": 0.0, "depth_max": 30.0
...     }]
... }
>>> P, meta = query_probability('f_post.h5', query_def)
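
What the constraint above computes at a single location can be sketched in plain Python. The sketch assumes that "cumulative" means the summed thickness of all layers in the listed classes, clipped to the depth window; the layer geometry, the realizations, and the `cumulative_thickness` helper are all invented for illustration.

```python
def cumulative_thickness(classes, tops, bottoms, target_classes, depth_min, depth_max):
    """Sum the thickness of layers in target_classes, clipped to the depth window."""
    total = 0.0
    for cls, top, bottom in zip(classes, tops, bottoms):
        if cls in target_classes:
            overlap = min(bottom, depth_max) - max(top, depth_min)
            total += max(overlap, 0.0)
    return total


# Four 10 m layers and four made-up posterior realizations at one location.
tops = [0.0, 10.0, 20.0, 30.0]
bottoms = [10.0, 20.0, 30.0, 40.0]
realizations = [
    [2, 2, 1, 2],  # 20 m of class 2 inside 0-30 m
    [1, 2, 1, 1],  # 10 m, not strictly greater than the threshold
    [1, 1, 1, 2],  # class 2 only below 30 m
    [2, 1, 2, 1],  # 20 m of class 2 inside 0-30 m
]

# Fraction of realizations whose cumulative thickness exceeds 10 m.
hits = [
    cumulative_thickness(r, tops, bottoms, {2}, 0.0, 30.0) > 10.0
    for r in realizations
]
P_location = sum(hits) / len(hits)
print(P_location)  # 0.5
```

Repeating this count at every data location yields an array of the same form as P.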

integrate.integrate_query.query_test_llm(model='anthropic/claude-sonnet-4-6', api_key=None, verbose=1)

Test whether a given LLM model and API key are working correctly.

Sends a minimal JSON-generation prompt and checks that the response is valid JSON. Prints a summary and returns a status dict.

Parameters:

model (str, optional) – LiteLLM model string (default: ‘anthropic/claude-sonnet-4-6’).
api_key (str, optional) – Provider API key. If None, the relevant environment variable is used.
verbose (int, optional) – 0 = silent, 1 = summary only (default), 2 = full response included.

Returns:

result – Keys: ‘ok’ (bool), ‘model’, ‘response’ (str or None), ‘error’ (str or None).

Return type:

dict

integrate.integrate_query.save_query(query, path)

Save a query dict to a JSON file.

Parameters:

query (dict) – Query definition dictionary.
path (str) – Output JSON file path.
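
Since queries are stored as JSON, a query dict should survive a save/load round trip unchanged. The sketch below shows this with the standard `json` module directly; it assumes `save_query()` and `load_query()` behave like plain JSON dump/load, which is not confirmed here, and the file name is arbitrary.

```python
import json
import os
import tempfile

query_def = {
    "constraints": [{
        "im": 2, "classes": [2],
        "thickness_mode": "cumulative",
        "thickness_comparison": ">",
        "thickness_threshold": 10.0,
        "depth_min": 0.0, "depth_max": 30.0,
    }]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query_ex1.json")
    # Write the query, then read it back from disk.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(query_def, fh, indent=2)
    with open(path, encoding="utf-8") as fh:
        restored = json.load(fh)

print(restored == query_def)
```

Keeping queries in JSON files makes them easy to version alongside the prior and posterior files they refer to.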

integrate.integrate_query.title_from_json(file_json, f_prior_h5=None, model='anthropic/claude-sonnet-4-6', api_key=None, showInfo=1)

Return a plain-language description of what a query JSON dict will do.

Uses an LLM to produce a short human-readable summary suitable for a figure title or log message. If the LLM is unavailable (missing package, no API key, network error), returns an empty string.

Parameters:

file_json (str or dict) – Path to a query JSON file, or a query dict directly (e.g. from ig.load_query()).
f_prior_h5 (str, optional) – Path to the prior HDF5 file. When provided, real model names, depth ranges, and class labels are included in the prompt so the description uses geological names instead of numeric model/class IDs.
model (str, optional) – LiteLLM model string (default: ‘anthropic/claude-sonnet-4-6’).
api_key (str, optional) – Provider API key. If None, the relevant environment variable is used.
showInfo (int, optional) – 0 = silent; 1 = print a message when the LLM cannot be reached (default); 2 = also print the exception detail.

Returns:

description – One-sentence plain-English summary of the query, or an empty string if the LLM could not be reached.

Return type:

str

Examples

>>> description = ig.title_from_json('my_query.json')
>>> description = ig.title_from_json('my_query.json', f_prior_h5='prior.h5')
>>> query = ig.load_query('query_ex1.json')
>>> title = ig.title_from_json(query, f_prior_h5='prior.h5')
>>> title = ig.title_from_json(query, showInfo=0)  # silent on failure
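
Because an empty string is returned when the LLM cannot be reached, callers may want a fallback title. A minimal sketch follows; `choose_title` is a hypothetical helper, and the fallback text simply reuses the default title of `query_plot()`.

```python
def choose_title(description, fallback="Query Probability Map"):
    """Use the LLM description when available, else a fixed fallback."""
    return description.strip() or fallback


print(choose_title(""))
print(choose_title("Probability of more than 10 m of clay in the top 30 m"))
```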