integrate_borehole¶

Functions related to borehole data integration in the INTEGRATE module.

integrate.integrate_borehole.Pobs_to_datagrid(P_obs, X, Y, f_data_h5, r_data=10, r_dis=100, doPlot=False, nan_freq=0.8, r_data_i_use=None)¶

Convert point-based discrete probability observations to gridded data with distance-based weighting.

This function distributes discrete probability observations (e.g., from a borehole) across a spatial grid using distance-based weighting. Observations at location (X, Y) are applied to nearby grid points with decreasing influence based on distance. Temperature annealing is used to reduce the strength of observations far from the source point.

Parameters:

P_obs (ndarray) – Probability matrix of shape (nclass, nm) where nclass is the number of classes and nm is the number of model parameters (e.g., depth points). Each column represents a probability distribution over discrete classes.
X (float) – X coordinate (e.g., UTM Easting) of the observation point.
Y (float) – Y coordinate (e.g., UTM Northing) of the observation point.
f_data_h5 (str) – Path to HDF5 data file containing survey geometry (X, Y coordinates).
r_data (float, optional) – Inner radius in meters within which observations have full strength. Default is 10 meters.
r_dis (float, optional) – Outer radius in meters for distance-based weighting. Beyond this distance, observations are fully attenuated (temperature → ∞). Default is 100 meters.
doPlot (bool, optional) – If True, creates diagnostic plots showing weight distributions. Default is False.
nan_freq (float, optional) – NaN-frequency threshold for automatic data-gate selection inside get_weight_from_position(). Gates where the fraction of non-NaN values is below this threshold are excluded. Default 0.8. Ignored when r_data_i_use is provided.
r_data_i_use (array-like of int or None, optional) – Explicit gate/channel indices to use for data-distance computation inside get_weight_from_position(). Overrides nan_freq when provided. Default None.

Returns:

d_obs (ndarray) – Gridded observation data of shape (nd, nclass, nm) where nd is the number of spatial locations in the survey. Each location gets temperature-scaled probabilities based on distance from (X, Y).
i_use (ndarray) – Binary mask of shape (nd, 1) indicating which grid points should be used (1) or ignored (0) in the inversion. Points with temperature < 100 are used.
T_all (ndarray) – Temperature values of shape (nd, 1) applied to each grid point based on distance from the observation point.

Notes

The function uses distance-based temperature annealing: 1. Computes distance-based weights using get_weight_from_position() 2. Converts distance weight to temperature: T = 1 / w_dis 3. Caps maximum temperature at 100 (very weak influence) 4. For each grid point:

If T < 100: include point (i_use=1) and apply temperature scaling

If T ≥ 100: exclude point (i_use=0) and set observations to NaN

Temperature scaling reduces probability certainty with distance:

T = 1 (close to observation): Original probabilities preserved
T > 1 (far from observation): Probabilities become more uniform
T ≥ 100 (very far): Observations effectively ignored

Examples

>>> # Borehole observation at specific location
>>> P_obs = compute_P_obs_discrete(depth_top, depth_bottom, lithology, z, class_id)
>>> X_well, Y_well = 543000.0, 6175800.0
>>> d_obs, i_use, T_all = Pobs_to_datagrid(P_obs, X_well, Y_well, 'survey_data.h5',
...                                  r_data=10, r_dis=100)
>>> # Write to data file
>>> ig.save_data_multinomial(d_obs, i_use=i_use, id=2, f_data_h5='survey_data.h5')

See also

rescale_P_obs_temperature: Temperature scaling function
compute_P_obs_discrete: Create P_obs from depth intervals
get_weight_from_position: Distance-based weighting function

integrate.integrate_borehole.compute_P_obs_discrete(depth_top=None, depth_bottom=None, lithology_obs=None, z=None, class_id=None, lithology_prob=0.8, P_prior=None, W=None)¶

Compute discrete observation probability matrix from depth intervals and lithology observations.

This function creates a probability matrix where each depth point is assigned probabilities based on observed lithology classes within specified depth intervals.

Parameters:

depth_top (array-like, optional) – Array of top depths for each observation interval. Required if W is not provided.
depth_bottom (array-like, optional) – Array of bottom depths for each observation interval. Required if W is not provided.
lithology_obs (array-like, optional) – Array of observed lithology class IDs for each interval. Required if W is not provided.
z (array-like, optional) – Array of depth/position values where probabilities are computed. Required if W is not provided.
class_id (array-like, optional) – Array of unique class identifiers (e.g., [0, 1, 2] for 3 lithology types). Required if W is not provided.
lithology_prob (float or array-like, optional) – Probability assigned to the observed class. Can be: - float: Same probability for all intervals (default is 0.8) - array: Array of probabilities, one per interval (must match length of lithology_obs)
P_prior (ndarray, optional) – Prior probability matrix of shape (nclass, nm). If None, uses uniform distribution for depths not covered by observations. Default is None.
W (dict, optional) – Well/borehole dictionary containing observation data. If provided, overrides the individual parameters. Expected keys: - ‘depth_top’: Array of top depths - ‘depth_bottom’: Array of bottom depths - ‘class_obs’: Array of observed class IDs (e.g., lithology, soil type) - ‘class_prob’: Probability or array of probabilities (optional, defaults to 0.8) - ‘X’: X coordinate of well location (optional, not used in this function) - ‘Y’: Y coordinate of well location (optional, not used in this function) Default is None.

Returns:

P_obs – Probability matrix of shape (nclass, nm) where nclass is the number of classes and nm is the number of depth points. For each depth point covered by observations, the observed class gets probability lithology_prob and other classes share (1-lithology_prob). Depths not covered by any observation contain NaN or prior probabilities if provided.

Return type:

ndarray

Examples

>>> # Traditional usage with individual parameters
>>> depth_top = [0, 10, 20]
>>> depth_bottom = [10, 20, 30]
>>> lithology_obs = [1, 2, 1]  # clay, sand, clay
>>> z = np.arange(30)
>>> class_id = [0, 1, 2]  # gravel, clay, sand
>>> P_obs = compute_P_obs_discrete(depth_top, depth_bottom, lithology_obs, z, class_id)
>>> print(P_obs.shape)  # (3, 30)

>>> # With different probabilities per interval
>>> lithology_prob = [0.9, 0.7, 0.85]  # Higher confidence in first interval
>>> P_obs = compute_P_obs_discrete(depth_top, depth_bottom, lithology_obs, z, class_id, lithology_prob=lithology_prob)

>>> # Using well dictionary (cleaner interface)
>>> W = {'depth_top': [0, 10, 20], 'depth_bottom': [10, 20, 30],
...      'class_obs': [1, 2, 1], 'class_prob': [0.9, 0.7, 0.85],
...      'X': 543000.0, 'Y': 6175800.0}
>>> P_obs = compute_P_obs_discrete(z=z, class_id=class_id, W=W)

integrate.integrate_borehole.get_weight_from_position(f_data_h5, x_well=0, y_well=0, i_ref=-1, r_dis=400, r_data=2, useLog=True, doPlot=False, plFile=None, showInfo=0, nan_freq=0.8, r_data_i_use=None)¶

Calculate weights based on distance and data similarity to a reference point.

This function computes three sets of weights: 1. Combined weights based on both spatial distance and data similarity 2. Distance-based weights 3. Data similarity weights

Parameters:

f_data_h5 (str) – Path to HDF5 file containing geometry and observed data.
x_well (float, optional) – X coordinate of reference point (well). Default 0.
y_well (float, optional) – Y coordinate of reference point (well). Default 0.
i_ref (int, optional) – Index of reference point. Default -1 (auto-calculated as closest to x_well, y_well).
r_dis (float, optional) – Geographic XY distance range [m] for spatial weighting. Default 400.
r_data (float, optional) – Data-space similarity range parameter for data weighting. Default 2.
useLog (bool, optional) – Apply log10 transform to data before computing similarity. Default True.
doPlot (bool, optional) – Create diagnostic weight plots. Default False.
plFile (str or None, optional) – Output filename for the diagnostic plot. Auto-generated if None.
showInfo (int, optional) – Verbosity level. Default 0.
nan_freq (float, optional) – NaN-frequency threshold for automatic gate selection. Gates where the fraction of non-NaN values across all soundings is below this threshold are excluded from the data-distance computation. Default 0.8. Ignored when r_data_i_use is provided.
r_data_i_use (array-like of int or None, optional) – Explicit gate/channel indices to use for the data-distance computation. When provided, overrides the nan_freq automatic selection. A NaN check at the reference sounding is still applied. Default None (use nan_freq threshold).

Returns:

w_combined (ndarray (N_data,)) – Combined weights from spatial distance and data similarity.
w_dis (ndarray (N_data,)) – Geographic distance-based weights.
w_data (ndarray (N_data,)) – Data similarity-based weights.
i_ref (int) – Index of the reference sounding used.

Notes

Weights are calculated using Gaussian functions: - Distance weights: exp(-dis² / r_dis²) - Data weights: exp(-sum_dd² / r_data²) where dis is geographic distance and sum_dd is cumulative data difference.

integrate.integrate_borehole.prior_data_borehole(f_prior_h5, im_prior, BH, parallel=False, **kwargs)¶

Compute prior data for a single borehole and save to the prior HDF5 file.

Dispatches to the appropriate implementation based on BH[‘method’]:

'mode_probability' → prior_data_borehole_class_mode() (recommended — fast and robust)
'layer_probability' → prior_data_borehole_class_layer()
'class_exact' / 'layer_probability_independent' → NotImplementedError

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.
im_prior (int) – Index of the discrete model parameter (e.g. 2 for /M2 lithology).
BH (dict) – Borehole dictionary. Key 'method' selects the integration approach and defaults to 'mode_probability' if absent.
parallel (bool, optional) – Enable parallel mode computation for 'mode_probability'. Default False.
**kwargs –

showInfoint, optional
Verbosity level. Default is 1.

Returns:

P_obs (ndarray or None) – Probability matrix, shape (nclass, n_intervals).
id_prior (int or None) – Dataset index of the new /D entry added to f_prior_h5.

Examples

>>> P_obs, id_prior = ig.prior_data_borehole(f_prior_h5, im_prior=2, BH=BH, parallel=True)
>>> # Then extrapolate to survey grid:
>>> d_obs, i_use, T_use = ig.Pobs_to_datagrid(P_obs, BH['X'], BH['Y'], f_data_h5)
>>> id_out, _ = ig.save_data_multinomial(d_obs, i_use=i_use, id_prior=id_prior, f_data_h5=f_data_h5)

integrate.integrate_borehole.prior_data_borehole_class_layer(f_prior_h5, im_prior, BH, **kwargs)¶

Compute layer-probability prior data for a borehole using identity mapping.

Uses compute_P_obs_discrete (no prior ensemble needed) and stores an identity prior data reference in f_prior_h5 via prior_data_identity.

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.
im_prior (int) – Index of the discrete model parameter (e.g. 2 for /M2 lithology).
BH (dict) – Borehole dictionary with keys depth_top, depth_bottom, class_obs, class_prob, X, Y, name, method.
**kwargs –

showInfoint, optional
Verbosity level. Default is 1.

Returns:

P_obs (ndarray) – Probability matrix, shape (nclass, n_intervals).
id_prior (int) – Dataset index of the identity /D{id_prior} entry in f_prior_h5.

integrate.integrate_borehole.prior_data_borehole_class_mode(f_prior_h5, im_prior, BH, parallel=False, **kwargs)¶

Compute mode-class prior data for a borehole and save to prior HDF5 file.

Reads M{im_prior} from f_prior_h5, extracts the most frequent class in each observed depth interval for every prior realization (via welllog_compute_P_obs_class_mode), then stores the result as a new dataset in f_prior_h5.

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.
im_prior (int) – Index of the discrete model parameter (e.g. 2 for /M2 lithology).
BH (dict) – Borehole dictionary with keys depth_top, depth_bottom, class_obs, class_prob, X, Y, name, method.
parallel (bool, optional) – Enable parallel mode computation. Default is False.
**kwargs –

showInfoint, optional
Verbosity level. Default is 1.

Returns:

P_obs (ndarray) – Probability matrix, shape (nclass, n_intervals).
id_prior (int) – Dataset index of the new /D{id_prior} entry added to f_prior_h5.

integrate.integrate_borehole.rescale_P_obs_temperature(P_obs, T=1.0)¶

Rescale discrete observation probabilities by temperature and renormalize.

This function applies temperature annealing to probability distributions by raising each probability to the power (1/T), then renormalizing each column (depth point) so that probabilities sum to 1. Higher temperatures (T > 1) flatten the distribution, while lower temperatures (T < 1) sharpen it.

Parameters:

P_obs (ndarray) – Probability matrix of shape (nclass, nm) where nclass is the number of classes and nm is the number of model parameters (e.g., depth points). Each column should represent a probability distribution over classes.
T (float, optional) – Temperature parameter for annealing. Default is 1.0 (no scaling). - T = 1.0: No change (original probabilities) - T > 1.0: Flattens distribution (less certain) - T < 1.0: Sharpens distribution (more certain) - T → ∞: Approaches uniform distribution - T → 0: Approaches one-hot distribution

Returns:

P_obs_scaled – Temperature-scaled and renormalized probability matrix of shape (nclass, nm). Each column sums to 1.0. NaN values in input are preserved in output.

Return type:

ndarray

Examples

>>> P_obs = np.array([[0.8, 0.6, 0.5],
...                   [0.1, 0.2, 0.3],
...                   [0.1, 0.2, 0.2]])
>>> P_scaled = rescale_P_obs_temperature(P_obs, T=2.0)
>>> print(P_scaled)  # More uniform distribution
>>> P_scaled = rescale_P_obs_temperature(P_obs, T=0.5)
>>> print(P_scaled)  # Sharper distribution

Notes

The temperature scaling follows the Boltzmann distribution:: P_new(c) ∝ P_old(c)^(1/T)
After scaling, each column (depth point) is renormalized:: P_new(c) = P_new(c) / sum_c(P_new(c))

This is commonly used in simulated annealing and rejection sampling to control the strength of discrete observations during Bayesian inference.

integrate.integrate_borehole.save_borehole_data(f_prior_h5, f_data_h5, BH, **kwargs)¶

Compute and save prior and observed data for a single borehole in one call.

Combines the three-step borehole ingestion workflow into a single function:

Compute mode-class prior data and save to f_prior_h5 (via prior_data_borehole())
Extrapolate point observations to the survey grid with distance-based weighting (via Pobs_to_datagrid())
Save the gridded observations to f_data_h5 (via ig.save_data_multinomial)

Parameters:

f_prior_h5 (str) – Path to the prior HDF5 file.
f_data_h5 (str) – Path to the observed-data HDF5 file.
BH (dict) –
Borehole dictionary with keys depth_top, depth_bottom, class_obs, class_prob, X, Y, name, method. Two optional keys control the distance-weighting radii when r_data / r_dis are not passed as explicit kwargs:
- range_data (float, optional) — data-space similarity radius. Survey points whose EM data response is similar to the borehole location receive higher weight; points that are more dissimilar are down-weighted. Default: 1,000,000 (effectively no cutoff).
- range_dis (float, optional) — geographic XY distance [m] beyond which the borehole exerts no influence on nearby survey points. Default: 300 m.
- nan_freq (float, optional) — NaN-frequency threshold for automatic data-gate selection. Default: 0.8.
- r_data_i_use (list of int, optional) — explicit gate indices for data-distance computation; overrides nan_freq when provided.
**kwargs –

im_priorint, optional
Index of the discrete model parameter in f_prior_h5 (e.g. 2 for /M2). Default is 2.

parallelbool, optional
Enable parallel mode computation. Default is False.

r_datafloat, optional
Data-space similarity radius. Overrides BH['range_data'] when provided. Resolution order: explicit kwarg > BH[‘range_data’] > 1,000,000 (no cutoff).

r_disfloat, optional
Geographic XY fade-out distance [m]. Overrides BH['range_dis'] when provided. Resolution order: explicit kwarg > BH[‘range_dis’] > 300 m.

nan_freqfloat, optional
NaN-frequency threshold for automatic data-gate selection. Gates where the fraction of non-NaN soundings is below this threshold are excluded from data-distance computation. Resolution order: explicit kwarg > BH[‘nan_freq’] > 0.8. Ignored when r_data_i_use is set.

r_data_i_uselist of int or None, optional
Explicit gate/channel indices to use for data-distance computation. Overrides nan_freq when provided. Resolution order: explicit kwarg > BH[‘r_data_i_use’] > None.

doPlotbool, optional
Plot distance-weight maps. Default is False.

showInfoint, optional
Verbosity level (0 = silent, 1 = one summary line per borehole). Default is 1.

Returns:

id_prior (int) – Dataset index of the new /D entry added to f_prior_h5.
id_out (int) – Dataset index of the new /D entry added to f_data_h5.

Examples

>>> # Single borehole
>>> id_prior, id_data = ig.save_borehole_data(f_prior_h5, f_data_h5, BH)

>>> # All boreholes — collect data IDs for joint inversion
>>> id_borehole_list = []
>>> for BH in BHOLES:
...     _, id_out = ig.save_borehole_data(f_prior_h5, f_data_h5, BH,
...                                       r_data=2, r_dis=300, parallel=True)
...     id_borehole_list.append(id_out)
>>> f_post_h5 = ig.integrate_rejection(f_prior_h5, f_data_h5,
...                                    id_use=[1] + id_borehole_list)

integrate.integrate_borehole.welllog_compute_P_obs_class_mode(M_lithology=None, depth_top=None, depth_bottom=None, lithology_obs=None, z=None, class_id=None, lithology_prob=0.8, W=None, parallel=False, Ncpu=-1, showInfo=1)¶

Compute observation probability matrix from well log class observations by extracting mode class from prior models.

This function processes discrete class models (e.g., lithology) from a prior ensemble to create well log observations. For each depth interval, it finds the most frequent (mode) class within that interval from each prior model, then creates a probability matrix based on how well these modes match the observed classes.

Simplified Usage: When called with only W=W, the function returns just the probability matrix P_obs without computing class mode, and class_mode is returned as None.

Parameters:

M_lithology (ndarray, optional) – Array of lithology models from prior ensemble, shape (nreal, nz) where nreal is the number of realizations and nz is the number of depth points. If None, only P_obs is computed and lithology_mode is returned as None. Default is None.
depth_top (array-like, optional) – Array of top depths for each observation interval. Required if W is not provided.
depth_bottom (array-like, optional) – Array of bottom depths for each observation interval. Required if W is not provided.
lithology_obs (array-like, optional) – Array of observed lithology class IDs for each interval. Required if W is not provided.
z (array-like, optional) – Array of depth/position values corresponding to M_lithology depth discretization. Required if M_lithology is provided and W does not contain depth information.
class_id (array-like, optional) – Array of unique class identifiers (e.g., [0, 1, 2] for 3 lithology types). Required if W is not provided.
lithology_prob (float or array-like, optional) – Probability assigned to the observed class. Can be: - float: Same probability for all intervals (default is 0.8) - array: Array of probabilities, one per interval (must match length of lithology_obs)
W (dict, optional) – Well/borehole dictionary containing observation data. If provided, overrides the individual parameters. Expected keys: - ‘depth_top’: Array of top depths - ‘depth_bottom’: Array of bottom depths - ‘class_obs’: Array of observed class IDs (e.g., lithology, soil type) - ‘class_prob’: Probability or array of probabilities (optional, defaults to 0.8) - ‘X’: X coordinate of well location (optional, not used in this function) - ‘Y’: Y coordinate of well location (optional, not used in this function) Default is None.
parallel (bool, optional) – Enable parallel processing for large ensembles. Default is False. When True and parallel processing is available, distributes realization processing across multiple CPU cores for significant speedup. Recommended for N > 10,000 realizations. Only used when M_lithology is provided.
Ncpu (int, optional) – Number of CPU cores to use for parallel processing. Default is -1 (auto-detect). Only used when parallel=True and M_lithology is provided.
showInfo (int, optional) – Control information output level. Default is 1. - 0: No information printed - 1: Single line info (number of realizations, intervals) - >1: Progress bar with tqdm, runtime statistics, and time per model Only applies when M_lithology is provided.

Returns:

P_obs (ndarray) – Probability matrix of shape (nclass, n_obs) where nclass is the number of classes and n_obs is the number of observation intervals. Each column represents the probability distribution for one depth interval.
class_mode (ndarray or None) – If M_lithology is provided: Array of mode class values extracted from prior models, shape (nreal, n_obs). For each realization and observation interval, contains the most frequent class ID within that depth range. If M_lithology is None: Returns None.

Examples

>>> # Full usage: Load prior lithology models and compute mode
>>> M_lithology = f_prior['M2'][:]  # Shape: (100000, 50)
>>> z = np.linspace(0, 100, 50)
>>> class_id = [0, 1, 2]  # sand, clay, gravel
>>>
>>> # Define observations
>>> depth_top = [0, 20, 40]
>>> depth_bottom = [20, 40, 60]
>>> lithology_obs = [1, 0, 1]  # clay, sand, clay
>>>
>>> # Compute well log observations with class mode
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(M_lithology, depth_top, depth_bottom,
...                                                       lithology_obs, z, class_id)
>>> print(P_obs.shape)  # (3, 3) - 3 classes, 3 observations
>>> print(class_mode.shape)  # (100000, 3) - mode for each realization and interval

>>> # Simplified usage: Only compute P_obs using well dictionary
>>> W = {'depth_top': [0, 20, 40], 'depth_bottom': [20, 40, 60],
...      'class_obs': [1, 0, 1], 'class_prob': [0.9, 0.8, 0.85],
...      'X': 543000.0, 'Y': 6175800.0}
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(W=W, class_id=class_id)
>>> print(P_obs.shape)  # (3, 3) - 3 classes, 3 observations
>>> print(class_mode)  # None (no prior models provided)
>>>
>>> # Using parallel processing for large ensembles
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(M_lithology, depth_top, depth_bottom,
...                                                       lithology_obs, z, class_id,
...                                                       parallel=True, Ncpu=8)
>>>
>>> # Control output verbosity
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(M_lithology, ..., showInfo=0)  # Silent
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(M_lithology, ..., showInfo=1)  # Single line info
>>> P_obs, class_mode = welllog_compute_P_obs_class_mode(M_lithology, ..., showInfo=2)  # Progress bar + timing

Notes

The function extracts class mode for each depth interval by: 1. Finding depth indices corresponding to interval boundaries 2. Extracting class values within the interval 3. Computing the most frequent (mode) class 4. Assigning probabilities based on match with observed class

This approach is suitable for well log observations where each interval represents the dominant class within that depth range, rather than the full depth profile.

Simplified Mode (M_lithology=None): When M_lithology is not provided, the function only computes the P_obs probability matrix from the observed class data. This is useful when you only need the probability representation of observations without extracting mode class from prior models.

Parallel processing uses shared memory for M_lithology array to minimize memory overhead. Expected speedup: 4-8x on 8-core machines for large ensembles (N > 100,000 realizations). The parallel implementation distributes realizations across worker processes while maintaining identical results to the sequential version.

Both sequential and parallel modes support timing information via showInfo parameter: - showInfo=0: Silent mode - showInfo=1: Single line summary (default) - showInfo>1: Detailed timing (progress bar for sequential, runtime stats for both)