Equinor Volve Log ML

Work with real Operator Data!

  1. Background
  2. Data
  3. Problem
  4. Goal
  5. Processing
    1. Imports
    2. Load data
    3. Data Preparation
    4. Exploratory Data Analysis
      1. Pair-plot of the Train Data
      2. Spearman’s Correlation Heatmap
    5. Transformation of the Train Data
      1. Pair-Plot (after transformation)
    6. Removing Outliers
      1. Method 1: Standard Deviation
      2. Method 2: Isolation Forest
      3. Method 3: Minimum Covariance Determinant
      4. Method 4: Local Outlier Factor
      5. Method 5: Support Vector Machine
    7. Train and Validate
      1. Fit to Train and Score on Val
      2. Inverse Transformation of Prediction
    8. Hyperparameter Tuning
    9. Predict Test Wells
      1. Define the Test Data
      2. Transform Test - Predict - Inverse Transform
      3. Plot the Predictions

This repository is my exploration of bringing machine learning to the Equinor “Volve” geophysical/geological dataset, which was opened to the public in 2018.

For more information about this open data set, its publishing licence, and how to obtain the original version, please visit Equinor’s Data Village.

The full dataset contains roughly 40,000 files, and the public is invited to learn from it. Exploring more of this treasure trove is on my to-do list. In the short time since Volve became open data, the material has not been explored very much and likely still holds many secrets. The good news is that certain data workflows can be reused from log to log and well to well, so treat this as a skeleton blueprint with ample room to adapt it to your own goals. Some people may even build a bespoke object-oriented workflow to automate the processing and save time on repetitive work across a data set this large; a sketch of that idea follows below.
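
Purely to illustrate what such a reusable wrapper could look like, here is a minimal sketch (not used in the rest of this post; class and method names are made up):

import lasio

class WellLog:
  """Minimal illustrative wrapper around one .LAS log (hypothetical)."""

  def __init__(self, path):
    # Assumes the depth index is named 'DEPTH', as in the Volve logs
    self.df = lasio.read(path).df().reset_index()

  def clip(self, top, bottom):
    """Keep only samples between the given depths (metres)."""
    self.df = self.df[self.df['DEPTH'].between(top, bottom)]
    return self

  def select(self, columns):
    """Keep only the listed curves."""
    self.df = self.df[columns]
    return self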

The analysis in this post was inspired in part by the geophysicist Yohanes Nuwara (see here). He also wrote a TDS article on the topic (see here) that is worth your time if you want to dive in deeper!

More on the topic:

The full repository including Jupyter Notebook, data, and results of what you see below can be found here.

Background

The Volve field was operated by Equinor in the Norwegian North Sea from 2008 to 2016. Located 200 kilometres west of Stavanger (Norway) at the southern end of the Norwegian sector, the field was decommissioned in September 2016 after 8.5 years in operation, more than twice as long as originally planned. The development was based on production from the Mærsk Inspirer jack-up rig, with Navion Saga used as a storage ship to hold crude oil before export. Gas was piped to the Sleipner A platform for final processing and export. The Volve field reached a recovery rate of 54%, and in March 2016 it was decided to shut down production permanently. Reference.

Data

Wireline-log files (.LAS) of five wells:

  • Log 1: 15_9-F-11A
  • Log 2: 15_9-F-11B
  • Log 3: 15_9-F-1A
  • Log 4: 15_9-F-1B
  • Log 5: 15_9-F-1C

The .LAS files contain the following feature columns:

Name   Unit              Description                                                 Read More
Depth  [m]               Below Surface
NPHI   [vol/vol]         Neutron Porosity (not calibrated in basic physical units)   Reference
RHOB   [g/cm3]           Bulk Density                                                Reference
GR     [API]             Gamma Ray, radioactive decay (aka shaliness log)            Reference
RT     [ohm*m]           True Resistivity                                            Reference
PEF    [barns/electron]  PhotoElectric absorption Factor                             Reference
CALI   [inches]          Caliper, Borehole Diameter                                  Reference
DT     [μs/ft]           Delta Time, Sonic Log, P-wave interval transit time         Reference

Problem

Wells 15/9-F-11B (log 2) and 15/9-F-1C (log 5) lack the DT Sonic Log feature.

Goal

Predict Sonic Log (DT) feature in these two wells.

Processing

Imports

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import itertools

import lasio

import glob
import os
import md_toc

Load data

LAS is an ASCII file type for borehole logs. It contains:

  1. the header with detailed information about a borehole and column descriptions and
  2. the main body with the actual data.

The package lasio helps with parsing and writing such files in Python. Reference: https://pypi.org/project/lasio/
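
For a quick look at those two parts before the batch loading below, a single file can be inspected like this (the file name here is just a placeholder):

las = lasio.read("well_logs/example.LAS")  # placeholder path, not an actual Volve file name

print(las.well)        # header section: well metadata (name, field, location, ...)
print(las.curves)      # curve definitions: mnemonic, unit and description per column
print(las.df().head()) # the data body as a pandas DataFrame indexed by depth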

# Find paths to the log files (os.path.join keeps the path style portable)
paths = sorted(glob.glob(os.path.join(os.getcwd(),"well_logs", "*.LAS")))

# Create a list for loop processing
log_list = [0] * len(paths)

# Parse LAS with lasio to create pandas dataframes
for i in range(len(paths)):
  las = lasio.read(paths[i])
  log_list[i] = las.df()
  # this transforms the depth from index to regular column
  log_list[i].reset_index(inplace=True)

log_list[0].head()
   DEPTH  ABDCQF01  ABDCQF02  ABDCQF03  ABDCQF04    BS  CALI  DRHO   DT  DTS  PEF  RACEHM  RACELM   RD  RHOB   RM  ROP  RPCEHM  RPCELM   RT
0  188.5       NaN       NaN       NaN       NaN  36.0   NaN   NaN  NaN  NaN  NaN     NaN     NaN  NaN   NaN  NaN  NaN     NaN     NaN  NaN
1  188.6       NaN       NaN       NaN       NaN  36.0   NaN   NaN  NaN  NaN  NaN     NaN     NaN  NaN   NaN  NaN  NaN     NaN     NaN  NaN
2  188.7       NaN       NaN       NaN       NaN  36.0   NaN   NaN  NaN  NaN  NaN     NaN     NaN  NaN   NaN  NaN  NaN     NaN     NaN  NaN
3  188.8       NaN       NaN       NaN       NaN  36.0   NaN   NaN  NaN  NaN  NaN     NaN     NaN  NaN   NaN  NaN  NaN     NaN     NaN  NaN
4  188.9       NaN       NaN       NaN       NaN  36.0   NaN   NaN  NaN  NaN  NaN     NaN     NaN  NaN   NaN  NaN  NaN     NaN     NaN  NaN
# Save logs from list of dfs into separate variables
log1, log2, log3, log4, log5 = log_list

# Helper function for repeated plotting

def makeplot(df,suptitle_str="pass a suptitle"):

  # Lists of used columns and colors
  columns = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
  colors = ['red', 'darkblue', 'black', 'green', 'purple', 'brown', 'turquoise']

  # determine how many columns are available in the log (some miss 'DT')
  col_counter = 0
  for i in df.columns:
    if i in columns:
      col_counter+=1

  # Create the subplots
  fig, ax = plt.subplots(nrows=1, ncols=col_counter, figsize=(col_counter*2,10))
  fig.suptitle(suptitle_str, size=20, y=1.05)

  # Looping each log to display in the subplots
  for i in range(col_counter):
    if i == 3:
      # semilog plot for resistivity ('RT')
      ax[i].semilogx(df[columns[i]], df['DEPTH'], color=colors[i])
    else:
      # all other -> normal plot
      ax[i].plot(df[columns[i]], df['DEPTH'], color=colors[i])
  
    ax[i].set_title(columns[i])
    ax[i].grid(True)
    ax[i].invert_yaxis()

  plt.tight_layout() #avoids label overlap
  plt.show()

makeplot(log1,"Log 1 15_9-F-11A")


log 1 exploration

makeplot(log2, "Log 2 15_9-F-11B")


log 2 exploration

Data Preparation

  1. The train-test split is easy since the data is already partitioned by wells:
    • Train on logs 1, 3, and 4.
    • Predict on logs 2 and 5 (the test wells that lack DT).
  2. There are many NaN values in the logs. The plots above only display the samples that are non-NaN and can thus be used to gauge where the logs need to be clipped. The NaNs are primarily present at the top and bottom of each log, before and after the actual readings. The logs get clipped to roughly the following depths:
    • log1 2,600 - 3,720 m
    • log2 3,200 - 4,740 m
    • log3 2,620 - 3,640 m
    • log4 3,100 - 3,400 m
    • log5 3,100 - 4,050 m
  3. Furthermore, the logs contain many more features than we need. The relevant ones get selected for further use; the rest is discarded.
# Lists of depths for clipping
lower = [2600, 3200, 2620, 3100, 3100]
upper = [3720, 4740, 3640, 3400, 4050]

# Lists of selected columns
train_cols = ['DEPTH', 'NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
test_cols = ['DEPTH', 'NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI']

log_list_clipped = [0] * len(paths)

for i in range(len(log_list)):
    
  # Clip depths
  temp_df = log_list[i].loc[
      (log_list[i]['DEPTH'] >= lower[i]) & 
      (log_list[i]['DEPTH'] <= upper[i])
  ]

  # Select train-log columns
  if i in [0,2,3]:
    log_list_clipped[i] = temp_df[train_cols]
  
  # Select test-log columns
  else:
    log_list_clipped[i] = temp_df[test_cols]

# Save logs from list into separate variables
log1, log2, log3, log4, log5 = log_list_clipped
# check for NaN
log1
        DEPTH   NPHI   RHOB      GR     RT    PEF   CALI       DT
24115  2600.0  0.371  2.356  82.748  1.323  7.126  8.648  104.605
24116  2600.1  0.341  2.338  79.399  1.196  6.654  8.578  103.827
24117  2600.2  0.308  2.315  74.248  1.171  6.105  8.578  102.740
24118  2600.3  0.283  2.291  68.542  1.142  5.613  8.547  100.943
24119  2600.4  0.272  2.269  60.314  1.107  5.281  8.523   98.473
...
35311  3719.6  0.236  2.617  70.191  1.627  7.438  8.703   84.800
35312  3719.7  0.238  2.595  75.393  1.513  7.258  8.750   85.013
35313  3719.8  0.236  2.571  82.648  1.420  7.076  8.766   85.054
35314  3719.9  0.217  2.544  89.157  1.349  6.956  8.781   84.928
35315  3720.0  0.226  2.520  90.898  1.301  6.920  8.781   84.784

Next Steps:

  • Concatenate the training logs into a training df and the test logs into a test (prediction) df.

  • Assign log/well names to each sample.

  • Move the DEPTH column to the end of each dataframe.

# Concatenate dataframes
train = pd.concat([log1, log3, log4])
pred = pd.concat([log2, log5])

# Assign names
names = ['15_9-F-11A', '15_9-F-11B', '15_9-F-1A', '15_9-F-1B', '15_9-F-1C']

names_train = []
names_pred = []

for i in range(len(log_list_clipped)):
  if i in [0,2,3]:
    # Train data, assign names 
    names_train.append(np.full(len(log_list_clipped[i]), names[i]))
  else:
    # Test data, assign names
    names_pred.append(np.full(len(log_list_clipped[i]), names[i]))

# Concatenate inside list
names_train = list(itertools.chain.from_iterable(names_train))
names_pred = list(itertools.chain.from_iterable(names_pred))

# Add well name to df
train['WELL'] = names_train
pred['WELL'] = names_pred

# Pop and add depth to end of df
depth_train, depth_pred = train.pop('DEPTH'), pred.pop('DEPTH')
train['DEPTH'], pred['DEPTH'] = depth_train, depth_pred

# Train dataframe with logs 1,3,4 vertically stacked
train
         NPHI    RHOB       GR      RT     PEF    CALI        DT        WELL   DEPTH
24115  0.3710  2.3560  82.7480  1.3230  7.1260  8.6480  104.6050  15_9-F-11A  2600.0
24116  0.3410  2.3380  79.3990  1.1960  6.6540  8.5780  103.8270  15_9-F-11A  2600.1
24117  0.3080  2.3150  74.2480  1.1710  6.1050  8.5780  102.7400  15_9-F-11A  2600.2
24118  0.2830  2.2910  68.5420  1.1420  5.6130  8.5470  100.9430  15_9-F-11A  2600.3
24119  0.2720  2.2690  60.3140  1.1070  5.2810  8.5230   98.4730  15_9-F-11A  2600.4
...
32537  0.1861  2.4571  60.4392  1.2337  5.9894  8.7227   75.3947   15_9-F-1B  3399.6
32538  0.1840  2.4596  61.8452  1.2452  6.0960  8.6976   75.3404   15_9-F-1B  3399.7
32539  0.1798  2.4637  61.1386  1.2960  6.1628  8.6976   75.3298   15_9-F-1B  3399.8
32540  0.1780  2.4714  59.3751  1.4060  6.1520  8.6976   75.3541   15_9-F-1B  3399.9
32541  0.1760  2.4809  58.3742  1.4529  6.1061  8.6978   75.4476   15_9-F-1B  3400.0
# Pred dataframe with logs 2, 5 vertically stacked
pred
         NPHI    RHOB        GR      RT     PEF    CALI        WELL   DEPTH
30115  0.0750  2.6050    9.3480  8.3310  7.4510  8.5470  15_9-F-11B  3200.0
30116  0.0770  2.6020    9.3620  8.2890  7.4640  8.5470  15_9-F-11B  3200.1
30117  0.0780  2.5990    9.5450  8.2470  7.4050  8.5470  15_9-F-11B  3200.2
30118  0.0790  2.5940   11.1530  8.2060  7.2920  8.5470  15_9-F-11B  3200.3
30119  0.0780  2.5890   12.5920  8.1650  7.1670  8.5470  15_9-F-11B  3200.4
...
39037  0.3107  2.4184  106.7613  2.6950  6.2332  8.5569   15_9-F-1C  4049.6
39038  0.2997  2.4186  109.0336  2.6197  6.2539  8.5569   15_9-F-1C  4049.7
39039  0.2930  2.4232  106.0935  2.5948  6.2883  8.5570   15_9-F-1C  4049.8
39040  0.2892  2.4285  105.4931  2.6344  6.3400  8.6056   15_9-F-1C  4049.9
39041  0.2956  2.4309  109.8965  2.6459  6.3998  8.5569   15_9-F-1C  4050.0

Exploratory Data Analysis

Pair-plot of the Train Data

train_features = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']

sns.pairplot(train, vars=train_features, diag_kind='kde',
             plot_kws = {'alpha': 0.6, 's': 30, 'edgecolor': 'k'})


pairplot

Spearman’s Correlation Heatmap

train_only_features = train[train_features]

# Generate a mask for the upper triangle
mask = np.zeros_like(train_only_features.corr(method='spearman'), dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Custom colormap
cmap = sns.cubehelix_palette(n_colors=12, start=-2.25, rot=-1.3, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
plt.figure(figsize=(12,10))
sns.heatmap(train_only_features.corr(method = 'spearman'), annot=True,  mask=mask, cmap=cmap, vmax=.3, square=True)

plt.show()


heatmap

Transformation of the Train Data

Normalize/transform the dataset:

  • Log-transform the RT curve (resistivity spans several orders of magnitude)
  • Apply a power transform with the Yeo-Johnson method to all curves except ‘WELL’ and ‘DEPTH’ (the mapping is sketched right after this list)
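
For reference, the Yeo-Johnson mapping applied per column is, in essence, the standard formula below (a sketch only; scikit-learn's PowerTransformer additionally standardizes each column to zero mean and unit variance, since standardize=True by default):

# Sketch of the Yeo-Johnson mapping for a single value y and exponent lmbda
def yeo_johnson(y, lmbda):
  if y >= 0:
    return np.log1p(y) if lmbda == 0 else ((y + 1)**lmbda - 1) / lmbda
  else:
    return -np.log1p(-y) if lmbda == 2 else -((-y + 1)**(2 - lmbda) - 1) / (2 - lmbda)
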
# Log-transform the RT column
train['RT'] = np.log10(train['RT'])

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer

# Columns to transform (all curves; WELL and DEPTH stay untouched)
feature_target = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']

# Transformation / Normalizer object, Yeo-Johnson method
scaler = PowerTransformer(method='yeo-johnson')

# ColumnTransformer applies the scaler to feature_target and passes WELL and DEPTH through
ct = ColumnTransformer([('transform', scaler, feature_target)], remainder='passthrough')

# Fit and transform
train_trans = ct.fit_transform(train)

# Convert back to a dataframe (the passthrough columns end up last)
colnames = feature_target + ['WELL', 'DEPTH']
train_trans = pd.DataFrame(train_trans, columns=colnames)
train_trans
           NPHI      RHOB        GR        RT       PEF      CALI        DT        WELL   DEPTH
0      1.702168 -0.920748  1.130650 -0.631876  0.031083  0.450019  1.588380  15_9-F-11A  2600.0
1      1.573404 -1.020621  1.092435 -0.736154 -0.373325 -1.070848  1.562349  15_9-F-11A  2600.1
2      1.407108 -1.142493  1.030314 -0.758080 -0.819890 -1.070848  1.525055  15_9-F-11A  2600.2
3      1.260691 -1.263078  0.956135 -0.784153 -1.197992 -1.753641  1.460934  15_9-F-11A  2600.3
4      1.189869 -1.367969  0.837247 -0.816586 -1.441155 -2.286221  1.367432  15_9-F-11A  2600.4
...
24398  0.462363 -0.279351  0.839177 -0.704005 -0.910619  2.041708  0.047941   15_9-F-1B  3399.6
24399  0.439808 -0.261621  0.860577 -0.694407 -0.826995  1.510434  0.043466   15_9-F-1B  3399.7
24400  0.393869 -0.232335  0.849885 -0.653120 -0.774093  1.510434  0.042591   15_9-F-1B  3399.8
24401  0.373838 -0.176628  0.822640 -0.569367 -0.782672  1.510434  0.044596   15_9-F-1B  3399.9
24402  0.351335 -0.106609  0.806807 -0.535769 -0.819021  1.514682  0.052292   15_9-F-1B  3400.0

Pair-Plot (after transformation)

sns.pairplot(train_trans, vars=feature_target, diag_kind = 'kde',
             plot_kws = {'alpha': 0.6, 's': 30, 'edgecolor': 'k'})


pairplot transformed

Removing Outliers

Outliers can be removed easily with an API call. The issue is that multiple methods exist and they work with varying effectiveness. Let’s run a few and pick the one that performs best.

from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Make a copy of train
train_fonly = train_trans.copy()

# Remove WELL, DEPTH
train_fonly = train_fonly.drop(['WELL', 'DEPTH'], axis=1)
train_fonly_names = train_fonly.columns

# Helper function for repeated plotting

def makeboxplot(my_title='enter title',my_data=None):
    _, ax1 = plt.subplots()
    ax1.set_title(my_title, size=15)
    ax1.boxplot(my_data)
    ax1.set_xticklabels(train_fonly_names)
    plt.show()
	
makeboxplot('Unprocessed',train_trans[train_fonly_names])
print('n samples unprocessed:', len(train_fonly))


m0 unprocessed

n samples unprocessed: 24,403
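
All four scikit-learn detectors used below follow the same convention: fit_predict returns -1 for outliers and 1 for inliers, so each comparison boils down to the same three lines. A compact sketch of that loop (equivalent to the step-by-step cells that follow, reusing the imports and train_fonly from above) would be:

detectors = {
  'Isolation Forest': IsolationForest(contamination=0.5),
  'Minimum Covariance Determinant': EllipticEnvelope(contamination=0.1),
  'Local Outlier Factor': LocalOutlierFactor(contamination=0.3),
  'One-Class SVM': OneClassSVM(nu=0.1),
}

for name, det in detectors.items():
  inliers = det.fit_predict(train_fonly) != -1   # -1 marks an outlier
  print(name, 'keeps', inliers.sum(), 'samples')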

Method 1: Standard Deviation

train_stdev = train_fonly[np.abs(train_fonly - train_fonly.mean()) <= (3 * train_fonly.std())]

# Delete NaN
train_stdev = train_stdev.dropna()

makeboxplot('Method 1: Standard Deviation',train_stdev)
print('Remaining samples:', len(train_stdev))


m1 StDev

Remaining samples: 24,101

Method 2: Isolation Forest

iso = IsolationForest(contamination=0.5)
yhat = iso.fit_predict(train_fonly)
mask = yhat != -1
train_iso = train_fonly[mask]

makeboxplot('Method 2: Isolation Forest',train_iso)
print('Remaining Samples:', len(train_iso))


m2 Iso Forest

Remaining Samples: 12,202

Method 3: Minimum Covariance Determinant

ee = EllipticEnvelope(contamination=0.1)
yhat = ee.fit_predict(train_fonly)
mask = yhat != -1
train_ee = train_fonly[mask]

makeboxplot('Method 3: Minimum Covariance Determinant',train_ee)
print('Remaining samples:', len(train_ee))


m3 MinCovDet

Remaining samples: 21,962

Method 4: Local Outlier Factor

lof = LocalOutlierFactor(contamination=0.3)
yhat = lof.fit_predict(train_fonly)
mask = yhat != -1
train_lof = train_fonly[mask]

makeboxplot('Method 4: Local Outlier Factor',train_lof)
print('Remaining samples:', len(train_lof))


m4 Local Outlier Factor

Remaining samples: 17,082

Method 5: Support Vector Machine

svm = OneClassSVM(nu=0.1)
yhat = svm.fit_predict(train_fonly)
mask = yhat != -1
train_svm = train_fonly[mask]

makeboxplot('Method 5: Support Vector Machine',train_svm)
print('Remaining samples:', len(train_svm))


m5 Support Vector Machine

Remaining samples: 21,964

The one-class SVM performs best.

Make a pair-plot of the data after outlier removal.

sns.pairplot(train_svm, vars=feature_target,
             diag_kind='kde',
             plot_kws = {'alpha': 0.6, 's': 30, 'edgecolor': 'k'})


pairplot after outlier removal

Train and Validate

Define the train data as the SVM outlier-removed data.

# Feature and target column names (used throughout the rest of the post)
feature_names = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI']
target_name = 'DT'

# Select columns for features (X) and target (y)
X_train = train_svm[feature_names].values
y_train = train_svm[target_name].values

Define the validation data, one set per training well:

train_trans_copy = train_trans.copy()

train_well_names = ['15_9-F-11A', '15_9-F-1A', '15_9-F-1B']

X_val = []
y_val = []

for i in range(len(train_well_names)):
  # Split the df by log name
  val = train_trans_copy.loc[train_trans_copy['WELL'] == train_well_names[i]]

  # Drop name column 
  val = val.drop(['WELL'], axis=1)

  # Define X_val (feature) and y_val (target)
  X_val_, y_val_ = val[feature_names].values, val[target_name].values
  
  X_val.append(X_val_)
  y_val.append(y_val_)

# save into separate arrays, one per well
X_val1, X_val3, X_val4 = X_val
y_val1, y_val3, y_val4 = y_val

Fit to Train and Score on Val

from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor

# Gradient Booster object
model = GradientBoostingRegressor()

# Fit the regressor to the training data
model.fit(X_train, y_train)

# Validation: Predict on well 1
y_pred1 = model.predict(X_val1)
print("R2 Log 1: {}".format(round(model.score(X_val1, y_val1),4)))
rmse = np.sqrt(mean_squared_error(y_val1, y_pred1))
print("RMSE Log 1: {}".format(round(rmse,4)))

# Validation: Predict on well 3
y_pred3 = model.predict(X_val3)
print("R2 Log 3: {}".format(round(model.score(X_val3, y_val3),4)))
rmse = np.sqrt(mean_squared_error(y_val3, y_pred3))
print("RMSE Log 3: {}".format(round(rmse,4)))

# Validation: Predict on well 4
y_pred4 = model.predict(X_val4)
print("R2 Log 4: {}".format(round(model.score(X_val4, y_val4),4)))
rmse = np.sqrt(mean_squared_error(y_val4, y_pred4))
print("RMSE Log 4: {}".format(round(rmse,4)))

R2 Log 1: 0.9526

RMSE Log 1: 0.2338

R2 Log 3: 0.9428

RMSE Log 3: 0.2211

R2 Log 4: 0.8958

RMSE Log 4: 0.2459

This R2 is quite good for relatively little effort!
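
For reference, the two reported metrics are defined as follows; note that at this point y_val and y_pred are still on the transformed (Yeo-Johnson, standardized) scale, so the RMSE is in those standardized units rather than in μs/ft:

$$\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$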

Inverse Transformation of Prediction

# Fit the transformer to the (untransformed) target column
y = train[target_name].values
scaler.fit(y.reshape(-1,1))

# Inverse transform  y_val, y_pred
y_val1, y_pred1 = scaler.inverse_transform(y_val1.reshape(-1,1)), scaler.inverse_transform(y_pred1.reshape(-1,1))
y_val3, y_pred3 = scaler.inverse_transform(y_val3.reshape(-1,1)), scaler.inverse_transform(y_pred3.reshape(-1,1))
y_val4, y_pred4 = scaler.inverse_transform(y_val4.reshape(-1,1)), scaler.inverse_transform(y_pred4.reshape(-1,1))

Plot a comparison between train and prediction of the DT feature.

x = [y_val1, y_pred1, y_val3, y_pred3, y_val4, y_pred4]
y = [log1['DEPTH'], log1['DEPTH'], log3['DEPTH'], log3['DEPTH'], log4['DEPTH'], log4['DEPTH']]

color = ['red', 'blue', 'red', 'blue', 'red', 'blue']
title = ['DT Log 1', 'Pred DT Log 1', 'DT Log 3', 'Pred DT Log 3',
         'DT Log 4', 'Pred DT Log 4']

fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(15,10))

for i in range(len(x)):
  ax[i].plot(x[i], y[i], color=color[i])
  ax[i].set_xlim(50, 150)
  ax[i].set_ylim(np.max(y[i]), np.min(y[i]))
  ax[i].set_title(title[i])

plt.tight_layout()

plt.show()


DT train vs prediction

Hyperparameter Tuning

The example below runs GridSearchCV hyperparameter tuning on Scikit-Learn’s GradientBoostingRegressor; with two values for each of five hyperparameters, the grid contains 32 combinations, each evaluated with 3-fold cross-validation.

Other ways of searching hyperparameters exist that narrow the space down in a smarter, more automated way, such as a randomized search (see the sketch below).
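
For example, a randomized search samples a fixed number of combinations from a grid instead of exhausting it. A minimal sketch (not used in this post), reusing the param_grid defined further down, would be:

from sklearn.model_selection import RandomizedSearchCV

# Sample only 10 combinations from the grid instead of all of them,
# then fit exactly like the grid search below
search = RandomizedSearchCV(GradientBoostingRegressor(), param_grid,
                            n_iter=10, cv=3, random_state=42)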

from sklearn.model_selection import train_test_split

# Define the X and y from the SVM normalized dataset
X = train_svm[feature_names].values
y = train_svm[target_name].values

# Train and test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

The grid search will churn for several minutes; afterwards the best-scoring parameter combination is printed.

from sklearn.model_selection import GridSearchCV

model = GradientBoostingRegressor()

# Hyperparameter ranges

max_features = ['auto', 'sqrt']
min_samples_leaf = [1, 4]
min_samples_split = [2, 10]
max_depth = [10, 100]
n_estimators = [100, 1000]

param_grid = {'max_features': max_features,
              'min_samples_leaf': min_samples_leaf,
              'min_samples_split': min_samples_split,
              'max_depth': max_depth,
              'n_estimators': n_estimators}

# Train with grid
model_random = GridSearchCV(model, param_grid, cv=3)
model_random.fit(X_train, y_train)

# Print best model
model_random.best_params_

{'max_depth': 100,
 'max_features': 'sqrt',
 'min_samples_leaf': 4,
 'min_samples_split': 10,
 'n_estimators': 1000}
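
Since GridSearchCV refits the best estimator on the full training data by default (refit=True), model_random can be used for prediction directly in the next section; the refitted model and its cross-validated score are also available explicitly:

print(model_random.best_score_)            # best mean cross-validated R^2
best_model = model_random.best_estimator_  # the refitted GradientBoostingRegressor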

Predict Test Wells

Define the Test Data

# Define the test data 
names_test = ['15_9-F-11B', '15_9-F-1C']

X_test = []
y_test = []
depths = []

for i in range(len(names_test)):
  # split the df with respect to its name
  test = pred.loc[pred['WELL'] == names_test[i]]

  # Drop well name column 
  test = test.drop(['WELL'], axis=1)

  # Define X_test (feature)
  X_test_ = test[feature_names].values

  # Define depth
  depth_ = test['DEPTH'].values
  
  X_test.append(X_test_)
  depths.append(depth_)

# For each well 2 and 5
X_test2, X_test5 = X_test
depth2, depth5 = depths

Transform Test - Predict - Inverse Transform

# Transform X_test of log 2 and 5
X_test2 = scaler.fit_transform(X_test2)
X_test5 = scaler.fit_transform(X_test5)

# Predict to log 2 and 5 with tuned model
y_pred2 = model_random.predict(X_test2)
y_pred5 = model_random.predict(X_test5)

y = train[target_name].values
scaler.fit(y.reshape(-1,1))

# Inverse transform y_pred
y_pred2 = scaler.inverse_transform(y_pred2.reshape(-1,1))
y_pred5 = scaler.inverse_transform(y_pred5.reshape(-1,1))

Plot the Predictions

plt.figure(figsize=(5,12))

plt.subplot(1,2,1)
plt.plot(y_pred2, depth2, color='green')
plt.ylim(max(depth2), min(depth2))
plt.title('Pred DT Log 2: 15_9-F-11B', size=12)

plt.subplot(1,2,2)
plt.plot(y_pred5, depth5, color='green')
plt.ylim(max(depth5), min(depth5))
plt.title('Pred DT Log 5: 15_9-F-1C', size=12)

plt.tight_layout()
plt.show()


DT test predictions

def makeplotpred(df,suptitle_str="pass a suptitle"):
  # Column selection from df
  col_names = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'DT']
  # Plotting titles
  title = ['NPHI', 'RHOB', 'GR', 'RT', 'PEF', 'CALI', 'Predicted DT']
  # plotting colors
  colors = ['purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'green']

  # Create the subplots; ncols equals the number of logs
  fig, ax = plt.subplots(nrows=1, ncols=len(col_names), figsize=(15,10))
  fig.suptitle(suptitle_str, size=20, y=1.05)

  # Looping each log to display in the subplots
  for i in range(len(col_names)):
    if i == 3:
      # for resistivity, semilog plot
      ax[i].semilogx(df[col_names[i]], df['DEPTH'], color=colors[i])
    else:
      # for non-resistivity, normal plot
      ax[i].plot(df[col_names[i]], df['DEPTH'], color=colors[i])
  
    ax[i].set_ylim(max(df['DEPTH']), min(df['DEPTH']))
    ax[i].set_title(title[i], pad=15)
    ax[i].grid(True)

  ax[2].set_xlim(0, 200)
  plt.tight_layout(pad=1)
  plt.show()
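
One step is not shown in the post: the helper reads a 'DT' column from the dataframe, so the predictions from above presumably need to be attached to the clipped test logs first. A minimal way to do that (an assumption on my part, not code from the original notebook) is:

# Assumption: attach the predicted DT to the clipped test logs so df['DT'] exists
log2 = log2.assign(DT=y_pred2.ravel())
log5 = log5.assign(DT=y_pred5.ravel())
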
makeplotpred(log2,"Log 2: 15_9-F-11B")


DT test predictions

makeplotpred(log5,"Log 5: 15_9-F-1C")


DT test predictions


