Bayesian Optimization using OPTIMEO

Single outcome

First, we generate some data

Let's create an experimental_data(temp, conc) function that simulates the yield of a chemical reaction based on temperature and concentration. The yield is a function of these two variables, and we will use Bayesian Optimization to find the position of the maximum in the minimum number of experiments.

In the following block, we just plot this function to see what it looks like (but in real life, you have no idea what the function looks like). We also create an experimental data set where we have already done some experiments (they are the white circles in the plot; the red cross marks the true maximum we are trying to find).

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

pio.renderers.default = "notebook"

def experimental_data(temp, conc):
    """
    This function simulates experimental data based on temperature and concentration.
    The function is a Gaussian-like function that peaks at certain temperature and concentration values.
    The function is not based on any real experimental data and is purely for demonstration purposes.
    """
    out = np.exp(-((temp - 50) ** 2)/1000)*np.exp(-((conc - .50) ** 2)/.05) - .9*np.exp(-((temp - 45) ** 2)/100)*np.exp(-((conc - .450) ** 2)/.05)
    return out

def generate_data(N=100):
    temp = np.linspace(0, 100, N)
    conc = np.linspace(0, 1, N)
    data = np.meshgrid(temp, conc)
    temp, conc = data[0].flatten(), data[1].flatten()
    exp_data = experimental_data(temp, conc)
    df = pd.DataFrame({'Temperature': temp, 'Concentration': conc, 'Yield': exp_data})
    return df

df = generate_data(500)
# Prepare data for the contour plot
unique_temps = np.unique(df['Temperature'])
unique_concs = np.unique(df['Concentration'])
Z = df.pivot(index='Concentration', columns='Temperature', values='Yield').values

# find the maximum yield and its position
max_yield = np.max(Z)
max_pos = np.unravel_index(np.argmax(Z), Z.shape)
max_temp = unique_temps[max_pos[1]]
max_conc = unique_concs[max_pos[0]]
# Create the contour plot
fig = go.Figure(data=[go.Contour(
    z=Z,
    x=unique_temps,
    y=unique_concs,
    colorscale='Viridis',
    contours=dict(coloring='heatmap', size=0.1),
    hovertemplate='Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{z}<extra></extra>'
)])

# Update layout with legend at the top, below the title
fig.update_layout(
    xaxis_title='Temperature',
    yaxis_title='Concentration',
    legend=dict(
        x=0.5, # Center the legend horizontally
        y=1.05, # Place the legend just below the title
        orientation='h', # Horizontal orientation
        xanchor='center',
        yanchor='bottom'
    )
)

# Add a marker for the maximum yield
fig.add_trace(go.Scatter(
    x=[max_temp],
    y=[max_conc],
    mode='markers',
    marker=dict(size=10, color='red', symbol='x'),
    name='Max Yield that we want to determine through experiments',
    customdata=[[max_yield]], # Add max_yield as custom data
    hovertemplate='Max Yield:<br>Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{customdata[0]}<extra></extra>'
))
# Create some sample data
df_sample = generate_data(2)

# Add the experimental data on the plot
fig.add_trace(go.Scatter(
    x=df_sample['Temperature'],
    y=df_sample['Concentration'],
    mode='markers',
    marker=dict(size=10, color='white', symbol='circle',
                line=dict(width=2, color='black')),
    name='Experimental measurements we have so far',
))
fig.show()

# Save the experimental data to a CSV file
df_sample.to_csv('experimental_data.csv', index=False)
Bayesian Optimization
Now, we will use the OPTIMEO package to perform Bayesian Optimization. First, we load the data in the right format.
from optimeo.bo import *
# there is only one outcome here, in the last column (-1)
features, outcomes = read_experimental_data('experimental_data.csv', out_pos=[-1])
print(f"Features:\n{features}")
print(f"Outcomes:\n{outcomes}")
Features:
{'Temperature': {'type': 'float', 'data': [0.0, 100.0, 0.0, 100.0], 'range': [np.float64(0.0), np.float64(100.0)]}, 'Concentration': {'type': 'float', 'data': [0.0, 0.0, 1.0, 1.0], 'range': [np.float64(0.0), np.float64(1.0)]}}
Outcomes:
{'Yield': {'type': 'float', 'data': [0.0005530843449776, 0.0005530843701466, 0.0005530843667414, 0.0005530843701476]}}
The ranges and types of the features are automatically determined from the data (see the data printed above). If you want to change the ranges or types, you can do so by editing the corresponding fields in the features dict or by providing a ranges dict. The ranges should be in the format {'feature_name': [min_value, max_value]}. If the ranges are not provided either in features or in ranges, they will be determined from the data.
ranges = {'Temperature': [-10, 100]}
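Equivalently, the same bounds can be set by editing the features dict in place (its structure matches the printout above); the snippet below is just a sketch of that alternative:

# Widen the Temperature bounds by editing the features dict directly
# (same effect as providing ranges={'Temperature': [-10, 100]})
features['Temperature']['range'] = [-10.0, 100.0]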
Now, let’s create the BOExperiment object. This object contains the data, the features, and the model. The model is a Gaussian Process with a Matern kernel.
bo = BOExperiment(
    features=features,
    outcomes=outcomes,
    # ranges=ranges,
    N = 1, # number of new points to generate
    maximize=True, # we want to maximize the response
    outcome_constraints=None,
    fixed_features=None, # fixed features are not used here
    # but they can be added as fixed_features = {'Temperature': 50, 'Concentration': 0.5}
    feature_constraints=None, # feature constraints are not used here
    # but they can be added as
    # feature_constraints = ['Concentration + Temperature <= 200']
    optim = 'sobol', # sobol is used to randomly generate the new points
    # to actually optimize, use optim = 'bo'
)
bo
BOExperiment(
N=1,
maximize={'Yield': True},
outcome_constraints=None,
feature_constraints=None,
optim=sobol
)
Input data:
Temperature Concentration Yield
0 0.0 0.0 0.000553
1 100.0 0.0 0.000553
2 0.0 1.0 0.000553
3 100.0 1.0 0.000553
new_points = bo.suggest_next_trials()
print(f"New points to sample:\n{new_points}")
fig = bo.plot_model()
fig.show()
New points to sample:
Temperature Concentration Predicted_Yield
0 35.034025 0.139437 0.000553
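In a real campaign, each suggestion is followed by an actual measurement and a call to update_experiment, as in this one-cycle sketch (here the "measurement" is just the simulated experimental_data function defined above):

new = bo.suggest_next_trials(with_predicted=False)
newT = new['Temperature'].values
newC = new['Concentration'].values
measured_yield = experimental_data(newT, newC)  # replace with your real measurement
bo.update_experiment(params={'Temperature': newT, 'Concentration': newC},
                     outcomes={'Yield': measured_yield})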
Now let’s do the optimization. We will first perform 6 random point generations, then we will use the Bayesian Optimization algorithm to find the maximum of the function. The algorithm will use the Gaussian Process model to predict the function value at each point and will use the expected improvement criterion to select the next point to evaluate.
for i in range(30): # let's do 30 iterations
    if i == 6:
        bo.optim = 'bo' # change to BO optimization after 6 iterations
    # simulate the new points
    new = bo.suggest_next_trials(with_predicted=False)
    newT = new['Temperature'].values
    newC = new['Concentration'].values
    # perform an experiment to measure the response at these points
    # here we just simulate the response using the experimental data function
    # in a real experiment, you would measure the response at these points
    # and add the new points to the experimental data
    measured_yield = experimental_data(newT, newC)
    # add the new points to the experimental data
    bo.update_experiment(params={'Temperature': newT, 'Concentration': newC},
                         outcomes={'Yield': measured_yield})
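For reference, the expected improvement criterion mentioned above has a closed form when the model posterior is Gaussian. The helper below is only an illustration of what the optimizer computes internally (OPTIMEO delegates this to its Ax/BoTorch backend); mu and sigma stand for the posterior mean and standard deviation of the model at a candidate point, and best_so_far is the best yield observed so far:

from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    """Analytic expected improvement for maximization: EI = E[max(f - best_so_far, 0)]."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)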
Now let's plot the model:
bo.plot_model()
print(f"Best parameters from BO:")
print(bo.get_best_parameters())
print(f"Expected best parameters:")
print(pd.DataFrame({'Temperature': max_temp, 'Concentration': max_conc, 'Yield':max_yield}, index=[0]))
Best parameters from BO:
Temperature Concentration Yield
0 61.220096 0.508693 0.819884
Expected best parameters:
Temperature Concentration Yield
0 61.322645 0.503006 0.820258
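As a quick sanity check, the two can also be compared programmatically; this is just a sketch, assuming get_best_parameters() returns a single-row DataFrame like the one printed above:

best = bo.get_best_parameters()
print(f"Temperature error: {abs(best['Temperature'].iloc[0] - max_temp):.3f}")
print(f"Concentration error: {abs(best['Concentration'].iloc[0] - max_conc):.4f}")
print(f"Yield gap: {max_yield - best['Yield'].iloc[0]:.5f}")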
And the convergence plot: we see here that the maximum was found after the 13th iteration. You can play with the number of random points to see how it affects the convergence.
bo.plot_optimization_trace(optimum=max_yield)
Two outcomes
First, we generate some data
Like before, we will generate some data. We will use the same function for the yield, and add another function for the price of the experiment: we will want to maximize the yield and minimize the price. The price is a function of the temperature and concentration, but it is not a function of the yield.
from plotly.subplots import make_subplots

def price(temp, conc):
    """
    This function simulates the price of the experiment.
    """
    out = (np.exp(-((temp - 45) ** 2)/2000)*np.exp(-((conc - .350) ** 2)/.08) - 1.2*np.exp(-((temp - 55) ** 2)/150)*np.exp(-((conc - .250) ** 2)/.05))*100 + 150
    return out

def generate_data(N=100):
    temp = np.linspace(0, 100, N)
    conc = np.linspace(0, 1, N)
    data = np.meshgrid(temp, conc)
    temp, conc = data[0].flatten(), data[1].flatten()
    exp_data = experimental_data(temp, conc)
    price_data = price(temp, conc)
    df = pd.DataFrame({'Temperature': temp, 'Concentration': conc, 'Yield': exp_data, 'Price': price_data})
    return df

df = generate_data(500)
# Prepare data for the contour plots
unique_temps = np.unique(df['Temperature'])
unique_concs = np.unique(df['Concentration'])
Z = df.pivot(index='Concentration', columns='Temperature', values='Yield').values
ZZ = df.pivot(index='Concentration', columns='Temperature', values='Price').values

# find the maximum yield and its position
max_yield = np.max(Z)
max_pos = np.unravel_index(np.argmax(Z), Z.shape)
max_temp = unique_temps[max_pos[1]]
max_conc = unique_concs[max_pos[0]]

# min price
min_price = np.min(ZZ)
min_pos = np.unravel_index(np.argmin(ZZ), ZZ.shape)
minp_temp = unique_temps[min_pos[1]]
minp_conc = unique_concs[min_pos[0]]
# Create the subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Yield', 'Price'))

# Add the contour plot for Yield (Z)
fig.add_trace(
    go.Contour(
        z=Z,
        x=unique_temps,
        y=unique_concs,
        colorscale='Viridis',
        contours=dict(coloring='heatmap', size=0.1),
        hovertemplate='Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{z}<extra></extra>'
    ),
    row=1, col=1
)
# Add the contour plot for Price (ZZ)
fig.add_trace(
    go.Contour(
        z=ZZ,
        x=unique_temps,
        y=unique_concs,
        colorscale='Viridis',
        contours=dict(coloring='heatmap', size=0.1),
        hovertemplate='Temperature: %{x}<br>Concentration: %{y}<br>Price: %{z}<extra></extra>'
    ),
    row=1, col=2
)
# Update layout with legend at the top, below the title
fig.update_layout(
    xaxis_title='Temperature',
    yaxis_title='Concentration',
    xaxis2_title='Temperature',
    yaxis2_title='Concentration',
    legend=dict(
        x=0.5, # Center the legend horizontally
        y=1.05, # Place the legend just below the title
        orientation='h', # Horizontal orientation
        xanchor='center',
        yanchor='bottom'
    )
)
# Add a marker for the maximum yield
fig.add_trace(
    go.Scatter(
        x=[max_temp],
        y=[max_conc],
        mode='markers',
        marker=dict(size=10, color='red', symbol='x'),
        name='Max Yield',
        customdata=[[max_yield]], # Add max_yield as custom data
        hovertemplate='Max Yield:<br>Temperature: %{x}<br>Concentration: %{y}<br>Yield: %{customdata[0]}<extra></extra>'
    ),
    row=1, col=1
)
# Add a marker for the minimum price
fig.add_trace(
    go.Scatter(
        x=[minp_temp],
        y=[minp_conc],
        mode='markers',
        marker=dict(size=10, color='orange', symbol='x'),
        name='Min Price',
        customdata=[[min_price]], # Add min_price as custom data
        hovertemplate='Min Price:<br>Temperature: %{x}<br>Concentration: %{y}<br>Price: %{customdata[0]}<extra></extra>'
    ),
    row=1, col=2
)
# Create some sample data
df_sample = generate_data(2)

# Add the experimental data on the plot for Yield
fig.add_trace(
    go.Scatter(
        x=df_sample['Temperature'],
        y=df_sample['Concentration'],
        mode='markers',
        marker=dict(size=10, color='white', symbol='circle',
                    line=dict(width=2, color='black')),
        name='Experimental Measurements (Yield)',
    ),
    row=1, col=1
)

# Add the experimental data on the plot for Price
fig.add_trace(
    go.Scatter(
        x=df_sample['Temperature'],
        y=df_sample['Concentration'],
        mode='markers',
        marker=dict(size=10, color='white', symbol='circle',
                    line=dict(width=2, color='black')),
        name='Experimental Measurements (Price)',
    ),
    row=1, col=2
)

fig.show()

# Save the experimental data to a CSV file
df_sample.to_csv('experimental_data2.csv', index=False)
Let’s now read the experimental data and create our BOExperiment object. The data is in the same format as before, but we have two outcomes: yield and price. The features are the same as before, but we have to specify that we have two outcomes. We also have to specify the type of optimization: we want to maximize the yield and minimize the price.
features, outcomes = read_experimental_data('experimental_data2.csv', out_pos=[-2,-1])
print(f"- Features:\n{features}")
print(f"- Outcomes:\n{outcomes}")
- Features:
{'Temperature': {'type': 'float', 'data': [0.0, 100.0, 0.0, 100.0], 'range': [np.float64(0.0), np.float64(100.0)]}, 'Concentration': {'type': 'float', 'data': [0.0, 0.0, 1.0, 1.0], 'range': [np.float64(0.0), np.float64(1.0)]}}
- Outcomes:
{'Yield': {'type': 'float', 'data': [0.0005530843449776, 0.0005530843701466, 0.0005530843667414, 0.0005530843701476]}, 'Price': {'type': 'float', 'data': [157.85712040284733, 154.76553732340065, 150.18478176220222, 150.11207580199311]}}
bo = BOExperiment(
    features=features,
    outcomes=outcomes,
    N = 1, # number of new points to generate
    maximize={'Yield': True, 'Price': False}, # maximize the yield, minimize the price
    outcome_constraints=None,
    fixed_features=None, # fixed features are not used here
    # but they can be added as fixed_features = {'Temperature': 50, 'Concentration': 0.5}
    feature_constraints=None, # feature constraints are not used here
    # but they can be added as
    # feature_constraints = ['Concentration + Temperature <= 200']
    optim = 'sobol', # sobol is used to randomly generate the new points
)
Let’s do the optimization:
for i in range(50): # let's do 50 iterations
    if i == 6:
        bo.optim = 'bo' # change to BO optimization after 6 iterations
    # simulate the new points
    new = bo.suggest_next_trials(with_predicted=False)
    newT = new['Temperature'].values
    newC = new['Concentration'].values
    # perform an experiment to measure the response at these points
    # here we just simulate the response using the experimental data function
    # in a real experiment, you would measure the response at these points
    # and add the new points to the experimental data
    measured_yield = experimental_data(newT, newC)
    measured_price = price(newT, newC)
    # add the new points to the experimental data
    bo.update_experiment(params={'Temperature': newT, 'Concentration': newC},
                         outcomes={'Yield': measured_yield, 'Price': measured_price})
Now let’s plot the model:
figs = [bo.plot_model(metricname=mname) for mname in bo.out_names]
for fig in figs:
    fig.show()
The get_best_parameters() function returns the best parameters found so far. In the case of multiple outcomes, it returns a set of candidate points rather than a single optimum.
bo.get_best_parameters()
[INFO 04-17 10:14:59] ax.service.utils.best_point: Using inferred objective thresholds: [ObjectiveThreshold(Price <= 197.16387577527303), ObjectiveThreshold(Yield >= 0.010359901918938652)], as objective thresholds were not specified as part of the optimization configuration on the experiment.
| | Temperature | Concentration | Price | Yield |
| --- | --- | --- | --- | --- |
0 | 57.897013 | 0.331592 | 142.280413 | 0.404607 |
1 | 58.694735 | 0.339458 | 147.550978 | 0.445939 |
2 | 58.301005 | 0.356022 | 152.340603 | 0.488407 |
3 | 58.303991 | 0.314050 | 137.284781 | 0.361327 |
4 | 58.747299 | 0.366543 | 157.370179 | 0.530613 |
5 | 57.562162 | 0.305640 | 132.196824 | 0.321728 |
6 | 58.705903 | 0.381138 | 162.289523 | 0.573660 |
7 | 58.533775 | 0.283291 | 128.347919 | 0.280069 |
8 | 58.930060 | 0.394457 | 167.213936 | 0.617058 |
9 | 57.289987 | 0.285081 | 124.921503 | 0.261351 |
10 | 57.088387 | 0.274983 | 121.542106 | 0.232607 |
11 | 59.193792 | 0.408026 | 171.929492 | 0.659473 |
12 | 56.969153 | 0.263560 | 118.304074 | 0.204246 |
13 | 57.678023 | 0.434807 | 176.539257 | 0.687853 |
14 | 57.285682 | 0.247100 | 115.410368 | 0.175324 |
15 | 59.669112 | 0.437953 | 180.357960 | 0.737872 |
16 | 56.546327 | 0.237377 | 112.127660 | 0.145017 |
17 | 56.255879 | 0.220583 | 109.423382 | 0.113278 |
18 | 59.920463 | 0.466747 | 185.605804 | 0.787920 |
19 | 55.874548 | 0.202481 | 107.654696 | 0.083816 |
20 | 61.063312 | 0.488886 | 189.026678 | 0.818379 |
You can also plot the Pareto frontier, and then decide what is the best compromise you are prepared to make between the two outcomes. The Pareto frontier is the set of points that are not dominated by any other point. A point is dominated if there is another point that is better in both outcomes.
Here, you basically see the trade-off you face: either high yield at a high price, or lower yield at a lower price.
bo.plot_pareto_frontier()
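If you prefer numbers to the plot, the dominance rule above can be applied directly to the candidate set returned by get_best_parameters(). The sketch below assumes that call returns a pandas DataFrame with 'Yield' and 'Price' columns, as in the table printed earlier; the price budget of 150 is an arbitrary example of a compromise you might choose.

best = bo.get_best_parameters()

def non_dominated(df):
    """Keep only points for which no other point has both a higher yield and a lower price."""
    keep = []
    for i, row in df.iterrows():
        dominated = ((df['Yield'] >= row['Yield']) & (df['Price'] <= row['Price']) &
                     ((df['Yield'] > row['Yield']) | (df['Price'] < row['Price']))).any()
        if not dominated:
            keep.append(i)
    return df.loc[keep]

pareto = non_dominated(best)

# Example compromise: the highest yield among points below a price budget of 150
affordable = pareto[pareto['Price'] <= 150]
print(affordable.sort_values('Yield', ascending=False).head(1))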