optimeo.bo
This module provides a class for optimizing experiments using Bayesian Optimization (BO) with the Ax platform. It includes methods for initializing the experiment, suggesting trials, predicting outcomes, and plotting results.
You can see an example notebook here.
1# Copyright (c) 2025 Colin BOUSIGE 2# Contact: colin.bousige@cnrs.fr 3# 4# This program is free software: you can redistribute it and/or modify 5# it under the terms of the MIT License as published by 6# the Free Software Foundation, either version 3 of the License, or 7# any later version. 8 9""" 10This module provides a class for optimizing experiments using Bayesian Optimization (BO) with the [Ax platform](https://ax.dev/). 11It includes methods for initializing the experiment, suggesting trials, predicting outcomes, and plotting results. 12 13You can see an example notebook [here](../examples/bo.ipynb). 14 15""" 16 17import warnings 18warnings.simplefilter(action='ignore', category=FutureWarning) 19warnings.simplefilter(action='ignore', category=DeprecationWarning) 20warnings.simplefilter(action='ignore', category=UserWarning) 21warnings.simplefilter(action='ignore', category=RuntimeError) 22 23import numpy as np 24import pandas as pd 25import random 26from janitor import clean_names 27from typing import Any, Dict, List, Optional, Union, Tuple 28 29from ax.core.observation import ObservationFeatures, TrialStatus 30from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy 31from ax.modelbridge.modelbridge_utils import get_pending_observation_features 32from ax.modelbridge.registry import Models 33from ax.plot.contour import interact_contour, plot_contour 34from ax.plot.pareto_frontier import plot_pareto_frontier 35from ax.plot.pareto_utils import compute_posterior_pareto_frontier 36from ax.plot.slice import plot_slice 37from ax.service.ax_client import AxClient, ObjectiveProperties 38from botorch.acquisition.analytic import * 39import plotly.graph_objects as go 40import plotly.express as px 41import re 42import matplotlib.cm as cm 43 44# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 45 46class BOExperiment: 47 """ 48 BOExperiment is a class designed to facilitate Bayesian Optimization experiments using the [Ax platform](https://ax.dev/). 49 It encapsulates the experiment setup, including features, outcomes, constraints, and optimization methods. 50 51 Parameters 52 ---------- 53 features: Dict[str, Dict[str, Any]] 54 A dictionary defining the features of the experiment, including their types and ranges. 55 Each feature is represented as a dictionary with keys 'type', 'data', and 'range'. 56 - 'type': The type of the feature (e.g., 'int', 'float', 'text'). 57 - 'data': The observed data for the feature. 58 - 'range': The range of values for the feature. 59 outcomes: Dict[str, Dict[str, Any]] 60 A dictionary defining the outcomes of the experiment, including their types and observed data. 61 Each outcome is represented as a dictionary with keys 'type' and 'data'. 62 - 'type': The type of the outcome (e.g., 'int', 'float'). 63 - 'data': The observed data for the outcome. 64 ranges: Optional[Dict[str, Dict[str, Any]]] 65 A dictionary defining the ranges of the features. Default is `None`. 66 If not provided, the ranges will be inferred from the features data. 67 The ranges should be in the format `{'feature_name': [minvalue,maxvalue]}`. 68 N: int 69 The number of trials to suggest in each optimization step. Must be a positive integer. 70 maximize: Union[bool, Dict[str, bool]] 71 A boolean or dict indicating whether to maximize the outcomes in the form `{'outcome1':True, 'outcome2':False}`. 72 If a single boolean is provided, it is applied to all outcomes. Default is `True`. 73 fixed_features: Optional[Dict[str, Any]] 74 A dictionary defining fixed features with their values. Default is `None`. 75 If provided, the fixed features will be treated as fixed parameters in the generation process. 76 The fixed features should be in the format `{'feature_name': value}`. 77 The values should be the fixed values for the respective features. 78 outcome_constraints: Optional[List[str]] 79 Constraints on the outcomes, specified as a list of strings. Default is `None`. 80 The constraints should be in the format `{'outcome_name': [minvalue,maxvalue]}`. 81 feature_constraints: Optional[List[str]] 82 Constraints on the features, specified as a list of strings. Default is `None`. 83 The constraints should be in the format `{'feature_name': [minvalue,maxvalue]}`. 84 optim: str 85 The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. Default is 'bo'. 86 acq_func: Optional[Dict[str, Any]] 87 The acquisition function to use for the optimization process. It must be a dict with 2 keys: 88 - `acqf`: the acquisition function class to use (e.g., `UpperConfidenceBound`), 89 - `acqf_kwargs`: a dict of the kwargs to pass to the acquisition function class. (e.g. `{'beta': 0.1}`). 90 91 If not provided, the default acquisition function is used (`LogExpectedImprovement` or `qLogExpectedImprovement` if N>1). 92 93 Attributes 94 ---------- 95 96 features: Dict[str, Dict[str, Any]] 97 A dictionary defining the features of the experiment, including their types and ranges. 98 outcomes: Dict[str, Dict[str, Any]] 99 A dictionary defining the outcomes of the experiment, including their types and observed data. 100 N: int 101 The number of trials to suggest in each optimization step. Must be a positive integer. 102 maximize: Union[bool, List[bool]] 103 A boolean or list of booleans indicating whether to maximize the outcomes. 104 If a single boolean is provided, it is applied to all outcomes. 105 outcome_constraints: Optional[Dict[str, Dict[str, float]]] 106 Constraints on the outcomes, specified as a dictionary or list of dictionaries. 107 feature_constraints: Optional[List[Dict[str, Any]]] 108 Constraints on the features, specified as a list of dictionaries. 109 optim: str 110 The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. 111 data: pd.DataFrame 112 A DataFrame representing the current data in the experiment, including features and outcomes. 113 acq_func: dict 114 The acquisition function to use for the optimization process. 115 generator_run: 116 The generator run for the experiment, used to generate new candidates. 117 model: 118 The model used for predictions in the experiment. 119 ax_client: 120 The AxClient for the experiment, used to manage trials and data. 121 gs: 122 The generation strategy for the experiment, used to generate new candidates. 123 parameters: 124 The parameters for the experiment, including their types and ranges. 125 names: 126 The names of the features in the experiment. 127 fixed_features: 128 The fixed features for the experiment, used to generate new candidates. 129 candidate: 130 The candidate(s) suggested by the optimization process. 131 132 133 Methods 134 ------- 135 136 - <b>initialize_ax_client()</b>: 137 Initializes the AxClient with the experiment's parameters, objectives, and constraints. 138 - <b>suggest_next_trials()</b>: 139 Suggests the next set of trials based on the current model and optimization strategy. 140 Returns a DataFrame containing the suggested trials and their predicted outcomes. 141 - <b>predict(params: List[Dict[str, Any]]) -> List[Dict[str, float]]</b>: 142 Predicts the outcomes for a given set of parameters using the current model. 143 Returns a list of predicted outcomes for the given parameters. 144 - <b>update_experiment(params: Dict[str, Any], outcomes: Dict[str, Any])</b>: 145 Updates the experiment with new parameters and outcomes, and reinitializes the AxClient. 146 - <b>plot_model(metricname: Optional[str] = None, slice_values: Optional[Dict[str, Any]] None, linear: bool = False)`</b>: 147 Plots the model's predictions for the experiment's parameters and outcomes. 148 If metricname is None, the first outcome metric is used. 149 If slice_values is provided, it slices the plot at those values. 150 If linear is True, it plots a linear slice plot. 151 If the experiment has only one feature, it plots a slice plot. 152 If the experiment has multiple features, it plots a contour plot. 153 Returns a Plotly figure of the model's predictions. 154 - <b>plot_optimization_trace(optimum: Optional[float] = None)</b>: 155 Plots the optimization trace, showing the progress of the optimization over trials. 156 If the experiment has multiple outcomes, it raises a warning and returns None. 157 Returns a Plotly figure of the optimization trace. 158 - <b>plot_pareto_frontier()</b>: 159 Plots the Pareto frontier for multi-objective optimization experiments. 160 If the experiment has only one outcome, it raises a warning and returns None. 161 Returns a Plotly figure of the Pareto frontier. 162 - <b>get_best_parameters() -> pd.DataFrame</b>: 163 Returns the best parameters found by the optimization process. 164 If the experiment has multiple outcomes, it returns a DataFrame of the Pareto optimal parameters. 165 If the experiment has only one outcome, it returns a DataFrame of the best parameters and their outcomes. 166 The DataFrame contains the best parameters and their corresponding outcomes. 167 - <b>clear_trials()</b>: 168 Clears all trials in the experiment. 169 This is useful for resetting the experiment before suggesting new trials. 170 - <b>set_model()</b>: 171 Sets the model to be used for predictions. 172 This method is called after initializing the AxClient. 173 - <b>set_gs()</b>: 174 Sets the generation strategy for the experiment. 175 This method is called after initializing the AxClient. 176 177 178 Example 179 ------- 180 ```python 181 features, outcomes = read_experimental_data('data.csv', out_pos=[-2, -1]) 182 experiment = BOExperiment(features, 183 outcomes, 184 N=5, 185 maximize={'out1':True, 'out2':False} 186 ) 187 experiment.suggest_next_trials() 188 experiment.plot_model(metricname='outcome1') 189 experiment.plot_model(metricname='outcome2', linear=True) 190 experiment.plot_model(metricname='outcome1', slice_values={'feature1': 5}) 191 experiment.plot_optimization_trace() 192 experiment.plot_pareto_frontier() 193 experiment.get_best_parameters() 194 experiment.update_experiment({'feature1': [4]}, {'outcome1': [0.4]}) 195 experiment.plot_model() 196 experiment.plot_optimization_trace() 197 experiment.plot_pareto_frontier() 198 experiment.get_best_parameters() 199 ``` 200 """ 201 202 def __init__(self, 203 features: Dict[str, Dict[str, Any]], 204 outcomes: Dict[str, Dict[str, Any]], 205 ranges: Optional[Dict[str, Dict[str, Any]]] = None, 206 N=1, 207 maximize: Union[bool, Dict[str, bool]] = True, 208 fixed_features: Optional[Dict[str, Any]] = None, 209 outcome_constraints: Optional[List[str]] = None, 210 feature_constraints: Optional[List[str]] = None, 211 optim='bo', 212 acq_func=None, 213 seed=42) -> None: 214 self._first_initialization_done = False 215 self.ranges = ranges 216 self.features = features 217 self.names = list(self._features.keys()) 218 self.fixed_features = fixed_features 219 self.outcomes = outcomes 220 self.N = N 221 self.maximize = maximize 222 self.outcome_constraints = outcome_constraints 223 self.feature_constraints = feature_constraints 224 self.optim = optim 225 self.acq_func = acq_func 226 self.seed = seed 227 self.candidate = None 228 """The candidate(s) suggested by the optimization process.""" 229 self.ax_client = None 230 """Ax's client for the experiment.""" 231 self.model = None 232 """Ax's Gaussian Process model.""" 233 self.parameters = None 234 """Ax's parameters for the experiment.""" 235 self.generator_run = None 236 """Ax's generator run for the experiment.""" 237 self.gs = None 238 """Ax's generation strategy for the experiment.""" 239 self.initialize_ax_client() 240 self.Nmetrics = len(self.ax_client.objective_names) 241 """The number of metrics in the experiment.""" 242 self._first_initialization_done = True 243 """To indicate that the first initialization is done so that we don't call `initialize_ax_client()` again.""" 244 self.pareto_frontier = None 245 """The Pareto frontier for multi-objective optimization experiments.""" 246 247 @property 248 def seed(self) -> int: 249 """Random seed for reproducibility. Default is 42.""" 250 return self._seed 251 252 @seed.setter 253 def seed(self, value: int): 254 """Set the random seed.""" 255 if isinstance(value, int): 256 self._seed = value 257 else: 258 raise Warning("Seed must be an integer. Using default seed 42.") 259 self._seed = 42 260 random.seed(self.seed) 261 np.random.seed(self.seed) 262 263 @property 264 def features(self): 265 """ 266 A dictionary defining the features of the experiment, including their types and ranges. 267 268 Example 269 ------- 270 ```python 271 features = { 272 'feature1': {'type': 'int', 273 'data': [1, 2, 3], 274 'range': [1, 3]}, 275 'feature2': {'type': 'float', 276 'data': [0.1, 0.2, 0.3], 277 'range': [0.1, 0.3]}, 278 'feature3': {'type': 'text', 279 'data': ['A', 'B', 'C'], 280 'range': ['A', 'B', 'C']} 281 } 282 ``` 283 """ 284 return self._features 285 286 @features.setter 287 def features(self, value): 288 """ 289 Set the features of the experiment with validation. 290 """ 291 if not isinstance(value, dict): 292 raise ValueError("features must be a dictionary") 293 self._features = value 294 for name in self._features.keys(): 295 if self.ranges and name in self.ranges.keys(): 296 self._features[name]['range'] = self.ranges[name] 297 else: 298 if self._features[name]['type'] == 'text': 299 self._features[name]['range'] = list(set(self._features[name]['data'])) 300 elif self._features[name]['type'] == 'int': 301 self._features[name]['range'] = [int(np.min(self._features[name]['data'])), 302 int(np.max(self._features[name]['data']))] 303 elif self._features[name]['type'] == 'float': 304 self._features[name]['range'] = [float(np.min(self._features[name]['data'])), 305 float(np.max(self._features[name]['data']))] 306 if self._first_initialization_done: 307 self.initialize_ax_client() 308 309 @property 310 def ranges(self): 311 """ 312 A dictionary defining the ranges of the features. Default is `None`. 313 314 If not provided, the ranges will be inferred from the features data. 315 The ranges should be in the format `{'feature_name': [minvalue,maxvalue]}`. 316 """ 317 return self._ranges 318 319 @ranges.setter 320 def ranges(self, value): 321 """ 322 Set the ranges of the features with validation. 323 """ 324 if value is not None: 325 if not isinstance(value, dict): 326 raise ValueError("ranges must be a dictionary") 327 self._ranges = value 328 329 @property 330 def names(self): 331 """ 332 The names of the features. 333 """ 334 return self._names 335 336 @names.setter 337 def names(self, value): 338 """ 339 Set the names of the features. 340 """ 341 if not isinstance(value, list): 342 raise ValueError("names must be a list") 343 self._names = value 344 345 @property 346 def outcomes(self): 347 """ 348 A dictionary defining the outcomes of the experiment, including their types and observed data. 349 350 Example 351 ------- 352 ```python 353 outcomes = { 354 'outcome1': {'type': 'float', 355 'data': [0.1, 0.2, 0.3]}, 356 'outcome2': {'type': 'float', 357 'data': [1.0, 2.0, 3.0]} 358 } 359 ``` 360 """ 361 return self._outcomes 362 363 @outcomes.setter 364 def outcomes(self, value): 365 """ 366 Set the outcomes of the experiment with validation. 367 """ 368 if not isinstance(value, dict): 369 raise ValueError("outcomes must be a dictionary") 370 self._outcomes = value 371 self.out_names = list(value.keys()) 372 if self._first_initialization_done: 373 self.initialize_ax_client() 374 375 @property 376 def fixed_features(self): 377 """ 378 A dictionary defining fixed features with their values. Default is `None`. 379 If provided, the fixed features will be treated as fixed parameters in the generation process. 380 The fixed features should be in the format `{'feature_name': value}`. 381 The values should be the fixed values for the respective features. 382 """ 383 return self._fixed_features 384 385 @fixed_features.setter 386 def fixed_features(self, value): 387 """ 388 Set the fixed features of the experiment. 389 """ 390 self._fixed_features = None 391 if value is not None: 392 if not isinstance(value, dict): 393 raise ValueError("fixed_features must be a dictionary") 394 for name in value.keys(): 395 if name not in self.names: 396 raise ValueError(f"Fixed feature '{name}' not found in features") 397 # fixed_features should be an ObservationFeatures object 398 self._fixed_features = ObservationFeatures(parameters=value) 399 if self._first_initialization_done: 400 self.set_gs() 401 402 @property 403 def N(self): 404 """ 405 The number of trials to suggest in each optimization step. Must be a positive integer. Default is `1`. 406 """ 407 return self._N 408 409 @N.setter 410 def N(self, value): 411 """ 412 Set the number of trials to suggest in each optimization step with validation. 413 """ 414 if not isinstance(value, int) or value <= 0: 415 raise ValueError("N must be a positive integer") 416 self._N = value 417 if self._first_initialization_done: 418 self.set_gs() 419 420 @property 421 def maximize(self): 422 """ 423 A boolean or dict indicating whether to maximize the outcomes in the form `{'outcome1':True, 'outcome2':False}`. 424 If a single boolean is provided, it is applied to all outcomes. Default is `True`. 425 """ 426 return self._maximize 427 428 @maximize.setter 429 def maximize(self, value): 430 """ 431 Set the maximization setting for the outcomes with validation. 432 """ 433 if isinstance(value, bool): 434 self._maximize = {out: value for out in self.out_names} 435 elif isinstance(value, dict) and len(value) == len(self._outcomes): 436 self._maximize = {k:v for k,v in value.items() if 437 (k in self.out_names and isinstance(v, bool))} 438 else: 439 raise ValueError("maximize must be a boolean or a list of booleans with the same length as outcomes") 440 if self._first_initialization_done: 441 self.initialize_ax_client() 442 443 @property 444 def outcome_constraints(self): 445 """ 446 Constraints on the outcomes, specified as a list of strings. Default is `None`. 447 """ 448 return self._outcome_constraints 449 450 @outcome_constraints.setter 451 def outcome_constraints(self, value): 452 """ 453 Set the outcome constraints of the experiment with validation. 454 """ 455 if isinstance(value, str): 456 self._outcome_constraints = [value] 457 elif isinstance(value, list): 458 self._outcome_constraints = value 459 else: 460 self._outcome_constraints = None 461 if self._first_initialization_done: 462 self.initialize_ax_client() 463 464 @property 465 def feature_constraints(self): 466 """ 467 Constraints on the features, specified as a list of strings. Default is `None`. 468 469 Example 470 ------- 471 ```python 472 feature_constraints = [ 473 'feature1 <= 10.0', 474 'feature1 + 2*feature2 >= 3.0' 475 ] 476 ``` 477 """ 478 return self._feature_constraints 479 480 @feature_constraints.setter 481 def feature_constraints(self, value): 482 """ 483 Set the feature constraints of the experiment with validation. 484 """ 485 if isinstance(value, dict): 486 self._feature_constraints = [value] 487 elif isinstance(value, list): 488 self._feature_constraints = value 489 elif isinstance(value, str): 490 self._feature_constraints = [value] 491 else: 492 self._feature_constraints = None 493 if self._first_initialization_done: 494 self.initialize_ax_client() 495 496 @property 497 def optim(self): 498 """ 499 The optimization method to use, either `'bo'` for Bayesian Optimization or `'sobol'` for Sobol sequence. Default is `'bo'`. 500 """ 501 return self._optim 502 503 @optim.setter 504 def optim(self, value): 505 """ 506 Set the optimization method with validation. 507 """ 508 value = value.lower() 509 if value not in ['bo', 'sobol']: 510 raise ValueError("Optimization method must be either 'bo' or 'sobol'") 511 self._optim = value 512 if self._first_initialization_done: 513 self.set_gs() 514 515 @property 516 def data(self) -> pd.DataFrame: 517 """ 518 Returns a DataFrame of the current data in the experiment, including features and outcomes. 519 """ 520 feature_data = {name: info['data'] for name, info in self._features.items()} 521 outcome_data = {name: info['data'] for name, info in self._outcomes.items()} 522 data_dict = {**feature_data, **outcome_data} 523 return pd.DataFrame(data_dict) 524 525 @data.setter 526 def data(self, value: pd.DataFrame): 527 """ 528 Sets the features and outcomes data from a given DataFrame. 529 """ 530 if not isinstance(value, pd.DataFrame): 531 raise ValueError("Data must be a pandas DataFrame") 532 533 feature_columns = [col for col in value.columns if col in self._features] 534 outcome_columns = [col for col in value.columns if col in self._outcomes] 535 536 for col in feature_columns: 537 self._features[col]['data'] = value[col].tolist() 538 539 for col in outcome_columns: 540 self._outcomes[col]['data'] = value[col].tolist() 541 542 if self._first_initialization_done: 543 self.initialize_ax_client() 544 545 @property 546 def pareto_frontier(self): 547 """ 548 The Pareto frontier for multi-objective optimization experiments. 549 """ 550 return self._pareto_frontier 551 552 @pareto_frontier.setter 553 def pareto_frontier(self, value): 554 """ 555 Set the Pareto frontier of the experiment. 556 """ 557 self._pareto_frontier = value 558 559 560 @property 561 def acq_func(self): 562 """ 563 The acquisition function to use for the optimization process. It must be a dict with 2 keys: 564 - `acqf`: the acquisition function class to use (e.g., `UpperConfidenceBound`), 565 - `acqf_kwargs`: a dict of the kwargs to pass to the acquisition function class. (e.g. `{'beta': 0.1}`). 566 567 If not provided, the default acquisition function is used (`LogExpectedImprovement` or `qLogExpectedImprovement` if N>1). 568 569 Example 570 ------- 571 ```python 572 acq_func = { 573 'acqf': UpperConfidenceBound, 574 'acqf_kwargs': {'beta': 0.1} # lower value = exploitation, higher value = exploration 575 } 576 ``` 577 """ 578 return self._acq_func 579 580 @acq_func.setter 581 def acq_func(self, value): 582 """ 583 Set the acquisition function with validation. 584 """ 585 self._acq_func = value 586 if self._first_initialization_done: 587 self.set_gs() 588 589 def __repr__(self): 590 return self.__str__() 591 592 def __str__(self): 593 """ 594 Return a string representation of the BOExperiment instance. 595 """ 596 return f""" 597BOExperiment( 598 N={self.N}, 599 maximize={self.maximize}, 600 outcome_constraints={self.outcome_constraints}, 601 feature_constraints={self.feature_constraints}, 602 optim={self.optim} 603) 604 605Input data: 606 607{self.data} 608 """ 609 610 def initialize_ax_client(self): 611 """ 612 Initialize the AxClient with the experiment's parameters, objectives, and constraints. 613 """ 614 print('\n======== INITIALIZING MODEL ========\n') 615 self.ax_client = AxClient(verbose_logging=False, 616 suppress_storage_errors=True) 617 self.parameters = [] 618 for name, info in self._features.items(): 619 if info['type'] == 'text': 620 self.parameters.append({ 621 "name": name, 622 "type": "choice", 623 "values": [str(val) for val in info['range']], 624 "value_type": "str"}) 625 elif info['type'] == 'int': 626 self.parameters.append({ 627 "name": name, 628 "type": "range", 629 "bounds": [int(np.min(info['range'])), 630 int(np.max(info['range']))], 631 "value_type": "int"}) 632 elif info['type'] == 'float': 633 self.parameters.append({ 634 "name": name, 635 "type": "range", 636 "bounds": [float(np.min(info['range'])), 637 float(np.max(info['range']))], 638 "value_type": "float"}) 639 640 self.ax_client.create_experiment( 641 name="bayesian_optimization", 642 parameters=self.parameters, 643 objectives={k: ObjectiveProperties(minimize=not v) 644 for k,v in self._maximize.items() 645 if isinstance(v, bool) and k in self._outcomes.keys()}, 646 parameter_constraints=self._feature_constraints, 647 outcome_constraints=self._outcome_constraints, 648 overwrite_existing_experiment=True 649 ) 650 651 if len(next(iter(self._outcomes.values()))['data']) > 0: 652 for i in range(len(next(iter(self._outcomes.values()))['data'])): 653 params = {name: info['data'][i] for name, info in self._features.items()} 654 outcomes = {name: info['data'][i] for name, info in self._outcomes.items()} 655 self.ax_client.attach_trial(params) 656 self.ax_client.complete_trial(trial_index=i, raw_data=outcomes) 657 658 self.set_model() 659 self.set_gs() 660 661 def set_model(self): 662 """ 663 Set the model to be used for predictions. 664 This method is called after initializing the AxClient. 665 """ 666 self.model = Models.BOTORCH_MODULAR( 667 experiment=self.ax_client.experiment, 668 data=self.ax_client.experiment.fetch_data() 669 ) 670 671 def set_gs(self): 672 """ 673 Set the generation strategy for the experiment. 674 This method is called after initializing the AxClient. 675 """ 676 self.clear_trials() 677 if self._optim == 'bo': 678 if not self.model: 679 self.set_model() 680 if self.acq_func is None: 681 self.gs = GenerationStrategy( 682 steps=[GenerationStep( 683 model=Models.BOTORCH_MODULAR, 684 num_trials=-1, # No limitation on how many trials should be produced from this step 685 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 686 ) 687 ] 688 ) 689 else: 690 self.gs = GenerationStrategy( 691 steps=[GenerationStep( 692 model=Models.BOTORCH_MODULAR, 693 num_trials=-1, # No limitation on how many trials should be produced from this step 694 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 695 model_configs={"botorch_model_class": self.acq_func['acqf']}, 696 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 697 model_gen_options={"acquisition_options": self.acq_func['acqf_kwargs']} 698 ) 699 ] 700 ) 701 elif self._optim == 'sobol': 702 self.gs = GenerationStrategy( 703 steps=[GenerationStep( 704 model=Models.SOBOL, 705 num_trials=-1, # How many trials should be produced from this generation step 706 should_deduplicate=True, # Deduplicate the trials 707 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 708 model_gen_kwargs={}, # Any kwargs you want passed to `modelbridge.gen` 709 ) 710 ] 711 ) 712 self.generator_run = self.gs.gen( 713 experiment=self.ax_client.experiment, # Ax `Experiment`, for which to generate new candidates 714 data=None, # Ax `Data` to use for model training, optional. 715 n=self._N, # Number of candidate arms to produce 716 fixed_features=self._fixed_features, 717 pending_observations=get_pending_observation_features( 718 self.ax_client.experiment 719 ), # Points that should not be re-generated 720 ) 721 722 def clear_trials(self): 723 """ 724 Clear all trials in the experiment. 725 """ 726 # Get all pending trial indices 727 pending_trials = [k for k,i in self.ax_client.experiment.trials.items() 728 if i.status==TrialStatus.CANDIDATE] 729 for i in pending_trials: 730 self.ax_client.experiment.trials[i].mark_abandoned() 731 732 def suggest_next_trials(self, with_predicted=True): 733 """ 734 Suggest the next set of trials based on the current model and optimization strategy. 735 736 Returns 737 ------- 738 739 pd.DataFrame: 740 DataFrame containing the suggested trials and their predicted outcomes. 741 """ 742 self.clear_trials() 743 if self.ax_client is None: 744 self.initialize_ax_client() 745 if self._N == 1: 746 self.candidate = self.ax_client.experiment.new_trial(self.generator_run) 747 else: 748 self.candidate = self.ax_client.experiment.new_batch_trial(self.generator_run) 749 trials = self.ax_client.get_trials_data_frame() 750 trials = trials[trials['trial_status'] == 'CANDIDATE'] 751 trials = trials[[name for name in self.names]] 752 if with_predicted: 753 topred = [trials.iloc[i].to_dict() for i in range(len(trials))] 754 preds = self.predict(topred)[0] 755 preds = pd.DataFrame(preds) 756 # add 'predicted_' to the names of the pred dataframe 757 preds.columns = [f'Predicted_{col}' for col in preds.columns] 758 preds = preds.reset_index(drop=True) 759 trials = trials.reset_index(drop=True) 760 return pd.concat([trials, preds], axis=1) 761 else: 762 return trials 763 764 def predict(self, params): 765 """ 766 Predict the outcomes for a given set of parameters using the current model. 767 768 Parameters 769 ---------- 770 771 params : List[Dict[str, Any]] 772 List of parameter dictionaries for which to predict outcomes. 773 774 Returns 775 ------- 776 777 List[Dict[str, float]]: 778 List of predicted outcomes for the given parameters. 779 """ 780 if self.ax_client is None: 781 self.initialize_ax_client() 782 obs_feats = [ObservationFeatures(parameters=p) for p in params] 783 f, cm = self.model.predict(obs_feats) 784 # return prediction and std errors as a list of dictionaries 785 # Convert to list of dictionaries 786 predictions = [] 787 for i in range(len(obs_feats)): 788 pred_dict = {} 789 for metric_name in f.keys(): 790 pred_dict[metric_name] = { 791 'mean': f[metric_name][i], 792 'std': np.sqrt(cm[metric_name][metric_name][i]) 793 } 794 predictions.append(pred_dict) 795 preds = [{k: v['mean'] for k, v in pred.items()} for pred in predictions] 796 stderrs = [{k: v['std'] for k, v in pred.items()} for pred in predictions] 797 return preds, stderrs 798 799 def update_experiment(self, params, outcomes): 800 """ 801 Update the experiment with new parameters and outcomes, and reinitialize the AxClient. 802 803 Parameters 804 ---------- 805 806 params : Dict[str, Any] 807 Dictionary of new parameters to update the experiment with. 808 809 outcomes : Dict[str, Any] 810 Dictionary of new outcomes to update the experiment with. 811 """ 812 # append new data to the features and outcomes dictionaries 813 for k, v in zip(params.keys(), params.values()): 814 if k not in self._features: 815 raise ValueError(f"Parameter '{k}' not found in features") 816 if isinstance(v, np.ndarray): 817 v = v.tolist() 818 if not isinstance(v, list): 819 v = [v] 820 self._features[k]['data'] += v 821 for k, v in zip(outcomes.keys(), outcomes.values()): 822 if k not in self._outcomes: 823 raise ValueError(f"Outcome '{k}' not found in outcomes") 824 if isinstance(v, np.ndarray): 825 v = v.tolist() 826 if not isinstance(v, list): 827 v = [v] 828 self._outcomes[k]['data'] += v 829 self.initialize_ax_client() 830 831 def plot_model(self, metricname=None, slice_values={}, linear=False): 832 """ 833 Plot the model's predictions for the experiment's parameters and outcomes. 834 Parameters 835 ---------- 836 metricname : Optional[str] 837 The name of the metric to plot. If None, the first outcome metric is used. 838 slice_values : Optional[Dict[str, Any]] 839 Dictionary of slice values for plotting. 840 linear : bool 841 Whether to plot a linear slice plot. Default is False. 842 Returns 843 ------- 844 plotly.graph_objects.Figure: 845 Plotly figure of the model's predictions. 846 """ 847 if self.ax_client is None: 848 self.initialize_ax_client() 849 self.suggest_next_trials() 850 cand_name = 'Candidate' if self._N == 1 else 'Candidates' 851 mname = self.ax_client.objective_names[0] if metricname is None else metricname 852 param_name = [name for name in self.names if name not in slice_values.keys()] 853 par_numeric = [name for name in param_name if self._features[name]['type'] in ['int', 'float']] 854 855 if len(par_numeric) == 1: 856 fig = plot_slice( 857 model=self.model, 858 metric_name=mname, 859 density=100, 860 param_name=par_numeric[0], 861 generator_runs_dict={cand_name: self.generator_run}, 862 slice_values=slice_values 863 ) 864 elif len(par_numeric) == 2: 865 fig = plot_contour( 866 model=self.model, 867 metric_name=mname, 868 param_x=par_numeric[0], 869 param_y=par_numeric[1], 870 generator_runs_dict={cand_name: self.generator_run}, 871 slice_values=slice_values 872 ) 873 else: 874 # remove sliced parameters from par_numeric 875 pars = [p for p in par_numeric if p not in slice_values.keys()] 876 fig = interact_contour( 877 model=self.model, 878 generator_runs_dict={cand_name: self.generator_run}, 879 metric_name=mname, 880 slice_values=slice_values, 881 parameters_to_use=pars 882 ) 883 884 plotly_fig = go.Figure(fig.data) 885 all_trials = self.ax_client.get_trials_data_frame() 886 completed_trials = all_trials[all_trials['trial_status'] != 'CANDIDATE'] 887 # compute distance to slice 888 col_to_consider = completed_trials[[k for k in slice_values.keys()]] 889 completed_trials['signed_dist_to_slice'] = ( 890 (col_to_consider - slice_values).sum(axis=1) # Sum of signed differences 891 ) 892 signed_dists = completed_trials['signed_dist_to_slice'].values 893 positive_dists = signed_dists[signed_dists >= 0] 894 negative_dists = signed_dists[signed_dists < 0] 895 896 # Normalize positive distances to [0, 1] 897 if len(positive_dists) > 0 and np.max(positive_dists) > 0: 898 normalized_positive = positive_dists / np.max(positive_dists) 899 else: 900 normalized_positive = np.zeros_like(positive_dists) 901 902 # Normalize negative distances to [-1, 0] 903 if len(negative_dists) > 0 and np.min(negative_dists) < 0: 904 normalized_negative = negative_dists / np.abs(np.min(negative_dists)) 905 else: 906 normalized_negative = np.zeros_like(negative_dists) 907 908 # Combine the normalized distances 909 normalized_signed_dists = np.zeros_like(signed_dists) 910 normalized_signed_dists[signed_dists >= 0] = normalized_positive 911 normalized_signed_dists[signed_dists < 0] = normalized_negative 912 913 completed_trials['normalized_signed_dist'] = normalized_signed_dists 914 coolwarm = cm.get_cmap('bwr') 915 normalized_values = (completed_trials['normalized_signed_dist'] + 1) / 2 # Map from [-1,1] to [0,1] 916 colors = [ 917 f"rgb({int(r*255)}, {int(g*255)}, {int(b*255)})" 918 for r, g, b, _ in coolwarm(normalized_values) 919 ] 920 completed_trials['colors'] = colors 921 trials = self.ax_client.get_trials_data_frame() 922 trials = trials[trials['trial_status'] == 'CANDIDATE'] 923 trials = trials[[name for name in self.names]] 924 925 in_sample_trace_idx = 0 926 for trace in plotly_fig.data: 927 if trace.type == "contour": 928 trace.colorscale = "viridis" 929 if 'marker' in trace and trace.legendgroup != cand_name: 930 arm_names = [] 931 if trace['text']: 932 for text in trace['text']: 933 print(text) 934 match = re.search(r'Arm (\d+_\d+)', text) 935 if match: 936 arm_names.append(match.group(1)) 937 arm_to_color = dict(zip(completed_trials['arm_name'], completed_trials['colors'])) 938 trace.marker.color = [arm_to_color[arm] for arm in arm_names] 939 trace.marker.symbol = "circle" 940 trace.marker.size = 10 941 trace.marker.line.width = 2 942 trace.marker.line.color = 'black' 943 # if len(opacities) > 0: 944 # trace.marker.opacity = opacities 945 if trace.text is not None: 946 trace.text = [t.replace('Arm', '<b>Sample').replace("_0","</b>") for t in trace.text] 947 if trace.legendgroup == cand_name: 948 trace.marker.line.color = 'red' 949 trace.marker.color = "orange" 950 trace.name = cand_name 951 trace.marker.symbol = "x" 952 trace.marker.size = 12 953 trace.marker.opacity = 1 954 trace.hoverinfo = "text" 955 trace.hoverlabel = dict(bgcolor="#f8e3cd", font_color='black') 956 if trace.text is not None: 957 trace.text = [t.replace("<i>","").replace("</i>","") for t in trace.text] 958 trace.text = [ 959 f"<b>Candidate {i+1}</b><br>{'<br>'.join([f'{col}: {val}' for col, val in trials.iloc[i].items()])}" 960 for t in trace.text 961 for i in range(len(trials)) 962 ] 963 964 plotly_fig.update_layout( 965 plot_bgcolor="white", 966 legend=dict(bgcolor='rgba(0,0,0,0)'), 967 margin=dict(l=10, r=10, t=50, b=50), 968 xaxis=dict( 969 showgrid=True, 970 gridcolor="lightgray", 971 zeroline=False, 972 zerolinecolor="black", 973 showline=True, 974 linewidth=1, 975 linecolor="black", 976 mirror=True 977 ), 978 yaxis=dict( 979 showgrid=True, 980 gridcolor="lightgray", 981 zeroline=False, 982 zerolinecolor="black", 983 showline=True, 984 linewidth=1, 985 linecolor="black", 986 mirror=True 987 ), 988 xaxis2=dict( 989 showgrid=True, 990 gridcolor="lightgray", 991 zeroline=False, 992 zerolinecolor="black", 993 showline=True, 994 linewidth=1, 995 linecolor="black", 996 mirror=True 997 ), 998 yaxis2=dict( 999 showgrid=True, 1000 gridcolor="lightgray", 1001 zeroline=False, 1002 zerolinecolor="black", 1003 showline=True, 1004 linewidth=1, 1005 linecolor="black", 1006 mirror=True 1007 ), 1008 ) 1009 return plotly_fig 1010 1011 1012 def plot_optimization_trace(self, optimum=None): 1013 """ 1014 Plot the optimization trace, showing the progress of the optimization over trials. 1015 1016 Parameters 1017 ---------- 1018 1019 optimum : Optional[float] 1020 The optimal value to plot on the optimization trace. 1021 1022 Returns 1023 ------- 1024 1025 plotly.graph_objects.Figure: 1026 Plotly figure of the optimization trace. 1027 """ 1028 if self.ax_client is None: 1029 self.initialize_ax_client() 1030 if len(self._outcomes) > 1: 1031 print("Optimization trace is not available for multi-objective optimization.") 1032 return None 1033 fig = self.ax_client.get_optimization_trace(objective_optimum=optimum) 1034 fig = go.Figure(fig.data) 1035 for trace in fig.data: 1036 # add hover info 1037 trace.hoverinfo = "x+y" 1038 fig.update_layout( 1039 plot_bgcolor="white", # White background 1040 legend=dict(bgcolor='rgba(0,0,0,0)'), 1041 margin=dict(l=50, r=10, t=50, b=50), 1042 xaxis=dict( 1043 showgrid=True, # Enable grid 1044 gridcolor="lightgray", # Light gray grid lines 1045 zeroline=False, 1046 zerolinecolor="black", # Black zero line 1047 showline=True, 1048 linewidth=1, 1049 linecolor="black", # Black border 1050 mirror=True 1051 ), 1052 yaxis=dict( 1053 showgrid=True, # Enable grid 1054 gridcolor="lightgray", # Light gray grid lines 1055 zeroline=False, 1056 zerolinecolor="black", # Black zero line 1057 showline=True, 1058 linewidth=1, 1059 linecolor="black", # Black border 1060 mirror=True 1061 ), 1062 ) 1063 return fig 1064 1065 def compute_pareto_frontier(self): 1066 """ 1067 Compute the Pareto frontier for multi-objective optimization experiments. 1068 1069 Returns 1070 ------- 1071 The Pareto frontier. 1072 """ 1073 if self.ax_client is None: 1074 self.initialize_ax_client() 1075 if len(self._outcomes) < 2: 1076 print("Pareto frontier is not available for single-objective optimization.") 1077 return None 1078 1079 objectives = self.ax_client.experiment.optimization_config.objective.objectives 1080 self.pareto_frontier = compute_posterior_pareto_frontier( 1081 experiment=self.ax_client.experiment, 1082 data=self.ax_client.experiment.fetch_data(), 1083 primary_objective=objectives[1].metric, 1084 secondary_objective=objectives[0].metric, 1085 absolute_metrics=[o.metric_names[0] for o in objectives], 1086 num_points=20, 1087 ) 1088 return self.pareto_frontier 1089 1090 def plot_pareto_frontier(self, show_error_bars=True): 1091 """ 1092 Plot the Pareto frontier for multi-objective optimization experiments. 1093 1094 Parameters 1095 ---------- 1096 show_error_bars : bool, optional 1097 Whether to show error bars on the plot. Default is True. 1098 1099 Returns 1100 ------- 1101 plotly.graph_objects.Figure: 1102 Plotly figure of the Pareto frontier. 1103 """ 1104 if self.pareto_frontier is None: 1105 return None 1106 1107 fig = plot_pareto_frontier(self.pareto_frontier) 1108 fig = go.Figure(fig.data) 1109 1110 # Modify traces to show/hide error bars 1111 if not show_error_bars: 1112 for trace in fig.data: 1113 # Remove error bars by setting them to None 1114 if hasattr(trace, 'error_x') and trace.error_x is not None: 1115 trace.error_x = None 1116 if hasattr(trace, 'error_y') and trace.error_y is not None: 1117 trace.error_y = None 1118 1119 fig.update_layout( 1120 plot_bgcolor="white", # White background 1121 legend=dict(bgcolor='rgba(0,0,0,0)'), 1122 margin=dict(l=50, r=10, t=50, b=50), 1123 xaxis=dict( 1124 showgrid=True, # Enable grid 1125 gridcolor="lightgray", # Light gray grid lines 1126 zeroline=False, 1127 zerolinecolor="black", # Black zero line 1128 showline=True, 1129 linewidth=1, 1130 linecolor="black", # Black border 1131 mirror=True 1132 ), 1133 yaxis=dict( 1134 showgrid=True, # Enable grid 1135 gridcolor="lightgray", # Light gray grid lines 1136 zeroline=False, 1137 zerolinecolor="black", # Black zero line 1138 showline=True, 1139 linewidth=1, 1140 linecolor="black", # Black border 1141 mirror=True 1142 ), 1143 ) 1144 return fig 1145 1146 def get_best_parameters(self): 1147 """ 1148 Return the best parameters found by the optimization process. 1149 1150 Returns 1151 ------- 1152 1153 pd.DataFrame: 1154 DataFrame containing the best parameters and their outcomes. 1155 """ 1156 if self.ax_client is None: 1157 self.initialize_ax_client() 1158 if self.Nmetrics == 1: 1159 best_parameters = self.ax_client.get_best_parameters()[0] 1160 best_outcomes = self.ax_client.get_best_parameters()[1] 1161 best_parameters.update(best_outcomes[0]) 1162 best = pd.DataFrame(best_parameters, index=[0]) 1163 else: 1164 best_parameters = self.ax_client.get_pareto_optimal_parameters() 1165 best = ordered_dict_to_dataframe(best_parameters) 1166 return best 1167 1168# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 1169 1170def flatten_dict(d, parent_key="", sep="_"): 1171 """ 1172 Flatten a nested dictionary. 1173 """ 1174 items = [] 1175 for k, v in d.items(): 1176 new_key = f"{parent_key}{sep}{k}" if parent_key else k 1177 if isinstance(v, dict): 1178 items.extend(flatten_dict(v, new_key, sep=sep).items()) 1179 else: 1180 items.append((new_key, v)) 1181 return dict(items) 1182 1183# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 1184 1185def ordered_dict_to_dataframe(data): 1186 """ 1187 Convert an OrderedDict with arbitrary nesting to a DataFrame. 1188 """ 1189 dflat = flatten_dict(data) 1190 out = [] 1191 1192 for key, value in dflat.items(): 1193 main_dict = value[0] 1194 sub_dict = value[1][0] 1195 out.append([value for value in main_dict.values()] + 1196 [value for value in sub_dict.values()]) 1197 1198 df = pd.DataFrame(out, columns=[key for key in main_dict.keys()] + 1199 [key for key in sub_dict.keys()]) 1200 return df 1201 1202# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 1203 1204def read_experimental_data(file_path: str, out_pos=[-1]) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]]: 1205 """ 1206 Read experimental data from a CSV file and format it into features and outcomes dictionaries. 1207 1208 Parameters 1209 ---------- 1210 file_path (str) 1211 Path to the CSV file containing experimental data. 1212 out_pos (list of int) 1213 Column indices of the outcome variables. Default is the last column. 1214 1215 Returns 1216 ------- 1217 Tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]] 1218 Formatted features and outcomes dictionaries. 1219 """ 1220 data = pd.read_csv(file_path) 1221 data = clean_names(data, remove_special=True, case_type='preserve') 1222 outcome_column_name = data.columns[out_pos] 1223 features = data.loc[:, ~data.columns.isin(outcome_column_name)].copy() 1224 outcomes = data[outcome_column_name].copy() 1225 1226 feature_definitions = {} 1227 for column in features.columns: 1228 if features[column].dtype == 'object': 1229 unique_values = features[column].unique() 1230 feature_definitions[column] = {'type': 'text', 1231 'range': unique_values.tolist()} 1232 elif features[column].dtype in ['int64', 'float64']: 1233 min_val = features[column].min() 1234 max_val = features[column].max() 1235 feature_type = 'int' if features[column].dtype == 'int64' else 'float' 1236 feature_definitions[column] = {'type': feature_type, 1237 'range': [min_val, max_val]} 1238 1239 formatted_features = {name: {'type': info['type'], 1240 'data': features[name].tolist(), 1241 'range': info['range']} 1242 for name, info in feature_definitions.items()} 1243 # same for outcomes with just type and data 1244 outcome_definitions = {} 1245 for column in outcomes.columns: 1246 if outcomes[column].dtype == 'object': 1247 unique_values = outcomes[column].unique() 1248 outcome_definitions[column] = {'type': 'text', 1249 'data': unique_values.tolist()} 1250 elif outcomes[column].dtype in ['int64', 'float64']: 1251 min_val = outcomes[column].min() 1252 max_val = outcomes[column].max() 1253 outcome_type = 'int' if outcomes[column].dtype == 'int64' else 'float' 1254 outcome_definitions[column] = {'type': outcome_type, 1255 'data': outcomes[column].tolist()} 1256 formatted_outcomes = {name: {'type': info['type'], 1257 'data': outcomes[name].tolist()} 1258 for name, info in outcome_definitions.items()} 1259 return formatted_features, formatted_outcomes
47class BOExperiment: 48 """ 49 BOExperiment is a class designed to facilitate Bayesian Optimization experiments using the [Ax platform](https://ax.dev/). 50 It encapsulates the experiment setup, including features, outcomes, constraints, and optimization methods. 51 52 Parameters 53 ---------- 54 features: Dict[str, Dict[str, Any]] 55 A dictionary defining the features of the experiment, including their types and ranges. 56 Each feature is represented as a dictionary with keys 'type', 'data', and 'range'. 57 - 'type': The type of the feature (e.g., 'int', 'float', 'text'). 58 - 'data': The observed data for the feature. 59 - 'range': The range of values for the feature. 60 outcomes: Dict[str, Dict[str, Any]] 61 A dictionary defining the outcomes of the experiment, including their types and observed data. 62 Each outcome is represented as a dictionary with keys 'type' and 'data'. 63 - 'type': The type of the outcome (e.g., 'int', 'float'). 64 - 'data': The observed data for the outcome. 65 ranges: Optional[Dict[str, Dict[str, Any]]] 66 A dictionary defining the ranges of the features. Default is `None`. 67 If not provided, the ranges will be inferred from the features data. 68 The ranges should be in the format `{'feature_name': [minvalue,maxvalue]}`. 69 N: int 70 The number of trials to suggest in each optimization step. Must be a positive integer. 71 maximize: Union[bool, Dict[str, bool]] 72 A boolean or dict indicating whether to maximize the outcomes in the form `{'outcome1':True, 'outcome2':False}`. 73 If a single boolean is provided, it is applied to all outcomes. Default is `True`. 74 fixed_features: Optional[Dict[str, Any]] 75 A dictionary defining fixed features with their values. Default is `None`. 76 If provided, the fixed features will be treated as fixed parameters in the generation process. 77 The fixed features should be in the format `{'feature_name': value}`. 78 The values should be the fixed values for the respective features. 79 outcome_constraints: Optional[List[str]] 80 Constraints on the outcomes, specified as a list of strings. Default is `None`. 81 The constraints should be in the format `{'outcome_name': [minvalue,maxvalue]}`. 82 feature_constraints: Optional[List[str]] 83 Constraints on the features, specified as a list of strings. Default is `None`. 84 The constraints should be in the format `{'feature_name': [minvalue,maxvalue]}`. 85 optim: str 86 The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. Default is 'bo'. 87 acq_func: Optional[Dict[str, Any]] 88 The acquisition function to use for the optimization process. It must be a dict with 2 keys: 89 - `acqf`: the acquisition function class to use (e.g., `UpperConfidenceBound`), 90 - `acqf_kwargs`: a dict of the kwargs to pass to the acquisition function class. (e.g. `{'beta': 0.1}`). 91 92 If not provided, the default acquisition function is used (`LogExpectedImprovement` or `qLogExpectedImprovement` if N>1). 93 94 Attributes 95 ---------- 96 97 features: Dict[str, Dict[str, Any]] 98 A dictionary defining the features of the experiment, including their types and ranges. 99 outcomes: Dict[str, Dict[str, Any]] 100 A dictionary defining the outcomes of the experiment, including their types and observed data. 101 N: int 102 The number of trials to suggest in each optimization step. Must be a positive integer. 103 maximize: Union[bool, List[bool]] 104 A boolean or list of booleans indicating whether to maximize the outcomes. 105 If a single boolean is provided, it is applied to all outcomes. 106 outcome_constraints: Optional[Dict[str, Dict[str, float]]] 107 Constraints on the outcomes, specified as a dictionary or list of dictionaries. 108 feature_constraints: Optional[List[Dict[str, Any]]] 109 Constraints on the features, specified as a list of dictionaries. 110 optim: str 111 The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. 112 data: pd.DataFrame 113 A DataFrame representing the current data in the experiment, including features and outcomes. 114 acq_func: dict 115 The acquisition function to use for the optimization process. 116 generator_run: 117 The generator run for the experiment, used to generate new candidates. 118 model: 119 The model used for predictions in the experiment. 120 ax_client: 121 The AxClient for the experiment, used to manage trials and data. 122 gs: 123 The generation strategy for the experiment, used to generate new candidates. 124 parameters: 125 The parameters for the experiment, including their types and ranges. 126 names: 127 The names of the features in the experiment. 128 fixed_features: 129 The fixed features for the experiment, used to generate new candidates. 130 candidate: 131 The candidate(s) suggested by the optimization process. 132 133 134 Methods 135 ------- 136 137 - <b>initialize_ax_client()</b>: 138 Initializes the AxClient with the experiment's parameters, objectives, and constraints. 139 - <b>suggest_next_trials()</b>: 140 Suggests the next set of trials based on the current model and optimization strategy. 141 Returns a DataFrame containing the suggested trials and their predicted outcomes. 142 - <b>predict(params: List[Dict[str, Any]]) -> List[Dict[str, float]]</b>: 143 Predicts the outcomes for a given set of parameters using the current model. 144 Returns a list of predicted outcomes for the given parameters. 145 - <b>update_experiment(params: Dict[str, Any], outcomes: Dict[str, Any])</b>: 146 Updates the experiment with new parameters and outcomes, and reinitializes the AxClient. 147 - <b>plot_model(metricname: Optional[str] = None, slice_values: Optional[Dict[str, Any]] None, linear: bool = False)`</b>: 148 Plots the model's predictions for the experiment's parameters and outcomes. 149 If metricname is None, the first outcome metric is used. 150 If slice_values is provided, it slices the plot at those values. 151 If linear is True, it plots a linear slice plot. 152 If the experiment has only one feature, it plots a slice plot. 153 If the experiment has multiple features, it plots a contour plot. 154 Returns a Plotly figure of the model's predictions. 155 - <b>plot_optimization_trace(optimum: Optional[float] = None)</b>: 156 Plots the optimization trace, showing the progress of the optimization over trials. 157 If the experiment has multiple outcomes, it raises a warning and returns None. 158 Returns a Plotly figure of the optimization trace. 159 - <b>plot_pareto_frontier()</b>: 160 Plots the Pareto frontier for multi-objective optimization experiments. 161 If the experiment has only one outcome, it raises a warning and returns None. 162 Returns a Plotly figure of the Pareto frontier. 163 - <b>get_best_parameters() -> pd.DataFrame</b>: 164 Returns the best parameters found by the optimization process. 165 If the experiment has multiple outcomes, it returns a DataFrame of the Pareto optimal parameters. 166 If the experiment has only one outcome, it returns a DataFrame of the best parameters and their outcomes. 167 The DataFrame contains the best parameters and their corresponding outcomes. 168 - <b>clear_trials()</b>: 169 Clears all trials in the experiment. 170 This is useful for resetting the experiment before suggesting new trials. 171 - <b>set_model()</b>: 172 Sets the model to be used for predictions. 173 This method is called after initializing the AxClient. 174 - <b>set_gs()</b>: 175 Sets the generation strategy for the experiment. 176 This method is called after initializing the AxClient. 177 178 179 Example 180 ------- 181 ```python 182 features, outcomes = read_experimental_data('data.csv', out_pos=[-2, -1]) 183 experiment = BOExperiment(features, 184 outcomes, 185 N=5, 186 maximize={'out1':True, 'out2':False} 187 ) 188 experiment.suggest_next_trials() 189 experiment.plot_model(metricname='outcome1') 190 experiment.plot_model(metricname='outcome2', linear=True) 191 experiment.plot_model(metricname='outcome1', slice_values={'feature1': 5}) 192 experiment.plot_optimization_trace() 193 experiment.plot_pareto_frontier() 194 experiment.get_best_parameters() 195 experiment.update_experiment({'feature1': [4]}, {'outcome1': [0.4]}) 196 experiment.plot_model() 197 experiment.plot_optimization_trace() 198 experiment.plot_pareto_frontier() 199 experiment.get_best_parameters() 200 ``` 201 """ 202 203 def __init__(self, 204 features: Dict[str, Dict[str, Any]], 205 outcomes: Dict[str, Dict[str, Any]], 206 ranges: Optional[Dict[str, Dict[str, Any]]] = None, 207 N=1, 208 maximize: Union[bool, Dict[str, bool]] = True, 209 fixed_features: Optional[Dict[str, Any]] = None, 210 outcome_constraints: Optional[List[str]] = None, 211 feature_constraints: Optional[List[str]] = None, 212 optim='bo', 213 acq_func=None, 214 seed=42) -> None: 215 self._first_initialization_done = False 216 self.ranges = ranges 217 self.features = features 218 self.names = list(self._features.keys()) 219 self.fixed_features = fixed_features 220 self.outcomes = outcomes 221 self.N = N 222 self.maximize = maximize 223 self.outcome_constraints = outcome_constraints 224 self.feature_constraints = feature_constraints 225 self.optim = optim 226 self.acq_func = acq_func 227 self.seed = seed 228 self.candidate = None 229 """The candidate(s) suggested by the optimization process.""" 230 self.ax_client = None 231 """Ax's client for the experiment.""" 232 self.model = None 233 """Ax's Gaussian Process model.""" 234 self.parameters = None 235 """Ax's parameters for the experiment.""" 236 self.generator_run = None 237 """Ax's generator run for the experiment.""" 238 self.gs = None 239 """Ax's generation strategy for the experiment.""" 240 self.initialize_ax_client() 241 self.Nmetrics = len(self.ax_client.objective_names) 242 """The number of metrics in the experiment.""" 243 self._first_initialization_done = True 244 """To indicate that the first initialization is done so that we don't call `initialize_ax_client()` again.""" 245 self.pareto_frontier = None 246 """The Pareto frontier for multi-objective optimization experiments.""" 247 248 @property 249 def seed(self) -> int: 250 """Random seed for reproducibility. Default is 42.""" 251 return self._seed 252 253 @seed.setter 254 def seed(self, value: int): 255 """Set the random seed.""" 256 if isinstance(value, int): 257 self._seed = value 258 else: 259 raise Warning("Seed must be an integer. Using default seed 42.") 260 self._seed = 42 261 random.seed(self.seed) 262 np.random.seed(self.seed) 263 264 @property 265 def features(self): 266 """ 267 A dictionary defining the features of the experiment, including their types and ranges. 268 269 Example 270 ------- 271 ```python 272 features = { 273 'feature1': {'type': 'int', 274 'data': [1, 2, 3], 275 'range': [1, 3]}, 276 'feature2': {'type': 'float', 277 'data': [0.1, 0.2, 0.3], 278 'range': [0.1, 0.3]}, 279 'feature3': {'type': 'text', 280 'data': ['A', 'B', 'C'], 281 'range': ['A', 'B', 'C']} 282 } 283 ``` 284 """ 285 return self._features 286 287 @features.setter 288 def features(self, value): 289 """ 290 Set the features of the experiment with validation. 291 """ 292 if not isinstance(value, dict): 293 raise ValueError("features must be a dictionary") 294 self._features = value 295 for name in self._features.keys(): 296 if self.ranges and name in self.ranges.keys(): 297 self._features[name]['range'] = self.ranges[name] 298 else: 299 if self._features[name]['type'] == 'text': 300 self._features[name]['range'] = list(set(self._features[name]['data'])) 301 elif self._features[name]['type'] == 'int': 302 self._features[name]['range'] = [int(np.min(self._features[name]['data'])), 303 int(np.max(self._features[name]['data']))] 304 elif self._features[name]['type'] == 'float': 305 self._features[name]['range'] = [float(np.min(self._features[name]['data'])), 306 float(np.max(self._features[name]['data']))] 307 if self._first_initialization_done: 308 self.initialize_ax_client() 309 310 @property 311 def ranges(self): 312 """ 313 A dictionary defining the ranges of the features. Default is `None`. 314 315 If not provided, the ranges will be inferred from the features data. 316 The ranges should be in the format `{'feature_name': [minvalue,maxvalue]}`. 317 """ 318 return self._ranges 319 320 @ranges.setter 321 def ranges(self, value): 322 """ 323 Set the ranges of the features with validation. 324 """ 325 if value is not None: 326 if not isinstance(value, dict): 327 raise ValueError("ranges must be a dictionary") 328 self._ranges = value 329 330 @property 331 def names(self): 332 """ 333 The names of the features. 334 """ 335 return self._names 336 337 @names.setter 338 def names(self, value): 339 """ 340 Set the names of the features. 341 """ 342 if not isinstance(value, list): 343 raise ValueError("names must be a list") 344 self._names = value 345 346 @property 347 def outcomes(self): 348 """ 349 A dictionary defining the outcomes of the experiment, including their types and observed data. 350 351 Example 352 ------- 353 ```python 354 outcomes = { 355 'outcome1': {'type': 'float', 356 'data': [0.1, 0.2, 0.3]}, 357 'outcome2': {'type': 'float', 358 'data': [1.0, 2.0, 3.0]} 359 } 360 ``` 361 """ 362 return self._outcomes 363 364 @outcomes.setter 365 def outcomes(self, value): 366 """ 367 Set the outcomes of the experiment with validation. 368 """ 369 if not isinstance(value, dict): 370 raise ValueError("outcomes must be a dictionary") 371 self._outcomes = value 372 self.out_names = list(value.keys()) 373 if self._first_initialization_done: 374 self.initialize_ax_client() 375 376 @property 377 def fixed_features(self): 378 """ 379 A dictionary defining fixed features with their values. Default is `None`. 380 If provided, the fixed features will be treated as fixed parameters in the generation process. 381 The fixed features should be in the format `{'feature_name': value}`. 382 The values should be the fixed values for the respective features. 383 """ 384 return self._fixed_features 385 386 @fixed_features.setter 387 def fixed_features(self, value): 388 """ 389 Set the fixed features of the experiment. 390 """ 391 self._fixed_features = None 392 if value is not None: 393 if not isinstance(value, dict): 394 raise ValueError("fixed_features must be a dictionary") 395 for name in value.keys(): 396 if name not in self.names: 397 raise ValueError(f"Fixed feature '{name}' not found in features") 398 # fixed_features should be an ObservationFeatures object 399 self._fixed_features = ObservationFeatures(parameters=value) 400 if self._first_initialization_done: 401 self.set_gs() 402 403 @property 404 def N(self): 405 """ 406 The number of trials to suggest in each optimization step. Must be a positive integer. Default is `1`. 407 """ 408 return self._N 409 410 @N.setter 411 def N(self, value): 412 """ 413 Set the number of trials to suggest in each optimization step with validation. 414 """ 415 if not isinstance(value, int) or value <= 0: 416 raise ValueError("N must be a positive integer") 417 self._N = value 418 if self._first_initialization_done: 419 self.set_gs() 420 421 @property 422 def maximize(self): 423 """ 424 A boolean or dict indicating whether to maximize the outcomes in the form `{'outcome1':True, 'outcome2':False}`. 425 If a single boolean is provided, it is applied to all outcomes. Default is `True`. 426 """ 427 return self._maximize 428 429 @maximize.setter 430 def maximize(self, value): 431 """ 432 Set the maximization setting for the outcomes with validation. 433 """ 434 if isinstance(value, bool): 435 self._maximize = {out: value for out in self.out_names} 436 elif isinstance(value, dict) and len(value) == len(self._outcomes): 437 self._maximize = {k:v for k,v in value.items() if 438 (k in self.out_names and isinstance(v, bool))} 439 else: 440 raise ValueError("maximize must be a boolean or a list of booleans with the same length as outcomes") 441 if self._first_initialization_done: 442 self.initialize_ax_client() 443 444 @property 445 def outcome_constraints(self): 446 """ 447 Constraints on the outcomes, specified as a list of strings. Default is `None`. 448 """ 449 return self._outcome_constraints 450 451 @outcome_constraints.setter 452 def outcome_constraints(self, value): 453 """ 454 Set the outcome constraints of the experiment with validation. 455 """ 456 if isinstance(value, str): 457 self._outcome_constraints = [value] 458 elif isinstance(value, list): 459 self._outcome_constraints = value 460 else: 461 self._outcome_constraints = None 462 if self._first_initialization_done: 463 self.initialize_ax_client() 464 465 @property 466 def feature_constraints(self): 467 """ 468 Constraints on the features, specified as a list of strings. Default is `None`. 469 470 Example 471 ------- 472 ```python 473 feature_constraints = [ 474 'feature1 <= 10.0', 475 'feature1 + 2*feature2 >= 3.0' 476 ] 477 ``` 478 """ 479 return self._feature_constraints 480 481 @feature_constraints.setter 482 def feature_constraints(self, value): 483 """ 484 Set the feature constraints of the experiment with validation. 485 """ 486 if isinstance(value, dict): 487 self._feature_constraints = [value] 488 elif isinstance(value, list): 489 self._feature_constraints = value 490 elif isinstance(value, str): 491 self._feature_constraints = [value] 492 else: 493 self._feature_constraints = None 494 if self._first_initialization_done: 495 self.initialize_ax_client() 496 497 @property 498 def optim(self): 499 """ 500 The optimization method to use, either `'bo'` for Bayesian Optimization or `'sobol'` for Sobol sequence. Default is `'bo'`. 501 """ 502 return self._optim 503 504 @optim.setter 505 def optim(self, value): 506 """ 507 Set the optimization method with validation. 508 """ 509 value = value.lower() 510 if value not in ['bo', 'sobol']: 511 raise ValueError("Optimization method must be either 'bo' or 'sobol'") 512 self._optim = value 513 if self._first_initialization_done: 514 self.set_gs() 515 516 @property 517 def data(self) -> pd.DataFrame: 518 """ 519 Returns a DataFrame of the current data in the experiment, including features and outcomes. 520 """ 521 feature_data = {name: info['data'] for name, info in self._features.items()} 522 outcome_data = {name: info['data'] for name, info in self._outcomes.items()} 523 data_dict = {**feature_data, **outcome_data} 524 return pd.DataFrame(data_dict) 525 526 @data.setter 527 def data(self, value: pd.DataFrame): 528 """ 529 Sets the features and outcomes data from a given DataFrame. 530 """ 531 if not isinstance(value, pd.DataFrame): 532 raise ValueError("Data must be a pandas DataFrame") 533 534 feature_columns = [col for col in value.columns if col in self._features] 535 outcome_columns = [col for col in value.columns if col in self._outcomes] 536 537 for col in feature_columns: 538 self._features[col]['data'] = value[col].tolist() 539 540 for col in outcome_columns: 541 self._outcomes[col]['data'] = value[col].tolist() 542 543 if self._first_initialization_done: 544 self.initialize_ax_client() 545 546 @property 547 def pareto_frontier(self): 548 """ 549 The Pareto frontier for multi-objective optimization experiments. 550 """ 551 return self._pareto_frontier 552 553 @pareto_frontier.setter 554 def pareto_frontier(self, value): 555 """ 556 Set the Pareto frontier of the experiment. 557 """ 558 self._pareto_frontier = value 559 560 561 @property 562 def acq_func(self): 563 """ 564 The acquisition function to use for the optimization process. It must be a dict with 2 keys: 565 - `acqf`: the acquisition function class to use (e.g., `UpperConfidenceBound`), 566 - `acqf_kwargs`: a dict of the kwargs to pass to the acquisition function class. (e.g. `{'beta': 0.1}`). 567 568 If not provided, the default acquisition function is used (`LogExpectedImprovement` or `qLogExpectedImprovement` if N>1). 569 570 Example 571 ------- 572 ```python 573 acq_func = { 574 'acqf': UpperConfidenceBound, 575 'acqf_kwargs': {'beta': 0.1} # lower value = exploitation, higher value = exploration 576 } 577 ``` 578 """ 579 return self._acq_func 580 581 @acq_func.setter 582 def acq_func(self, value): 583 """ 584 Set the acquisition function with validation. 585 """ 586 self._acq_func = value 587 if self._first_initialization_done: 588 self.set_gs() 589 590 def __repr__(self): 591 return self.__str__() 592 593 def __str__(self): 594 """ 595 Return a string representation of the BOExperiment instance. 596 """ 597 return f""" 598BOExperiment( 599 N={self.N}, 600 maximize={self.maximize}, 601 outcome_constraints={self.outcome_constraints}, 602 feature_constraints={self.feature_constraints}, 603 optim={self.optim} 604) 605 606Input data: 607 608{self.data} 609 """ 610 611 def initialize_ax_client(self): 612 """ 613 Initialize the AxClient with the experiment's parameters, objectives, and constraints. 614 """ 615 print('\n======== INITIALIZING MODEL ========\n') 616 self.ax_client = AxClient(verbose_logging=False, 617 suppress_storage_errors=True) 618 self.parameters = [] 619 for name, info in self._features.items(): 620 if info['type'] == 'text': 621 self.parameters.append({ 622 "name": name, 623 "type": "choice", 624 "values": [str(val) for val in info['range']], 625 "value_type": "str"}) 626 elif info['type'] == 'int': 627 self.parameters.append({ 628 "name": name, 629 "type": "range", 630 "bounds": [int(np.min(info['range'])), 631 int(np.max(info['range']))], 632 "value_type": "int"}) 633 elif info['type'] == 'float': 634 self.parameters.append({ 635 "name": name, 636 "type": "range", 637 "bounds": [float(np.min(info['range'])), 638 float(np.max(info['range']))], 639 "value_type": "float"}) 640 641 self.ax_client.create_experiment( 642 name="bayesian_optimization", 643 parameters=self.parameters, 644 objectives={k: ObjectiveProperties(minimize=not v) 645 for k,v in self._maximize.items() 646 if isinstance(v, bool) and k in self._outcomes.keys()}, 647 parameter_constraints=self._feature_constraints, 648 outcome_constraints=self._outcome_constraints, 649 overwrite_existing_experiment=True 650 ) 651 652 if len(next(iter(self._outcomes.values()))['data']) > 0: 653 for i in range(len(next(iter(self._outcomes.values()))['data'])): 654 params = {name: info['data'][i] for name, info in self._features.items()} 655 outcomes = {name: info['data'][i] for name, info in self._outcomes.items()} 656 self.ax_client.attach_trial(params) 657 self.ax_client.complete_trial(trial_index=i, raw_data=outcomes) 658 659 self.set_model() 660 self.set_gs() 661 662 def set_model(self): 663 """ 664 Set the model to be used for predictions. 665 This method is called after initializing the AxClient. 666 """ 667 self.model = Models.BOTORCH_MODULAR( 668 experiment=self.ax_client.experiment, 669 data=self.ax_client.experiment.fetch_data() 670 ) 671 672 def set_gs(self): 673 """ 674 Set the generation strategy for the experiment. 675 This method is called after initializing the AxClient. 676 """ 677 self.clear_trials() 678 if self._optim == 'bo': 679 if not self.model: 680 self.set_model() 681 if self.acq_func is None: 682 self.gs = GenerationStrategy( 683 steps=[GenerationStep( 684 model=Models.BOTORCH_MODULAR, 685 num_trials=-1, # No limitation on how many trials should be produced from this step 686 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 687 ) 688 ] 689 ) 690 else: 691 self.gs = GenerationStrategy( 692 steps=[GenerationStep( 693 model=Models.BOTORCH_MODULAR, 694 num_trials=-1, # No limitation on how many trials should be produced from this step 695 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 696 model_configs={"botorch_model_class": self.acq_func['acqf']}, 697 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 698 model_gen_options={"acquisition_options": self.acq_func['acqf_kwargs']} 699 ) 700 ] 701 ) 702 elif self._optim == 'sobol': 703 self.gs = GenerationStrategy( 704 steps=[GenerationStep( 705 model=Models.SOBOL, 706 num_trials=-1, # How many trials should be produced from this generation step 707 should_deduplicate=True, # Deduplicate the trials 708 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 709 model_gen_kwargs={}, # Any kwargs you want passed to `modelbridge.gen` 710 ) 711 ] 712 ) 713 self.generator_run = self.gs.gen( 714 experiment=self.ax_client.experiment, # Ax `Experiment`, for which to generate new candidates 715 data=None, # Ax `Data` to use for model training, optional. 716 n=self._N, # Number of candidate arms to produce 717 fixed_features=self._fixed_features, 718 pending_observations=get_pending_observation_features( 719 self.ax_client.experiment 720 ), # Points that should not be re-generated 721 ) 722 723 def clear_trials(self): 724 """ 725 Clear all trials in the experiment. 726 """ 727 # Get all pending trial indices 728 pending_trials = [k for k,i in self.ax_client.experiment.trials.items() 729 if i.status==TrialStatus.CANDIDATE] 730 for i in pending_trials: 731 self.ax_client.experiment.trials[i].mark_abandoned() 732 733 def suggest_next_trials(self, with_predicted=True): 734 """ 735 Suggest the next set of trials based on the current model and optimization strategy. 736 737 Returns 738 ------- 739 740 pd.DataFrame: 741 DataFrame containing the suggested trials and their predicted outcomes. 742 """ 743 self.clear_trials() 744 if self.ax_client is None: 745 self.initialize_ax_client() 746 if self._N == 1: 747 self.candidate = self.ax_client.experiment.new_trial(self.generator_run) 748 else: 749 self.candidate = self.ax_client.experiment.new_batch_trial(self.generator_run) 750 trials = self.ax_client.get_trials_data_frame() 751 trials = trials[trials['trial_status'] == 'CANDIDATE'] 752 trials = trials[[name for name in self.names]] 753 if with_predicted: 754 topred = [trials.iloc[i].to_dict() for i in range(len(trials))] 755 preds = self.predict(topred)[0] 756 preds = pd.DataFrame(preds) 757 # add 'predicted_' to the names of the pred dataframe 758 preds.columns = [f'Predicted_{col}' for col in preds.columns] 759 preds = preds.reset_index(drop=True) 760 trials = trials.reset_index(drop=True) 761 return pd.concat([trials, preds], axis=1) 762 else: 763 return trials 764 765 def predict(self, params): 766 """ 767 Predict the outcomes for a given set of parameters using the current model. 768 769 Parameters 770 ---------- 771 772 params : List[Dict[str, Any]] 773 List of parameter dictionaries for which to predict outcomes. 774 775 Returns 776 ------- 777 778 List[Dict[str, float]]: 779 List of predicted outcomes for the given parameters. 780 """ 781 if self.ax_client is None: 782 self.initialize_ax_client() 783 obs_feats = [ObservationFeatures(parameters=p) for p in params] 784 f, cm = self.model.predict(obs_feats) 785 # return prediction and std errors as a list of dictionaries 786 # Convert to list of dictionaries 787 predictions = [] 788 for i in range(len(obs_feats)): 789 pred_dict = {} 790 for metric_name in f.keys(): 791 pred_dict[metric_name] = { 792 'mean': f[metric_name][i], 793 'std': np.sqrt(cm[metric_name][metric_name][i]) 794 } 795 predictions.append(pred_dict) 796 preds = [{k: v['mean'] for k, v in pred.items()} for pred in predictions] 797 stderrs = [{k: v['std'] for k, v in pred.items()} for pred in predictions] 798 return preds, stderrs 799 800 def update_experiment(self, params, outcomes): 801 """ 802 Update the experiment with new parameters and outcomes, and reinitialize the AxClient. 803 804 Parameters 805 ---------- 806 807 params : Dict[str, Any] 808 Dictionary of new parameters to update the experiment with. 809 810 outcomes : Dict[str, Any] 811 Dictionary of new outcomes to update the experiment with. 812 """ 813 # append new data to the features and outcomes dictionaries 814 for k, v in zip(params.keys(), params.values()): 815 if k not in self._features: 816 raise ValueError(f"Parameter '{k}' not found in features") 817 if isinstance(v, np.ndarray): 818 v = v.tolist() 819 if not isinstance(v, list): 820 v = [v] 821 self._features[k]['data'] += v 822 for k, v in zip(outcomes.keys(), outcomes.values()): 823 if k not in self._outcomes: 824 raise ValueError(f"Outcome '{k}' not found in outcomes") 825 if isinstance(v, np.ndarray): 826 v = v.tolist() 827 if not isinstance(v, list): 828 v = [v] 829 self._outcomes[k]['data'] += v 830 self.initialize_ax_client() 831 832 def plot_model(self, metricname=None, slice_values={}, linear=False): 833 """ 834 Plot the model's predictions for the experiment's parameters and outcomes. 835 Parameters 836 ---------- 837 metricname : Optional[str] 838 The name of the metric to plot. If None, the first outcome metric is used. 839 slice_values : Optional[Dict[str, Any]] 840 Dictionary of slice values for plotting. 841 linear : bool 842 Whether to plot a linear slice plot. Default is False. 843 Returns 844 ------- 845 plotly.graph_objects.Figure: 846 Plotly figure of the model's predictions. 847 """ 848 if self.ax_client is None: 849 self.initialize_ax_client() 850 self.suggest_next_trials() 851 cand_name = 'Candidate' if self._N == 1 else 'Candidates' 852 mname = self.ax_client.objective_names[0] if metricname is None else metricname 853 param_name = [name for name in self.names if name not in slice_values.keys()] 854 par_numeric = [name for name in param_name if self._features[name]['type'] in ['int', 'float']] 855 856 if len(par_numeric) == 1: 857 fig = plot_slice( 858 model=self.model, 859 metric_name=mname, 860 density=100, 861 param_name=par_numeric[0], 862 generator_runs_dict={cand_name: self.generator_run}, 863 slice_values=slice_values 864 ) 865 elif len(par_numeric) == 2: 866 fig = plot_contour( 867 model=self.model, 868 metric_name=mname, 869 param_x=par_numeric[0], 870 param_y=par_numeric[1], 871 generator_runs_dict={cand_name: self.generator_run}, 872 slice_values=slice_values 873 ) 874 else: 875 # remove sliced parameters from par_numeric 876 pars = [p for p in par_numeric if p not in slice_values.keys()] 877 fig = interact_contour( 878 model=self.model, 879 generator_runs_dict={cand_name: self.generator_run}, 880 metric_name=mname, 881 slice_values=slice_values, 882 parameters_to_use=pars 883 ) 884 885 plotly_fig = go.Figure(fig.data) 886 all_trials = self.ax_client.get_trials_data_frame() 887 completed_trials = all_trials[all_trials['trial_status'] != 'CANDIDATE'] 888 # compute distance to slice 889 col_to_consider = completed_trials[[k for k in slice_values.keys()]] 890 completed_trials['signed_dist_to_slice'] = ( 891 (col_to_consider - slice_values).sum(axis=1) # Sum of signed differences 892 ) 893 signed_dists = completed_trials['signed_dist_to_slice'].values 894 positive_dists = signed_dists[signed_dists >= 0] 895 negative_dists = signed_dists[signed_dists < 0] 896 897 # Normalize positive distances to [0, 1] 898 if len(positive_dists) > 0 and np.max(positive_dists) > 0: 899 normalized_positive = positive_dists / np.max(positive_dists) 900 else: 901 normalized_positive = np.zeros_like(positive_dists) 902 903 # Normalize negative distances to [-1, 0] 904 if len(negative_dists) > 0 and np.min(negative_dists) < 0: 905 normalized_negative = negative_dists / np.abs(np.min(negative_dists)) 906 else: 907 normalized_negative = np.zeros_like(negative_dists) 908 909 # Combine the normalized distances 910 normalized_signed_dists = np.zeros_like(signed_dists) 911 normalized_signed_dists[signed_dists >= 0] = normalized_positive 912 normalized_signed_dists[signed_dists < 0] = normalized_negative 913 914 completed_trials['normalized_signed_dist'] = normalized_signed_dists 915 coolwarm = cm.get_cmap('bwr') 916 normalized_values = (completed_trials['normalized_signed_dist'] + 1) / 2 # Map from [-1,1] to [0,1] 917 colors = [ 918 f"rgb({int(r*255)}, {int(g*255)}, {int(b*255)})" 919 for r, g, b, _ in coolwarm(normalized_values) 920 ] 921 completed_trials['colors'] = colors 922 trials = self.ax_client.get_trials_data_frame() 923 trials = trials[trials['trial_status'] == 'CANDIDATE'] 924 trials = trials[[name for name in self.names]] 925 926 in_sample_trace_idx = 0 927 for trace in plotly_fig.data: 928 if trace.type == "contour": 929 trace.colorscale = "viridis" 930 if 'marker' in trace and trace.legendgroup != cand_name: 931 arm_names = [] 932 if trace['text']: 933 for text in trace['text']: 934 print(text) 935 match = re.search(r'Arm (\d+_\d+)', text) 936 if match: 937 arm_names.append(match.group(1)) 938 arm_to_color = dict(zip(completed_trials['arm_name'], completed_trials['colors'])) 939 trace.marker.color = [arm_to_color[arm] for arm in arm_names] 940 trace.marker.symbol = "circle" 941 trace.marker.size = 10 942 trace.marker.line.width = 2 943 trace.marker.line.color = 'black' 944 # if len(opacities) > 0: 945 # trace.marker.opacity = opacities 946 if trace.text is not None: 947 trace.text = [t.replace('Arm', '<b>Sample').replace("_0","</b>") for t in trace.text] 948 if trace.legendgroup == cand_name: 949 trace.marker.line.color = 'red' 950 trace.marker.color = "orange" 951 trace.name = cand_name 952 trace.marker.symbol = "x" 953 trace.marker.size = 12 954 trace.marker.opacity = 1 955 trace.hoverinfo = "text" 956 trace.hoverlabel = dict(bgcolor="#f8e3cd", font_color='black') 957 if trace.text is not None: 958 trace.text = [t.replace("<i>","").replace("</i>","") for t in trace.text] 959 trace.text = [ 960 f"<b>Candidate {i+1}</b><br>{'<br>'.join([f'{col}: {val}' for col, val in trials.iloc[i].items()])}" 961 for t in trace.text 962 for i in range(len(trials)) 963 ] 964 965 plotly_fig.update_layout( 966 plot_bgcolor="white", 967 legend=dict(bgcolor='rgba(0,0,0,0)'), 968 margin=dict(l=10, r=10, t=50, b=50), 969 xaxis=dict( 970 showgrid=True, 971 gridcolor="lightgray", 972 zeroline=False, 973 zerolinecolor="black", 974 showline=True, 975 linewidth=1, 976 linecolor="black", 977 mirror=True 978 ), 979 yaxis=dict( 980 showgrid=True, 981 gridcolor="lightgray", 982 zeroline=False, 983 zerolinecolor="black", 984 showline=True, 985 linewidth=1, 986 linecolor="black", 987 mirror=True 988 ), 989 xaxis2=dict( 990 showgrid=True, 991 gridcolor="lightgray", 992 zeroline=False, 993 zerolinecolor="black", 994 showline=True, 995 linewidth=1, 996 linecolor="black", 997 mirror=True 998 ), 999 yaxis2=dict( 1000 showgrid=True, 1001 gridcolor="lightgray", 1002 zeroline=False, 1003 zerolinecolor="black", 1004 showline=True, 1005 linewidth=1, 1006 linecolor="black", 1007 mirror=True 1008 ), 1009 ) 1010 return plotly_fig 1011 1012 1013 def plot_optimization_trace(self, optimum=None): 1014 """ 1015 Plot the optimization trace, showing the progress of the optimization over trials. 1016 1017 Parameters 1018 ---------- 1019 1020 optimum : Optional[float] 1021 The optimal value to plot on the optimization trace. 1022 1023 Returns 1024 ------- 1025 1026 plotly.graph_objects.Figure: 1027 Plotly figure of the optimization trace. 1028 """ 1029 if self.ax_client is None: 1030 self.initialize_ax_client() 1031 if len(self._outcomes) > 1: 1032 print("Optimization trace is not available for multi-objective optimization.") 1033 return None 1034 fig = self.ax_client.get_optimization_trace(objective_optimum=optimum) 1035 fig = go.Figure(fig.data) 1036 for trace in fig.data: 1037 # add hover info 1038 trace.hoverinfo = "x+y" 1039 fig.update_layout( 1040 plot_bgcolor="white", # White background 1041 legend=dict(bgcolor='rgba(0,0,0,0)'), 1042 margin=dict(l=50, r=10, t=50, b=50), 1043 xaxis=dict( 1044 showgrid=True, # Enable grid 1045 gridcolor="lightgray", # Light gray grid lines 1046 zeroline=False, 1047 zerolinecolor="black", # Black zero line 1048 showline=True, 1049 linewidth=1, 1050 linecolor="black", # Black border 1051 mirror=True 1052 ), 1053 yaxis=dict( 1054 showgrid=True, # Enable grid 1055 gridcolor="lightgray", # Light gray grid lines 1056 zeroline=False, 1057 zerolinecolor="black", # Black zero line 1058 showline=True, 1059 linewidth=1, 1060 linecolor="black", # Black border 1061 mirror=True 1062 ), 1063 ) 1064 return fig 1065 1066 def compute_pareto_frontier(self): 1067 """ 1068 Compute the Pareto frontier for multi-objective optimization experiments. 1069 1070 Returns 1071 ------- 1072 The Pareto frontier. 1073 """ 1074 if self.ax_client is None: 1075 self.initialize_ax_client() 1076 if len(self._outcomes) < 2: 1077 print("Pareto frontier is not available for single-objective optimization.") 1078 return None 1079 1080 objectives = self.ax_client.experiment.optimization_config.objective.objectives 1081 self.pareto_frontier = compute_posterior_pareto_frontier( 1082 experiment=self.ax_client.experiment, 1083 data=self.ax_client.experiment.fetch_data(), 1084 primary_objective=objectives[1].metric, 1085 secondary_objective=objectives[0].metric, 1086 absolute_metrics=[o.metric_names[0] for o in objectives], 1087 num_points=20, 1088 ) 1089 return self.pareto_frontier 1090 1091 def plot_pareto_frontier(self, show_error_bars=True): 1092 """ 1093 Plot the Pareto frontier for multi-objective optimization experiments. 1094 1095 Parameters 1096 ---------- 1097 show_error_bars : bool, optional 1098 Whether to show error bars on the plot. Default is True. 1099 1100 Returns 1101 ------- 1102 plotly.graph_objects.Figure: 1103 Plotly figure of the Pareto frontier. 1104 """ 1105 if self.pareto_frontier is None: 1106 return None 1107 1108 fig = plot_pareto_frontier(self.pareto_frontier) 1109 fig = go.Figure(fig.data) 1110 1111 # Modify traces to show/hide error bars 1112 if not show_error_bars: 1113 for trace in fig.data: 1114 # Remove error bars by setting them to None 1115 if hasattr(trace, 'error_x') and trace.error_x is not None: 1116 trace.error_x = None 1117 if hasattr(trace, 'error_y') and trace.error_y is not None: 1118 trace.error_y = None 1119 1120 fig.update_layout( 1121 plot_bgcolor="white", # White background 1122 legend=dict(bgcolor='rgba(0,0,0,0)'), 1123 margin=dict(l=50, r=10, t=50, b=50), 1124 xaxis=dict( 1125 showgrid=True, # Enable grid 1126 gridcolor="lightgray", # Light gray grid lines 1127 zeroline=False, 1128 zerolinecolor="black", # Black zero line 1129 showline=True, 1130 linewidth=1, 1131 linecolor="black", # Black border 1132 mirror=True 1133 ), 1134 yaxis=dict( 1135 showgrid=True, # Enable grid 1136 gridcolor="lightgray", # Light gray grid lines 1137 zeroline=False, 1138 zerolinecolor="black", # Black zero line 1139 showline=True, 1140 linewidth=1, 1141 linecolor="black", # Black border 1142 mirror=True 1143 ), 1144 ) 1145 return fig 1146 1147 def get_best_parameters(self): 1148 """ 1149 Return the best parameters found by the optimization process. 1150 1151 Returns 1152 ------- 1153 1154 pd.DataFrame: 1155 DataFrame containing the best parameters and their outcomes. 1156 """ 1157 if self.ax_client is None: 1158 self.initialize_ax_client() 1159 if self.Nmetrics == 1: 1160 best_parameters = self.ax_client.get_best_parameters()[0] 1161 best_outcomes = self.ax_client.get_best_parameters()[1] 1162 best_parameters.update(best_outcomes[0]) 1163 best = pd.DataFrame(best_parameters, index=[0]) 1164 else: 1165 best_parameters = self.ax_client.get_pareto_optimal_parameters() 1166 best = ordered_dict_to_dataframe(best_parameters) 1167 return best
BOExperiment is a class designed to facilitate Bayesian Optimization experiments using the Ax platform. It encapsulates the experiment setup, including features, outcomes, constraints, and optimization methods.
Parameters
- features (Dict[str, Dict[str, Any]]):
A dictionary defining the features of the experiment, including their types and ranges.
Each feature is represented as a dictionary with keys 'type', 'data', and 'range'.
- 'type': The type of the feature (e.g., 'int', 'float', 'text').
- 'data': The observed data for the feature.
- 'range': The range of values for the feature.
- outcomes (Dict[str, Dict[str, Any]]):
A dictionary defining the outcomes of the experiment, including their types and observed data.
Each outcome is represented as a dictionary with keys 'type' and 'data'.
- 'type': The type of the outcome (e.g., 'int', 'float').
- 'data': The observed data for the outcome.
- ranges (Optional[Dict[str, Dict[str, Any]]]):
A dictionary defining the ranges of the features. Default is
None. If not provided, the ranges will be inferred from the features data. The ranges should be in the format{'feature_name': [minvalue,maxvalue]}. - N (int): The number of trials to suggest in each optimization step. Must be a positive integer.
- maximize (Union[bool, Dict[str, bool]]):
A boolean or dict indicating whether to maximize the outcomes in the form
{'outcome1':True, 'outcome2':False}. If a single boolean is provided, it is applied to all outcomes. Default isTrue. - fixed_features (Optional[Dict[str, Any]]):
A dictionary defining fixed features with their values. Default is
None. If provided, the fixed features will be treated as fixed parameters in the generation process. The fixed features should be in the format{'feature_name': value}. The values should be the fixed values for the respective features. - outcome_constraints (Optional[List[str]]):
Constraints on the outcomes, specified as a list of strings. Default is
None. The constraints should be in the format{'outcome_name': [minvalue,maxvalue]}. - feature_constraints (Optional[List[str]]):
Constraints on the features, specified as a list of strings. Default is
None. The constraints should be in the format{'feature_name': [minvalue,maxvalue]}. - optim (str): The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. Default is 'bo'.
acq_func (Optional[Dict[str, Any]]): The acquisition function to use for the optimization process. It must be a dict with 2 keys:
acqf: the acquisition function class to use (e.g.,UpperConfidenceBound),acqf_kwargs: a dict of the kwargs to pass to the acquisition function class. (e.g.{'beta': 0.1}).
If not provided, the default acquisition function is used (
LogExpectedImprovementorqLogExpectedImprovementif N>1).
Attributes
- features (Dict[str, Dict[str, Any]]): A dictionary defining the features of the experiment, including their types and ranges.
- outcomes (Dict[str, Dict[str, Any]]): A dictionary defining the outcomes of the experiment, including their types and observed data.
- N (int): The number of trials to suggest in each optimization step. Must be a positive integer.
- maximize (Union[bool, List[bool]]): A boolean or list of booleans indicating whether to maximize the outcomes. If a single boolean is provided, it is applied to all outcomes.
- outcome_constraints (Optional[Dict[str, Dict[str, float]]]): Constraints on the outcomes, specified as a dictionary or list of dictionaries.
- feature_constraints (Optional[List[Dict[str, Any]]]): Constraints on the features, specified as a list of dictionaries.
- optim (str): The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence.
- data (pd.DataFrame): A DataFrame representing the current data in the experiment, including features and outcomes.
- acq_func (dict): The acquisition function to use for the optimization process.
- generator_run:: The generator run for the experiment, used to generate new candidates.
- model:: The model used for predictions in the experiment.
- ax_client:: The AxClient for the experiment, used to manage trials and data.
- gs:: The generation strategy for the experiment, used to generate new candidates.
- parameters:: The parameters for the experiment, including their types and ranges.
- names:: The names of the features in the experiment.
- fixed_features:: The fixed features for the experiment, used to generate new candidates.
- candidate:: The candidate(s) suggested by the optimization process.
Methods
- initialize_ax_client(): Initializes the AxClient with the experiment's parameters, objectives, and constraints.
- suggest_next_trials(): Suggests the next set of trials based on the current model and optimization strategy. Returns a DataFrame containing the suggested trials and their predicted outcomes.
- predict(params: List[Dict[str, Any]]) -> List[Dict[str, float]]: Predicts the outcomes for a given set of parameters using the current model. Returns a list of predicted outcomes for the given parameters.
- update_experiment(params: Dict[str, Any], outcomes: Dict[str, Any]): Updates the experiment with new parameters and outcomes, and reinitializes the AxClient.
- plot_model(metricname: Optional[str] = None, slice_values: Optional[Dict[str, Any]] None, linear: bool = False)`: Plots the model's predictions for the experiment's parameters and outcomes. If metricname is None, the first outcome metric is used. If slice_values is provided, it slices the plot at those values. If linear is True, it plots a linear slice plot. If the experiment has only one feature, it plots a slice plot. If the experiment has multiple features, it plots a contour plot. Returns a Plotly figure of the model's predictions.
- plot_optimization_trace(optimum: Optional[float] = None): Plots the optimization trace, showing the progress of the optimization over trials. If the experiment has multiple outcomes, it raises a warning and returns None. Returns a Plotly figure of the optimization trace.
- plot_pareto_frontier(): Plots the Pareto frontier for multi-objective optimization experiments. If the experiment has only one outcome, it raises a warning and returns None. Returns a Plotly figure of the Pareto frontier.
- get_best_parameters() -> pd.DataFrame: Returns the best parameters found by the optimization process. If the experiment has multiple outcomes, it returns a DataFrame of the Pareto optimal parameters. If the experiment has only one outcome, it returns a DataFrame of the best parameters and their outcomes. The DataFrame contains the best parameters and their corresponding outcomes.
- clear_trials(): Clears all trials in the experiment. This is useful for resetting the experiment before suggesting new trials.
- set_model(): Sets the model to be used for predictions. This method is called after initializing the AxClient.
- set_gs(): Sets the generation strategy for the experiment. This method is called after initializing the AxClient.
Example
features, outcomes = read_experimental_data('data.csv', out_pos=[-2, -1])
experiment = BOExperiment(features,
outcomes,
N=5,
maximize={'out1':True, 'out2':False}
)
experiment.suggest_next_trials()
experiment.plot_model(metricname='outcome1')
experiment.plot_model(metricname='outcome2', linear=True)
experiment.plot_model(metricname='outcome1', slice_values={'feature1': 5})
experiment.plot_optimization_trace()
experiment.plot_pareto_frontier()
experiment.get_best_parameters()
experiment.update_experiment({'feature1': [4]}, {'outcome1': [0.4]})
experiment.plot_model()
experiment.plot_optimization_trace()
experiment.plot_pareto_frontier()
experiment.get_best_parameters()
203 def __init__(self, 204 features: Dict[str, Dict[str, Any]], 205 outcomes: Dict[str, Dict[str, Any]], 206 ranges: Optional[Dict[str, Dict[str, Any]]] = None, 207 N=1, 208 maximize: Union[bool, Dict[str, bool]] = True, 209 fixed_features: Optional[Dict[str, Any]] = None, 210 outcome_constraints: Optional[List[str]] = None, 211 feature_constraints: Optional[List[str]] = None, 212 optim='bo', 213 acq_func=None, 214 seed=42) -> None: 215 self._first_initialization_done = False 216 self.ranges = ranges 217 self.features = features 218 self.names = list(self._features.keys()) 219 self.fixed_features = fixed_features 220 self.outcomes = outcomes 221 self.N = N 222 self.maximize = maximize 223 self.outcome_constraints = outcome_constraints 224 self.feature_constraints = feature_constraints 225 self.optim = optim 226 self.acq_func = acq_func 227 self.seed = seed 228 self.candidate = None 229 """The candidate(s) suggested by the optimization process.""" 230 self.ax_client = None 231 """Ax's client for the experiment.""" 232 self.model = None 233 """Ax's Gaussian Process model.""" 234 self.parameters = None 235 """Ax's parameters for the experiment.""" 236 self.generator_run = None 237 """Ax's generator run for the experiment.""" 238 self.gs = None 239 """Ax's generation strategy for the experiment.""" 240 self.initialize_ax_client() 241 self.Nmetrics = len(self.ax_client.objective_names) 242 """The number of metrics in the experiment.""" 243 self._first_initialization_done = True 244 """To indicate that the first initialization is done so that we don't call `initialize_ax_client()` again.""" 245 self.pareto_frontier = None 246 """The Pareto frontier for multi-objective optimization experiments."""
310 @property 311 def ranges(self): 312 """ 313 A dictionary defining the ranges of the features. Default is `None`. 314 315 If not provided, the ranges will be inferred from the features data. 316 The ranges should be in the format `{'feature_name': [minvalue,maxvalue]}`. 317 """ 318 return self._ranges
A dictionary defining the ranges of the features. Default is None.
If not provided, the ranges will be inferred from the features data.
The ranges should be in the format {'feature_name': [minvalue,maxvalue]}.
264 @property 265 def features(self): 266 """ 267 A dictionary defining the features of the experiment, including their types and ranges. 268 269 Example 270 ------- 271 ```python 272 features = { 273 'feature1': {'type': 'int', 274 'data': [1, 2, 3], 275 'range': [1, 3]}, 276 'feature2': {'type': 'float', 277 'data': [0.1, 0.2, 0.3], 278 'range': [0.1, 0.3]}, 279 'feature3': {'type': 'text', 280 'data': ['A', 'B', 'C'], 281 'range': ['A', 'B', 'C']} 282 } 283 ``` 284 """ 285 return self._features
A dictionary defining the features of the experiment, including their types and ranges.
Example
features = {
'feature1': {'type': 'int',
'data': [1, 2, 3],
'range': [1, 3]},
'feature2': {'type': 'float',
'data': [0.1, 0.2, 0.3],
'range': [0.1, 0.3]},
'feature3': {'type': 'text',
'data': ['A', 'B', 'C'],
'range': ['A', 'B', 'C']}
}
330 @property 331 def names(self): 332 """ 333 The names of the features. 334 """ 335 return self._names
The names of the features.
376 @property 377 def fixed_features(self): 378 """ 379 A dictionary defining fixed features with their values. Default is `None`. 380 If provided, the fixed features will be treated as fixed parameters in the generation process. 381 The fixed features should be in the format `{'feature_name': value}`. 382 The values should be the fixed values for the respective features. 383 """ 384 return self._fixed_features
A dictionary defining fixed features with their values. Default is None.
If provided, the fixed features will be treated as fixed parameters in the generation process.
The fixed features should be in the format {'feature_name': value}.
The values should be the fixed values for the respective features.
346 @property 347 def outcomes(self): 348 """ 349 A dictionary defining the outcomes of the experiment, including their types and observed data. 350 351 Example 352 ------- 353 ```python 354 outcomes = { 355 'outcome1': {'type': 'float', 356 'data': [0.1, 0.2, 0.3]}, 357 'outcome2': {'type': 'float', 358 'data': [1.0, 2.0, 3.0]} 359 } 360 ``` 361 """ 362 return self._outcomes
A dictionary defining the outcomes of the experiment, including their types and observed data.
Example
outcomes = {
'outcome1': {'type': 'float',
'data': [0.1, 0.2, 0.3]},
'outcome2': {'type': 'float',
'data': [1.0, 2.0, 3.0]}
}
403 @property 404 def N(self): 405 """ 406 The number of trials to suggest in each optimization step. Must be a positive integer. Default is `1`. 407 """ 408 return self._N
The number of trials to suggest in each optimization step. Must be a positive integer. Default is 1.
421 @property 422 def maximize(self): 423 """ 424 A boolean or dict indicating whether to maximize the outcomes in the form `{'outcome1':True, 'outcome2':False}`. 425 If a single boolean is provided, it is applied to all outcomes. Default is `True`. 426 """ 427 return self._maximize
A boolean or dict indicating whether to maximize the outcomes in the form {'outcome1':True, 'outcome2':False}.
If a single boolean is provided, it is applied to all outcomes. Default is True.
444 @property 445 def outcome_constraints(self): 446 """ 447 Constraints on the outcomes, specified as a list of strings. Default is `None`. 448 """ 449 return self._outcome_constraints
Constraints on the outcomes, specified as a list of strings. Default is None.
465 @property 466 def feature_constraints(self): 467 """ 468 Constraints on the features, specified as a list of strings. Default is `None`. 469 470 Example 471 ------- 472 ```python 473 feature_constraints = [ 474 'feature1 <= 10.0', 475 'feature1 + 2*feature2 >= 3.0' 476 ] 477 ``` 478 """ 479 return self._feature_constraints
Constraints on the features, specified as a list of strings. Default is None.
Example
feature_constraints = [
'feature1 <= 10.0',
'feature1 + 2*feature2 >= 3.0'
]
497 @property 498 def optim(self): 499 """ 500 The optimization method to use, either `'bo'` for Bayesian Optimization or `'sobol'` for Sobol sequence. Default is `'bo'`. 501 """ 502 return self._optim
The optimization method to use, either 'bo' for Bayesian Optimization or 'sobol' for Sobol sequence. Default is 'bo'.
561 @property 562 def acq_func(self): 563 """ 564 The acquisition function to use for the optimization process. It must be a dict with 2 keys: 565 - `acqf`: the acquisition function class to use (e.g., `UpperConfidenceBound`), 566 - `acqf_kwargs`: a dict of the kwargs to pass to the acquisition function class. (e.g. `{'beta': 0.1}`). 567 568 If not provided, the default acquisition function is used (`LogExpectedImprovement` or `qLogExpectedImprovement` if N>1). 569 570 Example 571 ------- 572 ```python 573 acq_func = { 574 'acqf': UpperConfidenceBound, 575 'acqf_kwargs': {'beta': 0.1} # lower value = exploitation, higher value = exploration 576 } 577 ``` 578 """ 579 return self._acq_func
The acquisition function to use for the optimization process. It must be a dict with 2 keys:
acqf: the acquisition function class to use (e.g.,UpperConfidenceBound),acqf_kwargs: a dict of the kwargs to pass to the acquisition function class. (e.g.{'beta': 0.1}).
If not provided, the default acquisition function is used (LogExpectedImprovement or qLogExpectedImprovement if N>1).
Example
acq_func = {
'acqf': UpperConfidenceBound,
'acqf_kwargs': {'beta': 0.1} # lower value = exploitation, higher value = exploration
}
248 @property 249 def seed(self) -> int: 250 """Random seed for reproducibility. Default is 42.""" 251 return self._seed
Random seed for reproducibility. Default is 42.
546 @property 547 def pareto_frontier(self): 548 """ 549 The Pareto frontier for multi-objective optimization experiments. 550 """ 551 return self._pareto_frontier
The Pareto frontier for multi-objective optimization experiments.
516 @property 517 def data(self) -> pd.DataFrame: 518 """ 519 Returns a DataFrame of the current data in the experiment, including features and outcomes. 520 """ 521 feature_data = {name: info['data'] for name, info in self._features.items()} 522 outcome_data = {name: info['data'] for name, info in self._outcomes.items()} 523 data_dict = {**feature_data, **outcome_data} 524 return pd.DataFrame(data_dict)
Returns a DataFrame of the current data in the experiment, including features and outcomes.
611 def initialize_ax_client(self): 612 """ 613 Initialize the AxClient with the experiment's parameters, objectives, and constraints. 614 """ 615 print('\n======== INITIALIZING MODEL ========\n') 616 self.ax_client = AxClient(verbose_logging=False, 617 suppress_storage_errors=True) 618 self.parameters = [] 619 for name, info in self._features.items(): 620 if info['type'] == 'text': 621 self.parameters.append({ 622 "name": name, 623 "type": "choice", 624 "values": [str(val) for val in info['range']], 625 "value_type": "str"}) 626 elif info['type'] == 'int': 627 self.parameters.append({ 628 "name": name, 629 "type": "range", 630 "bounds": [int(np.min(info['range'])), 631 int(np.max(info['range']))], 632 "value_type": "int"}) 633 elif info['type'] == 'float': 634 self.parameters.append({ 635 "name": name, 636 "type": "range", 637 "bounds": [float(np.min(info['range'])), 638 float(np.max(info['range']))], 639 "value_type": "float"}) 640 641 self.ax_client.create_experiment( 642 name="bayesian_optimization", 643 parameters=self.parameters, 644 objectives={k: ObjectiveProperties(minimize=not v) 645 for k,v in self._maximize.items() 646 if isinstance(v, bool) and k in self._outcomes.keys()}, 647 parameter_constraints=self._feature_constraints, 648 outcome_constraints=self._outcome_constraints, 649 overwrite_existing_experiment=True 650 ) 651 652 if len(next(iter(self._outcomes.values()))['data']) > 0: 653 for i in range(len(next(iter(self._outcomes.values()))['data'])): 654 params = {name: info['data'][i] for name, info in self._features.items()} 655 outcomes = {name: info['data'][i] for name, info in self._outcomes.items()} 656 self.ax_client.attach_trial(params) 657 self.ax_client.complete_trial(trial_index=i, raw_data=outcomes) 658 659 self.set_model() 660 self.set_gs()
Initialize the AxClient with the experiment's parameters, objectives, and constraints.
662 def set_model(self): 663 """ 664 Set the model to be used for predictions. 665 This method is called after initializing the AxClient. 666 """ 667 self.model = Models.BOTORCH_MODULAR( 668 experiment=self.ax_client.experiment, 669 data=self.ax_client.experiment.fetch_data() 670 )
Set the model to be used for predictions. This method is called after initializing the AxClient.
672 def set_gs(self): 673 """ 674 Set the generation strategy for the experiment. 675 This method is called after initializing the AxClient. 676 """ 677 self.clear_trials() 678 if self._optim == 'bo': 679 if not self.model: 680 self.set_model() 681 if self.acq_func is None: 682 self.gs = GenerationStrategy( 683 steps=[GenerationStep( 684 model=Models.BOTORCH_MODULAR, 685 num_trials=-1, # No limitation on how many trials should be produced from this step 686 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 687 ) 688 ] 689 ) 690 else: 691 self.gs = GenerationStrategy( 692 steps=[GenerationStep( 693 model=Models.BOTORCH_MODULAR, 694 num_trials=-1, # No limitation on how many trials should be produced from this step 695 max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol 696 model_configs={"botorch_model_class": self.acq_func['acqf']}, 697 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 698 model_gen_options={"acquisition_options": self.acq_func['acqf_kwargs']} 699 ) 700 ] 701 ) 702 elif self._optim == 'sobol': 703 self.gs = GenerationStrategy( 704 steps=[GenerationStep( 705 model=Models.SOBOL, 706 num_trials=-1, # How many trials should be produced from this generation step 707 should_deduplicate=True, # Deduplicate the trials 708 model_kwargs={"seed": self.seed}, # Any kwargs you want passed into the model 709 model_gen_kwargs={}, # Any kwargs you want passed to `modelbridge.gen` 710 ) 711 ] 712 ) 713 self.generator_run = self.gs.gen( 714 experiment=self.ax_client.experiment, # Ax `Experiment`, for which to generate new candidates 715 data=None, # Ax `Data` to use for model training, optional. 716 n=self._N, # Number of candidate arms to produce 717 fixed_features=self._fixed_features, 718 pending_observations=get_pending_observation_features( 719 self.ax_client.experiment 720 ), # Points that should not be re-generated 721 )
Set the generation strategy for the experiment. This method is called after initializing the AxClient.
723 def clear_trials(self): 724 """ 725 Clear all trials in the experiment. 726 """ 727 # Get all pending trial indices 728 pending_trials = [k for k,i in self.ax_client.experiment.trials.items() 729 if i.status==TrialStatus.CANDIDATE] 730 for i in pending_trials: 731 self.ax_client.experiment.trials[i].mark_abandoned()
Clear all trials in the experiment.
733 def suggest_next_trials(self, with_predicted=True): 734 """ 735 Suggest the next set of trials based on the current model and optimization strategy. 736 737 Returns 738 ------- 739 740 pd.DataFrame: 741 DataFrame containing the suggested trials and their predicted outcomes. 742 """ 743 self.clear_trials() 744 if self.ax_client is None: 745 self.initialize_ax_client() 746 if self._N == 1: 747 self.candidate = self.ax_client.experiment.new_trial(self.generator_run) 748 else: 749 self.candidate = self.ax_client.experiment.new_batch_trial(self.generator_run) 750 trials = self.ax_client.get_trials_data_frame() 751 trials = trials[trials['trial_status'] == 'CANDIDATE'] 752 trials = trials[[name for name in self.names]] 753 if with_predicted: 754 topred = [trials.iloc[i].to_dict() for i in range(len(trials))] 755 preds = self.predict(topred)[0] 756 preds = pd.DataFrame(preds) 757 # add 'predicted_' to the names of the pred dataframe 758 preds.columns = [f'Predicted_{col}' for col in preds.columns] 759 preds = preds.reset_index(drop=True) 760 trials = trials.reset_index(drop=True) 761 return pd.concat([trials, preds], axis=1) 762 else: 763 return trials
Suggest the next set of trials based on the current model and optimization strategy.
Returns
- pd.DataFrame (): DataFrame containing the suggested trials and their predicted outcomes.
765 def predict(self, params): 766 """ 767 Predict the outcomes for a given set of parameters using the current model. 768 769 Parameters 770 ---------- 771 772 params : List[Dict[str, Any]] 773 List of parameter dictionaries for which to predict outcomes. 774 775 Returns 776 ------- 777 778 List[Dict[str, float]]: 779 List of predicted outcomes for the given parameters. 780 """ 781 if self.ax_client is None: 782 self.initialize_ax_client() 783 obs_feats = [ObservationFeatures(parameters=p) for p in params] 784 f, cm = self.model.predict(obs_feats) 785 # return prediction and std errors as a list of dictionaries 786 # Convert to list of dictionaries 787 predictions = [] 788 for i in range(len(obs_feats)): 789 pred_dict = {} 790 for metric_name in f.keys(): 791 pred_dict[metric_name] = { 792 'mean': f[metric_name][i], 793 'std': np.sqrt(cm[metric_name][metric_name][i]) 794 } 795 predictions.append(pred_dict) 796 preds = [{k: v['mean'] for k, v in pred.items()} for pred in predictions] 797 stderrs = [{k: v['std'] for k, v in pred.items()} for pred in predictions] 798 return preds, stderrs
Predict the outcomes for a given set of parameters using the current model.
Parameters
- params (List[Dict[str, Any]]): List of parameter dictionaries for which to predict outcomes.
Returns
- List[Dict[str, float]] (): List of predicted outcomes for the given parameters.
800 def update_experiment(self, params, outcomes): 801 """ 802 Update the experiment with new parameters and outcomes, and reinitialize the AxClient. 803 804 Parameters 805 ---------- 806 807 params : Dict[str, Any] 808 Dictionary of new parameters to update the experiment with. 809 810 outcomes : Dict[str, Any] 811 Dictionary of new outcomes to update the experiment with. 812 """ 813 # append new data to the features and outcomes dictionaries 814 for k, v in zip(params.keys(), params.values()): 815 if k not in self._features: 816 raise ValueError(f"Parameter '{k}' not found in features") 817 if isinstance(v, np.ndarray): 818 v = v.tolist() 819 if not isinstance(v, list): 820 v = [v] 821 self._features[k]['data'] += v 822 for k, v in zip(outcomes.keys(), outcomes.values()): 823 if k not in self._outcomes: 824 raise ValueError(f"Outcome '{k}' not found in outcomes") 825 if isinstance(v, np.ndarray): 826 v = v.tolist() 827 if not isinstance(v, list): 828 v = [v] 829 self._outcomes[k]['data'] += v 830 self.initialize_ax_client()
Update the experiment with new parameters and outcomes, and reinitialize the AxClient.
Parameters
- params (Dict[str, Any]): Dictionary of new parameters to update the experiment with.
- outcomes (Dict[str, Any]): Dictionary of new outcomes to update the experiment with.
832 def plot_model(self, metricname=None, slice_values={}, linear=False): 833 """ 834 Plot the model's predictions for the experiment's parameters and outcomes. 835 Parameters 836 ---------- 837 metricname : Optional[str] 838 The name of the metric to plot. If None, the first outcome metric is used. 839 slice_values : Optional[Dict[str, Any]] 840 Dictionary of slice values for plotting. 841 linear : bool 842 Whether to plot a linear slice plot. Default is False. 843 Returns 844 ------- 845 plotly.graph_objects.Figure: 846 Plotly figure of the model's predictions. 847 """ 848 if self.ax_client is None: 849 self.initialize_ax_client() 850 self.suggest_next_trials() 851 cand_name = 'Candidate' if self._N == 1 else 'Candidates' 852 mname = self.ax_client.objective_names[0] if metricname is None else metricname 853 param_name = [name for name in self.names if name not in slice_values.keys()] 854 par_numeric = [name for name in param_name if self._features[name]['type'] in ['int', 'float']] 855 856 if len(par_numeric) == 1: 857 fig = plot_slice( 858 model=self.model, 859 metric_name=mname, 860 density=100, 861 param_name=par_numeric[0], 862 generator_runs_dict={cand_name: self.generator_run}, 863 slice_values=slice_values 864 ) 865 elif len(par_numeric) == 2: 866 fig = plot_contour( 867 model=self.model, 868 metric_name=mname, 869 param_x=par_numeric[0], 870 param_y=par_numeric[1], 871 generator_runs_dict={cand_name: self.generator_run}, 872 slice_values=slice_values 873 ) 874 else: 875 # remove sliced parameters from par_numeric 876 pars = [p for p in par_numeric if p not in slice_values.keys()] 877 fig = interact_contour( 878 model=self.model, 879 generator_runs_dict={cand_name: self.generator_run}, 880 metric_name=mname, 881 slice_values=slice_values, 882 parameters_to_use=pars 883 ) 884 885 plotly_fig = go.Figure(fig.data) 886 all_trials = self.ax_client.get_trials_data_frame() 887 completed_trials = all_trials[all_trials['trial_status'] != 'CANDIDATE'] 888 # compute distance to slice 889 col_to_consider = completed_trials[[k for k in slice_values.keys()]] 890 completed_trials['signed_dist_to_slice'] = ( 891 (col_to_consider - slice_values).sum(axis=1) # Sum of signed differences 892 ) 893 signed_dists = completed_trials['signed_dist_to_slice'].values 894 positive_dists = signed_dists[signed_dists >= 0] 895 negative_dists = signed_dists[signed_dists < 0] 896 897 # Normalize positive distances to [0, 1] 898 if len(positive_dists) > 0 and np.max(positive_dists) > 0: 899 normalized_positive = positive_dists / np.max(positive_dists) 900 else: 901 normalized_positive = np.zeros_like(positive_dists) 902 903 # Normalize negative distances to [-1, 0] 904 if len(negative_dists) > 0 and np.min(negative_dists) < 0: 905 normalized_negative = negative_dists / np.abs(np.min(negative_dists)) 906 else: 907 normalized_negative = np.zeros_like(negative_dists) 908 909 # Combine the normalized distances 910 normalized_signed_dists = np.zeros_like(signed_dists) 911 normalized_signed_dists[signed_dists >= 0] = normalized_positive 912 normalized_signed_dists[signed_dists < 0] = normalized_negative 913 914 completed_trials['normalized_signed_dist'] = normalized_signed_dists 915 coolwarm = cm.get_cmap('bwr') 916 normalized_values = (completed_trials['normalized_signed_dist'] + 1) / 2 # Map from [-1,1] to [0,1] 917 colors = [ 918 f"rgb({int(r*255)}, {int(g*255)}, {int(b*255)})" 919 for r, g, b, _ in coolwarm(normalized_values) 920 ] 921 completed_trials['colors'] = colors 922 trials = self.ax_client.get_trials_data_frame() 923 trials = trials[trials['trial_status'] == 'CANDIDATE'] 924 trials = trials[[name for name in self.names]] 925 926 in_sample_trace_idx = 0 927 for trace in plotly_fig.data: 928 if trace.type == "contour": 929 trace.colorscale = "viridis" 930 if 'marker' in trace and trace.legendgroup != cand_name: 931 arm_names = [] 932 if trace['text']: 933 for text in trace['text']: 934 print(text) 935 match = re.search(r'Arm (\d+_\d+)', text) 936 if match: 937 arm_names.append(match.group(1)) 938 arm_to_color = dict(zip(completed_trials['arm_name'], completed_trials['colors'])) 939 trace.marker.color = [arm_to_color[arm] for arm in arm_names] 940 trace.marker.symbol = "circle" 941 trace.marker.size = 10 942 trace.marker.line.width = 2 943 trace.marker.line.color = 'black' 944 # if len(opacities) > 0: 945 # trace.marker.opacity = opacities 946 if trace.text is not None: 947 trace.text = [t.replace('Arm', '<b>Sample').replace("_0","</b>") for t in trace.text] 948 if trace.legendgroup == cand_name: 949 trace.marker.line.color = 'red' 950 trace.marker.color = "orange" 951 trace.name = cand_name 952 trace.marker.symbol = "x" 953 trace.marker.size = 12 954 trace.marker.opacity = 1 955 trace.hoverinfo = "text" 956 trace.hoverlabel = dict(bgcolor="#f8e3cd", font_color='black') 957 if trace.text is not None: 958 trace.text = [t.replace("<i>","").replace("</i>","") for t in trace.text] 959 trace.text = [ 960 f"<b>Candidate {i+1}</b><br>{'<br>'.join([f'{col}: {val}' for col, val in trials.iloc[i].items()])}" 961 for t in trace.text 962 for i in range(len(trials)) 963 ] 964 965 plotly_fig.update_layout( 966 plot_bgcolor="white", 967 legend=dict(bgcolor='rgba(0,0,0,0)'), 968 margin=dict(l=10, r=10, t=50, b=50), 969 xaxis=dict( 970 showgrid=True, 971 gridcolor="lightgray", 972 zeroline=False, 973 zerolinecolor="black", 974 showline=True, 975 linewidth=1, 976 linecolor="black", 977 mirror=True 978 ), 979 yaxis=dict( 980 showgrid=True, 981 gridcolor="lightgray", 982 zeroline=False, 983 zerolinecolor="black", 984 showline=True, 985 linewidth=1, 986 linecolor="black", 987 mirror=True 988 ), 989 xaxis2=dict( 990 showgrid=True, 991 gridcolor="lightgray", 992 zeroline=False, 993 zerolinecolor="black", 994 showline=True, 995 linewidth=1, 996 linecolor="black", 997 mirror=True 998 ), 999 yaxis2=dict( 1000 showgrid=True, 1001 gridcolor="lightgray", 1002 zeroline=False, 1003 zerolinecolor="black", 1004 showline=True, 1005 linewidth=1, 1006 linecolor="black", 1007 mirror=True 1008 ), 1009 ) 1010 return plotly_fig
Plot the model's predictions for the experiment's parameters and outcomes.
Parameters
- metricname (Optional[str]): The name of the metric to plot. If None, the first outcome metric is used.
- slice_values (Optional[Dict[str, Any]]): Dictionary of slice values for plotting.
- linear (bool): Whether to plot a linear slice plot. Default is False.
Returns
- plotly.graph_objects.Figure (): Plotly figure of the model's predictions.
1013 def plot_optimization_trace(self, optimum=None): 1014 """ 1015 Plot the optimization trace, showing the progress of the optimization over trials. 1016 1017 Parameters 1018 ---------- 1019 1020 optimum : Optional[float] 1021 The optimal value to plot on the optimization trace. 1022 1023 Returns 1024 ------- 1025 1026 plotly.graph_objects.Figure: 1027 Plotly figure of the optimization trace. 1028 """ 1029 if self.ax_client is None: 1030 self.initialize_ax_client() 1031 if len(self._outcomes) > 1: 1032 print("Optimization trace is not available for multi-objective optimization.") 1033 return None 1034 fig = self.ax_client.get_optimization_trace(objective_optimum=optimum) 1035 fig = go.Figure(fig.data) 1036 for trace in fig.data: 1037 # add hover info 1038 trace.hoverinfo = "x+y" 1039 fig.update_layout( 1040 plot_bgcolor="white", # White background 1041 legend=dict(bgcolor='rgba(0,0,0,0)'), 1042 margin=dict(l=50, r=10, t=50, b=50), 1043 xaxis=dict( 1044 showgrid=True, # Enable grid 1045 gridcolor="lightgray", # Light gray grid lines 1046 zeroline=False, 1047 zerolinecolor="black", # Black zero line 1048 showline=True, 1049 linewidth=1, 1050 linecolor="black", # Black border 1051 mirror=True 1052 ), 1053 yaxis=dict( 1054 showgrid=True, # Enable grid 1055 gridcolor="lightgray", # Light gray grid lines 1056 zeroline=False, 1057 zerolinecolor="black", # Black zero line 1058 showline=True, 1059 linewidth=1, 1060 linecolor="black", # Black border 1061 mirror=True 1062 ), 1063 ) 1064 return fig
Plot the optimization trace, showing the progress of the optimization over trials.
Parameters
- optimum (Optional[float]): The optimal value to plot on the optimization trace.
Returns
- plotly.graph_objects.Figure (): Plotly figure of the optimization trace.
1066 def compute_pareto_frontier(self): 1067 """ 1068 Compute the Pareto frontier for multi-objective optimization experiments. 1069 1070 Returns 1071 ------- 1072 The Pareto frontier. 1073 """ 1074 if self.ax_client is None: 1075 self.initialize_ax_client() 1076 if len(self._outcomes) < 2: 1077 print("Pareto frontier is not available for single-objective optimization.") 1078 return None 1079 1080 objectives = self.ax_client.experiment.optimization_config.objective.objectives 1081 self.pareto_frontier = compute_posterior_pareto_frontier( 1082 experiment=self.ax_client.experiment, 1083 data=self.ax_client.experiment.fetch_data(), 1084 primary_objective=objectives[1].metric, 1085 secondary_objective=objectives[0].metric, 1086 absolute_metrics=[o.metric_names[0] for o in objectives], 1087 num_points=20, 1088 ) 1089 return self.pareto_frontier
Compute the Pareto frontier for multi-objective optimization experiments.
Returns
- The Pareto frontier.
1091 def plot_pareto_frontier(self, show_error_bars=True): 1092 """ 1093 Plot the Pareto frontier for multi-objective optimization experiments. 1094 1095 Parameters 1096 ---------- 1097 show_error_bars : bool, optional 1098 Whether to show error bars on the plot. Default is True. 1099 1100 Returns 1101 ------- 1102 plotly.graph_objects.Figure: 1103 Plotly figure of the Pareto frontier. 1104 """ 1105 if self.pareto_frontier is None: 1106 return None 1107 1108 fig = plot_pareto_frontier(self.pareto_frontier) 1109 fig = go.Figure(fig.data) 1110 1111 # Modify traces to show/hide error bars 1112 if not show_error_bars: 1113 for trace in fig.data: 1114 # Remove error bars by setting them to None 1115 if hasattr(trace, 'error_x') and trace.error_x is not None: 1116 trace.error_x = None 1117 if hasattr(trace, 'error_y') and trace.error_y is not None: 1118 trace.error_y = None 1119 1120 fig.update_layout( 1121 plot_bgcolor="white", # White background 1122 legend=dict(bgcolor='rgba(0,0,0,0)'), 1123 margin=dict(l=50, r=10, t=50, b=50), 1124 xaxis=dict( 1125 showgrid=True, # Enable grid 1126 gridcolor="lightgray", # Light gray grid lines 1127 zeroline=False, 1128 zerolinecolor="black", # Black zero line 1129 showline=True, 1130 linewidth=1, 1131 linecolor="black", # Black border 1132 mirror=True 1133 ), 1134 yaxis=dict( 1135 showgrid=True, # Enable grid 1136 gridcolor="lightgray", # Light gray grid lines 1137 zeroline=False, 1138 zerolinecolor="black", # Black zero line 1139 showline=True, 1140 linewidth=1, 1141 linecolor="black", # Black border 1142 mirror=True 1143 ), 1144 ) 1145 return fig
Plot the Pareto frontier for multi-objective optimization experiments.
Parameters
- show_error_bars (bool, optional): Whether to show error bars on the plot. Default is True.
Returns
- plotly.graph_objects.Figure (): Plotly figure of the Pareto frontier.
1147 def get_best_parameters(self): 1148 """ 1149 Return the best parameters found by the optimization process. 1150 1151 Returns 1152 ------- 1153 1154 pd.DataFrame: 1155 DataFrame containing the best parameters and their outcomes. 1156 """ 1157 if self.ax_client is None: 1158 self.initialize_ax_client() 1159 if self.Nmetrics == 1: 1160 best_parameters = self.ax_client.get_best_parameters()[0] 1161 best_outcomes = self.ax_client.get_best_parameters()[1] 1162 best_parameters.update(best_outcomes[0]) 1163 best = pd.DataFrame(best_parameters, index=[0]) 1164 else: 1165 best_parameters = self.ax_client.get_pareto_optimal_parameters() 1166 best = ordered_dict_to_dataframe(best_parameters) 1167 return best
Return the best parameters found by the optimization process.
Returns
- pd.DataFrame (): DataFrame containing the best parameters and their outcomes.
1171def flatten_dict(d, parent_key="", sep="_"): 1172 """ 1173 Flatten a nested dictionary. 1174 """ 1175 items = [] 1176 for k, v in d.items(): 1177 new_key = f"{parent_key}{sep}{k}" if parent_key else k 1178 if isinstance(v, dict): 1179 items.extend(flatten_dict(v, new_key, sep=sep).items()) 1180 else: 1181 items.append((new_key, v)) 1182 return dict(items)
Flatten a nested dictionary.
1186def ordered_dict_to_dataframe(data): 1187 """ 1188 Convert an OrderedDict with arbitrary nesting to a DataFrame. 1189 """ 1190 dflat = flatten_dict(data) 1191 out = [] 1192 1193 for key, value in dflat.items(): 1194 main_dict = value[0] 1195 sub_dict = value[1][0] 1196 out.append([value for value in main_dict.values()] + 1197 [value for value in sub_dict.values()]) 1198 1199 df = pd.DataFrame(out, columns=[key for key in main_dict.keys()] + 1200 [key for key in sub_dict.keys()]) 1201 return df
Convert an OrderedDict with arbitrary nesting to a DataFrame.
1205def read_experimental_data(file_path: str, out_pos=[-1]) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]]: 1206 """ 1207 Read experimental data from a CSV file and format it into features and outcomes dictionaries. 1208 1209 Parameters 1210 ---------- 1211 file_path (str) 1212 Path to the CSV file containing experimental data. 1213 out_pos (list of int) 1214 Column indices of the outcome variables. Default is the last column. 1215 1216 Returns 1217 ------- 1218 Tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]] 1219 Formatted features and outcomes dictionaries. 1220 """ 1221 data = pd.read_csv(file_path) 1222 data = clean_names(data, remove_special=True, case_type='preserve') 1223 outcome_column_name = data.columns[out_pos] 1224 features = data.loc[:, ~data.columns.isin(outcome_column_name)].copy() 1225 outcomes = data[outcome_column_name].copy() 1226 1227 feature_definitions = {} 1228 for column in features.columns: 1229 if features[column].dtype == 'object': 1230 unique_values = features[column].unique() 1231 feature_definitions[column] = {'type': 'text', 1232 'range': unique_values.tolist()} 1233 elif features[column].dtype in ['int64', 'float64']: 1234 min_val = features[column].min() 1235 max_val = features[column].max() 1236 feature_type = 'int' if features[column].dtype == 'int64' else 'float' 1237 feature_definitions[column] = {'type': feature_type, 1238 'range': [min_val, max_val]} 1239 1240 formatted_features = {name: {'type': info['type'], 1241 'data': features[name].tolist(), 1242 'range': info['range']} 1243 for name, info in feature_definitions.items()} 1244 # same for outcomes with just type and data 1245 outcome_definitions = {} 1246 for column in outcomes.columns: 1247 if outcomes[column].dtype == 'object': 1248 unique_values = outcomes[column].unique() 1249 outcome_definitions[column] = {'type': 'text', 1250 'data': unique_values.tolist()} 1251 elif outcomes[column].dtype in ['int64', 'float64']: 1252 min_val = outcomes[column].min() 1253 max_val = outcomes[column].max() 1254 outcome_type = 'int' if outcomes[column].dtype == 'int64' else 'float' 1255 outcome_definitions[column] = {'type': outcome_type, 1256 'data': outcomes[column].tolist()} 1257 formatted_outcomes = {name: {'type': info['type'], 1258 'data': outcomes[name].tolist()} 1259 for name, info in outcome_definitions.items()} 1260 return formatted_features, formatted_outcomes
Read experimental data from a CSV file and format it into features and outcomes dictionaries.
Parameters
- file_path (str): Path to the CSV file containing experimental data.
- out_pos (list of int): Column indices of the outcome variables. Default is the last column.
Returns
- Tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]]: Formatted features and outcomes dictionaries.