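Easy Parallelization

Optuna supports several ways to run optimization in parallel.

  1. Multi-thread optimization:

    • You can run multiple trials in parallel within a single process by using the `n_jobs` argument.

  2. Multi-process optimization:

    • You can run multiple processes that share the same storage backend, such as an RDB or a file.

  3. Multi-node optimization:

    • You can run the same optimization study on multiple machines.

    • If you need to run optimization across thousands of processing nodes, you can use `GrpcStorageProxy` to run distributed optimization on multiple machines.

The following figure shows which strategy suits which use case.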

digraph storage_selector {
    rankdir=LR;
    node [shape=box];

    { rank=same; multithread; single_node; many_nodes; grpc_storage; }

    multithread [label=<
        <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
            <TR><TD>Multi-thread or Multi-process?</TD></TR>
        </TABLE>
    >];

    single_node [label=<
        <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
            <TR><TD>Single node/<BR/>Multi-node?</TD></TR>
        </TABLE>
    >];

    many_nodes  [label=<
        <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
            <TR><TD>Do you need<BR/>a very large number of nodes?</TD></TR>
        </TABLE>
    >];

    multithread_storages [
        shape=box,
        style=rounded,
        href="#multi-thread-optimization",
        label=<
            <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
                <TR><TD><U>InMemoryStorage</U></TD></TR>
                <TR><TD><U>JournalStorage</U></TD></TR>
            </TABLE>
        >
    ];

    singlenode_storages [
        shape=box,
        style=rounded,
        href="#multi-process-optimization",
        label=<
            <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
                <TR><TD><U>JournalStorage</U></TD></TR>
                <TR><TD><U>RDBStorage</U></TD></TR>
            </TABLE>
        >
    ]

    rdb_storage [
        shape=box,
        style=rounded,
        href="#multi-node-optimization",
        label=<
            <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
                <TR><TD><U>RDBStorage</U></TD></TR>
            </TABLE>
        >
    ]

    grpc_storage [
        shape=box,
        style=rounded,
        href="#grpc-storage-proxy",
        label=<
            <TABLE BORDER="0" CELLBORDER="0" CELLALIGN="LEFT">
                <TR><TD><U>GrpcStorageProxy</U></TD></TR>
            </TABLE>
        >
    ]

    multithread -> multithread_storages [label="Multi-thread"];
    multithread -> single_node [label="Multi-process"];
    single_node -> singlenode_storages [label="Single node"];
    single_node -> many_nodes [label="Multi-node"];
    many_nodes -> rdb_storage [label="No"];
    many_nodes -> grpc_storage [label="Yes"];
}
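The decision flow in the figure can also be written as a small lookup function. This is only an illustrative sketch: `recommend_storages` and its parameters are hypothetical names, not part of Optuna, and the returned strings are just the storage names that appear in the figure.

```python
from typing import List


def recommend_storages(
    multi_process: bool,
    multi_node: bool = False,
    very_many_nodes: bool = False,
) -> List[str]:
    """Return the storage names the figure suggests (hypothetical helper)."""
    if not multi_process:
        # Multi-thread optimization inside one process.
        return ["InMemoryStorage", "JournalStorage"]
    if not multi_node:
        # Several processes on a single node.
        return ["JournalStorage", "RDBStorage"]
    if not very_many_nodes:
        # Several nodes, moderate scale.
        return ["RDBStorage"]
    # A very large number of nodes.
    return ["GrpcStorageProxy"]


print(recommend_storages(multi_process=True, multi_node=True))
```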

Multi-thread Optimization

Note

Recommended backends:

    • `InMemoryStorage`

    • `JournalStorage`

You can run multiple trials in parallel simply by setting the `n_jobs` argument of `optimize()`.

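Multi-thread optimization has traditionally been inefficient in Python because of the Global Interpreter Lock (GIL). However, Python 3.13 introduced an optional free-threaded build that runs without the GIL (PEP 703), and Python 3.14 supports that build officially, although the default build still has the GIL. On a free-threaded interpreter, multi-threading becomes a good option, especially for parallel optimization.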

import optuna
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
from optuna.trial import Trial
import threading


def objective(trial: Trial):
    print(f"Running trial {trial.number=} in {threading.current_thread().name}")
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


study = optuna.create_study(
    storage=JournalStorage(JournalFileBackend(file_path="./journal.log")),
)
study.optimize(objective, n_trials=20, n_jobs=4)

Output

Running trial trial.number=0 in ThreadPoolExecutor-1_0
Running trial trial.number=1 in ThreadPoolExecutor-1_2
Running trial trial.number=2 in ThreadPoolExecutor-1_1
Running trial trial.number=3 in ThreadPoolExecutor-1_3
Running trial trial.number=4 in ThreadPoolExecutor-1_3
Running trial trial.number=5 in ThreadPoolExecutor-1_1
Running trial trial.number=6 in ThreadPoolExecutor-1_0
Running trial trial.number=7 in ThreadPoolExecutor-1_2
Running trial trial.number=8 in ThreadPoolExecutor-1_1
Running trial trial.number=9 in ThreadPoolExecutor-1_0
Running trial trial.number=10 in ThreadPoolExecutor-1_3
Running trial trial.number=11 in ThreadPoolExecutor-1_2
Running trial trial.number=12 in ThreadPoolExecutor-1_1
Running trial trial.number=13 in ThreadPoolExecutor-1_3
Running trial trial.number=14 in ThreadPoolExecutor-1_0
Running trial trial.number=15 in ThreadPoolExecutor-1_2
Running trial trial.number=16 in ThreadPoolExecutor-1_1
Running trial trial.number=17 in ThreadPoolExecutor-1_3
Running trial trial.number=18 in ThreadPoolExecutor-1_0
Running trial trial.number=19 in ThreadPoolExecutor-1_2
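
Whether threads can actually run the objective in parallel depends on whether the current interpreter holds a GIL. The following sketch checks this with the standard library only. The helper names are hypothetical. `sys._is_gil_enabled()` is available on Python 3.13 and later, so the code falls back to assuming a GIL on older versions.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was built with free-threading support."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled() -> bool:
    """Return True if the GIL is active in the running interpreter."""
    checker = getattr(sys, "_is_gil_enabled", None)
    if checker is None:
        # Python 3.12 and earlier always run with the GIL.
        return True
    return bool(checker())


print(f"free-threaded build: {is_free_threaded_build()}")
print(f"GIL enabled: {gil_is_enabled()}")
```

If the GIL is enabled and the objective is CPU-bound pure Python, the multi-process approach is likely to scale better than `n_jobs`.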

Multi-process Optimization with JournalStorage

Note

Recommended backends:

    • `JournalStorage`

    • `RDBStorage`

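You can run optimization with multiple processes by sharing a storage. `InMemoryStorage` is not designed to be shared across processes, so it cannot be used for multi-process optimization.

The following example shows how to use `JournalStorage` with the `multiprocessing` module for multi-process optimization.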

import optuna
from multiprocessing import Pool
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
import os


def objective(trial):
    print(f"Running trial {trial.number=} in process {os.getpid()}")
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


def run_optimization(_):
    study = optuna.create_study(
        study_name="journal_storage_multiprocess",
        storage=JournalStorage(JournalFileBackend(file_path="./journal.log")),
        load_if_exists=True,  # Useful for multi-process or multi-node optimization.
    )
    study.optimize(objective, n_trials=3)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        pool.map(run_optimization, range(12))

Output

$ python3 multiprocess_example.py
Running trial trial.number=1 in process 4605
Running trial trial.number=2 in process 4604
Running trial trial.number=3 in process 4607
Running trial trial.number=4 in process 4606
Running trial trial.number=5 in process 4605
Running trial trial.number=6 in process 4607
Running trial trial.number=7 in process 4604
Running trial trial.number=8 in process 4605
...
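
In the example above, `pool.map` submits 12 tasks and each task runs 3 trials, so the shared study ends up with 12 × 3 = 36 trials spread over 4 worker processes. If you would rather start from a total trial budget, one possible approach is to divide it among the workers up front. `split_trials` below is a hypothetical helper, not an Optuna function.

```python
from typing import List


def split_trials(total_trials: int, n_workers: int) -> List[int]:
    """Divide a total trial budget as evenly as possible among workers."""
    if total_trials < 0 or n_workers <= 0:
        raise ValueError("total_trials must be >= 0 and n_workers must be > 0")
    base, extra = divmod(total_trials, n_workers)
    # The first `extra` workers each take one additional trial.
    return [base + 1 if i < extra else base for i in range(n_workers)]


print(split_trials(36, 12))
print(split_trials(10, 4))
```

Each worker would then pass its own share as `n_trials` to `study.optimize()`.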

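Multi-node Optimization with RDBStorage

Because `JournalFileBackend` uses file locks on the local filesystem, it is safe for multiple processes running on the same host. However, if the file is accessed simultaneously from multiple machines through NFS (or a similar mechanism), the file locks may not work correctly, which can lead to race conditions.

Therefore, `RDBStorage` is recommended for multi-node optimization. You can use MySQL, PostgreSQL, or another RDB backend.

For example, to use MySQL, you need to set up a MySQL server and create a database for Optuna.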

$ mysql -u username -e "CREATE DATABASE IF NOT EXISTS example"

Then you can use this MySQL database as the storage backend by passing the MySQL URL as the `storage` argument of `create_study()`.

import optuna


def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


if __name__ == "__main__":
    study = optuna.create_study(
        study_name="distributed_test",
        storage="mysql://username:password@127.0.0.1:3306/example",
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=100)

You can run this example on multiple machines.

Machine 1

$ python3 distributed_example.py
[I 2025-06-03 14:07:45,306] A new study created in RDB with name: distributed_test
[I 2025-06-03 14:08:45,450] Trial 0 finished with value: 12.694308312865278 and parameters: {'x': -1.5629072837873959}. Best is trial 0 with value: 12.694308312865278.
[I 2025-06-03 14:09:45,482] Trial 2 finished with value: 121.80632032697125 and parameters: {'x': -9.036590067904635}. Best is trial 0 with value: 12.694308312865278.

Machine 2

$ python3 distributed_example.py
[I 2025-06-03 14:07:49,318] Using an existing study with name 'distributed_test' instead of creating a new one.
[I 2025-06-03 14:08:49,442] Trial 1 finished with value: 0.21258674253407828 and parameters: {'x': 1.5389287012466746}. Best is trial 31 with value: 9.19159178106083e-05.
[I 2025-06-03 14:09:49,480] Trial 3 finished with value: 0.24343413718999274 and parameters: {'x': 2.493390451052706}. Best is trial 31 with value: 9.19159178106083e-05.
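
The storage URL in the example follows the `dialect[+driver]://user:password@host:port/database` form. When the credentials contain characters such as `@` or `/`, they need to be percent-encoded before being placed in the URL. The following sketch builds such a URL with the standard library. `build_storage_url` is a hypothetical helper, and the credentials are the same placeholders used above.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_storage_url(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    dialect: str = "mysql",
    driver: Optional[str] = None,
) -> str:
    """Assemble an RDB storage URL, percent-encoding the credentials."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    # quote_plus escapes characters that would otherwise break URL parsing.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


print(build_storage_url("username", "password", "127.0.0.1", 3306, "example"))
```

The resulting string can be passed as the `storage` argument of `create_study()` on every machine, so that all workers point at the same database.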

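Multi-node Optimization with GrpcStorageProxy

If you are running thousands of processing nodes, a single RDB server may not be able to handle the load. In that case, you can use `GrpcStorageProxy` to distribute the server load.

`GrpcStorageProxy` is a proxy storage layer that internally uses `RDBStorage` as its backend. It can efficiently handle high-throughput concurrent requests from multiple machines.

The following example shows how to use `GrpcStorageProxy`. Because `GrpcStorageProxy` is a proxy storage, you first need to run a gRPC server that uses `RDBStorage` as its backend.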

from optuna.storages import run_grpc_proxy_server
from optuna.storages import get_storage

storage = get_storage("mysql+pymysql://username:password@127.0.0.1:3306/example")
run_grpc_proxy_server(storage, host="localhost", port=13000)

Output

$ python3 grpc_proxy_server.py
[I 2025-06-03 13:57:38,328] Server started at localhost:13000
[I 2025-06-03 13:57:38,328] Listening...

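Then, on each machine, you can run the following code to connect to the gRPC proxy storage. When a worker runs on a different machine from the proxy server, replace `localhost` with the address of the server.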

import optuna

from optuna.storages import GrpcStorageProxy


def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


if __name__ == "__main__":
    storage = GrpcStorageProxy(host="localhost", port=13000)
    study = optuna.create_study(
        study_name="grpc_proxy_multinode",
        storage=storage,
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=50)
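
Workers can only connect once the proxy server is listening, so when many workers are launched automatically it can help to wait for the port first. The following is a minimal sketch using only the standard library. `wait_for_port` is a hypothetical helper, not part of Optuna, and it only checks that a TCP connection can be opened, not that the gRPC service itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A worker script could call `wait_for_port("localhost", 13000)` before constructing `GrpcStorageProxy` and exit with an error message if it returns `False`.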
