pgcopydb is a tool that automates running pg_dump | pg_restore between two
running Postgres servers. To copy a database to another server as quickly as
possible, one would like to use the parallel options of pg_dump and still be
able to stream the data to as many concurrent pg_restore jobs as possible.
The idea would be to use something like the following command line:

```bash
$ pg_dump --jobs=N --format=directory postgres://user@source/dbname \
  | pg_restore --jobs=N --format=directory -d postgres://user@target/dbname
```

This command line can't be made to work, unfortunately, because pg_dump
--format=directory writes to local files and directories first, and only
later can pg_restore --format=directory be used to read from those files
again.
When using pgcopydb it is possible to achieve the result outlined before
with this simple command line:

```bash
$ export PGCOPYDB_SOURCE_PGURI="postgres://***/dbname"
$ export PGCOPYDB_TARGET_PGURI="postgres://***/dbname"
$ pgcopydb copy-db --table-jobs 8 --index-jobs 2
```
Then pgcopydb implements the following steps:
1. pgcopydb produces both the pre-data and the post-data sections of the
   dump using the Postgres custom format.
2. The pre-data section of the dump is restored on the target database,
   creating all the Postgres objects from the source database in the
   target database.
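Steps 1 and 2 amount to roughly the following manual commands. This is only
a sketch: the file names are illustrative, and pgcopydb manages its own dump
files internally rather than using fixed paths like these:

```shell
# dump the pre-data and post-data sections separately, in custom format
$ pg_dump --format=custom --section=pre-data \
          --file=pre-data.dump "$PGCOPYDB_SOURCE_PGURI"
$ pg_dump --format=custom --section=post-data \
          --file=post-data.dump "$PGCOPYDB_SOURCE_PGURI"

# restore only the pre-data section on the target, creating the schema
$ pg_restore --dbname="$PGCOPYDB_TARGET_PGURI" pre-data.dump
```

The post-data.dump file is kept for the final stage, once all the table data
has been copied.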
3. pgcopydb gets the list of ordinary and partitioned tables, and for each
   of them runs COPY for the data from the source to the target in a
   dedicated sub-process, starting and controlling the sub-processes until
   all the data has been copied over.

   The Postgres catalog table pg_class is used to get the list of tables
   with data to copy around, and its reltuples estimate is used to start
   with the tables with the greatest number of rows first, in an attempt
   to minimize the total copy time.
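A query along the following lines could produce such a list. This is a
hypothetical sketch of the catalog lookup, not pgcopydb's actual query;
relkind 'r' covers ordinary tables and 'p' covers partitioned ones:

```shell
# list user tables ordered by estimated row count, largest first
$ psql "$PGCOPYDB_SOURCE_PGURI" -c "
    select c.oid::regclass as table_name, c.reltuples
      from pg_class c
      join pg_namespace n on n.oid = c.relnamespace
     where c.relkind in ('r', 'p')
       and n.nspname not in ('pg_catalog', 'information_schema')
  order by c.reltuples desc;
"
```

Note that reltuples is only a planner estimate maintained by VACUUM and
ANALYZE, which is good enough for scheduling purposes.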
4. In each copy-table sub-process, as soon as the data copying is done,
   pgcopydb gets the list of index definitions attached to the current
   target table and creates them in parallel.

   The primary key indexes are created as UNIQUE indexes at this stage;
   the PRIMARY KEY constraints are then created USING the freshly built
   indexes. This two-step approach allows the primary key index itself to
   be built in parallel with the other indexes on the same table, avoiding
   an EXCLUSIVE LOCK while creating the index.
5. VACUUM ANALYZE is run on each target table as soon as its data and
   indexes have all been created.
6. The final stage consists of running the rest of the post-data section
   script for the whole database, which is where the foreign key
   constraints and other remaining objects are created.

   The post-data script is filtered using the pg_restore --use-list
   option, so that the indexes and primary key constraints already created
   in step 4 are properly skipped.

   To implement this, the per-table sub-processes share the dump IDs of
   the post-data items they have created with the main process, which then
   filters the pg_restore --list output and comments out the
   already-created objects, by dump ID.
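The filtering itself can be sketched in shell. In a pg_restore list file,
lines starting with a semicolon are comments. Assuming the main process has
collected the dump IDs of the already-created objects (the IDs and TOC lines
below are made up for illustration), the list can be edited before being
passed back via --use-list:

```shell
# toc.list would normally come from: pg_restore --list post-data.dump > toc.list
# the dump IDs and entries below are made up for illustration
cat > toc.list <<'EOF'
2315; 1259 16396 TABLE public actor postgres
2317; 1259 16402 INDEX public actor_pkey postgres
2318; 2606 16410 CONSTRAINT public actor actor_pkey postgres
EOF

# comment out the entries whose dump ID has already been processed
for id in 2317 2318; do
    sed -i -e "s/^${id};/;${id};/" toc.list
done

cat toc.list
# the edited list can then be used with:
#   pg_restore --use-list toc.list --dbname "$PGCOPYDB_TARGET_PGURI" post-data.dump
```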
Here is a docker image definition that uses and extends the pgcopydb image:
```dockerfile
FROM pgcopydb

USER root

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
     ca-certificates \
     curl \
     git \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/
RUN git clone --depth 1 [***]

WORKDIR /usr/src/pgcopydb
COPY ./copydb.sh copydb.sh

USER docker
CMD /usr/src/pgcopydb/copydb.sh
```
Here is a docker-compose file as found in the tests/pagila directory of the project:
```yaml
version: "3.9"  # optional since v1.27.0
services:
  source:
    image: postgres:13-bullseye
    expose:
      - 5432
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: h4ckm3
      POSTGRES_HOST_AUTH_METHOD: trust

  target:
    image: postgres:13-bullseye
    expose:
      - 5432
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: h4ckm3
      POSTGRES_HOST_AUTH_METHOD: trust

  pgcopydb:
    build: .
    environment:
      PGCOPYDB_SOURCE_PGURI: postgres://postgres:h4ckm3@source/postgres
      PGCOPYDB_TARGET_PGURI: postgres://postgres:h4ckm3@target/postgres
      PGCOPYDB_TARGET_TABLE_JOBS: 4
      PGCOPYDB_TARGET_INDEX_JOBS: 2
```
Copyright (c) The PostgreSQL Global Development Group.
This project is licensed under the PostgreSQL License, see LICENSE file for details.
This project includes bundled third-party dependencies, see NOTICE file for details.