
Create a dump

hello 👋🏼

I assume you have installed Replibyte and read the concepts.

Configuration

To use Replibyte, you need a dump of your production database. There are three options:

  • Option 1 (easiest): You don't have a dump, and you want Replibyte to make one.
  • Option 2: You don't have a dump, and you want to make one manually.
  • Option 3: You already have a dump of your database.

Option 1: Make a dump with Replibyte

To let Replibyte create a dump of your database for you, add the connection URI of your production database to the source section of your conf.yaml file.

Pick the example that fits the database you are using.

PostgreSQL
source:
  connection_uri: postgres://[user]:[password]@[host]:[port]/[database]
MySQL
source:
  connection_uri: mysql://[user]:[password]@[host]:[port]/[database]
MongoDB
source:
  connection_uri: mongodb://[user]:[password]@[host]:[port]/[database]

You can also use an environment variable:

With an environment variable
source:
  connection_uri: $DATABASE_URL
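
As a minimal sketch of the environment-variable approach, the snippet below assembles a PostgreSQL connection URI in the form shown above and exports it as DATABASE_URL. The DB_* variable names and their values are hypothetical placeholders, not something Replibyte defines; a password containing characters such as @ or / would presumably need percent-encoding first.

```shell
# Hypothetical credentials for illustration; replace with your own values.
DB_USER="app"
DB_PASSWORD="secret"
DB_HOST="localhost"
DB_PORT="5432"
DB_NAME="mydb"

# Assemble the URI in the postgres://[user]:[password]@[host]:[port]/[database]
# form and export it so child processes can read $DATABASE_URL.
export DATABASE_URL="postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$DATABASE_URL"
```

Exporting the variable in the same shell session that later runs Replibyte keeps the credentials out of conf.yaml itself.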

Option 2: Make a dump manually

Here are the commands to dump your database yourself:

PostgreSQL
pg_dump --column-inserts --no-owner -h [host] -p [port] -U [username] [database]
MySQL
mysqldump -h [host] -P [port] -u [username] -p --add-drop-database --add-drop-table --skip-extended-insert --complete-insert --single-transaction --quick --databases [database]
MongoDB
mongodump -h [host] --port [port] --authenticationDatabase [auth_db|default: admin] --db [database] -u [username] -p [password] --archive

Option 3: You already have a dump

There is nothing to do. However, if your dump was created without some of the options listed in Option 2, you will need to create a new one by following Option 2.
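
One way to check an existing PostgreSQL dump is sketched below. It assumes the option that matters is --column-inserts from Option 2, which makes pg_dump write INSERT statements with an explicit column list instead of COPY blocks. The has_column_inserts helper is a hypothetical name, and the check is only a heuristic (for example, it does not handle quoted table names that contain spaces).

```shell
# Heuristic: succeed if the file contains at least one INSERT statement
# with an explicit column list, as pg_dump --column-inserts produces.
has_column_inserts() {
  grep -Eq '^INSERT INTO [^ ]+ \(' "$1"
}

# Usage sketch:
# has_column_inserts your_dump.sql || echo "re-create the dump with Option 2" >&2
```

If the check fails, re-creating the dump with the Option 2 command is probably the simplest fix.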

Hide sensitive data with Transformers

note

A MongoDB collection corresponds to a SQL table, and a document to a table row.

By using Transformers, you can change your database data on the fly. Let's say we have the following structure for a table named customers:

CREATE TABLE public.customers (
    id bpchar NOT NULL,
    first_name character varying(30) NOT NULL,
    last_name character varying(30) NOT NULL,
    contact_email character varying(2048) NOT NULL,
    contact_phone character varying(24)
);

with the following entries:

INSERT INTO public.customers (id, first_name, last_name, contact_email, contact_phone) VALUES ('ALFKI', 'Maria', 'Anders', '[email protected]', '030-0074321');
INSERT INTO public.customers (id, first_name, last_name, contact_email, contact_phone) VALUES ('ANATR', 'Ana', 'Trujillo', '[email protected]', '(5) 555-4729');
INSERT INTO public.customers (id, first_name, last_name, contact_email, contact_phone) VALUES ('ANTON', 'Antonio', 'Moreno', '[email protected]', NULL);

Suppose you want to hide the first_name, last_name, contact_email and contact_phone fields. You can use the following configuration in your conf.yaml file:

source and transformers in your conf.yaml
source:
  connection_uri: postgres://user:password@host:port/db
  transformers:
    - database: public
      table: customers
      columns:
        - name: first_name
          transformer_name: first-name
        - name: last_name
          transformer_name: random
        - name: contact_phone
          transformer_name: phone-number
        - name: contact_email
          transformer_name: email

By using Transformers, you protect your sensitive data from being leaked.

Run

It's the big day! Let's run and upload our transformed dump. But wait, something is missing. If you read about the concepts and how Replibyte works, you know that a Datastore is required to upload the transformed dump. Here are the lines you need to add to your conf.yaml:

Add your datastore in your conf.yaml
datastore:
  aws:
    bucket: my-replibyte-dumps
    region: us-east-2
    credentials:
      access_key_id: $ACCESS_KEY_ID
      secret_access_key: $AWS_SECRET_ACCESS_KEY
      session_token: XXX # optional

Here, the datastore is an S3 bucket where the dump will be stored and made available for a future restore (next guide).

The final conf.yaml to create a final transformed dump looks like this:

caution

Do not forget to change your bucket name!

conf.yaml
source:
  connection_uri: postgres://user:password@host:port/db # optional - use only for option #1
  transformers:
    - database: public
      table: customers
      columns:
        - name: first_name
          transformer_name: first-name
        - name: last_name
          transformer_name: random
        - name: contact_phone
          transformer_name: phone-number
        - name: contact_email
          transformer_name: email
datastore:
  aws:
    bucket: my-replibyte-dumps
    region: us-east-2
    credentials:
      access_key_id: $ACCESS_KEY_ID
      secret_access_key: $AWS_SECRET_ACCESS_KEY
      session_token: XXX # optional

Finally, run the command that matches the option you chose above:

Option 1: Make a dump with Replibyte
replibyte -c conf.yaml dump create
Option 2 and 3: Create a transformed dump from a dump file
cat your_dump.sql | replibyte -c conf.yaml dump create -i -s postgresql

The -i parameter is required to read the dump from standard input.

The -s parameter is required if you don't have a source.connection_uri in the configuration file. Valid values are postgresql, postgres and mysql.
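
Because the conf.yaml above refers to environment variables, it may help to confirm they are set before running. The following is a minimal sketch; check_env is a hypothetical helper, not part of Replibyte, and the variable names are the ones used in this guide's examples.

```shell
# Return non-zero and report each named variable that is unset or empty.
check_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch: only run Replibyte when everything conf.yaml references is set.
# check_env DATABASE_URL ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#   && replibyte -c conf.yaml dump create
```

The helper only checks that each variable is non-empty; it does not validate the credentials themselves.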


Now, it's time to look at how to restore your transformed dump ➡️