
Subset a dump

caution

Only PostgreSQL supports subsetting at the moment. Feel free to contribute to speed up support for MySQL and MongoDB.

Subsetting is a powerful feature that lets you import only a smaller, consistent part of your production database.

How Subsetting works

Check out how subsetting works under the hood here.

Configuration

Using the subsetting feature is as simple as adding a few parameters to your conf.yaml:

```yaml
source:
  connection_uri: postgres://user:password@host:port/db
  transformers:
    - database: public
      table: customers
      columns:
        - name: first_name
          transformer_name: first-name
        - name: last_name
          transformer_name: random
        - name: contact_phone
          transformer_name: phone-number
        - name: contact_email
          transformer_name: email
  # add the database_subset object
  database_subset:
    database: public
    table: customers
    strategy_name: random
    strategy_options:
      percent: 10
    passthrough_tables:
      - product_catalog
```

By applying this configuration, Replibyte will:

  • Keep around 10% of the full database, sampled randomly starting from public.customers
  • Follow every table linked to public.customers and keep the related rows, so the subset stays consistent
  • Keep all rows from product_catalog
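The behavior listed above can be approximated with a small graph traversal. The sketch below is a hypothetical, in-memory model written for illustration only (plain Python dicts, not Replibyte's implementation or API): it samples `percent` % of the root table's rows at random, walks foreign keys down to the rows that reference them, pulls in referenced parent rows so every kept foreign key still resolves, and copies passthrough tables whole. The `subset` function, the shape of `tables` and `foreign_keys`, and the sample data are all assumptions.

```python
import random
from collections import deque


def subset(tables, foreign_keys, root, percent, passthrough=(), seed=None):
    """Hypothetical subsetting sketch (illustrative model, not Replibyte's code).

    tables:       {table_name: [row_dict, ...]}; every row has a unique "id"
    foreign_keys: [(child_table, child_column, parent_table), ...] meaning
                  child_table.child_column references parent_table.id
    """
    rng = random.Random(seed)
    by_id = {name: {row["id"]: row for row in rows} for name, rows in tables.items()}
    kept = {name: set() for name in tables}  # row ids that end up in the subset
    visited = set()                          # (table, row_id, descend) already queued
    queue = deque()

    def keep(table, row_id, descend):
        # Mark the row as kept and schedule it once per traversal mode.
        kept[table].add(row_id)
        if (table, row_id, descend) not in visited:
            visited.add((table, row_id, descend))
            queue.append((table, row_id, descend))

    # 1. "random" strategy: sample `percent` % of the root table's rows.
    root_rows = tables[root]
    count = round(len(root_rows) * percent / 100)
    for row in rng.sample(root_rows, count):
        keep(root, row["id"], descend=True)

    # 2. Walk the foreign-key graph from the sampled rows.
    while queue:
        table, row_id, descend = queue.popleft()
        row = by_id[table][row_id]
        for child_table, column, parent_table in foreign_keys:
            if descend and parent_table == table:
                # Go down: keep every row that references this one (linear scan).
                for child in tables[child_table]:
                    if child[column] == row_id:
                        keep(child_table, child["id"], descend=True)
            if child_table == table and row[column] is not None:
                # Go up: keep the referenced parent so the foreign key stays valid,
                # without descending into that parent's other children.
                keep(parent_table, row[column], descend=False)

    # 3. Passthrough tables are copied whole; other tables keep only visited rows.
    return {
        name: list(rows) if name in passthrough
        else [row for row in rows if row["id"] in kept[name]]
        for name, rows in tables.items()
    }


# Usage with made-up data: 10 customers, 2 orders each, 3 catalog products.
customers = [{"id": c} for c in range(1, 11)]
orders = [{"id": o, "customer_id": (o + 1) // 2, "product_id": o % 3 + 1}
          for o in range(1, 21)]
products = [{"id": p} for p in range(1, 4)]
tables = {"customers": customers, "orders": orders, "product_catalog": products}
fks = [("orders", "customer_id", "customers"),
       ("orders", "product_id", "product_catalog")]

result = subset(tables, fks, root="customers", percent=10,
                passthrough={"product_catalog"}, seed=42)
print({name: len(rows) for name, rows in result.items()})
# → {'customers': 1, 'orders': 2, 'product_catalog': 3}
```

Pulling in parent rows without descending from them is a design choice of this sketch to keep foreign keys valid without re-expanding the subset; refer to the under-the-hood page mentioned earlier for how Replibyte actually traverses the schema.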

Subset Strategy

TODO

Considerations

This feature is still being actively improved. Feel free to open an issue if you run into any trouble.