Create dm…

from database: dm_from_con()

diagram showing the transformation from left a database icon to right a dm object represented as three tables in a blue frame
con <- DBI::dbConnect(...)
dm_from_con(con)

from data frames: dm(df1, df2, ...)

diagram showing the transformation from left three data frames to right a dm object represented as three tables in a blue frame
dm(df1, df2, df3)

from dm: dm(dm1, df1...)

diagram showing the transformation from left a dm with a data frame below to right a single dm object represented as three tables in a blue frame
dm(dm1, df1)
diagram showing the transformation from left two dm objects to right a single dm object represented as three tables in a blue frame
dm(dm1, dm2)

Add keys: dm_add_pk(), dm_add_fk()

Automatic for MariaDB, SQL Server, Postgres, and others.

Primary keys

Identify potential primary keys:

  • dm_enum_pk_candidates(): columns, candidate, why.
left a single dm object represented as three tables in a blue frame, right the same but with markers for primary keys

Add primary keys:

dm1 |>
  dm_add_pk(table, columns)

Identify potential foreign keys:

  • dm_enum_fk_candidates(): columns, candidate, why.
left a single dm object represented as three tables in a blue frame, right the same but with an arrow to indicate a foreign key relationship

Add foreign keys:

dm1 |>
  dm_add_fk(table, column)

dm objects: relational data models

The dm package provides a grammar of relational data models. It helps maintain referential integrity.

A dm behaves like a list of tables (data frames or lazy tables) capturing relationships between the tables.


Resize dm

Select tables: dm_select_tbl(dm1, ...)

diagram showing two dm objects, the one on the right has one table less
dm1 |>
  dm_select_tbl(-df3)

Rename tables: dm_rename_tbl(dm1, ...)

Select columns: dm_select(dm1, table, ...)

Automatic update of dm meta-information and table relations.

diagram showing two dm objects, the third table of the one on the right has two colmns less
dm1 |>
  dm_select(df3, -c3, -c4)

Rename columns: dm_rename(dm1, table, ...)

Filter rows: dm_filter(dm1, table = (pred))

Filter rows in the table where the condition is defined, but also all directly/indirectly connected tables.

diagram showing two dm objects, the one on the right has tables with less rows
dm1 |>
  dm_filter(df3 = (x == "val"))

Visualize dm: dm_draw()

Control diagram level of detail: display…

  • Only keys (default): dm_draw(view_type = "keys_only").
  • All variables: dm_draw(view_type = "all").
  • Only table names: dm_draw(view_type = "title_only").
Diagram with linked rectangles (tables). Only table names.
dm |>
  dm_draw(
    view_type = "title_only",
    rankdir = "TB"
  )

Control diagram scope

To visualize fewer tables first use dm_select_tbl().

Control diagram colors: dm_set_colors()

Diagram with linked rectangles (tables). The flight table is pink, the airlines and airpots tables whose name start with 'air', are orange.
dm |>
  dm_set_colors(
    pink = flights,
    orange = starts_with("air")
  ) |>
  dm_draw()

Data checks

dm_examine_constraints()

tibble with information about which key constraints are met or violated.

dm_examine_cardinalities()

tibble with information about the cardinality of the foreign keys constraints.

check_key(df, col1)

returns an error if not an unique key.

check_subset(df1, df2)

returns an error if df1 is not a subset of df2.

check_set_equality(df1, df2)

returns an error if df1 and df2 are not the same sets.


Fix column names: dm_disambiguate_cols()

dm_disambiguate_cols(dm1) ensures that all columns in a dm have unique names.

Transform dm into tibble

Wide tibble: Cascade joins with dm_flatten_to_tbl()

Only direct neighbours: dm_flatten_to_tbl()

On the left a dm object where table A is linked to table B and table C. Table D is linked to table B. On the right a table corresponding to a join of table A, B and C. Not D as it is not a direct neighboor
dm1 |>
  dm_flatten_to_tbl(
.start = df1
)

All neighbours: dm_flatten_to_tbl(.recursive = TRUE)

On the left a dm object where table A is linked to table B and table C. Table D is linked to table B. On the right a table corresponding to a join of table A, B, C and D as they can all be reached from table A albeit indirectly.
dm1 |>
  dm_flatten_to_tbl(
.start = df1,
.recursive = TRUE
)

Single tibble dm: dm_wrap_tbl()

Parent tables are packed — dm_pack_tbl().

Child tables are nested — dm_nest_tbl().

On the left a dm object (tables with arrows in a frame) where a green table is linked to a parent blue table and a child purple table. On the right another dm object with a single table in a frame. The blue parent table became a packed column in the green table (a column that contains a data frame) and the purple child table became a nested table (a table in each cell).
dm1 |>
  dm_wrap_tbl(
root = green_df
)

Retrieve one table of the dm: pull_tbl()

On the left a dm object with three tables, on the right only a table. The primary key information has been kept which is represented by the first column of that table having kept its dark background.
dm1 |>
  pull_tbl(
dm1,
green_df,
keyed = TRUE
)
On the left a dm object with three tables. There is a magnifying glass on the green table. On the right only the green table.

If the dm is zoomed, retrieve zoomed table automatically.

dm1 |>
  dm_zoom_to(green_df) |>
pull_tbl()

Mutate, create, analyze tables

Method 1: deconstruct and reconstruct

  1. dm_get_tables(keyed = TRUE):
    to keep information on primary and foreign keys).
diagram showing the transformation from left a dm object to right a named list of the same table, instead of a solid border frame on the right the frame has a dotted border
dm_tbl <- dm1 |>
  dm_get_tables(keyed = TRUE)
  1. tidyverse pipeline on the table of interest.
on the left a table, on the right the same table with one more column
new_table1 <- dm_tbl$table1 |>
  mutate(...)
  1. Optional: update the dm object:
on the left a dm of three tables, on the right the same dm but one of the tables has one column more
dm1 |>
  dm_select_tbl(-table1) |>
  dm(table1 = new_table1)

Method 2: zoom

  1. dm_zoom_to(): Zoom on a table.
On the left a dm object with three tables. On the right the same object but there is now a magnifying glass on the green table.
zoomed_dm1 <- dm1 |>
  dm_zoom_to(green_df)
  1. tidyverse pipeline (mutate(), etc.).
On the left a dm object with three tables including a green table with a magnifying glass on it. On the right the same object but there is now a second green table superimposed, with one more column compared to the green table in the dm.
zoomed_dm2 <- zoomed_dm1 |>
  mutate(var = thing)
  1. dm_update_zoomed() (replace) / dm_insert_zoomed()
On the left a dm object featuring a second green table superimposed, with one more column compared to the green table in the dm. On the right the magnifying glass and second green table disappeared. Instead the green table in the dm has the new column.
dm3 <- zoomed_dm2 |>
  dm_update_zoomed()

Modify database source of a dm

Export dm object to database: copy_dm_to()

Need a database connection — DBI::dbConnect().

On the left a dm object, with an arrow pointing to on the right a database icon
con <- DBI::dbConnect(...)
  # Persistent tables:
persistent_dm <- copy_dm_to(
  con,
  dm1,
  temporary = FALSE
)

DBI::dbDisconnect(con)

Insert, update or remove rows in a dm

Methods:

  • dm_rows_insert(dm1, dm2): adds new rows
  • dm_rows_update(dm1, dm2): changes values in rows
  • dm_rows_patch(dm1, dm2): fills in missing values
  • dm_rows_upsert(dm1, dm2): adds new or changes rows
  • dm_rows_delete(dm1, dm2): deletes rows
dm1 |>
  dm_rows_insert(dm2, in_place = FALSE)
Equal sign then a dm object where the blue table has now more rows.On the left a dm object with 3 tables. On the right, a similar dm where one of the tables, the blue one, has more rows. Between the two is a plus sign
dm1 |>
  dm_rows_insert(dm2, in_place = TRUE)
Database icon.On the left a dm object with 3 tables, and a database icon superimposed. On the right, a similar dm where one of the tables, the blue one, has more rows. Between the two is a plus sign

A dm is immutable, except with

  • these functions AND
  • a mutable backend (database) AND
  • in_place = TRUE.