Skip to contents

For printing

If you have more than one table, use dm.

Create dm…

from database: dm_from_con()

con <- DBI::dbConnect(...)
dm_from_con(con)
diagram showing the transformation from left a database icon to right a dm object represented as three tables in a blue frame

from data frames: dm(df1, df2, ...)

dm(df1, df2, df3)
diagram showing the transformation from left three data frames to right a dm object represented as three tables in a blue frame

from dm: dm(dm1, df1...)

dm(dm1, df1)
diagram showing the transformation from left a dm with a data frame below to right a single dm object represented as three tables in a blue frame
dm(dm1, dm2)
diagram showing the transformation from left two dm objects to right a single dm object represented as three tables in a blue frame

Add keys: dm_add_pk(), dm_add_fk()

Automatic for MariaDB, SQL Server, Postgres, and others.

Primary keys

Identify potential primary keys:

Add primary keys:

dm1 |>
  dm_add_pk(table, columns)
left a single dm object represented as three tables in a blue frame, right the same but with markers for primary keys

Identify potential foreign keys:

Add foreign keys:

dm1 |>
  dm_add_fk(table, column)
left a single dm object represented as three tables in a blue frame, right the same but with an arrow to indicate a foreign key relationship

dm objects: relational data models

The dm package provides a grammar of relational data models. It helps maintain referential integrity.

A dm behaves like a list of tables (data frames or lazy tables) capturing relationships between the tables.


Resize dm

Select tables: dm_select_tbl(dm1, ...)

dm1 |>
  dm_select_tbl(-df3)
diagram showing two dm objects, the one on the right has one table less

Rename tables: dm_rename_tbl(dm1, ...)

Select columns: dm_select(dm1, table, ...)

Automatic update of dm meta-information and table relations.

dm1 |>
  dm_select(df3, -c3, -c4)
diagram showing two dm objects, the third table of the one on the right has two colmns less

Rename columns: dm_rename(dm1, table, ...)

Filter rows: dm_filter(dm1, table = (pred))

Filter rows in the table where the condition is defined, but also all directly/indirectly connected tables.

dm1 |>
  dm_filter(df3 = (x == "val"))
diagram showing two dm objects, the one on the right has tables with less rows

Visualize dm: dm_draw()

Control diagram level of detail: display…

  • Only keys (default): dm_draw(view_type = "keys_only").
  • All variables: dm_draw(view_type = "all").
  • Only table names: dm_draw(view_type = "title_only").
dm |>
  dm_draw(
    view_type = "title_only",
    rankdir = "TB"
  )
Diagram with linked rectangles (tables). Only table names. %0 airlines airlines airports airports flights flights flights:carrier->airlines:carrier flights:origin->airports:faa planes planes flights:tailnum->planes:tailnum weather weather flights:origin, time_hour->weather:origin, time_hour

Control diagram scope

To visualize fewer tables first use dm_select_tbl().

Control diagram colors: dm_set_colors()

dm |>
  dm_set_colors(
    pink = flights,
    orange = starts_with("air")
  ) |>
  dm_draw()
Diagram with linked rectangles (tables). The flight table is pink, the airlines and airpots tables whose name start with ‘air’, are orange. %0 airlines airlinescarrier airports airportsfaa flights flightscarriertailnumoriginorigin, time_hour flights:carrier->airlines:carrier flights:origin->airports:faa planes planestailnum flights:tailnum->planes:tailnum weather weatherorigin, time_hour flights:origin, time_hour->weather:origin, time_hour

Data checks

dm_examine_constraints()

tibble with information about which key constraints are met or violated.

dm_examine_cardinalities()

tibble with information about the cardinality of the foreign keys constraints.

check_key(df, col1)

returns an error if not an unique key.

check_subset(df1, df2)

returns an error if df1 is not a subset of df2.

check_set_equality(df1, df2)

returns an error if df1 and df2 are not the same sets.


Fix column names: dm_disambiguate_cols()

dm_disambiguate_cols(dm1) ensures that all columns in a dm have unique names.

Transform dm into tibble

Wide tibble: Cascade joins with dm_flatten_to_tbl()

Only direct neighbours: dm_flatten_to_tbl()

dm1 |>
  dm_flatten_to_tbl(
.start = df1
)
On the left a dm object where table A is linked to table B and table C. Table D is linked to table B. On the right a table corresponding to a join of table A, B and C. Not D as it is not a direct neighboor

All neighbours: dm_flatten_to_tbl(.recursive = TRUE)

dm1 |>
  dm_flatten_to_tbl(
.start = df1,
.recursive = TRUE
)
On the left a dm object where table A is linked to table B and table C. Table D is linked to table B. On the right a table corresponding to a join of table A, B, C and D as they can all be reached from table A albeit indirectly.

Single tibble dm: dm_wrap_tbl()

Parent tables are packed — dm_pack_tbl().

Child tables are nested — dm_nest_tbl().

dm1 |>
  dm_wrap_tbl(
root = green_df
)
On the left a dm object (tables with arrows in a frame) where a green table is linked to a parent blue table and a child purple table. On the right another dm object with a single table in a frame. The blue parent table became a packed column in the green table (a column that contains a data frame) and the purple child table became a nested table (a table in each cell).

Retrieve one table of the dm: pull_tbl()

dm1 |>
  pull_tbl(
dm1,
green_df,
keyed = TRUE
)
On the left a dm object with three tables, on the right only a table. The primary key information has been kept which is represented by the first column of that table having kept its dark background.

If the dm is zoomed, retrieve zoomed table automatically.

dm1 |>
  dm_zoom_to(green_df) |>
pull_tbl()
On the left a dm object with three tables. There is a magnifying glass on the green table. On the right only the green table.

Mutate, create, analyze tables

Method 1: deconstruct and reconstruct

  1. dm_get_tables(keyed = TRUE):
    to keep information on primary and foreign keys).
dm_tbl <- dm1 |>
  dm_get_tables(keyed = TRUE)
diagram showing the transformation from left a dm object to right a named list of the same table, instead of a solid border frame on the right the frame has a dotted border
  1. tidyverse pipeline on the table of interest.
new_table1 <- dm_tbl$table1 |>
  mutate(...)
on the left a table, on the right the same table with one more column
  1. Optional: update the dm object:
dm1 |>
  dm_select_tbl(-table1) |>
  dm(table1 = new_table1)
on the left a dm of three tables, on the right the same dm but one of the tables has one column more

Method 2: zoom

  1. dm_zoom_to(): Zoom on a table.
zoomed_dm1 <- dm1 |>
  dm_zoom_to(green_df)
On the left a dm object with three tables. On the right the same object but there is now a magnifying glass on the green table.
  1. tidyverse pipeline (mutate(), etc.).
zoomed_dm2 <- zoomed_dm1 |>
  mutate(var = thing)
On the left a dm object with three tables including a green table with a magnifying glass on it. On the right the same object but there is now a second green table superimposed, with one more column compared to the green table in the dm.
  1. dm_update_zoomed() (replace) / dm_insert_zoomed()
dm3 <- zoomed_dm2 |>
  dm_update_zoomed()
On the left a dm object featuring a second green table superimposed, with one more column compared to the green table in the dm. On the right the magnifying glass and second green table disappeared. Instead the green table in the dm has the new column.

Modify database source of a dm

Export dm object to database: copy_dm_to()

Need a database connection — DBI::dbConnect().

con <- DBI::dbConnect(...)
  # Persistent tables:
persistent_dm <- copy_dm_to(
  con,
  dm1,
  temporary = FALSE
)

DBI::dbDisconnect(con)
On the left a dm object, with an arrow pointing to on the right a database icon

Insert, update or remove rows in a dm

Methods:

  • dm_rows_insert(dm1, dm2): adds new rows
  • dm_rows_update(dm1, dm2): changes values in rows
  • dm_rows_patch(dm1, dm2): fills in missing values
  • dm_rows_upsert(dm1, dm2): adds new or changes rows
  • dm_rows_delete(dm1, dm2): deletes rows
dm1 |>
  dm_rows_insert(dm2, in_place = FALSE)
On the left a dm object with 3 tables. On the right, a similar dm where one of the tables, the blue one, has more rows. Between the two is a plus signEqual sign then a dm object where the blue table has now more rows.
dm1 |>
  dm_rows_insert(dm2, in_place = TRUE)
On the left a dm object with 3 tables, and a database icon superimposed. On the right, a similar dm where one of the tables, the blue one, has more rows. Between the two is a plus signDatabase icon.

A dm is immutable, except with

  • these functions AND
  • a mutable backend (database) AND
  • in_place = TRUE.