Insert, update, or remove rows in a database
James Wondrasek
2024-11-24
Source:vignettes/howto-dm-rows.Rmd
howto-dm-rows.Rmd
This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods:
-
dm_rows_insert()
- adds new unique rows -
dm_rows_append()
- adds new rows unconditionally -
dm_rows_update()
- changes values in rows -
dm_rows_patch()
- fills in missing values -
dm_rows_upsert()
- adds new rows or changes values if pre-existing -
dm_rows_delete()
- deletes rows
The dm_rows_* process
All six methods take the same arguments and using them follows the same process:
- Create a temporary changeset dm object that defines the intended changes on the RDBMS
- If desired, simulate changes with
in_place = FALSE
to double-check - Apply changes with
in_place = TRUE
.
To start, a dm
object is created containing the tables
and rows that you want to change. This changeset dm
is then
copied into the same source as the dm you want to modify. With the dm in
the same RDBMS as the destination dm, you call the appropriate method,
such as dm_rows_insert()
, to make your planned changes,
along with an argument of in_place = FALSE
so you can
confirm you achieve the changes that you want.
This verification can be done visually, looking at row counts and the
like, or using {dm}’s constraint checking method,
dm_examine_constraints()
. The biggest danger is damaging
key relations between data spread across multiple tables by deleting or
duplicating rows and their keys. dm_examine_constraints()
will catch errors where primary keys are duplicated or foreign keys do
not have a matching primary key (unless the foreign key value is
NA
).
With the changes confirmed, you execute the method again, this time
with the argument in_place = TRUE
to make the changes
permanent. Note that in_place = FALSE
is the default: you
must opt in to actually change data on the database.
Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values that uniquely identify rows.
Method | Requirements |
---|---|
dm_rows_insert() |
Records with existing primary keys are silently ignored (via
dplyr::rows_insert(conflict = "ignore") ). |
dm_rows_append() |
All records are inserted, the underlying database might check for uniqueness of primary keys (and fail the operation) if a constraint is set. |
dm_rows_update() |
Primary keys must match for all records to be updated. |
dm_rows_patch() |
Updates missing values in existing records. Primary keys must match for all records to be patched. |
dm_rows_upsert() |
Updates existing records and adds new records, based on the primary key. |
dm_rows_delete() |
Removes matching records based on the primary key. Primary keys must match for all records to be deleted. |
To ensure the integrity of all relations during the process, all
methods automatically determine the correct processing order for the
tables involved. For operations that create records, parent tables
(which hold primary keys) are processed before child tables (which hold
foreign keys). For dm_rows_delete()
, child tables are
processed before their parent tables. Note that the user is still
responsible for setting transactions to ensure integrity of operations
across multiple tables. For more details on this see
vignette("howto-dm-theory")
and
vignette("howto-dm-db")
.
Usage
To demonstrate the use of these table modifying methods, we will
create a simple dm
object with two tables linked by a
foreign key. Note that the child
table has a foreign key
missing (NA
).
#> # A tibble: 3 × 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> # A tibble: 3 × 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
demo_dm <-
dm(parent = parent, child = child) %>%
dm_add_pk(parent, pk) %>%
dm_add_pk(child, pk) %>%
dm_add_fk(child, fk, parent)
demo_dm %>%
dm_draw(view_type = "all")
{dm} doesn’t check your key values when you create a dm, we add this check:1
dm_examine_constraints(demo_dm)
Then we copy demo_dm
into an SQLite database. Note: the
default for the method used, copy_dm_to()
, is to create
temporary tables that will be automatically deleted when your session
ends. As demo_sql
will be the destination dm for the
examples, the argument temporary = FALSE
is used to make
this distinction apparent.
library(DBI)
sqlite_db <- DBI::dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
demo_sql
#> ── Table source ───────────────────────────────────────────────────────────
#> src: sqlite 3.47.0 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `parent`, `child`
#> Columns: 5
#> Primary keys: 2
#> Foreign keys: 1
{dm}’s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} – manually building a sequence of operations using temporary results until it is complete and correct, and then committing the result.
dm_rows_insert()
To demonstrate dm_rows_insert()
, we create a dm with
tables containing the rows to insert and copy it to
sqlite_db
, the same source as demo_sql
. For
all of the dm_rows_...()
methods, the source and
destination dm
objects must be in the same RDBMS. You will
get an error message if this is not the case.
The code below adds parent
and child
table
entries for the letter “D”. First, the changeset dm is created and
temporarily copied to the database:
new_parent <- tibble(value = "D", pk = 4)
new_parent
#> # A tibble: 1 × 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
new_child <- tibble(value = "d", pk = 4, fk = 4)
new_child
#> # A tibble: 1 × 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 d 4 4
dm_insert_in <-
dm(parent = new_parent, child = new_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
The changeset dm is then used as an argument to
dm_rows_insert()
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in)
#> Result is returned as a dm object with lazy tables. Use `in_place = FALSE`
#> to mute this message, or `in_place = TRUE` to write to the underlying
#> tables.
This gives us a warning that changes will not persist (i.e., they are
temporary). Inspecting the child
table of the resulting
dm_insert_out
and demo_sql
, we can see that’s
exactly what happened. {dm} returned to us a dm object with our inserted
rows in place, but the underlying database has not changed.
dm_insert_out$child
#> # Source: SQL [4 x 3]
#> # Database: sqlite 3.47.0 []
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
demo_sql$child
#> # Source: table<`child`> [3 x 3]
#> # Database: sqlite 3.47.0 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
We repeat the operation, this time with the argument
in_place = TRUE
and the changes now persist in
demo_sql
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [4 x 3]
#> # Database: sqlite 3.47.0 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_update()
dm_rows_update()
works the same as
dm_rows_insert()
. We create the dm object and copy it to
the same source as the destination. Here we will change the foreign key
for the row in child
containing “b” to point to the correct
row in parent
. And we will persist the changes.
updated_child <- tibble(value = "b", pk = 2, fk = 2)
updated_child
#> # A tibble: 1 × 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 2
dm_update_in <-
dm(child = updated_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
dm_update_out <-
demo_sql %>%
dm_rows_update(dm_update_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [4 x 3]
#> # Database: sqlite 3.47.0 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_delete()
dm_rows_delete()
is not currently implemented to work
with an RDBMS, so we will shift our demonstrations back to the local R
environment. We’ve made changes to demo_sql
, so we use
collect()
to copy the current tables out of SQLite. Note
that persistence is not a concern for local dm
objects. Every operation returns a new dm object containing the changes
made.
local_dm <- collect(demo_sql)
local_dm$parent
#> # A tibble: 4 × 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
local_dm$child
#> # A tibble: 4 × 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_deleted <-
dm(parent = new_parent, child = new_child) %>%
dm_rows_delete(local_dm, .)
#> Result is returned as a dm object with lazy tables. Use `in_place = FALSE` to mute this message, or `in_place = TRUE` to write to the underlying tables.
#> Ignoring extra `y` columns: value, fk
#> Ignoring extra `y` columns: value
dm_deleted$child
#> # A tibble: 3 × 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
dm_rows_patch()
dm_rows_patch()
updates missing values in existing
records. We use it here to fix the missing foreign key in the
child
table.
patched_child <- tibble(value = "c", pk = 3, fk = 3)
patched_child
#> # A tibble: 1 × 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 c 3 3
dm_patched <-
dm(child = patched_child) %>%
dm_rows_patch(dm_deleted, .)
#> Result is returned as a dm object with lazy tables. Use `in_place = FALSE`
#> to mute this message, or `in_place = TRUE` to write to the underlying
#> tables.
dm_patched$child
#> # A tibble: 3 × 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 3
dm_rows_upsert()
dm_rows_upsert()
updates rows with supplied values if
they exist or inserts the supplied values as new rows if they don’t. In
this example we add the letter “D” back to our dm, and update the
foreign key for “b”.
upserted_parent <- tibble(value = "D", pk = 4)
upserted_parent
#> # A tibble: 1 × 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
#> # A tibble: 2 × 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 3
#> 2 d 4 4
dm_upserted <-
dm(parent = upserted_parent, child = upserted_child) %>%
dm_rows_upsert(dm_patched, .)
#> Result is returned as a dm object with lazy tables. Use `in_place = FALSE`
#> to mute this message, or `in_place = TRUE` to write to the underlying
#> tables.
dm_upserted$parent
#> # A tibble: 4 × 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
dm_upserted$child
#> # A tibble: 4 × 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 3
#> 3 c 3 3
#> 4 d 4 4
When done, do not forget to disconnect:
DBI::dbDisconnect(sqlite_db)
Conclusion
The dm_rows_...()
methods give you row-level granularity
over the modifications you need to make to your relational model. Using
the common in_place
argument, they all can construct and
verify your modifications before committing them. There are a few
limitations, as mentioned in the tutorial, but these will be addressed
in future updates to {dm}.
Further Reading
If this tutorial answered some questions, but opened others, these resources might be of assistance.
Is your data in an RDBMS? vignette("howto-dm-db")
offers
a detailed look at working with an existing relational data model.
If your data is in data frames, then you may want to read
vignette("howto-dm-df")
next.
If you would like to know more about relational data models in order
to get the most out of dm, check out
vignette("howto-dm-theory")
.
If you’re familiar with relational data models but want to know how
to work with them in dm, then any of
vignette("tech-dm-join")
,
vignette("tech-dm-filter")
, or
vignette("tech-dm-zoom")
is a good next step.