In this article, we’ll show how to enable and use the SQL Server Change Data Capture feature.
Enabling Change Data Capture on a Database
Before changes to any individual tables within a database can be tracked, change data capture must be explicitly enabled for the database. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked. You can run following query and check whether it is enabled for any database.
USE master
GO
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases
GO
You can run the following system stored procedure in the context of each database to enable CDC at database level. (The following script will enable CDC in Allens database.)
GO
USE [Allens]
GO
EXEC sys.sp_cdc_enable_db
GO
Additionally, in the database Allens, you will see that a schema with the name ‘cdc’ has now been created.
Some System Tables will have been created within the Allens database as part of the cdc schema.
The tables which have been created are listed here.
- cdc.captured_columns – This table returns result for list of captured column.
- cdc.change_tables – This table returns list of all the tables which are enabled for capture.
- cdc.ddl_history – This table contains history of all the DDL changes since capture data enabled.
- cdc.index_columns – This table contains indexes associated with change table.
- cdc.lsn_time_mapping – This table maps LSN number and time.
- dbo.cdc_jobs – This table stores configuration parameters for capture and cleanup jobs and is the only system table created in the msdb database.
Enabling Change Data Capture on Database Table
The CDC feature can be applied at the table-level to any database for which CDC is enabled. It has to be enabled for any table which needs to be tracked. First run the following query to show which tables of database have already been enabled for CDC.
USE [Allens]
GO
SELECT [name], is_tracked_by_cdc
FROM sys.tables
GO
The above query will return a result that includes a column with the table name, along with a column which displays if CDC is enabled or not.
You can run the following stored procedure to enable each table. Before enabling CDC at the table level, make sure that you have enabled SQL Server Agent. When CDC is enabled on a table, it creates two CDC-related jobs that are specific to the database, and execute using SQL Server Agent. Without SQL Server Agent enabled, these jobs will not execute.
Following script will enable CDC on dbo.TAT_TIME table.
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo' --mandatory
, @source_name = 'TAT_TIME' --mandatory
, @role_name = 'cdc_test' --mandatory
, @supports_net_changes = 1
, @index_name = N'TAT_TIME6'
, @captured_column_list = N'TIME_UNO,SOURCE,ENTRY_STATUS,WIP_STATUS,TAXABLE,LAST_MODIFIED'
, @filegroup_name = N'PRIMARY';
GO
When the feature on the table is successfully enabled, the following messages are shown.
Job ‘cdc.Allens_capture’ started successfully.
Job ‘cdc. Allens_cleanup’ started successfully.
Let’s discuss about the parameters of the sys.sp_cdc_enable_table stored procedure.
By default, all columns in the table are tracked. If you want to track only the specific ones, use the @captured_column_list parameter.
The @filegroup_name parameter can be used to change the default location of the change tables. By default, the change table is located in the default file group of the database. Database owners who want to control the placement of individual change tables can use the @filegroup_name parameter to specify a particular file group for the change table associated with the capture instance. The named file group must already exist.
By default, all members of the sysadmin and db_owner roles have full access the captured records. To limit access to the captured change data, create a new role that provides necessary permissions on the captured information, and use the @role_name parameter to grant the permissions only to the role members
When the @role_name parameter is set to NULL, only members of sysadmin and db_owner roles have full access to captured information. When set to a specific role, only the members of the role (called a gating role) can access the changed data table.
The @supports_net_changes parameter enables to show multiple changes aggregated as a single one. This parameter can be used only on tables that have a primary key or unique index. For instance, if stock market prices are tracked per stock and the price column is updated several times a day, the net change would result in only one change row per stock with the final price in the data.
When the CDC feature is enabled on the table, a capture table and up to two query functions are automatically created. For the dbo.TAT_TIME table, these are the cdc.dbo_TAT_TIME_CT table, and cdc.fn_cdc_get_all_changes_dbo_TAT_TIME and cdc.fn_cdc_get_net_changes_dbo_TAT_TIME table valued functions. The latter one is created only when the @supports_net_changes parameter is set to 1. These functions are used to query change tables.
Change Table
By default, the name of the change data capture change table is of the source table. Its associated change table is named by appending _CT to the capture instance name.
The first five columns of a change data capture change table are metadata columns. These provide additional information that is relevant to the recorded change. The remaining columns mirror the identified captured columns from the source table in name and, typically, in type. These columns hold the captured column data that is gathered from the source table.
Each insert or delete operation that is applied to a source table appears as a single row within the change table. The data columns of the row that results from an insert operation contain the column values after the insert. The data columns of the row that results from a delete operation contain the column values before the delete. An update operation requires one row entry to identify the column values before the update, and a second row entry to identify the column values after the update.
Each row in a change table also contains additional metadata to allow interpretation of the change activity. The column _$start_lsn identifies the commit log sequence number (LSN) that was assigned to the change. The commit LSN both identifies changes that were committed within the same transaction, and orders those transactions. The column _$seqval can be used to order more changes that occur in the same transaction. The column __$operation records the DML operation that is associated with the change: 1 = delete, 2 = insert, 3 = update (before image), and 4 = update (after image). The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns. The column __$command_id tracks the order of operations within a transaction.
References
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-db-transact-sql
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql
https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-capture-instance-ct-transact-sql