How to Analyze and Read Change Data Capture (CDC) Records

In this article, we will analyze the records stored in change tables and describe the methods to read them.

The System Table Valued Functions

The change information in SQL Server Change Data Capture is available through table valued functions. To query our change table [cdc].[dbo_TAT_TIME_CT], use the [cdc].[fn_cdc_get_all_changes] and [cdc.fn_cdc_get_net_changes] table valued functions.

1

cdc.fn_cdc_get_all_changes_<capture_instance> – returns a row for each change in the source table that belongs to the Log Sequence Number in the range specified by the input parameters.

cdc.fn_cdc_get_all_changes_capture_instance(from_lsn, to_lsn, ‘<row_filter_option>’)

The <row_filter_option> parameter affects only the UPDATEs. It can have the following values:

All

Returns all changes within the specified LSN range. For changes due to an update operation, this option only returns the row containing the new values after the update is applied.

All Update Old

Returns all changes within the specified LSN range. For changes due to an update operation, this option returns both the row containing the column values before the update and the row containing the column values after the update.

cdc.fn_cdc_get_net_changes_<capture_instance>  returns one row that represents multiple changes on a single row aggregated as a single one.

For example, if a column was first updated from ’5 Corinthian Dr’ to ‘8 Donovan St,’ and then to ‘2 Woodward Rd’, the cdc.fn_cdc_get_all_changes function returns all 3 transactions. While the cdc.fn_cdc_get_net_changes function returns only the latest one.

cdc.fn_cdc_get_net_changes_capture_instance( from_lsn , to_lsn , ‘<row_filter_option>’ )

The <row_filter_option> parameter can have the following values:

All

Returns the LSN of the final change, the __$update_mask column is always NULL.

All With Mask

Returns the LSN of the final change, the __$update_mask column shows the IDs of the modified columns.

All With Merge

Returns the LSN of the final change. The __$operation value is 1 for a delete, 5 when the net operation is an insert or an update. The __$update_mask column is always NULL.

Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions require two parameters – the maximal and minimal Log Sequence Number (LSN) for the queried set of records.

To determine the Log Sequence Number (LSN) that can be used in functions the following functions are available.

sys.fn_cdc_get_min_lsn “Returns the start_lsn column value for the specified capture instance from the cdc.change_tables system table. This value represents the low endpoint of the validity interval for the capture instance.

SELECT sys.fn_cdc_get_min_lsn('dbo_TAT_TIME_CT') AS min_lsn

sys.fn_cdc_get_max_lsn – Returns the maximum Log Sequence Number, the high endpoint of the validity interval for all source tables. No parameters are required.

SELECT sys.fn_cdc_get_max_lsn() AS max_lsn

sys.fn_cdc_get_column_ordinal – Returns the ordinal of the column in a source table. These ordinals are used in change tables to reference a specific column.

SELECT sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME','SOURCE')

Returns 2, as SOURCE is the second column in the [cdc].[dbo_TAT_TIME_CT] table.

Reading the Change Table Records

To read the change tables, MSDN doesn’t recommend direct querying of the table, but using the system functions instead. To read all captured information for the [dbo].[TAT_TIME] table, execute:

DECLARE @from_lsn binary (10), @to_lsn binary (10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_TAT_TIME_CT')
SET @to_lsn = sys.fn_cdc_get_max_lsn()

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all')
ORDER BY __$seqval

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all')

2

Retrieve Captured Data of Specific Time Frame

Quite often, one is asked for data to be tracked over a time interval. If you look at the       tracking data there is apparently no time captured at all. However, there are few fields which can definitely help us out i.e. _$start_lsn. LSN stands for Last Sequence  Number. Every record in transaction log is uniquely identified by a LSN. They are always incrementing numbers.

LSN numbers are always associated with time and their mapping can be found after querying system table cdc.lsn_time_mapping. This table is one of the tables which was created when the database was enabled for CDC.

We can find the time that corresponds to the LSN by using the system function sys.fn_cdc_map_time_to_lsn. If we want all the changes done yesterday, we can run this function as described below and it will return all the rows from yesterday.

DECLARE @begin_time DATETIME, @end_time DATETIME, @begin_lsn BINARY(10), @end_lsn BINARY(10);

SELECT @begin_time = GETDATE()-1, @end_time = GETDATE();
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@begin_lsn,@end_lsn,'all')

SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_TAT_TIME(@begin_lsn, @end_lsn, 'all');

3

We have used relational operations in the function sys.fn_cdc_map_time_to_lsn. There can be total of four different relational operations available to use in that function:

  • largest less than·
  • largest less than or equal·
  • smallest greater than·
  • smallest greater than or equal

This way captured data can be queried very easily and query based on time interval.

Understanding Update Mask

As shown, Change Data Capture shows the exact values of all tracked columns in the modified rows, even if the column itself was not updated. Let’s see how to check whether a column has been changed or not.

The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns. Let’s execute the following UPDATE statement.

USE [Allens]
GO
UPDATE [dbo].[TAT_TIME]
SET SOURCE = 'BS', LAST_MODIFIED = GETDATE()
WHERE TIME_UNO = 26515750

Following code is an example of how to use the fn_cdc_get_column_ordinal and __$update_mask to check whether a column has been changed or not.

DECLARE @begin_time datetime
DECLARE @end_time datetime
DECLARE @from_lsn binary(10)
DECLARE @to_lsn binary(10);
SET @begin_time = GETDATE()-1;
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

DECLARE @SOURCEPosition INT
DECLARE @ENTRY_STATUSPosition INT
DECLARE @WIP_STATUSPosition INT
DECLARE @TAXABLEPosition INT
DECLARE @LAST_MODIFIEDPosition INT

SET @SOURCEPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME',
'SOURCE')
SET @ENTRY_STATUSPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME',
'ENTRY_STATUS')
SET @WIP_STATUSPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME',
'WIP_STATUS')
SET @TAXABLEPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME',
'TAXABLE')
SET @LAST_MODIFIEDPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME',
'LAST_MODIFIED')

SELECT *, sys.fn_cdc_is_bit_set(@SOURCEPosition, __$update_mask) AS
'UpdatedSOURCE'
,sys.fn_cdc_is_bit_set(@ENTRY_STATUSPosition, __$update_mask) AS
'UpdatedENTRY_STATUS'
,sys.fn_cdc_is_bit_set(@WIP_STATUSPosition, __$update_mask) AS
'UpdatedWIP_STATUS'
,sys.fn_cdc_is_bit_set(@TAXABLEPosition, __$update_mask) AS
'UpdatedTAXABLE'
,sys.fn_cdc_is_bit_set(@LAST_MODIFIEDPosition, __$update_mask) AS
'UpdatedLAST_MODIFIED'
FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all');

4

In our example we had an UPDATE on only two columns SOURCE and LAST_MODIFIED – the second and sixth column. This is represented with 0x22 in hexadecimal value (100010 in binary). Here, this value stands for second and sixth value if you look at it from the right, as a bitmap. This is a useful way of finding out which columns are being updated or changed.

Let’s see a shortcut using sys.fn_cdc_has_column_changed scalar valued function.

SELECT  *, sys.fn_cdc_has_column_changed('dbo_TAT_TIME','SOURCE',__$update_mask) AS
'UpdatedSOURCE'
,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','ENTRY_STATUS',__$update_mask) AS
'UpdatedENTRY_STATUS'
,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','WIP_STATUS',__$update_mask) AS
'UpdatedWIP_STATUS'
,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','TAXABLE',__$update_mask) AS
'UpdatedTAXABLE'
,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','LAST_MODIFIED',__$update_mask) AS
'UpdatedLAST_MODIFIED'
FROM [cdc].[fn_cdc_get_all_changes_dbo_TAT_TIME]
((sys.fn_cdc_get_min_lsn('dbo_TAT_TIME')), (sys.fn_cdc_get_max_lsn()), N'all');

But Microsoft strictly recommends that you should use the functions sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set instead of sys.fn_cdc_has_column_changed function to find out which columns are being updated or changed.

References

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/cdc-fn-cdc-get-all-changes-capture-instance-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/cdc-fn-cdc-get-net-changes-capture-instance-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-get-min-lsn-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-get-max-lsn-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-get-column-ordinal-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-lsn-time-mapping-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-map-time-to-lsn-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-map-lsn-to-time-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-has-column-changed-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-is-bit-set-transact-sql

Change Data Capture (CDC) SQL Agent Jobs

When CDC is enabled on a table, it creates two CDC-related jobs that are specific to the database, and execute using SQL Server Agent. Before enabling CDC at the table level, make sure that you have enabled SQL Server Agent. Without SQL Server Agent enabled, these jobs will not execute.

As we are using Allens database, it creates the jobs with following names. The jobs are created when the first table of the database is enabled for change data capture.

1

Capture Job

cdc.Allens_capture is used to populate the database change tables. The capture job is in charge of capturing data changes and processing them into change tables. It runs continuously, processing a maximum of 1000 transactions per scan cycle with a wait of 5 seconds between cycles. 

When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job. The procedure sys.sp_cdc_scan is called internally by sys.sp_MScdc_capture_job.

Cleanup Job

cdc.Allens_cleanup is responsible for change table clean-up. It runs daily at 2 A.M. It retains change table entries for 4320 minutes or 3 days, removing a maximum of 5000 entries with a single delete statement.

When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job. This system SP cleans up database changes tables.

Internally, change data capture agent jobs are created and dropped by using the stored procedures sys.sp_cdc_add_job and sys.sp_cdc_drop_job, respectively. These stored procedures are also exposed so that administrators can control the creation and removal of these jobs.

An administrator has no explicit control over the default configuration of the change data capture agent jobs. The stored procedure sys.sp_cdc_change_job is provided to allow the default configuration parameters to be modified. In addition, the stored procedure sys.sp_cdc_help_jobs allows current configuration parameters to be viewed. Both the capture job and the cleanup job extract configuration parameters from the table msdb.dbo.cdc_jobs on startup.

msdb.dbo.cdc_jobs

msdb.dbo.cdc_jobs – Stores the change data capture configuration parameters for capture and cleanup jobs. This table is stored in msdb.

2

Column Name Data Type Description
database_id int The ID of the database in which the jobs are running.
job_type nvarchar(20) The type of job, either ‘capture’ or ‘cleanup’.
job_id uniqueidentifier A unique ID associated with the job.
maxtrans int The maximum number of transactions to process in each scan cycle.
maxtrans is valid only for capture jobs.
maxscans int The maximum number of scan cycles to execute in order to extract all rows from the log.
maxscans is valid only for capture jobs.
continuous bit A flag indicating whether the capture job is to run continuously (1), or run in one-time mode (0).

continuous is valid only for capture jobs.

pollinginterval bigint The number of seconds between log scan cycles.
pollinginterval is valid only for capture jobs.
retention bigint The number of minutes that change rows are to be retained in change tables.
retention is valid only for cleanup jobs.
threshold bigint The maximum number of delete entries that can be deleted using a single statement on cleanup.

Any changes made to these values by using sys.sp_cdc_change_job will not take effect until the job is stopped and restarted.

Two additional stored procedures are provided to allow the change data capture agent jobs to be started and stopped: sys.sp_cdc_start_job and sys.sp_cdc_stop_job.

Starting and stopping the capture job does not result in a loss of change data. It only prevents the capture process from actively scanning the log for change entries to deposit in the change tables. A reasonable strategy to prevent log scanning from adding load during periods of peak demand is to stop the capture job and restart it when demand is reduced.

The change data capture agent jobs are removed when change data capture is disabled for a database.

Both SQL Server Agent jobs were designed to be flexible enough and sufficiently configurable to meet the basic needs of change data capture environments. In both cases, however, the underlying stored procedures that provide the core functionality have been exposed so that further customization is possible.

References

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-scan-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-cdc-jobs-transact-sql

 

Change Data Capture (CDC) System Tables

Once we enable CDC on the database, System Tables will be created within the database as part of the cdc schema as shown below.

7

cdc.captured_columns 

Returns one row for each column tracked in a capture instance. By default, all columns of the source table are captured. However, columns can be included or excluded when the source table is enabled for change data capture by specifying a column list.

SELECT * FROM [cdc].[captured_columns]

8

Column Name Data Type Description
object_id int ID of the source table to which the captured column belongs.
column_name sysname Name of the captured column.
column_id int ID of the captured column within the source table.
column_type sysname Type of the captured column.
column_ordinal int Column ordinal (1-based) in the change table. The metadata columns in the change table are excluded. Ordinal 1 is assigned to the first captured column.
is_computed bit Indicates that the captured column is a computed column in the source table.

We recommend that you do not query the system tables directly. Instead, execute the sys.sp_cdc_get_source_columns stored procedure.

cdc.change_tables 

Returns one row for each change table in the database. A change table is created when change data capture is enabled on a source table.

SELECT * FROM [cdc].[change_tables]

9

Column Name Data Type Description
object_id int ID of the change table. Is unique within a database.
version int Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
For SQL Server 2012, this column always returns 0.
source_object_id int ID of the source table enabled for change data capture.
capture_instance sysname Name of the capture instance used to name instance-specific tracking objects. By default, the name is derived from the source schema name plus the source table name in the format schemaname_sourcename.
start_lsn binary(10) Log sequence number (LSN) representing the low endpoint when querying for change data in the change table.
NULL = the low endpoint has not been established.
end_lsn binary(10) Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
For SQL Server 2008, this column always returns NULL.
supports_net_changes bit Support for querying for net changes is enabled for the change table.
has_drop_pending bit Capture process has received notification that the source table has been dropped.
role_name sysname Name of the database role used to gate access to change data.
NULL = a role is not used.
index_name sysname Name of the index used to uniquely identify rows in the source table. index_name is either the name of the primary key index of the source table, or the name of a unique index specified when change data capture was enabled on the source table.

NULL = source table did not have a primary key when change data capture was enabled and a unique index was not specified when change data capture was enabled.

Note: If change data capture is enabled on a table where a primary key exists, the change data capture feature uses the index regardless of whether net changes is enabled or not. After change data capture is enabled, no modification is allowed on the primary key. If there is no primary key on the table, you can still enable change data capture but only with net changes set to false. After change data capture is enabled, you can then create a primary key. You can also modify the primary key because change data capture does not use the primary key.

filegroup_name sysname Name of the filegroup in which the change table resides.
NULL = change table is in the default filegroup of the database.
create_date datetime Date that the source table was enabled.
partition_switch bit Indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for change data capture. 0 indicates that partition switching is blocked. Non-partitioned tables always return 1.

We recommend that you do not query the system tables directly. Instead, execute the sys.sp_cdc_help_change_data_capture stored procedure.

cdc.ddl_history 

Returns one row for each data definition language (DDL) change made to tables that are enabled for change data capture. You can use this table to determine when a DDL change occurred on a source table and what the change was. Source tables that have not had DDL changes will not have entries in this table.

To accommodate a fixed column structure change table, the capture process which is responsible for populating the change table will ignore any new columns that are not identified for capture when the source table was enabled for change data capture. If a tracked column is dropped, null values will be supplied for the column in the subsequent change entries. However, if an existing column undergoes a change in its data type, the change is propagated to the change table to ensure that the capture mechanism does not introduce data loss to tracked columns. 

Let’s alter the SOURCE column in the [dbo].[TAT_TIME] Source Table.

ALTER TABLE [dbo].[TAT_TIME]
ALTER COLUMN [SOURCE] [char](4) NULL

The change is propagated to the change table [cdc].[dbo_TAT_TIME_CT] as shown below.

SELECT * FROM [cdc].[ddl_history]

14

Column Name Data Type Description
source_object_id int ID of the source table to which the DDL change was applied.
object_id int ID of the change table associated with a capture instance for the source table.
required_column_update bit Indicates that the data type of a captured column was modified in the source table. This modification altered the column in the change table.
ddl_command nvarchar(max) DDL statement applied to the source table.
ddl_lsn binary(10) Log sequence number (LSN) associated with the commitment of the DDL modification.
ddl_time datetime Date and time that the DDL change was made to the source table.

We recommend that you do not query the system tables directly. Instead, execute the sys.sp_cdc_get_ddl_history stored procedure.

cdc.index_columns 

Returns one row for each index column associated with a change table. The index columns are used by change data capture to uniquely identify rows in the source table. By default, the columns of the primary key of the source table are included. However, if a unique index on the source table is specified when change data capture is enabled on the source table, columns in that index are used instead. A primary key or unique index is required on the source table if net change tracking is enabled.

SELECT * FROM [cdc].[index_columns]

15

Column Name Data Type Description
object_id int ID of the change table.
column_name sysname Name of the index column.
index_ordinal tinyint Ordinal (1-based) of the column within the index.
column_id int ID of the column in the source table.

We recommend that you do not query the system tables directly. Instead, execute the sys.sp_cdc_help_change_data_capture stored procedure.

cdc.lsn_time_mapping 

Returns one row for each transaction having rows in a change table. This table is used to map between log sequence number (LSN) commit values and the time the transaction committed. Entries may also be logged for which there are no change tables entries. This allows the table to record the completion of LSN processing in periods of low or no change activity.

At the high end, as the capture process commits each new batch of change data, new entries are added to cdc.lsn_time_mapping for each transaction that has change table entries. Within the mapping table, both LSN and a transaction commit time (columns start_lsn and tran_end_time, respectively) are retained.

SELECT * FROM [cdc].[lsn_time_mapping]

16

Column Name Data Type Description
start_lsn binary(10) LSN of the committed transaction.
tran_begin_time datetime Time that the transaction associated with the LSN began.
tran_end_time datetime Time that the transaction ended.
tran_id varbinary(10) ID of the transaction.

We recommend that you do not query the system tables directly. Instead, execute the sys.fn_cdc_map_lsn_to_time (Transact-SQL) and sys.fn_cdc_map_time_to_lsn (Transact-SQL) system functions.

References

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-captured-columns-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-get-captured-columns-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-change-tables-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-help-change-data-capture-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-ddl-history-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-get-ddl-history-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-index-columns-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-lsn-time-mapping-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-map-lsn-to-time-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-map-time-to-lsn-transact-sql