In this article, we will analyze the records stored in change tables and describe the methods to read them.
The System Table Valued Functions
The change information in SQL Server Change Data Capture is available through table valued functions. To query our change table [cdc].[dbo_TAT_TIME_CT], use the [cdc].[fn_cdc_get_all_changes] and [cdc.fn_cdc_get_net_changes] table valued functions.
cdc.fn_cdc_get_all_changes_<capture_instance> – returns a row for each change in the source table that belongs to the Log Sequence Number in the range specified by the input parameters.
cdc.fn_cdc_get_all_changes_capture_instance(from_lsn, to_lsn, ‘<row_filter_option>’)
The <row_filter_option> parameter affects only the UPDATEs. It can have the following values:
All
Returns all changes within the specified LSN range. For changes due to an update operation, this option only returns the row containing the new values after the update is applied.
All Update Old
Returns all changes within the specified LSN range. For changes due to an update operation, this option returns both the row containing the column values before the update and the row containing the column values after the update.
cdc.fn_cdc_get_net_changes_<capture_instance> – returns one row that represents multiple changes on a single row aggregated as a single one.
For example, if a column was first updated from ’5 Corinthian Dr’ to ‘8 Donovan St,’ and then to ‘2 Woodward Rd’, the cdc.fn_cdc_get_all_changes function returns all 3 transactions. While the cdc.fn_cdc_get_net_changes function returns only the latest one.
cdc.fn_cdc_get_net_changes_capture_instance( from_lsn , to_lsn , ‘<row_filter_option>’ )
The <row_filter_option> parameter can have the following values:
All
Returns the LSN of the final change, the __$update_mask column is always NULL.
All With Mask
Returns the LSN of the final change, the __$update_mask column shows the IDs of the modified columns.
All With Merge
Returns the LSN of the final change. The __$operation value is 1 for a delete, 5 when the net operation is an insert or an update. The __$update_mask column is always NULL.
Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions require two parameters – the maximal and minimal Log Sequence Number (LSN) for the queried set of records.
To determine the Log Sequence Number (LSN) that can be used in functions the following functions are available.
sys.fn_cdc_get_min_lsn – “Returns the start_lsn column value for the specified capture instance from the cdc.change_tables system table. This value represents the low endpoint of the validity interval for the capture instance.
SELECT sys.fn_cdc_get_min_lsn('dbo_TAT_TIME_CT') AS min_lsn
sys.fn_cdc_get_max_lsn – Returns the maximum Log Sequence Number, the high endpoint of the validity interval for all source tables. No parameters are required.
SELECT sys.fn_cdc_get_max_lsn() AS max_lsn
sys.fn_cdc_get_column_ordinal – Returns the ordinal of the column in a source table. These ordinals are used in change tables to reference a specific column.
SELECT sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME','SOURCE')
Returns 2, as SOURCE is the second column in the [cdc].[dbo_TAT_TIME_CT] table.
Reading the Change Table Records
To read the change tables, MSDN doesn’t recommend direct querying of the table, but using the system functions instead. To read all captured information for the [dbo].[TAT_TIME] table, execute:
DECLARE @from_lsn binary (10), @to_lsn binary (10) SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_TAT_TIME_CT') SET @to_lsn = sys.fn_cdc_get_max_lsn() SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all') ORDER BY __$seqval SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all')
Retrieve Captured Data of Specific Time Frame
Quite often, one is asked for data to be tracked over a time interval. If you look at the tracking data there is apparently no time captured at all. However, there are few fields which can definitely help us out i.e. _$start_lsn. LSN stands for Last Sequence Number. Every record in transaction log is uniquely identified by a LSN. They are always incrementing numbers.
LSN numbers are always associated with time and their mapping can be found after querying system table cdc.lsn_time_mapping. This table is one of the tables which was created when the database was enabled for CDC.
We can find the time that corresponds to the LSN by using the system function sys.fn_cdc_map_time_to_lsn. If we want all the changes done yesterday, we can run this function as described below and it will return all the rows from yesterday.
DECLARE @begin_time DATETIME, @end_time DATETIME, @begin_lsn BINARY(10), @end_lsn BINARY(10); SELECT @begin_time = GETDATE()-1, @end_time = GETDATE(); SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time); SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time); SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@begin_lsn,@end_lsn,'all') SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_TAT_TIME(@begin_lsn, @end_lsn, 'all');
We have used relational operations in the function sys.fn_cdc_map_time_to_lsn. There can be total of four different relational operations available to use in that function:
- largest less than·
- largest less than or equal·
- smallest greater than·
- smallest greater than or equal
This way captured data can be queried very easily and query based on time interval.
Understanding Update Mask
As shown, Change Data Capture shows the exact values of all tracked columns in the modified rows, even if the column itself was not updated. Let’s see how to check whether a column has been changed or not.
The column __$update_mask is a variable bit mask with one defined bit for each captured column. For insert and delete entries, the update mask will always have all bits set. Update rows, however, will only have those bits set that correspond to changed columns. Let’s execute the following UPDATE statement.
USE [Allens] GO UPDATE [dbo].[TAT_TIME] SET SOURCE = 'BS', LAST_MODIFIED = GETDATE() WHERE TIME_UNO = 26515750
Following code is an example of how to use the fn_cdc_get_column_ordinal and __$update_mask to check whether a column has been changed or not.
DECLARE @begin_time datetime DECLARE @end_time datetime DECLARE @from_lsn binary(10) DECLARE @to_lsn binary(10); SET @begin_time = GETDATE()-1; SET @end_time = GETDATE(); SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time); SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time); DECLARE @SOURCEPosition INT DECLARE @ENTRY_STATUSPosition INT DECLARE @WIP_STATUSPosition INT DECLARE @TAXABLEPosition INT DECLARE @LAST_MODIFIEDPosition INT SET @SOURCEPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME', 'SOURCE') SET @ENTRY_STATUSPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME', 'ENTRY_STATUS') SET @WIP_STATUSPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME', 'WIP_STATUS') SET @TAXABLEPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME', 'TAXABLE') SET @LAST_MODIFIEDPosition = sys.fn_cdc_get_column_ordinal('dbo_TAT_TIME', 'LAST_MODIFIED') SELECT *, sys.fn_cdc_is_bit_set(@SOURCEPosition, __$update_mask) AS 'UpdatedSOURCE' ,sys.fn_cdc_is_bit_set(@ENTRY_STATUSPosition, __$update_mask) AS 'UpdatedENTRY_STATUS' ,sys.fn_cdc_is_bit_set(@WIP_STATUSPosition, __$update_mask) AS 'UpdatedWIP_STATUS' ,sys.fn_cdc_is_bit_set(@TAXABLEPosition, __$update_mask) AS 'UpdatedTAXABLE' ,sys.fn_cdc_is_bit_set(@LAST_MODIFIEDPosition, __$update_mask) AS 'UpdatedLAST_MODIFIED' FROM cdc.fn_cdc_get_all_changes_dbo_TAT_TIME(@from_lsn, @to_lsn, 'all');
In our example we had an UPDATE on only two columns SOURCE and LAST_MODIFIED – the second and sixth column. This is represented with 0x22 in hexadecimal value (100010 in binary). Here, this value stands for second and sixth value if you look at it from the right, as a bitmap. This is a useful way of finding out which columns are being updated or changed.
Let’s see a shortcut using sys.fn_cdc_has_column_changed scalar valued function.
SELECT *, sys.fn_cdc_has_column_changed('dbo_TAT_TIME','SOURCE',__$update_mask) AS 'UpdatedSOURCE' ,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','ENTRY_STATUS',__$update_mask) AS 'UpdatedENTRY_STATUS' ,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','WIP_STATUS',__$update_mask) AS 'UpdatedWIP_STATUS' ,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','TAXABLE',__$update_mask) AS 'UpdatedTAXABLE' ,sys.fn_cdc_has_column_changed('dbo_TAT_TIME','LAST_MODIFIED',__$update_mask) AS 'UpdatedLAST_MODIFIED' FROM [cdc].[fn_cdc_get_all_changes_dbo_TAT_TIME] ((sys.fn_cdc_get_min_lsn('dbo_TAT_TIME')), (sys.fn_cdc_get_max_lsn()), N'all');
But Microsoft strictly recommends that you should use the functions sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set instead of sys.fn_cdc_has_column_changed function to find out which columns are being updated or changed.