Disable Change Data Capture (CDC)

This topic describes how to disable change data capture (CDC) for a database and a table.

Disable Change Data Capture for a Database

A member of the sysadmin fixed server role can run the stored procedure sys.sp_cdc_disable_db in the database context to disable change data capture for a database. It is not necessary to disable individual tables first. Disabling change data capture for a database removes all associated metadata, including the change data capture functions, tables, and jobs, as well as the cdc user and schema. However, any gating roles created by change data capture are not removed automatically and must be deleted explicitly. To determine whether a database is enabled, query the is_cdc_enabled column in the sys.databases catalog view.

If a change data capture enabled database is dropped, change data capture jobs are automatically removed.

USE master
GO
SELECT [name], database_id, is_cdc_enabled 
FROM sys.databases   
GO

USE [Allens]
GO
EXEC sys.sp_cdc_disable_db                   
GO

Disable Change Data Capture for a Table

Members of the db_owner fixed database role can remove a capture instance for an individual source table by using the stored procedure sys.sp_cdc_disable_table. To determine whether a source table is currently enabled for change data capture, examine the is_tracked_by_cdc column in the sys.tables catalog view. If no tables remain enabled in the database after disabling, the change data capture jobs are also removed.

When change data capture is disabled on a table, the change table and the query functions associated with it are dropped. All rows and metadata associated with the capture instance are also deleted from the system tables, and the relevant catalog views are updated.

If a change data capture-enabled table is dropped, change data capture metadata that is associated with the table is automatically removed.


USE [Allens]
GO
SELECT [name], is_tracked_by_cdc 
FROM sys.tables
GO

USE [Allens]
GO
EXEC sys.sp_cdc_disable_table
      @source_schema    = N'dbo'          -- required
    , @source_name      = N'TAT_TIME'     -- required
    , @capture_instance = N'dbo_TAT_TIME'
GO

References

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-disable-db-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-disable-table-transact-sql

Recommendations, Tips and Tricks for Change Data Capture (CDC)

Change data capture (CDC) provides an easy and high-performing way to capture changes in a set of tables. This article offers useful examples while pointing out some of the pitfalls we have come across while working with CDC. There are many ways to tune the performance of change data capture. Understanding workload characteristics, system I/O usage, and allowable latency is key to tuning change data capture performance without negatively impacting the base workload. Tuning the scan job parameters and the sys.sp_cdc_enable_table parameters can significantly improve change data capture performance under load.

Change Data Capture Parameters

Recommendation – 1

If change data capture with the default parameters cannot keep up with the workload and latency becomes too high, you can increase maxscans and/or maxtrans by a factor of 10, or reduce pollinginterval to 1. If latency decreases but is still too high, you can increase maxtrans further, but closely monitor your workload performance, latency, and the performance of queries against the change tables. On a system where change data capture can fully keep up with the workload, latency should not be significantly higher than the polling interval, and it should not increase over time.
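Applied through sys.sp_cdc_change_job, this recommendation might look like the following sketch. The values are illustrative starting points (10x the defaults for maxtrans and maxscans, a 1-second polling interval), not prescriptions:

```sql
-- Illustrative tuning values; test against your own workload.
EXECUTE sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxtrans        = 5000,
    @maxscans        = 100,
    @pollinginterval = 1;   -- seconds
GO
-- The capture job reads these values only at startup, so restart it.
EXECUTE sys.sp_cdc_stop_job  @job_type = N'capture';
EXECUTE sys.sp_cdc_start_job @job_type = N'capture';
GO
```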

Recommendation – 2

Always limit the list of columns captured by change data capture to only the columns you really need to track, by specifying the @captured_column_list parameter in sys.sp_cdc_enable_table. When enabling CDC on a table, consider which data elements you actually need to capture changes for over time. Specifying just the columns you need can improve performance.

We have found that the decreased number of columns captured made a huge difference in overall change data capture performance.
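As a sketch, enabling CDC with a restricted column list might look like this. The column names below are hypothetical; the table is the sample table used earlier in this series:

```sql
USE [Allens]
GO
-- Hypothetical columns: capture only the columns that need tracking.
EXEC sys.sp_cdc_enable_table
      @source_schema        = N'dbo'
    , @source_name          = N'TAT_TIME'
    , @role_name            = NULL
    , @captured_column_list = N'TAT_ID, StartTime, EndTime';
GO
```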

Recommendation – 3

The @supports_net_changes parameter in sys.sp_cdc_enable_table can have a significant influence on change data capture performance. When @supports_net_changes is set to 1, an additional nonclustered index is created on the change table and the net changes query function is created. Because this index needs to be maintained, we found that enabling net changes can have a negative impact on change data capture performance. In particular, if change data capture is only just able to keep up with a workload, the additional load incurred by maintaining the index that supports net changes queries can be enough to prevent change data capture from keeping up with the workload.

If you do not require support for net changes, set @supports_net_changes to 0. If you do require querying for net changes but change data capture latency grows too high, it can be worthwhile to turn net changes support off and perform the net change detection later in a staging database.
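When net changes support is not needed, the parameter can simply be set to 0 at enable time. A sketch, with a hypothetical table name:

```sql
EXEC sys.sp_cdc_enable_table
      @source_schema        = N'dbo'
    , @source_name          = N'TAT_TIME'
    , @role_name            = NULL
    , @supports_net_changes = 0;  -- skip the extra index and net changes function
GO
```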

Workload Behavior

Recommendation – 1

When planning a change data capture solution, workload characteristics are very important. The main factors to consider are INSERT/DELETE versus UPDATE operations, and whether the DML operations affect one row per transaction or many.

For change data capture, one INSERT only creates one row in the change table. One UPDATE, however, inserts two rows in the change table.

A common scenario in applications is that a row is inserted into a table and then immediately updated to fill in missing fields. This can happen in the application or through an INSERT trigger that looks for missing fields and updates them if necessary. It results in three rows, instead of one, being written to the change table for each insert to the original table.

The bottom line: try to avoid scenarios where a row is updated immediately after it is inserted, and avoid using change data capture to capture changes to tables that have frequent large update transactions.

Cleanup

Recommendation – 1

The cleanup job is initiated by running the parameterless stored procedure sp_MScdc_cleanup_job. This stored procedure starts by extracting the configured retention and threshold values for the cleanup job from msdb.dbo.cdc_jobs. The configurable threshold value limits the maximum number of entries that can be deleted in a single statement during cleanup.

Note that lock escalation on the change tables can occur, especially with large thresholds. This can degrade application response time and increase the latency of the change data capture scan job. If possible, run cleanup when no other workload is active, or during off-peak hours. Test increasing the threshold parameter until you find a sweet spot for your workload.
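One way to experiment with the threshold is through sys.sp_cdc_change_job. The value below is an illustrative starting point, not a recommendation:

```sql
EXECUTE sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @threshold = 10000;  -- illustrative; test against your workload
GO
```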

Storage

Recommendation – 1

One of the most important things to watch with change data capture is the transaction log I/O subsystem. Log file I/O grows significantly when change data capture is enabled in a database. In addition, log records stay active until change data capture has processed them. This means that, especially in environments where large latency builds up, the log file can grow significantly, because log space cannot be reused until the change data capture scan job has processed the log records, even in the simple recovery model, and even after a log backup in the full recovery model.

It should be noted that change data capture works with all recovery models. But when change data capture is enabled, operations that would normally be minimally logged in simple or bulk-logged recovery models are fully logged to enable change data capture to capture all changes.

Be aware also that when the log disk becomes full, you cannot shrink the log file (by backing it up and then shrinking it) until change data capture has processed all pending transactions. But change data capture cannot process those transactions while the log disk is full, because writes to the change tables are themselves logged operations. The easiest way to recover from this situation is to temporarily add another log file on a different disk.
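A sketch of the recovery step, using the sample database from this series and a hypothetical drive with free space:

```sql
-- Temporarily add a second log file on a different disk so CDC can
-- process the pending transactions; remove it once latency recovers.
ALTER DATABASE [Allens]
ADD LOG FILE
(
    NAME       = N'Allens_log2',
    FILENAME   = N'E:\SQLLogs\Allens_log2.ldf',  -- hypothetical path
    SIZE       = 1GB,
    FILEGROWTH = 512MB
);
GO
```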

When planning change data capture architecture, take a significant increase in log size and log volume I/O operations into account. Depending on the amount of data captured by change data capture and the time change data capture needs to catch up with changes, the log file size can grow to 200-300% of the original size, in some cases even more. Size the log file accordingly. Make sure that the growth in log file size does not result in a completely full log disk.

Recommendation – 2

In general, there is little difference between having the change tables on the same filegroup as the application tables and having them on a different filegroup. This can change when the data files are located on an I/O subsystem that is already under heavy load; in that case, putting the change tables in a filegroup located on a different set of physical disks can improve change data capture performance.

To keep the PRIMARY filegroup small and to maintain a clear distinction between application data and change data, consider specifying @filegroup_name in sys.sp_cdc_enable_table to place change tables in a filegroup other than PRIMARY.
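A sketch, assuming a filegroup named CDCDATA has already been created and using a hypothetical table name:

```sql
EXEC sys.sp_cdc_enable_table
      @source_schema  = N'dbo'
    , @source_name    = N'TAT_TIME'
    , @role_name      = NULL
    , @filegroup_name = N'CDCDATA';  -- pre-created filegroup, not PRIMARY
GO
```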

Recommendation – 3

In addition to the change tables, the cdc.lsn_time_mapping table can also grow to a significant size and become the target of many I/O operations. This table is created on the default filegroup when sys.sp_cdc_enable_db is executed on a database.

Consider changing the default filegroup for the database before you execute sys.sp_cdc_enable_db, so that the change data capture metadata, and especially cdc.lsn_time_mapping, is located on a filegroup other than PRIMARY. You can change the default filegroup back after the change data capture metadata tables are created.
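The sequence might look like the following sketch, again assuming a pre-created CDCDATA filegroup in the sample database:

```sql
USE [Allens]
GO
ALTER DATABASE [Allens] MODIFY FILEGROUP [CDCDATA] DEFAULT;
GO
EXEC sys.sp_cdc_enable_db;  -- metadata, including cdc.lsn_time_mapping, lands on CDCDATA
GO
-- Restore the original default filegroup.
ALTER DATABASE [Allens] MODIFY FILEGROUP [PRIMARY] DEFAULT;
GO
```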

Change data capture provides an easy and high-performing way to capture changes in a set of tables. There are many ways to tune the performance of change data capture. Understanding workload characteristics, system I/O usage, and allowable latency is key to tuning change data capture performance without negatively impacting the base workload. Tuning the scan job parameters and the sys.sp_cdc_enable_table parameters can significantly improve change data capture performance under load.

References

https://technet.microsoft.com/en-us/library/bb522475(v=sql.105).aspx

https://technet.microsoft.com/en-us/library/cc645885(v=sql.105).aspx

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-cdc-jobs-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-lsn-time-mapping-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-db-transact-sql

Monitor Change Data Capture (CDC)

This article describes how to monitor the change data capture process.

Monitoring the change data capture process lets you determine if changes are being written correctly and with a reasonable latency to the change tables. Monitoring can also help you to identify any errors that might occur. SQL Server includes two dynamic management views to help you monitor change data capture: sys.dm_cdc_log_scan_sessions and sys.dm_cdc_errors.

Identifying Sessions with Empty Result Sets

Every row in sys.dm_cdc_log_scan_sessions represents a log scan session (except the row with an ID of 0). A log scan session is equivalent to one execution of sp_cdc_scan. During a session, the scan can either return changes or return an empty result. If the result set is empty, the empty_scan_count column in sys.dm_cdc_log_scan_sessions is set to 1. If there are consecutive empty result sets, such as when the capture job is running continuously, the empty_scan_count in the last existing row is incremented. For example, if sys.dm_cdc_log_scan_sessions already contains 10 rows for scans that returned changes and there are five empty results in a row, the view contains 11 rows. The last row has a value of 5 in the empty_scan_count column. To determine which sessions had an empty scan, run the following query:

SELECT * FROM sys.dm_cdc_log_scan_sessions WHERE empty_scan_count <> 0

Determining Latency

The performance of change data capture itself is determined by the difference between the time when the original transaction happened in the database and the time the change record appears in the change table. This time is called latency. On a system where change data capture can fully keep up with the workload, latency should not be significantly higher than the polling interval, and it should not increase over time.

Latency can be determined by using the dynamic management view sys.dm_cdc_log_scan_sessions. This view shows how long the scan sessions took, how many commands and transactions were processed, and how high the latency is. The sys.dm_cdc_log_scan_sessions dynamic management view keeps track of only the last few scan sessions.

The sys.dm_cdc_log_scan_sessions management view includes a column that records the latency for each capture session. The latency column is populated only for active sessions. For sessions with a value greater than 0 in the empty_scan_count column, the latency column is set to 0. The following query returns the average latency for the most recent sessions:

SELECT latency FROM sys.dm_cdc_log_scan_sessions WHERE session_id = 0

You can use latency data to determine how fast or slow the capture process is processing transactions. This data is most useful when the capture process is running continuously. If the capture process is running on a schedule, latency can be high because of the lag between transactions being committed on the source table and the capture process running at its scheduled time.

Determining Throughput

Another important measure of capture process efficiency is throughput. This is the average number of commands per second that are processed during each session. To determine the throughput of a session, divide the value in the command_count column by the value in the duration column in the dynamic management view sys.dm_cdc_log_scan_sessions. The following query returns the average throughput for the most recent sessions:

SELECT command_count/duration AS [Throughput] FROM sys.dm_cdc_log_scan_sessions WHERE session_id = 0

The sys.dm_cdc_log_scan_sessions dynamic management view keeps track of only the last few scan sessions, and its values are reset whenever the instance of SQL Server is restarted. By using SQL Server Extended Events, you can capture the sqlserver.cdc_session event. This event contains the same information as the sys.dm_cdc_log_scan_sessions dynamic management view, but it enables you to record every scan session.
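A minimal sketch of such an event session follows; the session name and file target name are assumptions:

```sql
-- Record every CDC scan session to an event file target.
CREATE EVENT SESSION [cdc_scan_sessions] ON SERVER
ADD EVENT sqlserver.cdc_session
ADD TARGET package0.event_file (SET filename = N'cdc_scan_sessions');
GO
ALTER EVENT SESSION [cdc_scan_sessions] ON SERVER STATE = START;
GO
```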

References

https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/change-data-capture-sys-dm-cdc-log-scan-sessions

https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/change-data-capture-sys-dm-cdc-errors

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-scan-transact-sql

Administer Change Data Capture

This article describes how to administer change data capture.

Capture Job

The capture job is initiated by running the parameterless stored procedure sp_MScdc_capture_job. This stored procedure starts by extracting the configured values of maxtrans, maxscans, continuous, and pollinginterval for the capture job from msdb.dbo.cdc_jobs. These values are then passed as parameters to the stored procedure sp_cdc_scan, which in turn invokes sp_replcmds to perform the log scan.

Capture Job Customization

There are four parameters in the stored procedure sys.sp_cdc_scan that determine the behaviour of the capture job.

The first parameter is continuous (default value 1). It determines whether the capture job runs continuously (value 1) or exits after one scan phase (one-shot mode, value 0). One-shot mode is recommended for testing only, not for production use, mainly because log records stay active until change data capture has processed them, so the log keeps growing while the scan job is not running.

The other three parameters determine how often and how many transactions are read from the log and inserted to the change tables.

The parameter maxtrans (default value 500) determines how many transactions are read from the log and written to the change tables; this write is done in one transaction. The parameter maxscans (default value 10) determines how many of these scan cycles are attempted before the job ends (continuous = 0) or pauses for an interval (continuous = 1). The length of this pause is set by the parameter pollinginterval (in seconds, default value 5). The WAITFOR is executed when a scan cycle drains the log completely or when maxscans scan cycles have completed.


The capture job parameters can be changed by using the stored procedure sys.sp_cdc_change_job as shown below.

EXECUTE sys.sp_cdc_change_job  
    @job_type = N'capture', 
    @maxtrans = 1000, 
    @maxscans = 15; 
GO

Any changes made to the values by using sys.sp_cdc_change_job do not take effect until the job is stopped and restarted, because these parameters are read only during the initialization of the capture job.

Two additional stored procedures are provided to allow the change data capture agent jobs to be started and stopped: sys.sp_cdc_start_job and sys.sp_cdc_stop_job.
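For example, to pick up new capture job parameters after a sys.sp_cdc_change_job call:

```sql
EXECUTE sys.sp_cdc_stop_job  @job_type = N'capture';
GO
EXECUTE sys.sp_cdc_start_job @job_type = N'capture';
GO
```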

To increase the number of transactions per second change data capture can handle and to reduce latency you can do one of the following three things:

  • Increase the number of transactions per scan by increasing maxtrans.
  • Increase the number of scans before a pause by increasing maxscans.
  • Decrease the pause between scanning cycles by decreasing pollinginterval.

For the capture job, additional logic can be applied to determine whether a new scan begins immediately or whether a sleep is imposed before it starts, instead of relying on a fixed polling interval. The choice could be based simply on the time of day, perhaps enforcing very long sleeps during peak activity times, or even moving to a polling interval of 0 at close of day when it is important to complete the day's processing and prepare for nightly runs. Capture process progress could also be monitored to determine when all transactions committed by midnight have been scanned and deposited in change tables, letting the capture job end, to be restarted by a scheduled daily restart. By replacing the delivered job step that calls sp_cdc_scan with a call to a user-written wrapper for sp_cdc_scan, highly customized behaviour can be obtained with little additional effort.
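One possible shape for such a wrapper, heavily simplified: the procedure name, hour boundaries, and parameter values below are all assumptions, not a delivered pattern.

```sql
-- Hypothetical wrapper around sys.sp_cdc_scan: throttled during business
-- hours, unthrottled after hours. The delivered capture job step would be
-- modified to call this procedure instead of sys.sp_cdc_scan directly.
CREATE PROCEDURE dbo.usp_cdc_scan_wrapper
AS
BEGIN
    -- During peak hours, impose a sleep before scanning to throttle CDC.
    IF DATEPART(HOUR, SYSDATETIME()) BETWEEN 8 AND 18
        WAITFOR DELAY '00:01:00';
    -- One scan phase (continuous = 0); the job schedule re-invokes the wrapper.
    EXEC sys.sp_cdc_scan
         @maxtrans = 500, @maxscans = 10,
         @continuous = 0, @pollinginterval = 0;
END
GO
```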

Cleanup Job

Change data capture uses a retention-based cleanup strategy to manage change table size. A single cleanup job handles cleanup for all of the database's change tables and applies the same retention value to all defined capture instances.

The cleanup job is initiated by running the parameterless stored procedure sp_MScdc_cleanup_job. This stored procedure starts by extracting the configured retention and threshold values for the cleanup job from msdb.dbo.cdc_jobs. The retention value is used to compute a new low watermark for the change tables. The specified number of minutes is subtracted from the maximum tran_end_time value in the cdc.lsn_time_mapping table to obtain the new low watermark, expressed as a datetime value. The cdc.lsn_time_mapping table is then used to convert this datetime value to a corresponding LSN value. If the same commit time is shared by multiple entries in the table, the LSN of the entry with the smallest LSN is chosen as the new low watermark. This LSN value is passed to sp_cdc_cleanup_change_table to remove entries from the database change tables. The configurable threshold value limits the maximum number of entries that can be deleted in a single statement during cleanup. Increase the threshold parameter until you find a sweet spot for your workload.

The cleanup job parameters can be changed by using the stored procedure sys.sp_cdc_change_job as shown below.

EXECUTE sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 2880,
    @threshold = 50000;
GO

References

https://technet.microsoft.com/en-us/library/cc645591(v=sql.105).aspx

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-cdc-jobs-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-scan-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-replcmds-transact-sql

https://technet.microsoft.com/en-us/library/cc645885(v=sql.105).aspx

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-cleanup-change-table-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-change-job-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-start-job-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-stop-job-transact-sql

https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-lsn-time-mapping-transact-sql