Understanding Real-Time Intelligence CDC connector for PostgreSQL database
Coauthor: Aazathraj, Chief Data Architect, Apollo Hospitals
Real-Time Intelligence in Microsoft Fabric provides multiple database change data capture connectors including SQL database, MySQL, PostgreSQL, and Cosmos DB, which allows anyone to easily react and take actions on database changes in real-time. Each of the databases works differently when it comes to enabling CDC, giving permission to the users and how they replicate database changes to Eventstream. This blog outlines some important points to keep in mind when using PostgreSQL database CDC in Eventstream.
PostgreSQL CDC connector is based on Debezium connector for PostgreSQL.
Setup of this CDC source in Eventstream is covered in the ‘Add PostgreSQL Database CDC source to an Eventstream‘ documentation.
There are important considerations on PostgreSQL database, and Debezium properties that users have to keep in mind when using this connector source
Enable logical decoding
- PostgreSQL database needs to have logical decoding enabled and WAL_LEVEL set to logical. Eventstream connector uses “pgoutput” output plugin for Write Ahead Logs. You can check current WAL level by running the following code in PostgreSQL database.
SELECT WAL_LEVEL;
User permissions
- The user that needs to be configured for CDC in Eventstream needs to have a minimum of replication permissions in database. Run the following command in PostgreSQL database:
ALTER ROLE <user> WITH REPLICATION;
- For working with publication, the user needs to have admin/super user permission. If the user cannot be provided with admin/super user permissions, then we need to consider setting the property publication.autocreate.mode of the Debezium connector to disabled. The default option of this property is filtered in Eventstream connector, which requires admin permission.
Publication
- PostgreSQL works with the concept of publications to which CDC records are published. By default, the connector always tries to create a publication in the database with the name ‘dbz_publication’. A super user/admin on the DB will by default have permissions to create/alter a publication. This also means that if multiple connectors are being used, the publication will be overwritten by later created connectors and earlier created connectors will not receive CDC events. Because they use the same publication name, connectors created later will update their filtered table list on the publication.
- In order to avoid giving higher permission to users or connector trying to create its own publication, we can set the property publication.autocreate.mode to disabled.
- Setting this property to disabled means that the connector will not try to create “dbz_publication” and will look for an existing publication. So, make sure publications are precreated in the database.
- Precreating publications in the database will provide full control to DBAs to add tables that will be attached to each publication from which CDC is pulled.
- A database admin can create a publication and attach tables to it by running the following command:
CREATE PUBLICATION pub1 FOR TABLE table1, table2;
Replication slot
- Eventstream UI provides an optional placeholder for providing the slot name. It is advisable to mandatorily provide a replication slot name to control which slots are being consumed by Eventstream connector and other replication channels in the database.
- You can create a replication slot in PostgreSQL database by running the following command.
SELECT pg_create_logical_replication_slot('test_slot_v1', 'pgoutput');
Historical snapshot
- After the initial setup, the connector initially brings historical snapshot of all the tables included in Eventstream UI.

- Important consideration for large Transactional DBs – instead of All tables, users can logically split the table by using multiple connectors in Eventstream. They can enter a comma-separated list of tables instead of All tables or group them at a table or schema level by using regex expression. Table list field has to be used cautiously because it has character limitation of 128 with regex or 100 tables comma-separated list.
- It is important to select only the tables where entire historical load is needed. Otherwise, we will end up getting huge amount of data into Eventstream. This means that it will take time to load into destination, CDC does not start until snapshot is completed and backlog might keep increasing which will result in data traffic on Eventstream.
- In case of issues during snapshot on large tables, because of timeout in source database, run the following command in PostgreSQL database:
ALTER ROLE <user> SET idle_in_transaction_session_timeout='0';
- Another important consideration is that the throughput of CDC connector and Eventstream are different. The configuration in Eventstream UI only changes the throughput of the Eventstream. Throughput numbers are outlined in the Connector throughput documentation.
- Remember that snapshot and CDC work differently. Snapshot is a simple select statement on each table and does not directly depend on WAL. This means that there is no log sequence number/watermark maintained during a snapshot. Any issues during the snapshot will restart the connector and restart the snapshot, resulting in duplicates in database.
- Issues in snapshot can happen because of table permissions issue, large tables, timeouts, datatype conversions or Eventstream throttling. It is important to plan these before starting snapshot.
CDC
- After snapshot, immediately the connector will go into CDC mode where publication and above-mentioned points about user permissions needs to be considered.
- To avoid conflict of publications and slots, each of the source Debezium connectors can also be started by explicitly mentioning both – publication.name and slot.name properties.
- Providing above properties along with publication.autocreate.mode = disabled means that CDC records flow smoothly without user permission denied issues if admin or super user is not being used.
- CDC records have a particular schema of the payload – they contain before and after key value pairs of database table columns and rows. There is an option we can enable to make sure before and after is always populated via CDC which makes it easy for updates, inserts and deletes in downstream applications. This will also help tables without primary key.
- Run the following command in PostgreSQL database:
ALTER TABLE <user> REPLICA IDENTITY FULL;
Data type handling
- By default, Debezium connector uses decimal, numeric data type to binary conversion. This is not usable for most customers and might need extra steps to convert these numbers back into usable format.
- This can be handled by setting the decimal.handling.mode property to double.
- Without setting the property, a column which is decimal, let’s say ‘UnitPrice’ will show up in Eventstream as binary Ex: UnitPrice: “RnLs=”.
- If the property is set to double in the connector, then the value will show up properly Ex: UnitPrice: 63.90.
Conclusion
Eventstream CDC connector removes the overhead of maintaining and running huge infrastructure to move database data into Microsoft Fabric. Since this is a managed connector, there is no complex code to be written or multiple tools to setup. The connector will switch from historical load to streaming CDC on its own which makes it good option for customers planning to derive actionable insights on database changes.
If users can make sure above configurations of the connector are used with right values, both snapshot and CDC will work seamlessly even when working large database tables. While currently, some of these configurations are not exposed in the UI, please be sure to check with the Microsoft team if the changes can be made on the connector backend. Most of the above options will eventually be exposed in the UI which gives complete control to the user to change the configuration.
Next Steps:
- Join the conversation on the Fabric Community.
- Learn more in the Fabric Documentation.
- Give us your feedback on Fabric Ideas.
- For more information, email rticat@microsoft.com.