The wal2json plugin
Convert Postgres Write-Ahead Log (WAL) changes to JSON format
The wal2json
plugin is a logical replication decoding output plugin for Postgres. It lets you convert the Write-Ahead Log (WAL) changes into JSON format, making it easier to consume and process database changes in various applications, such as data replication, auditing, event-driven services, and real-time analytics.
Try it on Neon!
Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.
This guide describes the wal2json
plugin — how to enable it in Neon, configure its output, and use it to capture and process database changes in JSON format. WAL decoding is crucial for building robust data pipelines, implementing Change Data Capture (CDC) systems, and maintaining data consistency across distributed systems.
note
The wal2json
plugin is included in your Neon project and doesn't require a separate installation.
Version availability:
The wal2json
plugin is available in all Postgres versions supported by Neon. For the most up-to-date information on supported versions, please refer to the list of all extensions available in Neon.
Enable logical replication
Before using the wal2json
plugin, you need to enable logical replication for your Neon project. Navigate to the Settings page in your Neon Project Dashaboard, and select Beta from the list of options. Click Enable to enable logical replication.
note
Once enabled for a project, logical replication cannot be reverted. This action triggers a restart of all active compute endpoints in your Neon project. Any active connections will be dropped and have to reconnect.
To verify that logical replication is enabled, navigate to the SQL Editor
and verify the output of the following query:
SHOW wal_level;
wal_level
-----------
logical
(1 row)
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql
client with Neon, see Connect with psql.
Create a replication slot
To start using wal2json
, you first need to create a replication slot that explcitly specifies wal2json
as the decoder plugin. You can do this by running the following query:
SELECT 'start' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');
This creates a replication slot named test_slot
using the wal2json
plugin. Now, we can query this slot to listen for changes to any tables in the database.
wal2json
to capture changes to a table
Example - use Suppose we have a table named inventory
that stores information about products for a retail store. We want to capture changes to this table in real-time and process them using wal2json
.
Run the following query to create the inventory
table, and insert some sample data:
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
quantity INTEGER,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO inventory (product_name, quantity) VALUES
('Widget A', 100),
('Gadget B', 50),
('Gizmo C', 75);
With logical decoding enabled, Postgres streams changes to the inventory
table to the test_slot
replication slot. Run the following query to observe the messages that have been published to it:
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', 'on');
This query returns the changes in JSON format. Each change will be represented as a separate JSON object.
lsn | xid | data
-----------+------+-------------------------------------------------------------------------------------------------------------------------
0/24E7950 | 2055 | { +
| | "change": [ +
| | ] +
| | }
0/24E7D60 | 2056 | { +
| | "change": [ +
| | { +
| | "kind": "insert", +
| | "schema": "public", +
| | "table": "inventory", +
| | "columnnames": ["id", "product_name", "quantity", "last_updated"], +
| | "columntypes": ["integer", "character varying(100)", "integer", "timestamp without time zone"],+
| | "columnvalues": [1, "Widget A", 100, "2024-07-30 09:53:26.078749"] +
| | } +
| | ,{ +
| | "kind": "insert", +
| | "schema": "public", +
| | "table": "inventory", +
| | "columnnames": ["id", "product_name", "quantity", "last_updated"], +
| | "columntypes": ["integer", "character varying(100)", "integer", "timestamp without time zone"],+
| | "columnvalues": [2, "Gadget B", 50, "2024-07-30 09:53:26.078749"] +
| | } +
| | ,{ +
| | "kind": "insert", +
| | "schema": "public", +
| | "table": "inventory", +
| | "columnnames": ["id", "product_name", "quantity", "last_updated"], +
| | "columntypes": ["integer", "character varying(100)", "integer", "timestamp without time zone"],+
| | "columnvalues": [3, "Gizmo C", 75, "2024-07-30 09:53:26.078749"] +
| | } +
| | ] +
| | }
(2 rows)
There are two rows in the query output above. The first row corresponds to the CREATE TABLE
statement that we ran earlier. Logical decoding only captures information about DML (data manipulation) events — INSERT
, UPDATE
, and DELETE
statements, hence this row is empty. The second row corresponds to the INSERT
statement that added rows to the inventory
table.
Next, we update an existing row in the inventory
table:
UPDATE inventory SET quantity = quantity + 100 WHERE product_name = 'Widget A';
We can now query the test_slot
replication slot again to see the new information published as a result of the update:
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', 'on');
This query returns a single row in JSON format, corresponding to the row updated.
lsn | xid | data
-----------+------+-------------------------------------------------------------------------------------------------------------------------
0/24EC940 | 2057 | { +
| | "change": [ +
| | { +
| | "kind": "update", +
| | "schema": "public", +
| | "table": "inventory", +
| | "columnnames": ["id", "product_name", "quantity", "last_updated"], +
| | "columntypes": ["integer", "character varying(100)", "integer", "timestamp without time zone"],+
| | "columnvalues": [1, "Widget A", 200, "2024-07-30 09:53:26.078749"], +
| | "oldkeys": { +
| | "keynames": ["id"], +
| | "keytypes": ["integer"], +
| | "keyvalues": [1] +
| | } +
| | } +
| | ] +
| | }
(1 row)
Format versions: 1 vs 2
The wal2json
plugin supports two different output format versions.
The default format version is 1, which produces a JSON object per transaction. All new and old tuples are available within this single JSON object. This format is useful when you need to process entire transactions as atomic units.
Format version 2 produces a JSON object per tuple (row), with optional JSON objects for the beginning and end of each transaction. This format is more granular and can be useful when you need to process changes on a row-by-row basis. Both formats support various options to include additional properties such as transaction timestamps, schema-qualified names, data types, and transaction IDs.
To use format version 2, you need to specify it explicitly:
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'format-version', '2');
To illustrate, we add a couple more product entries to the inventory
table:
INSERT INTO inventory (product_name, quantity) VALUES
('Widget D', 200),
('Gizmo E', 75);
Now, we can query the test_slot
replication slot again to see the new information published as a result of the update:
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', 'on', 'format-version', '2');
The output of this query appears as follows. You can see that there is a separate JSON object for each row inserted.
lsn | xid | data
-----------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0/24F18D8 | 3078 | {"action":"B"}
0/24F1940 | 3078 | {"action":"I","schema":"public","table":"inventory","columns":[{"name":"id","type":"integer","value":8},{"name":"product_name","type":"character varying(100)","value":"Widget D"},{"name":"quantity","type":"integer","value":200},{"name":"last_updated","type":"timestamp without time zone","value":"2024-07-30 10:27:45.428407"}]}
0/24F1A48 | 3078 | {"action":"I","schema":"public","table":"inventory","columns":[{"name":"id","type":"integer","value":9},{"name":"product_name","type":"character varying(100)","value":"Gizmo E"},{"name":"quantity","type":"integer","value":75},{"name":"last_updated","type":"timestamp without time zone","value":"2024-07-30 10:27:45.428407"}]}
0/24F1B10 | 3078 | {"action":"C"}
(4 rows)
wal2json
with tables without a primary key
Use REPLICA IDENTITY
is a table property that determines what information is written to the WAL when a row is updated or deleted. This property is crucial for wal2json
when working with tables that don't have a primary key.
REPLICA IDENTITY
has four possible settings:
DEFAULT
: Only primary key columns are logged forUPDATE
andDELETE
operations.USING INDEX
: A specified index's columns are logged forUPDATE
andDELETE
operations.FULL
: All columns are logged forUPDATE
andDELETE
operations.NOTHING
: No information is logged forUPDATE
andDELETE
operations.
Tables use the DEFAULT
setting by default. For tables without a primary key, this means no information is logged for updates and deletes. Let's create a table without a primary key and see how wal2json
behaves:
CREATE TABLE products_no_pk (
product_name VARCHAR(100),
quantity INTEGER,
price DECIMAL(10, 2)
);
INSERT INTO products_no_pk (product_name, quantity, price) VALUES ('Widget', 100, 19.99);
UPDATE products_no_pk SET quantity = 90 WHERE product_name = 'Widget';
The wal2json
output for this update operation will not contain any information about the updated row due to the lack of a primary key and the DEFAULT REPLICA IDENTITY
setting.
WARNING: table "products_no_pk" without primary key or replica identity is nothing
lsn | xid | data
-----------+------+---------------------
0/256D6C8 | 6151 | { +
| | "change": [+
| | ] +
| | }
(1 row)
To capture changes for tables without a primary key, we can change the REPLICA IDENTITY
to FULL
:
ALTER TABLE products_no_pk REPLICA IDENTITY FULL;
UPDATE products_no_pk SET price = 21.99 WHERE product_name = 'Widget';
Now, the wal2json
output will include both the old and new values for all columns, which can be used to identify the changed row. To verify, we can query the test_slot
replication slot again:
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', 'on');
The output of this query appears as follows:
lsn | xid | data
-----------+------+-----------------------------------------------------------------------------------------------------
0/256E228 | 6152 | { +
| | "change": [ +
| | ] +
| | }
0/256E310 | 6153 | { +
| | "change": [ +
| | { +
| | "kind": "update", +
| | "schema": "public", +
| | "table": "products_no_pk", +
| | "columnnames": ["product_name", "quantity", "price"], +
| | "columntypes": ["character varying(100)", "integer", "numeric(10,2)"], +
| | "columnvalues": ["Widget", 90, 21.99], +
| | "oldkeys": { +
| | "keynames": ["product_name", "quantity", "price"], +
| | "keytypes": ["character varying(100)", "integer", "numeric(10,2)"],+
| | "keyvalues": ["Widget", 90, 19.99] +
| | } +
| | } +
| | ] +
| | }
(2 rows)
Performance considerations
When working with wal2json
, keep the following performance considerations in mind:
- Replication slot management: Unused replication slots can prevent WAL segments from being removed, potentially causing disk space issues. Regularly monitor and clean up unused slots.
- Batch processing: Instead of processing each change individually, consider batching changes for more efficient processing.
- Resource usage: Be mindful of network bandwidth usage, especially when dealing with high-volume changes or when replicating over a wide area network. Additionally, decoding WAL to JSON can be CPU-intensive. Monitor your system's CPU usage and consider scaling your resources if needed.
Conclusion
The wal2json
plugin is a powerful tool for capturing and processing database changes in JSON format. We've seen how to enable it, configure its output, and use it in various scenarios. Whether you're implementing a data replication system, building an audit trail, or creating an event-driven architecture, wal2json
provides a flexible and efficient way to work with the Postgres Write-Ahead Log (WAL).
Resources
- wal2json GitHub Repository
- PostgreSQL Logical Decoding
- Manage logical replication in Neon - Decoder plugins
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.