Joining SQL tables on a JSON array

Joining SQL tables on a JSON array

How to join PostgreSQL tables on a JSON array of ids.

Joining tables is a common SQL operation that allows you to combine data from two or more tables into a single result set. Normally, join operations are performed on columns that have matching values in both tables.

Many-to-many relationships are a common pattern in relational databases, where two tables are related to each other through a junction table.

Let's take this example

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name TEXT
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  price NUMERIC
);

CREATE TABLE order_products (
    id SERIAL PRIMARY KEY,
  order_id int references orders(id),
  product_id int references products(id)
);

INSERT INTO orders (customer_name)
VALUES ('Alice'), ('Bob');

INSERT INTO products (name, price)
VALUES ('Product 1', 10.0), ('Product 2', 20.0), ('Product 3', 30.0), ('Product 4', 40.0);

INSERT INTO order_products (order_id, product_id)
VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4);

Here we have an orders table, a products table, and an order_products table which relates each order to its products.

If we want to get each order and its products, we use the following query.

select
  orders.id,
  customer_name,
  products.name,
  products.price
from
  orders
  join order_products on orders.id = order_products.order_id
  join products on product_id = products.id;

We get this result

idcustomer_namenameprice
1AliceProduct 110.0
1AliceProduct 220.0
1AliceProduct 330.0
2BobProduct 220.0
2BobProduct 330.0
2BobProduct 440.0

But what if I told you we don't need this intermediate table, order_products?

We can store the product ids as a JSON array in the orders table.

Let's drop the tables

DROP order_products, orders, products;

And create the new tables

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name TEXT,
  product_ids JSONB
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  price NUMERIC
);

INSERT INTO orders (customer_name, product_ids)
VALUES ('Alice', '[1, 2, 3]'), ('Bob', '[2, 3, 4]');

INSERT INTO products (name, price)
VALUES ('Product 1', 10.0), ('Product 2', 20.0), ('Product 3', 30.0), ('Product 4', 40.0);

Here we defined the products_ids in the orders table as a JSON column and inserted an array of ids into it.

So how can we join these two tables to get each order and its products as we did earlier?
We run the following query

SELECT
    orders.id,
    orders.customer_name,
    products.name,
    products.price
FROM
    orders
    CROSS JOIN jsonb_array_elements(orders.product_ids) AS product_id
    JOIN products ON products.id = product_id::text::int
ORDER BY
    orders.id;

We get the following result

idcustomer_namenameprice
1AliceProduct 110.0
1AliceProduct 220.0
1AliceProduct 330.0
2BobProduct 220.0
2BobProduct 330.0
2BobProduct 440.0

The results are the same, but a lot is going on here, so let's break it down piece by piece.

jsonb_array_elements is a PostgreSQL function that allows you to extract the elements of a JSON array and return them as a set of rows.

If we run the following query

SELECT jsonb_array_elements(orders.product_ids) FROM orders;

We get this result

jsonb_array_elements
1
2
3
2
3
4

As you can see, this returned all of the ids in the product_ids field as rows.

CROSS JOIN is a type of join in SQL that returns the Cartesian products of two tables.

In other words, it returns all possible combinations of rows from the two tables, without any condition for joining them.

But it can also work on functions. You can use a function to generate a set of values, and then combine those values with another table using CROSS JOIN.

So this query

SELECT
  id,
  customer_name,
  product_id
FROM
  orders
  CROSS JOIN jsonb_array_elements (orders.product_ids) AS product_id;

returns this result

idcustomer_nameproduct_id
1Alice1
1Alice2
1Alice3
2Bob2
2Bob3
2Bob4

We can join this result on the products table as we did to get all orders and their products.

product_id::text::int casts the value to an integer.

What is a JSONB type?

You might wonder what is JSONB, JSONB is a binary format for storing JSON data.

Unlike the JSON data type, which stores JSON data as plaintext, JSONB stores the JSON data in a binary format.

The benefits are:

  • Efficient storage: `JSONB` stores data in a binary format that is more compact than plain text, leading to smaller table sizes and faster data access.

  • - Fast queries: `JSONB` supports indexing and querying of JSON data using various operators and functions, which can improve query performance.

  • - Data validation: `JSONB` provides some level of data validation, as it only allows valid JSON data to be stored in a column with this data type.

Considerations

There are some things to consider in this approach.

Firstly, updates and deletes are a bit more complex.
In the first approach where we had an order_products table, adding or deleting products from an order was easy, we just insert a new row or delete an existing one.
But if we want to update the product_ids array we use this

UPDATE
  orders
SET
  product_ids = jsonb_set(product_ids, '{2}', '4')
WHERE
  id = 1;

instead of

UPDATE
  order_products
SET
  product_id = 4
WHERE
  id = 1;

Secondly, is this a good design?
JSON fields in SQL are useful for arbitrary data that is difficult to model, but if the data follows a pattern, then we should probably model the data in a structured way, in this case by using a junction table in the first example of this article.

So, you probably shouldn't use JSON in this case, but I hope this article is useful in case you do need to join tables on a JSON field that doesn't follow a pattern.