Context

The requirement is to implement a new database design for managing a variety of digital assets. Each asset has an associated identifier (hash) and metadata. The hash is calculated using a tool like sha512sum or phash. Metadata is extracted in JSON format using tools like ffprobe or exiftool. Ideally, only specific required metadata fields will be stored in the database, but that makes future changes more complex and requires making certain assumptions about the type of digital asset. To make the database as general purpose as possible, the database management system (DBMS) needs to support JSON documents.

PostgreSQL

PostgreSQL is an open source relational database that supports JSON types. To better understand the configuration of this database, it is beneficial to briefly cover some terminology and hierarchies.

PostgreSQL uses a client-server model. The server accepts client connections at a specified port (5432 by default) and stores data at a specified location (typically /usr/local/pgsql/data or /var/lib/postgresql/data). The server can manage any number of databases stored in this data directory; this forms a single database cluster. The database itself consists of schemas; schemas act as a namespace for organizing database objects like tables and funtions (SQL queries).

PostgreSQL uses roles and priveleges for access control. Roles are created and assigned permissions to perform specific actions:

create role role_name password 'role_password'; 
grant usage on schema schema_name to role_name;  # allows accessing objects under specified schema
grant INSERT on schema.table to role_name;       # allows action on specified object

The specified action can be ALL, or INSERT, SELECT, UPDATE, DELETE and so on.

Implementation

Docker is used to deploy the services. Three services are used:

  • db: the PostgreSQL server and database cluster
  • adminer: webpage that allows running SQL commands
  • api: PostgREST web server that exposes a REST API

psql is a terminal based client interface for the PostgreSQL server. pgAdmin and myPhpAdmin are graphical interfaces similar to adminer; GUIs make running SQL commands easier by elimating the need for terminal access. Adminer can be used to create schemas, perform CRUD operations on tables, create roles, assign permissions and execute other SQL commands.

The api is used to enable clients to perform actions without specialized tools for connecting to the database; only the ability to send HTTP requests is required.

docker-compose.yml

services:

  db:
    image: postgres:18.2
    restart: always
    environment:
      POSTGRES_USER: example_user
      POSTGRES_PASSWORD: example_password
      POSTGRES_DB: example_db
    volumes:
      - pgdata:/var/lib/postgresql

  adminer:
    image: adminer:5.4.2
    restart: always
    ports:
      - 8080:8080
    depends_on:
      - db

  api:
    image: postgrest/postgrest:v14.5
    ports:
      - 3000:3000
    environment:
      PGRST_SERVER_HOST: 0.0.0.0
      PGRST_DB_URI: postgres://authenticator:example_auth@db:5432/example_db
      PGRST_OPENAPI_SERVER_PROXY_URI: http://127.0.0.1:3000
      PGRST_JWT_SECRET: 'aThirtyTwoCharactersSecretForJWT'
    depends_on:
        - db

volumes:
  pgdata:

For PGRST_DB_URI, the authenticator role and example_auth password will be created and set later.

Database roles

Two roles are used to seperate authentication from authorization. Clients connect to the authenticator role by providing the correct password, but this role does not have any permissions. The client can then switch to another role (SET ROLE role_name) which has the required permissions.

The following can be executed using a client interface (like psql/Adminer):

create role role_name nologin;
create role authenticator noinherit login password 'admin_password';

grant role_name to authenticator;

grant usage on schema schema_name to role_name;
grant all on schema.table to role_name;
  • nologin disables connecting directly as the specified role
  • noinherit prevents default permissions from being automatically assigned

API token generation

The following steps are from the PostgREST documentation.

Generate a JWT secret and set the relevant variable in docker-compose.yml:

echo $(< /dev/urandom tr -dc A-Za-z0-9 | head -c32)

Create a signed token using bash and openssl, replacing role_name with the nologin role:

#!/bin/bash
set -e

JWT_SECRET='test_secret_that_is_at_least_32_characters_long'

_base64 () { openssl base64 -e -A | tr '+/' '-_' | tr -d '='; }

header=$(echo -n '{"alg":"HS256","typ":"JWT"}' | _base64)
payload=$(echo -n "{\"role\":\"role_name\"}" | _base64)
signature=$(echo -n "$header.$payload" | openssl dgst -sha256 -hmac "$JWT_SECRET" -binary | _base64)

echo -n "$header.$payload.$signature"

API Usage

Export your token:

export TOKEN=''

To retrieve records:

curl https://domain_name.org/table_name -X GET \
    -H "Authorization: Bearer $TOKEN"

To insert new data:

curl https://domain_name.org/table_name -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{"column1": "value", "column2": {key1: value1, key2: value2}, "column3": 3 }"

Troubleshooting

There are two different ways of approaching troubleshooting - by starting with something that doesn’t work and removing/modifying components until does, or by starting with something that does work and adding components until it doesn’t. The second approach is easier when there are many things that could go wrong.

A common error that may be faced is that the binding port is already used. The syntax for port binding is - host_port:container_port. The host post may need to be changed if there is a conflict.

When modifying the docker-compose.yml, if you mount the pgdata volume to /var/lib/postgresql/data and try to access the API, it will state that the JWT secret is not set. If you have a look at logs of the PostgreSQL service, you will notice that it is actually not running - the error references a pull request which requires that the volume be mounted at /var/lib/postgresql instead for PostgreSQL v18+. Determining this true cause by troubleshooting the API would have been tedious and time consuming.

The PostgREST REST API requires cache invalidation; if faced with errors of the format Could not find xxx in the schema cache and after verification that xxx indeed exists in the specified schema, simply stop and start the service to update cache.

Note, one PostgREST instance can only connect to one database instance. If API access is required for multiple databases in the same database cluster, each database requires a PostgREST instance. Depending on your requirements, using multiple schemas may be a better option.