← All Posts

The Booking That Took Itself: Solving Double-Booking with a Sentinel Lock

SELECT FOR UPDATE prevents two people from booking the same slot — except when the slot is the first one of the day. Here's the edge case, and the one-line fix.

A booking platform’s core invariant is simple to state and surprisingly easy to violate under load: two clients should never be able to book the same time slot. The standard fix is row-level locking. The standard fix has an edge case that only shows up on an empty calendar.

The Naive Version

The obvious implementation checks for conflicts, then creates the booking:

def create_booking(service, start, end, ...):
    overlapping = Appointment.objects.filter(
        status__in=['confirmed', 'pending_payment'],
        start_time_utc__lt=end,
        end_time_utc__gt=start,
    )
    if overlapping.exists():
        raise SlotTakenError()

    return Appointment.objects.create(...)

This works in every manual test. It fails under concurrent load, because there’s a gap between the check and the write. Two requests for the same slot can both run the .exists() check, both see zero conflicts, and both proceed to .create(). Now there are two appointments for one slot.

SELECT FOR UPDATE

PostgreSQL’s SELECT FOR UPDATE locks the selected rows until the transaction commits. Wrapping the check in a transaction with row locks closes the gap:

with transaction.atomic():
    overlapping = Appointment.objects.select_for_update().filter(
        status__in=['confirmed', 'pending_payment'],
        start_time_utc__lt=end,
        end_time_utc__gt=start,
    )
    if overlapping.exists():
        raise SlotTakenError()

    return Appointment.objects.create(...)

Now, if two requests arrive for the same slot: the first request’s SELECT FOR UPDATE acquires a lock on the matching row(s) and proceeds. The second request’s SELECT FOR UPDATE for the same row range blocks — it waits for the first transaction to commit. Once it does, the second request re-evaluates the filter, now sees the first booking, and raises SlotTakenError correctly.

This is correct — as long as there’s a row to lock.

The Empty-Day Edge Case

Here’s the case that breaks it: it’s the first booking of the day. The Appointment table has zero rows that overlap the requested slot — because there are zero appointments at all for that day.

SELECT FOR UPDATE on a query that matches zero rows locks nothing. There is nothing to lock. Both concurrent requests run the filter, both get an empty queryset, both pass the .exists() check, and both proceed to create an appointment for the exact same slot.

The double-booking protection works perfectly — except on the one slot per day where it’s needed for the first booking. Which, statistically, is not rare. It’s the most contested slot of the day, because every client’s calendar shows it as the first available option.

The Fix: Lock Something That Always Exists

The fix is to lock a row that’s guaranteed to exist regardless of how many appointments there are — a sentinel. The natural candidate is the Service row itself: every booking is for some service, and that service row exists before any appointment does.

with transaction.atomic():
    # Lock the Service row as a sentinel. This row always exists,
    # so this SELECT FOR UPDATE always acquires a lock — serializing
    # concurrent requests even when zero Appointment rows match.
    Service.objects.select_for_update().get(pk=service.pk)

    overlapping = Appointment.objects.select_for_update().filter(
        status__in=['confirmed', 'pending_payment'],
        start_time_utc__lt=end_time_utc,
        end_time_utc__gt=start_time_utc,
    )
    if overlapping.exists():
        raise SlotTakenError('This slot was just taken. Please choose another time.')

    appointment = Appointment.objects.create(...)

Now, two concurrent requests for the first slot of an empty day both attempt to lock the same Service row. One acquires the lock immediately; the other blocks until the first transaction commits. By the time the second request’s lock is granted, the first booking exists — its overlapping query now returns a row, and it correctly raises SlotTakenError.

One extra SELECT ... FOR UPDATE on a row that was going to be read anyway, and the race condition closes completely — for the populated case and the empty case.

Why This Matters Beyond Booking Systems

The general pattern is: row-level locking only protects rows that exist. Any “check for conflicts, then act” pattern where the conflict set can legitimately be empty has this gap. Inventory systems checking “is there an existing reservation for this SKU,” queue systems checking “is there already a job for this key,” uniqueness checks on a table that might currently have zero matching rows — all of the same shape.

The fix generalizes too: lock a parent or sibling row that’s guaranteed to exist — a category row, a tenant row, a config singleton — to serialize access to a conflict set that might itself be empty. The lock doesn’t need to be semantically meaningful on its own; it just needs to exist and to be the same row across concurrent requests for the same logical resource.

Verifying It

This is the kind of bug that’s invisible in sequential testing and in low-traffic staging environments — it requires genuinely concurrent requests against an empty table. The test that catches it spins up two threads (or two async tasks) targeting the same empty-calendar slot simultaneously and asserts that exactly one succeeds:

def test_concurrent_booking_first_slot_of_day(self):
    """Two simultaneous bookings for the first slot of an empty
    day — exactly one should succeed."""
    results = []

    def attempt():
        try:
            create_booking(service_id=self.service.id, start_time_utc=self.slot, ...)
            results.append('ok')
        except SlotTakenError:
            results.append('taken')

    t1 = threading.Thread(target=attempt)
    t2 = threading.Thread(target=attempt)
    t1.start(); t2.start()
    t1.join(); t2.join()

    self.assertEqual(sorted(results), ['ok', 'taken'])

Without the sentinel lock, this test is flaky — it passes most of the time and fails under timing pressure, which is exactly the kind of bug that survives code review and surfaces in production three weeks after launch on the first genuinely busy day.