Practical Cursor Pagination

This is my first published article in the system design category. (I've had other articles in draft for a year or more about designing distributed systems for criminal organizations in year 4XXX. (I don't know when I'll get to publishing that work.))

Let's talk about cursor based pagination, which is often used for feeds or ultra large sets of data (facebook/twitter/instagram feed, google search, etc.). I'm assuming you know why you want cursor pagination as opposed to other methods1.

If your database returns serializable cursors, you don't have to implement this on your own for data. For the rest of us, there's this article2.

At a glance, how does it work?

  • The browser requests new data and provides details of its current page of data, which we call a cursor, C. C might be a hash, or perhaps be human-readable. C may be null if the browser is requesting the first page.
  • The server must know 2 things from C:
    • where the search left off, and
    • which direction to go (next page or previous page?)
  • A server gets the browser request. The server decodes C and pulls out the id for Record50 and also also pulls out the paging direction for next page. The server now knows to start searching for the next records after Record50 from a database. We assume Record50 can be efficiently found, otherwise there would be no performance benefit from choosing cursor pagination.
  • Importantly, when I say after Record50, the after part means we maintain whatever sort order was specified. If Record50 was found from sorting by lastName DESC, then breaking ties on telephoneNumber ASC, the next record isn't necessarily Record51, just the record that would have come up the response to the browser included one more row.
  • So the server must know the sort order to use. The browser usually sends that information, or it is known ahead of time by the server.
  • The server sends the data AND the cursor (or enough information for the browser to build the cursor), so the browser can send it back later.
  • We'll cover backwards paging after we cover forwards paging.

Therefore, in addition to two things from C, the server must know the sort order the record came from. What's left is to build that query, return results, and wait for the browser to trigger the whole process over again until the user leaves your majestic site.

Build that Query

This is the crux of it all. If you know the sort order, and have efficient indices to for the rows you sort by, only building the query remains. This reduces the problem of cursor pagination to string/query building.

Here's a function from leetcode-archive's source code that reads a list of comma separated sort specs like "challengeDate DESC, rowid DESC"3, and churns out a WHERE clause in SQL to start getting records for the user's next page of results.

(defun build-where-recursive (sort-specs id)
  (let* ((pair (string-split (car sort-specs)))
         (key (car pair))
         (dir (cadr pair)))
    (if (null (cdr sort-specs))
        ;; Base case, we assume the last sort spec sorts on a unique key,
        ;; thus no other row can have an equal value to key in the table.
        (format "%1$s %2$s (select %1$s from dailies_test where rowid = %3$s)"
                key
                (if (equal "ASC" dir) ">" "<")
                id)
      ;; General case that allows ties. We recurse to the next sort spec.
      (format "%1$s %2$s (select %1$s from dailies_test where rowid = %3$s) OR
  (
    %1$s = (select %1$s from dailies_test where rowid = %3$s) AND
    (
      %4$s
    )
  )"
              key
              (if (equal "ASC" dir) ">" "<")
              id
              (build-where-recursive (cdr sort-specs) id)))))

In the forward page case, this returns a clause like

challengeDate < (select challengeDate from dailies_test where rowid = 50) OR
  (
    challengeDate = (select challengeDate from dailies_test where rowid = 50) AND
    (
      rowid < (select rowid from dailies_test where rowid = 50)
    )
  )

which will be later part of this query (you can ignore the select and the date range):

select rowid, challengeDate, questionFrontendId, title, link, difficulty from dailies_test
where lastName >= "2024-12-09" and lastName <= "2025-01-28"

-- BEGIN paging clause
AND (
  challengeDate < (select challengeDate from dailies_test where rowid = 50) OR
  (
    challengeDate = (select challengeDate from dailies_test where rowid = 50) AND
    (
      rowid < (select rowid from dailies_test where rowid = 50)
    )
  )
)
-- END of paging clause

ORDER BY challengeDate DESC,rowid DESC

limit 10

The clause on the left of each OR is if the cursor's challengeDate has no ties. On the right is when there is a tie–we use the next sort spec, in this case rowid, to break ties.

More sort specs

Next let's try challengeDate desc,questionFrontendId ASC,rowid desc. Try to make a prediction of how the where clause would look.

Did your prediction match this?

challengeDate < (select challengeDate from dailies_test where rowid = 50) OR
  (
    challengeDate = (select challengeDate from dailies_test where rowid = 50) AND
    (
      questionFrontendId > (select questionFrontendId from dailies_test where rowid = 50) OR
  (
    questionFrontendId = (select questionFrontendId from dailies_test where rowid = 50) AND
    (
      rowid < (select rowid from dailies_test where rowid = 50)
    )
  )
    )
  )

ORDER BY challengeDate desc,questionFrontendId ASC,rowid desc

Paging Backwards

Remember, that C gives us both an id and a paging direction. If the paging direction is backwards, we have to find records BEFORE the id. Say now our id references Record40, which was the first record of the page from the database. (The UI can separately sort the page on its own, so I distinguish the page, as the sequence of records given by the data store).

But what does that mean to our sort orders?

We reverse them.

We invert every DESC into ASC, and every ASC into DESC for data lookup.

Thus, for the natural sort order for the records as challengeDate DESC,rowid DESC, we invert them before calling our function. Thus, we input challengeDate ASC,rowid ASC into build-where-recursive.

And it generates:

challengeDate > (select challengeDate from dailies_test where rowid = 40) OR
(
  challengeDate = (select challengeDate from dailies_test where rowid = 40) AND
  (
    rowid > (select rowid from dailies_test where rowid = 40)
  )
)

And the full query:

select * from (
      select rowid, challengeDate, questionFrontendId, title, link, difficulty from dailies_test
  where challengeDate >= "2024-12-09" and challengeDate <= "2025-01-28"

AND (challengeDate > (select challengeDate from dailies_test where rowid = 1369) OR
  (
    challengeDate = (select challengeDate from dailies_test where rowid = 1369) AND
    (
      rowid > (select rowid from dailies_test where rowid = 1369)
    )
  ))

  ORDER BY challengeDate ASC,rowid ASC

  limit 10

    ) order by challengeDate DESC,rowid DESC;

Going into 4,5,n sort specs is the same. build-where-recursive handles it the same.

What's the outer select doing?

You may have noticed the nested query. It's because when we get records paged backward, the records from the inner query are in reverse order. The outer query merely reverses the order again to return records in the natural sort order used in the application.

What If I'm sorting by nullable values?

The above code doesn't handle nullable fields. Whatever sort order you've decided on regarding nulls, you have to flip them when paging for previous page.

Example: If the sort order in play contains a sort spec like

ORDER BY name DESC, email IS NULL, id ASC

you do this when paging backwards:

ORDER BY name ASC, email IS NOT NULL, id DESC

Since the order by syntax can contain computed values as well, it's harder to write a generalized inverse for those cases. Consult this stackoverflow post: sql-how-to-make-null-values-come-last-when-sorting-ascending

Last Note On the Client

When implementing cursor pagination, your server sends the cursor to the client. The client then has the responsibility of sending the first entry of the page's data for paging backwards. Or for paging forward, the last entry of the data. The client may not necessarily be able to decode it however, it might be a hash. We call this an opaque cursor.

And of course, if your database gives you serializable cursors, your app server should send that to the browser. When the browser sends it back to you, your server deserializes it, and uses it to get the next records.

Attack

Practice what you've learned, use my free site (warning, this site cold starts) https://leetcode-archive.onrender.com/ and try out the cursor pagination. The source code is public domain.

Lastly, if you're a law enforcement official from year 4XXX, consider hiring me or sending me a warm pair of gloves because it is cold this winter. Get algorithm lectures at https://www.youtube.com/channel/jwow-0 or like and subscribe for emacs videos.

Thanks!


2

I'm using emacs sqlite library. If SQLite provides serializable cursors, great, but the emacs sqlite connector certainly hasn't exposed them to me in emacs 30.0.50.

3

Note that you always want a final tie breaker (here I use id, which must be unique), even if no sort order is provided by the user. Otherwise the DB engine can decide to return records in any order–which potentially breaks pagination.