This is my first published article in the system design category. (I've had other articles in draft for a year or more about designing distributed systems for criminal organizations in year 4XXX. (I don't know when I'll get to publishing that work.))
Let's talk about cursor based pagination, which is often used for feeds or ultra large sets of data (facebook/twitter/instagram feed, google search, etc.). I'm assuming you know why you want cursor pagination as opposed to other methods1.
If your database returns serializable cursors, you don't have to implement this on your own for data. For the rest of us, there's this article2.
At a glance, how does it work?
- The browser requests new data and provides details of its current page of data, which we call a cursor,
C
.C
might be a hash, or perhaps be human-readable.C
may be null if the browser is requesting the first page. - The server must know 2 things from
C
:
- where the search left off, and
- which direction to go (next page or previous page?)
- where the search left off, and
- A server gets the browser request. The server decodes
C
and pulls out the id forRecord50
and also also pulls out the paging direction for next page. The server now knows to start searching for the next records afterRecord50
from a database. We assumeRecord50
can be efficiently found, otherwise there would be no performance benefit from choosing cursor pagination. - Importantly, when I say after
Record50
, the after part means we maintain whatever sort order was specified. IfRecord50
was found from sorting bylastName DESC
, then breaking ties ontelephoneNumber ASC
, the next record isn't necessarilyRecord51
, just the record that would have come up the response to the browser included one more row. - So the server must know the sort order to use. The browser usually sends that information, or it is known ahead of time by the server.
- The server sends the data AND the cursor (or enough information for the browser to build the cursor), so the browser can send it back later.
- We'll cover backwards paging after we cover forwards paging.
Therefore, in addition to two things from C
, the server must know the sort order the record came from. What's left is to build that query, return results, and wait for the browser to trigger the whole process over again until the user leaves your majestic site.
Build that Query
This is the crux of it all. If you know the sort order, and have efficient indices to for the rows you sort by, only building the query remains. This reduces the problem of cursor pagination to string/query building.
Here's a function from leetcode-archive's source code that reads a list of comma separated sort specs like "challengeDate DESC, rowid DESC"
3, and churns out a WHERE clause in SQL to start getting records for the user's next page of results.
(defun build-where-recursive (sort-specs id) (let* ((pair (string-split (car sort-specs))) (key (car pair)) (dir (cadr pair))) (if (null (cdr sort-specs)) ;; Base case, we assume the last sort spec sorts on a unique key, ;; thus no other row can have an equal value to key in the table. (format "%1$s %2$s (select %1$s from dailies_test where rowid = %3$s)" key (if (equal "ASC" dir) ">" "<") id) ;; General case that allows ties. We recurse to the next sort spec. (format "%1$s %2$s (select %1$s from dailies_test where rowid = %3$s) OR ( %1$s = (select %1$s from dailies_test where rowid = %3$s) AND ( %4$s ) )" key (if (equal "ASC" dir) ">" "<") id (build-where-recursive (cdr sort-specs) id)))))
In the forward page case, this returns a clause like
challengeDate < (select challengeDate from dailies_test where rowid = 50) OR ( challengeDate = (select challengeDate from dailies_test where rowid = 50) AND ( rowid < (select rowid from dailies_test where rowid = 50) ) )
which will be later part of this query (you can ignore the select and the date range):
select rowid, challengeDate, questionFrontendId, title, link, difficulty from dailies_test where lastName >= "2024-12-09" and lastName <= "2025-01-28" -- BEGIN paging clause AND ( challengeDate < (select challengeDate from dailies_test where rowid = 50) OR ( challengeDate = (select challengeDate from dailies_test where rowid = 50) AND ( rowid < (select rowid from dailies_test where rowid = 50) ) ) ) -- END of paging clause ORDER BY challengeDate DESC,rowid DESC limit 10
The clause on the left of each OR
is if the cursor's challengeDate
has no ties. On the right is when there is a tie–we use the next sort spec, in this case rowid, to break ties.
More sort specs
Next let's try challengeDate desc,questionFrontendId ASC,rowid desc
. Try to make a prediction of how the where clause would look.
Did your prediction match this?
challengeDate < (select challengeDate from dailies_test where rowid = 50) OR ( challengeDate = (select challengeDate from dailies_test where rowid = 50) AND ( questionFrontendId > (select questionFrontendId from dailies_test where rowid = 50) OR ( questionFrontendId = (select questionFrontendId from dailies_test where rowid = 50) AND ( rowid < (select rowid from dailies_test where rowid = 50) ) ) ) ) ORDER BY challengeDate desc,questionFrontendId ASC,rowid desc
Paging Backwards
Remember, that C
gives us both an id and a paging direction. If the paging direction is backwards, we have to find records BEFORE the id. Say now our id references Record40
, which was the first record of the page from the database. (The UI can separately sort the page on its own, so I distinguish the page, as the sequence of records given by the data store).
But what does that mean to our sort orders?
We reverse them.
We invert every DESC
into ASC
, and every ASC
into DESC
for data lookup.
Thus, for the natural sort order for the records as challengeDate DESC,rowid DESC
, we invert them before calling our function. Thus, we input challengeDate ASC,rowid ASC
into build-where-recursive
.
And it generates:
challengeDate > (select challengeDate from dailies_test where rowid = 40) OR ( challengeDate = (select challengeDate from dailies_test where rowid = 40) AND ( rowid > (select rowid from dailies_test where rowid = 40) ) )
And the full query:
select * from ( select rowid, challengeDate, questionFrontendId, title, link, difficulty from dailies_test where challengeDate >= "2024-12-09" and challengeDate <= "2025-01-28" AND (challengeDate > (select challengeDate from dailies_test where rowid = 1369) OR ( challengeDate = (select challengeDate from dailies_test where rowid = 1369) AND ( rowid > (select rowid from dailies_test where rowid = 1369) ) )) ORDER BY challengeDate ASC,rowid ASC limit 10 ) order by challengeDate DESC,rowid DESC;
Going into 4,5,n sort specs is the same. build-where-recursive
handles it the same.
What's the outer select doing?
You may have noticed the nested query. It's because when we get records paged backward, the records from the inner query are in reverse order. The outer query merely reverses the order again to return records in the natural sort order used in the application.
What If I'm sorting by nullable values?
The above code doesn't handle nullable fields. Whatever sort order you've decided on regarding nulls, you have to flip them when paging for previous page.
Example: If the sort order in play contains a sort spec like
ORDER BY name DESC, email IS NULL, id ASC
you do this when paging backwards:
ORDER BY name ASC, email IS NOT NULL, id DESC
Since the order by syntax can contain computed values as well, it's harder to write a generalized inverse for those cases. Consult this stackoverflow post: sql-how-to-make-null-values-come-last-when-sorting-ascending
Last Note On the Client
When implementing cursor pagination, your server sends the cursor to the client. The client then has the responsibility of sending the first entry of the page's data for paging backwards. Or for paging forward, the last entry of the data. The client may not necessarily be able to decode it however, it might be a hash. We call this an opaque cursor.
And of course, if your database gives you serializable cursors, your app server should send that to the browser. When the browser sends it back to you, your server deserializes it, and uses it to get the next records.
Attack
Practice what you've learned, use my free site (warning, this site cold starts) https://leetcode-archive.onrender.com/ and try out the cursor pagination. The source code is public domain.
Lastly, if you're a law enforcement official from year 4XXX, consider hiring me or sending me a warm pair of gloves because it is cold this winter. Get algorithm lectures at https://www.youtube.com/channel/jwow-0 or like and subscribe for emacs videos.
Thanks!
See https://brunoscheufler.com/blog/2022-01-01-paginating-large-ordered-datasets-with-cursor-based-pagination, which helped me learn this topic.
I'm using emacs sqlite library. If SQLite provides serializable cursors, great, but the emacs sqlite connector certainly hasn't exposed them to me in emacs 30.0.50.
Note that you always want a final tie breaker (here I use id
, which must be unique), even if no sort order is provided by the user. Otherwise the DB engine can decide to return records in any order–which potentially breaks pagination.