Org Attachment Investigation: Forced Absolute Paths on Export

In this post I investigate how org attachment links behave on export. Mainly I care about how the HTML exporter renders them: it forces absolute paths, which I found surprising and abnormal.

I'm running Emacs 30.0.50, a bleeding-edge build, and I ran into a situation where an attachment: link to clipboard-20241230T022004.png exports this <img> html tag:

<img src="file:///home/furaro/src/my-site/src/content/data/7d/167a0f-5ae4-4f45-bd29-62ec6e464173/clipboard-20241230T022004.png" alt="clipboard-20241230T022004.png">

But in html, your browser can't render that image. The link is broken, and you would instead see the alt text "clipboard-20241230T022004.png".

What we'd really like is just to change the src attribute to something like

"data/7d/167a0f-5ae4-4f45-bd29-62ec6e464173/clipboard-20241230T022004.png"

So that after exporting, the html file can follow that path, relative to itself, to find the image. I have not customized org-attach. Others online say they get relative paths, and that they struggled to get absolute paths (link).

For this article, I consulted https://orgmode.org/manual/Advanced-Export-Configuration.html a lot.

Jump to tl;dr for the reasons.

Org Vocabulary

  • backend: this is the format org is being told to export to. Examples are html, odt, latex, and many others. Check all your loaded backends in org-export-registered-backends
  • AST: abstract syntax tree. This is a representation of an org document that's easy for lisp to manipulate. It's a tree data structure.
  • parsing: this is the process of building the AST. Loosely, org copies your org buffer into a temporary buffer and certain actions like macro expansion, #+includes, and comment removal happen on this temporary buffer.
  • transcoder: a translator. It is a function that takes some element inside org mode (like a source block) and transforms it into a string that the backend's target format understands.
  • derived backend: a backend that has a parent. When creating a custom backend for export, you specify only the transcoders you want to change; everything else, say a link, falls back to the parent's transcoder. When deriving a backend, one important keyword property is :translate-alist, which is the list of transcoders you specify.
  • org-link-parameters: This is a list of link types and their behavior when certain actions are taken. Most commonly, you :follow a link, and the follow function is then used. In our case, we mostly will care about the :export. Read the help doc for the many more parameters.
    Here's an example of a link type for [[nov:a-path-i-made-up]]
    (("nov" :follow nov-org-link-follow :store nov-org-link-store)...)
    
  • filters: functions that run after a transcoder and receive the transcoded string. The positional arguments for a filter are (text backend info), where info is a giant context object that gets populated during parsing.
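
To make the vocabulary concrete, here is a minimal sketch of a derived backend with one transcoder and one filter. The names my-html, my/html-link and my/link-filter are hypothetical and only serve as illustration; the transcoder just logs what it receives and then defers to the parent html backend.

```elisp
(require 'ox-html)

(defun my/html-link (link contents info)
  "Transcode LINK for the hypothetical `my-html' backend.
Log the link's type and path, then defer to the parent html backend."
  (message "link type=%s path=%s"
           (org-element-property :type link)
           (org-element-property :path link))
  (org-export-with-backend 'html link contents info))

(defun my/link-filter (text _backend _info)
  "Log TEXT, the already transcoded link, and return it unchanged."
  (message "filter saw: %s" text)
  text)

;; Only `link' is overridden; every other element falls back to html.
(org-export-define-derived-backend 'my-html 'html
  :translate-alist '((link . my/html-link))
  :filters-alist '((:filter-link . my/link-filter)))
```

With an org buffer current, something like (org-export-to-buffer 'my-html "*my-html export*") should run the export through this backend and print both messages for each link.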

tl;dr

org-attach.el bypasses transcoders and the :export functionality of org-link-parameters. Why it does this, I don't know.

How it does that is interesting:

;; This is the last line of code in org-attach.el
(add-hook 'org-export-before-parsing-functions 'org-attach-expand-links)

org-attach.el decides to rewrite the temporary buffer before the AST is even parsed.

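When AST parsing happens, instead of a link element whose type is attachment, we get a link element whose type is file and whose path is already absolute.

Thus even the link type (attachment) is lost, and replaced with file.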
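
One way to watch this happen is to add a second function to the same hook and have it report what the parser is about to see. This is only a sketch: my/report-link-types is a hypothetical name, and it assumes org-attach's hook was added at the default depth so that a depth of 90 runs after it.

```elisp
(require 'ox)
(require 'org-element)

(defun my/report-link-types (_backend)
  "Log the type and path of every link in the export copy of the buffer."
  (message "links seen by the parser: %S"
           (org-element-map (org-element-parse-buffer) 'link
             (lambda (link)
               (cons (org-element-property :type link)
                     (org-element-property :path link))))))

;; A large depth places this after `org-attach-expand-links' in the hook,
;; so the report reflects the buffer after the rewrite.
(add-hook 'org-export-before-parsing-functions #'my/report-link-types 90)
```

Remove it again with (remove-hook 'org-export-before-parsing-functions #'my/report-link-types) once you have seen enough.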

In depth

Let's look at why certain solutions don't work. For that, let's see what the help says:

org-export-before-parsing-functions is a variable defined in ox.el.

Value
(org-attach-expand-links)

Documentation
Abnormal hook run before parsing an export buffer.

This is run after include keywords and macros have been expanded
and Babel code blocks executed, on a copy of the original buffer
being exported.

And we run org-attach-expand-links, replacing the link with

(concat "file:" (org-attach-expand file))

And org-attach-expand always expands to a full file path.

(defun org-attach-expand (file)
  "Return the full path to the current entry's attachment file FILE.
Basically, this adds the path to the attachment directory."
  (expand-file-name file (org-attach-dir)))

No customization option influences this code path. Sigh.

To recap: org-attach.el rewrites the temporary buffer that org uses, BEFORE the temp buffer is parsed into its abstract syntax tree. The transcoders run after the AST is built, which explains why methods that hook into the transcoding phase do not work. This means that even creating your own derived backend will not work on its own, unless you also do something to influence the behavior of org-export-before-parsing-functions.

Non solutions:

For this puzzle, these are the most likely approaches I think readers will attempt, which, at least as I've coded them, won't work.

  • Don't use attachments
    This is fine, you can just use file: style links but that defeats the purpose of the investigation for people who rely on org attachments.
  • Setting an :export function for the attachment
    (defun my-fun (path desc backend info)
      ;; Pass PATH as a format argument so a stray % in it cannot break `message'.
      (message "This is the link path: %s" path)
      ;; now do stuff to return the html string
      )

    ;; Cannot snag on the debugger, message never happens
    (org-link-set-parameters "attachment" :export #'my-fun)

    If this function ran, path would be "clipboard-20241230T022004.png", but it never triggers. It would be a rather clean solution, since it fires only for attachment: links. An export backend is supposed to detect links with custom protocols using the built-in org-export-custom-protocol-maybe function; ox-html does, for example.

    One caveat of this method is that you have to return a string specific to whatever the backend target is (latex, html, markdown, etc.). This is a hit for maintainability.

  • Defining a custom transcoder (maybe in your own derived backend)
    for something like (link . jwow/org-attach-link)

    The link element gets passed to ox-html's org-html-link, but its :path has already been populated with an absolute path, because the buffer the AST is parsed from was rewritten beforehand.

    You can verify in org-html-link, calling this line from edebug:

    (org-element-property :path link)
    

    You'll be able to detect the long UUID that's characteristic of an org attachment, but this is hacky.

  • Operating after the transcoder, hooking into the org-export-filter-link-functions mechanism.
    Again, like the last approach, you have to guess from the full file path that this WAS an attachment, then most likely do a string replace operation.
    (defun my-org-link-filter (text backend info)
      ;; Use "%s" so a % inside the HTML is not read as a format directive.
      (message "%s" text)
      ;; do something to text
      )
    
    (add-to-list 'org-export-filter-link-functions #'my-org-link-filter)
    
    ;; Text in this case is
    ;; <img src="file:///home/furaro/src/temp/data/80/943450-583e-4236-a2d1-1e926fbb15bd/clipboard-20241230T000706.png" alt="clipboard-20241230T000706.png">
    

It is fair to note that choosing a distinctive directory name for org-attach to use can work: you can then regexp-replace on that name, at the cost of coupling the two systems.
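
As a rough sketch of that filter-based workaround for html only: the variable my/site-root below is hypothetical and stands for the directory that holds the org file (here taken from the example output above). Note that it strips the prefix from every file link under that directory, not just from former attachments.

```elisp
(require 'ox-html)

(defvar my/site-root "/home/furaro/src/temp/"
  "Hypothetical: directory holding the org file, with a trailing slash.")

(defun my/org-attach-relativize-filter (text backend _info)
  "Strip the file://`my/site-root' prefix from TEXT for html-like backends.
Return nil for other backends, which org treats as \"leave TEXT alone\"."
  (when (org-export-derived-backend-p backend 'html)
    (replace-regexp-in-string
     (concat "file://" (regexp-quote my/site-root))
     "" text t t)))

(add-to-list 'org-export-filter-link-functions
             #'my/org-attach-relativize-filter)
```

Applied to the <img> tag shown above, the src attribute would then start with data/80/ instead of file:///home/furaro/src/temp/data/80/.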

One solution

At first I thought the best solution was to remove this behavior of expanding links before parsing:

(remove-hook 'org-export-before-parsing-functions 'org-attach-expand-links)

After that, a lot of reasonable approaches are possible.

But let me tell you why I decided against that.

I didn't want to maintain different :export backend targets, and I wasn't interested in tracing through how org decides to generate the html for attachments. In fact, I liked how images are simply handled by the file link, probably one of the most used org link types. Can we hook into the machinery for that?

This was likely what led to the design decision in org-attach.el: they didn't want a new link type to bloat the codebase with support for html, md, latex, odt, and others. Even the docs say attachments should behave like files.

I just wish there were a customization setting to specify whether the link should appear relative or absolute.

So all I did was change org-attach-expand after all:

(defun jwow/org-attach-expand (file)
  "Return the path of attachment FILE relative to the org file's directory.
FILE is the path part of a link that looks like [[attachment:FILE]].
The returned string may start with data/0a/39..."
  (let* (;; Absolute path, exactly what `org-attach-expand' would return.
         (attachment-abs-path (expand-file-name file (org-attach-dir)))
         ;; File visited by the buffer that holds the element at point.
         (org-file (buffer-file-name
                    (org-element-property :buffer (org-element-context))))
         (org-file-dir (file-name-directory org-file)))
    (file-relative-name attachment-abs-path org-file-dir)))
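
How the replacement gets installed is not shown above. One option is a global (advice-add 'org-attach-expand :override #'jwow/org-attach-expand). The sketch below is an untested alternative that swaps the function in only while org-attach-expand-links runs, so everything outside export keeps seeing absolute paths. The name my/org-attach-expand-links-relative is hypothetical, and the sketch assumes jwow/org-attach-expand is already defined as above.

```elisp
(require 'cl-lib)
(require 'org-attach)

(defun my/org-attach-expand-links-relative (orig-fun &rest args)
  "Call ORIG-FUN with `org-attach-expand' temporarily replaced.
The replacement is `jwow/org-attach-expand', which returns relative paths."
  (cl-letf (((symbol-function 'org-attach-expand)
             #'jwow/org-attach-expand))
    (apply orig-fun args)))

;; Scope the swap to the export hook function only.
(advice-add 'org-attach-expand-links :around
            #'my/org-attach-expand-links-relative)
```

Undo it with (advice-remove 'org-attach-expand-links #'my/org-attach-expand-links-relative). For reference, the core of the relative computation is file-relative-name: (file-relative-name "/home/u/site/data/7d/x.png" "/home/u/site/") returns "data/7d/x.png".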

Season to taste.

This way, all exporters that already handle file links should continue working with your relative path. Another great thing is that C-c C-o remains functional.

UPDATE 2025-01-07: I've realized that this breaks org-display-inline-images, so be careful about that (see footnote 1). The other options we discussed, like an export filter, would still work.

It works!

If it works, here's an image of this section before I added this sentence: clipboard-20241230T053742.png

I haven't tested this extensively, but it seems to work for now.

Lastly, if you like what I'm doing, please consider sponsoring me.


1

Which is odd in terms of its design, because org-display-inline-images has a check

(and file (file-exists-p file))

after the variable file is bound. And relative paths pass the file-exists-p check.