Changing Org HTML Export, plain-list

Have you ever wished your plain-lists in org mode felt more visually similar to the html rendering? I did. In one small spot.

I'll walk through my debug process, or you can skip straight to Results.

A few days ago, I dug into the Org HTML export code in ox-html.el and fixed something that had long bothered me: inconsistent visual line breaks [1].

This is better demonstrated with a sample.

This:

- line item head 1
  this line will be conjoined in html with "line item head 1"

- line item head 2

  this line forces "line item head 2" to be wrapped in a <p>. This line will also be wrapped in a <p>.

Renders this:

<ul class="org-ul">
<li>line item head 1
this line will be conjoined in html with "line item head 1"</li>

<li><p>
line item head 2
</p>

<p>
this line forces "line item head 2" to be wrapped in a &lt;p&gt;. This line will also be wrapped in a &lt;p&gt;.
</p></li>
</ul>

Then the browser renders this:

2024-12-19_12-31-41_screenshot.png

The picture is from a basic org file without CSS. Of course on this blog I do have CSS, so the extra <p> causes even more surprising spacing here.

I wanted instead:

  1. A <br> for the "\n" after line item head 1, so that I'd see a separate line (a bare "\n" isn't rendered as a line break in HTML) [2].
  2. <p>s to not wrap the line item head, pretty much ever. Maybe I'm a rube for wanting that. I can solve some visual issues with CSS targeting, but semantically this makes less sense to me.

I spent one long night debugging this.

tl;dr, the "fix"

I copied org-html-paragraph from ox-html.el and plugged the modified copy into my own export backend [3], the one I use for this blog.

  1. Copy org-html-paragraph to jwow/org-html-paragraph and modify the code [4]. I changed one branch of the cond.
    ((and (eq parent-type 'item)
          ;; Not sure if this previous element condition is needed; might be extraneous
          (not (org-export-get-previous-element paragraph info)))
     ;; First paragraph of an item: no <p> wrapper, and newlines become <br>
     (if (string-match-p "\n" contents)
         (string-replace "\n" "<br>" contents)
       contents))
    
  2. Update the function used for paragraph transformation to jwow/org-html-paragraph in my derived backend.
    ;; a keyword argument to org-export-define-derived-backend
    ;; when a paragraph in the org ast is found, we call our function
    :translate-alist '((paragraph . jwow/org-html-paragraph))
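
For anyone who wants to try the idea without copying the whole function, here is a minimal sketch of a derived backend. The names my/org-html-paragraph, my-blog-html and my/export-snippet are hypothetical, and unlike my actual fix the transcoder delegates to the stock org-html-paragraph instead of modifying a copy of it. It assumes Emacs 28 or later (for string-replace).

```elisp
(require 'ox-html)

;; Hypothetical wrapper (not the code from this post): handle the first
;; paragraph of a list item ourselves, defer everything else to ox-html.
(defun my/org-html-paragraph (paragraph contents info)
  "Transcode PARAGRAPH; the first paragraph of an item gets <br>, not <p>."
  (if (and (eq (org-element-type (org-element-property :parent paragraph))
               'item)
           (not (org-export-get-previous-element paragraph info)))
      (string-replace "\n" "<br>" contents)
    (org-html-paragraph paragraph contents info)))

;; `my-blog-html' is a made-up backend name; it inherits everything
;; from the stock `html' backend except the paragraph transcoder.
(org-export-define-derived-backend 'my-blog-html 'html
  :translate-alist '((paragraph . my/org-html-paragraph)))

(defun my/export-snippet (org-text)
  "Return ORG-TEXT exported to HTML (body only) with `my-blog-html'."
  (org-export-string-as org-text 'my-blog-html t))
```

Evaluating (my/export-snippet "- head\n  more\n") should give a <li> whose text contains a <br> and no <p>, while a regular paragraph outside a list still goes through org-html-paragraph untouched.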
    

This fix is more of an opinion on what should happen. The original html export code was very intentional about when <p> gets inserted, but no comments explained why.

Update 2024-12-23:

I found that I didn't like every line getting a new line. Here's what I mean:

1. plain-list para
   new line 1
   new line 2

became

<li>plain-list para<br>new line 1
<br>new line 2</li>

This visually makes new line 1 appear separate from new line 2. I realized I was used to org's handling of paragraphs where you could write text on a line immediately following the last, and it would be joined into one paragraph.

So I changed the replacement to this regexp version (it matches the first \n found, and only that one gets replaced; other \n are untouched) [5]:

(replace-regexp-in-string "\\(\n\\)\\(.\\|\n\\)*\\'" "<br>" contents nil nil 1)

Now the same org snippet gives

<li>plain-list para<br>new line 1
new line 2</li>

Now the subsequent lines render the way a paragraph does when you write one line right after another.
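
To make the difference concrete, here is a small sketch that puts the two replacements side by side. The helper names are hypothetical and exist only for illustration. The third helper is a regexp-free variant built on string-search, which assumes Emacs 28 or later.

```elisp
;; Illustration only: these helper names are hypothetical.
(defun my/all-newlines-to-br (contents)
  "Replace every newline in CONTENTS with <br> (the first version of the fix)."
  (string-replace "\n" "<br>" contents))

(defun my/first-newline-to-br (contents)
  "Replace only the first newline in CONTENTS with <br>."
  ;; The regexp matches from the first newline to the end of the string,
  ;; and the final argument 1 limits the replacement to group 1.
  (replace-regexp-in-string "\\(\n\\)\\(.\\|\n\\)*\\'" "<br>" contents nil nil 1))

(defun my/first-newline-to-br-no-regexp (contents)
  "Like `my/first-newline-to-br', but without a regexp (needs Emacs 28+)."
  (let ((pos (string-search "\n" contents)))
    (if pos
        (concat (substring contents 0 pos) "<br>" (substring contents (1+ pos)))
      contents)))

(my/all-newlines-to-br "plain-list para\nnew line 1\nnew line 2\n")
;; => "plain-list para<br>new line 1<br>new line 2<br>"
(my/first-newline-to-br "plain-list para\nnew line 1\nnew line 2\n")
;; => "plain-list para<br>new line 1\nnew line 2\n"
```

The regexp-free variant returns the same string as the regexp one; which to prefer is mostly a matter of taste.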

Debugging Process

Find out where to snag the debugger

I knew I was exporting to html, and ox-html.el was responsible.

I found the line

(plain-list . org-html-plain-list)

That should be it, I thought, so I tried to snag that.

I ran edebug-defun on the transcoder. Then I highlighted a region and ran org-export-region-to-html over line item head 2.

When the breakpoint snagged, I pressed e and ran

(substring-no-properties contents)

which showed

"<li><p>\nline item head 2\n</p>\n\n<p>\nthis line forces \"line item head 2\" to be wrapped in a &lt;p&gt;. This line will also be wrapped in a &lt;p&gt;.</p></li>\n"

At this point I was confused: the contents were already HTML by the time I got here.

The stacktrace showed calls like this:

(plist-put attributes :class (org-
(org-html--make-attribute-string (
(format "<%s %s>\n%s</%s>" type (o
(let* ((type (let* ((val (org-elem
org-html-plain-list((plain-list (:
org-export-data((plain-list (:stan
#f(compiled-function (element) #<b
mapconcat(#f(compiled-function (el
org-export-data((section (:standar
#f(compiled-function (element) #<b
mapconcat(#f(compiled-function (el
org-export-data((org-data (:standa
org-export-as(html nil nil t nil) 
org-export-string-as(#("- line ite
org-export-replace-region-by(html)
org-export-region-to-html()       

It was the wrong place

Confused, I wondered what else was responsible for rendering a plain-list. A search in the same file showed hits in org-html-paragraph.

So I instrumented org-html-paragraph

With my breakpoint on org-html-paragraph, it ran before org-html-plain-list.

I also printed out the (substring-no-properties contents):

"line item head 2\n"

It's not in html yet. This must be it!
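
The ordering makes sense once you picture the exporter as a depth-first walk: a node's children are transcoded first, and their output is handed to the parent's transcoder as contents. The toy below is only a sketch of that idea, using my own simplified node shape (TYPE PROPS CHILDREN...). my/toy-export and my/toy-call-order are hypothetical, and this is not how ox.el is actually written.

```elisp
(defvar my/toy-call-order nil
  "Node types in the order their transcoders ran, most recent first.")

(defun my/toy-export (node)
  "Transcode NODE depth-first: children first, then the parent wraps them."
  (if (stringp node)
      node
    ;; CONTENTS is already HTML by the time the parent sees it.
    (let ((contents (mapconcat #'my/toy-export (cddr node) "")))
      (push (car node) my/toy-call-order)
      (pcase (car node)
        ('plain-list (format "<ul>%s</ul>" contents))
        ('item (format "<li>%s</li>" contents))
        ('paragraph (format "<p>%s</p>" contents))
        (_ contents)))))

(my/toy-export
 '(plain-list nil (item nil (paragraph nil "line item head 2"))))
;; => "<ul><li><p>line item head 2</p></li></ul>"
```

After that call, (reverse my/toy-call-order) is (paragraph item plain-list): the paragraph transcoder runs first and sees raw text, while the plain-list transcoder runs last and only ever sees finished HTML.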

But what? Where was the paragraph in "- line item head 2"? It looked like a bullet heading to me. What was the relationship between the plain list and a paragraph?

The AST

To answer that question, we need to see what org parsed as the abstract syntax tree.

Each transcoder receives an AST node as its first positional argument. That's what I wanted to see.

I didn't find any built-in tools to dump it into a format that shows the relationships between elements [6].

It wasn't until I found the screenshot on the readme of org_mode_ast_investigation_tool that I realized a plain-list contains an item, and an item contains a paragraph, which was the level in the tree I actually cared about.
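
A rough way to get a similar picture inside Emacs is to parse a snippet and print only the node types, indented by depth. This is a sketch, not a built-in tool: my/org-ast-outline and my/org-string-outline are hypothetical helpers that rely only on org-element-parse-buffer, org-element-type and org-element-contents.

```elisp
(require 'org)
(require 'org-element)

(defun my/org-ast-outline (node &optional depth)
  "Return a list of type names for NODE and its descendants.
Each name is indented by two spaces per level of DEPTH."
  (let ((depth (or depth 0)))
    (cons (concat (make-string (* 2 depth) ?\s)
                  (symbol-name (org-element-type node)))
          ;; Strings (plain text) have no contents, so recursion stops there.
          (mapcan (lambda (child) (my/org-ast-outline child (1+ depth)))
                  (org-element-contents node)))))

(defun my/org-string-outline (org-text)
  "Parse ORG-TEXT in a temporary Org buffer and return its type outline."
  (with-temp-buffer
    (org-mode)
    (insert org-text)
    (my/org-ast-outline (org-element-parse-buffer))))

;; Show one type per line for the second list item from the sample:
;; (message "%s" (mapconcat #'identity
;;                          (my/org-string-outline
;;                           "- line item head 2\n\n  second paragraph\n")
;;                          "\n"))
```

In the outline, item is nested under plain-list and the two paragraphs are nested under item, which is the relationship that the full printed AST makes so hard to spot.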

Making changes

The rest is in tl;dr, the "fix", where I did some string manipulation.

Results

This:

- line item head 1
  this line will be conjoined in html with "line item head 1"

- line item head 2

  this line forces "line item head 2" to be wrapped in a <p>. This line will also be wrapped in a <p>.

Renders this:

<ul class="org-ul">
<li>line item head 1<br/>this line will be conjoined in html with "line item head 1"<br/></li>

<li>line item head 2<br/>

<p>
this line forces "line item head 2" to be wrapped in a &lt;p&gt;. This line will also be wrapped in a &lt;p&gt;.
</p></li>
</ul>

Then the browser renders this:

2024-12-19_12-54-46_screenshot.png

🚀 [7]

Final Thoughts

It would be great if the AST in org mode were easier to visualize within emacs, using its own built-in tools. I couldn't easily make sense of the giant printout.

I'm not the best with introspecting objects, so I got stuck on what was considered a "paragraph."

Is there also no way to jump from a stack frame to the source of its caller? This can be a little difficult with macros, but for standard functions it should be possible. I couldn't find this ability anywhere in edebug's docs. Discord was crickets, too.

I also really like the manual, and I think certain sections deserve an update. Export mostly just works.


1

I didn't want the behavior of org-export-preserve-breaks.

2

To visually render a newline, you need a line break element: <br>. It looks like org mode intentionally kept the "\n". I say this because other elements like paragraphs in org actually remove the new lines. I like it because this is good for source control. (Try it!)

3

See org vocabulary for definitions.

4
(defun jwow/org-html-paragraph (paragraph contents info)
  "Transcode a PARAGRAPH element from Org to HTML.
CONTENTS is the contents of the paragraph, as a string.  INFO is
the plist used as a communication channel.
Same as `org-html-paragraph', but the first paragraph of a list
item is never wrapped in <p>, and its newlines become <br>."
  (let* ((parent (org-element-parent paragraph))
         (parent-type (org-element-type parent))
         (style '((footnote-definition " class=\"footpara\"")
                  (org-data " class=\"footpara\"")))
         (attributes (org-html--make-attribute-string
                      (org-export-read-attribute :attr_html paragraph)))
         (extra (or (cadr (assq parent-type style)) "")))
    (cond
     ;; The paragraph is the text of the line item itself in the AST:
     ;; return CONTENTS without a <p> wrapper, with newlines replaced by <br>.
     ((and (eq parent-type 'item)
           ;; Not sure if this condition is needed; might be extraneous
           (not (org-export-get-previous-element paragraph info))
           ;; Disabled check on the elements that follow the paragraph:
           ;; (let ((followers (org-export-get-next-element paragraph info 2)))
           ;;   (and (not (cdr followers))
           ;;        (org-element-type-p (car followers) '(nil plain-list paragraph))))
           )
      (if (string-match-p "\n" contents)
          (string-replace "\n" "<br>" contents)
        contents))
     ((org-html-standalone-image-p paragraph info)
      ;; Standalone image.
      (let ((caption
             (let ((raw (org-export-data
                         (org-export-get-caption paragraph) info))
                   (org-html-standalone-image-predicate
                    #'org-html--has-caption-p))
               (if (not (org-string-nw-p raw)) raw
                 (concat "<span class=\"figure-number\">"
                         (format (org-html--translate "Figure %d:" info)
                                 (org-export-get-ordinal
                                  (org-element-map paragraph 'link
                                    #'identity info t)
                                  info nil #'org-html-standalone-image-p))
                         " </span>"
                         raw))))
            (label (org-html--reference paragraph info)))
        (org-html--wrap-image contents info caption label)))
     ;; Regular paragraph.
     (t (format "<p%s%s>\n%s</p>"
                (if (org-string-nw-p attributes)
                    (concat " " attributes) "")
                extra contents)))))

5

Nothing humbling like regexp. I totally forgot "." wouldn't match a literal "\n" in the string, which is why it's or-ed with a newline. The \\' in the regexp matches the end of the string. Making the regexp match all the way up to it and replacing only a sub-expression is the trick for replacing just the first match. Check the help: C-h f replace-regexp-in-string.

6

Edebug and the backtrace can help some.

When in edebug, I pressed d, which pops us into the *Edebug Backtrace* buffer.

The stacktrace was not very readable for this snippet:

org-html-paragraph((paragraph (:standard-properties [3 3 3 20 21 1 nil nil nil nil nil nil nil nil #<buffer  *temp*-992987-822380> nil nil ...])

Pressing + at the start of an expression calls backtrace-multi-line, which spreads it over several lines and can expand properties. Pressing Enter on an ellipsis expands it, which can show:

org-html-paragraph((paragraph
                    (:standard-properties
                     [3 3 3 20 21 1 nil nil nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil nil
                        (item
                         (:standard-properties
                          [1 1 3 123 123 0 (:tag) item nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil
                             ((1 0 "- " nil nil nil 123))
                             (plain-list
                              (:standard-properties
                               [1 1 1 123 123 0 nil top-comment nil nil nil nil nil nil #<buffer
                                  *temp*-984820-519693> nil (...) (section ... #11)]
                               :type unordered)
                              #6)]
                          :bullet "- " :checkbox nil :counter nil :pre-blank 0 :tag nil)
                         (paragraph (:standard-properties #5) #("line item head 2\n" 0 17 (:parent #9)))
                         (paragraph
                          (:standard-properties
                           [21 21 21 123 123 0 nil nil nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil nil
                               #6])
                          #("  this line forces \"line item head 2\" to be wrapped in a <p>. This line will also be wrapped in a <p>."
                            0 102 (:parent #10))))])
                    #("line item head 2\n" 0 17 (:parent (paragraph (:standard-properties [3 3 3 20 21 1 nil nil nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil nil (item (:standard-properties [1 1 3 123 123 0 (:tag) item nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil (...) (plain-list ... #12)] :bullet "- " :checkbox nil :counter nil :pre-blank 0 :tag nil) #7 (paragraph (:standard-properties [21 21 21 123 123 0 nil nil nil nil nil nil nil nil #<buffer  *temp*-984820-519693> nil nil #12]) #("  this line forces \"line item head 2\" to be wrapped in a <p>. This line will also be wrapped in a <p>." 0 102 (:parent #16))))]) #4))))

This gave clues, but wasn't easy to comprehend. The :standard-properties are hard to parse out, as they're in a vector and I had no idea what each position meant.

7

If you're paying super close attention, you'll notice that in the org source there's a newline after "this line will be conjoined…" but no visual new line in the browser render. That's ok for me. Most newlines in an org doc don't warrant a visual new line. I often have paragraphs of comments to myself on how to reword a section, or scratch notes, which are spaced with new lines.

If I need to force a newline just for rendering, I type "\\" at the end of the line in org mode, which converts to a <br>.