
Parsing reStructuredText using docutils

At work I decided to use Sphinx and reStructuredText to document my firmware. It turned out that at several points across development, testing and production it would be useful to have the firmware's UART API available as JSON, so that API requests can be validated easily.

Docutils contains a perfectly usable parser for reStructuredText, so I decided to use it.

The docutils document layout format takes some getting used to, but it's not complicated.
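
To get a feel for that layout, it helps to print a parsed tree. The following is a minimal sketch, not part of the original script: it uses docutils.core.publish_doctree() as a shortcut for parsing a string, and the SAMPLE text with its PING and RESET sections is made up for illustration.

```python
from docutils.core import publish_doctree

# Hypothetical sample document: two top-level sections with one paragraph each.
SAMPLE = """\
PING
====

Checks the link.

RESET
=====

Restarts the device.
"""

# publish_doctree() parses the string and returns the root document node.
doctree = publish_doctree(SAMPLE)

# pformat() renders the tree as indented pseudo-XML, one node per line.
tree_text = doctree.pformat()
print(tree_text)
```

The printout shows a document node containing two section nodes, each holding a title and a paragraph. Everything else in the doctree follows the same parent/children pattern.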

For a start I've lifted this function from ProgramCreek:

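import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils

def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    components = (docutils.parsers.rst.Parser,)
    # Default settings for the reST parser (newer docutils releases may warn
    # that OptionParser is deprecated).
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    settings.tab_width = 4
    settings.pep_references = None
    settings.rfc_references = None
    # Empty document node that the parser fills in place.
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document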

The returned document is itself a node (the root of the doctree) that you may walk(), traverse() or whatever you're into.

Next I created a custom visitor that picks out individual nodes from the doctree when I walk() it:

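class MyVisitor(docutils.nodes.NodeVisitor):
    def __init__(self, document):
        self.titles = []
        self.tables = []
        super().__init__(document)

    def visit_table(self, node):
        # Keep the whole table node for later processing.
        self.tables.append(node)

    def visit_title(self, node):
        # astext() returns the title's plain text.
        self.titles.append(node.astext())

    def unknown_visit(self, node):
        # Ignore every node type that has no visit_* method above.
        pass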

Then I read the file and parse it:

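# The with block closes the file once it has been read.
with open("doc/Commands-Messages.rst", "r") as file:
    doc = parse_rst(file.read())

visitor = MyVisitor(doc)
doc.walk(visitor)
# visitor.titles and visitor.tables now hold the collected results.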

Of course, you can also select nodes based on their parent or children (children is always a list, since this is a tree); a node's tagname is useful for this as well.
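
As a sketch of that kind of selection, the visitor below separates section titles from all other titles by looking at node.parent and records the parent's tagname for the rest. The class name, the attribute names and the SAMPLE text are hypothetical, and publish_doctree() is used only to keep the example self-contained.

```python
from docutils import nodes
from docutils.core import publish_doctree

# Hypothetical sample: two sections, the first containing a titled table.
SAMPLE = """\
PING
====

.. table:: PING arguments

   ==== =======
   Name Meaning
   ==== =======
   seq  counter
   ==== =======

RESET
=====

Restarts the device.
"""


class SectionTitleVisitor(nodes.NodeVisitor):
    def __init__(self, document):
        super().__init__(document)
        self.section_titles = []
        self.other_titles = []

    def visit_title(self, node):
        # A title can belong to a section, a table, an admonition and so on;
        # the parent tells them apart.
        if isinstance(node.parent, nodes.section):
            self.section_titles.append(node.astext())
        else:
            self.other_titles.append((node.parent.tagname, node.astext()))

    def unknown_visit(self, node):
        # Ignore every other node type.
        pass


doctree = publish_doctree(SAMPLE)
visitor = SectionTitleVisitor(doctree)
doctree.walk(visitor)
print(visitor.section_titles)
print(visitor.other_titles)
```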