Parsing reStructuredText using docutils
At work I decided to use Sphinx and reStructuredText for documenting my firmware. It turned out that at multiple points in development, testing and production it would be useful to have the firmware's UART API in JSON format for easy validation of API requests.
Docutils contains a perfectly usable parser for reStructuredText, so I decided to use it.
The docutils document layout format takes some getting used to, but it's not complicated.
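One way to get a feel for the layout is to dump a small tree as text. The sketch below is only an illustration: the sample input is made up, and it uses docutils.core.publish_doctree() as a shortcut for parsing, which also applies the standard transforms (for example, a lone top-level section title gets promoted to the document title).

```python
import docutils.core

# Hypothetical sample input, not taken from the real firmware docs.
SAMPLE = """\
Commands
========

A short paragraph.
"""

# publish_doctree() parses the text and returns the root document node.
doctree = docutils.core.publish_doctree(SAMPLE)

# pformat() renders the tree as indented pseudo-XML, one element per line.
layout = doctree.pformat()
print(layout)
```

Every element in that dump corresponds to a node class in docutils.nodes, which is what the visitor methods further down are named after.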
For a start I've lifted this function from ProgramCreek:
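    import docutils.frontend
    import docutils.nodes
    import docutils.parsers.rst
    import docutils.utils


    def parse_rst(text: str) -> docutils.nodes.document:
        """Parse reStructuredText source into a docutils document tree."""
        parser = docutils.parsers.rst.Parser()
        components = (docutils.parsers.rst.Parser,)
        # Default settings for the rst parser (OptionParser is deprecated in newer docutils, but still works).
        settings = docutils.frontend.OptionParser(components=components).get_default_values()
        settings.tab_width = 4
        settings.pep_references = None
        settings.rfc_references = None
        document = docutils.utils.new_document('<rst-doc>', settings=settings)
        parser.parse(text, document)
        return document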
This document is then a node that you may walk(), traverse() or whatever you're into.
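If you only need one kind of node, you can skip the visitor entirely. Here is a minimal sketch that lists every title in a made-up sample. It again uses publish_doctree() as a stand-in for the parsing step, and falls back to traverse() where findall() (the newer name in recent docutils releases) is not available.

```python
import docutils.core
from docutils import nodes

# Hypothetical sample input; the command names are invented.
SAMPLE = """\
Commands
========

PING
----

Checks that the device is alive.

RESET
-----

Restarts the firmware.
"""

doctree = docutils.core.publish_doctree(SAMPLE)

# Prefer findall() where it exists, otherwise use the older traverse().
find = getattr(doctree, "findall", None) or doctree.traverse

# Passing a node class as the condition yields only nodes of that class,
# in document order.
titles = [title.astext() for title in find(nodes.title)]
print(titles)
```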
Next I created a custom visitor that picks out individual nodes from the doctree when I walk() it:

    class MyVisitor(docutils.nodes.NodeVisitor):
        """Collect title texts and table nodes while walking the doctree."""

        def __init__(self, document):
            self.titles = []
            self.tables = []
            super().__init__(document)

        def visit_table(self, node):
            self.tables.append(node)

        def visit_title(self, node):
            self.titles.append(node.astext())

        def unknown_visit(self, node):
            # Ignore every node type that has no dedicated visit_* method.
            pass
Then I read the file and parse it:
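    # Read the reST source and parse it into a doctree.
    with open("doc/Commands-Messages.rst", "r") as file:
        doc = parse_rst(file.read())

    # Walk the tree; the visitor collects titles and tables as it goes.
    visitor = MyVisitor(doc)
    doc.walk(visitor)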
Of course, you can also select nodes based on their parent or children (children is always a list, since this is a tree), and a node's tagname is useful for that too.
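To tie this back to the JSON goal, here is a rough sketch of how parent and tagname checks might turn a table into JSON. The table contents and the TableVisitor class are made up for illustration, and it assumes each table has a header row. It also parses with publish_doctree() to stay self-contained, rather than with the parse_rst() function above.

```python
import json

import docutils.core
from docutils import nodes

# Hypothetical command table; the real firmware docs will look different.
SAMPLE = """\
=======  ================
Command  Description
=======  ================
PING     Liveness check
RESET    Restart firmware
=======  ================
"""


class TableVisitor(nodes.NodeVisitor):
    """Collect each table as a list of {header: cell text} dicts."""

    def __init__(self, document):
        super().__init__(document)
        self.tables = []
        self._header = []

    def visit_table(self, node):
        # Start a fresh list of rows for every table encountered.
        self.tables.append([])
        self._header = []

    def visit_row(self, node):
        cells = [c.astext() for c in node.children if c.tagname == "entry"]
        if node.parent.tagname == "thead":
            # Header rows live under <thead>; remember them as dict keys.
            self._header = cells
        else:
            self.tables[-1].append(dict(zip(self._header, cells)))

    def unknown_visit(self, node):
        # Ignore all other node types.
        pass


doctree = docutils.core.publish_doctree(SAMPLE)
visitor = TableVisitor(doctree)
doctree.walk(visitor)
print(json.dumps(visitor.tables, indent=2))
```

Keying rows by the header text keeps the JSON readable, at the cost of breaking if two columns share a name.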