且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

构建解析器(第一部分)

更新时间:2023-01-11 16:02:47

通常,您希望将令牌生成器(也称为 lexer )的功能与编译器或解释器的其他阶段分开.这样做的原因是基本的模块化:每次通过都消耗一种东西(例如字符),而又产生另一种东西(例如令牌).

Generally, you want to separate the functions of the tokeniser (also called a lexer) from other stages of your compiler or interpreter. The reason for this is basic modularity: each pass consumes one kind of thing (e.g., characters) and produces another one (e.g., tokens).

因此,您已将字符转换为令牌.现在,您想将标记的平面列表转换为有意义的嵌套表达式,这就是通常所说的 parsing .对于类似JavaScript的语言,您应该查看递归下降解析.对于使用具有不同优先级的中缀运算符来解析表达式,普拉特解析非常有用,您可以在特殊情况下,可以使用普通递归下降解析.

So you’ve converted your characters to tokens. Now you want to convert your flat list of tokens to meaningful nested expressions, and this is what is conventionally called parsing. For a JavaScript-like language, you should look into recursive descent parsing. For parsing expressions with infix operators of different precedence levels, Pratt parsing is very useful, and you can fall back on ordinary recursive descent parsing for special cases.

仅根据您的情况为您提供一个更具体的示例,我假设您可以编写两个函数:accept(token)expect(token),这两个函数将测试您创建的流中的下一个标记.您将为您的语言语法中的每种类型的语句或表达式创建函数.例如,下面是statement()函数的Python伪代码:

Just to give you a more concrete example based on your case, I’ll assume you can write two functions: accept(token) and expect(token), which test the next token in the stream you’ve created. You’ll make a function for each type of statement or expression in the grammar of your language. Here’s Pythonish pseudocode for a statement() function, for instance:

def statement():

  if accept("if"):
    x = expression()
    y = statement()
    return IfStatement(x, y)

  elif accept("return"):
    x = expression()
    return ReturnStatement(x)

  elif accept("{")
    xs = []
    while True:
      xs.append(statement())
      if not accept(";"):
        break
    expect("}")
    return Block(xs)

  else:
    error("Invalid statement!")

这为您提供了程序的所谓抽象语法树(AST),然后您可以对其进行操作(优化和分析),输出(编译)或运行(解释).

This gives you what’s called an abstract syntax tree (AST) of your program, which you can then manipulate (optimisation and analysis), output (compilation), or run (interpretation).