且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

何时使用抽象或具体的语法树?

更新时间:2021-12-29 01:38:00

对源语言的所有语义细节进行建模的 AST 就是您所需要的.根据定义,如果它确实对语义进行了正确建模,并且您的语言包含一个三元运算符,那么它也会正确建模应用运算符的特定顺序(例如,优先取模覆盖的结果,例如括号).

An AST that models all the semantic details of the source language is all you need. By definition, if it does model the semantics correctly, and your langauge includes a ternary operator, then it will model the specific order in which the operators are applied (e.g, the results of the predecence modulo overrides such as parentheses) correctly too.

所以你的问题不在于 AST.它使用优先级不同的类似(三元)运算符生成另一种语言.

So your problem isn't in the AST. It is generating to another language using similar (ternary) operators whose precedence is different.

这是代码生成中的一个古老问题:目标的运算符与源的运算符不完全匹配,因此输出不能是一对一的.在您的情况下,您应该能够通过生成带有括号的 PHP 三元运算符来控制顺序以实现原始语义来解决问题,因此这不是一个大问题.

This is an age-old problem in code generation: the operators of the target don't quite match the operators of the source, and so the output can't be one-to-one. In your case, you should be able to solve the problem by generating PHP ternary operators with parentheses around them to control the order to achieve what the original semantics are, so this isn't a big problem.

一般来说,生成实现预期结果的代码序列可能非常复杂,而且有很多方法可以做到.这就是编译器书籍厚而不薄的原因.您似乎已经隐含地选择了获取 AST,走 AST,吐出代码";这几乎是一个即时代码生成器.如果您不在乎生成的代码是否特别好,并且目标语言与源语言非常接近,那么这种方法就足够了.

In general, generating sequences of code that achieve a desired result can be pretty complicated, and there's lots of ways to do it. That's why the compiler books are thick rather than thin. You seem to have implicitly settled on "get AST, walk AST, spit code"; this is almost an on-the-fly code generator. And this works adequately if you don't care if the generated code is particularly good, and the target language is pretty close to the source language.

如果代码生成问题更复杂,通常会使用 AST 来生成计算的数据流模型,该模型由产生结果的运算符组成,并消耗先前运算符的结果,以获取变量值和常量的运算符".然后遍历数据流表示生成代码;这样做的好处是您可以在数据流表示中选择一个运算符,在目标语言中找到匹配的代码序列,然后生成它,然后再考虑如何收集操作数.更好的方案将数据流子图(表示等效的复合目标语言结构)与生成的数据流图相匹配;这可以产生明显更好的代码.通常,可以在原始代码生成后应用特定于目标语言的优化来生成更好的代码.在这两种情况下,您都必须担心管理操作员结果;它们可以直接提供给下一个目标语言运算符,还是必须进入某种临时存储(对于机器代码,这可以是另一个寄存器或内存位置).做到这一切并不容易.同样,这就是编译器书籍不薄的原因.

If the code generation problem is more complex, what normally happens is the AST is used to generate what amounts to a data flow model of the computation, consisting of operators that produce results, and consume results from previous operators, grounding out in "operators" that fetch variable values and constants. Then the data flow representation is traversed to generate the code; this has the advantage that you can choose an operator in the data flow representation, find a matching code sequence in the target language, generate that, and then worry about how the operands are collected. Better schemes match data flow subgraphs (representing equivalent compound target language constructs) to the produced data flow graph; this can produce significantly better code. Often, one can apply target-language specific optimizations after raw code generation to produce even better code. In both cases, you have to worry about managing the operator results; can they be fed directly to the next target language operator, or must they go into some kind of temporary storage (for machine code, this can be another register or a memory location). Doing all this isn't easy; again, that's why the compiler books aren't thin.

这个想法的一个变体是源到源程序转换.这将源代码中的构造直接"映射到目标代码中的构造,尽管这通常是通过操作 AST 在幕后完成的,因为未解析的编程语言文本难以匹配.我们的DMS 软件再造工具包就是这种系统的一个例子.使用这样的工具,您可以在源语言中编写模式(隐式匹配解析树),并在目标语言中编写相应的模式(隐式生成目标语言 AST).您可以编写复杂的源或目标构造,以提供上述数据流图匹配的大部分效果.生成后优化包含更多重写规则,将目标代码转换为目标代码.

A variation on this idea is source-to-source program transformations. This maps constructs in source code "directly" to constructs in target code, although this is usually done behind the scenes by operating on ASTs because unparsed programming language text is hard to match. Our DMS Software Reengineering Toolkit is an example of this kind of system. With such a tool, you write patterns in the source language (which implicitly match against a parse tree), and corresponding patterns in the target langauge (implicitly producing target language ASTs). You can write complex source or target constructs giving much of the effect of the data flow graph matching above. Post-generation optimization consists of more rewrite rules that transform the target code to target code.

底线:除非您的翻译非常简单,否则拥有 AST 是不够的.您可以在这个 SO 答案中阅读更多关于您需要什么的信息:https://***.com/a/3460977/120163

Bottom line: Having an AST isn't really enough unless your translation is truly trivial. You can read more about what you do need at this SO answer: https://***.com/a/3460977/120163

警告:强烈的意见如下.

WARNING: Loud opinion follows.

关于转码器":我更喜欢术语编译"、翻译"或源到源"编译器.近 40 年来,我一直在构建程序分析和操作工具.在遇到这个 SO 问题之前,我从来没有听说过转码器"这个词:将遗留 Cobol/PL1 迁移到 Java 的经验和一个描述恕我直言一个真正糟糕的代码翻译方案的回应,称为 NACA.我听说这个词越来越受欢迎;我不明白为什么当我们有完全足够的术语时我们必须发明另一个术语.通常这是某人发明了大祭司的标志;让我们发明一个闪亮的新术语,这样人们就不会真正理解我们在做什么".我很高兴把这个词留给如此糟糕的翻译.

Re "Transcoder": I much prefer the term "compilation", "translation", or "source-to-source" compiler. I've been building program analysis and manipulation tools for almost 40 years. I'd never heard the term "transcoder" until I encountered this SO question: Experience migrating legacy Cobol/PL1 to Java and a response describing IMHO a truly awful code translation scheme called NACA. I've heard since that this term is gaining traction; I don't see why we had to invent another term when we have perfectly adequate ones. Usually this is a sign of somebody inventing a high priesthood; "let's invent a shiny new term so people don't really understand what we're doing". I'm happy to leave that term to such truly awful translations.