且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

弹性默认规则

更新时间:2023-01-06 12:17:15

flex默认规则匹配单个字符并将其打印在标准输出上.如果您不想执行此操作,请编写一个明确的规则,该规则匹配单个字符并执行其他操作.

The flex default rule matches a single character and prints it on standard output. If you don't want that action, write an explicit rule which matches a single character and does something else.

模式(.|\n)*将整个输入文件作为单个标记进行匹配,因此这是一个非常糟糕的主意.您以为默认值应该是长匹配,但实际上您希望该值尽可能短(但不能为空).

The pattern (.|\n)* matches the entire input file as a single token, so that is a very bad idea. You're thinking that the default should be a long match, but in fact you want that to be as short as possible (but not empty).

默认规则的目的是在输入语言中的任何标记都不匹配时执行某些操作.当使用lex标记语言时,这种情况几乎总是错误的,因为这意味着输入以一个字符开头,而不是该语言的任何有效标记的开头.

The purpose of the default rule is to do something when there is no match for any of the tokens in the input language. When lex is used for tokenizing a language, such a situation is almost always erroneous because it means that the input begins with a character which is not the start of any valid token of the language.

因此,捕获任何字符"规则被编码为错误恢复的一种形式.这个想法是丢弃不良字符(仅一个),并尝试从该字符之后的字符中进行标记化.这只是一个猜测,但这是一个很好的猜测,因为它基于已知的东西:即输入中有一个坏字符.

Thus, a "catch any character" rule is coded as a form of error recovery. The idea is to discard the bad character (just one) and try tokenizing from the character after that one. This is only a guess, but it's a good guess because it's based on what is known: namely that there is one bad character in the input.

恢复规则可能是错误的.例如,假设该语言的令牌都不以@开头,并且程序员希望编写字符串文字"@abc".只是,她忘记了开头"并写了@abc".正确的解决方法是插入缺少的",而不是丢弃@.但这将需要在词法分析器中使用一套更为巧妙的规则.

The recovery rule can be wrong. For instance suppose that no token of the language begins with @, and the programmer wanted to write the string literal "@abc". Only, she forgot the opening " and wrote @abc". The right fix is to insert the missing ", not to discard the @. But that would require a much more clever set of rules in the lexer.

无论如何,通常在丢弃一个坏字符时,您希望针对这种情况发出错误消息,例如在第42行的第3列中跳过无效字符'〜`".

Anyway, usually when discarding a bad character, you want to issue an error message for this case like "skipping invalid character '~` in line 42, column 3".

将lex用于文本过滤时,将不匹配字符复制到标准输出的默认规则/操作非常有用.然后,默认规则带来了正则表达式搜索的语义(与正则表达式匹配相反):想法是在输入中搜索词法分析器的令牌识别状态机的匹配项,同时打印该搜索跳过的所有材料.

The default rule/action of copying the unmatched character to standard output is useful when lex is used for text filtering. The default rule then brings about the semantics of a regex search (as opposed to a regex match): the idea is to search the input for matches of the lexer's token-recognizing state machine, while printing all material that is skipped by that search.

例如,一个仅包含规则的lex规范:

So for instance, a lex specification containing just the rule:

 "foo" { printf("bar"); }

将实现

 sed -e 's/foo/bar/g'