且构网


Problem using Unicode in URLs with cgi.PATH_INFO in ColdFusion

Updated: 2022-01-29 01:03:57

Yeah, it's not really ColdFusion's fault. It's a common problem.

It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
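
As a rough illustration of why the raw %xx sequences matter, here is a minimal Python sketch. The request path is a made-up example. Once the bytes are %-decoded, whoever reads them has to guess the encoding, and a wrong guess silently yields different characters:

```python
from urllib.parse import unquote_to_bytes

# Hypothetical raw request path, as the browser sends it (UTF-8, %-encoded).
raw_path = "/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"

# %-decoding gives plain bytes; the %xx form is gone from this point on.
decoded = unquote_to_bytes(raw_path)

# The same bytes mean different things depending on the encoding assumed.
as_utf8 = decoded.decode("utf-8")     # what the sender meant
as_cp1252 = decoded.decode("cp1252")  # what a wrong guess produces

print(as_utf8)
print(as_cp1252)
```

A script that still had the %xx form could do this decoding itself with the encoding it knows its pages use; with PATH_INFO, that decision has already been made for it.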

And it's partly IIS's fault, because it always tries to read submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence, in which case it falls back to the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts the result in environment variables as a Unicode string (as envvars are Unicode under Windows).
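
The fallback described above can be imitated in a few lines of Python. This is only a model of the behaviour as described here, not IIS's actual code; the function name and the choice of cp1252 as the "default code page" are assumptions for the example:

```python
def decode_path_like_iis(raw: bytes, fallback_codepage: str = "cp1252") -> str:
    """Model of the described behaviour: try UTF-8 first, then the
    default code page. The caller is never told which branch ran."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback_codepage, errors="replace")


# Two different byte sequences on the wire...
utf8_request = b"/caf\xc3\xa9"  # "é" encoded as UTF-8
legacy_request = b"/caf\xe9"    # "é" encoded as cp1252, invalid as UTF-8

# ...come out as the same Unicode string, with no trace of the guess.
print(decode_path_like_iis(utf8_request))
print(decode_path_like_iis(legacy_request))
```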

However, most byte-based tools using the C standard library (and I'm assuming this applies to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
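
The loss is easy to reproduce. In this sketch cp1252 stands in for the default code page of a Western Windows install, and Python's errors="replace" stands in for whatever substitution the C runtime performs; both are assumptions for the example:

```python
# Unicode value as the web server would store it in the environment.
path_info = "/search.cfm/القاهرة"

# What a byte-based tool receives after the runtime re-encodes the
# value to the default code page: every Arabic letter becomes "?".
as_seen_by_byte_tool = path_info.encode("cp1252", errors="replace")
print(as_seen_by_byte_tool)

# Decoding the bytes again cannot bring the original characters back.
round_tripped = as_seen_by_byte_tool.decode("cp1252")
print(round_tripped == path_info)
```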

A clever script that has direct access to the Win32 GetEnvironmentVariableW API could call that to retrieve a native-Unicode environment variable, which it could then encode to UTF-8 or whatever else it wanted, assuming that the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-codepage characters before they even reach the environment variables.
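
For a language that can reach the Win32 API, the idea might look roughly like this Python sketch using ctypes. The helper names are hypothetical, and the non-Windows branch exists only so the sketch runs elsewhere. (Python 3's own os.environ is already Unicode on Windows; the direct call is shown only to make the mechanism explicit.)

```python
import ctypes
import os
import sys


def read_env_unicode(name):
    """Return an environment variable as a Unicode string, or None."""
    if sys.platform != "win32":
        # Fallback for other platforms, not part of the technique itself.
        return os.environ.get(name)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_env = kernel32.GetEnvironmentVariableW
    get_env.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_env.restype = ctypes.c_uint32

    # First call with no buffer: returns the size needed, including
    # the terminating null, or 0 if the variable does not exist.
    size = get_env(name, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    get_env(name, buf, size)
    return buf.value


def path_info_as_utf8(name="PATH_INFO"):
    """Re-encode the native-Unicode value as UTF-8 bytes."""
    value = read_env_unicode(name)
    return None if value is None else value.encode("utf-8")
```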

Otherwise, your best bet is URL-rewriting. If a layer above CF can convert that search.cfm/القاهرة to search.cfm/?q=القاهرة then you don't face the same problem, as the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
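
As a sketch of the rewriting approach: the rewrite itself would normally be a rule in the web server or a filter in front of CF, so the Python below only models the transformation and then shows that the search term survives in the query string. The function name, the q parameter and the assumption that the term contains no literal &, = or + are all choices made for this example:

```python
from urllib.parse import parse_qs, urlsplit


def rewrite_path_to_query(raw_url, script="/search.cfm", param="q"):
    """Move the still %-encoded path suffix into a query parameter."""
    prefix = script + "/"
    if not raw_url.startswith(prefix) or "?" in raw_url:
        return raw_url  # not a URL this rule applies to
    term = raw_url[len(prefix):]
    if not term:
        return raw_url
    return f"{prefix}?{param}={term}"


# Raw request URL as sent by the browser (UTF-8, %-encoded).
raw_url = "/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"
rewritten = rewrite_path_to_query(raw_url)

# QUERY_STRING reaches the application with the %xx bytes intact,
# so the application chooses the encoding when it decodes them.
query_string = urlsplit(rewritten).query
search_term = parse_qs(query_string, encoding="utf-8")["q"][0]
print(search_term)
```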