且构网

Java URL encoding of query string parameters

Updated: 2021-07-31 23:06:51

URLEncoder is the way to go. Just remember to encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the query string parameter separator & or the parameter name-value separator =.

String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
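To illustrate the "encode each name and value, never the separators" rule with more than one parameter, here is a minimal sketch. The class name QueryStringBuilder, the helper buildQuery and the second parameter "sort by" are hypothetical names chosen for this example; it assumes Java 10 or newer for the Charset overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, not part of the original answer.
public class QueryStringBuilder {

    // Encodes each name and each value on its own, then joins them with the
    // literal separators '=' and '&', which must stay unencoded.
    public static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            query.add(name + "=" + value);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the parameters.
        Map<String, String> params = new LinkedHashMap<>();
        // \u00A3 is the pound sign, written as an escape so that the
        // source file encoding does not matter.
        params.put("q", "random word \u00A3500 bank $");
        params.put("sort by", "date&time");
        // Prints: https://example.com?q=random+word+%C2%A3500+bank+%24&sort+by=date%26time
        System.out.println("https://example.com?" + buildQuery(params));
    }
}
```

The & inside the value "date&time" comes out as %26, while the & that separates the two parameters stays literal, which is exactly why the URL must not be encoded as a whole.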

If you are not yet on Java 10 or newer, pass StandardCharsets.UTF_8.toString() as the charset argument; if you are not yet on Java 7 or newer, pass "UTF-8".
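For older Java versions, the call might look like the sketch below. The class name LegacyEncoding and the helper encodeUtf8 are hypothetical; wrapping the checked exception in an IllegalStateException is one possible choice, not something the original answer prescribes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the original answer.
public class LegacyEncoding {

    // Passes the charset by name, so this compiles on Java versions before 10.
    // The String overload declares the checked UnsupportedEncodingException.
    public static String encodeUtf8(String value) {
        try {
            // On Java 7 to 9, StandardCharsets.UTF_8.toString() could be
            // passed here instead of the string literal.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this
            // branch is not expected to be reached.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign.
        // Prints: https://example.com?q=random+word+%C2%A3500+bank+%24
        System.out.println("https://example.com?q=" + encodeUtf8("random word \u00A3500 bank $"));
    }
}
```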

Note that URLEncoder represents spaces in query parameters as +, not %20, which is perfectly valid. %20 is usually used to represent spaces in the URI itself (the part before the URI-query string separator character ?), not in the query string (the part after ?).
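The + versus %20 behaviour can be checked with a small sketch. The class name SpaceEncoding and the helper encodeWithPercent20 are hypothetical, and the replacement step is only an assumption about what one might do if a particular consumer insists on %20; it is not part of the original answer. Java 10 or newer is assumed for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class SpaceEncoding {

    // A literal '+' in the input is already encoded as %2B, so every '+'
    // left in the encoded output stands for a space and can be swapped.
    public static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String value = "1 + 1";
        // Spaces become '+', the literal plus sign becomes %2B.
        System.out.println(URLEncoder.encode(value, StandardCharsets.UTF_8)); // 1+%2B+1
        System.out.println(encodeWithPercent20(value));                       // 1%20%2B%201
        // URLDecoder turns both forms back into the same text.
        System.out.println(URLDecoder.decode("1+%2B+1", StandardCharsets.UTF_8));     // 1 + 1
        System.out.println(URLDecoder.decode("1%20%2B%201", StandardCharsets.UTF_8)); // 1 + 1
    }
}
```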

Also note that there are three encode() methods: one without a charset argument, one taking a String charset name as the second argument, which throws a checked exception (UnsupportedEncodingException), and one taking a Charset as the second argument. The one without a charset argument is deprecated. Never use it; always specify the charset. The javadoc even explicitly recommends the UTF-8 encoding, as mandated by RFC 3986 and W3C:

All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
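The byte-to-"%xy" step described in the javadoc can be reproduced by hand. The sketch below uses a hypothetical class PercentEncodingDemo; unlike URLEncoder, it escapes every byte, including the safe ones, purely to make the mechanism visible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class PercentEncodingDemo {

    // Converts the text to UTF-8 bytes and writes every byte as %XY,
    // where XY is the two-digit hexadecimal value of the byte.
    public static String percentEncodeAllBytes(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF maps the signed byte to its unsigned value 0..255.
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The pound sign (U+00A3) takes two bytes in UTF-8.
        System.out.println(percentEncodeAllBytes("\u00A3")); // %C2%A3
        // The dollar sign takes a single byte.
        System.out.println(percentEncodeAllBytes("$"));      // %24
    }
}
```

With a different charset the same character would produce different bytes and therefore a different escape sequence, which is why the charset should always be specified explicitly.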

See also: