且构网

Is it valid to use square-bracket array syntax in a URL query string?

Updated: 2023-11-07 21:11:04

Is it actually safe/valid to use multidimensional array syntax in the URL query string?

http://example.com?abc[]=123&abc[]=456

It seems to work in every browser and I always thought it was OK to use, but according to a comment in this article, it is not: http://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/#comment4

I would like to hear a second opinion.

The answer is not simple.

The following is extracted from section 3.2.2 of RFC 3986:

A host identified by an Internet Protocol literal address, version 6
[RFC3513] or later, is distinguished by enclosing the IP literal
within square brackets ("[" and "]"). This is the only place where
square bracket characters are allowed in the URI syntax.

This seems to answer the question by flatly stating that square brackets are not allowed anywhere else in the URI. But there is a difference between a square bracket character and a percent encoded square bracket character.
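The distinction can be seen with Python's standard `urllib.parse` module (our choice for illustration; the answer itself is language-neutral): percent-encoding turns each bracket into a three-character escape, and decoding that escape gives back the same data octet.

```python
from urllib.parse import quote, unquote

# safe="" tells quote() to escape everything outside the unreserved set
# (letters, digits and "_.-~"), so both brackets get percent-encoded.
ENCODED = {ch: quote(ch, safe="") for ch in "[]"}
print(ENCODED)            # {'[': '%5B', ']': '%5D'}

# Decoding the escapes yields the original bracket characters again.
print(unquote("%5B%5D"))  # []
```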

The following is extracted from the beginning of section 3 of RFC 3986:

  3. Syntax Components

    The generic URI syntax consists of a hierarchical sequence of
    components referred to as the scheme, authority, path, query, and
    fragment.

    URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

So the "query" is a component of the "URI".
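Python's `urllib.parse.urlsplit` follows this generic decomposition, so it can show where the brackets in the question's URL end up. This is a sketch; the standard library is our choice, not something the answer relies on.

```python
from urllib.parse import urlsplit

# Split the URL from the question into its generic RFC 3986 components.
parts = urlsplit("http://example.com?abc[]=123&abc[]=456")
print(parts.scheme)  # http
print(parts.netloc)  # example.com
print(parts.query)   # abc[]=123&abc[]=456
```

The brackets land in the query component, not in the host, which is the only place section 3.2.2 allows them unencoded.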

The following is extracted from section 2.2 of RFC 3986:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must
be percent-encoded before the URI is formed.

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
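The ABNF above maps directly onto two character sets. The following minimal Python sketch (the helper name `reserved_chars_in` is ours, for illustration only) lists which reserved characters occur in the question's query string. `&` and `=` are sub-delims that the common `key=value&key=value` convention uses as delimiters, while `[` and `]` are gen-delims.

```python
# Character classes transcribed from the ABNF in RFC 3986, section 2.2.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def reserved_chars_in(text: str) -> list:
    """Return the distinct reserved characters in text, in order of first appearance."""
    seen = []
    for ch in text:
        if ch in RESERVED and ch not in seen:
            seen.append(ch)
    return seen


print(reserved_chars_in("abc[]=123&abc[]=456"))  # ['[', ']', '=', '&']
```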

So square brackets may appear in a query string, but only if they are percent-encoded. Unless they aren't, as explained further down in section 2.2:

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.
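A URI-producing application that follows this "should" would percent-encode the brackets when it builds the query. Here is a minimal sketch with Python's standard `urllib.parse.urlencode` (our choice of tool; the answer names none), assuming the key/value pairs from the question's example.

```python
from urllib.parse import urlencode

# Key/value pairs taken from the question's example URL.
pairs = [("abc[]", "123"), ("abc[]", "456")]

# urlencode percent-encodes every character outside letters, digits and
# "_.-~" (spaces become "+"), so "[" becomes %5B and "]" becomes %5D.
query = urlencode(pairs)
url = "http://example.com?" + query
print(url)  # http://example.com?abc%5B%5D=123&abc%5B%5D=456
```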

So because square brackets are only allowed in the "host" subcomponent, they "should" be percent-encoded in other components and subcomponents, in this case the "query" component, unless RFC 3986 explicitly allows unencoded square brackets to represent data in the query component, which it does not.
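A strict reading of that "should" could be turned into a simple check. The function below is a hypothetical lint written for illustration, not part of any standard tool. It only inspects the query component, so brackets around an IPv6 host are not flagged.

```python
from urllib.parse import urlsplit


def query_has_unencoded_brackets(url: str) -> bool:
    """Hypothetical strict check: does the query component contain a literal [ or ]?"""
    query = urlsplit(url).query
    return "[" in query or "]" in query


print(query_has_unencoded_brackets("http://example.com?abc[]=123&abc[]=456"))          # True
print(query_has_unencoded_brackets("http://example.com?abc%5B%5D=123&abc%5B%5D=456"))  # False
print(query_has_unencoded_brackets("http://[::1]/?a=1"))                               # False
```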

However, if a "URI producing application" fails to do what it "should" do, by leaving square brackets unencoded in the query, then readers of the URI are not to reject the URI outright. Instead, the square brackets are to be considered as belonging to the data of the query component, since they are not used as delimiters in that component.
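Python's `urllib.parse.parse_qs` behaves like such a lenient reader. It decodes percent-escapes in names and values and splits only on `&` and `=`, so literal brackets simply pass through as data. The comparison below is a sketch of that behavior, not a claim about any particular server.

```python
from urllib.parse import parse_qs

# The same data, once with literal brackets and once percent-encoded.
unencoded = parse_qs("abc[]=123&abc[]=456")
encoded = parse_qs("abc%5B%5D=123&abc%5B%5D=456")

print(unencoded)             # {'abc[]': ['123', '456']}
print(unencoded == encoded)  # True
```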

This is why, for example, it is not a violation of RFC 3986 when PHP accepts both unencoded and percent encoded square brackets as valid characters in a query string, and even assigns to them a special purpose. However, it would appear that authors who try to take advantage of this loophole by not percent encoding square brackets are in violation of RFC 3986.
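PHP's special purpose for the brackets is to collect repeated `name[]` keys into an array, so that for the question's URL `$_GET['abc']` holds both values. The sketch below is a hypothetical, much-simplified Python imitation of that convention, not PHP's actual algorithm: it handles only the trailing `[]` form and ignores `name[key]`, nesting, and PHP's other name mangling. Because it parses through `parse_qsl`, encoded and unencoded brackets give the same result, mirroring the leniency described above.

```python
from urllib.parse import parse_qsl


def php_style_parse(query: str) -> dict:
    """Hypothetical, simplified imitation of PHP's "name[]" handling.

    parse_qsl decodes %5B%5D to "[]" first, so encoded and unencoded
    brackets are treated identically.
    """
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key.endswith("[]"):
            name = key[:-2]
            # Start a new list unless one already exists under this name.
            if not isinstance(result.get(name), list):
                result[name] = []
            result[name].append(value)
        else:
            # Plain keys: a later occurrence overwrites an earlier one.
            result[key] = value
    return result


print(php_style_parse("abc[]=123&abc[]=456"))          # {'abc': ['123', '456']}
print(php_style_parse("abc%5B%5D=123&abc%5B%5D=456"))  # {'abc': ['123', '456']}
print(php_style_parse("abc=123&abc=456"))              # {'abc': '456'}
```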