Shortest query string for a numerically indexed array in PHP

Updated: 2023-02-19 12:53:18

Default PHP way

What http_build_query does is the common way to serialize arrays into a URL. PHP automatically deserializes them into $_GET.
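
As a point of reference, here is a minimal sketch of that default round trip. The parameter names c and fs are borrowed from the example URL used later in this answer; parse_str stands in for what PHP does when it fills $_GET.

```php
<?php
// Serialize a parameter set that contains a plain list of integers.
$params = ['c' => 'asdf', 'fs' => [5, 12, 99]];
$query = http_build_query($params);
echo $query, "\n"; // c=asdf&fs%5B0%5D=5&fs%5B1%5D=12&fs%5B2%5D=99

// parse_str() mimics how PHP populates $_GET from the query string.
parse_str($query, $decoded);
var_dump($decoded['fs']); // the values come back as strings: '5', '12', '99'
```

Each element costs its index plus two percent-encoded brackets, which is why the query string grows so quickly.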

When you want to serialize just a (non-associative) array of integers, you have other options.

For small arrays, converting to an underscore-separated list is quite convenient and efficient. It is done by $fs = implode('_', $fs). Then your URL would look like this:

http://example.com/?c=asdf&fs=5_12_99

The disadvantage is that you have to call explode('_', $_GET['fs']) explicitly to get the values back as an array.
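
The underscore-separated round trip can be sketched as follows. The $get array is only a stand-in for $_GET so that the snippet runs outside a web request, and the intval cast is my assumption about what you want on the receiving side.

```php
<?php
// Sending side: join the integers with underscores.
$fs = [5, 12, 99];
$url = 'http://example.com/?c=asdf&fs=' . implode('_', $fs);
echo $url, "\n";

// Receiving side: $get stands in for $_GET here.
parse_str(parse_url($url, PHP_URL_QUERY), $get);

// explode() yields strings, so cast each piece back to an integer.
$values = array_map('intval', explode('_', $get['fs']));
var_dump($values);
```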

Other delimiters may be used too. Underscore is considered alphanumeric and as such rarely has special meaning. In URLs, it is usually used as a replacement for space (e.g. by MediaWiki). It is hard to distinguish when used in underlined text. Hyphen is another common replacement for space. It is also often used as a minus sign. Comma is a typical list separator, but unlike underscore and hyphen it is percent-encoded by http_build_query and has special meaning almost everywhere. The situation is similar with the vertical bar ("pipe").

When you have large arrays in URLs, you should first stop coding and start thinking. This almost always indicates bad design. Wouldn’t the POST HTTP method be more appropriate? Don’t you have a more readable and space-efficient way of identifying the addressed resource?

URLs should ideally be easy to understand and (at least partially) remember. Placing a large blob inside is really a bad idea.

Now I have warned you. If you still need to embed a large array in a URL, go ahead. Compress the data as much as you can, base64-encode it to convert the binary blob to text, and url-encode the text to sanitize it for embedding in a URL.

Mmm. Or better, use a modified version of base64. The one I chose

  • uses - instead of +,
  • uses _ instead of / and
  • omits the padding =.

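// Standard base64 characters that are unsafe in URLs, and their replacements.
define('URL_BASE64_FROM', '+/');
define('URL_BASE64_TO', '-_');

// Encode binary data as URL-safe base64 without the '=' padding.
function url_base64_encode($data) {
    $encoded = base64_encode($data);
    if ($encoded === false) {
        return false;
    }
    return str_replace('=', '', strtr($encoded, URL_BASE64_FROM, URL_BASE64_TO));
}

// Decode URL-safe base64 produced by url_base64_encode().
function url_base64_decode($data) {
    $len = strlen($data);
    // Restore the padding. str_pad() takes the total target length,
    // i.e. $len rounded up to the next multiple of 4.
    $padded = str_pad($data, $len + (4 - $len % 4) % 4, '=', STR_PAD_RIGHT);
    return base64_decode(strtr($padded, URL_BASE64_TO, URL_BASE64_FROM));
}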

This saves two bytes for each character that would otherwise be percent-encoded. There is also no need to call the urlencode function anymore.
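
To make the saving concrete, here is a small sketch on a two-byte input chosen so that standard base64 produces both + and /. The helper sketch_url_base64_encode is a hypothetical, compact stand-in for the encoder described above, not a replacement for it.

```php
<?php
// Hypothetical compact variant of the modified base64 encoder.
function sketch_url_base64_encode(string $data): string {
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

$bytes = "\xfb\xff"; // standard base64 of these two bytes is "+/8="

$standard = urlencode(base64_encode($bytes));
$modified = sketch_url_base64_encode($bytes);

echo $standard, "\n"; // %2B%2F8%3D (10 characters)
echo $modified, "\n"; // -_8        (3 characters)
```

Every +, / or = in standard base64 turns into three characters once percent-encoded; the modified alphabet passes through urlencode unchanged.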

A choice has to be made between gzip (gzcompress) and bzip2 (bzcompress). I did not want to invest much time in comparing them; gzip looked better on several relatively small inputs (around 100 chars) for any setting of block size.
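
If you want to repeat that comparison on your own data, a rough sketch could look like this. The sample input (50 packed integers, 100 bytes) is purely an assumption for illustration, and the bzip2 branch only runs when the bz2 extension is loaded.

```php
<?php
// Hypothetical sample: 50 integers packed into 2 bytes each (100 bytes).
$sample = pack('n*', ...range(1, 50));

$sizes = [
    'raw'  => strlen($sample),
    'gzip' => strlen(gzcompress($sample, 9)), // 9 = maximum compression level
];

// bzcompress() is only available with the bz2 extension.
if (function_exists('bzcompress')) {
    $bz = bzcompress($sample, 9); // 9 = largest block size
    if (is_string($bz)) {         // bzcompress() returns an int on error
        $sizes['bzip2'] = strlen($bz);
    }
}

foreach ($sizes as $name => $size) {
    printf("%-5s %d bytes\n", $name, $size);
}
```

The numbers depend heavily on the input, so measure with data that resembles what you actually put in your URLs.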

But what data should be fed into the compression algorithm?

In C, one would cast the array of integers to an array of chars (bytes) and hand it over to the compression function. That’s the most obvious way to do things. In PHP, the most obvious way is to convert all the integers to their decimal representation as strings, concatenate them using delimiters, and only then compress. What a waste of space!
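
A quick sketch of how much raw input each approach produces before compression, using range(1, 1000) as an assumed test array. It calls pack directly with the n* format (unsigned 16-bit integers), so it only works for values up to 65535.

```php
<?php
$nums = range(1, 1000);

// The "obvious" PHP way: decimal digits plus one delimiter per gap.
$asText = implode(',', $nums);

// The C-like way: a fixed 2 bytes per integer, no delimiters.
$asBytes = pack('n*', ...$nums);

echo strlen($asText), "\n";  // 3892
echo strlen($asBytes), "\n"; // 2000
```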

So, let’s use the C approach! We’ll get rid of the delimiters and otherwise wasted space and encode each integer in 2 bytes using pack:

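// 'n*' = any number of unsigned 16-bit integers in big-endian byte order.
define('PACK_NUMS_FORMAT', 'n*');

// Pack integers (each in 0..65535) into a binary string, 2 bytes per number.
function pack_nums($num_arr) {
    array_unshift($num_arr, PACK_NUMS_FORMAT);
    return call_user_func_array('pack', $num_arr);
}

// Inverse of pack_nums(). unpack() numbers its keys from 1,
// so array_values() reindexes the result from 0.
function unpack_nums($packed_arr) {
    return array_values(unpack(PACK_NUMS_FORMAT, $packed_arr));
}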

Warning: pack and unpack behavior can be machine-dependent. With the machine-byte-order format codes (such as s or S), byte order could change between machines. That would rarely be a problem in practice, because one application does not run on two systems with different endianness at the same time, but it might arise when integrating multiple systems, and links generated on one system would break after switching to a system with different endianness. The n code used here always means an unsigned 16-bit big-endian integer, so the packed bytes are the same everywhere. The price is that only values from 0 to 65535 fit.
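
To see what byte order means for pack, here is a small sketch comparing three format codes from the PHP manual: n (16-bit big-endian), v (16-bit little-endian) and S (16-bit in the machine's own order). The value 258 is just an assumed example, picked because its two bytes differ.

```php
<?php
$value = 258; // 0x0102, so the two bytes are distinguishable

$big     = pack('n', $value); // always big-endian
$little  = pack('v', $value); // always little-endian
$machine = pack('S', $value); // whatever the host CPU uses

echo bin2hex($big), "\n";     // 0102
echo bin2hex($little), "\n";  // 0201
echo bin2hex($machine), "\n"; // 0102 or 0201, depending on the machine

// Reading bytes with the wrong order silently yields a different number.
$misread = unpack('n', $little)[1];
echo $misread, "\n";          // 513 (0x0201)
```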

Now packing, compression and modified base64, all in one:

function url_embed_array($arr) {
    return url_base64_encode(gzcompress(pack_nums($arr)));
}
function url_parse_array($data) {
    return unpack_nums(gzuncompress(url_base64_decode($data)));
}
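
For a quick self-contained check of the whole pipeline, the sketch below inlines the same three steps under hypothetical names (sketch_embed, sketch_parse) and round-trips range(1, 1000). It assumes the zlib extension is loaded and that every value fits into 16 bits.

```php
<?php
// Hypothetical inlined version: pack -> gzcompress -> URL-safe base64.
function sketch_embed(array $nums): string {
    $packed = pack('n*', ...$nums);
    $b64 = base64_encode(gzcompress($packed));
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

// Inverse: restore the alphabet and padding, decompress, unpack.
function sketch_parse(string $data): array {
    $b64 = strtr($data, '-_', '+/');
    $len = strlen($b64);
    $b64 = str_pad($b64, $len + (4 - $len % 4) % 4, '=');
    $packed = gzuncompress(base64_decode($b64));
    return array_values(unpack('n*', $packed)); // reindex from 0
}

$ids = range(1, 1000);
$token = sketch_embed($ids);
$url = 'http://example.com/?c=asdf&fs=' . $token;
$restored = sketch_parse($token);

echo strlen($token), " characters in the token\n";
var_dump($restored === $ids);
```

The token contains only letters, digits, - and _, so it can be appended to the URL without urlencode.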

See the result on IdeOne. It does better than OP’s answer: on his 40-element array my solution produced 91 chars while his produced 98. With range(1, 1000) (which generates array(1, 2, 3, …, 1000)) as a benchmark, OP’s solution produces 2712 characters while mine produces just 2032. That is about 25 % better.

For the sake of completeness, OP’s solution is

function url_embed_array($arr) {
    return urlencode(base64_encode(gzcompress(implode(',', $arr))));
}
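
The decoding side of OP's approach is not quoted here, so the following is only my guess at a matching inverse, written under hypothetical names (op_embed_array, op_parse_array) to keep the snippet self-contained.

```php
<?php
// OP's encoder as quoted above, renamed for this sketch.
function op_embed_array(array $arr): string {
    return urlencode(base64_encode(gzcompress(implode(',', $arr))));
}

// Assumed inverse: undo each step in reverse order.
function op_parse_array(string $data): array {
    $csv = gzuncompress(base64_decode(urldecode($data)));
    return array_map('intval', explode(',', $csv));
}

$ids = range(1, 40);
$restored = op_parse_array(op_embed_array($ids));
var_dump($restored === $ids);
```

Note that values read from $_GET are already url-decoded by PHP, so in that case the urldecode call should be skipped; decoding twice would turn any + into a space.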