且构网

How do I use the Unix sort command to sort by human-readable numeric file sizes in a column?

Updated: 2022-12-18 09:32:38

This question is now answered - scroll to the end of this post for the solution.

Apologies if the answer is already here, but all the answers I have found so far suggest either the -h flag or the -n flag, and neither of those is working for me...

I have some output from a curl command that is giving me several columns of data. One of those columns is a human-readable file size ("1.6mb", "4.3gb" etc).

I am using the Unix sort command to sort by the relevant column, but it appears to be sorting alphabetically instead of numerically. I have tried using both the -n and the -h flags, but although they do change the order, in neither case is the order numerically correct.

I am on a CentOS Linux box, version 7.2.1511. The version of sort I have is "sort (GNU coreutils) 8.22".

I have tried using the -h flag in these different formats:

curl localhost:9200/_cat/indices | sort -k9,9h | head -n5
curl localhost:9200/_cat/indices | sort -k9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k9h | head -n5

I always get these results:

green open indexA            5 1        0       0   1.5kb    800b
green open indexB            5 1  9823178 2268791 152.9gb  76.4gb
green open indexC            5 1    35998    7106 364.9mb 182.4mb
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexE            5 1        0       0   1.5kb    800b

I have tried using the -n flag in the same formats as above:

curl localhost:9200/_cat/indices | sort -k9,9n | head -n5
curl localhost:9200/_cat/indices | sort -k9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k9n | head -n5

I always get these results:

green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb

Edit: It turned out there were two problems:

1) sort expects single-letter suffixes, with capital M and G (k may be either case), instead of kb, mb and gb (for bytes you can just leave the suffix blank).

2) sort will include leading spaces unless you explicitly exclude them, which messes with the ordering.

The solution is to replace lower case with upper case and use the -b flag to make sort ignore leading spaces (I've based this answer on @Vinicius' solution below, because it's easier to read if you don't know regex):

curl localhost:9200/_cat/indices | tr '[kmg]b' '[KMG] ' | sort -k9hb
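One caveat with the tr approach: tr maps single characters rather than patterns, so the command above changes every k, m, g and b on the line (for example "green" becomes "Green"), and an index name containing a "b" would gain a space and shift the column count. A minimal sketch of a column-restricted alternative follows. The helper name normalize_sizes is made up for illustration, and it assumes the sizes are in fields 8 and 9 as in the output shown in this question:

```shell
# Hypothetical helper (not from the original post): rewrite only the two
# size columns (fields 8 and 9) and leave every other field alone.
normalize_sizes() {
  awk '{
    for (i = 8; i <= 9; i++) {
      sub(/b$/, "", $i)   # drop the trailing b: 800b -> 800, 76.4gb -> 76.4g
      $i = toupper($i)    # uppercase the unit letter: 76.4g -> 76.4G
    }
    print                 # awk re-joins the fields with single spaces
  }'
}

# Sample rows copied from the question; a real run would pipe curl into it:
#   curl localhost:9200/_cat/indices | normalize_sizes | sort -k9,9h
sample='green open indexA 5 1 0 0 1.5kb 800b
green open indexB 5 1 9823178 2268791 152.9gb 76.4gb
green open indexC 5 1 35998 7106 364.9mb 182.4mb
green open indexD 5 1 108 11 387.1kb 193.5kb
green open indexE 5 1 0 0 1.5kb 800b'

printf '%s\n' "$sample" | normalize_sizes | sort -k9,9h
```

Because awk rebuilds each line with single spaces between fields, the original column alignment is lost, but the leading-space issue goes away as a side effect.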

Your 'm' and 'g' units should be uppercase. The GNU sort manual reads:

-h --human-numeric-sort --sort=human-numeric

Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or ‘k’ or ‘K’, or one of ‘MGTPEZY’, in that order; see Block size); and finally by numeric value.
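To make that ordering rule concrete, here is a small sketch using made-up values rather than real index sizes. It assumes GNU sort; the function names are only for illustration:

```shell
# Sign is compared first, then the suffix, then the numeric value.
human_sort_demo() {
  printf '%s\n' 1G 2K -5 900 1M | sort -h
}

# The suffix outranks the value: 2000K sorts before 1M even though it is
# the larger quantity, so the input should use consistent units.
suffix_first_demo() {
  printf '%s\n' 1M 2000K | sort -h
}

human_sort_demo
suffix_first_demo
```

The first call prints -5, 900, 2K, 1M, 1G (one per line); the second prints 2000K and then 1M.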

You can change the output of curl with GNU sed like this:

curl localhost:9200/_cat/indices \
| sed 's/[0-9][mgtpezy]/\U&/g' \
| sort -k9,9h \
| head -n5

Yields:

green open index4      1 0        3       0   3.9kb   3.9kb
green open index1      5 1     1021       0   3.2Mb   1.6Mb
green open index2      5 1     8833       0   4.1Mb     2Mb
green open index3      5 1     4500       0     5Mb   2.5Mb
green open index5      3 1  2516794       0   8.6Gb   4.3Gb

Other letters like "b" will be treated as "no unit":

green open indexA            5 1        0       0   1.5kb    800b
green open indexE            5 1        0       0   1.5kb    800b
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexC            5 1    35998    7106 364.9Mb 182.4Mb
green open indexB            5 1  9823178 2268791 152.9Gb  76.4Gb

If so desired, you can change the units in the sorted output back to lowercase by piping to sed 's/[0-9][MGTPEZY]/\L&/g'
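Putting the three steps together, a minimal sketch of the full round trip might look like the following. The wrapper name sort_by_size_column is hypothetical, \U and \L are GNU sed extensions, and the sample rows are copied from the question so the example runs without an Elasticsearch instance:

```shell
# Hypothetical wrapper: uppercase the unit letters, sort on column 9 with -h,
# then restore the lowercase units. Requires GNU sed and GNU sort.
sort_by_size_column() {
  sed 's/[0-9][mgtpezy]/\U&/g' \
    | sort -k9,9h \
    | sed 's/[0-9][MGTPEZY]/\L&/g'
}

# A real run would be: curl localhost:9200/_cat/indices | sort_by_size_column
sort_by_size_column <<'EOF'
green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb
EOF
```

The rows come back in the order index4, index1, index2, index3, index5, with the units in their original lowercase form.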