Updated: 2023-12-05 15:36:16
Easy using perl:
perl -CSDA -plE 's/\s/ /g' file
But as @mklement0 correctly pointed out in a comment, it will match \t (TAB) too. If this is a problem, you could use:
perl -CSDA -plE 's/[^\S\t]/ /g'
Demo:
X X
The above contains:
U+00058 X LATIN CAPITAL LETTER X
U+01680 OGHAM SPACE MARK
U+02002 EN SPACE
U+02003 EM SPACE
U+02004 THREE-PER-EM SPACE
U+02005 FOUR-PER-EM SPACE
U+02006 SIX-PER-EM SPACE
U+02007 FIGURE SPACE
U+02008 PUNCTUATION SPACE
U+02009 THIN SPACE
U+0200A HAIR SPACE
U+0202F NARROW NO-BREAK SPACE
U+0205F MEDIUM MATHEMATICAL SPACE
U+03000 IDEOGRAPHIC SPACE
U+00058 X LATIN CAPITAL LETTER X
Using:
perl -CSDA -plE 's/\s/_/g' <<<"X X"
(note: for the demo the replacement is an underscore) prints:
X_____________X
This is also doable in pure bash:
LC_ALL=en_US.UTF-8 spaces=$(printf "%b" "\U00A0\U1680\U180E\U2000\U2001\U2002\U2003\U2004\U2005\U2006\U2007\U2008\U2009\U200A\U200B\U202F\U205F\U3000\UFEFF")
# IFS= keeps leading/trailing ASCII blanks that a plain read would strip
while IFS= read -r line; do
    printf '%s\n' "${line//[$spaces]/ }"
done
The LC_ALL=en_US.UTF-8 is necessary only if your default locale isn't UTF-8 (which it should be, if you are working with UTF-8 texts) :)
Demo:
str="X X"
echo "${str//[$spaces]/_}"
prints again:
X_____________X
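If the bracket expression misbehaves because of the locale, one possible fallback is to replace each character's UTF-8 byte sequence as a literal string, which does not depend on LC_ALL. This is a different technique from the `[$spaces]` class above and only a sketch: the function name `normalize_spaces` is hypothetical, and the list covers just a handful of the characters.

```bash
# Replace a few Unicode spaces with an ASCII space, one literal
# byte sequence at a time (no bracket expression, so no locale needed).
normalize_spaces() {
  local s=$1 seq
  local seqs=(
    $'\302\240'       # U+00A0 NO-BREAK SPACE
    $'\342\200\202'   # U+2002 EN SPACE
    $'\342\200\203'   # U+2003 EM SPACE
    $'\342\200\257'   # U+202F NARROW NO-BREAK SPACE
    $'\343\200\200'   # U+3000 IDEOGRAPHIC SPACE
  )
  for seq in "${seqs[@]}"; do
    s=${s//"$seq"/ }   # quoted pattern: matched literally, not as a glob
  done
  printf '%s\n' "$s"
}

normalize_spaces $'X\342\200\203X'
```

The trade-off is one substitution pass per character instead of a single pass with a character class, which should not matter for line-sized strings.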
The same using sed: prepare the variable $spaces as above and use:
sed "s/[$spaces]/ /g" file
Edit, because of some strange copy/paste (or locale) problems:
xxd -ps <<<"$spaces"
shows
c2a0e19a80e1a08ee28080e28081e28082e28083e28084e28085e28086e2
8087e28088e28089e2808ae2808be280afe2819fe38080efbbbf0a
The md5 digest (two different programs):
md5sum <<<"$spaces"
LC_ALL=C md5 <<<"$spaces"
prints the same md5:
35cf5e1d7a5f512031d18f3d2ec6612f -
35cf5e1d7a5f512031d18f3d2ec6612f
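Since the hex dump above is plain ASCII, it can also be used to rebuild $spaces without pasting any invisible characters. A minimal sketch, assuming bash (for `printf -v` and `\xHH` escapes in `%b`); the variable names `hex` and `roundtrip` are mine, and the trailing 0a (the newline added by `<<<`) is left out:

```bash
# Hex dump from above, without the final 0a.
hex=c2a0e19a80e1a08ee28080e28081e28082e28083e28084e28085e28086e28087
hex+=e28088e28089e2808ae2808be280afe2819fe38080efbbbf

# Turn "c2a0..." into "\xc2\xa0..." and let printf decode the escapes.
printf -v spaces '%b' "$(sed 's/../\\x&/g' <<<"$hex")"

# Round-trip check: dump the bytes again; this should reproduce $hex.
roundtrip=$(printf '%s' "$spaces" | od -An -tx1 | tr -d ' \n')
[ "$roundtrip" = "$hex" ] && echo "bytes match"
```

After this, $spaces can be used in the bash and sed commands exactly as before.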