Changing PowerShell's default output encoding to UTF-8

Updated: 2022-12-22 14:59:50


By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.

It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8 but this is awkward to have to repeat every time.

The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.

It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'} but I've tried this and it had no effect.

https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/ which talks about $OutputEncoding looks at first glance as though it should be relevant, but then it talks about output being encoded in ASCII, which is not what's actually happening.

How do you set PowerShell to use UTF-8?

Note: The following applies to Windows PowerShell.
See the next section for the cross-platform PowerShell Core (v6+) edition.

  • On PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:

    • $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
  • On PSv5.0 or below, you cannot change the encoding for > / >>, but, on PSv3 or higher, the above technique does work for explicit calls to Out-File.
    (The $PSDefaultParameterValues preference variable was introduced in PSv3.0).

  • On PSv3.0 or higher, if you want to set the default encoding for all cmdlets that support
    an -Encoding parameter
    (which in PSv5.1+ includes > and >>), use:

    • $PSDefaultParameterValues['*:Encoding'] = 'utf8'

If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding.

Similarly, be sure to include such commands in any scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or on a different machine.
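
As a concrete illustration, here is a minimal sketch, assuming Windows PowerShell 5.1 and that the directory containing $PROFILE already exists (test.txt is just a throwaway file name):

    # Persist the session-global default for all -Encoding-aware cmdlets:
    Add-Content -Path $PROFILE -Value "`$PSDefaultParameterValues['*:Encoding'] = 'utf8'"

    # In a new session (or after dot-sourcing the profile with . $PROFILE),
    # verify that > now produces UTF-8 rather than UTF-16LE:
    'hü' > test.txt
    Format-Hex test.txt   # starts with EF BB BF (the UTF-8 pseudo-BOM), then 68 C3 BC ...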

Caveat: PowerShell, as of v5.1, invariably creates UTF-8 files with a (pseudo) BOM, which is customary only in the Windows world - Unix-based utilities do not recognize this BOM (see bottom); see this post for workarounds that create BOM-less UTF-8 files.
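
A minimal sketch of one common workaround - an illustration, not quoted from the linked post - is to bypass Out-File and call the .NET file API with a BOM-less UTF8Encoding instance:

    # $false suppresses the BOM; use a full path, because .NET's working directory
    # can differ from PowerShell's current location.
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    $lines     = Get-ChildItem | Out-String
    [System.IO.File]::WriteAllLines("$PWD\listing.txt", $lines, $utf8NoBom)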

For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.


The automatic $OutputEncoding variable is unrelated, and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
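
To illustrate the distinction, here is a hedged sketch assuming Windows PowerShell 5.1 and the stock findstr.exe:

    # $OutputEncoding only governs the bytes PowerShell sends to an external program's stdin:
    $OutputEncoding = New-Object System.Text.UTF8Encoding $false
    'hü' | findstr "."   # findstr now receives the UTF-8 bytes 68 C3 BC for 'hü'

    # How an external program's own output is decoded is governed by [Console]::OutputEncoding,
    # and files created with > or Out-File are affected by neither variable.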


Optional reading: The cross-platform perspective: PowerShell Core:

PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.

  • This means that source-code files without a BOM are assumed to be UTF-8, and using > / Out-File / Set-Content defaults to BOM-less UTF-8; explicitly passing -Encoding utf8 also creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM via the utf8BOM value (see the sketch after this list).

  • If you create PowerShell scripts with an editor on a Unix-like platform and nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM:

    • This works fine on PowerShell Core.
    • It may break on Windows PowerShell, if the file contains non-ASCII characters; if you do need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM.
      Without the BOM, Windows PowerShell (mis)interprets your script as being encoded in the legacy "ANSI" codepage (determined by the system locale for pre-Unicode applications; e.g., Windows-1252 on US-English systems).
  • Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.

    • This may not always be a problem, but definitely can be, such as when you try to read a file into a string in bash with, say, text=$(cat file) or text=$(<file) - the resulting variable will contain the pseudo-BOM as the first 3 bytes.
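
To see these PowerShell Core defaults for yourself (the sketch referenced in the first bullet above), here is a quick illustration; it assumes pwsh 6+ and arbitrary file names, and the exact newline bytes depend on the OS:

    'hü' | Out-File utf8.txt                        # default: BOM-less UTF-8
    'hü' | Out-File utf8bom.txt -Encoding utf8BOM   # opt in to the pseudo-BOM
    Format-Hex utf8.txt     # 68 C3 BC ...          (no EF BB BF prefix)
    Format-Hex utf8bom.txt  # EF BB BF 68 C3 BC ...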

Inconsistent default encoding behavior in Windows PowerShell:

Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.

Note:

  • The following doesn't aspire to cover all standard cmdlets.

  • Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.

  • As of this writing, the documentation frequently incorrectly claims that ASCII is the default encoding in Windows PowerShell - see this GitHub docs issue.


Cmdlets that write:

Out-File and > / >> create "Unicode" - UTF-16LE - files by default, in which even ASCII-range characters are represented by 2 bytes - which notably differs from Set-Content / Add-Content (see next point); New-ModuleManifest and Export-CliXml also create UTF-16LE files.

Set-Content (and Add-Content if the file doesn't yet exist / is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
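
A quick way to observe this difference yourself - a sketch assuming Windows PowerShell 5.1 on a system whose ANSI code page is Windows-1252:

    'hü' | Out-File    of.txt   # "Unicode" (UTF-16LE)
    'hü' | Set-Content sc.txt   # ANSI (here: Windows-1252)
    Format-Hex of.txt   # FF FE 68 00 FC 00 ...   (BOM, 2 bytes per character)
    Format-Hex sc.txt   # 68 FC ...               (1 byte per character, no BOM)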

Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.

Export-PSSession creates UTF-8 files with BOM by default.

New-Item -Type File -Value currently creates BOM-less(!) UTF-8.

The Send-MailMessage help topic also claims that ASCII encoding is the default - I have not personally verified that claim.

Re commands that append to an existing file:

>> / Out-File -Append make no attempt to match the encoding of a file's existing content. That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues, as shown above). In short: you must know the encoding of an existing file's content and append using that same encoding.

Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content. Thanks, js2010. Note that in Windows PowerShell this means that it is ANSI encoding that is applied if the existing content has no BOM, whereas it is UTF-8 in PowerShell Core.
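
The following sketch (assuming Windows PowerShell 5.1 and an ANSI system locale) shows the mismatch in behavior; the file name is arbitrary:

    'first'  | Set-Content mixed.txt   # creates the file as ANSI, no BOM
    'second' >> mixed.txt              # blindly appends UTF-16LE bytes to the ANSI file
    'third'  | Add-Content mixed.txt   # detects and matches the existing (ANSI) encoding
    Get-Content mixed.txt              # the 'second' line comes back as NUL-interspersed garbage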

This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in this GitHub issue.

Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.


Cmdlets that read (encoding used in the absence of a BOM):

Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.

By contrast, Import-Csv, Import-CliXml and Select-String assume UTF-8 in the absence of a BOM.
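
As a closing illustration - a sketch assuming Windows PowerShell 5.1 and a Windows-1252 system locale - the same BOM-less UTF-8 file is interpreted differently depending on which cmdlet reads it:

    # .NET's WriteAllText defaults to BOM-less UTF-8, so 'hü' is written as the bytes 68 C3 BC:
    [System.IO.File]::WriteAllText("$PWD\t.txt", 'hü')
    Get-Content t.txt       # assumes ANSI  -> 'hÃ¼' (mojibake)
    Select-String . t.txt   # assumes UTF-8 -> matches and prints 'hü' correctly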