且构网


CSV file: UTF-8 (with BOM) to ANSI/Windows-1251

Updated: 2023-11-27 12:05:16


The following VBScript could help: the procedure UTF8toANSI converts a UTF-8 encoded text file to another encoding.

Option Explicit

' ADO constants (VBScript cannot read the ADO type library, so declare them here)
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

' Reads UTF8FName as UTF-8 text and saves that text to ANSIFName
' encoded in ANSICharSet (e.g. "windows-1251").
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName, ByVal ANSICharSet)
  Dim strText

  With CreateObject("ADODB.Stream")
    .Type = adTypeText

    ' Load and decode the source file as UTF-8
    .Charset = "utf-8"
    .Open
    .LoadFromFile UTF8FName
    strText = .ReadText(adReadAll)
    .Close

    ' Re-open the same stream with the target charset, write the text, save it
    .Charset = ANSICharSet
    .Open
    .WriteText strText, adWriteChar
    .SaveToFile ANSIFName, adSaveCreateOverWrite
    .Close
  End With
End Sub

' To take input file, output file and charset from the command line, use:
'UTF8toANSI WScript.Arguments(0), WScript.Arguments(1), WScript.Arguments(2)
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1250.csv", "windows-1250"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1251.csv", "windows-1251"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1253.csv", "windows-1253"
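Outside Windows Script Host, roughly the same conversion can be sketched in Python. This is not part of the original answer, and `utf8_to_ansi` is a hypothetical name. The assumptions: the `utf-8-sig` codec drops a leading BOM if one is present, and characters missing from the target charset become `?`. Unlike ADODB.Stream on Windows, Python's codecs do not apply a best-fit decomposition such as č => c.

```python
def utf8_to_ansi(utf8_path, ansi_path, ansi_charset):
    """Re-encode a UTF-8 text file (with or without BOM) into another charset.

    Hypothetical helper; characters that ansi_charset cannot represent are
    written as '?' (no best-fit mapping such as 'č' -> 'c' is attempted).
    """
    # 'utf-8-sig' decodes UTF-8 and strips a leading BOM if present.
    # newline="" leaves the original line endings (e.g. CRLF) untouched.
    with open(utf8_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # errors="replace" substitutes '?' for every unencodable character.
    with open(ansi_path, "w", encoding=ansi_charset, errors="replace", newline="") as dst:
        dst.write(text)
```

For example, `utf8_to_ansi("38835837utf8.csv", "38835837ansi1251.csv", "windows-1251")` corresponds to the windows-1251 call in the VBScript, except that the Czech letters would come out as `?` rather than as c, s, C.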


For a list of the character set names known to the system, see the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset in the Windows Registry. You can list them at the command prompt as follows (in a batch file, double the percent sign: %%#):

for /F "tokens=5* delims=\" %# in ('reg query HKCR\MIME\Database\Charset') do @echo "%#"
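If you port the conversion to Python, that registry key is not consulted: Python resolves charset names through its own codec registry. A minimal check, assuming a hypothetical helper `is_known_charset` built on the standard `codecs.lookup` function:

```python
import codecs


def is_known_charset(name):
    """Return True if Python's codec registry recognises `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("windows-1250", "windows-1251", "windows-1253", "no-such-charset"):
    print(name, "->", is_known_charset(name))
```

`codecs.lookup` also normalises aliases, so `codecs.lookup("windows-1251").name` is `"cp1251"`.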

Data (38835837utf8.csv file):

1st Line    1250    852 čeština (Čechie)
2nd Line    1251    966 русский (Россия)
3rd Line    1253    737 ελληνικά (Ελλάδα)


The output shows that characters which cannot be represented in a particular character set are converted using character decomposition mapping (č => c, š => s, Č => C, etc.); where that does not apply, they are all replaced with a question mark, ?, the common replacement character:

==> chcp 1250
Active code page: 1250

==> type D:\test\SO\38835837ansi1250.csv
1st Line        1250    852     čeština (Čechie)
2nd Line        1251    966     ??????? (??????)
3rd Line        1253    737     ???????? (??????)

==> chcp 1251
Active code page: 1251

==> type D:\test\SO\38835837ansi1251.csv
1st Line        1250    852     cestina (Cechie)
2nd Line        1251    966     русский (Россия)
3rd Line        1253    737     ???????? (??????)

==> chcp 1253
Active code page: 1253

==> type D:\test\SO\38835837ansi1253.csv
1st Line        1250    852     cestina (Cechie)
2nd Line        1251    966     ??????? (??????)
3rd Line        1253    737     ελληνικά (Ελλάδα)
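The fallback order described above (keep the character if the target charset has it, otherwise try a form without diacritics, otherwise use ?) can be approximated in Python. This is a sketch under an explicit assumption: Windows uses its own best-fit tables, and NFKD decomposition with combining marks removed only approximates them, so results may differ for some characters. `to_ansi_with_fallback` is a hypothetical name.

```python
import unicodedata


def to_ansi_with_fallback(text, charset):
    """Encode text to a single-byte charset with a best-fit-like fallback.

    Assumption: NFKD decomposition plus dropping combining marks stands in
    for the Windows best-fit tables, so some characters may map differently.
    """
    out = bytearray()
    for ch in text:
        try:
            # 1) The character exists in the target charset: keep it as-is.
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # 2) Decompose (e.g. 'č' -> 'c' + combining caron) and drop the marks.
            base = "".join(
                c for c in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(c)
            )
            try:
                out += base.encode(charset) if base else b"?"
            except UnicodeEncodeError:
                # 3) Still not representable: use the '?' replacement character.
                out += b"?"
    return bytes(out)


print(to_ansi_with_fallback("čeština (Čechie)", "windows-1251").decode("cp1251"))
# cestina (Cechie)
```

With windows-1250 the Czech text passes through unchanged, while Greek letters fall through to step 3 in windows-1251, matching the `????????` in the output above.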