Updated: 2023-11-27 12:05:16
The following VBScript could help: the procedure UTF8toANSI converts a UTF-8 encoded text file to another encoding.
Option Explicit

' ADODB.Stream enumeration values used by this script.
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1   ' declared for completeness, not used below
Private Const adTypeText = 2
Private Const adWriteChar = 0

' Reads UTF8FName as UTF-8 text and writes it to ANSIFName encoded in ANSICharSet.
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName, ByVal ANSICharSet)
    Dim strText
    With CreateObject("ADODB.Stream")
        ' Decode the source file as UTF-8 into a string.
        .Type = adTypeText
        .Charset = "utf-8"
        .Open
        .LoadFromFile UTF8FName
        strText = .ReadText(adReadAll)
        .Close
        ' Reopen the same stream with the target charset and save the text.
        .Charset = ANSICharSet
        .Open
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub

' Command-line variant (the procedure takes three arguments):
'UTF8toANSI WScript.Arguments(0), WScript.Arguments(1), WScript.Arguments(2)
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1250.csv", "windows-1250"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1251.csv", "windows-1251"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1253.csv", "windows-1253"
For a list of the character set names that are known to a system, see the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset in the Windows Registry:
for /F "tokens=5* delims=\" %# in ('reg query HKCR\MIME\Database\Charset') do @echo "%#"
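The same list can probably also be produced from VBScript itself. The following is a minimal sketch, assuming the WMI StdRegProv registry provider is available on the machine; the variable names (objReg, arrSubKeys, strSubKey) are illustrative and not part of the original answer. Run it with cscript so that each name is printed to the console rather than shown in a message box.

```vbscript
Option Explicit

' Registry hive handle for HKEY_CLASSES_ROOT, as expected by StdRegProv.
Const HKEY_CLASSES_ROOT = &H80000000

Dim objReg, arrSubKeys, strSubKey

' Assumption: the WMI registry provider (StdRegProv) is reachable on the local machine.
Set objReg = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\default:StdRegProv")

' EnumKey returns 0 on success and fills arrSubKeys with the subkey names.
If objReg.EnumKey(HKEY_CLASSES_ROOT, "MIME\Database\Charset", arrSubKeys) = 0 Then
    If IsArray(arrSubKeys) Then
        For Each strSubKey In arrSubKeys
            WScript.Echo strSubKey
        Next
    End If
Else
    WScript.Echo "Could not enumerate HKCR\MIME\Database\Charset"
End If
```

Any name printed this way should be usable as the third argument of UTF8toANSI.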
Data (the 38835837utf8.csv file):
1st Line 1250 852 čeština (Čechie)
2nd Line 1251 966 русский (Россия)
3rd Line 1253 737 ελληνικά (Ελλάδα)
The output shows that characters that can't be converted to a particular character set are converted using Character Decomposition Mapping (č => c, š => s, Č => C, etc.); where that is not applicable, they are all converted to ? (question mark, the common replacement character):
==> chcp 1250
Active code page: 1250
==> type D:\test\SO\38835837ansi1250.csv
1st Line 1250 852 čeština (Čechie)
2nd Line 1251 966 ??????? (??????)
3rd Line 1253 737 ???????? (??????)
==> chcp 1251
Active code page: 1251
==> type D:\test\SO\38835837ansi1251.csv
1st Line 1250 852 cestina (Cechie)
2nd Line 1251 966 русский (Россия)
3rd Line 1253 737 ???????? (??????)
==> chcp 1253
Active code page: 1253
==> type D:\test\SO\38835837ansi1253.csv
1st Line 1250 852 cestina (Cechie)
2nd Line 1251 966 ??????? (??????)
3rd Line 1253 737 ελληνικά (Ελλάδα)
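To find out in advance whether a given text would survive conversion to a target character set, one option is to round-trip it through an in-memory stream and compare the result with the original. The sketch below is hedged: the helper name RoundTrip and the sample string are hypothetical, and it assumes that ADODB.Stream keeps written text as bytes in the current Charset, so that reading it back exposes any substitution such as č => c or a replacement by ?.

```vbscript
Option Explicit

Const adTypeText = 2

' Hypothetical helper: encodes strText in strCharSet, then decodes it again.
' Assumption: the stream stores the text as bytes in strCharSet, so characters
' that the charset cannot represent come back altered.
Function RoundTrip(ByVal strText, ByVal strCharSet)
    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Charset = strCharSet
        .Open
        .WriteText strText
        .Position = 0            ' rewind before reading back
        RoundTrip = .ReadText    ' reads to the end of the stream by default
        .Close
    End With
End Function

Dim strSample, strCharSet

' "čeština" built from code points, so the script file itself can stay plain ASCII.
strSample = ChrW(&H010D) & "e" & ChrW(&H0161) & "tina"

For Each strCharSet In Array("windows-1250", "windows-1251", "windows-1253")
    If RoundTrip(strSample, strCharSet) = strSample Then
        WScript.Echo strCharSet & ": lossless"
    Else
        WScript.Echo strCharSet & ": lossy (characters were substituted)"
    End If
Next
```

Only ASCII status messages are echoed, so the result does not depend on the active console code page.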