Updated: 2022-02-20 08:37:04
The basic problem is the unconverted mix of Unicode and byte strings. The solution is either to convert everything to a single format or to avoid the problem with some trickery. All of my solutions use the glob and shutil standard library modules.
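One way to "convert to a single format" is to normalise every path component to the same type before combining them. This is a minimal sketch, assuming Python 3 (the answer itself targets Python 2.6). It relies on os.fsdecode(), which decodes bytes with the filesystem encoding and returns str unchanged. The names to_text and join_paths are hypothetical helpers, not part of glob or shutil.

```python
import os


def to_text(path):
    """Return path as str; bytes are decoded with the filesystem encoding.

    os.fsdecode() accepts str, bytes or os.PathLike objects and always
    returns str, so every component ends up in the same format.
    """
    return os.fsdecode(path)


def join_paths(*parts):
    """os.path.join that first converts every part to str."""
    return os.path.join(*(to_text(p) for p in parts))


# A str directory name mixed with a bytes file name
print(join_paths('א', b'report.ods'))  # א/report.ods on POSIX
```

Converting to str rather than bytes is a choice. os.fsencode() gives the mirror-image helper if the rest of the code works with bytes.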
For the sake of example, I have some Unicode filenames ending with .ods, and I want to copy them into the subdirectory called א (Hebrew Aleph, a Unicode character).
Use byte strings everywhere:
>>> import glob
>>> import shutil
>>> files = glob.glob('*.ods')  # List of byte-string file names
>>> for file in files:
...     shutil.copy2(file, 'א')  # Byte-string directory name
...

Or use Unicode strings everywhere:
>>> import glob
>>> import shutil
>>> files = glob.glob(u'*.ods')  # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א')  # Unicode directory name
...
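The same "one type everywhere" idea carries over to Python 3. This sketch assumes Python 3.4+, which provides glob.escape() and has shutil.copy2 return the path of the new file. It keeps the glob pattern, the source directory and the destination name all as str. copy_ods_files is a hypothetical helper, and the demo runs in a temporary directory so no real files are touched.

```python
import glob
import os
import shutil
import tempfile


def copy_ods_files(src_dir, dest_name='א'):
    """Copy every *.ods file in src_dir into the subdirectory dest_name.

    The glob pattern, src_dir and dest_name are all str, so every path
    that shutil.copy2 joins internally has the same type.
    """
    dest_dir = os.path.join(src_dir, dest_name)
    os.makedirs(dest_dir, exist_ok=True)
    # glob.escape() keeps special characters in src_dir from acting as wildcards
    pattern = os.path.join(glob.escape(src_dir), '*.ods')
    # shutil.copy2 returns the path of the new copy (Python 3.3+)
    return [shutil.copy2(path, dest_dir) for path in sorted(glob.glob(pattern))]


# Demo in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.ods', 'b.ods', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    copied = copy_ods_files(tmp)
    print([os.path.basename(p) for p in copied])  # ['a.ods', 'b.ods']
```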
Credit to Ezio Melotti on the Python bug list.
Although this isn't the best solution in my opinion, there is a nice trick here that's worth mentioning.
Change to the destination directory using os.chdir(), and then copy the files into it by referring to it as . (the current directory):
# -*- coding: utf-8 -*-
import os
import shutil
import glob

original_dir = os.getcwd()  # Remember where we started
os.chdir('א')  # CD to the destination Unicode directory
print os.getcwd()  # DEBUG: Make sure you're in the right place
files = glob.glob('../*.ods')  # List of byte-string file names
for file in files:
    shutil.copy2(file, '.')  # Copy each file into the current directory
os.chdir(original_dir)  # Go back to the original directory, if it matters
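The chdir trick is easy to get wrong when an exception skips the step that returns to the original directory. This sketch assumes Python 3 and wraps the trick in a context manager that uses try/finally, so the previous working directory is always restored. The names working_directory and copy_parent_ods_here are hypothetical. On Python 3.11+, the standard library's contextlib.chdir() does the same job as working_directory.

```python
import contextlib
import glob
import os
import shutil
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path and always return to the old cwd."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if copying raises, so the caller's cwd is restored
        os.chdir(previous)


def copy_parent_ods_here(dest_name='א'):
    """chdir into dest_name, then copy ../*.ods into '.' as in the trick."""
    with working_directory(dest_name):
        return [shutil.copy2(path, '.') for path in glob.glob('../*.ods')]


# Demo in a throwaway directory laid out like the example
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'א'))
    for name in ('a.ods', 'b.txt'):
        open(os.path.join(tmp, name), 'w').close()
    start = os.getcwd()
    os.chdir(tmp)
    try:
        print(copy_parent_ods_here())   # ['./a.ods'] on POSIX
        print(sorted(os.listdir('א')))  # ['a.ods']
    finally:
        os.chdir(start)  # Leave tmp before it is deleted
```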
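The straightforward approach, shutil.copy2(src, dest), fails when src is a byte string and dest is Unicode. shutil (via os.path.join) concatenates the unicode string with the byte string without converting either, so Python 2 implicitly decodes the bytes as ASCII, which fails on the non-ASCII file name: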
>>> files = glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)
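The traceback comes from an implicit conversion. In Python 2, adding a byte string to a unicode string silently decodes the bytes with the default ASCII codec. The sketch below assumes Python 3, where that implicit step no longer exists. It performs the decode explicitly to reproduce the same error, and then shows that Python 3's os.path.join rejects the str/bytes mix with a TypeError instead. python2_style_concat is a hypothetical helper that only imitates posixpath.join, and the file name is made up.

```python
import os


def python2_style_concat(unicode_dir, byte_name):
    """Imitate Python 2's `path += '/' + b` from posixpath.join.

    Python 2 decoded the byte string with the default ASCII codec before
    adding it to a unicode string; here that hidden step is explicit.
    """
    return unicode_dir + (b'/' + byte_name).decode('ascii')


# Hypothetical file name: Hebrew Aleph + '.ods', encoded as UTF-8 bytes
byte_name = 'א.ods'.encode('utf-8')  # b'\xd7\x90.ods'

try:
    python2_style_concat('א', byte_name)
except UnicodeDecodeError as exc:
    # Same message as the traceback: byte 0xd7 at position 1
    print(exc)

# Python 3 has no implicit decode, so mixing the types fails up front
try:
    os.path.join('א', byte_name)
except TypeError as exc:
    print('TypeError:', exc)
```

Position 1 is the first byte after '/', and 0xd7 is the UTF-8 lead byte of Aleph. That matches the traceback, which suggests the failing name came from the byte-string glob result.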
As shown earlier, this can be avoided by using the byte string 'א' instead of the Unicode string u'א'.
In my opinion, this is a bug, because Python cannot expect basedir names to always be str rather than unicode. I have reported this as an issue on the Python bug list and am waiting for responses.