且构网

Replacing accented characters with unaccented characters

Updated: 2023-02-22 21:55:58

Hi

I would like to replace accented characters (like "é", "è" or "à") with
unaccented ones ("é" -> "e", "è" -> "e", "à" -> "a").

I have tried the string.replace method, but it seems to dislike non-ASCII
characters...

Can you help me please?
Thanks.

Thank you both for your answers. They both work very well.

At first I believed it didn't work, because the mistake I had made was to
forget the "u" prefix on the string: u"é". Because my file was already UTF-8
encoded (# -*- coding: UTF-8 -*-), I thought the "u" was not necessary...
I was wrong.

Bye.
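
The difference the "u" prefix makes can be sketched in Python 3, where the two kinds of string are separate types. This is only an illustration under assumptions: the thread is about Python 2, the bytes object below stands in for a Python 2 plain 'é' literal in a UTF-8 source file, and the thread does not say exactly how the original attempt failed.

```python
# Python 3 sketch. `raw` stands in for the Python 2 literal 'é' (UTF-8 bytes),
# `text` stands in for the Python 2 literal u'é' (one Unicode character).
text = "\u00e9"
raw = text.encode("utf-8")

print(len(raw))    # 2: é takes two bytes in UTF-8
print(len(text))   # 1: a single character

# Character-level replacement works when both sides are text.
print("caf\u00e9".replace(text, "e"))   # cafe

# Mixing text and bytes is rejected outright in Python 3.
try:
    "caf\u00e9".replace(raw, "e")
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)    # False
```

The coding declaration only tells Python how to read the source file; it does not turn a byte-string literal into a Unicode one.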


You have two options. First, convert the string to Unicode and use code
like the following:

    replacements = [(u'\xe9', 'e'), ...]
    def remove_accents(u):
        for a, b in replacements:
            u = u.replace(a, b)
        return u

    >>> remove_accents(u'\xe9')
    u'e'
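
For readers on Python 3, where every str is already Unicode and the u prefix is optional, the same loop might look like the sketch below. The three-entry replacement list is a hypothetical stand-in for the elided list above, not the list from the post.

```python
# Hypothetical, deliberately short replacement list; extend as needed.
replacements = [("\u00e9", "e"), ("\u00e8", "e"), ("\u00e0", "a")]

def remove_accents(text):
    """Apply each (accented, plain) pair in turn with str.replace."""
    for accented, plain in replacements:
        text = text.replace(accented, plain)
    return text

print(remove_accents("d\u00e9j\u00e0 vu"))   # deja vu
```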

Second, if you are using a single-byte encoding (iso8859-1, for
instance), then work with byte strings:

    import string
    replacement_map = string.maketrans('\xe9...', 'e...')
    def remove_accents(s):
        return s.translate(replacement_map)

    >>> remove_accents('\xe9')
    'e'
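
string.maketrans no longer exists in Python 3; its role is split between str.maketrans for text and bytes.maketrans for byte strings. A minimal sketch of both follows, assuming ISO 8859-1 for the byte-string case. The three characters in each table and the function names are illustrative choices, not part of the original post.

```python
# Text strings: str.maketrans builds a table for str.translate.
text_table = str.maketrans("\u00e9\u00e8\u00e0", "eea")

def remove_accents(text):
    return text.translate(text_table)

# Byte strings in a single-byte encoding (ISO 8859-1 assumed here):
# 0xE9 is é, 0xE8 is è, 0xE0 is à.
byte_table = bytes.maketrans(b"\xe9\xe8\xe0", b"eea")

def remove_accents_bytes(data):
    return data.translate(byte_table)

print(remove_accents("\u00e9t\u00e9"))       # ete
print(remove_accents_bytes(b"\xe9t\xe9"))    # b'ete'
```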

If you want to have strings like u'é' in your programs, you have to
include a line at the top of the source file that tells Python the
encoding, like the following line does:
    # -*- coding: utf-8 -*-
(except you have to name the encoding your editor uses, if it's not
utf-8). See http://python.org/peps/pep-0263.html

Once you've done that, you can write
    replacements = [(u'é', 'e'), ...]
instead of using the \xXX escape for it.

Jeff


Jeff Epler wrote:
> You have two options. First, convert the string to Unicode and use code
> like the following:
>
> replacements = [(u'\xe9', 'e'), ...]
> def remove_accents(u):
>     for a, b in replacements:
>         u = u.replace(a, b)
>     return u


Translating the replacement pairs into a dictionary would result in a
significant speedup for large numbers of replacements.

    mapping = dict(replacement_pairs)

    def multi_replace(inp, mapping=mapping):
        return u''.join([mapping.get(i, i) for i in inp])

One pass through the input gives an O(len(inp)) algorithm, much better
(running-time wise) than the string.replace method, which runs in
O(len(inp) * len(replacement_pairs)) time as given.

- Josiah
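
A runnable Python 3 sketch of the dictionary approach, checked against the replace loop, could look like this. The replacement_pairs list and the loop_replace helper are hypothetical, added only for the comparison. Note that the per-character lookup relies on every key being a single character; a multi-character key would never match.

```python
# Hypothetical pairs; every key must be exactly one character.
replacement_pairs = [("\u00e9", "e"), ("\u00e8", "e"), ("\u00e0", "a")]
mapping = dict(replacement_pairs)

def multi_replace(inp, mapping=mapping):
    """One pass over inp with a dictionary lookup per character."""
    return "".join(mapping.get(ch, ch) for ch in inp)

def loop_replace(inp, pairs=replacement_pairs):
    """One str.replace pass over inp per pair, for comparison."""
    for accented, plain in pairs:
        inp = inp.replace(accented, plain)
    return inp

sample = "voil\u00e0, tr\u00e8s \u00e9l\u00e9gant"
print(multi_replace(sample))                           # voila, tres elegant
print(multi_replace(sample) == loop_replace(sample))   # True
```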