Django - URL pattern regex does not match string arguments with accents

Updated: 2023-02-22 22:30:13

I'm having some trouble passing string arguments with accents to my Django application. I have the following url pattern:

url(r'^galeria/(?P<page>\d+)/(?P<order>\w+)/(?P<query>[\w|\W]+)', 'possible_brastemp.views.gallery_with_page_and_query'),

When I try a url like:

 http://127.0.0.1:8000/galeria/1/ultimos/Julian%20Andr%E9s

the pattern is not matched. I have isolated the problem to the '%E9' character (the '%20' doesn't break the match).

How can I change the regex to match parameters with encoded characters?

Thank you

Use %c3%a9 instead of %e9 in the URL. The regex isn't failing... Django isn't even getting to the urlconf. Check the logs; you're probably getting 400 (Bad Request) errors.

URI paths should contain UTF-8-encoded characters only. Any character that cannot be represented as a normal, printable ASCII character (and is not on the reserved list) should be encoded as UTF-8 and then percent-encoded, one %XX escape per byte.

é (U+00E9) is a multibyte character in UTF-8: the two bytes 0xC3 0xA9. The percent-encoded form is therefore %C3%A9. The single byte 0xE9 is the Latin-1 (ISO-8859-1) encoding of é; on its own it is NOT a valid UTF-8 sequence.
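
To make the byte-level difference concrete, here is a minimal sketch using only Python 3's standard library (not Django itself). The helper name is_valid_utf8 is made up for illustration; it only checks whether the percent-decoded bytes form valid UTF-8, which is roughly the condition the URL from the question fails.

```python
from urllib.parse import quote, unquote_to_bytes

name = "Julian Andrés"

# quote() encodes to UTF-8 by default, so é becomes the two escapes %C3%A9.
utf8_encoded = quote(name)

# Forcing Latin-1 reproduces the single-byte %E9 form from the question.
latin1_encoded = quote(name, encoding="latin-1")


def is_valid_utf8(percent_encoded: str) -> bool:
    """Hypothetical helper: percent-decode to raw bytes, then try UTF-8."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(utf8_encoded, is_valid_utf8(utf8_encoded))
print(latin1_encoded, is_valid_utf8(latin1_encoded))
```

The first form decodes cleanly; the second does not, because 0xE9 announces a multibyte sequence that the following byte does not continue.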

See RFC 3986.

[\w|\W]+ successfully matches URLs containing %C3%A9. Django appears to percent-decode the URL byte string into a Unicode string, then converts it to UTF-8 for urlconf matching.
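
The matching step can be approximated outside Django as follows. This is only a sketch under assumptions: it applies the question's regex with Python 3's re module to a percent-decoded path with the leading slash already removed, and it does not call Django's actual resolver.

```python
import re
from urllib.parse import unquote

# The regex from the question's urlconf entry, unchanged.
pattern = re.compile(
    r'^galeria/(?P<page>\d+)/(?P<order>\w+)/(?P<query>[\w|\W]+)'
)

# Path with é percent-encoded as UTF-8 (leading slash assumed stripped).
raw_path = "galeria/1/ultimos/Julian%20Andr%C3%A9s"

# Percent-decode into a Unicode string before matching.
decoded_path = unquote(raw_path)

match = pattern.match(decoded_path)
if match:
    print(match.group("page"), match.group("order"), match.group("query"))
```

Note that inside a character class the | is a literal pipe rather than alternation, so [\w|\W] simply matches any character; [\w\W] would behave the same way here.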