Updated: 2023-02-23 19:22:32
The regex [^a-zA-Z0-9] matches every character other than an ASCII letter or digit, so using it as a filter strips all non-ASCII characters, that is, every Unicode character with a code point above 127.
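To make the effect concrete, here is a minimal sketch of what that character class does to ASCII and non-ASCII input. The class name AsciiFilterDemo and the method stripNonAlphanumeric are hypothetical names chosen for illustration; they are not part of the original answer.

```java
public class AsciiFilterDemo {
    // Hypothetical helper: removes every character that is not an ASCII letter or digit.
    public static String stripNonAlphanumeric(String input) {
        return input.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        // The space is removed along with anything else outside a-z, A-Z, 0-9.
        System.out.println(stripNonAlphanumeric("John Smith")); // prints JohnSmith
        // A name written entirely in non-ASCII characters is wiped out completely.
        System.out.println(stripNonAlphanumeric("高岡和子").isEmpty()); // prints true
    }
}
```

This is why such a pattern is a poor fit for international names: it discards them rather than cleaning them.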
Suppose instead that you want to turn user input into valid file names by replacing invalid file-name characters such as ? \ / : | < > * with an underscore (_):
public class ReplaceI18N {
    public static void main(String[] args) {
        String[] names = {
            "John Smith",
            "高岡和子",
            "محمد سعيد بن عبد العزيز الفلسطيني",
            "|J:o<h>n?Sm\\it/h*",
            "高?岡和\\子*",
            "محمد /سعيد بن عبد ?العزيز :الفلسطيني\\"
        };
        for (String s : names) {
            // Java strings are already Unicode (UTF-16), so no charset round-trip is needed;
            // new String(s.getBytes(), "UTF-8") could corrupt text when the platform
            // default charset is not UTF-8.
            // Replace each invalid file-name character ? \ / : | < > * with a space...
            String u = s.replaceAll("[?\\\\/:|<>*]", " ");
            // ...then collapse every run of whitespace into a single underscore.
            u = u.replaceAll("\\s+", "_");
            System.out.println(s + " = " + u);
        }
    }
}
Output:
John Smith = John_Smith
高岡和子 = 高岡和子
محمد سعيد بن عبد العزيز الفلسطيني = محمد_سعيد_بن_عبد_العزيز_الفلسطيني
|J:o<h>n?Sm\it/h* = _J_o_h_n_Sm_it_h_
高?岡和\子* = 高_岡和_子_
محمد /سعيد بن عبد ?العزيز :الفلسطيني\ = محمد_سعيد_بن_عبد_العزيز_الفلسطيني_
The resulting file names, even those containing Unicode characters, can be displayed on any web page that uses UTF-8 encoding, provided a font covering those characters is available.
In addition, each is a valid name for a file on any OS file system that supports Unicode (tested OK on Windows XP and Windows 7).
However, if you want to pass a valid file name as part of a URL, make sure to encode it properly with URLEncoder and later decode it with URLDecoder.
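As a rough sketch of that round trip, assuming UTF-8 is the charset agreed on by both ends, the following uses java.net.URLEncoder and java.net.URLDecoder. The class name FileNameUrlDemo and the helpers encode and decode are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FileNameUrlDemo {
    // Hypothetical helper: percent-encodes a file name using UTF-8.
    public static String encode(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverses encode(), again assuming UTF-8.
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] fileNames = { "John_Smith", "高岡和子" };
        for (String name : fileNames) {
            String encoded = encode(name);
            // Decoding with the same charset gives back the original name.
            System.out.println(name + " -> " + encoded + " -> " + decode(encoded));
        }
    }
}
```

Note that URLEncoder produces application/x-www-form-urlencoded output (a space becomes +), so it is best suited to query-string values; the underscore-separated names produced above contain no spaces and are unaffected by that detail.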