How to split a string with special characters into an NSMutableArray

Updated: 2022-02-03 21:30:32

First of all, your code is incorrect. characterAtIndex: returns a unichar, so you should use @"%C" (uppercase C) as the format specifier.
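
As a minimal sketch of the format-specifier point only, the following assumes a hypothetical helper named CodeUnitStrings (not part of the original question or of Foundation). It splits a string per UTF-16 code unit, which gives the expected result only when every character fits in a single unichar, as æ, ø and å do:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one NSString per UTF-16 code unit of `string`.
// Only safe when every character in `string` fits in a single unichar.
static NSArray<NSString *> *CodeUnitStrings(NSString *string) {
    NSMutableArray<NSString *> *parts = [NSMutableArray arrayWithCapacity:string.length];
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar c = [string characterAtIndex:i];
        // %C (uppercase) formats a 16-bit unichar; %c expects an 8-bit char.
        [parts addObject:[NSString stringWithFormat:@"%C", c]];
    }
    return parts;
}
```

Calling CodeUnitStrings(@"\u00e6\u00f8\u00e5") yields three one-character strings. For text outside the Basic Multilingual Plane, or text using combining marks, this approach would split a single visible character into pieces.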

Even with the correct format specifier, your code is unsafe and, strictly speaking, still incorrect, because not all Unicode characters can be represented by a single unichar. You should always process Unicode strings substring by substring:

It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.
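
To make the difference concrete, here is a small sketch that compares NSString's length (a count of UTF-16 code units) with the number of composed character sequences. The helper name ComposedCharacterCount is an assumption made for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts user-perceived characters
// (composed character sequences) rather than UTF-16 code units.
static NSUInteger ComposedCharacterCount(NSString *string) {
    __block NSUInteger count = 0;
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}
```

For the string @"e\u0301" (the letter e followed by a combining acute accent), length is 2 while ComposedCharacterCount returns 1, because the user sees a single accented letter.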

You should definitely read the String Programming Guide.

Finally, the correct code for you:

NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
// Enumerate composed character sequences so each user-perceived character stays in one substring.
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length)
                                 options:NSStringEnumerationByComposedCharacterSequences
                              usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    [characters addObject:substring];
}];

If NSLog(@"%@", characters); prints "strange characters" of the form "\Uxxxx", that is expected. It is the default stringification behavior of NSArray's description method. To see the "normal" characters, print the Unicode substrings one by one:

for (NSString *c in characters) {
    NSLog(@"%@", c);
}
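
If a single log line is preferred over one line per character, one option is to join the array into an NSString first, since logging an NSString prints the characters themselves rather than the escaped form used by NSArray's description. This is a sketch, and the helper name ReadableList is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: joins the substrings into one readable NSString.
static NSString *ReadableList(NSArray<NSString *> *items) {
    return [items componentsJoinedByString:@", "];
}
```

With the characters array built above, NSLog(@"%@", ReadableList(characters)); logs the three letters separated by commas.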