且构网

How to access a char array and change lowercase letters to uppercase, and vice versa

Updated: 2023-02-21 11:47:13

For clarity's sake, I'll just use pure assembly and assume that...

  • char_array is a 32-bit pointer at [ebp+8].
  • array_size is a two's complement 32-bit number at [ebp+12].
  • char is encoded as ASCII on your platform (this is true of most platforms anyway).

You should be able to adapt this to inline assembly yourself. Now, if you look at the ASCII table, which everyone is supposed to remember but barely anyone does, you'll notice some important details...

  • Uppercase letters A through Z map into codes 0x41 through 0x5A, respectively.
  • Lowercase letters a through z map into codes 0x61 through 0x7A, respectively.
  • Everything else is not a letter, and thus needs no case conversion.
  • If you look at the binary representation of the upper and lowercase letter ranges, you'll notice that they are exactly the same, with the sole exception that uppercase letters have the 6th bit (value 0x20) cleared, and lowercase ones have it set.
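
The bit pattern described in the last bullet can be checked with a few lines of C. This is only a sketch, assuming an ASCII execution character set; `CASE_BIT`, `ascii_to_lower` and `ascii_to_upper` are names made up for this illustration and are not part of the original answer.

```c
#include <assert.h>

/* Hypothetical name: the one bit that differs between 'A'..'Z' and 'a'..'z'. */
#define CASE_BIT 0x20

/* Upper to lower: set the case bit. The caller must pass 'A'..'Z'. */
static char ascii_to_lower(char c)
{
    assert(c >= 'A' && c <= 'Z');
    return (char)(c | CASE_BIT);
}

/* Lower to upper: clear the case bit (same as AND with 0xDF).
   The caller must pass 'a'..'z'. */
static char ascii_to_upper(char c)
{
    assert(c >= 'a' && c <= 'z');
    return (char)(c & ~CASE_BIT);
}
```

For example, `ascii_to_lower('A')` gives `'a'` (0x41 | 0x20 = 0x61), and `ascii_to_upper('z')` gives `'Z'` (0x7A & 0xDF = 0x5A).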

As such, the algorithm would be...

while array_size != 0
    byte = *char_array
    if byte >= 0x41 and byte <= 0x5A
        *char_array |= 0x20 // Turn it lowercase
    else if byte >= 0x61 and byte <= 0x7A
        *char_array &= 0xDF // Turn it uppercase
    array_size -= 1
    char_array += 1
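
Before dropping down to assembly, it may help to see the same loop as plain C. The following is a rough sketch rather than a definitive implementation: the function name `swap_case` is hypothetical, the parameter names are borrowed from the pseudocode above, and ASCII encoding is assumed throughout.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the case of every ASCII letter in char_array, in place.
   Bytes outside 'A'..'Z' and 'a'..'z' are left untouched. */
static void swap_case(char *char_array, int array_size)
{
    /* Assumption: a negative size is a caller error. */
    assert(array_size >= 0);
    assert(char_array != NULL || array_size == 0);

    while (array_size != 0) {
        unsigned char byte = (unsigned char)*char_array;
        if (byte >= 0x41 && byte <= 0x5A)
            *char_array = (char)(byte | 0x20);   /* Turn it lowercase */
        else if (byte >= 0x61 && byte <= 0x7A)
            *char_array = (char)(byte & 0xDF);   /* Turn it uppercase */
        array_size -= 1;
        char_array += 1;
    }
}
```

Called as `swap_case(buf, 5)` on a buffer holding `Hi 42`, it leaves `hI 42` behind: the two letters flip case and the space and digits are skipped.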

Now, let's translate this into assembly...

mov eax, [ebp+8]      ; char *eax = char_array
mov ecx, [ebp+12]     ; int ecx = array_size

.loop:
    or ecx, ecx       ; OR ecx with itself: sets the zero flag if ecx is zero
    jz .end_loop      ; If ecx (array_size) is zero, we're done
    mov dl, [eax]     ; Otherwise, load the byte at *eax (*char_array) into `char dl`
    cmp dl, 'A'       ; Compare dl (*char_array) against 'A' (lower bound of uppercase letters)
    jb .continue      ; If dl (*char_array) is less than 'A', continue the loop
    cmp dl, 'Z'       ; Compare dl (*char_array) against 'Z' (upper bound of uppercase letters)
    jbe .is_uppercase ; If dl (*char_array) is less than or equal to 'Z', then jump to .is_uppercase
    cmp dl, 'a'       ; Compare dl (*char_array) against 'a' (lower bound of lowercase letters)
    jb .continue      ; If dl (*char_array) is less than 'a', continue the loop
    cmp dl, 'z'       ; Compare dl (*char_array) against 'z' (upper bound of lowercase letters)
    jbe .is_lowercase ; If dl (*char_array) is less than or equal to 'z', then jump to .is_lowercase
    jmp .continue     ; All tests failed, so continue the loop

    .is_uppercase:
        or dl, 20h    ; Set the 6th bit (0x20)
        mov [eax], dl ; Send the byte back to where it came from
        jmp .continue ; Continue the loop

    .is_lowercase:
        and dl, 0DFh  ; Clear the 6th bit (0x20); a hex literal with an h suffix must start with a digit
        mov [eax], dl ; Send the byte back to where it came from
        jmp .continue ; Continue the loop

    .continue:
        inc eax       ; Increment `eax` (`char_array`), much like a pointer increment
        dec ecx       ; Decrement `ecx` (`array_size`) to match the pointer increment
        jmp .loop     ; Continue

.end_loop:

Once the code reaches .end_loop, you're done.

I hope this has shed some light on it for you!