Why does String.GetHashCode only process every fourth character?

Updated: 2023-02-04 23:01:20

  


I've been reading this article because it was linked by Jon Skeet on this answer. I'm trying to really get an understanding of how hashing works and why Jon likes the algorithm he provided so much. I'm not claiming to have an answer to that yet, but I do have a specific question about the base System.String implementation of GetHashCode.

Consider the code, focusing on the annotated <<<<<========== line:

public override unsafe int GetHashCode()
{
  if (HashHelpers.s_UseRandomizedStringHashing)
    return string.InternalMarvin32HashString(this, this.Length, 0L);
  fixed (char* chPtr = this)
  {
    int num1 = 352654597;
    int num2 = num1;
    int* numPtr = (int*) chPtr;
    int length = this.Length;
    while (length > 2)
    {
      num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
      num2 = (num2 << 5) + num2 + (num2 >> 27) ^ numPtr[1];
      numPtr += 2;
      length -= 4;   // <<<<<==========
    }
    if (length > 0)
      num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
    return num1 + num2 * 1566083941;
  }
}

Why do they only process every fourth character? And, if you're willing enough, why do they process it from right to left?


They're not doing either. They're processing the characters as pairs of integer values (note that they use *numPtr and numPtr[1] in the while loop). A char is 2 bytes and an Int32 is 4 bytes, so each Int32 covers two characters, and two Int32 values take the same space as 4 characters, which is why they're subtracting 4 from the length each time.
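To make the "two characters per Int32" point concrete, here is a small sketch that views a string's characters as 32-bit integers without unsafe code. The class and method names (CharPairDemo, AsInts, PackLittleEndian) are made up for illustration, and it assumes a runtime where MemoryMarshal and string.AsSpan are available (.NET Core 2.1 or later, or the System.Memory package).

```csharp
using System;
using System.Runtime.InteropServices;

static class CharPairDemo
{
    // Reinterprets the chars of s as 32-bit integers (sizeof(char) == 2,
    // sizeof(int) == 4), so four chars yield two ints. A trailing odd char
    // is dropped by the cast. The result is copied into an array.
    public static int[] AsInts(string s) =>
        MemoryMarshal.Cast<char, int>(s.AsSpan()).ToArray();

    // The value the int holding (lo, hi) has on a little-endian machine:
    // the first char sits in the low 16 bits, the second in the high 16 bits.
    public static int PackLittleEndian(char lo, char hi) => lo | (hi << 16);
}
```

For a four-character string such as "abcd", AsInts returns two values: the first plays the role of *numPtr (characters 0 and 1) and the second the role of numPtr[1] (characters 2 and 3).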

The string is processed from front to back (in array order). The length variable is decremented only because it represents the number of characters still left to process; it is never used as an index. This means they're processing from left to right, in blocks of 4 characters at a time, for as long as possible.
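The same loop can be written with an explicit index, which makes the direction of travel visible. This is only a sketch of the code quoted in the question, not the framework's implementation: the names LegacyHashSketch, CodeUnit and Pair are hypothetical, it assumes little-endian packing, and it substitutes 0 for the terminating null character that the pointer version reads when the length is odd.

```csharp
static class LegacyHashSketch
{
    // UTF-16 code unit at index i, or 0 past the end (assumption: this stands
    // in for the null terminator that follows the string's chars in memory).
    static int CodeUnit(string s, int i) => i < s.Length ? s[i] : 0;

    // Two adjacent chars packed into one 32-bit value, low half first.
    static int Pair(string s, int i) => CodeUnit(s, i) | (CodeUnit(s, i + 1) << 16);

    public static int Hash(string s)
    {
        unchecked
        {
            int num1 = 352654597;
            int num2 = num1;
            int i = 0;                 // next unread char; only ever increases
            int remaining = s.Length;  // plays the role of `length` in the original

            while (remaining > 2)
            {
                num1 = ((num1 << 5) + num1 + (num1 >> 27)) ^ Pair(s, i);      // chars i, i+1
                num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ Pair(s, i + 2);  // chars i+2, i+3
                i += 4;                // move right by one 4-char block
                remaining -= 4;        // four fewer chars left to hash
            }
            if (remaining > 0)
                num1 = ((num1 << 5) + num1 + (num1 >> 27)) ^ Pair(s, i);      // last 1 or 2 chars

            return num1 + num2 * 1566083941;
        }
    }
}
```

Calling LegacyHashSketch.Hash("abcdefg") walks the index 0, 4 while remaining goes 7, 3, -1: every character is read exactly once, in order, and the countdown is only bookkeeping.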