
Reliable overflow detection in floating-point to integer type conversion

Updated: 2023-11-10 13:28:34


Is there a safe way to reliably determine whether an integral type T can store a floating-point integer value f?

Yes. The key is to test whether f lies in the range T::MIN - 0.999... to T::MAX + 0.999..., that is, T::MIN - 1 < f < T::MAX + 1, using floating-point math only, with no rounding issues. Bonus: the rounding mode does not matter.

There are 3 failure paths: too big, too small, and not-a-number.


The code below assumes int/double. I'll leave forming the C++ template to OP.

Forming T::MAX + 1 exactly using floating-point math is easy because INT_MAX is a Mersenne number, that is, a number of the form 2^n - 1. (We are not talking about Mersenne primes here.)

Code takes advantage of:
A Mersenne number divided by 2 with integer math is also a Mersenne number.
The conversion of an integer-type power-of-2 constant to a floating-point type is certain to be exact.
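Both properties can be checked directly. The following is a minimal sketch rather than part of the original answer; the helper names `is_mersenne` and `dbl_int_maxp1` are made up for illustration, and binary floating point is assumed.

```c
#include <limits.h>
#include <stdbool.h>

/* True when v has the form 2^n - 1 (all low bits set). */
static bool is_mersenne(unsigned v) {
    return (v & (v + 1u)) == 0;
}

/* INT_MAX + 1 as a double, formed without integer overflow.
   INT_MAX/2 + 1 is a power of 2, so its conversion to double and the
   multiplication by 2.0 are both exact (binary floating point assumed). */
static double dbl_int_maxp1(void) {
    return 2.0 * (INT_MAX / 2 + 1);
}
```

Here `is_mersenne((unsigned)INT_MAX)` and `is_mersenne((unsigned)(INT_MAX / 2))` both hold, which is why halving, adding 1, and doubling in floating point lands exactly on INT_MAX + 1.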

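#include <limits.h>

// Exactly INT_MAX + 1 as a double: INT_MAX/2+1 is a power of 2, so the product is exact
#define DBL_INT_MAXP1 (2.0*(INT_MAX/2+1))
// Exactly INT_MIN - 1 as a double; only needed when -INT_MAX == INT_MIN
#define DBL_INT_MINM1 (2.0*(INT_MIN/2-1))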

Forming T::MIN - 1 exactly is hard because its absolute value is usually a power of 2 plus 1, and the relative precision of the integer type and the FP type is not certain. Instead, code can subtract the exact power of 2 and compare the result to -1.
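To see why the subtraction form matters, consider a wider integer type than the one used in the answer. The sketch below assumes a 64-bit 2's complement `long long`, an IEEE 754 `double` with a 53-bit significand, and round-to-nearest; the function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive lower bound: tries to form LLONG_MIN - 1 as a double.
   With a 53-bit significand the subtraction rounds back to
   (double)LLONG_MIN, so x == LLONG_MIN is wrongly rejected. */
static bool fits_low_naive(double x) {
    const double bound = (double)LLONG_MIN - 1.0;  /* inexact */
    return x > bound;
}

/* Subtraction form: (double)LLONG_MIN is a negated power of 2 and
   converts exactly, so x - LLONG_MIN is compared against -1.0. */
static bool fits_low(double x) {
    return x - (double)LLONG_MIN > -1.0;
}
```

Under those assumptions, `fits_low` accepts `(double)LLONG_MIN` while `fits_low_naive` rejects it, and both reject the next double below it.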

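int double_to_int(double x) {
  // Upper bound: x must be below INT_MAX + 1. A NaN fails this comparison.
  if (x < DBL_INT_MAXP1) {
    #if -INT_MAX == INT_MIN
    // rare non-2's complement machine
    if (x > DBL_INT_MINM1) {
      return (int) x;
    }
    #else
    // 2's complement: INT_MIN converts exactly, so compare x - INT_MIN to -1
    if (x - INT_MIN > -1.0) {
      return (int) x;
    }
    #endif
    Handle_Underflow();
  } else if (x > 0) {
    Handle_Overflow();
  } else {
    Handle_NaN();
  }
  // Handle_*() are placeholders: each must not return, or a return statement is needed here
}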






Regarding floating-point types with non-binary radix (FLT_RADIX != 2)

With FLT_RADIX = 4, 8, 16, ..., the conversion is exact too. With FLT_RADIX == 10, the code is exact at least up to a 34-bit int, because a double must encode +/-10^10 exactly. So the problem arises only with, say, a FLT_RADIX == 10 machine with a 64-bit int - a low risk. From memory, the last FLT_RADIX == 10 machine in production was over a decade ago.
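A cautious build could verify the exactness assumption rather than rely on it. The two helpers below are a hypothetical sketch, not part of the original answer: one inspects FLT_RADIX, the other round-trips the power of 2 that the code uses to build INT_MAX + 1.

```c
#include <float.h>
#include <limits.h>
#include <stdbool.h>

/* True when FLT_RADIX is a power of 2 (2, 4, 8, 16, ...), in which case
   an in-range power of 2 converts to floating point exactly. */
static bool radix_is_power_of_two(void) {
    return (FLT_RADIX & (FLT_RADIX - 1)) == 0;
}

/* Round-trip check: INT_MAX/2 + 1 survives conversion to double
   and back to int unchanged. */
static bool maxp1_half_is_exact(void) {
    const int half = INT_MAX / 2 + 1;
    const double d = (double)half;
    return (int)d == half;
}
```

If `radix_is_power_of_two()` is false, `maxp1_half_is_exact()` still indicates whether this particular constant converted without rounding on the platform at hand.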

The integer type is always encoded as 2's complement (most common), ones' complement, or sign-magnitude. INT_MAX is always a power of 2 minus 1. INT_MIN is always a negated power of 2, or 1 more than that. Effectively, always base 2.