且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

确定整数的平方根是否为整数的最快方法

更新时间:2023-11-26 14:58:04

我找到了一种比 6bits+Carmack+sqrt 代码快 35% 的方法,至少在我的 CPU (x86) 和编程语言 (C/C++).您的结果可能会有所不同,尤其是因为我不知道 Java 因素将如何发挥作用.

我的方法有三个:

  1. 首先,过滤掉明显的答案.这包括负数和查看最后 4 位.(我发现查看最后六个没有帮助.)我也回答是 0.(在阅读下面的代码时,请注意我的输入是 int64 x.)
    if( x 

  2. 接下来,检查它是否是模 255 = 3 * 5 * 17 的平方.因为这是三个不同素数的乘积,所以只有大约 1/8 的模 255 的余数是平方.但是,根据我的经验,调用模运算符 (%) 的成本高于获得的收益,因此我使用涉及 255 = 2^8-1 的小技巧来计算残差.(无论好坏,我没有使用从单词中读取单个字节的技巧,只是按位和和移位.)
    int64 y = x;y = (y & 4294967295LL) + (y > > 32);y = (y & 65535) + (y > > 16);y = (y & 255) + ((y > > 8) & 255) + (y > > 16);//此时y在0到511之间,代码越多,可以进一步缩小.

    为了实际检查残差是否为正方形,我在预先计算的表格中查找答案.

    if( bad255[y] )返回假;//但是,我只使用了一个大小为 512 的表

  3. 最后,尝试使用类似于 Hensel 引理 的方法计算平方根.(我认为它不能直接适用,但经过一些修改后可以使用.)在此之前,我使用二分搜索来划分 2 的所有幂:
    if((x & 4294967295LL) == 0)x>>=32;如果((x & 65535) == 0)x>>=16;如果((x & 255) == 0)x>>=8;如果((x & 15) == 0)x>>=4;如果((x & 3) == 0)x >>= 2;

    此时,对于我们的数字是一个正方形,它必须是 1 mod 8.

    if((x & 7) != 1)return false;

    Hensel 引理的基本结构如下.(注意:未经测试的代码;如果不起作用,请尝试 t=2 或 8.)

    int64 t = 4, r = 1;t<<=1;r += ((x - r * r) & t) >>1;t<<=1;r += ((x - r * r) & t) >>1;t<<=1;r += ((x - r * r) & t) >>1;//重复直到 t 为 2^33 左右.如果需要,请使用循环.

    这个想法是在每次迭代时,你在 r 上加一位,即 x 的当前"平方根;每个平方根都是精确的模 2 的越来越大的幂,即 t/2.最后,r 和 t/2-r 将是 x 模 t/2 的平方根.(请注意,如果 r 是 x 的平方根,那么 -r 也是如此.即使是模数也是如此,但要注意,对某些数字进行模数,事物甚至可以有 2 个以上的平方根;值得注意的是,这包括 2 的幂.) 因为我们的实际平方根小于 2^32,此时我们实际上可以检查 r 或 t/2-r 是否为实数平方根.在我的实际代码中,我使用了以下修改后的循环:

    int64 r, t, z;r = 开始[(x > > 3) & 1023];做 {z = x - r * r;如果( z == 0 )返回真;如果( z 

    这里的加速是通过三种方式获得的:预先计算的起始值(相当于循环的约 10 次迭代)、更早退出循环和跳过一些 t 值.对于最后一部分,我查看 z = r - x * x,并通过一些技巧将 t 设置为 2 除 z 的最大幂.这允许我跳过无论如何都不会影响 r 值的 t 值.在我的情况下,预先计算的起始值挑选出最小正"平方根模 8192.

即使此代码对您来说不能更快地工作,但我希望您喜欢其中包含的一些想法.完整的、经过测试的代码如下,包括预先计算的表格.

typedef signed long long int int64;int开始[1024] ={1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,1665、1923、1687、1157、1553、1869、1415、1749、1185、1763、649、1061、561、531、409、907、319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,191、61、1065、1605、721、781、1735、875、1377、1827、1353、539、1777、429、1959、1483、1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,2047、2045、279、2043、111、307、2041、597、1569、1891、2039、1957、1103、1389、231、2037、65,1341,727,837,977,2035,569,1643,1633,547,439,130​​7,2033,1709,345,1845,1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,193、67、1623、1595、943、1395、1721、2027、1761、1955、1335、357、113、1747、1497、1461、1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,1985、701、1879、1221、849、627、1465、789、543、1187、1591、923、1905、979、1241、181};bool bad255[512] ={0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,0};内联布尔方块(int64 x){//快速失败if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )返回假;如果( x == 0 )返回真;//检查 mod 255 = 3 * 5 * 17,为了好玩int64 y = x;y = (y & 4294967295LL) + (y > > 32);y = (y & 65535) + (y > > 16);y = (y & 255) + ((y > > 8) & 255) + (y > > 16);如果(坏255[y])返回假;//使用二分查找除以 4 的幂如果((x & 4294967295LL) == 0)x>>=32;如果((x & 65535) == 0)x>>=16;如果((x & 255) == 0)x>>=8;如果((x & 15) == 0)x>>=4;如果((x & 3) == 0)x>>=2;如果((x & 7) != 1)返回假;//使用类似 Hensel 引理的东西计算 sqrtint64 r, t, z;r = 开始[(x > > 3) & 1023];做 {z = x - r * r;如果( z == 0 )返回真;如果( z 

I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to integer-only domain.
  2. Maintaining a lookup table is impractical (since there are about 231.5 integers whose square is less than 263).

Here is the very simple and straightforward way I'm doing it now:

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.


I've tried the different solutions to the problem:

  • After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
  • The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
  • Newton's method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's Method, but implemented in the hardware so it's much faster than in Java. Also, Newton's Method still required use of doubles.
  • A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
  • Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
  • According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
  • I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.

I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

My approach is threefold:

  1. First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is int64 x.)
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;

  2. Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares. However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32); 
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    // At this point, y is between 0 and 511.  More code can reduce it farther.
    

    To actually check if the residue is a square, I look up the answer in a precomputed table.

    if( bad255[y] )
        return false;
    // However, I just use a table of size 512
    

  3. Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 2 with a binary search:
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;

    At this point, for our number to be a square, it must be 1 mod 8.

    if((x & 7) != 1)
        return false;

    The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)

    int64 t = 4, r = 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    // Repeat until t is 2^33 or so.  Use a loop if you want.

    The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:

    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );

    The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at z = r - x * x, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.

Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.

typedef signed long long int int64;

int start[1024] =
{1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};

bool bad255[512] =
{0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0};

inline bool square( int64 x ) {
    // Quickfail
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;

    // Check mod 255 = 3 * 5 * 17, for fun
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32);
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    if( bad255[y] )
        return false;

    // Divide out powers of 4 using binary search
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;

    if((x & 7) != 1)
        return false;

    // Compute sqrt using something like Hensel's lemma
    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t  >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );

    return false;
}