Is there a faster, more accurate way to calculate the distance between zip codes in MySQL?

Updated: 2022-02-04 23:12:34

Accuracy

The only way to calculate distance accurately is with 3D trig, as you're doing. You can read more on that topic here: https://en.wikipedia.org/wiki/Geographical_distance
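For reference, the "3D trig" in question is the spherical law of cosines. Below is a minimal Python sketch of it, assuming a spherical Earth with a radius of 3959 miles (the same constant that appears in the query later in this answer). The function name `great_circle_miles` is illustrative and not part of the original question.

```python
import math

EARTH_RADIUS_MILES = 3959  # mean radius; treating Earth as a sphere is an approximation


def great_circle_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlng = math.radians(lng2 - lng1)
    # Spherical law of cosines: cosine of the central angle between the points.
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlng)
                 + math.sin(phi1) * math.sin(phi2))
    # Floating-point rounding can push the value just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MILES * math.acos(cos_angle)
```

This is the same expression the SQL evaluates per row, which is why moving its pieces out of the query matters for performance.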

Although this gives a pretty accurate distance between the lat/lng center-points of the zipcodes, those center-points are arbitrarily picked, and the distance is calculated "as the crow flies", so you won't get an accurate representation of the actual travel distance between two points within them.

For example, you may have two homes next door to each other in adjacent zipcodes, or two homes on opposite ends of those zipcodes; both pairs come out equidistant under this calculation.

The only way to correct that issue is to calculate address distance, which requires USPS data to map an address to a more specific point, or the use of an API like Google Maps, which will also calculate actual travel distance given available roads.

Performance

There are a few ways to speed up your query.

1. Reduce the Real-Time Math

The fastest way to do your calculations in real-time is to precalculate and store the expensive trig values in columns in your table, e.g.:

ALTER TABLE Location
    ADD COLUMN cos_rad_lat DOUBLE,
    ADD COLUMN sin_rad_lat DOUBLE,
    ADD COLUMN cos_rad_lng DOUBLE,
    ADD COLUMN sin_rad_lng DOUBLE;

Then

UPDATE Location
SET cos_rad_lat = cos(radians(latitude)),
    sin_rad_lat = sin(radians(latitude)),
    cos_rad_lng = cos(radians(longitude)),
    sin_rad_lng = sin(radians(longitude));

Both the sine and the cosine of the longitude are stored because the distance formula needs cos(lng - center_lng), which expands to cos(lng) * cos(center_lng) + sin(lng) * sin(center_lng).

Do your cos(radians(78.3232)) type calculations outside the query, so that math isn't done for each row of data.

Thus reducing all calculations to constant values (before getting to SQL) and calculated columns will make your query look like this, where 0.417495997822 and 0.908678761611 are approximately the cosine and sine of the center longitude (1.140108408597264 radians):

SELECT
    zipcode,
    3959 * acos(
        0.20239077538110228 * cos_rad_lat      -- cos(center lat) * cos(lat)
        * (0.417495997822 * cos_rad_lng        -- cos(center lng) * cos(lng)
           + 0.908678761611 * sin_rad_lng)     -- + sin(center lng) * sin(lng)
        + 0.979304842243025 * sin_rad_lat      -- sin(center lat) * sin(lat)
    ) AS distance
FROM Location
HAVING distance < 25
ORDER BY distance;
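The per-search constants can be computed once in application code before the SQL is built. Here is a minimal Python sketch, assuming the search center is given in degrees; the helper name `search_constants` and its dictionary keys are hypothetical. The sine and cosine of the center longitude are included because cos(lng - center_lng) expands to cos(lng) * cos(center_lng) + sin(lng) * sin(center_lng), which is what lets stored longitude columns replace a per-row cosine.

```python
import math


def search_constants(center_lat, center_lng):
    """Trig values of the search center, computed once per search."""
    rad_lat = math.radians(center_lat)
    rad_lng = math.radians(center_lng)
    return {
        "cos_lat": math.cos(rad_lat),
        "sin_lat": math.sin(rad_lat),
        "cos_lng": math.cos(rad_lng),
        "sin_lng": math.sin(rad_lng),
    }


# 78.3232 is the latitude used in the answer's example; 65.3234 is inferred
# from its 1.140108408597264 radian longitude constant.
constants = search_constants(78.3232, 65.3234)
```

These values would normally be passed to the query as bound parameters rather than formatted into the SQL string.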

2. Bounding-Box Reduction

Note: You can combine this with method 1.

You could probably increase performance slightly by adding a bounding-box reduction of zips in a subquery before doing the trig, but that may be more complicated than you would like.

For example, instead of:

FROM Location

you could do:

FROM (
    SELECT * 
    FROM Location 
    WHERE latitude BETWEEN A and B
        AND longitude BETWEEN C and D
) AS Location

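Where A, B, C, and D are your center-point's latitude and longitude plus or minus a margin that covers the search radius. A degree of latitude is about 69 miles, so a 25-mile radius needs roughly +-0.4 degrees of latitude. A degree of longitude is shorter by a factor of cos(latitude), about 53 miles at 40 degrees north, so the longitude margin needs to be wider, roughly +-0.5 degrees there.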

This method gets tricky at -180 / 180 Longitude, but that doesn't affect the US.
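Rather than hard-coding a margin, A, B, C, and D can be derived from the search radius. The following Python sketch assumes a spherical Earth of radius 3959 miles and a box that stays well away from the poles and the -180 / 180 meridian; the function name `bounding_box` is hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 3959 * math.pi / 180  # about 69.1 miles


def bounding_box(center_lat, center_lng, radius_miles):
    """Return (lat_min, lat_max, lng_min, lng_max) in degrees.

    Assumes the box does not approach a pole or cross the -180/180 meridian.
    """
    lat_margin = radius_miles / MILES_PER_DEGREE_LAT
    # Longitude degrees shrink with cos(latitude); size the margin at the
    # box edge farthest from the equator so the box errs on the wide side.
    widest_lat = abs(center_lat) + lat_margin
    lng_margin = lat_margin / math.cos(math.radians(widest_lat))
    return (center_lat - lat_margin, center_lat + lat_margin,
            center_lng - lng_margin, center_lng + lng_margin)
```

The four returned values map to A, B, C, and D in the subquery. The box only pre-filters rows; the trig-based distance check still decides which zips are actually within the radius.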

3. Store All Calculated Distances

Another thing you could do is precalculate all distances of all zips, and store them in a separate table:

CREATE TABLE LocationDistance (
    zipcode1 varchar(5) NOT NULL REFERENCES Location(zipcode),
    zipcode2 varchar(5) NOT NULL REFERENCES Location(zipcode),
    distance double NOT NULL,
    PRIMARY KEY (zipcode1, zipcode2),
    INDEX (zipcode1, distance)
);

Populate this table with every combination of zip and their calculated distance.
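One way to generate those rows is in application code. The Python sketch below is only an illustration under stated assumptions: zip center-points are already loaded into a dictionary, Earth is treated as a sphere of radius 3959 miles, and the names `great_circle_miles` and `distance_rows` are hypothetical.

```python
import math
from itertools import product

EARTH_RADIUS_MILES = 3959


def great_circle_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles (spherical law of cosines)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlng = math.radians(lng2 - lng1)
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlng)
                 + math.sin(phi1) * math.sin(phi2))
    # Clamp to guard against rounding just outside acos's domain.
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, cos_angle)))


def distance_rows(centers):
    """Yield (zipcode1, zipcode2, miles) for every ordered pair of zips.

    `centers` maps zipcode -> (latitude, longitude) in degrees. Pairs of a
    zip with itself are included, at a distance of (approximately) zero.
    """
    for (zip1, (lat1, lng1)), (zip2, (lat2, lng2)) in product(centers.items(), repeat=2):
        yield zip1, zip2, great_circle_miles(lat1, lng1, lat2, lng2)
```

Rows produced this way would then be bulk-inserted into LocationDistance in batches. Because the generator is lazy, the full set of pairs never has to sit in memory at once.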

Your query would then look like this:

SELECT zipcode2
FROM LocationDistance 
WHERE zipcode1 = '12345'
    AND distance < 25;

This would by far be the fastest solution, though it involves storing on the order of 1 billion records.