且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何删除大量数组中的重复条目(javascript)

更新时间:2022-04-20 21:19:57

您遇到了一个不重要的问题,但我会一如既往地提出建议,请问我是否迷失了您.此解决方案 not 不会将坐标转换为String或使用 JSON.stringify -

You have a non-trivial problem but I'm gonna blast right through so ask questions if I lose you somewhere along the line. This solution does not cast the coordinate into a String or serialise it using other techniques like JSON.stringify -

从创建坐标的方式开始-

Start with a way to create coordinates -

const Coord = (x, y) =>
  [ x, y ]

为演示该解决方案,我需要构造许多随机坐标-

To demonstrate the solution, I need to construct many random coordinates -

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

console.log(randCoord(1e3))
// [ 655, 89 ]

现在我们制作一百万个随机坐标的数组-

Now we make an array of 1 million random coordinates -

const million =
  Array.from(Array(1e6), _ => randCoord(1e3))

现在,我们创建一个函数,使用 DeepMap 过滤所有唯一值,DeepMap是我在这个答案.

Now we make a function to filter all of the unique values using DeepMap, a tiny module I developed in this answer.

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

由于 for DeepMap 具有出色的性能, uniq 可以在一秒钟内识别出所有唯一值-

Because for and DeepMap have excellent performance, uniq can identify all of the unique values in less than one second -

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 535 ms
// uniq length: 631970
// sample: 
// [ [ 908, 719 ]
// , [ 532, 967 ]
// , [ 228, 689 ]
// , [ 942, 546 ]
// , [ 716, 180 ]
// , [ 456, 427 ]
// , [ 714, 79 ]
// , [ 315, 480 ]
// , [ 985, 499 ]
// , [ 212, 407 ]
// ]

展开以下代码段,以在您自己的浏览器中验证结果-

Expand the snippet below to verify the results in your own browser -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))
  }

const Coord = (x, y) =>
  [ x, y ]

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

const million =
  Array.from(Array(1e6), _ => randCoord(1e3))

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 535 ms
// uniq length: 631970
// sample: 
// [ [ 908, 719 ]
// , [ 532, 967 ]
// , [ 228, 689 ]
// , [ 942, 546 ]
// , [ 716, 180 ]
// , [ 456, 427 ]
// , [ 714, 79 ]
// , [ 315, 480 ]
// , [ 985, 499 ]
// , [ 212, 407 ]
// ]

通过使用生成较小的随机坐标,我们可以验证 uniq 是否正在生成正确的输出.下面我们生成最多 [100,100] 的坐标,最大可能性为10,000个唯一坐标.当您在下面运行该程序时,由于坐标是随机生成的,因此 result.length 可能小于 10,000,但绝不能超过该值-在这种情况下我们知道添加了无效(重复)坐标-

By using generating smaller random coordinates, we can verify that uniq is generating a correct output. Below we generate coordinates up to [ 100, 100 ] for a maximum possibility of 10,000 unique coordinates. When you run the program below, because the coordinates are generated at random, it's possible that result.length will be under 10,000, but it should never exceed it - in which case we'd know an invalid (duplicate) coordinate was added -

const million =
  Array.from(Array(1e6), _ => randCoord(1e2))

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 173 ms
// uniq length: 10000
// sample: 
// [ [ 50, 60 ]
// , [ 18, 69 ]
// , [ 87, 10 ]
// , [ 8, 7 ]
// , [ 91, 41 ]
// , [ 48, 47 ]
// , [ 78, 28 ]
// , [ 39, 12 ]
// , [ 18, 84 ]
// , [ 0, 71 ]
// ]

展开以下代码段,以在您自己的浏览器中验证结果-

Expand the snippet below to verify the results in your own browser -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))
  }

const Coord = (x, y) =>
  [ x, y ]

const rand = x =>
  Math.floor(Math.random() * x)

const randCoord = x => 
  Coord(rand(x), rand(x))

const uniq = (coords = []) =>
{ const m = new Map
  const r = []
  for (const c of coords)
    if (!DeepMap.has(m, c))
      { DeepMap.set(m, c, true)
        r.push(c)
      }
  return r
}

const million =
  Array.from(Array(1e6), _ => randCoord(1e2))

console.time("uniq")
const result = uniq(million)
console.timeEnd("uniq")

console.log("uniq length:", result.length)
console.log("sample:", result.slice(0,10))

// uniq: 173 ms
// uniq length: 10000
// sample: 
// [ [ 50, 60 ]
// , [ 18, 69 ]
// , [ 87, 10 ]
// , [ 8, 7 ]
// , [ 91, 41 ]
// , [ 48, 47 ]
// , [ 78, 28 ]
// , [ 39, 12 ]
// , [ 18, 84 ]
// , [ 0, 71 ]
// ]

最后,我将在此处使用 DeepMap 模块-

Lastly, I'll include the DeepMap module used here -

const DeepMap =
  { has: (map, [ k, ...ks ]) =>
      ks.length === 0
        ? map.has(k)
        : map.has(k)
          ? DeepMap.has(map.get(k), ks)
          : false

  , set: (map, [ k, ...ks ], value) =>
      ks.length === 0
        ? map.set(k, value)
        : map.has(k)
            ? (DeepMap.set(map.get(k), ks, value), map)
            : map.set(k, DeepMap.set(new Map, ks, value))

  , get: (map, [ k, ...ks ]) =>
    // ...

  , entries: function* (map, fields = [])
    // ...
  }

有关完整的实现,请参见链接的问题与解答.Fwiw,我想您会发现该链接很有趣,因为它为该问题的复杂性提供了更多的上下文.

For a complete implementation, see the linked Q&A. Fwiw, I do think you will find the link interesting as it provides more context for the complexity of this problem.