且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用n:m-relation复制数据集

更新时间:2023-02-26 07:56:18

如果幸运的是,您可以运行当前的 PostgreSQL 9.1 ,则可以通过单个命令找到一种简洁,快速的解决方案使用新的 修改数据的CTE .

If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.

不支持

No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.

假设(col1, col2)最初是唯一的:

  • 在这种情况下,您可以轻松地从表中选择任意切片.
  • 不会浪费t.id的序列号.
  • You can easily pick arbitrary slices from the table in this case.
  • No sequence numbers for t.id will be wasted.

WITH s AS (
    SELECT id, col1, col2
    FROM   t
--  WHERE  some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2   -- I gather from comments that id is a serial column
    FROM   s
    RETURNING id, col1, col2
    )
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM   tu
JOIN   s ON tu.t = s.id
JOIN   i USING (col1, col2);

如果(col1, col2) 不是唯一,我会看到另外两种方式:

If (col1, col2) is not unique, I see two other ways:

  • Use the window function row_number() to make non-unique rows unique.
  • INSERT rows without holes in the t.id space just like in the query above.

WITH s AS (
    SELECT id, col1, col2
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   t
--  WHERE some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2
    FROM   s
    RETURNING id, col1, col2
    )
    ,r AS (
    SELECT *
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   i
    )
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM   r
JOIN   s USING (col1, col2, rn)    -- match exactly one id per row
JOIN   tu ON tu.t = s.id;

查询3

  • 这基于@ypercube已经提供的相同思想,但是全部在一个查询中.
  • 如果当前t.id的数字空间中有孔,则会相应地为新行刻录序列号.
  • 别忘了重设序列,使其超过新的最大值,否则,t中的新插入物会重复出现键错误,这些错误会从序列中提取id的默认值.我将此作为命令的最后一步进行了集成.最快的这样最安全.
  • Query 3

    • This is based on the same idea that @ypercube already supplied, but all in one query.
    • If there are holes in numbers space for current t.id, sequence numbers will be burnt for the new rows accordingly.
    • Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in t that draw the default for id from the sequence. I integrated this as final step into the command. Fastest & safest this way.
WITH s AS (
    SELECT max(id) AS max_id
    FROM   t
    )
    ,i AS (
    INSERT INTO t (id, col1, col2)
    SELECT id + s.max_id, col1, col2
    FROM   t, s
    )
    ,j AS (
    INSERT INTO tu (t, u)
    SELECT tu.t + s.max_id, tu.u
    FROM   tu, s
    )
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM   s;

手册中有关 setval()的详细信息.

Details about setval() in the manual.

进行快速测试.

CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES 
 ('A', 'B')
,('C', 'D');

CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
 (1, 100)
,(1, 101)
,(2, 100)
,(2, 102);

SELECT * FROM t;
SELECT * FROM tu;

最近有一个类似的问题,我在其中提供了一个类似的答案.以及版本 8.3 的替代版本,其中没有CTE和窗口功能.

There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.