且构网


Multiple rows instead of a single row when transforming data with a trigger in MySQL

Updated: 2023-12-04 11:23:22

I have this query:

CREATE TRIGGER move_form_data
AFTER INSERT ON schema.original_table
FOR EACH ROW
INSERT INTO schema.new_table (name, street_address, 
            street_address_line_2, city, state, zip, country, dob)
SELECT name, street_address, street_address_line_2, city, state, zip, country, dob 
from view_data_submits

which calls this view:

CREATE VIEW view_data_submits AS 

SELECT  
        MAX(CASE WHEN element_label = 0 THEN element_value end) AS name,
        MAX(CASE WHEN element_label = 1 THEN element_value end) AS street_address,
        MAX(CASE WHEN element_label = 2 THEN element_value end) AS street_address_line_2,
        MAX(CASE WHEN element_label = 3 THEN element_value end) AS city,
        MAX(CASE WHEN element_label = 4 THEN element_value end) AS state,
        MAX(CASE WHEN element_label = 5 THEN element_value end) AS zip,
        MAX(CASE WHEN element_label = 6 THEN element_value end) AS country,
        MAX(CASE WHEN element_label = 7 THEN element_value end) AS dob
FROM schema.original_table
WHERE group_id = (select MAX(group_id) from schema.original_table)
group by group_id

I want one row back. Without the trigger wrapper, this statement on its own works as intended:

INSERT INTO schema.new_table (name, street_address, 
                street_address_line_2, city, state, zip, country, dob)
    SELECT name, street_address, street_address_line_2, city, state, zip, country, dob 
    from view_data_submits

Currently, the trigger does give me the inserted row when the user submits a form, but the data arrives in the new table like this:

# id, name, street_address, street_address_line_2, city, state, zip, country, dob
2, fsa asdadFQ, , , , , , , 
3, fsa asdadFQ, BOOGYBOOGYBOOGY, , , , , , 
4, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, , , , , 
5, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, , , , 
6, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, , , 
7, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, , 
8, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 
9, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 2014-02-05  <--only row that I want (=the total form submission)

instead of just:

# id, name, street_address, street_address_line_2, city, state, zip, country, dob

9, fsa asdadFQ, BOOGYBOOGYBOOGY, YOUdooWORK, A, DD, 09876, Belize, 2014-02-05

I have a feeling it either has to do with the FOR EACH ROW syntax, or the application saves in a compounding fashion somehow. I am leaning towards the first one.

Anyone have any suggestions for a remedy? I almost feel as though it's some noob mistake that I just forgot about....haha.

EDIT per request:

Here is the select * from the original table where the max id is being pulled:

# id, form_id, element_label, element_value, group_id
----+--------+--------------+--------------+---------
 207,       2,             0,          name,       25
 208,       2,             1,     address 1,       25
 209,       2,             2,     address 2,       25
 210,       2,             3,          city,       25
 211,       2,             4,         state,       25
 212,       2,             5,           zip,       25
 213,       2,             6,       country,       25
 214,       2,             7,           dob,       25

Since the values are stored as blobs, I replaced them with what they represent. I just pulled the most recently inserted data.

This looks like an EAV schema (oh! the joys!).

It looks like the root problem is that the application isn't inserting a "row" the way you want to see it; it's inserting multiple rows into the same table, with each row representing a single attribute value.

The application is using an Entity-Attribute-Value (EAV) model, and what you want is a row that looks like a traditional relational model.

What that rather ugly "MAX(),MAX(),MAX() ... GROUP BY" query is doing is converting all those EAV rows into columns of a single row.


It looks like you want to do that conversion "on-the-fly" and maintain the contents of the target_table whenever rows are inserted into the original_table.

If I were solving that problem, I would include the group_id in my target_table, since that's the value that is relating all the individual EAV rows together (as demonstrated in your view query.)

And I definitely would NOT use a SELECT MAX(group_id) query to reference the value on the row that was just inserted into original_table. In the context of an AFTER INSERT trigger, I already have the group_id value of the row that was just inserted; it's available to me as "NEW.group_id".

(The real reason I would avoid using a MAX(group_id) query to get that value is that I have no guarantee that some other process won't insert a larger value for group_id while my process is running, so MAX(group_id) isn't guaranteed to return the group_id that was just inserted. Granted, I won't ever see that problem in single-user testing; I'd have to include some deliberate delays in my processing and have two processes running at the same time to make it happen. This is one of those problems that pops up in production rather than in testing, basically because we don't bother to set up the test case that would expose it.)

If I only want a single row in my target_table for each group_id value, I would create a unique constraint on the group_id column in my target_table. Then I would use an "upsert"-type function to update the row if it already exists, or insert a row if one doesn't exist.
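As a rough sketch of that constraint, assuming target_table already exists, is still empty, and that group_id is an integer (none of which the question confirms; the key name uq_target_table_group_id is made up for illustration):

```sql
-- Hypothetical DDL: add group_id to target_table and make it unique.
-- Assumes target_table is empty. With more than one existing row, the new
-- NOT NULL column would get the same implicit default in every row and the
-- unique key would be rejected.
ALTER TABLE target_table
  ADD COLUMN group_id INT NOT NULL,
  ADD UNIQUE KEY uq_target_table_group_id (group_id);
```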

I can easily do that in MySQL with an INSERT ... ON DUPLICATE KEY UPDATE statement. This requires a unique constraint, but we already have that covered. One downside of this statement is that if my target_table has an AUTO_INCREMENT column, it will "burn" through auto_increment values even when a row already exists.
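To see the AUTO_INCREMENT effect for yourself, something like the following could be run in a scratch schema. The table demo_target, its key name, and the column types are all invented for this illustration, and the behavior depends on the storage engine and its auto-increment settings (InnoDB is assumed here), so treat it as an experiment rather than a guaranteed result:

```sql
-- Hypothetical demo table; names and types are assumptions.
CREATE TABLE demo_target (
  id       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  group_id INT NOT NULL,
  `name`   VARCHAR(100),
  UNIQUE KEY uq_demo_target_group_id (group_id)
) ENGINE=InnoDB;

-- First statement inserts a new row for group 25.
INSERT INTO demo_target (group_id, `name`) VALUES (25, 'first')
    ON DUPLICATE KEY UPDATE `name` = VALUES(`name`);

-- Second statement hits the unique key and updates the existing row,
-- but an AUTO_INCREMENT value may still have been reserved for it.
INSERT INTO demo_target (group_id, `name`) VALUES (25, 'second')
    ON DUPLICATE KEY UPDATE `name` = VALUES(`name`);

-- A new group then gets the next unreserved id, so a gap can appear.
INSERT INTO demo_target (group_id, `name`) VALUES (26, 'third')
    ON DUPLICATE KEY UPDATE `name` = VALUES(`name`);

-- Compare the id values of the two rows to check for a skipped number.
SELECT id, group_id, `name` FROM demo_target ORDER BY id;
```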

Based on what you have in your trigger/view, I could do something like this:

INSERT INTO target_table (group_id, name, street_address, ... )
SELECT o.group_id,
       MAX(CASE WHEN o.element_label = 0 THEN o.element_value end) AS name,
       MAX(CASE WHEN o.element_label = 1 THEN o.element_value end) AS street_address,
       MAX(CASE WHEN o.element_label = 2 THEN o.element_value end) AS street_address_line_2,
       MAX(CASE WHEN o.element_label = 3 THEN o.element_value end) AS city,
       MAX(CASE WHEN o.element_label = 4 THEN o.element_value end) AS state,
       MAX(CASE WHEN o.element_label = 5 THEN o.element_value end) AS zip,
       MAX(CASE WHEN o.element_label = 6 THEN o.element_value end) AS country,
       MAX(CASE WHEN o.element_label = 7 THEN o.element_value end) AS dob
  FROM schema.original_table o
 WHERE o.group_id = NEW.group_id
 GROUP BY o.group_id
    ON DUPLICATE KEY
UPDATE name                  = VALUES(name)
     , street_address        = VALUES(street_address)
     , street_address_line_2 = VALUES(street_address_line_2)
     , city                  = VALUES(city)
     , state                 = VALUES(state)
     , zip                   = VALUES(zip)
     , country               = VALUES(country)
     , dob                   = VALUES(dob)

Note that I'm counting on the UNIQUE constraint on target_table(group_id) to raise a "duplicate key" condition when the statement attempts to insert a row with a group_id value that already exists in target_table. When that happens, the statement effectively turns into an UPDATE, with an implied WHERE group_id = VALUES(group_id) (or whatever columns were involved in the unique key constraint violation).

This is the simplest approach, as long as burning through AUTO_INCREMENT values isn't a concern.

I'm not limited to the INSERT ... ON DUPLICATE KEY statement; I can "roll my own" UPSERT function. BUT... I want to be cognizant of possible race conditions. If I perform a SELECT and then a subsequent INSERT, I leave a small window where another process can sneak in.

I could instead use a NOT EXISTS predicate to test for the existence of the row:

INSERT INTO target_table ( ...
SELECT ...
  FROM original_table o
 WHERE o.group_id = NEW.group_id
   AND NOT EXISTS (SELECT 1 FROM target_table d WHERE d.group_id = NEW.group_id)

Then I'd test whether a row was inserted (by checking number of affected rows), and if no row was inserted, then I could attempt an update. (I'm banking on the SELECT statement returning a single row.)
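A minimal sketch of that insert-then-update fallback, reduced to the `name` column only. It assumes the fragment sits inside the body of an AFTER INSERT trigger on original_table, and that ROW_COUNT() reports the affected rows of the preceding statement there the same way it does in a stored procedure, which is worth verifying on your MySQL version:

```sql
-- Sketch only: part of a trigger body, shown for the `name` column alone.
-- Insert the row only when no row for this group_id exists yet.
INSERT INTO target_table (group_id, `name`)
SELECT NEW.group_id, NEW.element_value
  FROM DUAL
 WHERE NOT EXISTS (SELECT 1 FROM target_table d WHERE d.group_id = NEW.group_id);

-- ROW_COUNT() = 0 means nothing was inserted, i.e. the row already existed,
-- so fall back to updating it.
IF ROW_COUNT() = 0 THEN
   UPDATE target_table
      SET `name` = NEW.element_value
    WHERE group_id = NEW.group_id;
END IF;
```

This still leaves a window between the existence check and the insert, so the unique constraint on group_id remains the real safeguard against duplicate rows.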

For better performance, I might use an anti-join pattern to do the same check (for existence of an existing row), but for one row, the NOT EXISTS (subquery) is fine, and I think it's easier to understand.

INSERT INTO target_table ( ...
SELECT ...
  FROM original_table o
  LEFT
  JOIN target_table t
    ON t.group_id = NEW.group_id
 WHERE o.group_id = NEW.group_id
   AND t.group_id IS NULL

(That SELECT from original_table might need to be wrapped as an inline view, since it's referencing the same table that's being inserted. Turning that query into a derived table should fix that, if it's a problem.)


I said I "could" use that query from the view in my trigger. But that's not the approach I'd choose to use. It's not necessary. I don't really need to run a MAX(), MAX(), MAX() query to get every column.

I have all the values of the row being inserted into original_table, so I already know which element_label is being inserted, and there's really only one column that has to be changed in the target_table. (Do I want the MAX(element_value), or do I really just want the value that was just inserted?)

Here's the approach I would use in the trigger. I'd avoid running a query against the original_table at all, and just do the upsert on the one column in target_table:

IF NEW.element_label = 0 THEN
   -- name
   INSERT INTO target_table (group_id,       `name`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `name` = VALUES(`name`);
ELSEIF NEW.element_label = 1 THEN
   -- street_address
   INSERT INTO target_table (group_id,       `street_address`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `street_address` = VALUES(`street_address`);
ELSEIF NEW.element_label = 2 THEN
   -- street_address_line_2
   INSERT INTO target_table (group_id,       `street_address_line_2`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `street_address_line_2` = VALUES(`street_address_line_2`);
ELSEIF NEW.element_label = 3 THEN
   -- city
   INSERT INTO target_table (group_id,       `city`) 
   VALUES (NEW.group_id, NEW.element_value)
   ON DUPLICATE KEY UPDATE                   `city` = VALUES(`city`);
ELSEIF NEW.element_label = 4 THEN
   ...
END IF;

I know that's not very pretty, but I think it's the best approach if the maintenance of target_table has to be done at the time rows are inserted into original_table. (The problem here isn't really the database; the problem is the EAV model, or really the "impedance mismatch" between the EAV model (one row for each attribute value) and the relational model (one column in each row for each attribute value).)

This isn't any uglier than the MAX(),MAX(),MAX() query.

I would also ditch the AUTO_INCREMENT id in the target table, and just use group_id (value from the original_table) as the primary key in my target_table, since I only want one row for each group_id.
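For illustration, a target_table along those lines might look like the following. The column types are guesses (the question only says the values are stored as blobs), so adjust them to the real data:

```sql
-- Hypothetical definition: group_id doubles as the primary key, so there is
-- no AUTO_INCREMENT column to burn through and at most one row per group.
-- All column types below are assumptions.
CREATE TABLE target_table (
  group_id              INT NOT NULL,
  `name`                VARCHAR(255),
  street_address        VARCHAR(255),
  street_address_line_2 VARCHAR(255),
  city                  VARCHAR(255),
  state                 VARCHAR(255),
  zip                   VARCHAR(255),
  country               VARCHAR(255),
  dob                   VARCHAR(255),
  PRIMARY KEY (group_id)
);
```

Every attribute column is left nullable because the single-column upserts fill them in one at a time.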


UPDATE

You have to change the delimiter from semicolon to something else when the trigger body contains semicolons. Documentation here: http://dev.mysql.com/doc/refman/5.5/en/trigger-syntax.html

e.g.

DELIMITER $$

CREATE TRIGGER trg_original_table_ai
AFTER INSERT ON original_table
FOR EACH ROW
BEGIN
   IF NEW.element_label = 0 THEN
      -- name
      INSERT INTO target_table (group_id,       `name`) 
      VALUES (NEW.group_id, NEW.element_value)
      ON DUPLICATE KEY UPDATE                   `name` = VALUES(`name`);
   ELSEIF NEW.element_label = 1 THEN
      -- street_address
      INSERT INTO target_table (group_id,       `street_address`) 
      VALUES (NEW.group_id, NEW.element_value)
      ON DUPLICATE KEY UPDATE                   `street_address` = VALUES(`street_address`);
   END IF;
END$$

DELIMITER ;
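A quick way to check the trigger, using two of the sample rows from the question. This assumes original_table.id is AUTO_INCREMENT, that target_table has a unique (or primary) key on group_id, and that its other columns accept NULL:

```sql
-- Insert two EAV rows for the same form submission (group_id 25).
INSERT INTO original_table (form_id, element_label, element_value, group_id)
VALUES (2, 0, 'name', 25);
INSERT INTO original_table (form_id, element_label, element_value, group_id)
VALUES (2, 1, 'address 1', 25);

-- The first insert should create the target row and the second should
-- update it, leaving a single row for group_id 25.
SELECT group_id, `name`, `street_address`
  FROM target_table
 WHERE group_id = 25;
```

If the SELECT returns more than one row for the group, the unique key on group_id is most likely missing.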