
Remove duplicates from a large dataset, including true duplicates (the entire row is duplicated) and duplicates based on one column

Updated: 2023-02-02 22:50:37

You can use the built-in `removeDuplicates` method, which removes whole-row duplicates in place. Afterwards, use a hash object to remove the remaining duplicates in the color column, keeping the earliest date for each color:

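```javascript
/**
 * Removes duplicates from columns A:B (date, color), starting at row 2
 * (row 1 is assumed to be a header). Returns [trueDups, falseDups]:
 *   trueDups  - rows removed because the whole row was a duplicate
 *   falseDups - rows removed because their color already appeared
 *               with an earlier date
 */
function remDups(sheet) {
  const sh = sheet || SpreadsheetApp.getActive().getSheetByName('Sheet1');
  const numRows = sh.getLastRow() - 1;
  if (numRows < 1) return ['0', '0']; // no data below the header

  const rg = sh.getRange(2, 1, numRows, 2);
  const initDataSz = rg.getNumRows();

  // Pass 1: remove whole-row duplicates in place.
  // removeDuplicates() returns the smaller range that remains.
  const newRg = rg.removeDuplicates();
  const newDataSz = newRg.getNumRows();
  const trueDups = initDataSz - newDataSz;

  const values = newRg.getValues();
  newRg.clearContent();

  // Pass 2: hash map of color -> earliest date. A Map avoids
  // collisions with Object.prototype keys and keeps cell types.
  const earliest = new Map();
  for (const [date, color] of values) {
    if (!earliest.has(color) || date < earliest.get(color)) {
      earliest.set(color, date);
    }
  }

  // Write back one [date, color] row per color.
  const out = Array.from(earliest, ([color, date]) => [date, color]);
  const falseDups = newDataSz - out.length;
  sh.getRange(2, 1, out.length, 2).setValues(out);
  return [`${trueDups}`, `${falseDups}`];
}
```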
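Because `removeDuplicates` and `setValues` only run inside Apps Script, the following is a minimal sketch of the same two passes in plain JavaScript, assuming the sheet data has already been read into a 2-D array of `[date, color]` rows. It lets you check the counting logic outside a spreadsheet. The helper names `removeExactDuplicates`, `keepEarliestPerColor` and `dedupe` are hypothetical, and the first pass only approximates `Range.removeDuplicates()` by comparing serialized rows.

```javascript
// Hypothetical helpers: a plain-array sketch of the two passes in remDups.
// Assumes each row is [date, color], like columns A:B of the sheet.

// Pass 1: drop rows that exactly repeat an earlier row
// (an approximation of Range.removeDuplicates()).
function removeExactDuplicates(rows) {
  const seen = new Set();
  return rows.filter(row => {
    // JSON.stringify turns Date objects into ISO strings via toJSON,
    // so identical rows produce identical keys.
    const key = JSON.stringify(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Pass 2: keep one row per color, choosing the earliest date.
function keepEarliestPerColor(rows) {
  const earliest = new Map(); // color -> earliest date seen so far
  for (const [date, color] of rows) {
    if (!earliest.has(color) || date < earliest.get(color)) {
      earliest.set(color, date);
    }
  }
  // Map keeps insertion order, so colors appear in first-seen order.
  return Array.from(earliest, ([color, date]) => [date, color]);
}

function dedupe(rows) {
  const unique = removeExactDuplicates(rows);
  const out = keepEarliestPerColor(unique);
  return {
    out,
    trueDups: rows.length - unique.length, // whole-row duplicates
    falseDups: unique.length - out.length, // same color, later date
  };
}

// Usage
const rows = [
  [new Date('2023-01-02'), 'red'],
  [new Date('2023-01-02'), 'red'], // exact duplicate
  [new Date('2023-01-01'), 'red'], // same color, earlier date
  [new Date('2023-01-05'), 'blue'],
];
const result = dedupe(rows);
console.log(result.trueDups, result.falseDups); // prints "1 1"
```

The exact-duplicate row counts toward `trueDups`, while the later `red` row is dropped in the second pass and counts toward `falseDups`.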

Performance: