且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何在Postgres 9.6+中生成长度为N的随机,唯一的字母数字ID?

更新时间:2023-02-10 09:15:18

弄清楚了,这是完成该功能的函数:

 创建或替换功能generate_uid(size INT)返回文本为$$ 
声明
个字符TEXT:='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
个字节BYTEA:= gen_random_bytes(size);
l INT:=长度(字符);
i INT:= 0;
输出文本:=’;
开始
,而我<大小LOOP
输出:=输出|| substr(characters,get_byte(bytes,i)%l + 1,1);
i:= i + 1;
END LOOP;
RETURN输出;
END;
$$语言plpgsql挥发;

然后运行它就可以了:

  generate_uid(10)
-'3Rls4DjWxJ'






警告



执行此操作时,需要确保创建的ID的长度足以随着创建的对象数量的增加,避免随着时间的推移发生冲突。由于生日悖论因此,对于任何合理创建的对象,您可能希望其长度大于(或大于) 10 ,我只是使用了 10






用法



使用已定义的函数,您可以在表定义中使用它,如下所示:

 创建表集合(
id TEXT PRIMARY KEY DEFAULT generate_uid(10),
name TEXT NOT NULL,
...
);(

然后在插入数据时,就像这样:

  INSERT INTO集合(名称)VALUES(一个); 
INSERT INTO集合(名称)VALUES( Two);
INSERT INTO集合(名称)VALUES(三);
SELECT * FROM集合;

它将自动生成 id 值:

  id |名称| ... 
----------- + -------- + -----
owmCAx552Q | ian |
ZIofD6l3X9 |胜利者|






带前缀的用法



或者您可能想在查看日志或调试器中的单个ID时添加前缀以方便使用(类似于 Stripe如何做到),就像这样:

 创建表集合(
id TEXT PRIMARY KEY DEFAULT('col_'|| generate_uid(10)),
name TEXT NOT NULL,
...
);

插入收藏集(名称)VALUES(一个);
INSERT INTO集合(名称)VALUES( Two);
INSERT INTO集合(名称)VALUES(三);
SELECT * FROM集合;

id |名称| ...
--------------- + -------- + -----
col_wABNZRD5Zk | ian |
col_ISzGcTVj8f |胜利者|


I've seen a bunch of different solutions on *** that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.

Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...

bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS

uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA

HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK

(Like the IDs that Stripe uses.)

How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?

I'm thinking that ideally the solution has a signature similar to:

generate_uid(size integer) returns text

Where size is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.

From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.

Thanks!


I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" that are as short as possible while still containing only alphanumeric characters.

This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.

I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:

var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];

  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }

  return chars.join('');
}

Figured this out, here's a function that does it:

CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

And then to run it simply do:

generate_uid(10)
-- '3Rls4DjWxJ'


Warning

When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object, I just used 10 as a simple example.


Usage

With the function defined, you can use it in a table definition, like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT generate_uid(10),
  name TEXT NOT NULL,
  ...
);

And then when inserting data, like so:

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

It will automatically generate the id values:

    id     |  name  | ...
-----------+--------+-----
owmCAx552Q | ian    |
ZIofD6l3X9 | victor |


Usage with a Prefix

Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
  name TEXT NOT NULL,
  ...
);

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

      id       |  name  | ...
---------------+--------+-----
col_wABNZRD5Zk | ian    |
col_ISzGcTVj8f | victor |