且构网

How to match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?

Updated: 2022-11-11 08:42:32

I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &amp;.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&amp;)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]

Hoisted from the comments

2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, is necessary anymore.

Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams
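
To illustrate the comment above, here is a minimal sketch of the URLSearchParams route. The helper name getUrlParamsNative and the shortened example URL are assumptions made for illustration; URL and URLSearchParams themselves are standard in browsers and Node.js. Note that, unlike the regex in the answer below, this does not tolerate wrongly HTML-encoded "&amp;" separators.

```javascript
// The question's string can be fed to URLSearchParams directly.
// Values come back already percent-decoded.
var pairsFromQuery = Array.from(
  new URLSearchParams("1111342=Adam%20Franco&348572=Bob%20Jones")
);
// pairsFromQuery: [["1111342", "Adam Franco"], ["348572", "Bob Jones"]]

// Hypothetical helper: collect the query parameters of a full URL into an object.
function getUrlParamsNative(url) {
  var params = {};
  // new URL() parses the string; searchParams iterates over [name, value] pairs.
  new URL(url).searchParams.forEach(function (value, name) {
    params[name] = value; // a repeated name overwrites the earlier value
  });
  return params;
}

var nativeResult = getUrlParamsNative(
  "http://maps.google.de/maps?f=q&geocode=&q=Frankfurt+am+Main&z=11"
);
console.log(nativeResult.q); // "Frankfurt am Main" ("+" is decoded as a space)
```

If repeated parameter names matter, searchParams.getAll(name) returns every value instead of only the last one.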


I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

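function getUrlParams(url) {
  // "?" or "&" (optionally followed by "amp;"), then name, then optional "=" and value
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      // "+" encodes a space in query strings, so replace it before decoding
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  // with the "g" flag, each exec() call resumes after the previous match
  while ((match = re.exec(url))) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}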

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
  f: "q",
  geocode: "",
  hl: "de",
  ie: "UTF8",
  iwloc: "addr",
  ll: "50.116616,8.680573",
  q: "Frankfurt am Main",
  sll: "50.106047,8.679886",
  source: "s_q",
  spn: "0.35972,0.833588",
  sspn: "0.370369,0.833588",
  z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&amp;", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^&#]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group
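
For the array shapes asked for in the question, the same idea can be sketched with String.prototype.matchAll (ES2020), which returns every match together with its groups. The helper name matchAllPairs is an assumption for illustration; the pattern is the one from the question with the "g" flag added, and the values are left percent-encoded as in the question's expected output.

```javascript
// Hypothetical helper: return one [key, value] pair per occurrence of the pattern.
function matchAllPairs(input) {
  var re = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
  // matchAll() requires the "g" flag and yields one match array per occurrence;
  // slice(1) drops the full match and keeps only the captured groups.
  return Array.from(input.matchAll(re), function (m) {
    return m.slice(1);
  });
}

var pairs = matchAllPairs("1111342=Adam%20Franco&348572=Bob%20Jones");
// pairs: [["1111342", "Adam%20Franco"], ["348572", "Bob%20Jones"]]

// The other requested shape groups all keys and all values, like PHP's default ordering.
var byGroup = [
  pairs.map(function (p) { return p[0]; }),
  pairs.map(function (p) { return p[1]; })
];
// byGroup: [["1111342", "348572"], ["Adam%20Franco", "Bob%20Jones"]]
```

In environments without matchAll, the while loop around re.exec() shown in getUrlParams collects the same matches.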