且构网

How do I extract URLs from a tweet using a JavaScript RegEx?

Updated: 2022-02-16 22:27:53

Here is one of the regular expressions that I've used for pulling links from Twitter statuses.

Link matching pattern

(?:<\w+.*?>|[^=!:'"/]|^)((?:https?://|www\.)[-\w]+(?:\.[-\w]+)*(?::\d+)?(?:/(?:(?:[~\w\+%-]|(?:[,.;@:][^\s$]))+)?)*(?:\?[\w\+%&=.;:-]+)?(?:\#[\w\-\.]*)?)(?:\p{P}|\s|<|$)
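
To use the pattern from JavaScript, it has to be compiled with the `u` flag: `\p{P}` (any Unicode punctuation character) is not treated as a property class without it. The following is a minimal sketch, assuming an ES2018+ engine such as a current browser or Node.js 12+. `TWEET_URL_RE` and `extractUrls` are hypothetical names, and the three adaptations listed in the comments are assumptions, not part of the original answer.

```javascript
// Hypothetical port of the link-matching pattern to a JavaScript RegExp.
// Adaptations (assumptions, not part of the original pattern):
//   1. The "u" flag is needed for \p{P} (any Unicode punctuation character).
//   2. "\#" is written as "#", because "u" mode rejects unnecessary escapes.
//   3. The trailing group is a lookahead, so the character after one URL is
//      not consumed and a URL directly after it can still be matched.
const TWEET_URL_RE = new RegExp(
  // Preceding context: an HTML tag, a character that is not = ! : ' " /,
  // or the start of the text.
  String.raw`(?:<\w+.*?>|[^=!:'"/]|^)` +
    // Group 1: scheme or "www.", host, optional port, path, query, fragment.
    String.raw`((?:https?://|www\.)[-\w]+(?:\.[-\w]+)*(?::\d+)?` +
    String.raw`(?:/(?:(?:[~\w\+%-]|(?:[,.;@:][^\s$]))+)?)*` +
    String.raw`(?:\?[\w\+%&=.;:-]+)?(?:#[\w\-\.]*)?)` +
    // Must be followed by punctuation, whitespace, "<", or the end of text.
    String.raw`(?=\p{P}|\s|<|$)`,
  "gu"
);

// Hypothetical helper: returns the captured URL (group 1) of every match.
function extractUrls(text) {
  return Array.from(text.matchAll(TWEET_URL_RE), (match) => match[1]);
}

console.log(extractUrls("Read http://tinyurl.com/38wp7nt and www.example.com."));
// logs the two URLs: http://tinyurl.com/38wp7nt and www.example.com
```

The URL is read from group 1 rather than from the whole match because the leading group consumes one character of context (or a preceding tag) before the link.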

Alternatively, if you control how the statuses are fetched from Twitter, you can pass the include_entities parameter to statuses/show (or any other method that supports it, such as statuses/user_timeline) to have Twitter break out the links, mentions, and hashtags for you, like the following:

http://api.twitter.com/1/statuses/show/23918022347456512.json?include_entities=true
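
If you build that request in JavaScript, the standard `URL` and `URLSearchParams` APIs can assemble the query string for you. A small sketch: `statusShowUrl` is a hypothetical helper, and the v1 endpoint path and status ID are copied unchanged from the request above.

```javascript
// Hypothetical helper: builds the statuses/show request from the answer.
function statusShowUrl(statusId) {
  const url = new URL(`http://api.twitter.com/1/statuses/show/${statusId}.json`);
  // Ask Twitter to break out links, mentions and hashtags into "entities".
  url.searchParams.set("include_entities", "true");
  return url.toString();
}

// Status IDs can exceed Number.MAX_SAFE_INTEGER (2^53 - 1), beyond which a
// JavaScript number cannot represent every integer, so the ID is a string.
console.log(statusShowUrl("23918022347456512"));
// → http://api.twitter.com/1/statuses/show/23918022347456512.json?include_entities=true
```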

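In the resulting JSON, notice the entities object. Each entry in urls holds the link and an indices pair: the offset of the link's first character in the tweet text and the offset just past its last character.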

"entities":{"urls":[{"expanded_url":null,"indices":[27,53],"url":"http:\/\/tinyurl.com\/38wp7nt"}],"hashtags":[],"user_mentions":[]}
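
A sketch of reading links from that object in JavaScript instead of running a regex. `linksFromEntities` and its return shape are hypothetical. It assumes the field names in the sample and that `indices` is a [start, end) pair into the status `text`, which the sample is consistent with: 53 - 27 = 26, the length of http://tinyurl.com/38wp7nt. It also assumes the indices count Unicode code points. The tweet text in the usage is made up; only the `entities` value comes from the response above.

```javascript
// Hypothetical helper: lists the links Twitter already found in a status.
// Assumes the shape of the sample: entities.urls[] with url, expanded_url
// (may be null) and indices = [start, end) into status.text.
function linksFromEntities(status) {
  const urls = (status.entities && status.entities.urls) || [];
  // Assumption: indices count Unicode code points, so split the text into
  // code points instead of indexing UTF-16 units directly.
  const chars = Array.from(status.text || "");
  return urls.map((entity) => {
    const [start, end] = entity.indices;
    return {
      url: entity.url,
      // Prefer the expanded form when one is supplied (null in the sample).
      target: entity.expanded_url ?? entity.url,
      // The characters of the tweet text that the link occupies.
      span: chars.slice(start, end).join(""),
    };
  });
}

// The "entities" value from the sample response, escaped slashes included.
const entitiesJson =
  '{"urls":[{"expanded_url":null,"indices":[27,53],"url":"http:\\/\\/tinyurl.com\\/38wp7nt"}],"hashtags":[],"user_mentions":[]}';

const status = {
  // Made-up text: 27 placeholder characters followed by the link, so that
  // the sample indices [27, 53] line up.
  text: "x".repeat(27) + "http://tinyurl.com/38wp7nt",
  entities: JSON.parse(entitiesJson),
};

console.log(linksFromEntities(status));
```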

Now you can reference the data returned by Twitter instead of parsing the text yourself. The best thing about this approach is that you offload the work to Twitter and never have to worry about whether your regular expression matches Twitter's own link detection exactly.