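Regular Expression Study Notes 1.2

Updated: 2022-09-27 22:00:12

Continuing from the previous part:

Example 3:

Data extraction

Requirement: from a piece of HTML, extract every email address and every link address that appears in an <a href...> tag.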

// HtmlTest.java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTest {

    public static void main(String[] args) {
        // Sample HTML with double-quoted, single-quoted and unquoted href values
        String htmlText = "<html>"
                + "<a href=\"testone@163.com\">163test</a>\n"
                + "<a href='www.163.com@163-com.com'>163news</a>\n"
                + "<a href=http://www.163.com>163lady</a>\n"
                + "<a href = http://sports.163.com>NetEase Sports</a>\n"
                + "<a href = \"http://gz.house.163.com\">NetEase Real Estate</a>\n"
                + ".leemaster@163" + "luckdog.com" + "</html>";

        System.out.println("Checking for emails");
        for (String email : extractEmail(htmlText)) {
            System.out.println("Email: " + email);
        }

        System.out.println("Checking for hyperlinks");
        for (String link : extractLink(htmlText)) {
            System.out.println("Hyperlink: " + link);
        }
    }

    // Collects every match of HREF_LINK_REGEX.
    // m.group() returns the whole match (the tag prefix), not only the URL.
    private static List<String> extractLink(String htmlText) {
        List<String> result = new ArrayList<String>();
        Pattern p = Pattern.compile(Regexes.HREF_LINK_REGEX);
        Matcher m = p.matcher(htmlText);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Collects every match of EMAIL_REGEX.
    private static List<String> extractEmail(String htmlText) {
        List<String> result = new ArrayList<String>();
        Pattern p = Pattern.compile(Regexes.EMAIL_REGEX);
        Matcher m = p.matcher(htmlText);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}

// Regexes.java
public class Regexes {

    // Local part: starts and ends with a letter or digit (at least 3 characters);
    // domain: one or more dot-terminated labels followed by a 2-4 letter suffix.
    public static final String EMAIL_REGEX =
            "(?i)(?<=\\b)[a-z0-9][-a-z0-9_.]+[a-z0-9]@([a-z0-9][-a-z0-9]+\\.)+[a-z]{2,4}(?=\\b)";

    // Matches "<a href=" with optional whitespace and an optional opening quote;
    // group 1 captures the href value up to a quote, whitespace or '>'.
    public static final String HREF_LINK_REGEX =
            "(?i)<a\\s+href\\s*=\\s*['\"]?([^'\"\\s>]+)['\"\\s>]";
}

Output:

    Checking for emails
    Email: testone@163.com
    Email: www.163.com@163-com.com
    Email: leemaster@163luckdog.com
    Checking for hyperlinks
    Hyperlink: <a href="testone@163.com"
    Hyperlink: <a href='www.163.com@163-com.com'
    Hyperlink: <a href=http://www.163.com>
    Hyperlink: <a href = http://sports.163.com>
    Hyperlink: <a href = "http://gz.house.163.com"
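The hyperlink results above contain the whole matched tag prefix, because extractLink calls m.group() with no argument. If only the link address is wanted, one option is to read capture group 1 instead. The following is a minimal sketch under that assumption; the class name HrefUrls and the helper extractUrls are hypothetical and not part of the original listing, while the pattern is copied from HREF_LINK_REGEX.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefUrls {

    // Same pattern as Regexes.HREF_LINK_REGEX in the listing
    private static final Pattern HREF = Pattern.compile(
            "(?i)<a\\s+href\\s*=\\s*['\"]?([^'\"\\s>]+)['\"\\s>]");

    // Hypothetical helper: returns only the captured href values
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // group(1) is the text matched inside the parentheses, i.e. the href value
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"testone@163.com\">163test</a>\n"
                + "<a href = http://sports.163.com>sports</a>";
        // prints [testone@163.com, http://sports.163.com]
        System.out.println(extractUrls(html));
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every call, which the original helpers do.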

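Example 4:

Finding duplicate words

Requirement: check whether a piece of text contains a word repeated twice in a row and, if it does, remove the repetition.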

// FindWord.java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindWord {

    public static void main(String[] args) {
        String[] sentences = new String[] { "this is a normal sentence",
                "Oh,my god!Duplicate word word",
                "This sentence contain no duplicate word words" };

        for (String sentence : sentences) {
            System.out.println("Checking sentence: " + sentence);
            if (containDupWord(sentence)) {
                System.out.println("Duplicate word found!!");
                System.out.println("Removing duplicate words: " + removeDupWords(sentence));
            }
            System.out.println("");
        }
    }

    // Replaces each "word word" pair with a single copy of the word (group 1).
    private static String removeDupWords(String sentence) {
        String regex = Regexes.DUP_WORD_REGEX;
        return sentence.replaceAll(regex, "$1");
    }

    // Returns true if the sentence contains a word immediately followed by itself.
    private static boolean containDupWord(String sentence) {
        String regex = Regexes.DUP_WORD_REGEX;
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(sentence);
        return m.find();
    }
}

// Regexes.java (only the constant used in this example is shown)
public class Regexes {

    // A whole word, whitespace, then the same word again (back-reference \1).
    // The trailing boundary stops "word words" from counting as a duplicate.
    public static final String DUP_WORD_REGEX =
            "(?<=\\b)(\\w+)\\s+\\1(?=\\b)";
}

Output:

    Checking sentence: this is a normal sentence

    Checking sentence: Oh,my god!Duplicate word word
    Duplicate word found!!
    Removing duplicate words: Oh,my god!Duplicate word

    Checking sentence: This sentence contain no duplicate word words
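One limitation of the pattern in this example is that each match covers exactly two copies of a word, so a word repeated three times in a row is reduced to two copies rather than one. The sketch below shows one possible variant that lets the repeated part occur one or more times. The class name DupRuns, the constant DUP_RUN_REGEX and the helper collapse are hypothetical names introduced for illustration; they are not from the original post.

```java
public class DupRuns {

    // Pattern from the example above, copied for comparison
    static final String DUP_WORD_REGEX = "(?<=\\b)(\\w+)\\s+\\1(?=\\b)";

    // Assumed variant: the "whitespace + same word" part may repeat one or more times
    static final String DUP_RUN_REGEX = "\\b(\\w+)(?:\\s+\\1\\b)+";

    // Hypothetical helper: collapses a run of identical words into one copy
    static String collapse(String sentence) {
        return sentence.replaceAll(DUP_RUN_REGEX, "$1");
    }

    public static void main(String[] args) {
        // The original pattern removes only one repetition per match
        System.out.println("word word word".replaceAll(DUP_WORD_REGEX, "$1")); // word word
        // The variant collapses the whole run
        System.out.println(collapse("word word word")); // word
        // "word words" is still left alone because of the trailing \b
        System.out.println(collapse("no duplicate word words")); // no duplicate word words
    }
}
```

The non-capturing group (?:...) keeps the word itself as group 1, so the replacement string "$1" works the same way as in removeDupWords.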

To be continued...

This article is reposted from jooben's 51CTO blog. Original link: http://blog.51cto.com/jooben/316592
