Checking whether a string is valid UTF-8 in Java

Updated: 2022-10-15 10:28:37

How can I check if a string is in valid UTF-8 format?

Only byte data can be checked. Once you have constructed a String, it's already a sequence of UTF-16 code units internally, and the original bytes are gone.

Likewise, UTF-8 is a byte-level encoding: only a byte array can be said to hold UTF-8 data.
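
Since the check has to run on bytes, one way to sketch it is with a strict `CharsetDecoder` from the standard library, which throws on malformed input instead of silently repairing it. The class name `Utf8Check` and the helper `isValidUtf8` are made up for this illustration and are not part of the original answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Hypothetical helper (not from the original answer): returns true if the
    // bytes decode as UTF-8 without any malformed or unmappable sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode() throws CharacterCodingException on the first bad sequence
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "Hello World".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte
        byte[] bad = {(byte) 0xC3, (byte) 0x28};
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

A fresh decoder is created per call because `CharsetDecoder` instances are stateful and not safe to share between threads.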

Here is a common case of UTF-8 conversion, encoding a String into bytes:

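import java.io.UnsupportedEncodingException;

// "\u0048\u0065\u006C\u006C\u006F" is "Hello" written as Unicode escapes
String myString = "\u0048\u0065\u006C\u006C\u006F World";
System.out.println(myString);
byte[] myBytes = null;

try
{
    // Encode the String's UTF-16 code units into UTF-8 bytes.
    // On Java 7+, getBytes(StandardCharsets.UTF_8) avoids the checked exception.
    myBytes = myString.getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
    System.exit(-1);
}

// Print each encoded byte as a signed decimal value
for (int i = 0; i < myBytes.length; i++) {
    System.out.println(myBytes[i]);
}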
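
Going the other way, from bytes to a String, helps show why a String is the wrong place to look for invalid UTF-8. The `String(byte[], Charset)` constructor replaces malformed sequences with U+FFFD rather than reporting them, so the bad bytes never reach the String. The following is only an illustrative sketch; the class name `LossyDecode` and the helper `roundTrip` are invented for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecode {

    // Hypothetical helper: decode the bytes as UTF-8, then encode the result again.
    public static byte[] roundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 0x28 is not a valid UTF-8 sequence
        byte[] bad = {(byte) 0xC3, (byte) 0x28};
        String decoded = new String(bad, StandardCharsets.UTF_8);

        // The malformed byte was replaced by U+FFFD during decoding
        System.out.println(decoded.indexOf('\uFFFD') >= 0);      // true

        // Re-encoding does not give back the original bytes
        System.out.println(Arrays.equals(bad, roundTrip(bad)));  // false
    }
}
```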

If you don't know the encoding of your byte array, juniversalchardet is a library to help you detect it.