且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何检查我的Oracle数据库中是否正确设置了商标(™)字符?

更新时间:2023-11-25 14:49:52

您发布的字符文字™不是U + 0099(控制字符),而是U + 2122(TRADE MARK SIGN)。



Unicode规范定义 U + 0099如下:

  0099;< control&gt ;; Cc; 0; BN ;;;; N ;;;; 

所以,它甚至没有名字,我没有走开规范找出这个字符是什么。



解码U + 0099在Windows中会导致一个商标字形。



UTF-8中TRADE MARK SIGN(U + 2122)的正确字节序列为 E2 84 A2


How can I check that the trademark(™) character is set correctly in my Oracle database?

I expect it to be stored using UTF-8 encoding.

I have a value stored in a Salesforce.com field that looks like this from the GUI (notice the trademark character):

Chuck Norris's Roundhouse Kick™

I'm using Informatica to replicate it to an Oracle database. My database is set to use the AL32UTF8 encoding.

How it shows up in SQL Developer

When I query my table using SQL Developer, the trademark symbol shows up as a rectangle (black border, white fill).

How it shows up in HTML

When I export it from SQL Developer using the UTF-8 encoding into an HTML document, and open it in Chrome, the trademark symbol does not appear at all. When I open it in IE, the it appears as a rectangle again. In Firefox, it's a rectangle with 00 in the top half and 99 in the bottom half. All three browsers interpret the HTML doc using UTF-8.

How it shows up in text editors

Opening the same HTML doc in Notepad and Notepad++, the trademark symbol shows up as a rectangle. If I use the Hex Viewer plugin for Notepad++ I see the byte encoding is C2 99. That seems to be the correct encoding for the trademark symbol in UTF-8.

When I open document in MS Write, the trademark character looks like this: ™.

When I get the value programmatically

Using Python, when I get the value from the database, the trademark character is replaced with '\xbf' -- the inverted question mark, but that character is not even properly encoded as far as I can tell because it's missing at least one leading byte (depending on the specific encoding)

>>> import cx_Oracle
>>> con = cx_Oracle.connect('username', 'password', 'db')
>>> cur = con.cursor()
>>> cur.execute('select * from trademark')
<__builtin__.OracleCursor on <cx_Oracle.Connection to username@db>>
>>> records = cur.fetchall()
>>> records[0][0]
"Chuck Norris's Roundhouse Kick\xbf"

Ideally, I'd like to be able to validate the data stored in my Oracle database using all of the above methods. I'd settle for someone just validating that what I saw in the Hex Viewer was enough of a "test" ;)

The character literal ™ you posted is not U+0099 (a control character), but U+2122 (TRADE MARK SIGN).

The Unicode spec defines U+0099 as follows:

0099;<control>;Cc;0;BN;;;;;N;;;;;

So, it doesn't even have a name and I haven't gone digging round the spec to find out what this character is for.

Decoding U+0099 in Windows does result in a trademark grapheme. I guess this is a bug.

The correct byte sequence for the TRADE MARK SIGN (U+2122) in UTF-8 is E2 84 A2.