What is this character: '*'?
--
Full question
https://superuser.com/questions/1103086/...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#unicode #specialcharacters
ACCEPTED ANSWER
Score 71
The paste failed not because of the asterisk, which is a perfectly regular asterisk, but because of the Unicode character U+200B that follows it. Because that character is a ZERO WIDTH SPACE, it is not displayed, yet it is still copied along with the asterisk.
Using the Python code:
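# U+200B is written as an escape here, because a literal one is invisible
# and is easily stripped when the code is pasted.
stro = u"'*\u200b'?"

def uniconv(text):
    # Join the hexadecimal code point of every character with spaces.
    return " ".join(hex(ord(char)) for char in text)

print(uniconv(stro))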
The function uniconv converts each character of the input string (here, the string copied from the question title) into its Unicode code point in hexadecimal format. The u prefix marks the string as a Unicode string.
I was able to obtain the output:
0x27 0x2a 0x200b 0x27 0x3f
We can clearly see that 0x27, 0x2a and 0x3f are the ASCII/Unicode hexadecimal values for the characters ', * and ? respectively. That leaves 0x200b, which identifies the hidden character as U+200B.
Note that when the Python code was pasted into the body, SE's Markdown software removed the literal U+200B character. To obtain the expected result, either copy the string directly from the title using the Edit view, or write the character as the escape \u200b.
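As a complement to the hexadecimal dump above, here is a minimal sketch that also asks Python's standard unicodedata module for the name and general category of each character. The helper name describe is hypothetical and not part of the original answer, and the zero width space is written as an escape on the assumption that a literal one would not survive pasting.

```python
import unicodedata

def describe(text):
    """Return one (code point, name, category) tuple per character."""
    return [
        ("U+%04X" % ord(ch),
         unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed characters
         unicodedata.category(ch))
        for ch in text
    ]

# Hypothetical sample: the title string with U+200B spelled as an escape.
sample = u"'*\u200b'?"

for row in describe(sample):
    print(row)
```

The third row should report the name ZERO WIDTH SPACE with category Cf (format), which is the kind of character that occupies no visible space.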
ANSWER 2
Score 27
With the help of @Rinzwind in the Ask Ubuntu chat room, I figured out that the problem isn't the * character at all. Note the output of od when the pasted text is piped into it:
$ printf '*' | od -c
0000000 * 342 200 213
0000004
The 342 200 213 is the octal representation of the UTF-8 bytes of another character, and we can use this site to look it up:
Character:                          (invisible)
Character name:                     ZERO WIDTH SPACE
Hex code point:                     200B
Decimal code point:                 8203
Hex UTF-8 bytes:                    E2 80 8B
Octal UTF-8 bytes:                  342 200 213
UTF-8 bytes as Latin-1 characters:  â <80> <8B>
So, what I actually had was two Unicode characters: the normal * and a zero width space.
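The lookup can also be reproduced without a website. The following is a rough sketch, assuming the od output is UTF-8: it parses the three octal values as bytes and decodes them. The function name decode_octal_bytes is made up for this illustration.

```python
def decode_octal_bytes(octal_text):
    """Turn space-separated octal byte values into a decoded UTF-8 string."""
    # int(part, 8) parses each token as base 8, e.g. "342" -> 226 (0xE2).
    raw = bytes(bytearray(int(part, 8) for part in octal_text.split()))
    return raw.decode("utf-8")

# The three bytes that od -c printed after the asterisk.
decoded = decode_octal_bytes("342 200 213")

print(["U+%04X" % ord(ch) for ch in decoded])
```

The three bytes decode to a single character, U+200B, which matches the hex and decimal code points in the table above.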