The Computer Oracle

What is this character: '*​'?



--

Full question
https://superuser.com/questions/1103086/...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#unicode #specialcharacters




ACCEPTED ANSWER

Score 71


The paste failed not because of the asterisk, which is a perfectly regular asterisk, but because of the Unicode character U+200B that follows it. Because that character is a ZERO WIDTH SPACE, it is invisible on screen and is carried along unnoticed when the text is copied.

Using the Python code:

stro=u"'*​'?"
def uniconv(text):
    # Return the hexadecimal code point of each character, separated by spaces.
    return " ".join(hex(ord(char)) for char in text)
print(uniconv(stro))

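The function uniconv converts each character of the input string (in this case, u"'*'?", where the invisible U+200B sits after the asterisk) into its Unicode code point in hexadecimal format. The u prefix marks the string as a Unicode string; it is required in Python 2 and optional in Python 3, where every string is already Unicode.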

I was able to obtain the output:

0x27 0x2a 0x200b 0x27 0x3f

We can clearly see that 0x27, 0x2a and 0x3f are the ASCII/Unicode hexadecimal values for the characters ', * and ? respectively. That leaves 0x200b, which identifies the hidden character as U+200B.

Note that the Python code, when pasted into the body, had the U+200B character removed by SE's Markdown software. In order to obtain the expected result, you need to copy it directly from the title using the Edit view.
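
As a follow-up to the hexadecimal dump, here is a minimal Python 3 sketch that also prints the official name of each character, so the invisible one is labeled directly. The helper name describe and the variable sample are made up for this illustration and are not part of the original answer. The string uses the escape \u200b instead of a pasted character, so it survives copying.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per character in text (hypothetical helper)."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# \u200b is written as an escape so the invisible character cannot be lost.
sample = "'*\u200b'?"
for entry in describe(sample):
    print(entry)
```

The third line printed is U+200B ZERO WIDTH SPACE, which matches the 0x200b value found above.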




ANSWER 2

Score 27


With the help of @Rinzwind in the Ask Ubuntu chat room, I figured out that the problem isn't the character at all. Note the output of od:

$ printf '*​' | od -c
0000000   * 342 200 213
0000004

The bytes 342 200 213 are the octal form of the UTF-8 encoding of a second character, and we can use this site to look it up:

Character                   ​               
Character name                              ZERO WIDTH SPACE
Hex code point                              200B
Decimal code point                          8203
Hex UTF-8 bytes                             E2 80 8B
Octal UTF-8 bytes                           342 200 213
UTF-8 bytes as Latin-1 characters bytes     â <80> <8B>

So what I actually had was two Unicode characters: the normal * and a zero width space.
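
The lookup can also be cross-checked without a website. The following is a rough Python 3 sketch, assuming only the standard library; the function name utf8_forms is hypothetical. It encodes U+200B to UTF-8 and prints the bytes in hex and octal, then decodes the octal bytes reported by od back into a code point.

```python
def utf8_forms(ch):
    """Return the UTF-8 bytes of ch as hex and octal strings (hypothetical helper)."""
    data = ch.encode("utf-8")
    return {
        "hex": " ".join("%02X" % b for b in data),
        "octal": " ".join("%o" % b for b in data),
    }

forms = utf8_forms("\u200b")
print(forms["hex"])    # E2 80 8B
print(forms["octal"])  # 342 200 213

# Going the other way: the three bytes od printed decode to a single character.
decoded = bytes([0o342, 0o200, 0o213]).decode("utf-8")
print("U+%04X" % ord(decoded), ord(decoded))  # U+200B 8203
```

Both directions agree with the table: the three bytes after the asterisk are one character, code point 200B (decimal 8203).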