Character Identifier tool

andrew_polk · January 13, 2023, 4:14pm

Is the Character Identifier tool linked to in this old Keyman blog still the recommended tool for determining Unicode code points of a bit of text? Or has it been replaced by something else?

AndyBlack · January 13, 2023, 7:20pm

I do not know what is recommended, but I use ZEdit, which comes with FLEx (FieldWorks Language Explorer). Load a file or paste text into ZEdit and choose Tools > Show Character Codes.

Lorna · January 13, 2023, 7:46pm

I still use it ALL the time…even installed it on my new Win11 machine. I know others use an online tool that I cannot remember the name of, but one thing I love about the Character Identifier is that it also tells me what fonts on my system support the character.

bobh · January 14, 2023, 2:05am

For me, ditto Lorna’s reply.

Marc · January 14, 2023, 2:08am

@Lorna, @bobh, that’s really helpful to know – I may try and publish an update to the tool since it is still helpful! I also use it frequently, but wasn’t sure if anyone else was.

The same tool is also built into Keyman Developer: the Character Identifier toolbox (View|Character Identifier).

The Character Identifier doesn’t handle characters outside the BMP (Basic Multilingual Plane, i.e. above U+FFFF) well, I think, and its font reporting is helpful but not necessarily 100% accurate.
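For context on the BMP remark: a character above U+FFFF needs two UTF-16 code units (a surrogate pair), and older Windows tools that work in UTF-16 units often trip over that. Whether this is actually the cause in the Character Identifier is an assumption here, not something confirmed above. A small Python sketch showing the difference (the function names are just for illustration):

```python
def utf16_units(text: str) -> list[tuple[str, int]]:
    """For each code point, return its U+ label and how many UTF-16 code units it needs."""
    result = []
    for ch in text:
        # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
        units = len(ch.encode("utf-16-le")) // 2
        result.append((f"U+{ord(ch):04X}", units))
    return result


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


text = "a\U0001D11E"  # LATIN SMALL LETTER A, MUSICAL SYMBOL G CLEF
print(utf16_units(text))  # [('U+0061', 1), ('U+1D11E', 2)]
print([f"{u:04X}" for u in surrogate_pair(0x1D11E)])  # ['D834', 'DD1E']
```

Python counts that string as 2 characters, while a tool that walks UTF-16 units sees 3 and may report the G clef as two unrelated "characters".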

bobh · January 14, 2023, 2:58am

Another option is Richard Ishida’s UniView web page. This page has lots of uses – so much so that I keep a Chrome shortcut for it pinned to my taskbar – but if you paste your unknown text into the “Search Text Here” box then click the Characters button, you’ll get a display listing, one per line, the characters in your text along with their Unicode value and name. Clicking on any of those then gives you details about the character. Here’s a sample result:

This does not give the font information that Marc’s tool provides, but can be useful (and works on any platform).
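If you want the same one-line-per-character listing offline, Python’s standard `unicodedata` module can produce it. This is a rough sketch of that kind of output, not how UniView itself works:

```python
import unicodedata


def list_characters(text: str) -> list[str]:
    """One line per character: the character, its U+ code point, and its Unicode name."""
    lines = []
    for ch in text:
        # Control characters such as U+000A have no Unicode name, so supply a placeholder.
        name = unicodedata.name(ch, "<no name>")
        lines.append(f"{ch}\tU+{ord(ch):04X}\t{name}")
    return lines


for line in list_characters("n\u00e9"):
    print(line)
```

The names come from whatever Unicode version your Python was built with (`unicodedata.unidata_version`), so very recently encoded characters may show as unnamed on an older Python.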

kblewett · January 14, 2023, 4:46am

Another (non-internet) option if you have Paratext or LibreOffice installed on your computer: in either program, place your cursor to the right of the character you want to identify and type Alt+x to see the Unicode code point (but not the character name). Type Alt+x again to switch back to the character view. (Disclaimer for right-to-left scripts: maybe place the cursor to the left of the character??)
In Paratext it might not work in certain “views”, so switch views and try again if needed. I use this ALL the time in both programs.
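The toggle idea can be sketched in Python. This is a hypothetical illustration, not how Paratext or LibreOffice implement it: to keep it unambiguous it only converts back when the digits carry a `U+` prefix (otherwise a letter like "a" would itself look like hex), and the real programs may format and parse the code differently.

```python
import re

# Hypothetical convention: "U+" followed by 4-6 hex digits right before the cursor.
_CODE_AT_END = re.compile(r"U\+([0-9A-Fa-f]{4,6})$")


def toggle_before_cursor(text: str, cursor: int) -> tuple[str, int]:
    """Toggle the character before the cursor to/from U+XXXX; return (new_text, new_cursor)."""
    before, after = text[:cursor], text[cursor:]
    match = _CODE_AT_END.search(before)
    if match:
        cp = int(match.group(1), 16)
        # Only convert valid scalar values (no surrogates, nothing above U+10FFFF).
        if cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
            new_before = before[: match.start()] + chr(cp)
            return new_before + after, len(new_before)
    if not before:
        return text, cursor  # nothing to the left of the cursor
    # Python indexes by code point, so before[-1] is a whole character even above U+FFFF.
    new_before = before[:-1] + f"U+{ord(before[-1]):04X}"
    return new_before + after, len(new_before)


text, cursor = toggle_before_cursor("caf\u00e9", 4)
print(text)  # cafU+00E9
print(toggle_before_cursor(text, cursor)[0])  # café
```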

Matthew_Lee · January 14, 2023, 3:50pm

I use the Character Identifier tool weekly, and just showed it to a roomful of LangTech guys. I’ll also claim credit for being the one that asked Marc to publish it in the first place after he showed a screenshot.

I also use SIL’s Unicode Character Count utility, and have my text editor set up to run it, but I recently found that the download links are dead.
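While the links were down, a rough stand-in for a character-count report is only a few lines of Python. This is a sketch of the same kind of report, not a reimplementation of SIL’s utility or its options:

```python
import unicodedata
from collections import Counter


def count_characters(text: str) -> list[tuple[str, int, str]]:
    """Return (U+ code point, count, name) for each distinct character, in code point order."""
    counts = Counter(text)
    return [
        (f"U+{ord(ch):04X}", n, unicodedata.name(ch, "<no name>"))
        for ch, n in sorted(counts.items(), key=lambda item: ord(item[0]))
    ]


for row in count_characters("abba"):
    print(*row, sep="\t")
```

Note that it counts code points as they appear, so a precomposed é (U+00E9) and e + U+0301 are counted separately unless you normalize first, e.g. with `unicodedata.normalize("NFC", text)`.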

Lorna · January 14, 2023, 7:03pm

Downloads on scripts.sil.org should be working again. We had an issue with AWS, I think.

Andrew_Cunningham · January 15, 2023, 2:52am

I don’t think a font reporting function is necessary. Such tools can only tell you whether a font covers a particular code point, but the important factors in choosing a font for a language aren’t really about the code point per se; they are about the glyphs used, the OpenType features utilised, and how well they support the target language.

I tend to do most of my initial work in VS Code, for which there are a few extensions that can give you the code points of the grapheme cluster at the cursor position.
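To illustrate "grapheme cluster at the cursor", here is a simplified Python sketch. It assumes a cluster is a base character plus any following combining marks (general category M*), which is only an approximation of the Unicode rules (UAX #29): it misses things like emoji ZWJ sequences, regional-indicator flags, Hangul jamo sequences, and some Indic conjuncts. The third-party `regex` module’s `\X` pattern implements the full extended grapheme cluster rules if you need them.

```python
import unicodedata


def _is_mark(ch: str) -> bool:
    # Mn, Mc, Me: nonspacing, spacing, and enclosing combining marks.
    return unicodedata.category(ch).startswith("M")


def cluster_at(text: str, cursor: int) -> list[str]:
    """Code points of the approximate grapheme cluster containing text[cursor]."""
    if not 0 <= cursor < len(text):
        return []
    start = cursor
    while start > 0 and _is_mark(text[start]):
        start -= 1  # walk back from a mark to its base character
    end = cursor + 1
    while end < len(text) and _is_mark(text[end]):
        end += 1  # include all marks that follow
    return [f"U+{ord(ch):04X}" for ch in text[start:end]]


print(cluster_at("e\u0301x", 1))  # ['U+0065', 'U+0301']
print(cluster_at("\u0915\u093f", 0))  # ['U+0915', 'U+093F']  (DEVANAGARI KA + VOWEL SIGN I)
```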

I also have my own Python scripts that I can use from the command line, which I really need to wrap a GUI around at some point.

Ultimately, there are lots of tools out there; the key question is what data you require. Just the code points? Character names as well? Other details?
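As a sketch of "pick the data you need", Python’s `unicodedata` exposes several Unicode Character Database properties directly. The field names and the `describe` helper below are hypothetical, chosen just for this example:

```python
import unicodedata

# Hypothetical field names mapped to real unicodedata lookups.
FIELDS = {
    "name": lambda ch: unicodedata.name(ch, "<no name>"),
    "category": unicodedata.category,         # general category, e.g. "Ll", "Mn"
    "bidi": unicodedata.bidirectional,        # bidi class, e.g. "L", "R", "NSM"
    "combining": unicodedata.combining,       # canonical combining class, 0 if none
    "decomposition": unicodedata.decomposition,
}


def describe(text: str, fields: list[str]) -> list[dict[str, object]]:
    """One row per code point with only the requested properties."""
    rows = []
    for ch in text:
        row: dict[str, object] = {"code point": f"U+{ord(ch):04X}"}
        for field in fields:
            row[field] = FIELDS[field](ch)
        rows.append(row)
    return rows


for row in describe("\u00e9\u0301", ["name", "category", "combining", "decomposition"]):
    print(row)
```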

Marc · January 15, 2023, 5:00am

For me, the font tool serves the purpose of reducing my 800 font choices down to 3 or 4, which I can then realistically check one by one.
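The narrowing step itself is simple set logic. In this sketch the coverage table is made-up data; in practice you would build it from each font’s cmap table (for example with the third-party fontTools library). As Andrew notes, passing this filter says nothing about glyph design or OpenType behaviour, so it only shortens the list to check by hand.

```python
def fonts_covering(text: str, coverage: dict[str, set[int]]) -> list[str]:
    """Names of fonts whose code point coverage includes every non-space character in text."""
    needed = {ord(ch) for ch in text if not ch.isspace()}  # ignore spaces/line breaks
    return sorted(name for name, cps in coverage.items() if needed <= cps)


# Hypothetical coverage data, for illustration only.
ascii_range = set(range(0x20, 0x7F))
coverage = {
    "Hypothetical Sans": ascii_range,
    "Hypothetical Serif": ascii_range | {0x00E9, 0x0301},
    "Hypothetical Mono": ascii_range | {0x00E9},
}

print(fonts_covering("e\u0301", coverage))  # ['Hypothetical Serif']
print(fonts_covering("caf\u00e9", coverage))  # ['Hypothetical Mono', 'Hypothetical Serif']
```

The two calls show why normalization matters here: decomposed e + U+0301 rules out a font that only has precomposed é.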

Andrew_Cunningham · January 15, 2023, 12:05pm

Marc, I guess we’ll have to agree to disagree, especially since the hidden fonts on macOS (those hidden Noto fonts, among others) make the whole process problematic.

I’ve been playing with Python on an idea for a tool that will analyse the characters in a string and check fonts, flagging any characters that are known to have regional or language-specific variants so they can be checked. I’m also thinking of incorporating those pesky Noto bugs that affect language support. It’s still just an idea: the part that analyses which characters are in the string is done, as is some of the work to optionally display certain UCD (Unicode Character Database) data.

I still need to put the infrastructure in place to search fonts and annotate the results for each font.

Christopher_Fiorello · January 17, 2023, 12:45pm

I use the very simple website What Unicode Character is this?
It’s the one I keep pinned to my web shortcuts.

Steve_White · January 17, 2023, 1:31pm

Thanks, Chris.

If someone needs an offline tool for identifying Unicode characters, there is a utility called BabelPad (made by the same folks that provide this tool) that is a free download.

makara · March 27, 2023, 7:18am

This topic will close in three weeks. Feel free to add more comments or suggestions before then.

makara · April 17, 2023, 7:18am

This topic was automatically closed after 21 days. New replies are no longer allowed.