Unicode-in-DOS (u.bas) Programmer's Manual

U.Bas Manual

Using U.Bas for Unicode Characters

Loading a Unicode Character

char$ = uchar$(&Hcode&)
char$ = ucharf$(&Hcode&, font$)

Using The Unicode Character

utf$ = forlater$(char$)
code = getcode(char$)
utf$ = scalar2utf$(index)
index = utf2scalar(utf$)

Using U.Bas To Make Your Own Characters

index = newchar(glyph$)
setchars(start, count, glyphs$)
setchar(slot, glyph$)

Tips and Tricks

Restoring Old Characters
Using Windows' Extended Characters

Issues
About the Author

U.Bas Manual

U.bas allows custom characters to be used in text-based DOS applications. Its original goal was to allow the use of Unicode characters, but that is not the only way u.bas can be used.

Using U.Bas for Unicode Characters

The PC BIOS only provides 256 "slots" for storing graphical representations of characters. To get around this limitation, u.bas overwrites the graphic (glyph) slots of rarely used characters. This way Unicode characters can be used the same as any other character.

The graphics of the control characters (slots &H00 to &H1F) are rarely used, because control characters usually control something rather than being displayed. u.bas therefore uses those slots first for holding Unicode characters. If enough Unicode characters are loaded, u.bas will also overwrite accented characters, then Greek characters. Note that once all of these slots are filled, slots that already hold Unicode characters will be overwritten.

Loading a Unicode Character

Before a Unicode character can be printed and used, it must be loaded into one of the BIOS's 256 character slots. uchar$() or ucharf$() can be used to do this.

`char$ = uchar$(&Hcode&)`

uchar$() loads the 256-bit glyph specified by CODE, if it is not already loaded, and returns the slot it occupies. The glyph is read from file number FONTFILE, which by default is opened on FONTS\UNIFONT\UNIFONT.BIN. All assigned Unicode character codes and their glyphs are listed at http://www.unicode.org/charts/.

The Unicode charts give character numbers in hexadecimal; you can write them the same way by using the &H prefix. The default glyphs are part of the Unifont project at http://czyborra.com/unifont/. This project would not have nearly as many glyphs without Unifont.

WARNING: The trailing & is needed to make BASIC read the number as a 32-bit long integer; without it, codes from &H8000 to &HFFFF are read as negative 16-bit integers. You can leave it off for lower-numbered characters.
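The effect of the suffix can be seen without u.bas at all. The two lines below are only an illustration (the code point U+FB01 is an arbitrary choice); they print the same hexadecimal digits read first as a 16-bit INTEGER and then as a 32-bit LONG:

```basic
PRINT &HFB01        ' read as a 16-bit INTEGER: prints -1279
PRINT &HFB01&       ' read as a 32-bit LONG: prints 64257
```

Passing the first form to uchar$() would ask for a negative character number, which is why the suffix matters for codes at or above &H8000.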

Example:

    euro$ = uchar$(&H20AC)      ' Euro sign

`char$ = ucharf$(&Hcode&, font$)`

Loads a 128-bit glyph from FONTS\128BIT\font$. You can make your own or edit existing 128-bit fonts with FontEdit (see FONTS\128BIT\README.txt), or use one of the existing ones:

Normal: My attempt at creating a complete Unicode font. Not very successful, but it includes blocks 0000, 2000, and 2500. Using FONTSEL (from the DosFont/FontEdit package), it is possible to set your font to 0000.fnt and have Windows extended characters instead of BIOS extended characters. Deprecated in favor of Unifont.
Blank: Blank character map for creating new fonts from scratch.
Hex: Each glyph contains two hexadecimal digits corresponding to its character code, from 00 to FF. Might be useful for debugging.

The only limitation of 128-bit glyphs compared with 256-bit glyphs is that they can only be 8x16, while 256-bit glyphs can be 16x16 (by using two slots). See \fonts\unifont\conv.pl for the file format used for 256-bit glyphs.

Example:

    fourtwo$ = ucharf$(&H42, "Hex")          ' 4/2 symbol

Using The Unicode Character

A loaded Unicode character can be printed to the screen by simply using PRINT:

    PRINT "Balance: " + uchar$(&H20A4) + "10,000"

To save the character to a file, however, forlater$(char$) must be used.

`utf$ = forlater$(char$)`

utf$ is a UTF-8 representation of char$, suitable for storing in files and other places where it can be decoded again before use.

To decode a string of UTF-8, use unutf$(). unutf$() will look for any UTF-8 sequences and load any unloaded characters. utf2scalar(utf$) may be used to decode a single UTF-8 sequence into its Unicode character number; use char$ = uchar$(utf2scalar(utf$)) to load the character.

Example:

    OPEN "fax" FOR OUTPUT AS #1
    PRINT #1, "You owe us: "
    PRINT #1, "33" + forlater$(uchar$(&HA2))
    CLOSE 1

    OPEN "fax" FOR INPUT AS #1
    LINE INPUT #1, l$
    PRINT l$
    LINE INPUT #1, l$
    PRINT unutf$(l$)
    'PRINT MID$(l$, 1, 2) + fornow$(uchar$(utf2scalar(MID$(l$, 2, 1))))
    CLOSE 1
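To find UTF-8 sequences in a string, a decoder has to know how many bytes each one occupies, and the lead byte says so. The sketch below is not taken from u.bas; seqlen% is a hypothetical helper, shown only to illustrate how a routine like unutf$() might step through the second line of the "fax" file above:

```basic
DECLARE FUNCTION seqlen% (lead%)

' "33" followed by the UTF-8 bytes of U+00A2 (cent sign): C2 A2.
s$ = "33" + CHR$(&HC2) + CHR$(&HA2)
i% = 1
DO WHILE i% <= LEN(s$)
    n% = seqlen%(ASC(MID$(s$, i%, 1)))
    PRINT "Position"; i%; "starts a"; n%; "byte sequence"
    i% = i% + n%
LOOP

' Hypothetical helper: length of a UTF-8 sequence, judged by its lead byte.
FUNCTION seqlen% (lead%)
    IF lead% < &HC0 THEN
        seqlen% = 1     ' ASCII; a stray continuation byte is also stepped over singly
    ELSEIF lead% < &HE0 THEN
        seqlen% = 2
    ELSEIF lead% < &HF0 THEN
        seqlen% = 3
    ELSE
        seqlen% = 4
    END IF
END FUNCTION
```

The two digits are reported as one-byte sequences and the cent sign as a two-byte sequence starting at position 3.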

`code = getcode(char$)`

Returns the Unicode character number which char$ holds. slots(ASC(char$)) may be faster.

Example:

    ' U+25D3 is a circle with its upper half black, used here as a yin-yang look-alike
    yinyang$ = uchar$(&H25D3)
    PRINT "Yin-yang: ", fornow$(yinyang$) + " has a code of " + STR$(getcode(yinyang$))

`utf$ = scalar2utf$(index)`

Encodes the Unicode character INDEX into UTF-8. UTF-8 is a format which stores each Unicode character as a variable-length sequence of bytes, so low-numbered characters take less space. Used by forlater$().
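For readers unfamiliar with UTF-8, the encoding rule can be sketched as follows. This is not the code of scalar2utf$(); encodeutf8$ is a hypothetical stand-in written for this illustration, and it only covers character numbers up to &HFFFF (one to three bytes):

```basic
DECLARE FUNCTION encodeutf8$ (code&)

utf$ = encodeutf8$(&H20AC&)     ' Euro sign
FOR i% = 1 TO LEN(utf$)
    PRINT HEX$(ASC(MID$(utf$, i%, 1))); " ";
NEXT i%
PRINT                           ' the loop prints: E2 82 AC

' Hypothetical stand-in for scalar2utf$(): encode one character number
' as UTF-8. Character numbers above &HFFFF are not handled.
FUNCTION encodeutf8$ (code&)
    IF code& < &H80 THEN
        ' 0xxxxxxx: plain ASCII, stored as a single byte
        encodeutf8$ = CHR$(code&)
    ELSEIF code& < &H800 THEN
        ' 110xxxxx 10xxxxxx
        b1% = &HC0 OR (code& \ 64)
        b2% = &H80 OR (code& AND 63)
        encodeutf8$ = CHR$(b1%) + CHR$(b2%)
    ELSE
        ' 1110xxxx 10xxxxxx 10xxxxxx
        b1% = &HE0 OR (code& \ 4096)
        b2% = &H80 OR ((code& \ 64) AND 63)
        b3% = &H80 OR (code& AND 63)
        encodeutf8$ = CHR$(b1%) + CHR$(b2%) + CHR$(b3%)
    END IF
END FUNCTION
```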

`index = utf2scalar(utf$)`

Decodes a UTF-8 encoded character into the Unicode character number INDEX.
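The reverse direction can be sketched the same way. Again, this is not the code of utf2scalar(); decodeutf8& is a hypothetical stand-in that handles one- to three-byte sequences and does no validation, so it assumes the string is non-empty and well formed:

```basic
DECLARE FUNCTION decodeutf8& (utf$)

' E2 82 AC is the UTF-8 form of the Euro sign.
PRINT HEX$(decodeutf8&(CHR$(&HE2) + CHR$(&H82) + CHR$(&HAC)))   ' prints 20AC

' Hypothetical stand-in for utf2scalar(): decode the UTF-8 sequence at
' the start of utf$ into its Unicode character number.
FUNCTION decodeutf8& (utf$)
    lead% = ASC(utf$)
    IF lead% < &H80 THEN
        ' Single byte: the value is the character number
        decodeutf8& = lead%
    ELSEIF lead% < &HE0 THEN
        ' Two bytes: 5 bits from the lead byte, 6 from the next
        b2% = ASC(MID$(utf$, 2, 1)) AND &H3F
        decodeutf8& = (lead% AND &H1F) * 64& + b2%
    ELSE
        ' Three bytes: 4 bits from the lead byte, 6 from each of the next two
        b2% = ASC(MID$(utf$, 2, 1)) AND &H3F
        b3% = ASC(MID$(utf$, 3, 1)) AND &H3F
        decodeutf8& = (lead% AND &HF) * 4096& + b2% * 64& + b3%
    END IF
END FUNCTION
```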

Using U.Bas To Make Your Own Characters

U.Bas allows you to completely bypass its Unicode functions and create your own character graphics.

`index = newchar(glyph$)`

Allocates a free slot in the global slots() array and sets it to contain the graphic glyph$. index is the slot allocated; you can print your new character by using CHR$(index). glyph$ is a bitmap.

    PRINT "Gradient: ", CHR$(newchar(STRING$(64, &HAA)))

`setchars(start, count, glyphs$)`

Sets count consecutive character glyphs, beginning at slot start, to the character graphics in glyphs$. glyphs$ is a one-bit-per-pixel bitmap. Setting many characters at a time results in less flashing than setting each glyph individually.

`setchar(slot, glyph$)`

Same as setchars(slot, 1, glyph$).

Example:

    ' After running this program, go to a help topic in QBasic. The hidden
    ' nulls should appear as vertical lines.
    setchar 0, STRING$(64, &HAA)
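The byte &HAA is the bit pattern 10101010, which is presumably why the glyph above shows up as vertical lines. As a rough illustration of a one-bit-per-pixel bitmap, the sketch below draws an 8-pixel-wide glyph as text so that a design can be checked before it is passed to setchar or newchar. showglyph is a hypothetical helper, not part of u.bas, and the layout it assumes (one byte per row, top row first, most significant bit leftmost) follows the usual VGA font layout rather than anything this manual specifies:

```basic
DECLARE SUB showglyph (glyph$)

' A hand-made 8x16 glyph for the demonstration: a hollow box.
box$ = STRING$(2, 0) + CHR$(&HFF) + STRING$(10, &H81) + CHR$(&HFF) + STRING$(2, 0)
showglyph box$

' Hypothetical helper: print each byte of glyph$ as one row of 8 pixels,
' using "#" for a set bit and "." for a clear bit.
SUB showglyph (glyph$)
    FOR row% = 1 TO LEN(glyph$)
        bits% = ASC(MID$(glyph$, row%, 1))
        pixels$ = ""
        mask% = 128
        FOR col% = 1 TO 8
            IF bits% AND mask% THEN pixels$ = pixels$ + "#" ELSE pixels$ = pixels$ + "."
            mask% = mask% \ 2
        NEXT col%
        PRINT pixels$
    NEXT row%
END SUB
```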

Tips and Tricks

Restoring Old Characters

Unless you want to make other programs look different, your program should restore the default character set. VOLTA.COM from the DosFont distribution is provided to do this.

Using Windows' Extended Characters

Windows follows the ISO standard for characters A0-FF, but invents its own characters to fit in slots 80-9F. If you are reading lots of files which use extended Windows characters in DOS, it can be a pain to have to guess what they mean or resort to Notepad. Now you don't have to -- use FONTSEL from DosFont to set FONTS\128BIT\NORMAL\0000WIN.FNT as the DOS font. Of course, box drawing won't work then.

Issues

When a DOS box is not full screen, a Unicode character will appear as whatever character previously occupied the slot it took over. This is because non-full screen DOS boxes in Windows use their own character data. To solve this problem, use the DOS box in full screen.
QBasic will treat 16-bit hexadecimal numbers from &H8000 upward as negative unless the & suffix is used. If you get "Bad record number", try using the & suffix.
Unifont does not contain every glyph in Unicode. If you run into a character without a glyph, consider drawing your own and submitting it to Unifont. You will see a "Glyph not in file: " message or a quadruple-line graphic if a glyph does not currently exist in Unifont.
Code that relies on uchar$() cannot be executed in the Immediate Window because #FONTFILE must be opened first. Put your test code in the main module instead.

About the Author

Send comments about u.bas to unicodeindos@xyzzy.cjb.net. Comments about Unifont should be directed to its maintainer.

Modified Sun Mar 25 08:48:47 2007 generated Sun Mar 25 08:56:33 2007
http://jeff.tk/doschar/manual.html