[hfs-user] Char set for HFS volumes
Fri, 1 Mar 2002 08:02:59 -0800
On Friday, March 1, 2002, at 12:48 AM, Biswaroop Banerjee wrote:
> Can anybody tell me which char set is understood in
> HFS volumes. For e.g. in DOS only A-Z, 0-9 and _ are
> the valid characters.
> So, what is for HFS.
Names on HFS are up to 31 bytes long (27 bytes for volume names) and can
consist of any byte value except the ASCII colon (":"). Note: that means
a zero byte *is* valid, which can make things difficult for
implementations that use zero-terminated C-style strings.
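A rough sketch of that check in Python, working on raw bytes rather than
text. The function name and constants are made up for illustration, and
rejecting an empty name is my assumption, not something stated above.

```python
# Limits taken from the text above: 31 bytes for file and folder names,
# 27 bytes for volume names, and no ASCII colon anywhere in the name.
MAX_NAME_BYTES = 31
MAX_VOLUME_NAME_BYTES = 27
COLON = 0x3A  # ASCII ":"


def is_valid_hfs_name(name: bytes, is_volume: bool = False) -> bool:
    """Return True if `name` satisfies the length and colon rules."""
    limit = MAX_VOLUME_NAME_BYTES if is_volume else MAX_NAME_BYTES
    # Assumption: an empty name is rejected (not stated in the text).
    if not 1 <= len(name) <= limit:
        return False
    # Every byte value is allowed except the colon, including zero.
    return COLON not in name


# A name with an embedded zero byte passes the check ...
embedded_zero = b"Read\x00Me"
ok = is_valid_hfs_name(embedded_zero)
# ... but code that treats it as a C string would stop at the zero byte.
seen_by_c_code = embedded_zero.split(b"\x00")[0]
```

Carrying the name around as a length plus bytes (as HFS itself does)
avoids the truncation that `seen_by_c_code` illustrates.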
Above I said bytes, not characters. To support localizations to many
languages, Mac OS supports a variety of character set encodings. Some
of those encodings use two bytes to represent a single character. That
means that a file name in such an encoding might hold as few as 15
characters, which would occupy 30 bytes.
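To make the byte versus character distinction concrete, here is a small
Python sketch. Python has no codec for the Mac two-byte encodings that I
know of, so it uses "shift_jis" purely as a stand-in for a two-byte
encoding; the helper names are hypothetical.

```python
MAX_NAME_BYTES = 31  # HFS limit on name length, counted in bytes


def encoded_length(name: str, encoding: str) -> int:
    """Number of bytes `name` occupies once encoded."""
    return len(name.encode(encoding))


def fits_hfs_name(name: str, encoding: str) -> bool:
    """True if the encoded name fits within the 31-byte limit."""
    return encoded_length(name, encoding) <= MAX_NAME_BYTES


# Stand-in encoding (assumption): each of these characters takes 2 bytes.
fifteen_chars = "あ" * 15
sixteen_chars = "あ" * 16

bytes_for_fifteen = encoded_length(fifteen_chars, "shift_jis")  # 30
fifteen_fits = fits_hfs_name(fifteen_chars, "shift_jis")
sixteen_fits = fits_hfs_name(sixteen_chars, "shift_jis")
```

Names that mix one-byte and two-byte characters land somewhere in
between, so the limit has to be checked on the encoded bytes, not on the
character count.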
Offhand, I don't know if or where the various encodings are described.
There may be documentation on Apple's developer web site.
Remember that HFS is case insensitive. The definition of what
characters are "upper case" or "lower case" is based on the MacRoman
encoding. MacRoman is similar to ISO Latin 1. Take a look at the
Darwin sources for code that does a case insensitive string compare
using MacRoman (it will be called as part of the B-tree key comparison
function for the catalog B-tree).
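As a very rough illustration of the idea (and NOT the Darwin routine,
which uses its own lookup table and also defines the sort order), one
could approximate a MacRoman case insensitive equality test in Python
like this. It leans on Python's "mac_roman" codec and Unicode
lowercasing, so it may disagree with the real table for some bytes.

```python
def fold_mac_roman(name: bytes) -> str:
    """Approximate case folding: decode as MacRoman, then lowercase.

    Assumption: Unicode lowercasing is close enough to the MacRoman
    upper/lower case pairs for illustration. The real comparison in the
    Darwin sources is table driven and may differ.
    """
    return name.decode("mac_roman").lower()


def names_equal_ignoring_case(a: bytes, b: bytes) -> bool:
    """Equality only; this sketch does not reproduce catalog ordering."""
    return fold_mac_roman(a) == fold_mac_roman(b)


same_ascii = names_equal_ignoring_case(b"README", b"ReadMe")
# 0x83 and 0x8E are upper and lower case E with acute in MacRoman.
same_accented = names_equal_ignoring_case(b"\x83", b"\x8e")
different = names_equal_ignoring_case(b"ReadMe", b"ReadMe2")
```

For anything that has to agree with a real catalog B-tree, use the
comparison from the Darwin sources rather than this approximation.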
> Again, for writing into a HFS volume for creating a CD image can we
> go for UNICODE .
I would advise against that. While you can store just about any byte
sequence (as long as it doesn't contain an ASCII colon), storing Unicode
(e.g., UTF-8 or UTF-16) would make for garbage-looking filenames when
viewed on a Macintosh.
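A quick way to see the problem, sketched in Python under the assumption
that the Macintosh interprets the name as MacRoman (the function name is
made up for illustration):

```python
def as_shown_in_mac_roman(stored: bytes) -> str:
    """Decode stored name bytes the way a MacRoman system would."""
    return stored.decode("mac_roman")


name = "café"
utf8_bytes = name.encode("utf-8")      # 5 bytes: 63 61 66 c3 a9
mac_bytes = name.encode("mac_roman")   # 4 bytes: 63 61 66 8e

# The two UTF-8 bytes for the accented letter show up as two unrelated
# MacRoman characters, so the name looks garbled.
garbled = as_shown_in_mac_roman(utf8_bytes)
# Converting to MacRoman before writing round-trips cleanly.
clean = as_shown_in_mac_roman(mac_bytes)
```

Note that `name.encode("mac_roman")` raises `UnicodeEncodeError` for
characters MacRoman cannot represent, so a CD imaging tool would need
some policy for those names.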
> The HFS volumes contain data in "Big Endian " format.
> Can anybody tell me what are the fields which has to be
> filled in Big Endian format.
Everything is big endian. That even includes file names. So, Macintosh
encodings that use two bytes per character will store those two bytes in
big endian form on HFS. And on HFS Plus, the two bytes of each UTF-16
code unit are stored in big endian form.
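If you are generating the image on a little endian machine, the safest
approach is to serialize every field explicitly. A minimal Python sketch
(the helper names are hypothetical, and the values are arbitrary rather
than real on-disk fields):

```python
import struct


def pack_u16_be(value: int) -> bytes:
    """16-bit unsigned integer, most significant byte first."""
    return struct.pack(">H", value)


def pack_u32_be(value: int) -> bytes:
    """32-bit unsigned integer, most significant byte first."""
    return struct.pack(">I", value)


def utf16_be_units(name: str) -> bytes:
    """UTF-16 code units in big endian order, with no byte order mark."""
    return name.encode("utf-16-be")


two_bytes = pack_u16_be(0x1234)       # b"\x12\x34"
four_bytes = pack_u32_be(1)           # b"\x00\x00\x00\x01"
name_units = utf16_be_units("Ab")     # b"\x00A\x00b"
```

Using the ">" prefix in `struct` (or an explicit "-be" codec) keeps the
output the same regardless of the byte order of the machine building the
image.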