[hfs-user] Difference in Data types??

Mark Day mday@apple.com
Fri, 29 Mar 2002 09:07:55 -0800


On Friday, March 29, 2002, at 03:40 AM, Biswaroop (External) wrote:

>   Well, in the MDB structure for an HFS volume, the
>   field
>   vol.drXTClpSiz /* clump size for extents overflow file */
>   is 4 bytes long.
>   Again, in the Catalog Data Record structure, the member
>   filClpSize; /* file clump size */
>   takes 2 bytes.
>
>   Therefore, when I assign the value of the first variable to
>   the second, I lose information.

I'm not sure why you're copying from one to the other.  drXTClpSiz is 
the clump size for the extents B-tree only.  Since the B-tree is used 
in a very different way from typical user files, I don't see a reason 
to set an ordinary file's clump size to match one of the B-trees'.
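
To make the mismatch concrete: a 32-bit clump value simply does not 
survive assignment to a 16-bit field.  A quick illustration (untested, 
using standard C integer types in place of the classic UInt32/UInt16):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t drXTClpSiz = 73728;                /* 4-byte clump size  */
        uint16_t filClpSize = (uint16_t)drXTClpSiz; /* silently truncates */

        /* prints "73728 -> 8192": the high-order bits are gone */
        printf("%u -> %u\n",
               (unsigned)drXTClpSiz, (unsigned)filClpSize);
        return 0;
    }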

I believe Apple's code sets the clump size in a catalog record to zero; 
I think you can do the same.  It turns out that having different clump 
sizes for different files wasn't very useful.  If an application really 
wanted to make sure that a file was allocated in large contiguous 
pieces, it was generally better to try to pre-allocate it in one giant 
contiguous piece (or when allocating additional space, make the entire 
allocation contiguous).  At runtime, Apple's code just uses a 
volume-wide default for ordinary files (i.e. ones with a catalog record).
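
In other words, something like this is all you need when building the 
catalog record (a sketch, not Apple's actual code; the struct here is 
a stand-in for the real file record with every other field omitted):

    #include <stdint.h>

    /* Stand-in for the real catalog file record; only the field
       under discussion is shown. */
    struct CatFileRecStub {
        uint16_t filClpSize;   /* file clump size, 2 bytes */
    };

    void set_clump(struct CatFileRecStub *rec)
    {
        /* 0 means "no per-file clump size"; the volume-wide
           default gets used at runtime instead. */
        rec->filClpSize = 0;
    }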

>   Please, is there any simple formula to find out the
>   extents file size and the catalog file size for a volume
>   when we know beforehand how many files have to be
>   in that volume?
>   For example, if I know I have to write "X" files contained
>   in "Y" directories, can I then calculate what the volume's
>   clump size for the extents overflow file and the catalog
>   file should be?

There is certainly no simple formula for the catalog B-tree.  That is 
partly because the size of the catalog depends on the lengths of the 
file and directory names (even more so on HFS Plus, where the keys in 
index nodes are variable length).  And for volumes that are modified 
over time, the order of operations will affect the size of the B-tree 
in complex ways.  I'm sure you could come up with a statistical guess 
based on average name lengths, average density of nodes (i.e. how 
"full" they are), etc.
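
If you do want a ballpark figure, the arithmetic is straightforward 
once you pick your averages.  The fixed numbers below are real HFS 
constants (512-byte nodes, a 14-byte node descriptor, a 2-byte offset 
per record); every other number is a guess you would tune to your own 
data:

    #include <stdio.h>

    int main(void)
    {
        unsigned long files = 10000, dirs = 500;
        unsigned long items = files + dirs;  /* each item gets a data
                                                record and a thread
                                                record */
        double avg_data_rec   = 90.0;  /* key + file/dir record, guess */
        double avg_thread_rec = 50.0;  /* key + thread record, guess   */
        double node_size      = 512.0; /* HFS B-tree node size         */
        double fullness       = 0.7;   /* average node "density"       */

        double per_item = avg_data_rec + avg_thread_rec + 2.0 * 2;
        double usable   = (node_size - 14.0) * fullness;
        unsigned long leaves =
            (unsigned long)(items * per_item / usable) + 1;

        printf("~%lu leaf nodes, ~%lu KB of catalog leaves\n",
               leaves, leaves * 512UL / 1024UL);
        return 0;
    }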

Your particular case of creating a CD is actually a much simpler 
problem, and you can compute an exact answer if you want.  Since the 
files won't be modified over time, you can guarantee that they will 
not be fragmented.  That means you can get by with a minimal extents 
B-tree containing no leaf records, which in turn means a single 
allocation block (for the header node; the other nodes are unused and 
should be filled with zeroes).
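
So the extents file size for such a volume is just one header node 
rounded up to a whole allocation block.  Something like (a sketch; 
alloc_block_size is whatever your volume uses):

    /* Minimal extents overflow file for a never-fragmented volume:
       one 512-byte header node, no leaf records, rounded up to a
       whole allocation block.  The unused remainder should be
       zero-filled. */
    unsigned long extents_file_size(unsigned long alloc_block_size)
    {
        unsigned long node_size = 512;  /* HFS B-tree node size */
        return ((node_size + alloc_block_size - 1) / alloc_block_size)
               * alloc_block_size;
    }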

Since you know the complete set of files and directories in advance, you 
can build an optimal tree by packing as many leaf records in a node as 
possible, and then moving to the next node.  All it requires is 
knowing the order in which you will assign directory IDs to 
directories, and being able to sort the file and directory names for 
the items in a single 
directory.  That way you can predict the entire leaf sequence.  Once you 
know the number of leaf nodes, you can calculate the number of index 
nodes that will be parents of the leaf nodes, and so on up the tree 
until you get to a level containing exactly one node (the root).  This 
should be relatively easy for HFS because the records in index nodes are 
constant size, so the calculation for each level should just be a simple 
divide and round up.  For HFS Plus, you would have to keep track of 
the actual file or directory names, since the lengths of the keys in 
index nodes vary with the name lengths.
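
The level-by-level arithmetic looks something like this (a sketch for 
HFS, where index records are constant size; fan_out is how many index 
records fit in one 512-byte node):

    /* Total node count for a B-tree with a known number of leaf
       nodes: each level of index nodes is a divide-and-round-up
       over the level below, until a single (root) node remains. */
    unsigned long total_btree_nodes(unsigned long leaf_nodes,
                                    unsigned long fan_out)
    {
        unsigned long total = leaf_nodes;
        unsigned long level = leaf_nodes;

        while (level > 1) {
            level = (level + fan_out - 1) / fan_out; /* parent level */
            total += level;
        }
        return total + 1;   /* plus the header node */
    }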

If that's too complicated, you could always fall back to assuming a 
constant size (maximum or average) for all of the records.  Don't forget 
that for thread records, the key is of fixed size but the data is 
variable (since it contains a variable-length string).
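
As a sketch of that fallback (the 512 and 14 are real HFS constants; 
max_rec_bytes is whatever worst-case or average key + data size you 
settle on, remembering the variable thread data):

    /* Leaf-node count assuming every record takes a constant size.
       Each record also costs a 2-byte offset entry in the node. */
    unsigned long leaf_nodes_needed(unsigned long records,
                                    unsigned long max_rec_bytes)
    {
        unsigned long node_size = 512;   /* HFS node size   */
        unsigned long node_desc = 14;    /* node descriptor */
        unsigned long per_rec   = max_rec_bytes + 2;
        unsigned long per_node  = (node_size - node_desc) / per_rec;

        return (records + per_node - 1) / per_node;  /* round up */
    }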

-Mark