[mad-dev] libid3tag demystified

Rob Leslie rob@mars.org
Fri, 25 Jan 2002 23:39:14 -0800


The following is an attempt to provide a quick-start for using libid3tag to
read ID3 tags.

(Sorry this is long -- using the library *is* fairly simple, but the
background will probably help. See madplay.c for a concrete usage example.)

The libid3tag file API will read all tags (v1 and v2) from a file and give
access to a single tag structure which you can treat as though it were v2.4
regardless of the actual tag version(s) in the file. If the file has both v1
and v2 tags, the v2 tag will have precedence.

  struct id3_file *id3_file_open(char const *path, enum id3_file_mode mode);
  struct id3_file *id3_file_fdopen(int fd, enum id3_file_mode mode);

Pass either a pathname or a file descriptor for an already-open file. If you
pass a file descriptor and the open succeeds, the descriptor becomes "owned"
by libid3tag and you should only close it with id3_file_close(). `mode' will
usually be ID3_FILE_MODE_READONLY but eventually ID3_FILE_MODE_READWRITE will
also be supported.

  int id3_file_close(struct id3_file *file);

Use this to close a file handle opened with either of the above routines. It
also frees all the internal tag structures, so any tag pointer you obtained
from the file becomes invalid once it returns.

  struct id3_tag *id3_file_tag(struct id3_file const *file);

Use this to obtain a tag structure for the file which you can further
query/manipulate. Don't delete the tag; use id3_file_close() instead.

If you want to manipulate v1 and v2 tags independently, you shouldn't use the
file API -- you'll have to read the tags yourself. The following routines will
help:

  signed long id3_tag_query(id3_byte_t const *data, id3_length_t length);

Pass a block of memory and this routine will tell you if it begins with an ID3
tag -- and if so, its length. For example, if you read the last 128 bytes from
a file containing a v1 tag and pass it to this routine, it should return 128.
You must pass at least ID3_TAG_QUERYSIZE (10) bytes. To read a v2 tag, read at
least 10 bytes from the beginning of the file and pass them to this routine to
determine the full tag length. Then read the full tag and parse using the next
routine below. If id3_tag_query() returns a negative number, it means the
block you passed begins with an ID3v2.4 footer, and the beginning of the tag
is located at this (negative) offset relative to the start of the block. This
is useful for reading v2 tags from the end of a file. The value 0 is returned
when the block you pass doesn't begin with an ID3 tag header or tag footer.
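
To make the return-value convention concrete, here is a minimal standalone
sketch of the arithmetic behind such a query, based on the ID3v1 and ID3v2.4
header layouts. It is not libid3tag's own code: sketch_tag_query() and
syncsafe_decode() are hypothetical names, and the validity checks are reduced
to the bare minimum.

```c
#include <stddef.h>
#include <string.h>

/* Decode a 28-bit "syncsafe" integer: four bytes, 7 significant bits each. */
unsigned long syncsafe_decode(unsigned char const *p)
{
  return ((unsigned long) (p[0] & 0x7f) << 21) |
         ((unsigned long) (p[1] & 0x7f) << 14) |
         ((unsigned long) (p[2] & 0x7f) <<  7) |
          (unsigned long) (p[3] & 0x7f);
}

/*
 * Hypothetical stand-in that mirrors the return convention described above:
 *   > 0  total length of the tag that begins at data[0]
 *   < 0  data[0] begins a v2.4 footer; the tag starts at this offset
 *     0  no tag header or footer recognized
 */
signed long sketch_tag_query(unsigned char const *data, size_t length)
{
  unsigned long size;

  if (length >= 3 && memcmp(data, "TAG", 3) == 0)
    return 128;  /* ID3v1 tags are always 128 bytes */

  if (length < 10)
    return 0;

  /* version bytes are never 0xff; size bytes never have the top bit set */
  if (data[3] == 0xff || data[4] == 0xff ||
      ((data[6] | data[7] | data[8] | data[9]) & 0x80) != 0)
    return 0;

  size = syncsafe_decode(&data[6]);

  if (memcmp(data, "ID3", 3) == 0) {
    if (data[5] & 0x10)  /* footer-present flag: 10 more bytes at the end */
      size += 10;
    return (signed long) (10 + size);
  }

  if (memcmp(data, "3DI", 3) == 0)  /* footer identifier */
    return -(signed long) size - 10;

  return 0;
}
```

For instance, a v2 header whose four size bytes are 00 00 02 01 describes a
257-byte tag body, so the sketch returns 267 for it, or 277 when the footer
flag is set.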

  struct id3_tag *id3_tag_parse(id3_byte_t const *data, id3_length_t length);

This is the heart of libid3tag: pass a block of memory containing a COMPLETE
tag and this routine creates and returns a tag structure for easy access. It
parses ID3v1, ID3v1.1, ID3v2.2, ID3v2.3, and ID3v2.4 tags, but the structure
returned should always be treated as an ID3v2.4 tag, i.e. all the frames and
fields use ID3v2.4 semantics.

  void id3_tag_delete(struct id3_tag *tag);

Use this to delete a tag structure when you're done with it. If you're using
the file API, use id3_file_close() instead.

To access the data in a tag, you retrieve ID3 "frames" that contain "fields".
A reference for all the ID3v2.4 frames is useful:

  http://www.id3.org/id3v2.4.0-frames.txt

Each frame is identified by four characters. The following routine can either
be used to fetch a single frame or to walk through a number of matching
frames:

  struct id3_frame *id3_tag_findframe(struct id3_tag const *tag,
                                      char const *frameid, unsigned int index);

Use an `index' of 0 for the first matching frame. Increment as desired; a null
pointer is returned if the requested frame doesn't exist. `frameid' can be the
full frame identifier like "TCON", or a prefix like "T" for all text frames,
or "" or a null pointer to walk through all frames.
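
The matching rule can be modelled as a plain prefix comparison. The following
standalone sketch walks an array of frame ID strings the way the `index'
argument walks matching frames. find_matching_id() and the sample ID list are
made up for illustration; they are not part of libid3tag.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index'th ID in ids[] that starts with `prefix', or a null
 * pointer if there are not that many matches. An empty or null prefix
 * matches every ID. (Hypothetical helper, not a libid3tag routine.)
 */
char const *find_matching_id(char const *const *ids, size_t count,
                             char const *prefix, unsigned int index)
{
  size_t len = prefix ? strlen(prefix) : 0;
  size_t i;

  for (i = 0; i < count; ++i) {
    /* count down through the matches until the requested one is reached */
    if ((len == 0 || strncmp(ids[i], prefix, len) == 0) && index-- == 0)
      return ids[i];
  }

  return 0;
}
```

With the IDs "TIT2", "COMM", "TPE1", "TCON" in that order, the prefix "T" with
an `index' of 0, 1, and 2 yields "TIT2", "TPE1", and "TCON"; an `index' of 3
yields a null pointer.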

The following convenience macros give the frame IDs that correspond to the
ID3v1 fields:

  ID3_FRAME_TITLE	"TIT2"
  ID3_FRAME_ARTIST	"TPE1"
  ID3_FRAME_ALBUM	"TALB"
  ID3_FRAME_YEAR	"TDRC"
  ID3_FRAME_TRACK	"TRCK"
  ID3_FRAME_GENRE	"TCON"
  ID3_FRAME_COMMENT	"COMM"

Once you have the frame, you can query its fields.

  struct id3_frame {
    char id[5];			/* frame ID */
    char const *description;	/* English frame description */
    unsigned int nfields;	/* number of contained fields */
    union id3_field *fields;	/* array of fields */
  };

Each field has a type. The following field types exist:

  enum id3_field_type {
    ID3_FIELD_TYPE_TEXTENCODING,	/* small int; used internally */
    ID3_FIELD_TYPE_LATIN1,		/* single-line Latin-1 string */
    ID3_FIELD_TYPE_LATIN1FULL,		/* Latin-1 plus newlines allowed */
    ID3_FIELD_TYPE_LATIN1LIST,		/* Latin-1 string list */
    ID3_FIELD_TYPE_STRING,		/* single-line Unicode string */
    ID3_FIELD_TYPE_STRINGFULL,		/* Unicode plus newlines allowed */
    ID3_FIELD_TYPE_STRINGLIST,		/* Unicode string list */
    ID3_FIELD_TYPE_LANGUAGE,		/* 3-char language ID */
    ID3_FIELD_TYPE_FRAMEID,		/* 4-char frame ID */
    ID3_FIELD_TYPE_DATE,		/* 8-char date field */
    ID3_FIELD_TYPE_INT8,		/* 1-byte integer */
    ID3_FIELD_TYPE_INT16,		/* 2-byte integer */
    ID3_FIELD_TYPE_INT24,		/* 3-byte integer */
    ID3_FIELD_TYPE_INT32,		/* 4-byte integer */
    ID3_FIELD_TYPE_INT32PLUS,		/* variable-length integer */
    ID3_FIELD_TYPE_BINARYDATA		/* raw binary data */
  };

The types for all fields in a frame are fixed; see libid3tag/frametype.gperf
for the field types of each frame. For example:

  FIELDS(UFID) = {
    ID3_FIELD_TYPE_LATIN1,
    ID3_FIELD_TYPE_BINARYDATA
  };

So, a frame with ID "UFID" has two fields; the type of frame->fields[0] is
ID3_FIELD_TYPE_LATIN1, and the type of frame->fields[1] is
ID3_FIELD_TYPE_BINARYDATA. Use the ID3v2.4 frame reference document to make
sense of the actual contents.

All text frames (any frame ID beginning with "T") look like this:

  FIELDS(text) = {
    ID3_FIELD_TYPE_TEXTENCODING,
    ID3_FIELD_TYPE_STRINGLIST
  };

You may ignore frame->fields[0] in this case; it just identifies the encoding
used in the original raw tag for the Unicode strings which follow. All Unicode
strings have a standard representation in libid3tag so the original encoding
is unimportant.

There are two basic string types, Latin-1 (aka ISO-8859-1) and Unicode (aka
UCS-4). Latin-1 strings are represented by normal C strings. Unicode strings
have a special representation (an array of large integers) and so can't be
manipulated as normal C strings.

The basic character/string types are:

  typedef unsigned char id3_latin1_t;	/* Latin-1 character */
  typedef unsigned long id3_ucs4_t;	/* Unicode character */

Therefore id3_latin1_t strings can be handed to the C string functions (with
a cast to char *, since the element type is unsigned char), but id3_ucs4_t
strings cannot.
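
Because id3_ucs4_t strings are not char arrays, even basic operations need
their own loops. The sketch below assumes the strings are terminated by a zero
element (this post does not say so explicitly) and redeclares the typedef
locally so that it compiles without the library. ucs4_len() and
ucs4_equals_ascii() are hypothetical helpers, not libid3tag routines.

```c
#include <stddef.h>

typedef unsigned long id3_ucs4_t;  /* same definition as quoted above */

/* Count the characters before the terminating zero element. */
size_t ucs4_len(id3_ucs4_t const *s)
{
  size_t n = 0;

  while (s[n])
    ++n;

  return n;
}

/* Compare a UCS-4 string with a plain ASCII C string; returns 1 if equal. */
int ucs4_equals_ascii(id3_ucs4_t const *s, char const *ascii)
{
  while (*s && *ascii) {
    if (*s != (id3_ucs4_t) (unsigned char) *ascii)
      return 0;
    ++s;
    ++ascii;
  }

  /* equal only if both strings ended at the same position */
  return *s == 0 && *ascii == 0;
}
```

A test such as ucs4_equals_ascii(str, "Rock") is then the rough equivalent of
strcmp() for strings that happen to contain only ASCII characters.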

You can use a number of encoding routines to transform Unicode strings into
something perhaps more useful:

  id3_latin1_t *id3_ucs4_latin1duplicate(id3_ucs4_t const *ucs4);
  id3_utf16_t *id3_ucs4_utf16duplicate(id3_ucs4_t const *ucs4);
  id3_utf8_t *id3_ucs4_utf8duplicate(id3_ucs4_t const *ucs4);

Each of the above allocates new memory and so must be free()'d when you are
finished. The encoding formats are Latin-1 (which will lose all non-Latin-1
characters), UTF-16, and UTF-8. If you're familiar with Unicode encodings, the
latter two are pretty much what you think they are. The UTF-16 encoding is
based on short ints, so the endianness depends on your platform.
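
For readers curious what such a conversion involves, here is a minimal
standalone sketch of UCS-4 to UTF-8 encoding. It is not the library's
implementation: ucs4_to_utf8() is a hypothetical name, the input is assumed to
be zero-terminated, surrogate values are not checked, and values above
U+10FFFF are simply skipped. Like the routines above, it returns newly
allocated memory that the caller must free().

```c
#include <stdlib.h>

typedef unsigned long id3_ucs4_t;  /* same definition as quoted above */

/* Return a newly allocated, NUL-terminated UTF-8 copy, or 0 on failure. */
unsigned char *ucs4_to_utf8(id3_ucs4_t const *ucs4)
{
  size_t n = 0, i;
  unsigned char *out, *p;

  while (ucs4[n])
    ++n;

  out = malloc(n * 4 + 1);  /* at most 4 bytes per character, plus NUL */
  if (out == 0)
    return 0;

  p = out;
  for (i = 0; i < n; ++i) {
    id3_ucs4_t c = ucs4[i];

    if (c < 0x80) {
      *p++ = (unsigned char) c;
    } else if (c < 0x800) {
      *p++ = (unsigned char) (0xc0 | (c >> 6));
      *p++ = (unsigned char) (0x80 | (c & 0x3f));
    } else if (c < 0x10000) {
      *p++ = (unsigned char) (0xe0 | (c >> 12));
      *p++ = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      *p++ = (unsigned char) (0x80 | (c & 0x3f));
    } else if (c < 0x110000) {
      *p++ = (unsigned char) (0xf0 | (c >> 18));
      *p++ = (unsigned char) (0x80 | ((c >> 12) & 0x3f));
      *p++ = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      *p++ = (unsigned char) (0x80 | (c & 0x3f));
    }
    /* anything larger is outside Unicode and is skipped in this sketch */
  }
  *p = 0;

  return out;
}
```

The worst-case allocation of four bytes per character keeps the sketch to a
single pass over the output; a tighter version would compute the exact length
first.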

OK, back to field types. To access a field, you must use the access function
for the field's type. These are:

  signed long id3_field_getint(union id3_field const *field);

This accesses an ID3_FIELD_TYPE_INT{8,16,24,32} field.

  id3_ucs4_t const *id3_field_getstring(union id3_field const *field);

This accesses an ID3_FIELD_TYPE_STRING field.

  id3_ucs4_t const *id3_field_getfullstring(union id3_field const *field);

This accesses an ID3_FIELD_TYPE_STRINGFULL field.

  unsigned int id3_field_getnstrings(union id3_field const *field);

This returns the number of strings in an ID3_FIELD_TYPE_STRINGLIST field.

  id3_ucs4_t const *id3_field_getstrings(union id3_field const *field,
					 unsigned int index);

This returns a (0-based) indexed string from an ID3_FIELD_TYPE_STRINGLIST
field.

  char const *id3_field_getframeid(union id3_field const *field);

This returns a frame ID from an ID3_FIELD_TYPE_FRAMEID field.

  id3_byte_t const *id3_field_getbinarydata(union id3_field const *field,
					    id3_length_t *length);

This returns a block of raw binary data from an ID3_FIELD_TYPE_BINARYDATA
field. In addition, the length of the block is placed in `*length'.

There are perhaps some accessor functions missing. If so, let me know. :-)

I've not covered any of the routines for creating and rendering tags, adding
or removing frames, or changing field contents. That's probably best saved for
another post.

Cheers,

-- 
Rob Leslie
rob@mars.org