Mac OS X Metadata Proposal

The proposal itself appears below.


Many developers and users in the Mac community are concerned about the
direction of file system metadata in Mac OS X.  I wrote about the topic
in the article, "Metadata, The Mac, and You"

    http://arstechnica.com/reviews/01q3/metadata/metadata-1.html

and revisited it in a section of my Mac OS X 10.1 review:

    http://arstechnica.com/reviews/01q4/macosx-10.1/macosx-10.1-11.html

The purpose of this proposal is to condense the philosophy and
proposed changes found in those articles, augmented by the input of
the larger community, and submit it formally to Apple as a bug report
(Radar 2826368).

---

PHILOSOPHY

It's important to have a good picture of where one wants to be in the
future.  Without a clear goal, changes may seem arbitrary, and it may
be difficult to weigh the value of one change versus another.  Part of
the concern about the direction of file system metadata in Mac OS X is
due to the uncertainty about Apple's long-term goals.  To put the
changes that will be proposed in this document into perspective, it's
important to clearly identify the proposed "goal state" of file system
metadata in Mac OS X, and the philosophy behind it.

We believe that Mac OS X should achieve a superior user experience
through its native support for a complement of metadata that surpasses
that of other platforms.  Supporting a superset of the file system
metadata found elsewhere means that Mac OS X files can be pared down
to suit the limitations of file system metadata on other platforms
when this is necessary or desirable.  Similarly, transformations from
a more limited set of metadata to Mac OS X's richer environment should
be possible when that is desirable.

Even more broadly, the overriding philosophy is to maintain, and even
extend, the user experience advantages inherent in doing things not
just differently, but better than other platforms--without sacrificing
cross-platform interoperability.

It is our position that achieving transparent cross-platform
compatibility must not, and does not have to, come at the cost of the
superior user experience that has defined the Macintosh.  People
choose the Macintosh not just because it is different, but because it
is better.  While removing barriers to interoperability is desirable,
doing so must not take precedence over retaining the advantages of the
Mac.  The natural extension of the willingness to sacrifice usability
for compatibility is to simply use a PC.

BASIC TENETS

There are many problems with Mac OS X's file system metadata policies
and mechanisms as of Mac OS X 10.1.1.  The most important are outlined
below by way of the historical advantages of the Mac platform that
they negate, and the basic tenets of the philosophy described above
that they violate.

1. USERS MUST HAVE FULL OWNERSHIP OF THE FILE NAME.

A file's name should be entirely in the user's domain.  (Note well:
this means the *actual* file name, not the "displayed" file name.)
Changing a file's name should never have adverse effects on its usage
on a Mac OS X system.  This is the interface that Mac users have lived
with and enjoyed for over a decade.  It is an advantage of the Mac
that is so widely recognized that Apple itself has based ad campaigns
on it in the past.

Mac OS X 10.1 eliminates this advantage by removing the user's control
over file names through its user interface, and through Apple's
official human interface recommendations to developers stating that
all files created by Mac OS X applications should include file name
extensions.  Furthermore, OS X 10.1 obfuscates the previously direct
and simple interface to file naming and file name display.

For this grievance to be successfully addressed, it must be possible
for Mac OS X users to exercise full control of their file names using
the simple and direct interface that they have enjoyed in classic Mac
OS. There should be no functional or interface "penalty" for choosing
to operate this way in Mac OS X.

2. THE COMPUTER MUST SERVE THE USER WHEN IT COMES TO CROSS-PLATFORM 
   COMPATIBILITY, NOT THE OTHER WAY AROUND.

When a file from another platform is encountered, Mac OS X should help
the user by identifying the file using any information it has
available.  It should be possible to bring the file up to Mac OS X
metadata standards by "upgrading" its metadata in such a way that it
fits into the normal flow of file usage on the Mac.

Similarly, Mac OS X should be able to identify files that are not
prepared for a successful existence on other platforms and offer to
"down-grade" their metadata and/or convert their metadata
representation in such a way that they will be successfully handled on
other platforms.

The mechanisms for metadata representation transformation should be
provided by the OS, but the policies should be user-configurable.

Mac OS X 10.1 does little of the above.  It merely provides vague
warnings during certain file renaming operations and hides and/or
restricts file name editing.  Apple further recommends that all Mac OS
X applications save files with the lowest common denominator in mind.

In Mac OS X 10.1, the user must serve the computer by tolerating the
mysterious file names and file naming interface he is given, avoiding
too much file name manipulation, and by knowing or guessing how to
manually convert files to and from various metadata representations
when dealing with cross-platform file transfers.

3. THE ROBUSTNESS OF FILE IDENTIFICATION AND ASSOCIATION IS MORE
   IMPORTANT THAN PERFORMANCE MINUTIAE WHEN IT SERVES TO PROVIDE A
   SUPERIOR USER INTERFACE.

While simple "paths" may have better performance than more abstracted
file identification and tracking mechanisms, the user interface
provided by the abstract mechanisms is worth the trade-off.  The fact
that more primitive operating systems do not "natively" support these
abstractions is not an adequate reason to deprecate or abandon them.

Moreover, any performance penalties and historic limitations inherent
in such file access abstractions should be eliminated, not by
eliminating or decreasing the abstraction, but by creating new, more
advanced implementations of these abstractions in Mac OS X.

PROPOSED SOLUTIONS

There are many milestones on the road to the goals outlined at the
start of this document.  To achieve them all requires a substantial
technological investment: new file systems, new APIs, new metadata
standards, etc.  Such a big change is not feasible in the short-term.
But an awareness of the long-term goals is necessary to see past the
details of the proposed short-term changes, and to keep any subsequent
changes on track.

The proposals below are split into several "phases", ordered from
short-term to long-term.  The baseline for these changes is Mac OS X
10.1.1.  Any Mac OS X 10.1.1 feature or policy that is not explicitly
or implicitly negated by a change listed below is understood to be
retained.  (Please note that this includes the deprecation of Mac
resource forks.)

PHASE 1: EMERGENCY REPAIR

The purpose of this phase is to quickly address the most serious
problems and to restore faith in Apple's direction regarding file
system metadata in Mac OS X.

The following changes are proposed.  (Note that many of the changes
depend on each other and may be harmful if implemented independently.)

---

* Officially recommend that all Mac OS X applications write HFS/HFS+
type and creator codes when creating a file.

This change is necessary in order to maintain good backwards
compatibility with classic Mac OS systems and applications, and to
prevent the disappearance of one or more pieces of metadata from files
created by Mac OS X applications, which would further restrict the
number of possible policies that reference file system metadata (e.g.
application binding or icon display) in the future.

* Provide preferences at all levels (system-wide, per-app/per-user,
and per-task) that determine when file type information is redundantly
encoded in the file name (i.e. when file name extensions are appended
to files).

The system-wide preference could be set by an administrator.  It would
be the default for new users on the system.  User-level preferences
(set in System Preferences) would determine the default setting for
all applications run by that user.  These settings could be overridden
in applications' preferences dialogs.  Finally, the default open/save
dialogs and other mechanisms through which files are created should
include an interface for overriding any of the above settings on a
one-time, per-task basis.

* Make the application binding policy user-configurable.  Include
preset configurations that correspond to the traditional Mac OS and
Windows policies, and allow custom policies to be created by the user
that reference (or ignore) metadata of the user's choosing when
determining which application a file will open in and which icon will
be displayed for that file in the Finder.

The classic Mac OS application binding policy gives precedence to the
creator code (see Mac OS 9 for an example).  The traditional Windows
application binding policy is based on the file type (as encoded in
the file name).  The Mac OS X 10.1.1 application binding policy is
outlined in the Mac OS X System Overview document.  Those three
presets should cover most users' preferences, but further
configuration should be possible to ensure that every user can create
a work environment that suits his or her needs.

* Change any Apple applications or system services that currently
unnecessarily depend on the presence of file name extensions to also
understand file type metadata stored in locations other than the file
name.

* Allow the entire file name, including any extension, to be visible
and editable at all times without any warnings.  See Mac OS 9 for an
example of how this should work.

* Do not deprecate classic Mac OS file identification and tracking
abstractions until there is an adequate replacement that duplicates
and/or surpasses their abilities.

* Maintain HFS+ as the preferred volume format for Mac OS X.  It
provides the best compatibility with classic Mac OS, and can natively
support the most metadata.

---
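
The layered preferences proposed above (system-wide, per-user,
per-application, and per-task) amount to a simple precedence rule: the
most specific level that has an opinion wins.  What follows is only a
rough sketch of that rule in Python, not a description of any real
Apple API.  The scope names, the function name, and the built-in
default are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Scopes from least to most specific, in the order the proposal lists them.
SCOPES = ("system", "user", "application", "task")


def resolve_append_extension(settings: Mapping[str, Optional[bool]],
                             builtin_default: bool = True) -> bool:
    """Return the effective "append a file name extension" setting.

    `settings` maps a scope name to True, False, or None (not set).
    The most specific scope that has a value overrides the others.
    """
    effective = builtin_default
    for scope in SCOPES:
        value = settings.get(scope)
        if value is not None:
            effective = value
    return effective


# The administrator appends extensions by default and this user has
# turned them off.
print(resolve_append_extension({"system": True, "user": False}))
# prints False

# The same user asks for an extension on one particular save.
print(resolve_append_extension(
    {"system": True, "user": False, "task": True}))
# prints True
```

Under this arrangement the shipping defaults can remain exactly what
they are in 10.1.1; only the levels above the default become
adjustable.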

Note that many of the changes proposed above (and below) specify
*abilities*, not necessarily *default behaviors.*  These changes do
not preclude future versions of Mac OS X shipping with default
behaviors that closely correspond to Mac OS X 10.1.1 as it exists
today: file name extensions could still be appended by default;
"smart" file name extension hiding could still be enabled by default;
Finder warnings about file name extensions could still exist by
default; application binding could still follow the 10.1.1 policy by
default; and so on.

Retaining all of those defaults is not necessarily our recommendation,
but rather is meant to highlight how little impact the changes could
have on the interoperability benefits created by OS X to date, while
simultaneously enabling users to take back control of their systems,
if they so choose.

But enabling Mac users to set a handful of preferences and restore the
working environment that they are used to should not be seen as a
"temporary concession" towards "the old ways."  Instead, it should be
seen as a good-faith gesture towards Apple's core customers meant to
at least maintain the user experience quality achieved in classic Mac
OS, while buying Apple time to come up with something even better.

During this transitional period, the policy defaults are not that
important (again, they could conceivably be almost identical to those
in Mac OS X 10.1.1), *provided* the long-term goal (outlined in the
"philosophy" section above) is clearly articulated to developers and
users.

PHASE 2: ENHANCEMENTS

The "enhancements" phase builds on the "emergency repairs" phase, but
it still does not require any radical rethinking or the creation of
new standards.  It requires substantial new code and user interface
enhancements, but all the changes are based on known technologies and
standards like HFS+, MIME, type/creator codes, etc.

---

* Add a "metadata services" framework to the OS for converting to and
from various metadata representations (e.g. MIME, type/creator, file
name extensions).  This includes appending the appropriate file name
extensions based on other file type metadata (or even the file
contents), and removing file name extensions and setting other
metadata as appropriate.  These APIs would reference a customizable,
per-user metadata representation mapping table much like an expanded
version of Mac OS 9's "File Mapping" table in the Internet Control
Panel.

* Enhance the standard open/save dialog boxes to leverage the new
metadata services framework by providing both system-wide and
per-application preferences for each user to control which metadata
representations are used when files are saved.  (Note that this is an
extension of one of the "emergency repair" changes, refactored and
enhanced in terms of the new metadata services framework.)

* Add menu commands and context menus to the Finder (and any other
appropriate applications or services) that use the new metadata
services framework to allow the user to convert selected files (or
folders full of files, or entire volumes full of files) to and from
different metadata representations.

* Provide a simple interface to pieces of metadata that may, in rare
circumstances, have to be manually corrected by the user.  (This
interface should be built on the metadata services framework, of
course.)  At no time should the "raw" metadata values be exposed to
the user in this interface.

An example of such an interface would allow access to file type
metadata in an "Advanced" tab of the "Get Info" panel in the Finder. 
The file type would be chosen and displayed using human-readable text
such as "Microsoft Word Document."  The raw format of the file type
metadata (e.g. a 32-bit HFS/HFS+ type code) should never be seen by
the user.

* Officially recommend that application developers leverage the
metadata framework as they feel is appropriate for their application. 
Examples:

A compression program could (optionally, or by default) warn users
that files with a particular metadata representation (e.g. file type
stored someplace other than encoded in the file name) compressed in an
archive may not be readable if they are uncompressed on a foreign
platform.

The Finder could (optionally, or by default) warn users when copying
files with a particular metadata representation to a volume whose
metadata abilities are either unknown (or known to be limited), or
when files are transferred via a protocol that does not support Mac OS
X's native set of metadata.

Files that arrive on the system via a web browser or any other
network service or disk could (optionally, or by default) be brought
up to "native" metadata standards by extrapolating and filling in any
missing metadata according to the per-user mapping tables described
above.

* Extend, enhance, or create robust, high-performance APIs for file
identification and tracking.

---
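
To make the idea of a per-user metadata representation mapping table
more concrete, here is a minimal sketch in Python.  It is not the
proposed framework itself.  The class name, the function names, and
the table entries are hypothetical, and the type and creator codes are
included only as illustrative values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MappingEntry:
    extension: str     # file name extension, without the dot
    mime: str          # MIME type
    type_code: str     # four-character HFS/HFS+ type code
    creator_code: str  # four-character HFS/HFS+ creator code


# Illustrative entries only; a real table would be per-user and editable.
MAPPING_TABLE: List[MappingEntry] = [
    MappingEntry("txt", "text/plain", "TEXT", "ttxt"),
    MappingEntry("doc", "application/msword", "W8BN", "MSWD"),
]


def lookup(table: List[MappingEntry], **criteria: str) -> Optional[MappingEntry]:
    """Return the first entry whose fields match every given criterion."""
    for entry in table:
        if all(getattr(entry, key) == value for key, value in criteria.items()):
            return entry
    return None


def type_for_extension(table: List[MappingEntry],
                       extension: str) -> Optional[Tuple[str, str]]:
    """Map a file name extension to a (type, creator) pair, if known."""
    entry = lookup(table, extension=extension.lstrip(".").lower())
    return (entry.type_code, entry.creator_code) if entry else None


def extension_for_type(table: List[MappingEntry],
                       type_code: str) -> Optional[str]:
    """Map a type code back to the extension that encodes it, if known."""
    entry = lookup(table, type_code=type_code)
    return entry.extension if entry else None


print(type_for_extension(MAPPING_TABLE, ".DOC"))
print(extension_for_type(MAPPING_TABLE, "TEXT"))
print(type_for_extension(MAPPING_TABLE, "qux"))
```

Because every representation lives in the same table row, conversion
in either direction is a single look-up, and an unknown value simply
yields no answer rather than a guess.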

The "enhancement" phase brings Mac OS X closer to the goals outlined
at the start of this document.  It does not rest on the laurels of the
"emergency repairs" phase, thinking that they had "satisfied" the
desires of the "old timers."  Instead, it recognizes that those
changes were just the start of a journey towards a new destination
(rather than a side trip on the way towards the lowest common
denominator).

PHASE 3: RADICAL CHANGES

This phase requires substantial work and cooperation with the rest of
the industry.  In order for this phase to succeed, work should be
concurrent with the other phases.  By completing this phase, Apple
will once again have established itself as a leader and an innovator
in the computer industry.

---

* Create a new volume format that supports arbitrarily extensible
metadata and robust data integrity (e.g. journaling), while providing
extremely high performance.

This new volume format will be the foundation for future metadata
initiatives.  It will provide Mac OS X with the ability to truly
handle a superset of all metadata found on foreign systems.  Combined
with the changes in phase 1 and 2, it will enable Mac OS X to be the
"skeleton key" of data formats, able to understand, store, and
translate files created on any other file system or platform.

* Lead, initiate, or participate in the creation of open standards for
file metadata representation.  MIME was a start, but it is limited.
Robust, hierarchical, extensible, standardized, and above all, *open*
standards for file metadata representation are necessary for the
future of computing.  Apple should work with the rest of the industry
on this problem.

FireWire is one example of a successful *open* technology created by
Apple that has found a place in the market and improved the products
we use.  If Apple can do it once, it can do it (even better) again.

This new metadata standard should include both a standard taxonomy for
basic attributes like file type, name, and dates, as well as an
extension mechanism for domain- and vendor-specific attributes.  There
are many similarities to the development of XML and its various
namespace mechanisms, schema standards, domain-specific DTDs, and so
on.  While a new metadata standard does not necessarily have to be
based on XML, the development model XML has followed is a good guide.

* Transition Mac OS X's "native" metadata representation to one based
on the new volume format and new metadata standards.  Deprecate
HFS/HFS+, type/creator codes, and other vestiges of classic Mac OS
(while providing easy translation to and from that representation via
the metadata services framework, of course).

---

ADDITIONAL NOTES

DARWIN/UNIX

All of the above implicitly applies to "Mac OS X applications",
meaning GUI applications based on Carbon, Cocoa, Java, or similar
high-level frameworks that make up "Mac OS."  The Unix side of things
should be addressed differently, with an awareness of (and respect
for) the history, conventions, and customers of that environment.  The
primary strength of the Unix layer is its compatibility with other
Unix-flavored OSes, and this must not be compromised unnecessarily.

Command-line tools that build on the metadata services framework and
other higher-level APIs are the appropriate level of integration for
the Unix layer.  Current examples include the "defaults" program, and
the "SetFile"/"GetFileInfo" commands.  They provide integration for
the Unix environment without breaking any of its conventions.

Further "additive" integration is possible through extension modules
(e.g. Apache's "mod_hfs") and even new options to basic commands (e.g.
new flags to the "ls" command that list Mac OS X native metadata), but
it is not necessary or desirable to try to fully integrate all the Mac
OS X guidelines described above into decades' worth of Unix software.

The whole Darwin layer should be treated as a separate, less
abstracted OS of its own--which it is, after all.  Its user interface
should not influence, or be influenced by, the guidelines that apply
to the "Mac OS layer."

METADATA REPRESENTATION CONVERSIONS

The process of converting from one "metadata representation" to
another deserves some clarification.  The most basic premise of such
conversions is that the "baseline" representation must contain a
minimum complement of native Mac OS X metadata.

For example, imagine a file named "resume.doc" arriving on a Mac OS X
system with no metadata other than the type information encoded in its
file name (i.e. ".doc").  That file can be "promoted" to the Mac OS X
metadata baseline by adding the appropriate complement of metadata
(via the metadata framework, and according to the user-configurable
metadata representation mapping tables).

(In phases 1 and 2, "the appropriate complement of metadata" would be
HFS/HFS+ type and creator codes.  In phase 3, it would be defined by
the new metadata standards created therein.)

Note that such a "promotion" does *not* necessarily imply the
*removal* of any metadata, including the file type encoded in the file
name in the form of a filename extension.  Once a file is "promoted",
the file name is safely back in the user domain and does not have to
be modified in any way by the system.  (Such functionality should
exist in the metadata framework, however, and should be available if
the user requests it.)

Going in the other direction, "demoting" a file to a more primitive
metadata representation must not remove the baseline complement of Mac
OS X metadata.

For example, imagine a Mac OS X Word document named "My Resume" that
needs to be shared with other platforms.  The most likely metadata
representation conversion requires the encoding of the file type
information in the file name (e.g. by appending a ".doc" extension).
If the file continues to exist on a (possibly shared) Mac OS X disk
(HFS/HFS+, or even UFS with its "._" metadata files), the Mac OS X
metadata should remain intact even after the addition of the ".doc"
file name extension.
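
The two examples above can be sketched in a few lines of Python.  This
is only an illustration of the rules, not a proposed implementation.
The record type, the function names, and the mapping table are
hypothetical, and the type and creator codes are illustrative values.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    name: str
    type_code: Optional[str] = None
    creator_code: Optional[str] = None


# Hypothetical per-user mapping table: extension -> (type, creator).
EXT_TO_NATIVE = {"doc": ("W8BN", "MSWD"), "txt": ("TEXT", "ttxt")}
TYPE_TO_EXT = {native[0]: ext for ext, native in EXT_TO_NATIVE.items()}


def promote(f: FileRecord) -> FileRecord:
    """Add missing native metadata.  The file name is never modified."""
    if f.type_code is not None or "." not in f.name:
        return f
    ext = f.name.rsplit(".", 1)[1].lower()
    native = EXT_TO_NATIVE.get(ext)
    if native is None:
        return f
    return replace(f, type_code=native[0],
                   creator_code=f.creator_code or native[1])


def demote(f: FileRecord) -> FileRecord:
    """Redundantly encode the type in the name.  Native metadata stays."""
    ext = TYPE_TO_EXT.get(f.type_code)
    if ext is None or f.name.lower().endswith("." + ext):
        return f
    return replace(f, name=f.name + "." + ext)


print(promote(FileRecord("resume.doc")))
print(demote(FileRecord("My Resume", "W8BN", "MSWD")))
```

Note that neither function ever discards anything: promotion only
fills in empty fields, and demotion only appends to the name.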

Again, the metadata representation conversion process adheres to the
basic philosophy covered at the top of this document.  Mac OS X should
achieve cross-platform compatibility and a superior local user
experience by maintaining a large, advanced complement of metadata
that represents a superset of that found elsewhere.  This collection
of metadata should be created and preserved whenever possible.

Preparing files for survival on other platforms does not mean removing
Mac OS X's rich set of metadata.  It merely means redundantly encoding
file metadata as necessary to ensure that a file is usable if its
metadata collection is necessarily pared down by a transfer to a more
limited environment.

Note that the conversion techniques described above are not
constrained to any particular events.  Metadata representations may be
chosen and/or translated at any time, including (but not limited to)
when a file is saved by a Mac OS X application or when a file arrives
on or leaves a Mac OS X system.  Maintaining Mac OS X's baseline
complement of rich metadata is the important part, not which metadata
representation is chosen at any given time.  The latter should be
controlled by the user.

METADATA DETERMINATIONS

In the world of Mac OS X, "Mac OS X metadata" rules.  Take file type
as an example.  A file's type is determined by looking at Mac OS X's
native file type metadata (the HFS/HFS+ type code in phases 1 & 2,
something else in phase 3).  If there is no Mac OS X file type
metadata, the file type is determined by a cascade of hints and clues,
all of which are secondary to the Mac OS X metadata that was missing.

For example, file type information encoded in the file name may be
checked next, triggering a subsequent look-up in the metadata mapping
tables to determine what Mac OS X native file type is indicated by a
".qux" extension, if any.  As a last resort, the file contents
themselves may be examined using /etc/magic-style byte ranges and
values or some other more advanced system.

In all cases, the outcome of a file type determination process is a
Mac OS X file type (again, an HFS/HFS+ type code in phases 1 & 2,
something else in phase 3).
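
The cascade described above might look something like the following
Python sketch.  It assumes a phase 1 or 2 world in which the native
file type is an HFS/HFS+ type code.  The function name, the mapping
table, and the magic rules are hypothetical stand-ins for the per-user
tables and the /etc/magic-style database, and the codes shown are
illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-user mapping table: extension -> HFS/HFS+ type code.
EXTENSION_TYPES: Dict[str, str] = {"txt": "TEXT", "pdf": "PDF "}

# Hypothetical magic rules: (byte offset, expected bytes, type code).
MAGIC_RULES: List[Tuple[int, bytes, str]] = [
    (0, b"%PDF-", "PDF "),
    (0, b"GIF8", "GIFf"),
]


def determine_type(name: str, native_type: Optional[str],
                   head: bytes) -> Optional[str]:
    """Return a native type code, or None if nothing can be determined."""
    # 1. Proper Mac OS X metadata wins outright.
    if native_type:
        return native_type
    # 2. Otherwise, look up any file name extension in the mapping table.
    if "." in name:
        ext = name.rsplit(".", 1)[1].lower()
        if ext in EXTENSION_TYPES:
            return EXTENSION_TYPES[ext]
    # 3. As a last resort, examine the first bytes of the contents.
    for offset, magic, type_code in MAGIC_RULES:
        if head[offset:offset + len(magic)] == magic:
            return type_code
    return None


# Native metadata overrides a misleading extension.
print(determine_type("report.txt", "PDF ", b""))
# An unknown extension falls through to the contents.
print(determine_type("mystery.qux", None, b"GIF89a"))
```

Whichever step succeeds, the caller receives the same kind of answer:
a native type code.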

This clear prioritization of "Mac OS X metadata" over the redundant
representations required for cross-platform compatibility is central
to this proposal.  It means that "file type" and "file name extension"
are two different things, for example.  "File type" is determined by
the process described above.  It *may* be arrived at by looking at a
file name extension and then looking up the corresponding file type in
the user's metadata mapping tables, but there's no guarantee that the
file name extension will have any bearing on the file type.  If proper
Mac OS X file type metadata exists, the file name extension is just
some characters at the end of the file name.  Remember, the file name
should be in the user's domain.

Keeping the file name in the user domain requires actually assigning
proper Mac OS X metadata to "foreign" files as soon as the proper
native metadata can be derived from the information available.  Again,
this does not necessarily imply the removal of "foreign" metadata such
as file name extensions.

Also note that this prioritization of proper Mac OS X metadata over
the various foreign representations does not mean that these foreign
representations cannot be referenced by Mac OS X policies.  For
example, a user may choose a "Windows style" application binding
policy which is based entirely on the file name extension.  As far as
Mac OS X is concerned, this policy is based entirely on the file
*name* (which is, after all, a proper piece of Mac OS X metadata).

File metadata that simply does not exist in a "foreign" file in any
form may be added in order to aid in classic Mac OS compatibility
and/or the application binding process.  For example, files from other
platforms rarely provide any information about which application
created them.  But Mac OS X may, at the user's discretion, add creator
metadata as per the metadata mapping tables.

Metadata determination strategy summary:

* Try to understand every possible metadata representation.

* Prefer proper Mac OS X metadata over all other representations.

* "Promote" files to proper Mac OS X metadata when possible.

* The outcome of all metadata determinations in Mac OS X should be a
piece of proper Mac OS X metadata, regardless of which pieces of
information contributed to the determination.

THE EXISTENCE OF METADATA VERSUS THE POLICIES BASED ON IT

The distinction between the existence of metadata and the policies
based on it is a very important concept.  "Promoting" a file to a
proper set of Mac OS X metadata should never be seen as a "harmful"
process.  Adding file creator metadata, for example, is sometimes
considered harmful in Mac OS X as it exists today due to the
application binding policy that prioritizes creator metadata over all
else.

Since the overriding philosophy of Mac OS X metadata should be that
"more metadata is better", any situation in which the existence of
metadata is considered harmful to the user experience must be dealt
with by allowing the *policy* that references the "harmful" piece of
metadata to be changed, *not* by recommending the removal of the piece
of metadata.

Making the application binding process user-configurable is the best
example of this value system in action.  Remember, the more metadata
that is available, the richer the possible interactions with the data
can be.  And while it is trivial to ignore metadata, it is impossible
to reference it once it is gone.
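
As a rough illustration of separating policy from metadata, consider
the following Python sketch of a configurable application binding
policy.  Everything in it is hypothetical: the rule functions, the
look-up tables, and the application names.  The "classic" preset is
simplified to "creator code first, then type code", and the "windows"
preset consults only the file name extension.  The 10.1.1 preset is
omitted because its details are defined elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FileRecord:
    name: str
    type_code: Optional[str] = None
    creator_code: Optional[str] = None


# Hypothetical look-up tables.
APP_BY_CREATOR = {"ttxt": "SimpleText", "MSWD": "Microsoft Word"}
APP_BY_TYPE = {"TEXT": "SimpleText"}
APP_BY_EXTENSION = {"txt": "TextEdit", "doc": "Microsoft Word"}

Rule = Callable[[FileRecord], Optional[str]]


def by_creator(f: FileRecord) -> Optional[str]:
    return APP_BY_CREATOR.get(f.creator_code)


def by_type(f: FileRecord) -> Optional[str]:
    return APP_BY_TYPE.get(f.type_code)


def by_extension(f: FileRecord) -> Optional[str]:
    if "." not in f.name:
        return None
    return APP_BY_EXTENSION.get(f.name.rsplit(".", 1)[1].lower())


# A policy is an ordered list of rules; the first rule to answer wins.
POLICIES: Dict[str, List[Rule]] = {
    "classic": [by_creator, by_type],
    "windows": [by_extension],
}


def bind(f: FileRecord, policy: str) -> Optional[str]:
    """Return the application chosen for `f` under the named policy."""
    for rule in POLICIES[policy]:
        app = rule(f)
        if app:
            return app
    return None


f = FileRecord("notes.txt", type_code="TEXT", creator_code="MSWD")
print(bind(f, "classic"))  # prints Microsoft Word
print(bind(f, "windows"))  # prints TextEdit
```

The file is identical in both calls.  Only the policy changes, and a
policy that ignores the creator code does so without destroying it.  A
user-defined policy is just another ordered list of rules.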

THE MAC OS X METADATA USER EXPERIENCE

The goal of the user experience is to allow Mac users to think in
terms of a "vocabulary" of file metadata defined by Apple.  Every time
a computer user mentions a "dot-pee-ess-dee file" or a
"dot-tee-ecks-tee file", the user experience vocabulary of the Windows
platform (something that is not under Apple's control) is reinforced.

Before the advent of Mac OS X, Apple largely succeeded in defining its
own user experience vocabulary for the Mac platform that was much less
obscure than that of the PC world--more "user friendly", as it was
known.  The ultimate "friendly vocabulary" was the GUI itself.  But
even when the GUI became commonplace, the Mac user experience still
enjoyed significant advantages due to its more sensible and friendly
file management vocabulary.

Mac OS X has reversed that trend significantly.  Every mention of
"Mail-dot-app" and "dot-pee-list files" erodes more of the Mac
platform's historic user-friendliness.

By focusing again on defining a consistent and powerful vocabulary for
file metadata in Mac OS X that is friendlier than that found on other
platforms, Apple can regain its leadership position in this area.

By endeavoring to understand and handle the metadata vocabularies of
other platforms, Apple can bring the Mac platform beyond its former
ease of use by providing its users with the power to deal with any
file that comes their way--and, more importantly, to quickly and
easily bring it into their preferred vocabulary of the Mac.

A Mac OS X user should feel secure in the knowledge that any file he
encounters can be dealt with using the simple and sensible vocabulary
defined in Mac OS X, regardless of the (possibly obscure) vocabulary
of the originating system.  Similarly, a Mac OS X user should feel
confident that the operating system, applications, and his own actions
will work together to ensure that his files will survive happily on
other platforms, regardless of his knowledge of the "vocabularies" of
those systems.

CONCLUSION

The road to a better future for metadata in Mac OS X is long, but
staying focused on the correct destination is half the battle.  Keep
your eyes on the prize, as they say.  The prize is a superior user
experience on the Mac.  That is the number one goal.  As I hope was
made clear in this proposal, this does not preclude vastly improved
cross-platform interoperability.

Chasing standards that others in the industry have been stuck with for
decades and are desperately trying to transition away from is not a
formula for success.  Apple must innovate its way to a better
tomorrow.

Furthermore, the rest of the industry does not stand still.  In order
to provide customers with compelling reasons to keep buying Macs,
Apple must continue to improve its user experience over time.

Given Apple's market share, compromises are sometimes necessary to
maintain acceptable levels of interoperability.  But in cases where
alternate solutions provide the same interoperability improvements
without sacrificing the favorable aspects of the Mac user experience,
Apple must do everything in its power to implement them as such.  Any
part of the Mac OS user experience that duplicates the experience on
another platform ceases to be a compelling reason to buy a Mac.