Sun - December 24, 2006

Effective Dylan: 50 Specific Ways Dylan is Easier to Use Than C++


A point-by-point comparison between Dylan and the fifty items described in Scott Meyers' Effective C++.

I’ve been meaning to do this for a while. Here I go. The following is a brief† explanation of how Dylan compares against all fifty items listed in Scott Meyers’ Effective C++: 50 Specific Ways to Improve Your Programs and Designs††. Much as I like this book and find it useful for programming in C++, the fact is that a lot of what’s in it (and other books on C++) is about the unnecessarily hard parts of C++. Dylan generally provides a simpler and more productive (and enjoyable!) approach to programming.

†I wrote that before writing the bulk of this. Since there are, after all, fifty points, the overall result isn’t very short. However, my response to each item is relatively brief and may leave out a number of details that would be worth expanding on in the future.

††Note that I used the first edition of Scott’s book, although that link above points at the second edition on Amazon. At some point I’ll go through the second edition and update my comments if necessary.


Shifting from C to C++

This section is very specific to C vs. C++, so there isn’t a direct comparison in Dylan, but I’ll add some comments anyway.

1. Use const and inline instead of #define. Dylan has no preprocessor or #define. It has constants and inline functions.

2. Prefer iostream.h to stdio.h. Dylan has stream-based I/O libraries.

3. Use new and delete instead of malloc and free. Dylan has automatic memory management. You do not need to call functions to allocate and deallocate memory.

4. Prefer C++-style comments. Dylan supports both styles of C++ comments (/* */ and //), and common Dylan programming style agrees with Meyers.


Memory Management

Most of the items in this section are non-issues in Dylan, which has automatic memory management. There are no new and delete to use incorrectly.

5. Use the same form in corresponding calls to new and delete. An impossible error to make in Dylan. If you create a collection object, every object in it will be automatically deallocated as appropriate when the collection is deallocated.

6. Call delete on pointer members in destructors. Another detail that’s impossible to overlook in Dylan. All objects are properly deallocated when no longer referenced.

7. Check the return value of new. Dylan’s equivalent to new, “make,” signals an exception if object allocation or initialization fails. It is not possible to unintentionally overlook a failure.

8. Adhere to convention when writing new. There is no new.

9. Avoid hiding the global new. There is no global new.

10. Write delete if you write new. There is no…rule number six.


Constructors, Destructors, and Assignment Operators

These issues are dramatically simpler in Dylan. Automatic memory management eliminates the need to define constructors and destructors when the only resource being managed is memory. Initialization is much simpler in Dylan and in most cases can be defined in the class definition without a separate initialization function. There is no assignment operator function, and, in contrast to C++, Dylan doesn’t copy objects implicitly, and explicit copying is rare.

11. Define a copy constructor and an assignment operator for classes with dynamically allocated memory. Although you can define the equivalent of copy constructors when desired in Dylan, automatic memory management makes it unnecessary to worry about writing copying code merely to properly manage memory. There is no copying assignment operator in Dylan, and no implicit copying of objects, so it is rare that you need to write copying code; you do not need to do so simply to satisfy language semantics as in C++, you only need to do so when you wish to support explicit copying.

12. Prefer initialization to assignment in constructors. This actually applies to Dylan, although there’s less of a chance a Dylan programmer will make a mistake here. For most cases, object initialization can be expressed just once, in the class definition, unlike C++ initialization clauses, which cannot be used in as many cases and which have to be diligently written for every constructor. Dylan only requires an explicit initialization function for special cases, e.g., when multiple initial values have interdependencies.

13. List members in an initialization list in the order in which they are declared. A non-issue in Dylan, where initial values are given in the definition of each member (called “slots” in Dylan), so there is no way to get the order wrong.

14. Make destructors virtual in base classes. A non-issue in Dylan, where all functions are implicitly “virtual” (besides, Dylan does not have destructors in the C++ sense). Note that this does not mean all function calls incur dispatching overhead, a cost many C++ programmers are very conscious of. If a given call site doesn’t require dynamic dispatch, it automatically becomes a simple function call, just as with non-virtual C++ member functions, and may even be inlined as appropriate. There is no need (or means) to explicitly declare “virtual” and “non-virtual” functions, and no class definition changes are ever necessary if program requirements later change.

15. Have operator= return a reference to *this. There is no assignment operator in Dylan.

16. Assign to all data members in operator=. There is still no assignment operator in Dylan.

17. Check for assignment to self in operator=. Well, there was one, but the cat’s eaten it.

(Someone asked me for further clarification: There is an “assignment operator” in Dylan, of course. It looks just like Pascal’s “:=”. However, it is not a function. It is not equivalent to C++’s operator=, which performs copying, and there is no need to implement one for your classes as in C++ simply to avoid problems with C++’s pervasive implicit copying.)
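To make the “pervasive implicit copying” concrete, here is a minimal C++ sketch of the situation items 11 and 17 guard against. The Buffer class and the demo_implicit_copy function are hypothetical names made up for this illustration; they are not taken from Meyers’ book.

```cpp
#include <cassert>

struct Buffer {
    int* data;
    explicit Buffer(int value) : data(new int(value)) {}
    // No copy constructor, assignment operator, or destructor is written
    // here, so the compiler supplies member-wise versions of all three.
};

void demo_implicit_copy()
{
    Buffer original(42);
    Buffer copy = original;               // compiler-generated copy constructor
    assert(copy.data == original.data);   // both objects point at the same int

    *copy.data = 7;                       // writing through the copy...
    assert(*original.data == 7);          // ...is visible through the original

    delete original.data;                 // released by hand, exactly once
}
```

The copy is shallow, and nothing frees the memory unless you write that code yourself. In Dylan the nearest equivalent of the second line simply binds another name to the same object, and the garbage collector takes care of the rest.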


Classes and Functions: Design and Declaration

Meyers has some good things to say about class and function design that, broadly speaking, apply as much to Dylan as they do to any other language. However, a lot of these items are rendered simpler (or in some cases, trivial) to deal with in Dylan.

18. Strive for class interfaces that are complete and minimal. Okay, Scott, will do. In fact, in Dylan, class definitions are usually much simpler than in C++, only including definitions of slots (“data members”) and (implicitly) their accessors—an approach Scott recommends for C++, but which is made more difficult to adhere to than in Dylan because of the limitations of non-member functions.

19. Differentiate among member functions, global functions, and friend functions. Dylan has a simpler, more orthogonal programming model in which there is essentially only one kind of function (called a “generic function”), which is polymorphic, similar to a C++ virtual member function. Generic functions do not “belong” to classes, and there is no real distinction between “member” and “non-member”, or “friend” functions, making it simpler to evolve programs with fewer and more localized source changes. You use explicit modules (aka “namespaces”) to control access and encapsulation instead of implicit class namespaces and public, private, protected, and friend declarations.

20. Avoid data members in the public interface. In Dylan, accessor functions are automatically defined for all slots. In fact, there is no way to explicitly access a slot “directly” as in C++. All slot access is via accessor functions (calls to which are often optimized away in practice). In C++, you have to explicitly write your own accessors, complicating the code and increasing the likelihood that someone will avoid that work and make a data member public when they shouldn’t, and thereby make the code harder to maintain and evolve.

21. Use const whenever possible. Dylan doesn’t have const, which isn’t as clearly necessary as in C++. One reason for this is that, just as functions can accept multiple arguments, in Dylan they can also return multiple values, so you never need to design functions that take “output” pointer/reference arguments, and so there’s less need to differentiate them from “input” arguments. Dylan functions also tend to be functional rather than mutating (i.e., they are implicitly const). Although Dylan explicitly supports imperative programming, most standard library functions do not alter their arguments, and so const-like declarations would often be redundant. (This is not to say there would be no value in having something like const in Dylan, but it isn’t as much of a clear win as in C++.)

22. Pass and return objects by reference instead of by value. Dylan always uses reference semantics. There is no direct way to pass by value, although you can explicitly copy an object then pass the copy along.

23. Don’t try to return a reference when you must return an object. See #22. This simply isn’t an issue in Dylan.

24. Choose carefully between function overloading and parameter defaulting. This applies only indirectly to Dylan. First, it’s important to note that Dylan does not have function overloading in the same sense as C++. Second, Meyers recommends overloading when there is no reasonable default value for an argument, but in Dylan it is always possible to define a default value (using type-unions, which I won’t go into here), and furthermore, it is possible to tell whether an optional argument was passed to a Dylan function, obviating the need for a default value if avoiding one is desired.

25. Avoid overloading on a pointer and a numerical type. In Dylan there are no raw pointers and therefore no null pointer, but more to the point, Dylan is more type-safe than C++ and there is no way to confuse a zero with a null pointer, or in fact any number with any value that is not a number. In stark contrast to C++, Dylan’s false, zero, and null character ('\0') are of distinct types, and there is no implicit casting between them as in C++.

26. Guard against potential ambiguity. There is no implicit casting and no ambiguity between construction and conversion; you have to explicitly invoke the desired function. Generic functions do not “belong” to classes in the same sense that C++ member functions belong to their classes, and so the potential ambiguity between two inherited member functions with the same name doesn’t have a direct analogue in Dylan. The closest match is name collisions in modules (again, aka “namespaces”); similar to C++, you do need to resolve any collisions that occur during importing, by excluding or renaming one or more of the offending names.

27. Explicitly disallow use of implicitly generated member functions you don’t want. The example Meyers uses is the default assignment operator. Dylan has no assignment operator, so that specifically is a non-issue. More generally, Dylan doesn’t automatically define any functions on your behalf that you would need to suppress (any undesirable behaviors are up to you to implement yourself).

28. Use structs to partition the global namespace. Here Meyers is suggesting a short-term workaround for the lack of namespaces in C++. Now that modern C++ compilers support namespaces, this is unnecessary. As mentioned above, Dylan has namespaces (called “modules”).
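For readers who haven’t run into the problem behind item 25, here is a small C++ sketch of it. The describe overloads are hypothetical, invented only to show which function the compiler picks.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload pair of the kind Meyers warns about.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "pointer"; }

void demo_overloads()
{
    // A literal 0 is an int first and a null pointer only by conversion,
    // so the int overload wins even if the caller meant "no string".
    assert(describe(0) == "int");

    // Reaching the pointer overload takes an explicit cast.
    assert(describe(static_cast<const char*>(0)) == "pointer");
}
```

Since Dylan has no implicit conversion between numbers and anything that is not a number, the question of which method a zero selects never comes up.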


Classes and Functions: Implementation

Some of these points discuss design issues of a general nature that tend to apply across languages, but others highlight where C++ is more complex than one might desire, where Dylan provides a simpler and more enjoyable programming experience in contrast. In particular, Dylan has no header files and instead relies upon whole-program analysis, where the compiler can see all the source code for a program or library and isn’t constrained by a monolithic, compilation-order-dependent mechanism like C++’s translation unit.

29. Avoid returning “handles” to internal data from const member functions. As a general point, this applies to Dylan, and I’d modify this to the more broad, “avoid exposing implementation information through interfaces.” Note, however, that Dylan has no raw pointers, and that all slots are implicitly accessed via functions, so less raw data is exposed by default.

30. Avoid member functions that return pointers or references to members less accessible than themselves. This applies to Dylan in the same general way that #29 does, though, again, there is no way to return a raw pointer or reference to a member, making this (at least slightly) less of an issue.

31. Never return a reference to a local object or a dereferenced pointer initialized by new within the function. You can blatantly ignore this in Dylan, where it’s perfectly safe to return any object, since you’re always using heap semantics (i.e., there is no way to return a reference to a stack object). Note that I said heap semantics. Dylan programs are written as though objects are always allocated on the heap, but objects may be allocated on the stack if there are no external references to them. Dylan treats the location of an object as an implementation detail, and allocating on the stack is an optimization. Automatic memory management eliminates the problems seen in C++ due to incorrect references or omitted deletion.

32. Use enums for integral class constants. The opposite is true in Dylan: Go ahead and use constants, that’s what they’re for. The Dylan compilation model doesn’t use headers. Instead, the compiler simply looks at the definition of a constant regardless of where it is in the sources to find its value. So, Dylan doesn’t have the same scoping and compilation order problems that C++ does.

33. Use inlining judiciously. [Curly voice] Why, soitenly. In fact, Dylan compilers make judicious use of inlining on your behalf in many cases, and, again, Dylan doesn’t use headers, so you don’t have to place function bodies in headers to get them inlined. The compiler can inline any function in your program as appropriate, making inlining easier to manage with fewer changes to source code. Dylan compilers are also expected to perform certain kinds of inlining and partial inlining that you might not expect of a C++ compiler (or that might not be possible due to the language semantics and compilation model).

34. Minimize compilation dependencies between files. Of course, reducing dependencies is a good thing, but again, since there are no headers, the task is made much simpler in Dylan. No implementation details are explicitly exposed in header files. Only those dependencies that actually occur in your programs need cause recompiles, and only the affected functions need to be processed, rather than recompiling everything in a translation unit that directly or indirectly may depend upon a header file, as in C++.


Inheritance and Object-Oriented Design

Some of these items are general and apply to Dylan, but others are, again, non-issues in Dylan, or in fact the opposite of common Dylan practice. C++ imposes some design constraints that Dylan programs are not subject to.

35. Make sure public inheritance models “isa.” Dylan doesn’t have private inheritance, so you can ignore some of what Meyers has to say here, but the general point about inheritance modeling “isa” still basically applies to Dylan (and in fact, to all OO languages).

36. Differentiate between inheritance of interface and inheritance of implementation. A good idea in Dylan, too. Meyers goes on at length on this topic, and the summary is that Dylan is a bit simpler here because there are no non-virtual member functions, nor pure virtual member functions. To define an abstract class in Dylan, you do so explicitly with the adjective “abstract” in the class definition; there is no need to define a pure virtual member function merely to implicitly make a class uninstantiable.

37. Never redefine an inherited nonvirtual function. An impossibility in Dylan, as all functions use “virtual” semantics. This makes it simpler to modify and maintain Dylan programs without breaking client code.

38. Never redefine an inherited default parameter value. In Dylan it is perfectly fine to do so, whereas C++ semantics make this problematic. Subclasses can impose further restrictions and therefore it is desirable to alter default initialization values, and Dylan semantics make this trivial to do.

39. Avoid casts down the inheritance hierarchy. Because it is a dynamic language, Dylan does not have or need up- or down-casting. Furthermore, it does not have raw pointers or non-virtual functions, eliminating most of the issues raised by Meyers here. As he suggests, though, you should avoid relying upon the exact type of an object and instead provide one or more polymorphic functions that provide the desired tests for object attributes (in this case, the example is testing an object’s class, but this rule can be applied more generally).

40. Model “has-a” or “is-implemented-in-terms-of” through layering. Generally applicable to Dylan, too.

41. Use private inheritance judiciously. There is no private inheritance in Dylan, so this is a non-issue. More specifically, Meyers points out some cases where C++ requires using private inheritance instead of the more desirable layering approach. In Dylan, these cases simply do not occur, and layering is always applicable.

42. Differentiate between inheritance and templates. Dylan directly supports generic programming without the use of a specialized syntax like C++ templates, and it is considered good form to make the difference transparent. Most Dylan programming is generic programming to at least some degree, making it easier to reuse code. Dylan also supports homogeneous and heterogeneous containers. In fact, unlike in C++, neither of these features requires generating duplicate code (although this can be done as an optimization, as a form of inlining).

43. Use multiple inheritance judiciously. Taking the topic title literally, it also applies to Dylan. However, MI in C++ has a number of complications that MI in Dylan simply doesn’t suffer from, making MI use easier and more common. Mixin inheritance is a common idiom in Dylan.

44. Say what you mean; understand what you’re saying. This is sort of a summary of some of Meyers’ previous points. I won’t bother going into detail. If you read his book and my comments, it should be straightforward to see how his points apply to Dylan.
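Items 37 and 38 both come down to C++ binding some things statically, by the declared type, even when the object itself is of a derived class. The following is a minimal sketch of both pitfalls; Base, Derived, and the helper functions are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string name() const { return "Base"; }                  // nonvirtual
    virtual int scale(int factor = 1) const { return factor; }   // virtual, default 1
    virtual ~Base() {}
};

struct Derived : Base {
    std::string name() const { return "Derived"; }           // hides Base::name (item 37)
    int scale(int factor = 10) const { return factor * 2; }  // changes the default (item 38)
};

// Both helpers see the object only through a reference to Base.
std::string name_via_base(const Base& b) { return b.name(); }
int scale_via_base(const Base& b) { return b.scale(); }

void demo_static_binding()
{
    Derived d;

    // Item 37: the nonvirtual call is chosen by the static type.
    assert(d.name() == "Derived");
    assert(name_via_base(d) == "Base");

    // Item 38: the body is Derived's, but the default argument is Base's.
    assert(d.scale() == 20);         // default 10, doubled
    assert(scale_via_base(d) == 2);  // default 1, doubled
}
```

The same object gives different answers depending on the type of the reference used to reach it, which is exactly the kind of surprise that Dylan’s uniformly dynamic dispatch avoids.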


Miscellany

These are general points, for the most part, and apply to Dylan, although the details differ.

45. Know what functions C++ silently writes and calls. Speaking very, very broadly, this is good advice for Dylan, too. However, in stark contrast to C++, Dylan writes and calls very little code implicitly, and any implicit code does the right thing (whereas, for example, in C++, if you have any pointer data members the default destructor will not call delete on them). In fact, Dylan really only generates accessor functions for slots, and they’re trivial. Default methods on make() and initialize(), which create and initialize objects, do the right thing by default. There is no copying assignment operator, copy constructor, or address-of operators as generated in C++.

46. Prefer compile-time and link-time errors to runtime errors. The same is true of Dylan, although unlike C++ you are guaranteed to get runtime errors (exceptions) instead of silent failures for things that can’t be checked at compile- or link-time.

47. Ensure that global objects are initialized before they’re used. This is guaranteed in Dylan. All globals (and locals, too) must have an explicit initial value. There is no way to forget to initialize one. Furthermore, Dylan makes it easier to specify initial values, and it guarantees that globals with dependencies will be initialized in the correct order.

48. Pay attention to compiler warnings. A good idea no matter what language you’re using.

49. Plan for coming language features. Most of Meyers’ points here are very specific to C++. I’ll just point out that one of Dylan’s greatest strengths is that it makes it easier to change things.

50. Read the ARM. Read the Dylan Reference Manual. It’s shorter and simpler than the ARM. To be fair, a lot of the “motivation” discussion in the ARM won’t be found in the DRM. Instead, look to other sources, such as info-dylan email and comp.lang.dylan Usenet archives, as well as Dylan Programming.

Posted at 05:00 AM    

Sun - August 6, 2006

A Handy CSS Debugging Snippet



I use the following bit of CSS to help visualize the structure of an XHTML (or HTML) document by putting a colored outline around the border of every element. At each level in the hierarchy the color changes so you can see when “depth” changes.

  * { outline: 2px dotted red }
  * * { outline: 2px dotted green }
  * * * { outline: 2px dotted orange }
  * * * * { outline: 2px dotted blue }
  * * * * * { outline: 1px solid red }
  * * * * * * { outline: 1px solid green }
  * * * * * * * { outline: 1px solid orange }
  * * * * * * * * { outline: 1px solid blue }

I usually keep this block of rules at the top of a stylesheet, commented out with /*…*/, which I remove when I want to see the structure.

Posted at 09:50 AM    

Tue - October 11, 2005

Generic Programming is Simpler in Dylan vs. C++


Dylan programs are usually clearer and easier to understand than their C++ equivalents, especially when writing generic code.

The following C++ examples are from Thomas Themel’s blog. He has some interesting things to say about programming languages there, but I'll let you read what he has to say. I’m just using his code as a starting point for some of my own ramblings.

The original C++ example:

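  #include <map>
  #include <string>
  using namespace std;

  // Swap the values stored under id1 and id2; do nothing if either key is missing.
  void exchange(map<int, string>& ref, int id1, int id2)
  {
      if (ref.find(id1) != ref.end() &&
          ref.find(id2) != ref.end())
      {
          string tmp = ref[id1];
          ref[id1] = ref[id2];
          ref[id2] = tmp;
      }
  }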

The C++ template version, which is more general:

  template <typename _id_t, typename _mem_t>
  void exchange(map<_id_t, _mem_t>& ref, const _id_t& id1, const _id_t& id2)
  {
      typename map<_id_t, _mem_t>::iterator it1 = ref.find(id1);
      typename map<_id_t, _mem_t>::iterator end = ref.end();
  
      if (it1 != end)
      {
          typename map<_id_t, _mem_t>::iterator it2 = ref.find(id2);
          if (it2 != end)
          {
              _mem_t tmp = it2->second;
              it2->second = it1->second;
              it1->second = tmp;
          }
      }
  }

One of the problems with C++ is that if you want to write general, flexible code, it can get pretty hairy, making it more difficult to understand, or to write the code correctly in the first place. Similar Dylan code is usually much clearer and more succinct.

A simple Dylan rendition:

  define method exchange (c :: <collection>, key1, key2) => ()
    let temp1 = c[key1];
    let temp2 = c[key2];
    c[key1] := temp2;
    c[key2] := temp1;
  end method exchange;

Perhaps the most important thing to note here is that this Dylan rendition is equivalent to the template version, because it can work with arguments of several different types, yet, like the original, non-template C++ code, it remains clear and easy to read, where C++ template syntax can get quite verbose and obscure.

In fact, it is more general than the template version, because it works with any collection, not just maps. If you wanted to limit it to “maps”, you would just change “<collection>” to “<table>”, the Dylan hash-table class.

Similarly, because we didn’t restrict the types of key1 and key2, they can be any type at all. So this function can work with arrays, hash-tables, linked-lists (any collection type that can be accessed with the “[]” operator), and they can contain elements of any type, without the consequent verbosity you’re likely to find in an equivalent C++ template.

Now, you may have noticed that this isn’t exactly a direct translation from the original: Unlike the C++ code, this signals an error (“throws an exception”) if key1 or key2 isn’t found. I don’t understand why the original version silently ignores invalid arguments. Seems like a bad idea to me. But, this is a good opportunity to discuss how to handle invalid keys in Dylan.

First, here’s a more direct translation of the original. It only accepts hash-tables, and it checks whether the keys are valid before using them, so invalid keys are silently ignored:

  define method exchange (t :: <table>, key1, key2) => ()
    if (key-exists?(t, key1) & key-exists?(t, key2))
      let temp = t[key1];
      t[key1] := t[key2];
      t[key2] := temp;
    end if;
  end method exchange;

Again, this is just as general as the C++ template function, but without the complicated syntax. In Dylan, all code is effectively template code; it’s just a matter of degree. The more tightly you define the types used in your programs, the less generic (and the less template-like) your programs are. In that vein, for the rest of this discussion I’ll go back to using <collection> instead of <table>.

Explicitly checking whether the keys are valid could be a waste of effort if the keys are valid most of the time. When we attempt to access the collection the keys will be tested, anyway. Instead, we can just try to use the keys and establish an exception handler that does nothing if one or both are invalid:

  define method exchange (c :: <collection>, key1, key2) => ()
    block ()
      let temp1 = c[key1];
      let temp2 = c[key2];
      c[key1] := temp2;
      c[key2] := temp1;
    exception (e :: <error>)
      // Do nothing: an invalid key leaves the collection unchanged.
    end block;
  end method exchange;

(“block … exception” is like C++’s “try … catch”. In fact, the C++ code could have used a similar approach.)
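As a rough sketch of what that similar approach might look like in C++, here is a version built on std::map::at, which throws std::out_of_range for a missing key. It assumes C++11 or later, and try_exchange is a name I made up for this example rather than anything from the original post.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

template <typename Key, typename Value>
void try_exchange(std::map<Key, Value>& table, const Key& key1, const Key& key2)
{
    try {
        // at() throws std::out_of_range for a missing key, so both
        // lookups complete before anything is modified.
        Value& first = table.at(key1);
        Value& second = table.at(key2);
        std::swap(first, second);
    } catch (const std::out_of_range&) {
        // Invalid key: silently leave the map untouched.
    }
}

void demo_try_exchange()
{
    std::map<int, std::string> names;
    names[1] = "one";
    names[2] = "two";

    try_exchange(names, 1, 2);    // both keys present: values are swapped
    assert(names[1] == "two");
    assert(names[2] == "one");

    try_exchange(names, 1, 99);   // 99 is missing: nothing changes
    assert(names.size() == 2);
    assert(names[1] == "two");
}
```

Each key is looked up only once, at the price of an exception when a key is missing, which is the same trade-off as in the Dylan version above.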

Now, Dylan actually has a more efficient, direct way to look up collection elements without the cost of testing the keys or handling an exception. Just as in C++, when you use the array access operator “[]”, you’re actually calling a function (“operator[]” in C++), and you can explicitly call that function instead of using the “x[i]” syntax. In Dylan, that function is named “element”, and when you call it explicitly you can give it additional arguments.

element accepts an optional keyword argument called “default:”, and if the key is invalid, it returns the default value instead of signaling an error. This means that the key is only tested for validity once, in element, and an invalid key produces an identifiable result instead of imposing the cost of signaling and handling an error. This is more efficient in cases where you expect invalid keys to occur often:

  define method exchange (c :: <collection>, key1, key2) => ()
    let tmp1 = element(c, key1, default: #f);
    let tmp2 = element(c, key2, default: #f);
    if (tmp1 & tmp2)
      c[key1] := tmp2;
      c[key2] := tmp1;
    end if;
  end method exchange;

Here, we’ve used “#f”, the literal for boolean false, as the default value. It’s quite common in Dylan code to use false to mean “none of the above” or “no value”, since it’s convenient to test for. Because Dylan is a type-safe language, false will never be confused with any other value, such as zero (or the empty list, as in Lisp). Dylan does not have a NULL or Nil, which are not type-safe. (In C++, you are also encouraged to use specific, type-safe values instead of NULL or 0. In fact, “NULL” is explicitly not an official part of C++, though many programmers still use it.)

Although it’s common to use false as a default value for element(), using it means that the above code isn’t appropriate if we want to allow the collection to contain false as a value. To make sure we can distinguish between “the element is #f” and “there is no element with that key”, we can construct a unique object that will never be stored in the collection:

  define constant $unfound = make(<pair>);

(The exact class doesn’t matter much, but a <pair> is probably the simplest built-in class in Dylan. It’s used to build linked lists and other data structures.)

Now, we can use it as the default value for element:

  define method exchange (c :: <collection>, key1, key2) => ()
    let tmp1 = element(c, key1, default: $unfound);
    let tmp2 = element(c, key2, default: $unfound);
    if (tmp1 ~== $unfound & tmp2 ~== $unfound)
      c[key1] := tmp2;
      c[key2] := tmp1;
    end if;
  end method exchange;

(~== is the “not the same object as” comparison operator. We’re testing whether the result of element() is the object stored in $unfound, not just whether it is equal to the object’s value.)

In fact, although this isn’t part of the language defined in the Dylan Reference Manual, $unfound is defined in a common set of extensions to the standard library, allowing us to omit the “define constant” above.

I’ve taken this opportunity to discuss several different points about Dylan and about programming in general, but I hope the take-away for you is that Dylan code can be generic while still being clear and easy to understand, where achieving the same thing in C++ may not always be possible.

(Disclaimer: I didn't compile any of the example code, but I believe it’s all correct.)
[Later, I noticed and fixed some minor typos.]
[Later later, I changed ~= to ~== and added the note that explains the operator.]

Posted at 05:38 AM    

Sat - December 25, 2004

53a50n’5 Gr33+1ng5, right back atcha


Because programmers can’t do anything in the usual way.

Scott Knaster wrote a bit of holiday code in his blog. Because true geek humor also demands a (half-)serious response, here’s my rendition, “translated” into Dylan for comparison:

let seasons-greetings =
  if (you.do-christmas?)
    #"merry-christmas"
  else
    #"happy-holidays"
  end;

let also = make( <happy-birthday>,
                 to: scott-knaster,
                 date: 26,
                 month: $december );

Posted at 11:52 PM    

Tue - December 7, 2004

Characters: How abstract should they be?


Some thoughts on whether to model characters as abstract characters or as characters in a particular character set or encoding.

Recently in the #dylan IRC channel on freenode.net there was a discussion about how to implement characters in computer programming languages as abstract characters—characters that exist outside of any particular encoding or character set, which you can map to and from character values in particular character sets and encodings—with the goal of representing all character data as abstract characters by default. Here are some of my thoughts from that discussion, lightly edited:

Every time this discussion comes up, a sticking point for me is that you have to use something as a “handle” to get to an abstract character, and that usually involves an unacceptable amount of storage overhead. The popular suggestion is to use a unique string or “symbol”, sometimes known as an “interned” or “uniqued” string.

I’ve always preferred approaches where you don’t have abstract characters. Every character is in some character set and you can map between them, and you either pick some relatively comprehensive intermediate encoding like Unicode for performing those mappings or you require pairs of mappings to be defined for combinations of character sets. In this model, character instances occupy at most a single machine word, just like an integer, and there is little or no auxiliary information about characters stored elsewhere. (By the way, mappings don’t have to be defined for every possible combination; you can chain them together to save space.)

The interface to characters is still somewhat abstract in this approach, but if there’s an intermediate encoding for mapping, you can get a speed boost by converting your character data to that encoding and using it to do most of your text processing.
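To illustrate the kind of mapping I have in mind, here is a deliberately tiny C++ sketch that converts between character sets by going through Unicode as the intermediate encoding. The Charset enum and the function names are hypothetical, and the tables cover only the ASCII range plus the euro sign; a real library would carry complete tables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical character-set tags for this sketch.
enum class Charset : std::uint8_t { Latin1, Windows1252, Latin9 };

const std::uint32_t kEuro = 0x20AC;  // U+20AC EURO SIGN

// Map a code in `from` to a Unicode code point. Returns false when this
// (intentionally incomplete) sketch has no mapping for the code.
bool to_unicode(Charset from, std::uint8_t code, std::uint32_t& out)
{
    if (code < 0x80) { out = code; return true; }  // shared ASCII range
    if (from == Charset::Windows1252 && code == 0x80) { out = kEuro; return true; }
    if (from == Charset::Latin9 && code == 0xA4) { out = kEuro; return true; }
    return false;
}

// Map a Unicode code point to a code in `to`, if that set has one.
bool from_unicode(Charset to, std::uint32_t cp, std::uint8_t& out)
{
    if (cp < 0x80) { out = static_cast<std::uint8_t>(cp); return true; }
    if (cp == kEuro && to == Charset::Windows1252) { out = 0x80; return true; }
    if (cp == kEuro && to == Charset::Latin9) { out = 0xA4; return true; }
    return false;  // e.g. Latin-1 has no euro sign
}

// Chain the two mappings: source set -> Unicode -> target set.
bool convert(Charset from, Charset to, std::uint8_t code, std::uint8_t& out)
{
    std::uint32_t cp = 0;
    return to_unicode(from, code, cp) && from_unicode(to, cp, out);
}

void demo_pivot()
{
    std::uint8_t out = 0;
    assert(convert(Charset::Windows1252, Charset::Latin9, 0x80, out));
    assert(out == 0xA4);   // the euro sign sits at a different code in each set
    assert(!convert(Charset::Windows1252, Charset::Latin1, 0x80, out));
}
```

Each character stays a small integer tagged with its character set, and only two tables per set are needed instead of one for every pair of sets.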

At least, ten years ago it would have been unacceptable to have a symbol for every assigned Unicode code point stored on disk or in memory. Today, I am still reluctant to impose an overhead of hundreds of kilobytes or potentially megabytes just to do something as common as work with text in a way that doesn’t require higher-level facilities like spelling checkers and hyphenation.

For any abstract character proposal, I want to see hard performance, size, and deliverable size numbers, as well as a comprehensive description of the target users. Making this a part of the core of a programming language means it would affect every program.

I think the overriding question to ask is, what’s the value of having truly abstract characters? Do we really need them, or is it just a conceptually pure model that has an aesthetic appeal? I don’t see much you can do with them besides test for equality and convert them to/from other character sets (you can’t compare them for sorting, you can’t convert them to/from integers, and you can’t (directly) convert digit characters to numerical values, for example).

I think we can provide a protocol that’s “abstract enough”, but uses a more concrete representation.

An issue for me is that I think characters should be more like integers in that they have very few properties that the core language has to support directly. We don’t talk about storing numerical properties for integers. Even if we support Unicode source code, character and string literals, and some Unicode character properties, we don’t necessarily need to support all of Unicode or any other non-trivial character system. Leave that to additional libraries that applications can pick and choose from.

You might say I prefer characters that are abstract in that they have little or no implementation overhead (speed or space), only the minimum of functionality necessary to support simple text processing, and that relegate most higher-level functionality to (optional) libraries. I realize that we may only be discussing where to draw that line.

It also just occurred to me that trying to model abstract characters is the same as trying to model abstract numbers. That’s a very high-level undertaking, and one that’s probably best left to optional libraries. The size and speed efficiency of integers and floats that are close to the hardware is hard to ignore, and it’s highly desirable for characters to have the same efficiency advantages.

Posted at 05:46 AM    

Sat - November 20, 2004

Dylan Object Copying (or lack thereof)


Answering the question “How do you copy objects in Dylan?” often begins with “You don’t.”

[Update: Pete Gontier pointed out that I used the term “bitwise copying” when describing what C++ does to copy objects by default, but in fact the correct description is “member-wise copying”. This is a common terminology mistake, and the difference is important; C++ invokes a copy-constructor to copy each member and its members, recursively. Of course, when there are no explicit copy constructors to invoke, a C++ compiler could optimize it into a bitwise copy when appropriate. I've revised the text below to use “member-wise”.]

Recently there was a discussion in the #dylan IRC channel on freenode.net about copying objects in Dylan. Someone asked how to copy objects as in C++, where assignment with ‘=’ is a copying operation, and where the compiler can automatically generate a simple member-wise copy of the data members when the destination is a class (or struct) instance.

In short: Dylan programs copy objects much less often than typical C++ code, assignment does not copy objects, objects are only copied explicitly, and simple member-wise copying is often inappropriate.

There are four (no, three, Sir!) points to observe:

1. In Dylan, assignment to a binding (a global or local variable, or a function parameter) does not copy or construct an object; it merely makes that binding refer to the source object. Compared to C++, it is as if every variable were a pointer to a heap-allocated object, so assignment merely copies the pointer.

  let obj1 = make( <my-class> );
  let obj2 = obj1;

Here, both obj1 and obj2 refer to the same object. The ‘==’ operator tests whether two values are the same object, and returns true (#t) in this case:

  obj1 == obj2;
   #t

If we change any slots of obj1 or obj2, both bindings will see the same values:

  obj1.my-slot := 42;
  obj2.my-slot;
   42

This is analogous to the C++ code:

  my_class* obj1 = new my_class;
  my_class* obj2 = obj1;
  obj1->my_slot = 42;
  obj2->my_slot;
   42

2. Since Dylan always uses reference semantics (every binding is like a pointer to an object), there is no implicit copying of objects as there is in C++. For example, C++ copies objects to create temporaries while evaluating an expression, and when passing values in and out of functions. In contrast, objects are only copied explicitly in Dylan, and copying is performed much more rarely than in C++.

  my_class obj1;
  my_class result = some_function( obj1 );

In the above C++ example, obj1 is copied when passed to some_function() and the result of that function is copied to result, because C++ is using pass-by-value semantics here. In the Dylan version, no copying occurs:

  let obj1 = make( <my-class> );
  let result = some-function( obj1 );

The equivalent C++ code is:

  my_class *obj1 = new my_class;
  my_class *result = some_function( obj1 );

3. Copying is generally accomplished either by calling shallow-copy(), which is roughly like a C++ copy constructor, or by simply creating a new object with make() and passing in initial values taken from a source object, which is like a more general C++ constructor that takes additional parameters.

  let obj2 = shallow-copy( obj1 );
  let obj3 = make( <my-class>, foo: obj2.foo, bar: obj2.bar );

Here, we pass the foo and bar slots of obj2 to make() using keyword arguments to indicate which slots these values should be used to initialize. You could also define a custom keyword called (for example) copy-from: that just takes a source object instead of individual slot values:

  let obj4 = make( <my-class>, copy-from: obj3 );

(Example implementations are at the end of this entry.)

4. By default, shallow-copy() is defined only for collections, such as lists, vectors, strings, and arrays. In order to use it with other classes, you must define a method yourself. There is no automatic way to copy the slots of a user-defined object. A justification for this is that copying is not usually as simple as just copying the bits of every slot, so there is no reasonable default copying implementation. Copying some slots requires copying the objects they refer to, and yet other slots shouldn’t be copied at all. The same is actually true of C++, where most non-trivial classes require user-defined copy constructors or operator= member functions to handle copying correctly.
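
For the collections that do support it, here is a small sketch of what “shallow” means. The method name is hypothetical and only wraps the example so the local bindings have somewhere to live:

  define method shallow-copy-demo ()
    let original = vector( list( 1, 2 ), list( 3, 4 ) );
    let copy = shallow-copy( original );
    // The copy is a new vector...
    original == copy;         // #f
    // ...but its elements are the very same list objects.
    original[0] == copy[0];   // #t
  end method;

The vector itself is duplicated, while the lists it holds are shared between the original and the copy.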

Prompted by this discussion, I did a survey of all the Dylan code in the Gwydion Dylan CVS repository in ./fundev, ./libraries, and ./src for definitions and uses of shallow-copy():

  <http://www.gwydiondylan.org/cgi-bin/viewcvs.cgi/>

As expected, there are relatively few occurrences of shallow-copy(). Also important to note is that only two or three of the shallow-copy() methods amount to trivial slot copying. Furthermore, it turns out I wrote one of them (back in 1997), and looking at the code now I realize it may not even have been necessary or desirable.

So the answer to the original question is that Dylan assignment does not copy, you must explicitly copy objects using functions you provide implementations for, and in any case you’re not going to have to do that very often.


Example Implementations of Copying

Although I had originally intended to end this blog entry here, I’ve decided to provide some details that may help make some of this discussion a little more concrete for those unfamiliar with make() and shallow-copy(). Given the class definition:

  define class <my-class> (<object>)
    slot foo, init-keyword: foo:;
    slot bar, init-keyword: bar:;
  end;

we can now simply call make() with keywords foo: and bar: to copy slots from another instance as mentioned above:

  make( <my-class>, foo: obj1.foo, bar: obj1.bar )

This is very explicit, and simple to implement, but perhaps a bit verbose if we need to do this a lot. We can shorten this a bit by defining an initialize() method that implements a single copy-from: keyword that takes an object from which to copy the slots:

  define method initialize (obj :: <my-class>, #key copy-from)
    next-method(); // first, perform inherited initialization
    if (copy-from) // if supplied, copy the source object's slots
      obj.foo := copy-from.foo;
      obj.bar := copy-from.bar;
    end;
  end method;

Now we have the additional option of writing:

  make( <my-class>, copy-from: obj1 );

Another approach is to implement a shallow-copy() method, like so:

  define method shallow-copy (obj :: <my-class>)
    make( <my-class>, foo: obj.foo, bar: obj.bar )
  end method;

(Also note we could both implement copy-from: and use it to implement shallow-copy(), if desired.)
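
As a sketch of that combination, and assuming the initialize() method with copy-from: shown earlier is in place, the shallow-copy() method could be written this way instead (it would replace the definition above, not sit alongside it):

  // Alternative definition: delegate the slot copying to copy-from:.
  define method shallow-copy (obj :: <my-class>)
    make( <my-class>, copy-from: obj )
  end method;

The slot-by-slot knowledge then lives in one place, the initialize() method.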

Notice that all we’ve done is wrap up the verbose call to make() in shallow-copy(). The behavior is the same whichever you call. This points out that for some cases, writing a shallow-copy() method may be unnecessary. Implementing shallow-copy() helps wrap your copying code up in a well-known copying protocol that can be used to copy objects without knowing exactly how to copy them, but sometimes that generality isn’t strictly necessary. It depends on the code in question and where the abstraction boundaries are.

Finally, notice that it’s just as easy for you to define your own copying function if shallow-copy() isn’t appropriate. For example, if you want to perform copying that is deeper, but only for certain slots:

  define method copy-in-my-own-special-way (obj :: <my-class>)
    make( <my-class>, foo: shallow-copy( obj.foo ), bar: obj.bar );
  end method;

This custom copying function copies part of the object a little more deeply, by shallow copying the obj.foo value. Remember, Dylan never copies implicitly, so obj.bar is not copied; instead, the new object’s bar slot is bound to the same object as the source object’s bar. In contrast, by calling shallow-copy() with obj.foo, we make a (shallow) copy of its value, so that foo in the new object is distinct from foo in the original object.

Posted at 12:33 AM    

Thu - November 11, 2004

A Round of Applause for OmniGraffle


A plug for OmniGraffle, the greatest graphics tool since sliced vectors.

Recently, in my work on the online Dylan Reference Manual, I needed to recreate some class hierarchy diagrams and other figures that are a part of the printed DRM, but which were never provided with the HTML version.

They tend to be structured diagrams with boxes and arrows and descriptive text, the kind of stuff that is tedious to draw by hand, but which should be a breeze with a capable piece of software that knows how to arrange and draw these patterned designs.

As it turns out, there is a wonderful tool for doing these kinds of graphics, which made my life a lot easier: OmniGraffle by The Omni Group.

It occurred to me that I should give them a plug in my blog for all the help they gave me:

Plug, plug, plug.

So, there you go.

Posted at 04:42 AM    

Mon - November 8, 2004

Online Dylan Reference Manual Redesign


I've completely redesigned the online Dylan Reference Manual, making it easier to read and to use as a convenient reference for the Dylan programming language.

It's been a while since my last Dylan-related blog entry, and partly it's because I've been spending most of my Dylan-related time for the past few weeks on heavily editing the online Dylan Reference Manual. My desire is to make sure the online DRM is an excellent reference for the Dylan language, and I think this update has made a lot of improvements (unfortunately, there are some real problems with the quality of the machine-generated HTML that make it less readable, but I've fixed a lot of them and I hope to address them all eventually).

If you're curious about the Dylan language, or you've taken a look at the online DRM before and found it lacking as a tool for learning about the language, please take a look at the new version.

Here's a copy of my commit message with the details of the changes:

Complete redesign of the online DRM, including a new navigation bar on the left, which features convenient access to interior links to items on each page (this was previously relegated to the bottom of the page, and is particularly useful for navigating longer, more complex pages).

I also used CSS to make the printed rendition of the HTML more reasonable. All the navigation items are hidden, and links are printed in plain text, without underlines or blue coloring.

While I put CSS to good use, I also tried to make the HTML produce a more reasonable, usable rendition than before, when CSS isn't supported or stylesheets are disabled.

Overall, the layout and typography are now much closer to the printed DRM, and I've reconstructed some content that was apparently lost during the automatic translation to HTML.

I made use of CSS to hide some information that is not in the printed DRM or that was just cluttering things up, while leaving it in the HTML in case we want to make use of it in the future. (For example, the navigation links include tags like "[Open Generic Function]", which are no longer displayed.)

I consolidated all the HTML pages of Appendix B, Exported Names, into one page. There was no reason for it to be split up and it just made it harder to navigate.

I added "disabled" renditions of each navigation button image, so the set of buttons doesn't change on pages where one or more buttons don't apply.

Too many fixes and adjustments to tagging to mention, but this includes fixing the tagging of all the function signatures (though not all the G.F. method signatures).

In fact, this work began as merely an attempt to fix the signature tagging, but once you start making global changes to over a hundred HTML files in dire need of cleanup, it's hard to find a good stopping point. Or maybe that's just me.

Posted at 11:47 PM    

Sun - June 20, 2004

Dylan: Testing for mixin inheritance


A recommendation for good program design that avoids directly testing the type of an object.

In general, do not directly test an object's class like this:

  if (instance?( obj, <my-mixin> ))
    // mixin-specific stuff
  end;

Instead, define a predicate with a default method on <object> or some other appropriate superclass that returns false, and a method specialized on your target class that returns true:

  define method my-mixin? ( obj :: <object> ) #f end;
  define method my-mixin? ( obj :: <my-mixin> ) #t end;

This provides better encapsulation and makes it easier to change the conditions for the test, rather than relying upon simple inheritance from a particular class. In fact, try to define one or more predicates that test for specific capabilities, if possible, rather than just testing whether an object is a member of the mixin class:

  define method fooable? ( obj :: <my-mixin> ) #t end;
  define method barable? ( obj :: <my-mixin> ) #t end;
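
Here is a slightly fuller sketch of how such a predicate might be used. The class names and count-fooable are hypothetical, made up for illustration. Note the default method on <object>: without it, calling the predicate on an object that does not inherit from the mixin would signal an error instead of returning false.

  // Hypothetical classes for illustration.
  define class <my-mixin> (<object>) end;
  define class <plain-thing> (<object>) end;
  define class <fancy-thing> (<plain-thing>, <my-mixin>) end;

  // Default: objects are not fooable unless a more specific method says so.
  define method fooable? ( obj :: <object> ) #f end;
  define method fooable? ( obj :: <my-mixin> ) #t end;

  // Callers ask about the capability, never about the class.
  define method count-fooable ( things :: <sequence> ) => (n :: <integer>)
    size( choose( fooable?, things ) )
  end method;

Calling count-fooable( list( make( <plain-thing> ), make( <fancy-thing> ) ) ) returns 1, and the caller never mentions <my-mixin> at all.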

This applies to classes of all kinds, but I was reminded of this when I came across some old Dylan code I wrote (circa 1997, when I was still young and naive) that used the direct instance test to filter mixin-inheriting objects from objects of the primary class usually associated with the mixin class. I think it’s probably easier to make this kind of mistake when dealing with mixin classes, where the programmer’s attention may be focused on the classes involved (mine apparently was back then), rather than on the capabilities expressed by those classes.

Posted at 04:23 PM    

Dylan Type Declarations and Rapid Prototyping


Dylan’s type declarations are optional. Combined with a flexible, polymorphic type system, this can be a boon for real-world development cycles that include rapid prototyping and iterative refinement.

In Dylan, type declarations are optional. For example, we can write a factorial function like so:

  define method factorial (x)
    if (x < 1)
      1
    else
      x * factorial( x - 1 )
    end
  end;

Notice that the type of x is not declared, and neither is the return type of the function. If you’re used to languages like C where type declarations are required everywhere, you may be wondering what this means. Is this function unsafe with respect to types? Will it crash if we pass in, say, a string? The answer in both cases is no, but it raises some interesting questions about what it means to be “type-safe.”

Dylan is a type-safe language. By that I mean that if you call this function with any value that cannot be passed to the functions <, *, or -, an error will be signaled by one of those functions when factorial calls them. Typically, this means that you can call factorial with any type of number, including any size of integer or real. In C, this would be a problem, because factorial could accept only the one type it was declared with, such as an integer or a floating-point number. In C it would be an error to call factorial with a string or a struct, regardless of whether <, *, and - could be made to work for those types.

In contrast, you could call the Dylan version with any type for which you define methods on <, *, and -. You could arrange for factorial to work on strings. What the result would mean, exactly, is another topic, but the function would run without signaling a type error.

What this means is that factorial has an implicit type declaration: “Any value for which <, *, and - are defined.” (To be exact, any value for which < can accept that value and the number 1, * can accept that value and the result of calling factorial, and - can accept that value and 1; or any value for which comparing it with 1 using < returns true, in which case the other functions will never be called.)

Now, in practice we might decide early on that we only want to define factorial for numbers. That’s easy enough to do. We just change the definition of factorial to:

  define method factorial (x :: <number>)

This means two things: we can now only call factorial with a number regardless of whether the functions it calls could handle more types, and since we know that those functions handle all types of numbers this function will never signal a type error.

Finally, let’s say we later decide that we really only want to define factorial for integers, in which case we can easily change our code to:

  define method factorial (x :: <integer>)
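
Written out in full, the most refined version might look like the sketch below. The return type declaration is my own addition for illustration; the body is unchanged from the first version:

  define method factorial (x :: <integer>) => (result :: <integer>)
    if (x < 1)
      1
    else
      x * factorial( x - 1 )
    end
  end;

Only the first line differs from where we started, and factorial( 5 ) still returns 120.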

Now, for a well-defined math function like factorial this may not seem very useful, but consider real-world programming. Typically, a program is not completely predefined down to every last statement. If it were, you’d be done; the definition would be the program. In real-world programming, we get a rough idea of what we want a program to do, we write some of it, and use what we learn from trying to code it to refine our definition. This process repeats itself, getting the code and our understanding of what we want it to do closer and closer to a finished product. At some point, this refinement segues into debugging and the focus on creating code diminishes (until somebody decides to change the requirements for our program two weeks before shipping).

This process often starts with prototyping, where even the initial definition is a bit fuzzy (even if we don’t realize it). Of course, we all know we’re supposed to throw away the prototype once we’ve gotten a better handle on the requirements and the program definition, but the reality is that code that works tends to live on (sometimes longer than it should). So, it’s important that programming languages smoothly integrate prototyping and “real” programming using iterative refinement.

One way Dylan supports this development process is by allowing you to quickly write code that works, with either more general types or no type declarations at all, then later add declarations or make them more specific as the requirements and the details of the implementation become better understood. It means that you can specify programs with whatever level of detail you have at any given moment to rapidly “get the code working,” then refine that code, transitioning it smoothly into something you can ship with less disruption to the code along the way.

Posted at 04:19 PM    

Mon - April 19, 2004

Sealing and Implementation Inheritance


Ted Neward has some interesting stuff to say about “sealing” (really, Java’s “final” and C++’s “protected”), but he fails to mention efficiency concerns.

In Ted Neward’s blog he has some interesting stuff to say about “sealing” (really, about Java’s “final” and C++’s “protected”), though he only talks about design and fails to mention efficiency at all. The design issues surrounding implementation inheritance are important to consider, but Dylan sealing is also (perhaps primarily) about efficiency, not inheritance design.

Allowing subclassing at runtime or outside a compiled library imposes a runtime cost. Libraries that use sealing can be compiled more efficiently, because the compiler can take advantage of the boundaries imposed by sealing to eliminate runtime dispatching, which in turn makes inlining possible. That explains why “sealed” is the default in Dylan.

Of course, library designers need to consider making some things open for extension by other, separately compiled libraries. From a design perspective that probably means exporting open abstract classes and only providing methods for reasonable default behaviors; i.e., minimizing the use of implementation inheritance and instead focusing on interface inheritance, as Neward suggests.
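
A minimal sketch of that split between open interface and sealed implementation might look like this. The <shape> and <circle> classes and the area function are hypothetical, invented for illustration, and how much a particular compiler actually optimizes is up to the implementation:

  // Interface: open and abstract, meant to be extended by other libraries.
  define open abstract class <shape> (<object>)
  end class;

  define open generic area (shape :: <shape>) => (result :: <number>);

  // Implementation: sealed (the default), so the compiler knows that
  // no other library can subclass it.
  define class <circle> (<shape>)
    constant slot radius :: <number>, required-init-keyword: radius:;
  end class;

  define method area (shape :: <circle>) => (result :: <number>)
    3.14159 * shape.radius * shape.radius
  end method;

  // Promise that no further methods on area will apply to <circle>,
  // so a call on a known <circle> need not be dispatched at runtime.
  define sealed domain area (<circle>);

Other libraries can still add their own subclasses of <shape> and their own area methods; only the <circle> part of the world is closed off.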

Posted at 04:51 PM    

Mon - April 12, 2004

CodeCon is Vary Naz


A brief note about my impression of CodeCon.

I've been to two days of CodeCon and so far it’s a lot of fun and has a good vibe to it. It feels a bit like MacHack, although it’s more focused (only one room/track, 12:00-6:00pm each day). Most presentations are demos of working projects. You might say it’s like the entire conference is the MacHack hack show.

Posted at 12:27 AM    

SmartFriends™ U: Languages and Libraries quick postlog


A quick note about SmartFriends™ U: Languages and Libraries.

Went to SmartFriends™ U: Languages and Libraries last weekend and had a great time. Looking forward to future conferences. My talk on the Dylan programming language garnered a lot of interest in the form of an endless wave of questions, which was good and bad, as I finally had to stop answering questions and rush through the latter part of my talk.

Posted at 12:26 AM    

Syntax is everything. Syntax is nothing.


Some random thoughts about programming language syntax and development tools.

The following is a distillation of an IRC conversation I had tonight on #dylan (freenode.net) that started with a discussion of the role of syntax in programming languages. I’ve just excerpted some of my messages as a starting point. I wanted to capture some of the ideas I’ve been mulling over for probably most of my career here in my blog. At some point I’d love to find the time to expand on this and include example screen shots.

Syntax is everything.
Syntax is nothing.
Syntax is UI, and language usability is vital.
Language usability has several factors, including syntax, semantics, library availability, variety and quality, and culture, but syntax may be the easiest of these to do something about, and improving it is at least as important and productive as improving the others.
If you’ve developed interactive GUI applications, just think of the plain text files that typical programming languages use to represent programs as user input, model, and view all rolled into one.
Those really need to be separated out in any good program.
Today, we’re stuck with a very clumsy UI for most programming languages.
Instead, we should represent the program in a more computer-tractable form (e.g., an abstract syntax tree) and separate out presentation and user input.
Allow for both textual and graphical presentations and editing.
Some people think of “pretty printing” and “structured” or “template editors,” but I’m talking about going way beyond those, which are still fairly primitive.

It all starts with this idea: Renaming an identifier and all its occurrences should just be a matter of changing a string in a table somewhere.
There should be no recompilation.
There should only be one change recorded in the revision control system.
It should be instantaneous.
You should immediately be warned about any conflicts with pre-existing names, and in the absence of warnings you should have confidence that the renaming was done correctly.
(Actually, you could allow name conflicts, since the computer knows which are which, but you’d want to resolve them eventually to avoid human confusion.)
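
As a toy sketch of the “string in a table” idea, suppose every use of a name refers to a shared record instead of carrying its own copy of the text. All of the names below are hypothetical:

  // Each binding has exactly one record holding its current name.
  define class <binding-record> (<object>)
    slot binding-name :: <string>, required-init-keyword: name:;
  end class;

  // Each occurrence in the program refers to the record, not to the string.
  define class <name-reference> (<object>)
    constant slot referenced-binding :: <binding-record>,
      required-init-keyword: binding:;
  end class;

  // What an editor would display for an occurrence.
  define method displayed-name (ref :: <name-reference>) => (name :: <string>)
    ref.referenced-binding.binding-name
  end method;

  define method rename-demo () => (name-1 :: <string>, name-2 :: <string>)
    let record = make( <binding-record>, name: "old-name" );
    let first-use = make( <name-reference>, binding: record );
    let second-use = make( <name-reference>, binding: record );
    record.binding-name := "new-name";  // the single change
    values( displayed-name( first-use ), displayed-name( second-use ) )
  end method;

Both values returned by rename-demo() are "new-name", even though only one string was changed.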

If you remove a parameter from a function definition it should remove it at every call site. It could even do dependency tracking and remove any code used exclusively to generate the argument at each call site.
If you swap function parameters, it should reorder them at every call site—and warn you if it alters semantics.
If you move code it should only record one change, it should indicate whether there are any semantic changes, and it should attempt to keep the code working. For example, if you move some code that uses a local variable, it should automatically move the declaration of the local, too, if necessary to keep it before the code that uses it.
You should be able to expand a body of code and see the macros or functions it calls inline. (In fact, it should display the expanded code as it would appear during execution from that particular call site, using as much static analysis as possible to indicate what the code will do at runtime.)
You should be able to view change information and quickly move forward and back to see how a piece of code changed over time.

And, of course, the code should be formatted to maximize readability and adding code should not require typing every comma, brace, and semicolon yourself.
You should never be able to introduce a syntax error.

You should be able to, say, select part of a body of code and tell it to create a new function with that code and replace the original instance with a function call.
You should be able to view expressions formatted like math formulae, or trees, or RPN; whatever works best to make it readable.
You should be able to tell the computer to simplify an expression, or solve for a different variable, just as in a math program.
The environment should allow you to add pre/postconditions and invariants even if the language proper does not.
And it should make it easy to create test rigs for individual functions, where it automatically determines a good set of test arguments and records the results.

See what happens when you get me going on syntax?
I’ve been thinking about this since my first full-time programming job, where there were a dozen different code formatting styles. It drove me nuts.
It all started with the idea of being able to always view code the way I want to, regardless of how other people viewed it.
Part of what attracted me to the Dylan programming language was Apple Dylan’s focus on code presentation and storing everything in databases.
Just a few of these ideas made it into Functional Developer, a Dylan implementation I worked on when it was still named Harlequin Dylan. I cannot overstate how much I would like to take these ideas further.

The basic mantra is: Make it easier to write correct code faster.

All this computing power, yet programmers have some of the poorest tools, compared with shrink-wrap applications.

Posted at 12:25 AM    

