(identity 'myron)

Tue, 27 Sep 2005

Ruby On Rails and Plurals [/programming]

I've been learning Ruby on Rails lately, which is quite clean and elegant, but that's not what this post is about. From the moment I found out it attempts to auto-pluralize object names, I've been itching to bang on it to see what it understands and what it doesn't. The only thing I've heard said about it is that it's smart enough to understand child and children, person and people.... Well then, punk, let's go for a ride around the mess that is the English language, shall we?

% rails plurals
[a bunch of output about creating files and directories]
% cd plurals
% ruby script/generate model Child
[...]
    create  test/fixtures/children.yml

Works as advertised. Let's move on.

% ruby script/generate model Spy
[...]
    create  test/fixtures/spies.yml

Not bad, but still only smart enough to earn a preschool star sticker. What if my app's all about the world of fantasy fiction?

% ruby script/generate model Elf
[...]
    create  test/fixtures/elves.yml

Ok I'm mildly impressed at this poi—oh who am I kidding—it's time to bring out the big guns!

% ruby script/generate model Torpedo
[...]
    create  test/fixtures/torpedos.yml

Wrong! But most people won't even notice. Let's try something similar but more obviously wrong.

% ruby script/generate model Hero
[...]
    create  test/fixtures/heros.yml

Hehe. Now for nouns whose plural is the same as the singular.

% ruby script/generate model Deer
[...]
    create  test/fixtures/deers.yml
% ruby script/generate model Sheep
[...]
    create  test/fixtures/sheeps.yml

And finally the test for some Italian:

% ruby script/generate model Concerto
[...]
    create  test/fixtures/concertos.yml

For the record, this was done in Rails 0.13.1. Some other interesting examples it got right: Octopus/Octopi, Symposium/Symposia, and Vortex/Vortices. Oddly enough, for each one of those, I could find a noun in the same "class" that it would get wrong. It produced Fungus instead of Fungi, Ovums instead of Ova, and Appendixes instead of Appendices. So go figure.
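The hits and misses above are what you would expect from an ordered list of suffix rules plus a table of exceptions. Here is a minimal sketch of that idea, in Python rather than Ruby. The rule tables and the `pluralize` name are illustrative assumptions, not the actual Rails 0.13.1 inflection rules.

```python
import re

# Hypothetical tables for illustration; these are NOT Rails' real rules.
IRREGULAR = {"child": "children", "person": "people"}
UNCOUNTABLE = {"deer", "sheep"}

# (pattern, replacement) pairs, tried in order; the first match wins.
RULES = [
    (r"([^aeiou])y$", r"\1ies"),    # spy -> spies
    (r"lf$", "lves"),               # elf -> elves
    (r"(her|torped)o$", r"\1oes"),  # hero -> heroes; listed word by word
    (r"(x|ch|ss|sh)$", r"\1es"),    # box -> boxes
]


def pluralize(word):
    """Return a plural for a lowercase English noun using the tables above."""
    if word in UNCOUNTABLE:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for pattern, replacement in RULES:
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word)
    return word + "s"  # default rule: concerto -> concertos
```

A blanket "-o becomes -oes" rule would fix Hero and Torpedo but break Concerto, which is why exceptions like these end up being listed one word at a time.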

// posted at 19:58.

Fri, 24 Jun 2005

Side By Side Diff [/programming]

It's been a long week preparing for an open source release at work and I've been stumbling on tons of tools in trying to push my productivity up. One of them is just an option on diff, which I thought was a feature only possible in GUI form. I'm happy to have discovered I was wrong. O side by side diff, where have you been all my life? The output's especially pretty with so much code available at a glance, though it's hard to read at the 8 pt font needed to get all the output fitting on screen (now if someone can tell me how to get diff --side-by-side to do line wrapping, I'll be indebted to you for life). Geek that I am, I took the obligatory screenshot:

[screenshot: diff --side-by-side]

In case you're staring at the code and actually reading it, it's old and crufty code written years ago by I don't know who. And don't even bother trying to hack us from what's shown—we're going open source next week.

// posted at 20:31.

Sat, 23 Apr 2005

Scraping the Web [/programming]

At the recommendation of my friend, Alex, I started reading Moby's Journal, but he doesn't provide an RSS feed and I'm too lazy to click it every day. So I wrote a quick and dirty script in Scheme this morning to scrape his journal into RSS. Lucky for me, his journal is actually correct XHTML with fairly meaningful divs (I wonder who makes his page? Kudos to them.), so I figured I'd try and scrape it using XPath to learn something new.

I know I'm getting to be a rabid Scheme fanatic, but it seriously kicks ass for XML work because of SXML. For the non-Lispers out there, SXML represents an XML document as a tree built from the language's native data structure, the list. That might sound odd, but the advantage comes through when you realize that every function that manipulates lists is now also usable for working with XML.

After figuring out the basics of XPath, it wasn't too hard to figure out from there how to grab everything and transform it into RSS output:


;; Scrape every entry on the journal page and build one SXML RSS item per entry.
(define (journal-items)
  (let* ((sxml (moby-journal->sxml))
         (titles ((sxpath "h2/a/text()") sxml))
         (links  ((sxpath "h2/a/@href/text()") sxml))
         (items  ((sxpath "div[@class='content']") sxml))
         (dates  ((sxpath "span[@class='submitted']/text()") sxml)))
    (map make-moby-srss-item titles links dates items)))

;; Apply the minor fix-ups one scraped entry needs: absolute link,
;; ISO 8601 date, and the entry body serialized to HTML.
(define (make-moby-srss-item title link date item)
  (make-srss-item title
                  (relative->absolute-href link)
                  (moby-date->iso8601 date)
                  (sxml->html item)))

;; Build a simple RSS item node as SXML.
(define (make-srss-item title link date description)
  `(item (title ,title)
         (link ,link)
         (dc:date ,date)
         (description ,description)))

This could probably look a little nicer syntactically, but it's good enough. The journal-items function scrapes all the entries on the page, then uses them to build a list of RSS item nodes in SXML, doing any necessary minor syntax transformations along the way.

The make-srss-item function is a good example of how compact and transparent SXML is in Scheme. It generates a simple RSS 2.0 item node in SXML. It only implements the elements needed for Moby's journal, but it's not hard to imagine generalizing it to implement the full spec. Along with the function that builds the actual SXML RSS out of individual item nodes and another that converts the SXML to an XML string, all the RSS generation comes to a total of 14 lines of code. The scraping takes just another 28 lines.
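For readers who don't speak Scheme, the same scrape-then-build shape can be sketched in Python with the standard library's ElementTree and its limited XPath support. This is only a rough analogue, not the script described above: the sample page, `BASE_URL`, and the function names are made up for illustration, and only the element paths (h2/a, div with class content, span with class submitted) are taken from the Scheme code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Made-up page fragment that follows the structure the XPath queries expect.
SAMPLE_PAGE = """<html><body>
  <div class="node">
    <h2><a href="/journal/1">First entry</a></h2>
    <span class="submitted">2005-04-22</span>
    <div class="content"><p>Hello from the road.</p></div>
  </div>
  <div class="node">
    <h2><a href="/journal/2">Second entry</a></h2>
    <span class="submitted">2005-04-23</span>
    <div class="content"><p>Back in the studio.</p></div>
  </div>
</body></html>"""

BASE_URL = "http://example.org"  # placeholder, not the real journal address


def make_rss_item(title, link, date, description):
    """Build one RSS item element (namespace declaration for dc: omitted)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "dc:date").text = date
    ET.SubElement(item, "description").text = description
    return item


def journal_items(page):
    """Scrape titles, links, dates and bodies, then zip them into RSS items."""
    root = ET.fromstring(page)
    anchors = root.findall(".//h2/a")
    titles = [a.text for a in anchors]
    links = [urljoin(BASE_URL, a.get("href")) for a in anchors]
    dates = [s.text for s in root.findall(".//span[@class='submitted']")]
    bodies = [ET.tostring(d, encoding="unicode").strip()
              for d in root.findall(".//div[@class='content']")]
    return [make_rss_item(t, l, d, b)
            for t, l, d, b in zip(titles, links, dates, bodies)]


items = journal_items(SAMPLE_PAGE)
```

The difference from SXML is visible in `make_rss_item`: the tree has to be assembled through an element API instead of being written down directly as a quoted list.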

For the curious, here's the full source and example output from time of writing.

// posted at 22:52.

Thu, 21 Apr 2005

Is CS a science? [/programming]

Yet another interesting item from ACM news, Is Computer Science Science? (warning: pdf). Unlike a lot of other discussions out there on the same topic, this one argues yes.

What is your profession?

Computer Science.

Oh? Is that a science?

Sure, it is the science of information processes and their interactions with the world.

Oddly enough he later admits that it's also a blend of science, engineering and mathematics. However, to him, the lines are blurry enough that really, it still follows the scientific ways of making successful hypotheses to explain and predict phenomena in the world:

The scientific paradigm, which dates back to Francis Bacon, is the process of forming hypotheses and testing them through experiments; successful hypotheses become models that explain and predict phenomena in the world. Computing science follows this paradigm in studying information processes. The European synonym for computer science—informatics—more clearly suggests the field about information processes, not computers.

I'm not entirely convinced, but it's an interesting read anyway, if not for the article then for the quotes or for the mention, at the end, of our loss of credibility due to hype. The latter is really interesting, since if computer science really is a science, then we've failed miserably at providing quantitative measures for claims like the productivity gains of programming languages and paradigms. Anyway, if you've ever wondered why it's BSc's we have and not other degrees, here's the answer.

// posted at 09:34.

Fri, 10 Dec 2004

The Visitor Pattern [/programming]

For the second time, the question, "Why is it good to use the visitor pattern?" popped up on a test in my compilers course. And for the second time in two days, someone recommended learning patterns because they'll help me become a better programmer. Yes well, true to form, the hype's pushed me over the edge and I'm calling bullshit.

What is the visitor pattern? It's a means of letting you "define a new operation without changing the classes of the elements on which it operates" (source). It also groups together the visitation functionality among a set of classes into an external class, the visitor class, sometimes making things cleaner in terms of code organization.

What is all this saying? The underlying notion here is that you want to visit a set of objects with a common method selected by the type of each object. The pattern is to group the methods into a visitor class. Each class that can be visited then implements an accept method that receives a visitor object and applies the visitor's visit method on itself. Now you might be thinking, what's the inversion for?

This is called dual dispatch because it involves two polymorphic dispatches. The first is the accept function. This dispatch resolves the type of the object that accept is called upon. The second dispatch is the visit method called from the resolved accept method. The second dispatch resolves to the particular function to be executed. (source)

At this point, alarm bells should be sounding off in your head. Why? Recall what we're essentially trying to do: add functionality to a set of objects such that we can visit each one and invoke a method polymorphically based on its type. Now look again at all the scaffolding the pattern's requiring us to build. What's wrong? The inversion required for double dispatch makes it unclear what we're trying to accomplish. Someone reading the code who's not versed in the visitor pattern will be at a loss for understanding what we're getting at. Come to think of it, the prof remarked that very few people got the question right the first time, and so he thought it appropriate as something to ask again. The confusion comes from the fact that the code is unclear in what it's meant to accomplish.
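To make the scaffolding concrete, here is a minimal sketch of the classic visitor arrangement in Python. The node classes (`Num`, `Add`) and the two visitors are hypothetical examples chosen for brevity; they are not taken from the course or from the sources quoted above.

```python
class Num:
    """Leaf node holding a number."""

    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        # First dispatch: Python picks this accept() from the node's type.
        # Second dispatch: we call the visitor method meant for a Num.
        return visitor.visit_num(self)


class Add:
    """Interior node adding two subexpressions."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)


class EvalVisitor:
    """One operation over the tree: compute its value."""

    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)


class ShowVisitor:
    """A second operation, added without touching Num or Add."""

    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return "(%s + %s)" % (node.left.accept(self), node.right.accept(self))


tree = Add(Num(1), Add(Num(2), Num(3)))
value = tree.accept(EvalVisitor())
text = tree.accept(ShowVisitor())
```

Note how the intent ("evaluate this tree") is spread across `accept` methods on every node class plus a separate visitor class. That indirection is the inversion the paragraph above complains about.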

Now you're probably thinking, "Well then, hotshot, what is clearer?" In a language that has two features, multi-methods and higher-order functions, it can be succinctly rendered (here in Lisp syntax by Kaz Kylheku) as something like:

(walk-my-object #'evaluate my-object)

Where evaluate is a multi-method applicable to each object in my-object. A more concrete example is printing objects of different types stored in a list (written by the same person as above, here):

(mapcar #'print-object object-list)

mapcar visits each object in the list calling print-object, which is specialized for each class to print different objects out in readable form. Here, dispatching based on the types of the objects being visited is implicit because it's built into the language. This allows the code to be succinct in communicating the notion of visitation, which the canonical implementation does not.
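A loose Python analogue of the mapcar example uses `functools.singledispatch`. It is only an approximation: it dispatches on the type of the first argument alone, so it is a single-dispatch generic function and not a true multi-method. The `print_object` implementations below are invented for illustration.

```python
from functools import singledispatch


@singledispatch
def print_object(obj):
    """Fallback used when no more specific implementation is registered."""
    return "#<%s>" % type(obj).__name__


@print_object.register(int)
def _print_int(obj):
    return "int %d" % obj


@print_object.register(str)
def _print_str(obj):
    return 'string "%s"' % obj


@print_object.register(list)
def _print_list(obj):
    return "list of %d items" % len(obj)


# The visit itself is one line: map the generic function over the objects.
object_list = [42, "elf", [1, 2, 3], 3.5]
printed = list(map(print_object, object_list))
```

No accept methods and no visitor class are needed here. Adding a new operation means writing another generic function, and adding a new type means registering one more implementation.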

So it's cleaner and easier to read. So what? A pattern is meant to be a common concept implemented externally to the language. But here, it's a concept native to Lisp. Taking this further.... Programming is about being able to abstract and name common concepts so that they need not be repeated over and over again. If we have managed to find common patterns in code that can be applied repeatedly in a mechanical manner without being able to express them as an abstraction—I don't think this is a sign of cleverness, rather, it's a sign that our programming languages have failed us. In the words of Paul Graham:

This practice is not only common, but institutionalized. For example, in the OO world you hear a good deal about "patterns". I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough—often that I'm generating by hand the expansions of some macro that I need to write.

Along the same lines, Peter Norvig found 16 of 23 patterns to be invisible or simpler in languages like Dylan or Lisp. Personally, I'll reserve my judgement until I learn the other 22 patterns, but I also reserve my right to be skeptical of their oft-claimed greatness given the above—and so should you.

Some more discussion along these lines is available here: Are Design Patterns Missing Language Features?

// posted at 21:18.

Mon, 06 Dec 2004

Recursion and an old demon [/programming]

It's been six or seven years since my high school teacher tried to teach me Scheme and recursion after having taught me Pascal, which was a really tough mental jump. He threw at me one of the classic recursive problems, Towers of Hanoi, and for the life of me, I could never understand why the recursive solution worked and how one would come up with it. I couldn't get past wanting to roll out the algorithm step by step like in a for loop.

Well, years later, out of curiosity to see whether or not I get it now, I dug up a site on recursion and the Towers of Hanoi. Lo and behold, the solution looks fairly obvious now... it just took six or seven years. Hehe ;)
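For anyone still fighting the same demon, here is the recursive solution as a short Python sketch (the function and peg names are arbitrary). The trick is to stop unrolling the steps: to move n disks, trust that moving n - 1 disks already works, and use it twice.

```python
def hanoi(n, source, target, spare):
    """Return the list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []  # nothing to move
    return (
        hanoi(n - 1, source, spare, target)    # clear the top n-1 disks out of the way
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # put the n-1 disks back on top of it
    )


moves = hanoi(3, "A", "C", "B")
```

Each call produces one move plus two smaller calls, so n disks take 2^n - 1 moves in total.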

// posted at 14:55.

Tue, 30 Nov 2004

Toto, we're not in Common Lisp anymore [/programming]


(defun gauss (a l)
  "Gauss procedure from pg 285"
  (let* ((n (1- (array-dimension a 0)))
         (s (make-array (list (1+ n)))))
    (loop for i from 1 to n
       do (loop for j from 1 to n
             initially (setf (aref l i) i)
             maximize (abs (aref a i j)) into smax
             finally (setf (aref s i) smax)))
    (loop for k from 1 below n
       do (loop with rmax = 0 and r = 0 and j = 0
             for i from k to n
             do (setf r (abs (/ (aref a (aref l i) k)
                                (aref s (aref l i)))))
             when (> r rmax)
               do (setf rmax r
                        j    i)
             finally (rotatef (aref l j) (aref l k)))
         (loop with xmult = 0
            for i from (1+ k) to n
            do (setf xmult (/ (aref a (aref l i) k)
                              (aref a (aref l k) k))
                     (aref a (aref l i) k) xmult)
              (loop for j from (1+ k) to n
                 do (decf (aref a (aref l i) j)
                          (* xmult (aref a (aref l k) j))))))))

This code is translated directly from the pseudocode in my numerical analysis textbook (and hence isn't very readable). Thanks to Common Lisp's loop facility (yes, it's so complex it's a facility), if you remove the parentheses, you get back the original pseudocode, more or less.

It's kind of funny and scary at the same time... like using a chainsaw to cut a loaf of bread. I don't really know yet if that's a good thing or a bad thing: it makes translating from Algol-like syntax easy, but the result looks nothing like Lisp.
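For comparison, here is roughly the same procedure transliterated into Python, which makes the Algol-like shape underneath the loop forms easier to see. This is a sketch under some assumptions: it uses 0-based indices instead of the 1-based ones above, starts the pivot search at row k rather than 0, names the arrays `scale` and `order` instead of `s` and `l`, and does not guard against singular matrices.

```python
from fractions import Fraction


def gauss(a, order):
    """Eliminate in place with scaled partial pivoting.

    a is an n-by-n list of lists; order receives the pivot row order.
    Multipliers are stored where the eliminated entries used to be.
    """
    n = len(a)
    scale = [0] * n
    for i in range(n):
        order[i] = i
        scale[i] = max(abs(a[i][j]) for j in range(n))  # largest entry per row
    for k in range(n - 1):
        rmax, pivot = 0, k
        for i in range(k, n):
            r = abs(a[order[i]][k] / scale[order[i]])  # scaled candidate pivot
            if r > rmax:
                rmax, pivot = r, i
        order[pivot], order[k] = order[k], order[pivot]
        for i in range(k + 1, n):
            xmult = a[order[i]][k] / a[order[k]][k]
            a[order[i]][k] = xmult  # keep the multiplier
            for j in range(k + 1, n):
                a[order[i]][j] -= xmult * a[order[k]][j]


# Exact arithmetic keeps the example easy to check by hand.
a = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
order = [0, 0]
gauss(a, order)
```

In this small example the second row is chosen as the pivot row because 3/4 beats 1/2 once each candidate is scaled by its row's largest entry.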

// posted at 22:41.

Mon, 18 Oct 2004

On OOP [/programming]

From a Slashdot interview with Rob Pike:

The future does indeed seem to have an OO hue. It may have bearing on Unix, but I doubt it; Unix in all its variants has become so important as the operating system of the internet that whatever the Java applications and desktop dances may lead to, Unix will still be pushing the packets around for a quite a while.

On a related topic, let me say that I'm not much of a fan of object-oriented design. I've seen some beautiful stuff done with OO, and I've even done some OO stuff myself, but it's just one way to approach a problem. For some problems, it's an ideal way; for others, it's not such a good fit.

Here's an analogy. If you want to make some physical artifact, you might decide to build it purely in wood because you like the way the grain of the wood adds to the beauty of the object. In fact many of the most beautiful things in the world are made of wood. But wood is not ideal for everything. No amount of beauty of the grain can make wood conduct electricity, or support a skyscraper, or absorb huge amounts of energy without breaking. Sometimes you need metal or plastic or synthetic materials; more often you need a wide range of materials to build something of lasting value. Don't let the fact that you love wood blind you to the problems wood has as a material, or to the possibilities offered by other materials.

The promoters of object-oriented design sometimes sound like master woodworkers waiting for the beauty of the physical block of wood to reveal itself before they begin to work. "Oh, look; if I turn the wood this way, the grain flows along the angle of the seat at just the right angle, see?" Great, nice chair. But will you notice the grain when you're sitting on it? And what about next time? Sometimes the thing that needs to be made is not hiding in any block of wood.

OO is great for problems where an interface applies naturally to a wide range of types, not so good for managing polymorphism (the machinations to get collections into OO languages are astounding to watch and can be hellish to work with), and remarkably ill-suited for network computing. That's why I reserve the right to match the language to the problem, and even - often - to coordinate software written in several languages towards solving a single problem.

It's that last point - different languages for different subproblems - that sometimes seems lost to the OO crowd. In a typical working day I probably use a half dozen languages - C, C++, Java, Python, Awk, Shell - and many more little languages you don't usually even think of as languages - regular expressions, Makefiles, shell wildcards, arithmetic, logic, statistics, calculus - the list goes on.

Does object-oriented design have much to say to Unix? Sure, but no more than functions or concurrency or databases or pattern matching or little languages or....

Regardless of what I think, though, OO design is the way people are taught to think about computing these days. I guess that's OK - the work does seem to get done, after all - but I wish the view was a little broader.

// posted at 13:46.

Sun, 19 Sep 2004

The return to Java [/programming]

Act of God, perhaps? I managed to avoid using Java in 2.5 of my 4 years here at UVic, an almost all-Java school. But finally, my time has come... in a compilers course, no less, where it's being used like a verbose functional language. See the book, Modern Compiler Implementation in Java, if you don't believe me.

So now I've begun the hunt for tools to help bring out the functional side of Java even more, starting with an interactive interpreter. I've found a few, of which BeanShell looks the most polished. It's quite decent, the one catch being that Java is statement-based rather than expression-based, so nothing gets echoed back and you have to System.out.println() everything manually. It's better than nothing, though. But this just goes to show how backwards working in Java can be. No type inference, no closures, no function-by-function compiler. Who uses this language anyway? Oh right, everyone. :(

// posted at 09:17.

Thu, 16 Sep 2004

Quicksort [/programming]

Had to look up quicksort for a class recently and I stumbled on its entry in Wikipedia. They have sample implementations that are "written in a non-contrived style, characteristic of the respective languages". Here are a few of the interesting ones, starting with J:


sort =: ]`(($:@: ((}.<:{.)#}.)) 
           ,{., 
          ($:@: ((}.> {.)#}.)))  @. (*@#)

Seems like something that would make a Perl programmer drool with envy.


sort :: (Ord a)   => [a] -> [a]
    
sort []           = []
sort (pivot:rest) = (sort [y | y <- rest, y < pivot])
                    ++ [pivot] ++ 
                    (sort [y | y <- rest, y >=pivot])

This one's quite famous since it's often the first piece of code offered in Haskell tutorials.


(defun partition (fun array)
  (list (remove-if-not fun array) (remove-if fun array)))
   
(defun sort (array)
  (if (null array) nil
    (let ((part
            (partition (lambda (x) (< x (car array))) (cdr array))))
      (append (sort (car part))
              (cons (car array) (sort (cadr part)))))))

Common Lisp. I really like this one because to me, it's typical Lisp... That is, it's not the shortest implementation but it achieves a balance between succinctness and clarity. The partitioning phase, for example, makes use of standard sequence functions and achieves an elegant economy of speech while being clear enough to be almost declarative. The only thing odd about this is that, despite the name array, it's working with lists and not arrays.... :P
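For readers more at home outside Lisp, the same partition-then-recurse structure can be sketched in Python. This is a transliteration for illustration, not one of the Wikipedia samples; the name `quicksort` is used instead of `sort` to avoid clashing with built-ins.

```python
def partition(pred, items):
    """Split items into (those satisfying pred, those that do not)."""
    return [x for x in items if pred(x)], [x for x in items if not pred(x)]


def quicksort(items):
    """Return a sorted copy of a list, using the first element as the pivot."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    smaller, others = partition(lambda x: x < pivot, rest)
    return quicksort(smaller) + [pivot] + quicksort(others)


result = quicksort([3, 1, 4, 1, 5, 9, 2, 6])
```

Like the Lisp version, this builds new lists instead of sorting in place, trading the memory efficiency of the textbook algorithm for clarity.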

// posted at 10:04.

Thu, 29 Jul 2004

Great Hackers [/programming]

I was gonna break this story before the slashbots got a hold of it, but I was too lazy and a bit too busy with assignments. What am I talking about? Paul Graham's latest essay, Great Hackers. It's some of the same old, same old; the only difference this time around is that he keeps the Lisp advocacy much more subtle while tooting Python's horn instead.

For example, if your company wants to write some software, it might seem a prudent choice to write it in Java. But when you choose a language, you're also choosing a community. The programmers you'll be able to hire to work on a Java project won't be as smart as the ones you could get to work on a project written in Python.

The only reference to Lisp I can pick up is this line: "I cheat by using a very dense language, which shrinks the court." Which most people wouldn't even pick up without knowing more about the author. You have to wonder why there's such a change in tone from his previous essays.

I'm beginning not to like reading what he writes as much, though. His style of writing is always very blunt, yet delivered in language that cuts through hard issues as well as a butter knife. His words are often much too simple for the ideas they are meant to carry. This makes for a very smooth-looking surface, but one that nevertheless flattens the finer details, the little bumps and imperfections that make things truly interesting in discussions and in life. The tone and diction just don't match the content, I guess is what I'm saying.

// posted at 10:52.

Thu, 15 Jul 2004

A Quote [/programming]

A quote from Hackers & Painters that I caught on to from a Lisp blog:

A programming language is for thinking of programs, not for expressing programs you've already thought of. It should be a pencil, not a pen. Static typing would be a fine idea if people actually did write programs the way they taught me to in college. But that's not how any of the hackers I know write programs. We need a language that lets us scribble and smudge and smear, not a language where you have to sit with a teacup of types balanced on your knee and make polite conversation with a strict old aunt of a compiler.

Honestly, I don't mind talking to a strict old aunt of a compiler, provided that I can forgo the conversation and just compile away anyway if I want to. The only people who are willing to do this seem to be the dynamic languages people, including efforts in the PLT Scheme group.

Looks like a really good book to pick up once poor student syndrome rolls over....

// posted at 13:14.