I love Apple's listserv lists.
I've been subscribed to quite a few for a few years, but generally I've only signed up for particular lists when I find that I'm going to start needing info about that particular topic.
Unfortunately, that means that I don't have all the "back-issues." Apple provides searchable and browse-able archives, but their search engine usually only returns the subject line of each email plus a summary excerpt that tends to be just the email headers from that post. Not very helpful, or at least not as usable as the search capabilities of Mail.
They use Mailman, which would automatically generate browse-able archives as well as make the archives available for download in mbox format. However, they're short-circuiting that part of Mailman and using MHonArc to generate the web archives and ht://Dig to make them searchable.
But, I had to have my own copy of at least some of the lists. A lot of times the answer to my question is located in those "back-issues."
I've fiddled and fiddled for quite a while and finally found a solution. First, you'll need wget, which isn't included in Mac OS X 10.2 or 10.3. I'm using version 1.9.1 of wget and I had no problem with the default configure, make, sudo make install routine. Once you have wget installed, you need to go to lists.apple.com and decide on the list for which you want to create your own mbox archive. I'll use the xgrid-users list in my example.
First, you need to pull down the archive. One of the problems I'd run into is that the archive is served up as webpages, and each page has links that take you back up the hierarchy, making it difficult to pull down exactly what you need and nothing more. Luckily, wget has just the switch we need. To pull down the archives, here's the wget line to use:
wget -r --limit-rate=5k -I archives/xgrid-users http://archives:archives@lists.apple.com/archives/xgrid-users/
I could make you look at the man page (and you should anyway; wget is a great utility with lots of features), but I'll tell you what the options mean. -r tells wget to follow the links it comes upon in the webpages. --limit-rate=5k makes wget try to hold its download speed at or below 5 kilobytes per second to keep from overwhelming Apple's webserver. -I is the magic bit that tells wget to only follow links under the archives/xgrid-users path and not try to grab everything on the website. Last is the URL for wget to start at, including the username and password (both "archives") to get you in.
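If you plan to grab more than one list, the same command can be wrapped in a small helper. This is only a sketch: archive_cmd is a made-up name, and it assumes every list lives under archives/<list-name>/ and accepts the same archives:archives login as the xgrid-users example. It prints the command rather than running it, so you can look it over first.

```shell
#!/bin/sh
# Hypothetical helper: print the wget command for a given list name.
# Assumption: every list sits under archives/<list-name>/ on lists.apple.com
# and uses the same archives:archives login as the xgrid-users example.
archive_cmd() {
    list=$1
    printf 'wget -r --limit-rate=5k -I archives/%s http://archives:archives@lists.apple.com/archives/%s/\n' "$list" "$list"
}

# Usage (review the printed command, then run it):
#   archive_cmd applescript-users
#   eval "$(archive_cmd applescript-users)"
```

Printing first and running second is just a precaution, since a recursive download is not something you want to start with a typo in the path.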
Wget will process away and you'll ultimately be left with a set of directories and files nested the same way that they're served up from the webserver. Next, we need to put all the individual text files that make up the archives together into an mbox formatted file that can be imported into Mail.
If the list you've grabbed was started relatively recently this will be easy. The problem is that there are posts in older lists in which the contents are ordered slightly differently due to the listserv system they were using at the time those messages were posted. If the list is new enough, it won't have the problem since it's always been handled by Mailman (which stores everything correctly). For the xgrid-users list, we can use the simple method since it's only been around since January. Change into the xgrid-users directory and use the find command below to concatenate all the text files together:
find . -name '*.txt' -exec cat -- {} \; >> ~/Desktop/mboxFile
Mail only seems to let you select the directory containing the mbox file, not the mbox file itself, so put the mboxFile that was generated on the desktop into a folder (the default "Untitled Folder" will do). Then open Mail and choose Import Mailboxes... from the File menu. In the dialog box that appears, select the Other option and proceed to the next panel, where the only choice, Mailboxes (mbox format), is already selected (how intuitive!). Proceed and select the folder containing the mbox formatted file you created.
Mail will parse the file and present it in a list of mailboxes to import. If it doesn't show up in the list, that's a big clue that you won't be able to take the easy route, and you can cancel at this point. If it does, go ahead and let Mail import. When it finishes, the imported mailbox will be in an Imported folder in the Mailboxes drawer.
If the mailbox holds about as many messages as there were text files used to create the mbox file, you're done! Weed out any duplicates of list messages you already have and move the imported messages in with your existing archive. If there are far fewer messages in the imported mailbox than there were text files, or a lot of messages contain multiple mail messages or only partial messages, then you may want to remove the imported messages and continue on below.
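To get a rough idea of how the counts should compare before you import, something like the following sketch may help. The function names are made up for illustration. It assumes each text file holds one message and that each message begins with a "From " separator line, which is what the mbox format expects; a body line that happens to start with "From " would inflate the count, so treat the numbers as approximate.

```shell
#!/bin/sh
# Count the .txt files under a directory (assumed: one archived message each).
count_txt_files() {
    find "$1" -type f -name '*.txt' | wc -l | tr -d ' '
}

# Count mbox separator lines. In mbox format each message starts with a
# line beginning with the word "From" followed by a space.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# Move an mbox file into a folder of its own so that Mail's import dialog
# can select it. The folder name is up to you; Mail does not care.
wrap_in_folder() {
    mkdir -p "$2" && mv "$1" "$2/"
}

# Usage, from inside the downloaded list directory:
#   count_txt_files .
#   count_mbox_messages ~/Desktop/mboxFile
#   wrap_in_folder ~/Desktop/mboxFile ~/Desktop/"Untitled Folder"
```

If the two counts are far apart, that is a hint, before you even open Mail, that the simple concatenation didn't produce a clean mbox file.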
Now, if it seems you're going to have to take the long route, that's OK. A while back I wanted to grab the applescript-users list, which it turns out contains over 72,000 messages and has been running since the beginning of January 1995. It certainly had some oddly formatted messages in the archive. I wrote a Perl script which processes the archive of text files and outputs everything ordered correctly so that Mail will recognize and import the result as an mbox formatted file. So, while it was a longer route for me, it will be a fairly quick extra step for you! I've posted the Perl script on the Satellite Of Love.
I called the script concat.pl. You'll have to set the permissions to executable on the script. Then, assuming the script is on your desktop, change into the lists.apple.com/archives/ directory and execute the concat.pl script on xgrid-users (or other list archive directory you've grabbed), sending the output to the mboxFile with this command:
~/Desktop/concat.pl xgrid-users > ~/Desktop/mboxFile
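The preparation steps described above (making the script executable and changing into the archives directory) can be bundled into one small function. This is only a sketch: run_concat is a made-up name, and it assumes the script takes the list directory as its only argument and writes the mbox data to standard output, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: make the script executable, then run it against one
# list directory and save its output as an mbox file.
#   $1 = path to the script (use an absolute path, since we change directory)
#   $2 = the archives directory that contains the list directories
#   $3 = list directory name, e.g. xgrid-users
#   $4 = output mbox file
run_concat() {
    script=$1; archives=$2; list=$3; out=$4
    chmod +x "$script" || return 1
    # The subshell keeps the directory change from leaking into your session.
    ( cd "$archives" && "$script" "$list" ) > "$out"
}

# Usage:
#   run_concat ~/Desktop/concat.pl lists.apple.com/archives xgrid-users ~/Desktop/mboxFile
```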
From there, follow the directions above for importing the mboxFile into Mail.
I have to admit that this script may not catch every variation on text file format, but it comes pretty close. There's no guide to what format the older text files were in, and they don't seem to follow any consistent format. Note that even though not every message may get separated out cleanly, none of them are actually lost. At worst, you might have a few messages in Mail that are actually several messages in one. The search function will still find the message if it has content pertinent to your search.
Last, I'd like to recommend that you don't try this during the daytime hours of Cupertino, CA. These lists aren't spread across Akamai's servers. They're only on one server, so a daytime download can take forever as well as make the archives slow to access for others. Also, make sure to do some sort of bandwidth limiting like the wget switch used above.
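If you'd rather have the shell check the clock for you, here's a minimal sketch. The function names are made up, and the 8:00 to 18:00 Pacific window is just my guess at what counts as "daytime" in Cupertino; adjust it as you see fit.

```shell
#!/bin/sh
# Print the current hour (00-23) in Cupertino's time zone.
cupertino_hour() {
    TZ=America/Los_Angeles date +%H
}

# Succeed (exit status 0) when the given hour falls outside an assumed
# 08:00-18:00 daytime window. The window is an illustrative guess.
is_off_hours() {
    hour=${1#0}        # drop one leading zero: "08" becomes "8"
    hour=${hour:-0}    # a bare "0" became empty above, so restore it
    [ "$hour" -lt 8 ] || [ "$hour" -ge 18 ]
}

# Usage:
#   if is_off_hours "$(cupertino_hour)"; then
#       echo "OK to start the download"
#   else
#       echo "It's daytime in Cupertino; try again later"
#   fi
```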