The Disk Wizard File Searcher is a simple little program I made to help
find duplicate files in the exported text files Disk Wizard can create.
Disk Wizard can find duplicates on its own, but I personally have a humongous
library (over 35,000 items on 19 disks and counting), and Disk Wizard usually
chokes up on me and won't do it. If Disk Wizard works for you when it comes to
finding duplicates,
you don't need this program. By keeping it simple and stupid, I've become
pretty sure my program can search any library, given enough time and memory.
How much time and memory it needs depends on how big your library is. My current
method uses an N-squared algorithm, which is not known for its blazing speed.
My 35,000-item library took 15 minutes on a 210 MHz PowerMac and nearly
90 minutes on a Performa 636. As for memory, the program needs about 150
KB for itself, plus up to a few hundred bytes for each file you're searching
through. The program will give you some recommendations every time you
run it, but keep in mind that I kept these figures fairly pessimistic.
If you're interested, here are the formulas it uses in its recommendations
(N being the number of files):
Minimum: 150+(N*0.1) KB
Practical: 150+(N*0.6) KB
Safe Bet: 150+(N*2.1) KB
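If you'd rather see the arithmetic spelled out, here is a minimal Python sketch of those three formulas. The function name is made up for illustration, and this is not the program's actual source, just the same sums written down.

```python
def memory_recommendations(n_files):
    """Return (minimum, practical, safe_bet) memory sizes in kilobytes.

    Hypothetical helper: it only restates the three formulas above,
    where n_files is the number of files in the exported list.
    """
    base_kb = 150  # roughly what the program needs for itself
    minimum = base_kb + n_files * 0.1
    practical = base_kb + n_files * 0.6
    safe_bet = base_kb + n_files * 2.1
    return minimum, practical, safe_bet


labels = ("Minimum", "Practical", "Safe Bet")
for label, kb in zip(labels, memory_recommendations(35000)):
    print(f"{label}: {kb:.0f} KB")
```

For a 35,000-item library this works out to roughly 3,650 KB, 21,150 KB, and 73,650 KB.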
First of all, you need to get a copy of Disk Wizard. I've got a
copy right here for you, or you can go to the real Disk Wizard Web
Page and get a copy there. Once you have your libraries scanned in,
select "Export as text..." from the File menu (or hit
command-E). Make sure that you indent sub-folders by 2 spaces, and
save the file with the name "List" (case insensitive) in the
same folder as the Disk Wizard File Searcher (DWFS). Open up DWFS and answer a
few yes/no questions, and DWFS will start searching. The progress bar
isn't entirely accurate (for example, the program is nearly 75% done when
it hits the 50% mark), but it will give you an idea of whether you have time to go
on vacation before it finishes. When it finally finishes, you'll have a
new (plain text) file called "Match List" that lists all the
matching files it found.
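The 2-space indent matters because it is the only clue to which folder each item lives in. Here is a rough Python sketch of how a full path could be rebuilt from that indentation. It assumes each line holds nothing but an item name, which is a simplification (the real export layout isn't described here), and the function name and the ":" separator are my own choices for illustration.

```python
def paths_from_indented_listing(lines, indent_width=2, separator=":"):
    """Rebuild a full path for every entry of an indented listing.

    Assumption: nesting depth = leading spaces // indent_width, and the
    rest of the line is just the item's name.
    """
    stack = []  # names of the enclosing entries, one per depth level
    paths = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // indent_width
        # Forget everything at this depth or deeper, then record this entry.
        del stack[depth:]
        stack.append(name)
        paths.append(separator.join(stack))
    return paths


listing = ["My Disk", "  Apps", "    Editor", "  Notes"]
for path in paths_from_indented_listing(listing):
    print(path)
```

With a different indent setting the depths would come out wrong, which is presumably why the export has to use exactly 2 spaces.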
Are you going to use this file in a database program? If you
answer no, the matches will be separated by lines of asterisks to make
the list easier to read, and the file will have a few lines at the top
describing the matching done. If you answer yes, these elements will be
left out so that the list may be easily imported into a database (such
as the database in ClarisWorks).
Do you want to ignore files named 'Icon'? Whenever a folder has
a custom icon, it gets an invisible file named "Icon". Since
these icon files tend to have the same size, type, and creator, they can
add a lot of unneeded names to your match list. Answering yes will keep
these annoying little files off your list.
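To show why those icon files pile up, here is a small Python sketch of an N-squared search that compares every pair of entries. The matching rule (same size, type, and creator) is an assumption for illustration, and so are the FileEntry record and function names. The program's real criteria may differ.

```python
from collections import namedtuple

# Hypothetical record for one line of the exported list.
FileEntry = namedtuple("FileEntry", "name path type creator size")


def is_match(a, b):
    # Assumed rule: two entries match when size, type, and creator agree.
    return (a.size, a.type, a.creator) == (b.size, b.type, b.creator)


def find_matches(entries, ignore_icon_files=True):
    """Compare every pair of entries: N*(N-1)/2 comparisons in total."""
    if ignore_icon_files:
        # strip() tolerates stray whitespace around the name.
        entries = [e for e in entries if e.name.strip() != "Icon"]
    matches = []
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if is_match(entries[i], entries[j]):
                matches.append((entries[i], entries[j]))
    return matches
```

Every folder with a custom icon adds one more "Icon" entry, and under a rule like this each of them pairs up with all the others, so filtering them out first keeps the list short.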
This is how I do it with ClarisWorks: I create a new database with fields
for name, path, type, creator, size, date created, and date modified (in
that order). Then under the File menu, I select the "Insert..."
command and open the match list. A few taps of the return key later, everything's
in the database. What you do with it is up to you.
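If you want to build a compatible file yourself, here is a minimal Python sketch that writes records in that same field order. It assumes the database expects one record per line with tab-separated fields. That layout is my guess, not something confirmed above, and the function names are hypothetical.

```python
import csv
import io

# Field order taken from the database layout described above.
FIELDS = ("name", "path", "type", "creator", "size",
          "date created", "date modified")


def write_match_records(records, out):
    """Write each record (a dict keyed by FIELDS) as one tab-separated line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in records:
        writer.writerow([record[field] for field in FIELDS])


def format_match_records(records):
    """Convenience wrapper that returns the text instead of writing a file."""
    buffer = io.StringIO()
    write_match_records(records, buffer)
    return buffer.getvalue()
```

Leaving out the asterisk lines and the header text, as the database option does, is what lets every line be read as a record.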
Email me! The program is totally free of any charge whatsoever, and I would really just like to hear from people who actually took the time to download it and try it. I'll even send you my source code if you ask nicely.
I'm working on version 1.1, in which I hope to incorporate
less demanding memory requirements and smart folder matching (i.e. if everything
in folder X matches everything in folder Y, then you just get the matching
folders in the output and not all their contents). If I have time, I'll
try making it faster also. Eventually I want to totally redo the interface
so it actually looks like a Mac program. This is my first real program
for the Mac, so bear with me, OK? If you have any other improvements you'd
like to see, let me know!
Thanks go mainly to François Pottier, for writing his incredibly excellent program Disk Wizard in the first place. If you found this program before finding Disk Wizard, you can download Disk Wizard from his homepage at: http://pauillac.inria.fr/~fpottier/
You have been reading: http://sdcc8.ucsd.edu/~bblovett/dwfs.html
Page last modified: 8/12/97
Page and art by: Brian Blovett