archive-cd paperless archiving solutions logo
 

| Home | SHOP | About Us | Affiliates | FAQ's | Privacy | Service | Terms | Contact Us |

archive-cd scanners
» Fujitsu
» Kodak
» Canon
» Surplus
»
Rent
» Lease

archive-cd software

» Adobe  
» CompuThink
» Encite - EMR
» FileMaker Pro
» GPC Data

» FileNexus
» FREEScanimator
» FREE Software

special offers authorize net
Shop Securely

mastercard visa
» FREE Shipping Details


file maker pro girlTIPS & TRICKS

Find Duplicates Intelligently
This tip explores two methods for identifying duplicates, both of which integrate some calculation and scripting techniques.
By Stuart Gripman

Few problems cut across databases as pervasively as duplicate records. Although record duplication may be most prevalent in databases containing contact information, such as names and addresses, it has the potential to show up anywhere. It comes as no surprise that just about every client I've worked with over the years has expressed a need to identify and remove duplicates.

Finding redundancy seems simple enough. Just find duplicates and delete one of them, right? Well, sure, but how do you define "duplicate?" Consider this example. Your database has the following two records:

Ben Finn
412 Arlington Boulevard
Eastsound, WA 98070

Benjamin Finn
412 Arlington Bl.
Eastsound, WA 98070-1222

Likely, these two records are for the same person, and you should purge one, but if you just went looking for identical records, these two records wouldn't come up because there are small differences on each line. Your method for identifying potential duplicates must allow for "loose" matches if you're going to catch these two "duplicates." In this tip, I'll explore two methods for identifying duplicates, both of which will integrate some calculation and scripting techniques.

The "human knows best" technique
Typically the chore of finding duplicates is going to involve some judgment calls. Somebody has to look over the records and decide whether a given record is a duplicate. Although it can be a real chore, FileMaker Pro does provide some tools to ease the burden. Using some scripting and calculations, you can let FileMaker Pro generate a list of likely duplicates and present them to you for review.

The first step is to create a calculation field. In this scenario, finding duplicates based on any single field will return a large found set with few true duplicates. The new calculation field will combine the contents of multiple fields. Your script will search for duplicates using the calculation field, thereby reducing the number of false duplicates. For this exercise, I chose to combine the Last Name and ZIP code fields. Finding duplicate last names or ZIP codes alone is sure to return many records that aren't true duplicates. When you search on the combination of the two, FileMaker Pro will find fewer records that are more likely to be duplicates.

In Define Database, create a new calculation field named "zct_DuplicateCalc" and enter the following formula:

NameLast & Zip

The ampersand concatenates -- or "glues" -- NameLast and ZIP together.

Now you're ready to write the script that simply searches for duplicate values in your "zct_DuplicateCalc" field and presents the sorted results.

When you run the script, FileMaker Pro finds the records most likely to be duplicates, sorts them so they're appropriately grouped together, and switches to a list layout where you can review the results. In the example file, the script locates 22 records or 11 potential duplicates out of 500 records in the database. Contrast this result with searching for duplicate data in the last name field alone; that results in 58 records. Finding duplicate ZIP codes alone returns 192 records.

Now that you have a manageable list of likely duplicates, you can check them over and manually delete the unneeded records.

The "blunt force" technique
The "blunt force" technique is a script that finds duplicates based on your criteria and deletes all but the first duplicate without pausing for verification. The scripting here is more complex than the previous technique, but it requires no human review. Automation comes at the cost of flexibility here. This technique deletes records without asking first. Be sure you're developing and testing this technique on backup copies of your data. When you or your script delete a record, it's gone for good.

Before you can write the script, you'll have to add two new fields to the database. Call the first "zgt_LastValue" and set it to be a text field with global storage. The script uses this field to compare records and it doesn't have to appear on any layout. The second field is "zt_DeleteFlag," a regular text field.

The script works like this:
• Find all records with duplicate cities.
• Sort by city so the duplicates line up together.
• Starting with the first record, go to each record and compare its "City" value to the "City" value in the previous record.
• If the "City" values match, mark the record for deletion.
• If there's no match, keep going.
• After checking all records, find the ones marked for deletion and delete them.

Set your expectations
As you can see, defining duplicates and accurately finding them usually requires some human intervention. This task can be much less onerous if you take the time to define what "duplicate record" means for your database first.

The approaches I presented in this issue are designed to be compatible with FileMaker Pro 7 through 8.5. With minor modifications, you can apply them to earlier versions.

ARCHIVE-CD can show you how we can help you streamline
your work environment — and expand your possibilities.

Call 800-323-1868 or visit www.archive-cd.com

| Special Offers | Rebates | Surplus | Leasing | Extended Warranty |


ARCHIVE-CD, LLC
Paperless Archiving Solutions
• 910 Beverly Way • Jacksonville, Oregon 97530   Toll Free: 800-323-1868    Email: info@archive-cd.com
ARCHIVE-CD, LLC © 1999-2009

| Home | Shop | About Us | Affiliates | Ask the Experts | Careers | Customer Service |
| Electronic Medical Record | Free Info | Frequently Asked Questions | PRIVACY | Request Info | Paperless Archiving Solutions |