Spanish Names in Customer Database Filtering - 19 January 2012
Having agreed with the business what is a valid alert, it is possible to look at reducing the number of invalid ones.

I can’t speak for others, but Fircosoft watch-list filtering software is particularly inaccurate when screening Spanish-style names. The issue arises because of the naming conventions used by Spanish-speaking people, the relatively limited number of names and the prevalence of people of Spanish descent involved in drug trafficking.

We can’t do much about there being a lot of Mexicans and Colombians on the sanctions lists but we can look at the way Spanish speaking people name themselves. For them it is traditional to use a composite given name (usually two parts) and two family names, unlike the Northern European convention of using multiple given names and a single family name. The first family name is the father’s paternal family name and the second is the mother’s paternal family name. Thus a male child of "Miguel Luis Cruz Perez" and "Maria Lidia Ruiz Ferreiro" might traditionally be called "José Angel Cruz Ruiz".
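The convention can be sketched in a few lines of Python. This is only an illustration: the `SpanishName` type and both function names are hypothetical, and the parser assumes that the last two words are the family names and everything before them is the composite given name (so multi-word family names such as "de la Cruz" are not handled).

```python
from typing import NamedTuple, Tuple


class SpanishName(NamedTuple):
    given: str      # composite given name, e.g. "José Angel"
    paternal: str   # first family name (father's paternal family name)
    maternal: str   # second family name (mother's paternal family name)


def parse_spanish_name(full_name: str) -> SpanishName:
    """Split 'Given ... Paternal Maternal' into its three parts.

    Assumption: the last two words are the family names.
    """
    words = full_name.split()
    if len(words) < 3:
        raise ValueError(f"too few words for a Spanish-style name: {full_name!r}")
    return SpanishName(" ".join(words[:-2]), words[-2], words[-1])


def child_family_names(father: SpanishName, mother: SpanishName) -> Tuple[str, str]:
    """A child traditionally takes the first family name of each parent."""
    return father.paternal, mother.paternal


father = parse_spanish_name("Miguel Luis Cruz Perez")
mother = parse_spanish_name("Maria Lidia Ruiz Ferreiro")
print(child_family_names(father, mother))  # ('Cruz', 'Ruiz')
```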

Two facts are useful for improving watch-list filtering. First, the given name is a composite, so its parts should not be considered separately: José Angel and José Antonio do not share a given name just because José matches. Second, both family names are used on official documents. That is, while "José Angel Cruz Ruiz" might be known familiarly as "José Cruz", this short form would not be acceptable on official documents.

Given this, we can say with a high degree of confidence that "José Antonio Cruz Garcia", who might also be known as José Cruz, is not the same person as "José Angel Cruz Ruiz". His composite given name is not the same (José Antonio ≠ José Angel) and his mother’s paternal name is also different (Garcia ≠ Ruiz).
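As a rough sketch of that comparison, the Python below compares the composite given name as a whole and each family name separately, then reports which parts differ. The function names are hypothetical, and the same simplifying assumption applies: the last two words of each name are taken to be the family names.

```python
def split_name(full_name):
    """Return (composite given name, paternal name, maternal name).

    Assumption: the last two words are the family names.
    """
    words = full_name.split()
    if len(words) < 3:
        raise ValueError(f"too few words for a Spanish-style name: {full_name!r}")
    return " ".join(words[:-2]), words[-2], words[-1]


LABELS = ("given name", "paternal family name", "maternal family name")


def differing_parts(name_a, name_b):
    """List the name components that differ, comparing the given name as a whole."""
    parts_a = split_name(name_a)
    parts_b = split_name(name_b)
    return [
        label
        for label, a, b in zip(LABELS, parts_a, parts_b)
        if a.casefold() != b.casefold()
    ]


print(differing_parts("José Angel Cruz Ruiz", "José Antonio Cruz Garcia"))
# ['given name', 'maternal family name']
```

Two differing components out of three is what supports the conclusion that these are different people, even though "José Cruz" fits both.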

In the process of finding "close" matches, watch-list filtering software like Fircosoft examines all subsets and orderings of the words in the names of the customer and the list entry. It would thus raise a false alert when matching a customer called "José Angel Cruz Ruiz" with a list entry called "José Antonio Cruz Garcia", by matching just "José Cruz". There are normally some general settings you can make to mitigate this to some extent, but these assume that all the customer names being screened are Spanish and can be treated the same way. This is fine if you can be certain that is the case. However, in countries with a significant mix of Northern European-style names and Spanish ones (such as the United States), or if the data you are screening is for an international group, applying these general settings to non-Spanish names risks missing valid, or even true, alerts.
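To see why word-subset matching produces this false alert, consider the deliberately naive matcher below. It is not Fircosoft's algorithm, only a hypothetical stand-in that alerts whenever two or more words are shared; the function names and the `min_shared` threshold are assumptions made for the illustration.

```python
def shared_words(customer, list_entry):
    """Words of the customer name that also occur in the list entry, in customer order."""
    entry_words = {word.casefold() for word in list_entry.split()}
    return [word for word in customer.split() if word.casefold() in entry_words]


def naive_alert(customer, list_entry, min_shared=2):
    """Raise an alert when enough words are shared, ignoring their roles in the name."""
    return len(shared_words(customer, list_entry)) >= min_shared


customer = "José Angel Cruz Ruiz"
entry = "José Antonio Cruz Garcia"
print(shared_words(customer, entry))  # ['José', 'Cruz']
print(naive_alert(customer, entry))   # True
```

Because the matcher has no notion of which word is a given-name part and which is a family name, "José" plus "Cruz" is enough to trigger the alert.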

The only thing left is to create some post-screening rules to discard this type of alert. What follows is applicable in Fircosoft as this is where my expertise lies but it can no doubt be applied analogously in other watch-list filtering software. Also, I apologise that I can’t give actual code here – Fircosoft would complain.

The first rule I use discards hits in which only one of the two family names matches.
The first and second criteria check that the generated hit concerns two Spanish-style names. This can be satisfied by using appropriate wildcards to determine that both the customer and the list entry are of the required form.

The third criterion checks that the match includes at least one family name and at least one part of the given name.

However, because wildcards are used, this does not tell you how many family names have matched: a wildcard can also match a space character. The fourth criterion is therefore needed to ensure that the matched text contains no space before the separator, so that only one family name is matched.
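The four criteria can be approximated outside Fircosoft as follows. This is a Python sketch, not Fircosoft rule syntax: the regular expression stands in for the wildcards, the function name is hypothetical, and names are assumed to be stored as "Family1 Family2, Given ..." with a comma as the separator.

```python
import re

# Stand-in for the wildcards: exactly two words before the comma, one or more after.
SPANISH_FORM = re.compile(r"[^\s,]+ [^\s,]+, [^\s,]+(?: [^\s,]+)*")


def discard_single_family_name_hit(customer, list_entry, matched_text):
    """Return True when the hit should be discarded under the first rule."""
    # Criteria 1 and 2: both names have the Spanish-style form.
    if not (SPANISH_FORM.fullmatch(customer) and SPANISH_FORM.fullmatch(list_entry)):
        return False
    # Criterion 3: the matched text has something on both sides of the separator,
    # i.e. at least one family name and at least one part of the given name.
    family, separator, given = matched_text.partition(",")
    if not separator or not family.strip() or not given.strip():
        return False
    # Criterion 4: no space before the separator, so only one family name matched.
    return " " not in family.strip()


print(discard_single_family_name_hit(
    "Cruz Ruiz, José Angel", "Garcia Ruiz, José Antonio", "Ruiz, José"))  # True
```

A customer such as "Smith, John" fails criterion 1, so the rule leaves hits on Northern European-style names untouched.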

In the particular case of the Fircosoft software, while this rule is useful in reducing false alerts, it is not complete. When specifying the matched text, Fircosoft only keeps the start and end positions. In our example above, this means that if our customer was called "Cruz Ruiz, José Angel" and the list entry was "Cruz Garcia, José Antonio", the matched text would be shown as "Cruz Ruiz, José", even though it is only "Cruz, José" that actually causes the match. In this case our fourth criterion, that the matched text does not contain a space before the separator, is not satisfied and the hit is not discarded. In fact, the rule will only discard hits where the match is against the maternal family name and one or both parts of the given name.
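The effect of keeping only start and end positions can be reproduced with a small Python sketch. The function below is a hypothetical model of the behaviour described, not Fircosoft's implementation: it returns everything between the start of the first matched word and the end of the last one.

```python
def matched_span(name, matched_words):
    """Return the text from the first matched word to the last, inclusive.

    Unmatched words lying between them are swept into the result.
    """
    positions = []
    for word in matched_words:
        start = name.find(word)
        if start != -1:
            positions.append((start, start + len(word)))
    if not positions:
        return ""
    first_start = min(start for start, _ in positions)
    last_end = max(end for _, end in positions)
    return name[first_start:last_end]


customer = "Cruz Ruiz, José Angel"
print(matched_span(customer, ["Cruz", "José"]))  # Cruz Ruiz, José
print(matched_span(customer, ["Ruiz", "José"]))  # Ruiz, José
```

Only the second call yields a span with no space before the comma, which is why the first rule catches matches on the maternal family name but not on the paternal one.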

To deal with this case, where there is an unmatched family name in the matched text, it is necessary to create individual rules for each list entry. Because of this, the approach is only used for those list entries that are causing a disproportionate number of false alerts. Here, the rule checks for missing words in the matched text. Criteria 1 and 2 are general, but enough versions of criterion 3 are needed to cover all the names in the list entry.

Using the example above, where the list entry is "Cruz Garcia, José Antonio" and the matched text is "Cruz Ruiz, José", the first two criteria are satisfied. We then need four versions of the third criterion, one each for Cruz, Garcia, José and Antonio, to check whether that word is in the matched text. If any of them is missing then we can ignore this hit because, if it were in the customer name, it would also be in the matched text.
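The per-word checks can be sketched in Python as below. Again this is an illustration rather than Fircosoft rule syntax: the function names are hypothetical, the general criteria 1 and 2 are left out, and words are compared whole after stripping the comma separator.

```python
def words_of(text):
    """Split on whitespace and drop the comma separator from each word."""
    return [word.strip(",") for word in text.split() if word.strip(",")]


def missing_entry_words(list_entry, matched_text):
    """Words of the list entry that do not appear in the matched text."""
    matched = {word.casefold() for word in words_of(matched_text)}
    return [word for word in words_of(list_entry) if word.casefold() not in matched]


def discard_hit(list_entry, matched_text):
    """Discard the hit when any word of the list entry is missing."""
    return bool(missing_entry_words(list_entry, matched_text))


entry = "Cruz Garcia, José Antonio"
print(missing_entry_words(entry, "Cruz Ruiz, José"))  # ['Garcia', 'Antonio']
print(discard_hit(entry, "Cruz Ruiz, José"))          # True
print(discard_hit(entry, "Cruz Garcia, José Antonio"))  # False
```

In Fircosoft each word needs its own version of the criterion, whereas here a single loop over the list entry's words does the same job.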

This is just a brief introduction to the things you need to consider when dealing with Spanish names but it hopefully gives enough to inspire some thinking.