Performance of Origins

    Coverage

    Origins normally codes more than 99% of records in customer files of reasonable quality that contain personal and family names. This is made possible by the extensive name bank that OriginsInfo has accumulated.

    The residue will be made up of a small number of unrecognised and extremely rare names, and names that are ambiguous and 'too close to call'.

    Validation

    The proportion of records coded to each cultural group is remarkably consistent with nationally aggregated census data. This is supported by comparative mapping of census data and Origins categories, which shows a very good correspondence between the Origins categories and the census classifications.

    Accuracy

    No classification of cultural origin is 100% accurate and a precise definition of cultural origin is elusive. Most definitions tend to reflect country of birth, language spoken at home, religion, or ancestry. Each has its own limitations and, even where such data is available, none may assist in understanding the drivers of customer attitudes and behaviour.

    Collecting data from consumers on cultural origin is rarely successful because of inaccuracy, non-response, or the absence of an apparent purpose. Where it does occur, collection usually relies on self-completion or self-perception. In the case of census data, reporting is usually aggregated by geography, by classification hierarchy, or both, thereby sacrificing granularity for reasons of practicality and confidentiality.

    The view of OriginsInfo is that a person's name is a very good surrogate for cultural identity, particularly where it makes use of the substantial body of research into name patterns and their meaning.

    The level of accuracy in the CEL codes appended by Origins varies from one code to another. Names of Muslim, Chinese, Vietnamese, Indian and British (Anglo-Saxon and Celtic) origin achieve accuracy rates in excess of 95%, based on client testing where approximately comparable data is available and on anecdotal feedback from the OriginsOnline name profiler.

    Accuracy rates for southern and eastern Europeans, and Armenians, are in the low 90s (per cent), while Hispanic coding achieves rates in the 80s. Slightly lower levels occur with names originating from northern Europe and France. The weakest performance, at 50-80%, is for Aboriginal and Torres Strait Islanders, and members of the Black Caribbean and Jewish communities, where there is a greater tendency to adopt Anglo-Celtic names.

    As would be expected, higher accuracy levels occur at the coarser, Origins Group level.

    Overall, levels of accuracy are very high, largely because the vast majority of Australia's population belongs to the more easily defined cultural groups. However, there are specific cases where accuracy may be impaired:

    • Cross-cultural marriage, where a wife adopts her husband's family name. The instances where this diminishes accuracy are relatively few, and there is often a counter-balancing flow between the cultures, which reduces the significance of any distortion in profiling and analysis.
    • Transliteration from non-Roman script may produce a name that is common in another part of the world. The family name 'Lee' is a case in point, as it is common in both Britain and China. Usually, the personal name minimises the risk of misallocation, but a small number of records will be assigned to an incorrect code.
    • Third-generation migrants may retain a family name from their culture of origin while using an Australian personal name. The use of an Australian personal name will, however, reduce the confidence level assigned to a person with, for example, an Italian family name. Such trends indicate weakening ties with cultural origin.

    The confidence level score in Origins allows users to define cut-offs to exclude those matches that are below an acceptable threshold for use in targeting communications to individuals based on the most likely cultural origin. This is an effective and flexible way of screening out individuals who are less likely to be in the target group.
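
    As a rough illustration of such a cut-off, the sketch below filters coded records by a minimum confidence score before selecting a target group. It is not Origins code: the record layout, the field names (cel_code, confidence), the 0-100 scale, the code label "ITALIAN" and the sample records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CodedRecord:
    """One customer record after coding (hypothetical layout)."""
    name: str
    cel_code: str      # assumed label for the most likely cultural origin
    confidence: float  # assumed to be a score on a 0-100 scale


def select_target_group(records, target_code, min_confidence):
    """Keep records coded to target_code whose confidence meets the cut-off."""
    return [
        r for r in records
        if r.cel_code == target_code and r.confidence >= min_confidence
    ]


# Made-up sample data, for illustration only.
records = [
    CodedRecord("Giovanni Rossi", "ITALIAN", 96.0),
    CodedRecord("Jason Rossi", "ITALIAN", 58.0),  # Australian personal name lowers the score
    CodedRecord("Mei Lee", "CHINESE", 91.0),
]

selected = select_target_group(records, "ITALIAN", min_confidence=70.0)
for record in selected:
    print(record.name, record.confidence)
```

    Raising the cut-off trades reach for precision: fewer individuals are selected, but those who remain are more likely to belong to the target group.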

    In summary, Origins achieves high accuracy and reliability. The Origins codes reflect the best estimate of a customer's cultural, ethnic, and linguistic background and as such, the CEL codes from Origins serve as a probabilistic indicator.

    Online Demonstration

    You can try out any name combination on the OriginsOnline demonstration page. By providing a first name and a family name, you will discover the most likely cultural origin of the name combination. While this search is based on the UK version of the database and does not fully reflect outputs from the Australian database, it gives an indication of the coverage and accuracy of Origins coding.