Anonymisation, Pseudonymisation and De-identification

We offer several options for clients who require anonymisation of data. This service is mainly used by academic researchers working to strict project requirements. One regularly mentioned benefit is that anonymisation usually takes the transcript outside the scope of GDPR, because the people referred to in it can no longer be identified. Pseudonymisation replaces the identifying data in the transcript but retains it in a separate location for future reference. De-identification is very similar to anonymisation, so we have not included a separate option for it below.

Option 1 – Full anonymisation with timestamps

Excludes all names, places and any other factors that could distinguish the respondent from others (this includes any mention of other people and identifiable locations). Each removed item is replaced by a tag with a timestamp, so that the client can find the correct spot in the audio and identify the name, place etc. if required.

<name HH:MM:SS>    <place HH:MM:SS>    <town HH:MM:SS>

<city HH:MM:SS>    <country HH:MM:SS>    <company HH:MM:SS>

<school HH:MM:SS>    <college HH:MM:SS>    <university HH:MM:SS>

<business HH:MM:SS>    <job title HH:MM:SS>    <person HH:MM:SS>

NB: HH:MM:SS is hours, minutes, seconds.

NB: this standard of anonymisation is comparable to the US HIPAA ‘Safe Harbor’ method of removing 18 identifiers from text for health information relating to an individual. Information about the Safe Harbor identifiers is below.
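As an illustration of the timestamped tag format described above, here is a minimal Python sketch that builds a tag from an offset in seconds and reads tags back out of a transcript. The function names `format_tag` and `find_tags` are hypothetical and are not part of our service; the sketch also assumes tag labels are lower case, as in the list above.

```python
import re

# Assumed tag shape, based on the list above: <label HH:MM:SS>
TAG_PATTERN = re.compile(r"<([a-z ]+?) (\d{2}):(\d{2}):(\d{2})>")


def format_tag(label: str, seconds: int) -> str:
    """Build a tag such as <name 00:12:05> from an offset in whole seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"<{label} {hours:02d}:{minutes:02d}:{secs:02d}>"


def find_tags(text: str) -> list[tuple[str, int]]:
    """Return (label, offset in seconds) for every timestamped tag in the text."""
    results = []
    for label, hh, mm, ss in TAG_PATTERN.findall(text):
        results.append((label, int(hh) * 3600 + int(mm) * 60 + int(ss)))
    return results


print(format_tag("name", 725))  # <name 00:12:05>
print(find_tags("I met <name 00:12:05> in <town 01:00:30>."))
# [('name', 725), ('town', 3630)]
```

A list of offsets like this could be used to jump to the relevant point in the audio when a removed detail needs to be checked.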

Option 2 – Full anonymisation without timestamps

Excludes all names, places and any other factors that could distinguish the respondent from others (this includes any mention of other people, holidays, job titles and locations).

<name>    <place>    <town>    <city>    <country>    <company>

<school>    <college>    <university>    <business>    <job title>    <person>

NB: this is a mid-range option and is cheaper than Option 1 above.

Option 3 – Name and place anonymisation only, no timestamps

Removes only names and places from the transcript.

<name>    <place>    <town>    <city>    <country>

NB: alternatively, we can leave the details in but place chevrons around each identifying word, so that you can search for them and choose which to keep or delete. For example, <John Smith> and <Birmingham>.
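To show how chevron marking might be used in practice, the following Python sketch wraps a supplied list of identifying terms in chevrons and then lists every marked term so it can be reviewed. The helper names `mark_terms` and `list_marked` are hypothetical, and the sketch assumes the terms to mark are already known; it does not detect names or places by itself.

```python
import re


def mark_terms(text: str, terms: list[str]) -> str:
    """Wrap each listed term in chevrons, e.g. Birmingham -> <Birmingham>."""
    if not terms:
        return text
    # Longest terms first, so "John Smith" is marked as a whole rather than "John".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(r"<\1>", text)


def list_marked(text: str) -> list[str]:
    """Return every term currently wrapped in chevrons, in order of appearance."""
    return re.findall(r"<([^<>]+)>", text)


marked = mark_terms("John Smith moved to Birmingham.", ["John Smith", "Birmingham"])
print(marked)               # <John Smith> moved to <Birmingham>.
print(list_marked(marked))  # ['John Smith', 'Birmingham']
```

Running `mark_terms` twice on the same text would wrap terms twice, so a sketch like this is best applied once to the unmarked transcript.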

Option 4 – Name anonymisation only

Removes only mentions of the respondent’s name from the transcript.


Option 5 – Pseudonymisation

In place of anonymisation, we can replace names, places and/or identifying factors with pseudonyms supplied by the client. Please note that pseudonymised data is not exempt from GDPR. A full explanation of the difference between anonymisation and pseudonymisation is here.


My name’s Maya and I’m originally from Egypt and I moved to Birmingham.

My name’s <Participant 103> and I’m originally from <Country21> and I moved to <City2>.
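The example above can be sketched in Python as a simple lookup against a client-supplied key. The names `pseudonymise` and `reidentify` and the `key` dictionary are hypothetical illustrations only. The second function shows that anyone holding the key can restore the original text, which is why the key would need to be stored separately from the transcript.

```python
import re


def pseudonymise(text: str, key: dict[str, str]) -> str:
    """Replace each real term with its pseudonym, wrapped in chevrons."""
    if not key:
        return text
    # Longest terms first, so a full name is replaced before any part of it.
    ordered = sorted(key, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(lambda m: f"<{key[m.group(1)]}>", text)


def reidentify(text: str, key: dict[str, str]) -> str:
    """Reverse the substitution using the same key (unknown tags are left alone)."""
    reverse = {pseudonym: real for real, pseudonym in key.items()}
    return re.sub(r"<([^<>]+)>", lambda m: reverse.get(m.group(1), m.group(0)), text)


# Hypothetical key, as it might be supplied by the client and kept apart from the transcript.
key = {"Maya": "Participant 103", "Egypt": "Country21", "Birmingham": "City2"}
original = "My name’s Maya and I’m originally from Egypt and I moved to Birmingham."

pseudonymised = pseudonymise(original, key)
print(pseudonymised)
# My name’s <Participant 103> and I’m originally from <Country21> and I moved to <City2>.
print(reidentify(pseudonymised, key) == original)  # True
```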

The US HIPAA (Health Insurance Portability and Accountability Act 1996) Safe Harbor rule

HIPAA sets out a definition of acceptable anonymisation of data in the US, and it is a useful checklist for researchers deciding what can or needs to be removed. A summary of the Safe Harbor list is below; we have removed some entries for brevity. The source for the list can be found here:

The following identifiers of the individual or of relatives, employers, or household members of the individual [are to be removed]:

(A) Names

(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code …

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older

(D) Telephone numbers

(E) Fax numbers

(F) Email addresses

(G) Social security numbers

(H) Medical record numbers

(I) Full-face photographs and any comparable images

(L) Vehicle identifiers and serial numbers, including license plate numbers

(M) Device identifiers and serial numbers

(N) Web URLs

(O) Internet Protocol (IP) addresses

(P) Biometric identifiers, including finger and voice prints

(R) Any other unique identifying number, characteristic, or code
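Some of the identifiers in this list follow a recognisable pattern and can be flagged automatically for a human to review. The Python sketch below is a rough illustration only: it flags email addresses (F), web URLs (N) and IPv4 addresses (O) using deliberately simple patterns, and applies the age rule in (C) by grouping ages over 89 into a single category. The names `flag_identifiers` and `aggregate_age` are hypothetical, the patterns are assumptions that will miss some cases and over-match others, and the sketch is not a complete Safe Harbor check.

```python
import re

# Deliberately simple, assumed patterns; real data will need manual review as well.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "url": re.compile(r"\bhttps?://\S+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def flag_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (identifier type, matched text) pairs for a reviewer to check."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


def aggregate_age(age: int) -> str:
    """Group ages over 89 into one category, as described in item (C)."""
    return "90 or older" if age > 89 else str(age)


print(flag_identifiers("Email me at jo@example.org from 192.168.0.1"))
# [('email', 'jo@example.org'), ('ip', '192.168.0.1')]
print(aggregate_age(93))  # 90 or older
print(aggregate_age(42))  # 42
```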