Sensitive data definitions
This topic describes the logic used to identify sensitive data during content analysis.
To reduce the number of false positives, identical matches are counted as a single match in every group of the logical expressions described in this topic.
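The counting rule above can be illustrated with a minimal Python sketch. The function name and the whitespace stripping are assumptions made for illustration; this topic does not describe how the product normalizes matches before comparing them.

```python
def count_distinct_matches(matches):
    """Count matches, treating identical strings as a single match.

    Assumption: two matches are "identical" when their text is equal
    after surrounding whitespace is stripped.
    """
    return len({m.strip() for m in matches})


# The same number found twice contributes only one match to its group.
hits = ["123-45-6789", "123-45-6789", "987-65-4321"]
print(count_distinct_matches(hits))  # 2
```

Under this reading, a document that repeats one value many times does not reach a detection threshold on its own.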
Protected Health Information (PHI)
Supported languages
- US, UK, English-International
- Finnish
- Italian
- French
- Polish
- Russian
- Hungarian
- Norwegian
- Spanish
Data considered Protected Health Information
The following data is considered protected health information.
- First names and last names
- Address (street, city, county, precinct, zip code, and their equivalent geocodes)
- Phone numbers
- Email addresses
- Social security numbers
- Health plan beneficiary numbers
- Bank account numbers
- URLs
- IP address numbers
- ICD-10-CM codes
- ICD-10-PCS-and-GEMs
- HIPAA
- Other health-care related
- Credit card numbers
Logical expression used for content detection
The logical expression consists of the following strings, joined by the logical operator OR. The OR operator joins the data groups listed above unless the AND operator is specified explicitly. The numbers in brackets represent the number of detected instances that would return a positive detection result.
- Social Security Numbers (5)
- (First names and Last names (3) OR Address (3) OR Phone Numbers (3) OR Email Address (3) OR Bank Account Numbers (3) OR Credit Card Numbers (3)) AND (Social security numbers (3) OR Health plan beneficiary numbers (3) OR ICD-10-CM codes (3) OR ICD-10-PCS-and-GEMs (3) OR HIPAA (3) OR Other Health-care related (3))
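As an illustration only, the PHI expression above could be evaluated as in the following Python sketch. The group keys (such as `names` or `icd10_cm`) are hypothetical labels, and the sketch assumes that a bracketed number means "at least this many distinct matches", which is one reading of the description above.

```python
def phi_detected(counts):
    """Evaluate the PHI expression on per-group counts of distinct matches.

    `counts` maps a hypothetical group key to its number of distinct matches.
    Assumption: a threshold of N is met when the count is N or more.
    """
    def at_least(group, n):
        return counts.get(group, 0) >= n

    identifiers = ["names", "address", "phone", "email",
                   "bank_account", "credit_card"]
    health = ["ssn", "health_plan", "icd10_cm",
              "icd10_pcs_gems", "hipaa", "other_health"]

    # First string: Social Security Numbers (5)
    rule_1 = at_least("ssn", 5)
    # Second string: (any identifier group (3)) AND (any health group (3))
    rule_2 = (any(at_least(g, 3) for g in identifiers)
              and any(at_least(g, 3) for g in health))
    # The two strings are joined by OR.
    return rule_1 or rule_2


print(phi_detected({"names": 3, "icd10_cm": 3}))  # True
print(phi_detected({"names": 10}))                # False
```

In this sketch, names alone never trigger a detection: the second string needs a health-related group as well.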
Personally Identifiable Information (PII)
Supported languages
- US, UK, English-International
- Bulgarian
- Chinese
- Czech
- Danish
- Dutch
- Finnish
- French
- German
- Hungarian
- Indonesian
- Italian
- Korean
- Malay
- Norwegian
- Polish
- Portuguese (Brazil)
- Portuguese (Portugal)
- Romanian
- Russian
- Serbian
- Singapore
- Spanish
- Swedish
- Taiwan
- Turkish
- Thai
- Japanese
Data considered Personally Identifiable Information (PII)
- First names and last names
- Address (street, city, county, zip code)
- Bank account numbers
- Personal and fiscal ID numbers
- Passport numbers
- Social security numbers
- Phone numbers
- Car plate numbers
- Driving license numbers
- Identifiers and serial numbers
- IP addresses
- Email addresses
- Credit card numbers
Logical expression used for content detection
Logical expression for all supported languages except Japanese
The logical expression consists of the following strings joined by the logical operator OR or AND. The numbers in brackets represent the number of detected instances that would return a positive detection result.
- Personal and fiscal ID numbers (5)
- First names and Last names (3) AND (Credit Card Number (3) OR Social Security Number (3) OR Bank Account Number (3) OR Personal and fiscal ID numbers (3) OR Driving license numbers (3) OR Passport Numbers (3) OR Social security numbers (3) OR IP Addresses (3) OR Car plate numbers (3) OR Identifiers and serial numbers (3))
- Phone Numbers (3) AND (Credit Card Number (3) OR Social Security Number (3) OR Bank Account Number (3) OR Address (3) OR Personal and fiscal ID numbers (3) OR Driving license numbers (3) OR Passport Numbers (3) OR Social security numbers (3) OR Car plate numbers (3) OR Identifiers and serial numbers (3))
- (First names and Last names (30) OR Address (30)) AND (Email Addresses (30) OR Phone Numbers (30) OR IP Addresses (30))
- Email Addresses (3) AND (Credit Card Number (3) OR Social Security Number (3) OR Bank Account Number (3) OR Personal and fiscal ID numbers (3) OR Driving license numbers (3) OR Passport Numbers (3) OR Social security numbers (3) OR Car plate numbers (3) OR Identifiers and serial numbers (3))
- Email Address (30) AND (Address (30) OR Phone Numbers (30))
- First names and Last names (30) AND Address (30)
- Phone Numbers (30) AND Address (30)
- First names and Last names (3) AND Bank Account Numbers (3)
- Phone Numbers (3) AND (Credit Card Number (3) OR Bank Account Number (3) OR Social security numbers (3) OR Personal and fiscal ID numbers (3) OR Driving license numbers (3) OR Passport Numbers (3))
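Because the PII expression is a long list of strings with the same shape, it lends itself to a data-driven representation. The sketch below encodes only four of the strings above as an example. The rule encoding, the group keys, and the "at least N" reading of the bracketed numbers are all assumptions made for illustration, not the product's implementation.

```python
# Each rule is a list of clauses that must ALL hold (AND).
# Each clause is a list of (group, threshold) alternatives; ANY may hold (OR).
# Only a subset of the strings listed above is encoded here.
RULES = [
    # Personal and fiscal ID numbers (5)
    [[("personal_id", 5)]],
    # First names and Last names (30) AND Address (30)
    [[("names", 30)], [("address", 30)]],
    # (Names (30) OR Address (30)) AND (Email (30) OR Phone (30) OR IP (30))
    [[("names", 30), ("address", 30)],
     [("email", 30), ("phone", 30), ("ip", 30)]],
    # First names and Last names (3) AND Bank Account Numbers (3)
    [[("names", 3)], [("bank_account", 3)]],
]


def pii_detected(counts, rules=RULES):
    """Return True if any rule is satisfied by the per-group match counts."""
    def clause_holds(clause):
        return any(counts.get(group, 0) >= n for group, n in clause)

    # Rules are joined by OR; clauses inside a rule are joined by AND.
    return any(all(clause_holds(c) for c in rule) for rule in rules)


print(pii_detected({"names": 3, "bank_account": 3}))  # True
print(pii_detected({"names": 29, "address": 30}))     # False
```

Adding another string from the list means appending one more entry to `RULES`; the evaluation function stays unchanged.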
Logical expression for Japanese
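The logical expression consists of the following strings joined by the logical operator OR. The OR operator is used to join different groups if the logical operator AND is not explicitly specified. The numbers in brackets represent the number of detected instances that would return a positive detection result.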
- Social security numbers (5)
- First names and Last names (3) AND (Credit Card Number (3) OR Bank Account Number (3) OR Driving license numbers (3) OR Passport Numbers (3) OR Social security numbers (3))
- First names and Last names (30) AND (Email Addresses (30) OR Phone Numbers (30) OR IP Addresses (30) OR Address (30))
- Address (3) AND (Credit Card Number (3) OR Bank Account Number (3) OR Driving license numbers (3) OR Passport Numbers (3) OR Social security numbers (3))
- Email Address (3) AND (Credit Card Number (3) OR Bank Account Number (3) OR Social security numbers (3) OR Driving license numbers (3))
- Address (5) AND (Email Address (5) OR First names and Last names (5) OR Phone Numbers (5) OR IP Addresses (5))
- First names and Last names (3) AND Bank Account Numbers (3)
- Phone Numbers (3) AND (Credit Card Number (3) OR Bank Account Number (3) OR Address (3) OR Social security numbers (3) OR Driving license numbers (3))
Payment Card Industry Data Security Standard (PCI DSS)
Supported languages
This sensitivity group is language-independent. The PCI DSS data is in English in all countries.
Data considered PCI DSS
- Cardholder data
  - Primary Account Number (PAN)
  - Cardholder Name
  - Expiration date
  - Service code
- Sensitive Authentication Data
  - Full track data (magnetic-stripe data or equivalent on a chip)
  - CAV2/CVC2/CVV2/CID
  - PINs/PIN blocks
Logical expression used for content detection
The logical expression consists of the following strings joined by the logical operator OR. The numbers in brackets represent the number of detected instances that would return a positive detection result.
- Credit Card Number (5)
- Credit Card Number (3) AND (American Name (Ex) (3) OR American Name (3) OR PCI DSS Keywords (3) OR Date (month/year) (3))
- Credit Card Dump (5)
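The three PCI DSS strings above can be read as a single Boolean condition. The following Python sketch shows that reading; the dictionary keys are hypothetical names for the data groups, and the thresholds are treated as "at least N distinct matches", which is an assumption.

```python
def pci_dss_detected(counts):
    """Evaluate the PCI DSS expression on per-group counts of distinct matches."""
    cards = counts.get("credit_card_number", 0)
    dumps = counts.get("credit_card_dump", 0)

    # Supporting groups from the second string, each with a threshold of 3.
    context_groups = ("american_name_ex", "american_name",
                      "pci_dss_keywords", "date_month_year")
    has_context = any(counts.get(g, 0) >= 3 for g in context_groups)

    return (cards >= 5                       # Credit Card Number (5)
            or (cards >= 3 and has_context)  # Credit Card Number (3) AND (...)
            or dumps >= 5)                   # Credit Card Dump (5)


print(pci_dss_detected({"credit_card_number": 3, "date_month_year": 3}))  # True
print(pci_dss_detected({"credit_card_number": 4}))                        # False
```

The second string lowers the card-number threshold from 5 to 3 when names, PCI DSS keywords, or month/year dates appear alongside the numbers.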
Marked as Confidential
Data marked as confidential is detected through a keyword group.
The match condition is weight-based: every keyword has a weight of 1, and content detection is considered positive when the total weight is greater than 3.
Supported languages
- English
- Bulgarian
- Chinese Simplified
- Chinese Traditional
- Czech
- Danish
- Dutch
- Finnish
- French
- German
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Malay
- Norwegian
- Polish
- Portuguese - Brazil
- Portuguese - Portugal
- Russian
- Serbian
- Spanish
- Swedish
- Turkish
Keyword groups
The keyword group for each language contains the country-specific equivalents of the following keywords that are used for the English language (case-insensitive).
- confidential
- internal distribution
- not for distribution
- do not distribute
- not for public
- not for external distribution
- for internal use only
- highly qualified documentation
- private
- privileged information
- for official use only
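The weight-based match for the English keyword group can be sketched in Python as follows. This is an illustration under stated assumptions: matching is done by case-insensitive substring search, and each distinct keyword contributes its weight of 1 at most once, in line with the rule that identical matches are counted as one. The actual matching method used by the product is not described in this topic.

```python
KEYWORDS = (
    "confidential",
    "internal distribution",
    "not for distribution",
    "do not distribute",
    "not for public",
    "not for external distribution",
    "for internal use only",
    "highly qualified documentation",
    "private",
    "privileged information",
    "for official use only",
)


def confidential_weight(text, keywords=KEYWORDS):
    """Sum the weights (1 each) of the distinct keywords found in the text."""
    lowered = text.lower()  # keywords are case-insensitive
    return sum(1 for keyword in keywords if keyword in lowered)


def marked_as_confidential(text):
    """Positive detection when the total weight is greater than 3."""
    return confidential_weight(text) > 3


sample = ("CONFIDENTIAL. For internal use only. Do not distribute. "
          "Contains privileged information.")
print(confidential_weight(sample))     # 4
print(marked_as_confidential(sample))  # True
```

With a weight of 1 per keyword, a single "confidential" stamp is not enough in this sketch; at least four different keywords must be present.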