February 15, 2023
7 Steps to Effective Data Classification
Follow these steps to make sure you stay in compliance with the major data privacy laws.
Overview
In today’s security landscape, data protection is not just a legal necessity; it’s critical to organizational survival and profitability.
Storage is cheap, and organizations have become data hoarders. One day, they think, they’ll get around to mining all that data for something useful. But data hoarding can cause serious issues. Much of what is collected may be redundant, obsolete or trivial (ROT), or unknown (dark), and may not have been touched in years.
Storage may be cheap, but it’s not free. Storing massive amounts of data unnecessarily increases costs and, more importantly, it puts your organization at risk.
Sensitive information stored digitally needs to be properly secured. This includes intellectual property; personally identifiable information (PII) about customers or employees, such as Social Security numbers; protected health information (PHI); and financial account information and credit card details.
Turn the Lights On
To protect data and comply with data protection and privacy requirements such as the European Union General Data Protection Regulation (GDPR), you need visibility into the data you’re collecting and storing so you can determine what’s important and what isn’t. Identify where sensitive data resides, set policies for handling it, implement appropriate technical controls, and educate users about current threats to the data they work with and best practices for keeping it safe.
But this is no easy task. Every organization is different, and there is no one-size-fits-all data protection strategy.
The Role of Data Classification
To ensure effective security, you first need to establish exactly what you’re trying to protect.
Data classification is a critical step. It allows organizations to identify the business value of unstructured data at the time of creation, separate valuable information that may be targeted from less valuable information, and make informed decisions about allocating resources to protect data from unauthorized access.
Information is divided into predefined groups that share a common level of risk, and the security controls required to protect each group are identified. Classification tools can improve the handling of sensitive data and promote a culture of security that raises awareness of data sensitivity and helps prevent sensitive content from being stored on removable media or third-party web portals. Just as products with eye-catching warning labels can change our behavior by making us aware of hazards that could cause injury, visual labels and watermarks such as “Confidential” can remind users to think twice and handle digital data and physical copies more cautiously.
Successful data classification drives the security controls applied to a particular set of data. It can also help organizations meet regulatory requirements, such as those in the GDPR, for retrieving specific information within a set timeframe.
7 Steps to Effective Data Classification
While data classification is the foundation of any effort to ensure sensitive data is handled appropriately, many organizations fail to set the right expectations or choose the right approach. This leads to implementations that become overly complex and fail to produce practical results.
There are 7 steps to effective data classification:
1. Complete a risk assessment of sensitive data.
Ensure a clear understanding of the organization’s regulatory and contractual privacy and confidentiality requirements. Define your data classification objectives through an interview-based approach that involves key stakeholders, including compliance, legal and business unit leaders.
2. Develop a formalized classification policy.
Resist the urge to get too granular: highly granular classification schemes tend to cause confusion and become unmanageable. Three or four classification categories are reasonable. Clearly define employee roles and responsibilities. Policies and procedures should be well defined, aligned with the sensitivity of specific data types, and easy for employees to interpret.
Below is a sample data classification scheme:
Public
Data that may be freely disclosed to the public
- Marketing Materials
- Contact Information
- Price Lists
- etc.
Internal Only
Internal data not meant for public disclosure
- Battlecards
- Sales Playbooks
- Organizational Charts
- etc.
Confidential
Sensitive data that, if compromised, could negatively affect operations
Restricted
Highly sensitive corporate data that, if compromised, could put the organization at financial or legal risk
- Intellectual property (IP)
- Credit Card Information
- Social Security Numbers
- PHI
Each category should detail the types of data included, along with guidelines for handling the data, and the potential risks associated with compromise.
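As a rough illustration of how a scheme like the sample above might be captured in a form other tools can consume, here is a minimal Python sketch. The level names and descriptions come from the sample scheme; the `handling` and `compromise_risk` text, the `Category` type, and the `combined_level` rule (a document mixing data from several categories takes the strictest label) are hypothetical assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Levels from the sample scheme, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Category:
    description: str      # what belongs in this category (from the sample scheme)
    handling: str         # hypothetical handling guideline
    compromise_risk: str  # hypothetical summary of impact if exposed


SCHEME = {
    Level.PUBLIC: Category(
        "Data that may be freely disclosed to the public",
        handling="No restrictions on sharing",
        compromise_risk="Negligible",
    ),
    Level.INTERNAL_ONLY: Category(
        "Internal data not meant for public disclosure",
        handling="Share with employees only",
        compromise_risk="Minor embarrassment or competitive loss",
    ),
    Level.CONFIDENTIAL: Category(
        "Sensitive data that, if compromised, could negatively affect operations",
        handling="Share on a need-to-know basis",
        compromise_risk="Operational disruption",
    ),
    Level.RESTRICTED: Category(
        "Highly sensitive data that, if compromised, could put the "
        "organization at financial or legal risk",
        handling="Named individuals only; encrypt at rest and in transit",
        compromise_risk="Fines, lawsuits, regulatory action",
    ),
}


def combined_level(*levels: Level) -> Level:
    """Assumed rule: content mixing several categories takes the strictest label."""
    return max(levels, default=Level.PUBLIC)
```

Using an ordered `IntEnum` makes "stricter than" a simple comparison, so `SCHEME[combined_level(Level.PUBLIC, Level.RESTRICTED)].handling` yields the Restricted handling guideline.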
It may be worth dividing the top (most sensitive) category into sub-categories that indicate regulatory relevance or any alternative access control models that may be required. Examples of sub-categories that can be added for clarity include:
- PCI (Cardholder) data
- HIPAA-relevant
- GDPR-relevant
- Unpublished financial data
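One way to express sub-categories is as tags that may only be attached to the top level. The sketch below assumes the four-level sample scheme and uses the sub-categories listed above; the `make_label` function and the "Restricted [GDPR, PCI]" label format are hypothetical choices for illustration.

```python
from enum import Enum, IntEnum


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


class SubCategory(Enum):
    """Sub-categories taken from the examples in the text."""
    PCI = "PCI (cardholder) data"
    HIPAA = "HIPAA-relevant"
    GDPR = "GDPR-relevant"
    UNPUBLISHED_FINANCIAL = "Unpublished financial data"


def make_label(level: Level, subs: frozenset[SubCategory] = frozenset()) -> str:
    """Build a label string such as 'Restricted [GDPR, PCI]'.

    Sub-categories are only allowed on the most sensitive level,
    following the suggestion in the text.
    """
    if subs and level is not Level.RESTRICTED:
        raise ValueError("sub-categories apply only to the Restricted level")
    name = level.name.replace("_", " ").title()   # INTERNAL_ONLY -> "Internal Only"
    if not subs:
        return name
    tags = ", ".join(sorted(s.name for s in subs))  # sorted for a stable label
    return f"{name} [{tags}]"
```

Rejecting sub-categories on lower levels keeps the scheme from quietly growing the extra granularity the policy step warns against.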
Once your policy has been completed and communicated, end users should classify all newly created and recently accessed data from that day forward, before turning their attention to legacy data at rest.
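The rollout order described above can be sketched as a simple partition of an inventory around the policy date. This assumes each item carries creation and last-access dates (the hypothetical `Item` type below); ordering the legacy backlog by most recent access is an added assumption, not part of the guidance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    path: str
    created: date
    last_accessed: date


def split_backlog(items: list[Item], policy_date: date) -> tuple[list[Item], list[Item]]:
    """Split items into (classify_now, legacy) around the policy rollout date.

    Items created or accessed on or after the rollout date are classified first.
    Legacy items are returned most recently accessed first (an assumed ordering).
    """
    classify_now: list[Item] = []
    legacy: list[Item] = []
    for item in items:
        if item.created >= policy_date or item.last_accessed >= policy_date:
            classify_now.append(item)
        else:
            legacy.append(item)
    legacy.sort(key=lambda i: i.last_accessed, reverse=True)
    return classify_now, legacy
```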
3. Categorize the types of data.
Determining what types of sensitive data exist within your organization can be challenging. The effort should be organized around business processes and driven by process owners. Consider each business process in turn: tracking the flow of data provides insight into what data needs to be protected and how it should be protected.
Consider the following questions:
- What customer and partner data does your organization collect?
- What data do you create about them?
- What proprietary data do you create?
- What transactional data do you deal with?
- Of all the collected and created data, what is confidential?
4. Discover the location of your data.
After establishing the types of data in your organization, it’s important to catalog all the places data is stored electronically. The flow of data into and out of the organization is a key consideration. How does your organization store and share data internally and externally? Do you use cloud-based services such as Dropbox, Box, OneDrive, etc.? What about mobile devices?
Data discovery tools can help generate an inventory of unstructured data and show exactly where your company’s data is stored, regardless of format or location. These tools also help address difficulties around identifying data owners by providing insights about the users who are handling data. In your discovery efforts, you can search for keywords or for specific types or formats of data, such as medical record numbers, Social Security numbers, or credit card numbers.
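To make the idea of pattern-based discovery concrete, here is a minimal Python sketch that scans text files for SSN-like strings, Luhn-valid card numbers, and keywords. The regular expressions and the `KEYWORDS` list are simplified assumptions; medical record number formats vary by organization, so no pattern is attempted for them. Commercial discovery tools are far more thorough than this.

```python
import re
from pathlib import Path

# Illustrative detectors only; real discovery tools add context checks,
# proximity rules, and many more data types.
# SSN: skips area numbers never issued (000, 666, 900-999), group 00, serial 0000.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")   # 13-19 digits, optional separators
KEYWORDS = ("confidential", "do not distribute")    # hypothetical keyword list


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, int]:
    """Count likely SSNs, Luhn-valid card numbers, and keyword hits in text."""
    cards = [m for m in CARD_RE.findall(text) if luhn_valid(re.sub(r"[ -]", "", m))]
    lowered = text.lower()
    return {
        "ssn": len(SSN_RE.findall(text)),
        "card": len(cards),
        "keyword": sum(lowered.count(k) for k in KEYWORDS),
    }


def scan_tree(root: str, suffixes: tuple[str, ...] = (".txt", ".csv")) -> dict[str, dict[str, int]]:
    """Walk a directory tree and report files with at least one finding."""
    findings: dict[str, dict[str, int]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            counts = scan_text(path.read_text(errors="ignore"))
            if any(counts.values()):
                findings[str(path)] = counts
    return findings
```

The Luhn check is what keeps arbitrary long numbers (order IDs, phone lists) from being reported as card data; the SSN pattern has no equivalent checksum, so it will still produce some false positives.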
5. Identify and classify data.
Only after you know where your data is stored can you identify and then classify it so that it’s appropriately protected. Consider the penalties associated with a loss or breach. For example, what fines can be levied per record for a HIPAA breach involving protected health information? Insight into the potential cost of a compromised data set will help you decide how much to spend protecting it and which classification level to assign.
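The cost reasoning above amounts to simple arithmetic: estimated exposure is roughly the number of records multiplied by a per-record cost, and the result can be compared against thresholds to pick a level. In the sketch below, both the per-record cost you supply and the dollar `THRESHOLDS` are invented placeholders, not actual HIPAA or other regulatory penalty amounts; real assessments also weigh reputational and contractual impact.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical exposure thresholds in dollars, checked from highest to lowest.
THRESHOLDS = (
    (1_000_000, Level.RESTRICTED),
    (100_000, Level.CONFIDENTIAL),
    (1_000, Level.INTERNAL_ONLY),
)


def estimated_exposure(records: int, cost_per_record: float) -> float:
    """Rough breach cost: number of records times an assumed per-record cost."""
    return records * cost_per_record


def level_for_exposure(exposure: float) -> Level:
    """Map an estimated exposure to the first level whose threshold it meets."""
    for threshold, level in THRESHOLDS:
        if exposure >= threshold:
            return level
    return Level.PUBLIC
```

For instance, `level_for_exposure(estimated_exposure(20_000, 50.0))` returns `Level.RESTRICTED` under these placeholder thresholds, since 20,000 records at $50 each comes to $1,000,000.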
Commercial classification tools support data classification initiatives by helping determine the appropriate classification and then applying the label either to the item’s metadata or as a watermark.
Robust classification systems offer user-driven, system-suggested and automated capabilities:
- User-driven: the user chooses from a menu of tailored data classification options.
- System-suggested: the system detects content within a data item and then offers classification options for the user to select.
- Automated: the system selects the appropriate classification based on analysis engines, with limited (if any) user input.
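The three modes listed above can be contrasted in a short Python sketch. The detection `RULES`, the `classify` function, and the `choose` callback (standing in for whatever user interface a real tool provides) are all hypothetical. In this sketch the suggested mode offers only levels at or above the detected minimum, which is one possible design choice rather than how any particular product behaves.

```python
import re
from enum import IntEnum
from typing import Callable, Optional


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical detection rules: (pattern, minimum level it implies).
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Level.RESTRICTED),       # SSN-like
    (re.compile(r"\bconfidential\b", re.IGNORECASE), Level.CONFIDENTIAL),
    (re.compile(r"\binternal use\b", re.IGNORECASE), Level.INTERNAL_ONLY),
]

Chooser = Callable[[list[Level]], Level]


def suggest(text: str) -> Level:
    """Strictest level implied by any matching rule (Public if none match)."""
    found = [level for pattern, level in RULES if pattern.search(text)]
    return max(found, default=Level.PUBLIC)


def menu(minimum: Level = Level.PUBLIC) -> list[Level]:
    """Options presented to the user, never below the given floor."""
    return [lvl for lvl in Level if lvl >= minimum]


def classify(text: str, mode: str, choose: Optional[Chooser] = None) -> Level:
    if mode == "automated":             # system decides, no user input
        return suggest(text)
    if choose is None:
        raise ValueError("user-driven and suggested modes need a chooser")
    if mode == "user":                  # full menu, user decides
        return choose(menu())
    if mode == "suggested":             # detection narrows the menu first
        return choose(menu(suggest(text)))
    raise ValueError(f"unknown mode: {mode}")
```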
6. Enable controls.
Establish baseline cybersecurity measures and define policy-based controls for each data classification label to ensure the appropriate solutions are in place. High-risk data requires more advanced protection, while lower-risk data requires less. By understanding where data resides and the organizational value of the data, you can implement appropriate security controls based on associated risks. Classification metadata can be used by data loss prevention (DLP), encryption and other security solutions to determine what information is sensitive and how it should be protected.
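A per-label control policy can be as simple as a lookup table that a DLP-style check consults. The sketch below assumes the four-level sample scheme; the specific controls, the `POLICY` values, and the action names in `is_allowed` are hypothetical, and each organization would set its own.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    allow_removable_media: bool
    allow_external_share: bool   # third-party web portals, personal email, etc.
    named_access_only: bool      # access limited to explicitly listed people
    watermark: Optional[str]


# Hypothetical baseline: controls tighten as sensitivity increases.
POLICY = {
    Level.PUBLIC: Controls(False, True, True, False, None),
    Level.INTERNAL_ONLY: Controls(False, True, False, False, "Internal Only"),
    Level.CONFIDENTIAL: Controls(True, False, False, False, "Confidential"),
    Level.RESTRICTED: Controls(True, False, False, True, "Restricted"),
}


def is_allowed(level: Level, action: str) -> bool:
    """DLP-style decision for a labeled item and a requested action."""
    controls = POLICY[level]
    if action == "copy_to_removable_media":
        return controls.allow_removable_media
    if action == "share_externally":
        return controls.allow_external_share
    raise ValueError(f"unknown action: {action}")
```

Keeping the policy as data rather than scattered `if` statements makes it easier to review and update as the classification scheme evolves.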
7. Monitor and maintain.
Be prepared to monitor and maintain the organization’s data classification system, making updates as necessary. Classification policies should be dynamic. You need to establish a process for review and update that involves users to encourage adoption and ensure your approach continues to meet the changing needs of the business.
Be Selective
Full data classification is an expensive and cumbersome activity that few companies are equipped to handle. A good retention policy can help whittle down data sets and facilitate your efforts. Start by selecting specific types of data to classify in line with your confidentiality requirements, adding more security for increasingly confidential data.
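A retention rule can shrink the classification workload before it starts. This minimal sketch flags items untouched beyond a retention period for disposal review, while anything under legal hold is always kept. The three-year default, the `legal_hold` flag, and the `Item` type are assumptions for illustration; actual retention periods depend on your legal and regulatory obligations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    path: str
    last_accessed: date
    legal_hold: bool = False


def triage(items: list[Item], today: date, retention_days: int = 3 * 365) -> tuple[list[Item], list[Item]]:
    """Split items into (keep_and_classify, review_for_disposal)."""
    cutoff = today - timedelta(days=retention_days)
    keep: list[Item] = []
    dispose: list[Item] = []
    for item in items:
        # Legal holds override the retention rule; recent items stay in scope.
        if item.legal_hold or item.last_accessed >= cutoff:
            keep.append(item)
        else:
            dispose.append(item)
    return keep, dispose
```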
All Data Is Not Created Equal
From the time information is created until it is destroyed, data classification can help your organization ensure it is effectively protected, stored and managed. Putting data classification at the heart of your data protection strategy allows you to reduce risks to sensitive data, enhance decision-making and increase the effectiveness of DLP, encryption and other security controls.
By creating a straightforward classification scheme, comprehensively assessing and locating data and implementing the right solutions, your organization can ensure that sensitive data is handled appropriately and reduce threats to your business.