Tag-based Data Masking in Snowflake

Spread the love

1. Introduction

In our previous articles we have discussed about tagging and data masking features in Snowflake. The tagging feature helps in categorizing and grouping database objects whereas the data masking feature helps in masking sensitive data in real-time. The best of the both features could be combined in Snowflake, referred to as Tag-based Data Masking which simplifies the data protection process in an Organization.

Consider there are multiple columns spread across multiple tables whose data to be masked. Instead of applying a masking policy on each column individually, all these sensitive columns can be grouped under a tag and a masking policy can be applied directly on the tag. This is referred to as Tag-based data masking in Snowflake.

There are also several restrictions within which the data masking applies on a column when masking policy is set on a tag which we will discuss in the further sections of the article.

2. Benefits of Tag-based Data Masking

Below are the benefits of Tag-based Data Masking

  • Assigning and managing masking policies set on tags is simpler compared to those set on individual columns.
  • Multiple masking policies (one for each data type) can be assigned to a tag which protects columns spread across multiple tables.
  • Assigning a masking policy on a tag automatically protects the columns which will be assigned to the same tag in the future.

3. Steps to perform Tag-based Data Masking in Snowflake

Follow below steps to perform Tag-based Data Masking in Snowflake.

  • Create a custom role with Tagging Privileges
  • Create a custom role with Masking Privileges
  • Assign the Masking and Tagging custom roles to existing Role
  • Create and apply Tags on columns of a table
  • Create a Masking Policy
  • Assign a Masking Policy to a tag
  • Verify the masking rules by querying data

3.1. Create a custom role with Tagging Privileges

The below SQL statement creates a custom role TAG_ADMIN in Snowflake.

USE ROLE USERADMIN;
CREATE OR REPLACE ROLE TAG_ADMIN;

The below SQL statement grants privileges to create and apply tags on database objects to the role TAG_ADMIN.

USE ROLE ACCOUNTADMIN;
GRANT CREATE TAG ON SCHEMA DEMO_DB.PUBLIC TO ROLE TAG_ADMIN;
GRANT APPLY TAG ON ACCOUNT TO ROLE TAG_ADMIN;

3.2. Create a custom role with Masking Privileges

The below SQL statement creates a custom role MASKING_ADMIN in Snowflake.

USE ROLE USERADMIN;
CREATE OR REPLACE ROLE MASKING_ADMIN;

The below SQL statement grants privileges to create and apply masking policies to the role MASKING_ADMIN.

USE ROLE ACCOUNTADMIN;
GRANT CREATE MASKING POLICY ON SCHEMA DEMO_DB.PUBLIC TO ROLE MASKING_ADMIN;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE MASKING_ADMIN;

3.3. Assign the Masking and Tagging custom roles to existing Role

By default, the custom roles created do not have access to any database nor warehouse. These role needs to be assigned to another Custom Role or a User who have privileges to access a database and warehouse.

The below SQL statements assigns TAG_ADMIN and MASKING_ADMIN roles to another custom role named DATA_ENGINEER which has access to tables in DEMO_DB database and PUBLIC schema (example used for demonstration purpose)

USE ROLE USERADMIN;
GRANT ROLE TAG_ADMIN TO ROLE DATA_ENGINEER;
GRANT ROLE MASKING_ADMIN TO ROLE DATA_ENGINEER;

3.4. Create and apply Tags on columns of a table

The below SQL statement creates a tag named PII_DATA with Names, Address, Dates and Sensitive as allowed values.

USE ROLE DATA_ENGINEER;
CREATE OR REPLACE TAG DEMO_DB.PUBLIC.PII_DATA
      ALLOWED_VALUES 'Names','Address','Dates','Sensitive';

The below SQL statement assigns the tag PII_DATA at table level on table named LOCATIONS with tag value as Address.

ALTER TABLE LOCATIONS SET TAG PII_DATA = 'Address';

When you set a tag on a table, the tag is set on all columns in the table.

The below SQL statement assigns the tag PII_DATA at column level on multiple columns with different tag values on the table named EMPLOYEES.

ALTER TABLE EMPLOYEES
MODIFY COLUMN FIRST_NAME SET TAG PII_DATA = 'Names',
MODIFY COLUMN LAST_NAME SET TAG PII_DATA = 'Names',
MODIFY COLUMN HIRE_DATE SET TAG PII_DATA = 'Dates',
MODIFY COLUMN SALARY SET TAG PII_DATA = 'Sensitive';

3.5. Create a Masking Policy

Create a masking policy that masks the data of a column when queried from a role other than DATA_ENGINEER.

The below SQL statement creates a masking policy STRING_DATA_MASK that can mask data of columns of type string when queried from a role other than DATA_ENGINEER.

USE ROLE DATA_ENGINEER;

CREATE OR REPLACE MASKING POLICY STRING_DATA_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() NOT IN ('DATA_ENGINEER') THEN '***MASKED***'
    ELSE VAL
  END;

3.6. Assign Masking Policy to a tag

A masking policy can be assigned to a tag using ALTER TAG statement.

The below SQL statement assigns the masking policy STRING_DATA_MASK on a tag named PII_DATA.

USE ROLE DATA_ENGINEER;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA SET
      MASKING POLICY DEMO_DB.PUBLIC.STRING_DATA_MASK;

3.7. Verify the masking rules by querying data

Verify the data present in EMPLOYEES and LOCATIONS table by querying them using SYSADMIN ROLE.

USE ROLE SYSADMIN;
SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME, HIRE_DATE, SALARY FROM EMPLOYEES;
Employees table showing only fields of type string are masked by STRING_DATA_MASK policy
Employees table showing only fields of type string are masked by STRING_DATA_MASK policy
USE ROLE SYSADMIN;
SELECT * FROM LOCATIONS;
Locations table showing only fields of type string are masked by STRING_DATA_MASK policy
Locations table showing only fields of type string are masked by STRING_DATA_MASK policy

A tagged column data is protected by masking policy only when the data type in the masking policy signature and the data type of the column match.

The output from both tables indicate that, though the masking policy is applied on the tag which is assigned to multiple columns in both the tables, columns with data type string are only masked.

4. Assigning multiple masking policies to a tag

When a tag is applied on multiple columns of different data types, multiple masking policies each with a different data type signature should be applied on the tag to mask all columns.

The below SQL statements creates two masking policies NUMBER_DATA_MASK and DATE_DATA_MASK which masks the columns of data type number and date respectively when queried from a role other than DATA_ENGINEER.

USE ROLE DATA_ENGINEER;

CREATE OR REPLACE MASKING POLICY NUMBER_DATA_MASK AS (VAL NUMBER) RETURNS NUMBER ->
  CASE
    WHEN CURRENT_ROLE() NOT IN ('DATA_ENGINEER') THEN '0'
    ELSE VAL
  END;

CREATE OR REPLACE MASKING POLICY DATE_DATA_MASK AS (VAL DATE) RETURNS DATE ->
  CASE
    WHEN CURRENT_ROLE() NOT IN ('DATA_ENGINEER') THEN '1900-01-01'
    ELSE VAL
  END;

The below SQL statement assigns the masking policies NUMBER _DATA_MASK and DATE_DATA_MASK on a tag named PII_DATA.

USE ROLE DATA_ENGINEER;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA SET
      MASKING POLICY DEMO_DB.PUBLIC.NUMBER_DATA_MASK;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA SET
      MASKING POLICY DEMO_DB.PUBLIC.DATE_DATA_MASK;

Now when you verify the data present in EMPLOYEES and LOCATIONS table, all the columns that are tagged using PII_DATA should be masked when queried using SYSADMIN ROLE.

Employees table showing all fields assigned to PII_DATA tag are masked
Employees table showing all fields assigned to PII_DATA tag are masked
Locations table showing all fields assigned to PII_DATA tag are masked
Locations table showing all fields assigned to PII_DATA tag are masked

5. How to Remove (Unset) Masking Policy from a tag?

A masking policy can be removed from a tag using ALTER TAG statement.

The below SQL statements removes the masking policies on a tag PII_DATA.

USE ROLE DATA_ENGINEER;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA UNSET
      MASKING POLICY DEMO_DB.PUBLIC.STRING_DATA_MASK;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA UNSET
      MASKING POLICY DEMO_DB.PUBLIC.NUMBER_DATA_MASK;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA UNSET
      MASKING POLICY DEMO_DB.PUBLIC.DATE_DATA_MASK;

6. How to create and assign a Masking Policy based on a Tag Value?

In the examples discussed in previous sections of the article, we have seen assigning the masking policies on a tag. This applies the masking policies on all the columns covered by the tag. Instead a masking policy can also be applied on a tag based on tag value.

When a masking policy is applied on a tag based on a tag value, only the columns with the assigned tag value are masked.

The below SQL statement creates a masking policy NAMES_MASK that masks data of columns assigned to tag PII_DATA with tag value as Names when queried from role other than DATA_ENGINEER.

USE ROLE DATA_ENGINEER;

CREATE OR REPLACE MASKING POLICY NAMES_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DEMO_DB.PUBLIC.PII_DATA') = 'Names'
    AND CURRENT_ROLE() NOT IN ('DATA_ENGINEER')
    THEN '***MASKED***'
    ELSE VAL
  END;

The below SQL statement assigns the masking policy NAMES_MASK on a tag named PII_DATA.

USE ROLE DATA_ENGINEER;

ALTER TAG DEMO_DB.PUBLIC.PII_DATA SET
      MASKING POLICY DEMO_DB.PUBLIC.NAMES_MASK;

Now when you verify the data present in EMPLOYEES and LOCATIONS table, the columns in LOCATIONS table are not masked when queried using SYSADMIN ROLE though they are of data type string as they are tagged using a different tag value.

Employees table showing all fields assigned to PII_DATA tag with tag value Names are masked
Employees table showing all fields assigned to PII_DATA tag with tag value Names are masked
Locations table showing none of the fields assigned to PII_DATA tag are masked as their tag value is not Names
Locations table showing none of the fields assigned to PII_DATA tag are masked as their tag value is not Names

7. Assigning multiple masking policies with same data type signature

When a masking policy with a specific data type signature is assigned to a tag, another masking policy with same data type signature cannot be assigned to the tag.

A tag can support only one masking policy for each data type.

For example, in the section-6 of the article, we have assigned the masking policy NAMES_MASK to the tag PII_DATA with data type signature as string. Let us create and apply a new masking policy with same data type signature on the same tag.

The below SQL statement creates a masking policy ADDRESS_MASK that masks data of columns assigned to tag PII_DATA with tag value as Address when queried from role other than DATA_ENGINEER.

USE ROLE DATA_ENGINEER;

CREATE OR REPLACE MASKING POLICY ADDRESS_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('DEMO_DB.PUBLIC.PII_DATA') = 'Address' THEN '***MASKED***'
    ELSE VAL
  END;

The below SQL statement tries to assigns masking policy ADDRESS_MASK on a tag named PII_DATA but returns error as another masking policy with same data type signature is already assigned to the tag.

Error message showing that two policies with same data type cannot be assigned to a tag
Error message showing that two policies with same data type cannot be assigned to a tag

8. Overriding the Tag-based Data Masking

The data of a column can be protected either by assigning the masking policy directly to the column or to the tag assigned to the column. If a column references both of these masking policies, the masking policy that is directly assigned to the column takes precedence over the tag-based masking policy.

Masking policy directly assigned to the column takes precedence over the tag-based masking policy.

For example, in the section-6 of the article, we have assigned the masking policy NAMES_MASK to the tag PII_DATA which masks FIRST_NAME and LAST_NAME fields of EMPLOYEES table.

Let us create apply an additional masking policy directly on the field LAST_NAME which masks the field with a different masked value.

The below SQL statements creates and assigns masking policy LNAME_MASK on LAST_NAME column when queried from role other than DATA_ENGINEER.

USE ROLE DATA_ENGINEER;

CREATE OR REPLACE MASKING POLICY LNAME_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() NOT IN ('DATA_ENGINEER') THEN '***LASTNAME MASKED***'
    ELSE VAL
  END;

ALTER TABLE EMPLOYEES MODIFY LAST_NAME SET MASKING POLICY LNAME_MASK;

The below image shows that the masked value of LAST_NAME applied by NAMES_MASK masking policy is overridden by the value applied by LNAME_MASK masking policy.

Employees table showing Column level masking overriding Tag-based masking
Employees table showing Column level masking overriding Tag-based masking

9. Limitations of Tag-based Data Masking

Below are the limitations of tag-based data masking apart from the existing limitations of masking policies.

  • A tag can support only one masking policy for each data type.
  • A masking policy cannot be assigned to a system tag.
  • Neither the masking policy nor the tag can be dropped if the masking policy is assigned to a tag.
  • A materialized view cannot be created if the underlying table is protected by a tag-based masking policy.
  • A column cannot be specified both in a masking policy signature and a row access policy signature at the same time.

Subscribe to our Newsletter !!

Related Articles:

  • Search Optimization Service in Snowflake

    Search Optimization Service is a query optimization service that improves query performance of selective point lookup queries.

    READ MORE

  • Snowflake SnowPro Core Certification Preparation Guide

    A comprehensive list of all resources that help you in your Snowflake SnowPro Core Certification examination preparation.

    READ MORE

  • Tagging in Snowflake

    Tags in Snowflake can be applied on the database objects which lets you discover, monitor and protect sensitive data.

    READ MORE

Leave a Comment

Related Posts