Tagging in Snowflake

Spread the love

1. Introduction

Tags serve a general purpose of categorizing and organizing items into specific categories or groups, making it easier to navigate through the content. Tags can also be used as metadata to provide additional information about an item or data. This additional context helps in filtering and understanding the content without having to go through the entire content.

Tags serve a similar purpose in Snowflake. Tags can be applied on the database objects which lets you discover, monitor and protect sensitive data either through a centralized or decentralized data governance management approach.

In this article let us discuss in detail how tagging in Snowflake works.

2. What is a Tag?

Tag is a schema-level object in Snowflake that can be assigned to another Snowflake object. A tag can be assigned a string value while assigning the tag to a Snowflake object. Snowflake stores the tag and its string value as a key-value pair. The tag must be unique for your schema, and the tag value is always a string.

As tags can be applied on various levels of database objects like tables, views or columns, it becomes easier to identify sensitive information present across the account. This helps in restricting the access to sensitive data using the Row Access Policies, or using Masking Policies in conjunction with tags.

3. What Grants are required to create and assign tags?

Follow below steps to set up grants to create and assign tags

  • Step-1: Create a Custom Role
  • Step-2: Assign Tagging Privileges to custom role
  • Step-3: Assign the custom role to existing Role/User

Step-1: Create a Custom Role

The below SQL statement creates a custom role TAG_ADMIN in Snowflake.

USE ROLE USERADMIN;
CREATE OR REPLACE ROLE TAG_ADMIN;

In a centralized tagging approach, TAG_ADMIN role is only used for creating and assigning tags to Snowflake objects

Step-2: Assign Tagging Privileges to custom role

The below SQL statement grants privileges to create tags to the role TAG_ADMIN.

USE ROLE ACCOUNTADMIN;
GRANT CREATE TAG ON SCHEMA my_db.my_schema TO ROLE TAG_ADMIN;

The below SQL statement grants privileges to apply tags on Snowflake objects to the role TAG_ADMIN.

GRANT APPLY TAG ON ACCOUNT TO ROLE TAG_ADMIN;

Step-3: Assign custom role to an existing Role/User

The TAG_ADMIN role by default will not have access to any database nor warehouse. The role needs to be assigned to another Custom Role or a User who have privileges to access a database and warehouse.

The below SQL statement assigns the TAG_ADMIN role to another custom role named DATAENGINEER.

USE ROLE USERADMIN;
GRANT ROLE TAG_ADMIN TO ROLE DATAENGINEER;

This allows all users with DATAENGINEER role to inherit tagging privileges. Instead if you want to limit the privileges, assign the role to individual users.

The below SQL statement assigns TAG_ADMIN to a user named TONY.

USE ROLE USERADMIN;
GRANT ROLE TAG_ADMIN TO USER TONY;

4. How to Create Tags in Snowflake?

Tags in Snowflake can be created using CREATE TAG statement as shown below.

CREATE OR REPLACE TAG SENSITIVE_DATA;
CREATE OR REPLACE TAG STAFF_DATA;

Tags can also be created with a list of allowed values.

The below example shows creating a tag named PII_DATA with Names, Contact Details and Address as allowed values.

CREATE OR REPLACE TAG PII_DATA
      ALLOWED_VALUES 'Names', 'Contact Details','Address';

5. How to find the allowed values for a Tag?

To find the list of allowed string values for a given tag, below functions can be used.

  • GET_DDL
  • SYSTEM$GET_TAG_ALLOWED_VALUES
SELECT GET_DDL('tag','demo_db.public.PII_DATA');
GET_DDL
SELECT SYSTEM$GET_TAG_ALLOWED_VALUES('demo_db.public.PII_DATA');
SYSTEM$GET_TAG_ALLOWED_VALUES

6. How to Assign Tags in Snowflake?

Tags can be assigned at various levels such as at Database level or Schema level or table or column level using ALTER statement.

6.1. Assigning Tags at Database level

When a tag is assigned at a Database level, all the supported objects in the database will inherit the tag.

The below example applies the tag SENSITIVE_DATA with tag value as confidential on the database named DEMO_DB.

ALTER DATABASE demo_db SET TAG SENSITIVE_DATA='confidential';

6.2. Assigning Tags at Schema level

When a tag is assigned at a schema level, all the supported objects in the schema will inherit the tag.

The below example applies the tag STAFF_DATA with tag value as HR on the schema named PUBLIC.

ALTER SCHEMA demo_db.public SET TAG STAFF_DATA='HR';

6.3. Assigning Tags at Table level

When a tag is assigned at a table level, all the columns the table will inherit the tag.

The below example assigns the tag PII_DATA with tag value as Address on the table named LOCATIONS.

ALTER TABLE demo_db.public.locations SET TAG PII_DATA = 'Address';

6.4. Assigning Tags at Column level

To set a tag on an existing column, use the ALTER TABLE … MODIFY COLUMN command for a table column

The below example assigns the tag PII_DATA on multiple columns with different tag values on the table named EMPLOYEES.

ALTER TABLE employees MODIFY COLUMN first_name SET TAG PII_DATA = 'Names';
ALTER TABLE employees MODIFY COLUMN last_name SET TAG PII_DATA = 'Names';

ALTER TABLE employees MODIFY COLUMN email SET TAG PII_DATA = 'Contact Details';
ALTER TABLE employees MODIFY COLUMN phone_number SET TAG PII_DATA = 'Contact Details';

It is recommended to apply tags at the more granular level possible for the requirement. Applying same tag on all objects defeats the whole purpose of tags.

7. How to Monitor Tags in Snowflake?

Snowflake provides below options to identify tags and their values assigned on Snowflake objects.

7.1. TAG_REFERENCES_ALL_COLUMNS Table function

The Information Schema table function TAG_REFERENCES_ALL_COLUMNS helps in obtaining all the tags that are set on every column in a given table or view.

The below example helps in identifying all the tags assigned on the columns of the table EMPLOYEES.

SELECT * FROM TABLE(
     INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('demo_db.public.employees', 'table')
     );
TAG_REFERENCES_ALL_COLUMNS

The above output shows the tags and their respective values assigned at DATABASE, SCHEMA and COLUMN levels on the columns of employees table.

7.2. TAGS View

The TAGS view in the Account Usage schema of Snowflake database provides information of all the tags in your Snowflake account including the deleted tags.

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAGS
ORDER BY TAG_NAME;
SNOWFLAKE.ACCOUNT_USAGE.TAGS

7.3. TAG_REFERENCES View

The TAG_REFERENCES view in the Account Usage schema of Snowflake database provides information of all the objects in your Snowflake account that are assigned with a tag and a tag value.

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
ORDER BY DOMAIN, COLUMN_ID;
SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES

The above output shows all the tags assigned at DATABASE, SCHEMA, TABLE and COLUMN levels at the account level.

7.4. SYSTEM$GET_TAG Function

The system function SYSTEM$GET_TAG returns the tag value associated with the specified Object or Column.

The below example shows the value of the tag PII_DATA associated with the table LOCATIONS.

SELECT SYSTEM$GET_TAG('PII_DATA', 'demo_db.public.locations', 'table');
SYSTEM$GET_TAG

The below example shows the value of the tag PII_DATA associated with the column named FIRST_NAME present in EMPLOYEES table.

SELECT SYSTEM$GET_TAG('PII_DATA', 'demo_db.public.employees.first_name', 'column');
SYSTEM$GET_TAG

The system function SYSTEM$GET_TAG returns NULL if a tag is not set on the specified Snowflake object or column.

The below statement returns a NULL value as output as the tag PII_DATA is not assigned to the table EMPLOYEES at TABLE level.

SELECT SYSTEM$GET_TAG('PII_DATA', 'demo_db.public.employees', 'table');
SYSTEM$GET_TAG

8. What are System Tags and their Categories?

The tags that are created and maintained by Snowflake are called System Tags. Below are the two different System tags provided by Snowflake which exist in SNOWFLAKE.CORE schema.

  • SEMANTIC_CATEGORY
  • PRIVACY_CATEGORY

Snowflake defines allowed values for the system tags and they can be found using the below statements.

SELECT SYSTEM$GET_TAG_ALLOWED_VALUES('snowflake.core.semantic_category');
SELECT SYSTEM$GET_TAG_ALLOWED_VALUES('snowflake.core.privacy_category');

8.1. SEMANTIC_CATEGORY

The SEMANTIC_CATEGORY tag is used to identify the personal attributes of an individual like name, age, gender, salary etc. The allowed values of the tag is extensive and covers a lot of personal attributes.

8.2. PRIVACY_CATEGORY

The allowed values of the PRIVACY_CATEGORY tag is limited and this helps in further classifying a column identified for Semantic category. The allowed values of the privacy category are – Identifier, Quasi-Identifier, Sensitive and Insensitive.

Identifier: These attributes uniquely identify an individual. Example attributes are name, social security number, and phone number.

Quasi-identifier: These attributes can uniquely identify an individual when two or more of these attributes are in combination. Example attributes are age and occupation.

Sensitive: These attributes are not considered enough to identify an individual but these are the details that a person would prefer to keep private. Example attribute is Salary.

Insensitive: These attributes are not considered enough to identify an individual and are of less significance.

9. How to identify sensitive columns and apply System Tags?

Snowflake provides built-in function and stored procedure to identify the sensitive columns which qualify for the system tag categories and apply them on the columns of the table.

9.1. EXTRACT_SEMANTIC_CATEGORIES Function

The EXTRACT_SEMANTIC_CATEGORIES built-in function returns details of supported tag categories (semantic and privacy) for each column of the specified table or view in a JSON format.

  • The function also provides the probability number indicating how likely the suggested tag category is correct.
  • The maximum probability number is 1.00 and the minimum threshold is 0.15.
  • If the probability is less than 0.15, then no values are returned for the column.
  • If the probability number is less than 0.8 then the suggested tag category are listed as an alternate.

The below example shows the suggested tags for the columns of employees table.

SELECT EXTRACT_SEMANTIC_CATEGORIES('demo_db.public.employees');
{
  "EMPLOYEE_ID": {
    "extra_info": {
      "alternates": []
    }
  },
  "FIRST_NAME": {
    "extra_info": {
      "alternates": [],
      "probability": "1.00"
    },
    "privacy_category": "IDENTIFIER",
    "semantic_category": "NAME"
  },
  "LAST_NAME": {
    "extra_info": {
      "alternates": [],
      "probability": "1.00"
    },
    "privacy_category": "IDENTIFIER",
    "semantic_category": "NAME"
  },
  "PHONE_NUMBER": {
    "extra_info": {
      "alternates": [
        {
          "privacy_category": "IDENTIFIER",
          "probability": "0.49",
          "semantic_category": "PHONE_NUMBER"
        }
      ]
    }
  },
  "SALARY": {
    "extra_info": {
      "alternates": [],
      "probability": "1.00"
    },
    "privacy_category": "SENSITIVE",
    "semantic_category": "SALARY"
  }
}

9.2. ASSOCIATE_SEMANTIC_CATEGORY_TAGS Stored Procedure

The ASSOCIATE_SEMANTIC_CATEGORY_TAGS Stored Procedure takes the results of the EXTRACT_SEMANTIC_CATEGORIES function on a table/view and apply tags on supported columns in the table/view.

The below stored procedure call extracts the semantic and privacy category tags for the employees table and apply the categories as tags for the table.

CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
'demo_db.public.employees',EXTRACT_SEMANTIC_CATEGORIES('demo_db.public.employees'));
ASSOCIATE_SEMANTIC_CATEGORY_TAGS

Subscribe to our Newsletter !!

Related Articles:

  • Snowflake User-Defined Functions (UDFs)

    UDF is a reusable component defined by user to perform a specific task which can be called from a SQL statement.

    READ MORE

  • Caching in Snowflake

    Learn how Snowflake caching helps in improving the query performance by making data retrieval faster and more efficient.

    READ MORE

  • Query Acceleration Service in Snowflake

    Query Acceleration Service is a feature to improve the performance of huge queries by offloading portions of query processing to the additional compute resources.

    READ MORE

Leave a Comment

Related Posts