1. What is Snowflake?
Snowflake is a single, integrated analytic data platform built for cloud delivered as Data warehouse as a service. It is a fully managed SaaS (software as a service) offering that provides a single cloud data platform for Data Warehousing, Data Lakes, Data Engineering, Data Science, Data application development, and secure sharing and consumption of real-time / shared data.
2. Explain about Snowflake Architecture?
Snowflake’s Architecture is a hybrid of traditional Shared-Disk and Shared-Nothing database architectures. It consists of three key layers:
- Database Storage
Snowflake uses highly secure cloud storage to maintain all your data. Snowflake stores all data in databases. A database is a logical grouping of objects consisting primarily of tables and views organized into one or more schemas.
- Query Processing
Query execution is performed in the compute layer where queries are executed using resources provisioned from a cloud provider. Unlike traditional architectures snowflake allows you to create multiple independent compute clusters called Virtual Warehouses.
- Cloud Services
The Snowflake Cloud Services layer is the ‘Brain’ of the system which coordinates and manages the entire system.
The Services layer is responsible for:
- Authentication of users.
- Query compilation and optimization.
- Infrastructure Management
- Access Control
- Metadata Store
3. What are the different system defined roles provided by Snowflake?
Below are the system defined roles in Snowflake
Organization Administrator role that manages operations of Organizations and accounts at the Organization level.
Role that manages all aspects of the account. It inherits the privileges of the SYSADMIN and SECURITYADMIN system-defined roles and sits on top of role hierarchy.
Role that can create, monitor, and manage users and roles as well manage any object grant globally. It inherits the privileges of the USERADMIN role via the system role hierarchy.
Role that manages creation of users and roles.
Role that has privileges to create and manage warehouses and databases in an account. All Custom roles are recommended to roll up to SYSADMIN.
Public role is automatically granted to all the users in the account. The objects owned by the role are available to every other user and role in your account.
4. What are the different Snowflake Editions available for consumers?
Snowflake provides below editions to choose from that fits your organization’s specific requirements.
▪️ Standard Edition
▪️ Enterprise Edition
▪️ Business Critical Edition
▪️ Virtual Private Snowflake (VPS)
5. What are Micro-partitions in Snowflake?
All data in Snowflake is stored in database tables, logically structured as collections of columns and rows. Although this is how the front-end functions, Snowflake does not actually handle and store data in this manner. Instead, Snowflake stores all table data automatically divided into encrypted compressed files which are referred to as Micro-partitions. Micro-partitions are immutable files i.e. cannot be modified once created and stores data in columnar format.
6. What are Snowflake Virtual Warehouses?
A Virtual Warehouse is a cluster of compute resources in Snowflake. A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform various operations in Snowflake like executing SQL statements, performing data loading and unloading operations.
7. What are Multi-Cluster Warehouses?
Multi-cluster warehouses enable you to automatically scale out compute resources to manage your concurrent users/queries needs as they change, such as during peak and off hours. Multi-cluster warehouses are not beneficial for improving the performance of slow-running queries or data loading. Instead Multi-cluster Warehouse can spin-up additional warehouses to handle the workload.
8. What are different types of tables in Snowflake?
Snowflake supports three types of tables
- Permanent Table
These are the standard, regular database tables. Permanent tables are the default table type in Snowflake and do not need any additional syntax while creating to make them permanent.
- Transient Table
Transient tables in Snowflake are similar to permanent tables except that that they do not have a Fail-safe period and only have a very limited Time-Travel period. These are best suited in scenarios where the data in your table is not critical and can be recovered from external means if required.
- Temporary Table
Temporary Tables in Snowflake exist only within the session in which they were created and available only for the remainder of the session. They are not visible to other users or sessions. Once the session ends, data stored in the table is dropped completely and is not recoverable.
9. What are different types of Views in Snowflake?
There are three broad categories of views in Snowflake.
- Non-Materialized views (Regular views)
A Non-Materialized view’s results are created by executing the query at the time that the view is referenced in a query. The term “Views” generally refers to Non-Materialized views.
- Materialized Views
A Materialized view is a database object that stores the pre-computed results of a query definition of a view. While simple views allow us to save complicated queries for future use, materialized views store a copy of the query results.
- Secure Views
A Secure View limit access to the data definition of the view so that the sensitive data that should not be exposed to all users of the underlying table(s) stays hidden.
10. What are Snowflake Stages?
Snowflake Stages are locations where data files are stored (“staged”) which helps in loading data into and unloading data out of database tables.
Snowflake supports two types of stages:
▪️ Internal Stage: Stores the files internally within Snowflake.
▪️ External Stage: Stores the files in an external location (AWS S3 bucket or Azure Containers or GCP Cloud storage) that is referenced by the stage.
11. What are different types of Snowflake Internal Stages?
Snowflake Internal Stages stores data files internally within Snowflake. Snowflake Internal Stages are further classified into
By default, each user and table in Snowflake is automatically allocated an internal stage for staging data files. In addition, you can create internal named stages.
12. What are Snowflake File Formats?
Snowflake File format is a named database object that can be used to simplify the process of accessing the staged data and streamlines loading data into and unloading data out of database tables. A Snowflake File format encapsulates information of data files, such as file type (CSV, JSON, etc.) and formatting options specific to each type used for bulk loading/unloading.
13. What are Snowflake External tables?
Snowflake External Tables provides a unique way of accessing the data from files in external locations without actually moving them into Snowflake. They enable you to query data stored in files in an external stage as if it were inside a database by storing the file level metadata.
Consider below points before creating a Snowflake External table:
▪️ There must be an external stage created in Snowflake to access the files from external location.
▪️ External tables support external (i.e. S3, Azure, or GCS) stages only. Internal (i.e. Snowflake) stages are not supported.
▪️ You are required to have knowledge of the file format (CSV, Parquet etc).
14. What is Snowflake Zero Copy Cloning?
Snowflake’s Zero Copy Cloning is a feature which provides a quick and easy way to create a copy of any table, schema, or an entire database without incurring any additional costs as the derived copy shares the underlying storage with the original object. The cloned and original objects are independent from each other i.e., any changes done on either of the objects do not impact others.
15. How does Snowflake’s Zero Copy Cloning works?
All data in Snowflake tables is automatically divided into micro-partitions, which are smallest continuous units of storage. Micro-partitions in Snowflake are immutable i.e., once created lasts in the same state until table is dropped. Hence, for any change in the data of a micro-partition, a new micro-partition is created with updated changes and metadata will point to the newly created micro partition. The older partition is retained for time-travel and Fail-safe purposes.
16. What is Snowflake Time Travel?
Snowflake Time Travel enables accessing historical data that has been changed or deleted at any point within a defined period. It is a powerful CDP (Continuous Data Protection) feature which ensures the maintenance and availability of your historical data.
Below actions can be performed using Snowflake Time Travel within a defined period of time:
▪️ Restore tables, schemas, and databases that have been dropped.
▪️ Query data in the past that has since been updated or deleted.
▪️ Create clones of entire tables, schemas, and databases at or before specific points in the past.
17. What is Snowflake Fail-Safe period?
Fail-safe provides a (non-configurable) 7-day period during which historical data may be recoverable by Snowflake. This period starts immediately after the Time Travel retention period ends. Fail-Safe requires intervention by Snowflake Support team to restore data. It is not provided as a means for accessing historical data after the Time Travel retention period has ended.
18. What is Snowflake Secure Data Sharing?
Secure Data Sharing in Snowflake enables account-to-account sharing of selected database objects in your account with other. This Data Sharing feature is supported only between Snowflake accounts. In scenarios where the consumer is a non-snowflake customer, the access can be provided through Reader accounts.
19. Explain about Snowflake Information Schema?
The Snowflake Information Schema is a Data Dictionary schema available as a read-only schema named INFORMATION_SCHEMA under each database created automatically by Snowflake. It consists of a set of system-defined views and table functions that provide extensive metadata information about the objects like tables, views, stages, external tables, procedures, functions etc., created in your account.
20. What are Snowflake Resource Monitors?
Resource Monitors in Snowflake assist in cost management and prevent unforeseen credit usage caused by operating virtual warehouses. They issue alarm alerts and helps in stopping user-managed warehouses when certain limits are reached or approaching.
The Resource Monitors in Snowflake can monitor the credit usage at two different levels.
▪️ ACCOUNT: At the Account level i.e. all the warehouses in the account.
▪️ WAREHOUSE: At the individual Warehouse or a group of warehouses level.
21. What is Snowflake Snowpipe?
Snowpipe is Snowflake’s continuous data ingestion service which enables loading data from files as soon as they are available in a stage. It is basically a COPY INTO command continuously looking for fresh files to process them in micro-batches as soon as they arrive at the external staging area. Snowpipe eliminates the requirement to build out near-real-time pipelines to process data from external locations.
22. What are Snowflake Snowflake Streams?
A Stream is a Snowflake object that provides Change Data Capture (CDC) capabilities to track the changes made to tables including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. Streams can be queried just like a table which makes it easy to grab the new data in a table. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object.
23. How to run SQL Statements on Schedule in Snowflake?
A Snowflake Task allows scheduled execution of SQL statements including calling a stored procedure or Procedural logic using Snowflake Scripting. Snowflake Tasks are not event based, instead a task runs on a schedule. The Snowflake task engine has a CRON and NONCRON variant scheduling mechanisms.
24. What is Snowflake SnowSQL?
SnowSQL is a command line tool for connecting to Snowflake from Windows, Linux and macOS. Snowflake SnowSQL lets you execute all SQL queries and perform DDL and DML operations, including loading data into and unloading data out of database tables. Snowflake SnowSQL must be downloaded and installed on your machine to run commands from command line.
25. What is Data Clustering?
Clustering is a technique used to organize data storage to better accommodate anticipated queries. The key objective is to increase query performance while lowering the amount of system resources needed to run queries. Although clustering can substantially improve the performance and reduce the cost of some queries, the compute resources used to perform clustering consume credits. So, you should cluster only when the table contains multiple terabytes (TB) of data with a large number of micro-partitions and queries will benefit substantially from the clustering.
26. What is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking is a security feature that allows organizations to mask sensitive data in their database tables, views, and query results in real-time. This is useful for protecting sensitive information from unauthorized access or exposure. Snowflake Dynamic Data Masking allows the data to be masked as it is accessed, rather than being permanently altered in the database.
Subscribe to our Newsletter !!
The INFORMATION_SCHEMA is a read-only schema available automatically under each database. It stores metadata of all the objects built under the database.
Snowflake Resource Monitors assist in cost management and prevent unforeseen credit usage caused by operating warehouses.
Learn different ways to remove duplicate records from a Snowflake table using SWAP WITH and INSERT OVERWRITE commands.