Snowflake

Push and pull data between Scispot and Snowflake

Features

Syncing your Snowflake Data Warehouse and Snowflake Data Types with Scispot

Overview of Snowflake Data Sharing with Scispot

Setting up the Snowflake destination connector involves three steps: creating Snowflake entities (warehouse, database, schema, user, and role) in the Snowflake console, choosing a data loading method (internal stage, AWS S3, or Google Cloud Storage bucket), and configuring the Snowflake destination connector in the Scispot UI.

This page describes the step-by-step process of setting up the Snowflake destination connector.

Prerequisites 

  • A Snowflake account with the ACCOUNTADMIN role. If you don’t have an account with the ACCOUNTADMIN role, contact your Snowflake administrator to set one up for you. 
  • (Optional) An Amazon S3 bucket or a Google Cloud Storage bucket. 

Network policies 

By default, Snowflake allows users to connect to the service from any computer or device IP address. A security administrator (i.e., a user with the SECURITYADMIN role or higher) can create a network policy to allow or deny access to a single IP address or a list of addresses.

If you have any issues connecting with Scispot, make sure that Scispot's IP addresses are on the allowed list.
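For example, a security administrator could create and apply a policy along these lines (a sketch; the IP addresses below are placeholders, not Scispot's actual addresses):

-- placeholder IPs; substitute the addresses Scispot provides
CREATE NETWORK POLICY scispot_policy ALLOWED_IP_LIST = ('192.0.2.10', '192.0.2.11');
ALTER ACCOUNT SET NETWORK_POLICY = scispot_policy;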

To determine whether a network policy is set on your account or for a specific user, execute the SHOW PARAMETERS command.

Account

SHOW PARAMETERS LIKE 'network_policy' IN ACCOUNT;

User

SHOW PARAMETERS LIKE 'network_policy' IN USER <username>;

For more details, see the official Snowflake documentation.


Setup guide 

Step 1: Set up Scispot-specific entities in Snowflake 

To set up the Snowflake destination connector, you first need to create Scispot-specific Snowflake entities (a warehouse, database, schema, user, and role) with the OWNERSHIP permission. This lets Scispot write data into Snowflake while you track Scispot-related costs and control permissions at a granular level.

You can use the following script in a new Snowflake worksheet to create the entities:

  1. Log into your Snowflake account.
  2. Edit the following script to set a more secure password and, if you wish, rename the other resources.

Note: Make sure you follow the Snowflake identifier requirements while renaming the resources.

-- set variables (these need to be uppercase)
set scispot_role = 'SCISPOT_ROLE';
set scispot_username = 'SCISPOT_USER';
set scispot_warehouse = 'SCISPOT_WAREHOUSE';
set scispot_database = 'SCISPOT_DATABASE';
set scispot_schema = 'SCISPOT_SCHEMA';

-- set user password
set scispot_password = 'password';

begin;

-- create Scispot role
use role securityadmin;
create role if not exists identifier($scispot_role);
grant role identifier($scispot_role) to role SYSADMIN;

-- create Scispot user
create user if not exists identifier($scispot_username)
password = $scispot_password
default_role = $scispot_role
default_warehouse = $scispot_warehouse;

grant role identifier($scispot_role) to user identifier($scispot_username);

-- change role to sysadmin for warehouse / database steps
use role sysadmin;

-- create Scispot warehouse
create warehouse if not exists identifier($scispot_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;

-- create Scispot database
create database if not exists identifier($scispot_database);

-- grant Scispot warehouse access
grant USAGE on warehouse identifier($scispot_warehouse) to role identifier($scispot_role);

-- grant Scispot database access
grant OWNERSHIP on database identifier($scispot_database) to role identifier($scispot_role);

commit;

begin;

USE DATABASE identifier($scispot_database);

-- create schema for Scispot data
CREATE SCHEMA IF NOT EXISTS identifier($scispot_schema);

commit;

begin;

-- grant Scispot schema access
grant OWNERSHIP on schema identifier($scispot_schema) to role identifier($scispot_role);

commit;

  3. Run the script using the Worksheet page or Snowsight. Make sure to select the All Queries checkbox.
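After the script completes, you can sanity-check the new entities before moving on (a quick check, assuming you kept the default names from the script):

-- confirm the role received its grants and the warehouse exists
SHOW GRANTS TO ROLE SCISPOT_ROLE;
SHOW WAREHOUSES LIKE 'SCISPOT_WAREHOUSE';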

Step 2: Set up a data loading method 

By default, Scispot uses Snowflake's Internal Stage to load data. You can also load data using an Amazon S3 bucket or a Google Cloud Storage bucket.

Make sure the Scispot role has the USAGE privilege on the database and schema.
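The Step 1 script grants the role OWNERSHIP of the database and schema, which covers this. If you are instead reusing a database and schema the script did not create, the grants would look roughly like this (a sketch, using the Step 1 names as placeholders):

grant USAGE on database SCISPOT_DATABASE to role SCISPOT_ROLE;
grant USAGE on schema SCISPOT_DATABASE.SCISPOT_SCHEMA to role SCISPOT_ROLE;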

Using an Amazon S3 bucket 

To use an Amazon S3 bucket, create a new Amazon S3 bucket with read/write access for Scispot to stage data to Snowflake.
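For reference, a Snowflake external stage over an S3 bucket looks roughly like this (a sketch; the bucket name and credentials are placeholders, and Scispot's own staging setup may differ):

-- hypothetical stage; YOURBUCKETNAME and the AWS credentials are placeholders
create stage s3_scispot_stage
  url = 's3://YOURBUCKETNAME'
  credentials = (aws_key_id = '<your_aws_key_id>' aws_secret_key = '<your_aws_secret_key>');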

Using a Google Cloud Storage bucket 

To use a Google Cloud Storage bucket:

  1. Navigate to the Google Cloud Console and create a new bucket with read/write access for Scispot to stage data to Snowflake.
  2. Generate a JSON key for your service account.
  3. Edit the following script to replace SCISPOT_ROLE with the role you used for Scispot's Snowflake configuration and YOURBUCKETNAME with your bucket name.

create storage integration gcs_scispot_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://YOURBUCKETNAME');

create stage gcs_scispot_stage
  url = 'gcs://YOURBUCKETNAME'
  storage_integration = gcs_scispot_integration;

GRANT USAGE ON integration gcs_scispot_integration TO ROLE SCISPOT_ROLE;

GRANT USAGE ON stage gcs_scispot_stage TO ROLE SCISPOT_ROLE;

DESC STORAGE INTEGRATION gcs_scispot_integration;

The final query should show a STORAGE_GCP_SERVICE_ACCOUNT property whose value is an email address. Grant read/write permissions on your bucket to that service account email.

  4. Navigate to the Snowflake UI and run the script as a Snowflake account admin using the Worksheet page or Snowsight.

Step 3: Set up Snowflake as a destination in Scispot 

Navigate to the Scispot UI to set up Snowflake as a destination. You can authenticate using a username and password, OAuth 2.0, or key pair authentication:

Login and Password 

OAuth 2.0

Key pair authentication

To configure key pair authentication, you need an RSA private/public key pair.

If you do not have a key pair yet, you can generate one using the openssl command-line tool.

Use this command to generate an unencrypted private key file:

  `openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt`

Alternatively, use this command to generate an encrypted private key file:

 `openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v1 PBE-SHA1-RC4-128 -out rsa_key.p8`

Once you have your private key, you need to generate a matching public key.

You can do so with the following command:

 `openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub`

Finally, you need to add the public key to your Snowflake user account.

You can do so with the following SQL command in Snowflake:

 `alter user <user_name> set rsa_public_key=<public_key_value>;`

Replace <user_name> with your username and <public_key_value> with your public key value.
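For example, for the SCISPOT_USER created in Step 1 (the key value below is truncated and purely illustrative; paste the key body without the -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- delimiter lines):

alter user SCISPOT_USER set rsa_public_key='MIIBIjANBgkqhki...';

-- the RSA_PUBLIC_KEY_FP row in the output confirms the key was registered
desc user SCISPOT_USER;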

To use AWS S3 as the cloud storage, enter the information for the S3 bucket you created in Step 2.

To use a Google Cloud Storage bucket, enter the information for the bucket you created in Step 2.
