Auto Ingest from S3 to Snowflake
For this to work, make sure the following pieces are in place:
1. An external stage is created over an S3 bucket. The stage has no additional requirements beyond what Snowflake normally requires for an external stage.
2. Assuming a suitable target table already exists, a pipe is created with the special flag auto_ingest=true (detailed steps below).
3. The pipe is bound to the stage location using an AWS SQS queue as the notification channel. The notification channel can be seen next to the stage in the output of SHOW STAGES (the detailed steps below use SHOW PIPES, which exposes the same information).
4. This SQS ARN should then be set on the S3 bucket hosting the stage as a notification configuration for the object-created (s3:ObjectCreated) event. This step is outside the purview of Snowflake and is done using AWS tools (SDK/CLI/Console).
5. Once the notification ARN has been set on the bucket, all new files created under the stage location are loaded through the associated auto_ingest pipes for that stage into the respective tables defined by those pipes.
Things to remember
Pipes can only load delimited files at this time; fixed-length files are not supported.
Once a file has been loaded, Snowpipe will not load a file with the same name again, even if you delete the file and re-upload it (see the sketch after this list for a one-off workaround).
When you drop a file into the stage location, make sure the file name is appended with a date or another unique suffix.
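If you really do need to reload a file that keeps the same name, one workaround (outside the pipe itself) is a manual COPY INTO with FORCE = TRUE, which bypasses the load history. A minimal sketch, assuming the stage, table, and file format used in the detailed steps below and a hypothetical file name:

-- One-off manual reload; FORCE = TRUE bypasses the load-history check
copy into callupdump
  from @"INO"."SHARED"."MY_EXT_INO"/ino/
  files = ('mydata_20191013.csv')              -- hypothetical file name
  file_format = (format_name = CSV_FORMAT)
  force = TRUE;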
Detailed Steps
1) Create a stage
CREATE OR REPLACE STAGE "INO"."SHARED"."MY_EXT_INO"
  STORAGE_INTEGRATION = SNOWFLAKE_INO_STORAGE_INTEGRATION
  URL = 's3://s3snowflake/';
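The stage above references a storage integration named SNOWFLAKE_INO_STORAGE_INTEGRATION. If you have not created one yet, a minimal sketch looks like this; the IAM role ARN is a placeholder, and the role's trust policy still has to be updated on the AWS side using the values returned by DESC INTEGRATION:

create or replace storage integration SNOWFLAKE_INO_STORAGE_INTEGRATION
  type = external_stage
  storage_provider = 'S3'
  enabled = true
  storage_aws_role_arn = 'arn:aws:iam::<your-account-id>:role/<your-snowflake-role>'  -- placeholder
  storage_allowed_locations = ('s3://s3snowflake/');

-- Shows STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID for the role's trust policy
desc integration SNOWFLAKE_INO_STORAGE_INTEGRATION;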
2) Create a pipe to load data.
create or replace pipe ino_pipe1 auto_ingest=true as
  copy into callupdump
  from @"INO"."SHARED"."MY_EXT_INO"/ino/
  file_format = (format_name = CSV_FORMAT);
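The pipe assumes that the target table callupdump and a named file format CSV_FORMAT already exist. In case you need them, a minimal sketch (the table columns here are hypothetical placeholders; match them to your file layout):

create or replace file format CSV_FORMAT
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1;            -- assumes one header row in the files

create or replace table callupdump (
  col1 string,                -- hypothetical columns, replace with your own
  col2 string,
  col3 number
);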
3) Get the name of the SQS queue by running the SHOW PIPES command:
SHOW PIPES;
The notification_channel column contains the SQS ARN, for example:
arn:aws:sqs:us-east-1:999999:sf-snowpipe-xxxxxxx
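If you would rather pull the ARN out programmatically than copy it from the SHOW PIPES output, one option is RESULT_SCAN over the last query:

show pipes;
select "notification_channel"
  from table(result_scan(last_query_id()))
  where "name" = 'INO_PIPE1';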
4) Create an AWS S3 event notification to send notifications to Snowflake.
a) In the AWS console, go to your S3 bucket → Properties → Events.
b) Click Add notification, fill in the details, and select the object-created events.
c) Under 'SQS queue ARN', enter the SQS ARN from step 3.
You can verify the bucket's notification configuration with the AWS CLI:
aws s3api get-bucket-notification-configuration --bucket s3snowflake
5) After creating the event notification, load a file into your bucket; Snowpipe should automatically copy the data into the Snowflake table.
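Note that the event notification only covers files that land in the bucket after it is configured. Files that were already sitting under the stage location can be queued manually by refreshing the pipe:

alter pipe ino_pipe1 refresh;   -- queues existing staged files that have not been loaded yet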
6) Snowflake queries to check the status:
a) Query to check the status of the pipe:
select SYSTEM$PIPE_STATUS('ino_pipe1');
{"executionState":"RUNNING","pendingFileCount":0,"notificationChannelName":"arn:aws:sqs:us-east-1:999999999:sf-snowpipe-IEKBCd-8WvNXfBLb0I5Q-g","numOutstandingMessagesOnChannel":0,"lastReceivedMessageTimestamp":"2019-10-13T18:02:47.638Z","lastForwardedMessageTimestamp":"2019-10-12T18:39:38.343Z"}
b) Query to check which copy operations ran against the table in the last hour:
select file_name, last_load_time, row_count, pipe_name, pipe_received_time
from table(information_schema.copy_history(table_name=>'CALLUPDUMP', start_time=> dateadd(hours, -1, current_timestamp())));
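The same copy_history function also exposes error details, which helps when a file is picked up but rows are rejected. A quick check for problem files over the same window:

select file_name, status, error_count, first_error_message
from table(information_schema.copy_history(table_name=>'CALLUPDUMP', start_time=> dateadd(hours, -1, current_timestamp())))
where error_count > 0;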
Runtime statistics
I tested with a 6.7 million row file of about 2.8 GB in size:
1) Uploading to S3 from on-prem took about 3 minutes.
2) Snowpipe automatically picked up the new file in less than a minute.
3) Snowpipe loaded the new file (6.7 million rows) into the target table in less than two minutes.