You may wish to access the data stored within your S3 buckets in Python, from Workbench or Connect using JupyterLab or Jupyter Notebook. The article below walks through the process in R:
https://support.posit.co/hc/en-us/articles/13248638476055-Connecting-Workbench-Connect-to-AWS-S3
Python has no dedicated package for connecting a session to S3 buckets in the way the R article above describes, but you can do so just as easily with the boto3 package, which is the AWS SDK for Python.
Connecting to S3
Dependencies
First, install the boto3 package for the version of Python you wish to use:
pip install boto3
IAM Role
From there, you will need to ensure that the server you are connecting from can access your AWS account. If your Workbench host is using an IAM role, then you only need to make sure that the IAM role has access to interact with the S3 buckets in your account.
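One way to check which identity the server actually resolves to is to ask AWS STS. The snippet below is a minimal sketch and not part of the original procedure: the helper name caller_arn is hypothetical, and it assumes you pass in an STS client such as the one returned by boto3.client('sts').

```python
def caller_arn(sts_client):
    """Return the ARN of the identity that the client's credentials resolve to."""
    # get_caller_identity returns a dict with "UserId", "Account" and "Arn"
    identity = sts_client.get_caller_identity()
    return identity["Arn"]
```

On a host that uses an IAM role, calling caller_arn(boto3.client('sts')) would be expected to return an assumed-role ARN that includes the role name, which you can then compare against the role that was granted S3 access.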
Access Keys
If you are using access keys, you will need to set them up on the server itself. There are several ways to do this; the most common are running the aws configure command (which is interactive) or exporting the keys as environment variables:
export AWS_ACCESS_KEY_ID="access-key"
export AWS_SECRET_ACCESS_KEY="secret-key"
Connecting from Python
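If you exported the access keys as environment variables, a quick check from Python can confirm that they are visible to the session before you create a client. This is only a sketch: the helper name missing_aws_vars is hypothetical. A missing variable does not necessarily mean that credentials are unavailable, since boto3 can also read the profile written by aws configure or use an IAM role.

```python
import os

# The two variables exported in the shell example above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_aws_vars(env=None):
    """Return the names of the required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_aws_vars()
if missing:
    print(f"Not set in this session: {', '.join(missing)}")
else:
    print("Both access key variables are set.")
```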
Now that our server can see our AWS account, we can use Python to view our S3 buckets. You can list out the buckets using this code block:
#Import the boto3 package
import boto3
#Initialize the S3 client
s3 = boto3.client('s3')
#List all buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(f"Bucket: {bucket['Name']}")
Upload a file to S3:
s3.upload_file("local_file.csv", "bucket-name", "folder_name/file_in_s3.csv")
The values above are placeholders:
- local_file.csv is the name of the file currently in your working directory
- bucket-name is the name of your S3 bucket
- folder_name/file_in_s3.csv is the object key: the folder within your S3 bucket that you wish to upload to, followed by the name the file will have in S3. In this example, local_file.csv is stored as file_in_s3.csv.
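If you would rather keep the local file name and only choose the destination folder, the object key can be built from the two parts. The helpers below are a sketch under that assumption; build_object_key and upload_to_folder are hypothetical names, and s3_client is expected to be a client created with boto3.client('s3').

```python
import posixpath
from pathlib import Path

def build_object_key(folder_name, local_path):
    """Join an S3 folder prefix and the local file's name into an object key."""
    # S3 keys use forward slashes regardless of the local operating system
    return posixpath.join(folder_name.strip("/"), Path(local_path).name)

def upload_to_folder(s3_client, local_path, bucket_name, folder_name):
    """Upload local_path into folder_name, keeping the original file name."""
    key = build_object_key(folder_name, local_path)
    s3_client.upload_file(str(local_path), bucket_name, key)
    return key
```

For instance, upload_to_folder(s3, "local_file.csv", "bucket-name", "folder_name") would upload the file under the key folder_name/local_file.csv.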
Download a file from S3:
s3.download_file("bucket-name", "folder_name/file_in_s3.csv", "local_file.csv")
The placeholders above are:
- bucket-name is the name of your S3 bucket
- folder_name/file_in_s3.csv is the name of the file that you wish to download from your S3 bucket
- local_file.csv is the name under which the file is saved locally. In this example, file_in_s3.csv is saved as local_file.csv.
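To download into a specific local directory while keeping the file name used in S3, the local path can be derived from the object key. This is a sketch only; download_to_directory is a hypothetical helper, and s3_client is assumed to be a client created with boto3.client('s3').

```python
from pathlib import Path

def download_to_directory(s3_client, bucket_name, object_key, target_dir):
    """Download object_key into target_dir, creating the directory if needed."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Use the last part of the key as the local file name
    local_path = target_dir / object_key.rsplit("/", 1)[-1]
    s3_client.download_file(bucket_name, object_key, str(local_path))
    return local_path
```

Creating the directory first is a deliberate choice here, so that the download does not depend on the target directory already existing.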
I've used CSV files in the examples above, but this works for all file types. boto3 supports many other S3 operations, all of which are documented here:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
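As one example of those other operations, the sketch below lists the object keys under a folder by using the list_objects_v2 paginator. The helper name list_keys is hypothetical, and s3_client is assumed to be a client created with boto3.client('s3').

```python
def list_keys(s3_client, bucket_name, prefix=""):
    """Return every object key in bucket_name that starts with prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent from a page when no objects match
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

For instance, list_keys(s3, "bucket-name", "folder_name/") would return the keys of the objects stored under folder_name. The paginator is used because a single list_objects_v2 call returns at most 1,000 keys.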