
Amazon S3 Configuration

Connect to Amazon S3 for file storage, data lakes, backups, and object storage needs.

Connection Parameters

Required Fields

| Field | Description | Example |
| --- | --- | --- |
| Bucket Name | S3 bucket name | my-data-bucket |
| Region | AWS region | us-east-1 |
| Access Key ID | AWS access key | AKIA... |
| Secret Access Key | AWS secret key (encrypted at rest) | |

Optional Fields

| Field | Description | Example |
| --- | --- | --- |
| Custom Endpoint | For S3-compatible services (MinIO, DigitalOcean Spaces, etc.) | https://minio.example.com:9000 |

Credential Field Names

The credential fields use camelCase names: bucketName, region, accessKeyId, secretAccessKey, and endpoint. These are the exact field names you will see in STRONGLY_SERVICES.

Configuration Example

When creating an S3 data source, provide the following information:

| Field | Example Value | Notes |
| --- | --- | --- |
| Data source label | prod-s3-storage | Kebab-case unique identifier (used as both name and label) |
| Bucket Name | my-data-bucket | S3 bucket name |
| Region | us-east-1 | AWS region |
| Access Key ID | AKIA... | AWS access key |
| Secret Access Key | wJalr... | Encrypted at rest |
| Custom Endpoint | - | Only for S3-compatible services |

AWS IAM Configuration

Creating an IAM User

  1. Go to AWS IAM Console
  2. Navigate to Users -> Add Users
  3. Select Access key - Programmatic access
  4. Attach policies (see Required Permissions)
  5. Complete user creation
  6. Save the Access Key ID and Secret Access Key

Security

Save your secret access key immediately. AWS won't show it again after creation.

Required Permissions

Grant the following permissions to your IAM user:

Read-Only Access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}

Read-Write Access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}

Full Access (Including Bucket Management)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}

Least Privilege

Use the minimum required permissions. For most applications, read-only or read-write access is sufficient.
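
Permissions can be tightened further by scoping access to a single key prefix. The policy below is an illustrative sketch (the app-data/ prefix is a placeholder, not something the platform requires); it uses the standard s3:prefix condition key to restrict listing:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-data-bucket/app-data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-data-bucket",
      "Condition": {"StringLike": {"s3:prefix": "app-data/*"}}
    }
  ]
}
```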

Test Connection

When you create or test an S3 data source, the platform uses the @aws-sdk/client-s3 library to create an S3 client with the provided region, accessKeyId, and secretAccessKey. If a custom endpoint is provided, forcePathStyle is enabled for S3-compatible services. The test executes a ListBuckets command to verify credentials. On success, it returns "Bucket accessible."

MinIO data sources use the same test connection logic.
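
The client-configuration behavior described above can be sketched as a small helper. This is an illustrative reconstruction, not the platform's actual code; the input field names match the STRONGLY_SERVICES payload shown later on this page.

```python
def build_s3_client_config(ds):
    """Build an S3 client config dict from a data source record.

    Mirrors the test-connection behavior described above: when a
    custom endpoint is present, path-style addressing is enabled
    for S3-compatible services.
    """
    config = {
        'region': ds['region'],
        'credentials': {
            'accessKeyId': ds['accessKeyId'],
            'secretAccessKey': ds['secretAccessKey'],
        },
    }
    if ds.get('endpoint'):
        config['endpoint'] = ds['endpoint']
        config['forcePathStyle'] = True
    return config

# Plain AWS S3: no endpoint, so no path-style flag
aws_cfg = build_s3_client_config({
    'region': 'us-east-1', 'accessKeyId': 'AKIA...', 'secretAccessKey': 'wJalr...'
})

# MinIO: endpoint present, forcePathStyle enabled
minio_cfg = build_s3_client_config({
    'region': 'us-east-1', 'accessKeyId': 'minio', 'secretAccessKey': 'secret',
    'endpoint': 'https://minio.example.com:9000'
})
print(minio_cfg['forcePathStyle'])  # True
```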

Schema Discovery

S3 has full native schema discovery support. Clicking Refresh Metadata returns:

  • Buckets: If a bucketName is configured, returns that single bucket. Otherwise, lists all accessible buckets via ListBuckets.
  • Size: Total size of all objects in the configured bucket (in bytes)
  • Object count: Total number of objects in the bucket (returned in the rowCount field)

The bucket contents are enumerated using ListObjectsV2 with pagination (1000 objects per page).
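
The size and object-count aggregation can be sketched with a pure helper over ListObjectsV2 pages. The page shape (Contents/Size) is the standard S3 API response shape; the helper name is illustrative:

```python
def summarize_bucket_pages(pages):
    """Aggregate total size (bytes) and object count across
    ListObjectsV2 response pages (up to 1000 objects each)."""
    total_size = 0
    object_count = 0
    for page in pages:
        for obj in page.get('Contents', []):
            total_size += obj['Size']
            object_count += 1
    return {'size': total_size, 'rowCount': object_count}

# Two fake pages standing in for paginated API responses
pages = [
    {'Contents': [{'Key': 'a.txt', 'Size': 100}, {'Key': 'b.txt', 'Size': 250}]},
    {'Contents': [{'Key': 'c.txt', 'Size': 50}]},
]
print(summarize_bucket_pages(pages))  # {'size': 400, 'rowCount': 3}
```

With boto3, the pages would come from `s3.get_paginator('list_objects_v2').paginate(Bucket=...)` instead of a list of dicts.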

Browsing Bucket Contents

The platform also provides a dedicated method (datasources.getS3BucketContents) to browse S3 bucket contents with folder navigation. This returns files and folders (using the / delimiter) with their keys, sizes, and last modified dates. This works for S3, MinIO, GCS, and Azure Blob Storage type data sources.
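
Delimiter-based folder navigation can be sketched as follows. The response shape (CommonPrefixes/Contents) is the standard ListObjectsV2 shape when a Delimiter is set; the helper name is illustrative, not part of the platform's API:

```python
def split_listing(response):
    """Split a delimiter-based ListObjectsV2 response into folders
    (CommonPrefixes) and files (Contents)."""
    folders = [p['Prefix'] for p in response.get('CommonPrefixes', [])]
    files = [
        {'key': o['Key'], 'size': o['Size'], 'lastModified': o.get('LastModified')}
        for o in response.get('Contents', [])
    ]
    return folders, files

# Simulated response for listing the bucket root with Delimiter='/'
response = {
    'CommonPrefixes': [{'Prefix': 'logs/'}, {'Prefix': 'exports/'}],
    'Contents': [{'Key': 'readme.txt', 'Size': 12, 'LastModified': None}],
}
folders, files = split_listing(response)
print(folders)  # ['logs/', 'exports/']
```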

Usage in Workflows (STRONGLY_SERVICES)

When an S3 data source is attached to a workflow, its credentials are injected via the STRONGLY_SERVICES environment variable with all fields at the top level (not nested under a credentials key). No connectionString is generated for S3; the s3://bucket format is used for display only:

{
  "datasources": {
    "prod_s3_storage": {
      "type": "s3",
      "name": "prod-s3-storage",
      "bucketName": "my-data-bucket",
      "region": "us-east-1",
      "accessKeyId": "AKIA...",
      "secretAccessKey": "wJalr...",
      "endpoint": null
    }
  }
}

Python Example (boto3)

import json
import os

import boto3

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get S3 data source (key is sanitized name: hyphens become underscores)
s3_ds = datasources['prod_s3_storage']

# Create S3 client using top-level camelCase fields
s3_config = {
    'aws_access_key_id': s3_ds['accessKeyId'],
    'aws_secret_access_key': s3_ds['secretAccessKey'],
    'region_name': s3_ds['region']
}

# Add custom endpoint if provided (for S3-compatible services)
if s3_ds.get('endpoint'):
    s3_config['endpoint_url'] = s3_ds['endpoint']

s3 = boto3.client('s3', **s3_config)

# List objects in bucket
response = s3.list_objects_v2(Bucket=s3_ds['bucketName'])
for obj in response.get('Contents', []):
    print(f"File: {obj['Key']}, Size: {obj['Size']} bytes")

# Upload a file
s3.upload_file('local-file.txt', s3_ds['bucketName'], 'remote-file.txt')

# Download a file
s3.download_file(s3_ds['bucketName'], 'remote-file.txt', 'downloaded-file.txt')

# Read file content directly
obj = s3.get_object(Bucket=s3_ds['bucketName'], Key='data.json')
content = obj['Body'].read().decode('utf-8')
print(content)

Python with Custom Endpoint (MinIO)

import boto3

# For S3-compatible services like MinIO
s3 = boto3.client(
    's3',
    endpoint_url=s3_ds.get('endpoint'),  # e.g., 'https://minio.example.com'
    aws_access_key_id=s3_ds['accessKeyId'],
    aws_secret_access_key=s3_ds['secretAccessKey'],
    region_name=s3_ds['region']
)

Node.js Example (AWS SDK v3)

const { S3Client, ListObjectsV2Command, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');
const { createReadStream } = require('fs');

// Parse STRONGLY_SERVICES environment variable
const services = JSON.parse(process.env.STRONGLY_SERVICES || '{}');
const datasources = services.datasources || {};

// Get S3 data source (key is sanitized name)
const s3ds = datasources['prod_s3_storage'];

// Create S3 client using top-level camelCase fields
const s3Config = {
  region: s3ds.region,
  credentials: {
    accessKeyId: s3ds.accessKeyId,
    secretAccessKey: s3ds.secretAccessKey
  }
};

// Add custom endpoint if provided
if (s3ds.endpoint) {
  s3Config.endpoint = s3ds.endpoint;
  s3Config.forcePathStyle = true;
}

const s3Client = new S3Client(s3Config);

// await is only valid inside an async function in CommonJS modules
async function main() {
  // List objects
  const listResponse = await s3Client.send(new ListObjectsV2Command({
    Bucket: s3ds.bucketName
  }));
  console.log('Objects:', listResponse.Contents);

  // Upload a file
  await s3Client.send(new PutObjectCommand({
    Bucket: s3ds.bucketName,
    Key: 'remote-file.txt',
    Body: createReadStream('local-file.txt')
  }));

  // Download a file
  const downloadResponse = await s3Client.send(new GetObjectCommand({
    Bucket: s3ds.bucketName,
    Key: 'remote-file.txt'
  }));
  const body = await downloadResponse.Body.transformToString();
  console.log(body);
}

main();

Common Operations

Upload File with Metadata

s3.upload_file(
    'local-file.txt',
    s3_ds['bucketName'],
    'remote-file.txt',
    ExtraArgs={
        'Metadata': {
            'uploaded-by': 'my-app',
            'content-type': 'text/plain'
        },
        'ContentType': 'text/plain'
    }
)

Generate Presigned URL

Create temporary URLs for file access:

# Generate presigned URL (valid for 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={
        'Bucket': s3_ds['bucketName'],
        'Key': 'private-file.txt'
    },
    ExpiresIn=3600
)
print(f"Temporary URL: {url}")

Copy Objects Between Buckets

copy_source = {
    'Bucket': 'source-bucket',
    'Key': 'source-file.txt'
}

s3.copy_object(
    CopySource=copy_source,
    Bucket='destination-bucket',
    Key='destination-file.txt'
)

Delete Objects

# Delete single object
s3.delete_object(Bucket=s3_ds['bucketName'], Key='file-to-delete.txt')

# Delete multiple objects
s3.delete_objects(
    Bucket=s3_ds['bucketName'],
    Delete={
        'Objects': [
            {'Key': 'file1.txt'},
            {'Key': 'file2.txt'},
            {'Key': 'file3.txt'}
        ]
    }
)

S3-Compatible Services

This configuration also works with S3-compatible services:

MinIO

s3 = boto3.client(
    's3',
    endpoint_url='https://minio.example.com',
    aws_access_key_id=s3_ds['accessKeyId'],
    aws_secret_access_key=s3_ds['secretAccessKey'],
    region_name='us-east-1'
)

DigitalOcean Spaces

s3 = boto3.client(
    's3',
    endpoint_url='https://nyc3.digitaloceanspaces.com',
    aws_access_key_id=s3_ds['accessKeyId'],
    aws_secret_access_key=s3_ds['secretAccessKey'],
    region_name='nyc3'
)

Wasabi

s3 = boto3.client(
    's3',
    endpoint_url='https://s3.wasabisys.com',
    aws_access_key_id=s3_ds['accessKeyId'],
    aws_secret_access_key=s3_ds['secretAccessKey'],
    region_name='us-east-1'
)

Common Issues

Access Denied

  • Verify IAM permissions include required S3 actions
  • Check bucket policy allows access from IAM user
  • Ensure bucket exists and name is correct
  • Verify access keys are correct and not expired

Invalid Access Key ID

  • Check access key ID is correct
  • Verify IAM user still exists
  • Ensure access key hasn't been deleted or deactivated
  • Regenerate keys if necessary

Bucket Not Found

  • Verify bucket name is correct (case-sensitive)
  • Check bucket exists in correct region
  • Ensure IAM user has permission to access bucket

Region Mismatch

  • Verify region matches bucket location
  • Use us-east-1 for buckets created without an explicit region
  • Check region when creating S3 client
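
A small helper can map common botocore error codes onto the issues above. The selection of codes and hint text is an assumption about what you'll most often see, not an exhaustive or platform-defined list:

```python
# Map common S3 error codes to the troubleshooting sections above.
ERROR_HINTS = {
    'AccessDenied': 'Access Denied: check IAM permissions and bucket policy',
    'InvalidAccessKeyId': 'Invalid Access Key ID: verify the key exists and is active',
    'SignatureDoesNotMatch': 'Invalid secret key: re-check secretAccessKey',
    'NoSuchBucket': 'Bucket Not Found: verify name (case-sensitive) and region',
    'PermanentRedirect': 'Region Mismatch: recreate the client with the bucket region',
}

def classify_s3_error(error_code):
    """Return a troubleshooting hint for a botocore error code."""
    return ERROR_HINTS.get(error_code, f'Unrecognized error code: {error_code}')

print(classify_s3_error('NoSuchBucket'))
```

With boto3, the code comes from `err.response['Error']['Code']` inside an `except botocore.exceptions.ClientError as err:` block.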

Best Practices

  1. Use IAM Roles: For applications running on AWS, use IAM roles instead of access keys
  2. Least Privilege: Grant minimal required permissions
  3. Enable Versioning: Use S3 versioning for important data
  4. Server-Side Encryption: Enable encryption at rest for sensitive data
  5. Lifecycle Policies: Configure lifecycle policies to manage storage costs
  6. Access Logging: Enable S3 access logging for audit trails
  7. Secure Keys: Never commit access keys to version control
  8. Use HTTPS: Always use HTTPS endpoints to encrypt data in transit
  9. Multipart Upload: Use multipart upload for large files (>100MB)
  10. Monitor Costs: Set up billing alerts and monitor S3 usage

Performance Optimization

Multipart Upload for Large Files

import boto3
from boto3.s3.transfer import TransferConfig

# Configure multipart upload thresholds
# (S3 requires multipart parts of at least 5 MB)
config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # 25 MB
    max_concurrency=10,
    multipart_chunksize=25 * 1024 * 1024,
    use_threads=True
)

s3.upload_file(
    'large-file.zip',
    s3_ds['bucketName'],
    'remote-large-file.zip',
    Config=config
)

Parallel Downloads

# Download multiple files in parallel
from concurrent.futures import ThreadPoolExecutor

def download_file(key):
    s3.download_file(s3_ds['bucketName'], key, f"local-{key}")

files = ['file1.txt', 'file2.txt', 'file3.txt']
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_file, files)