Google BigQuery Configuration

Connect to Google BigQuery for analytics, large-scale queries, and business intelligence workloads.

Connection Parameters

Required Fields

| Field | Description | Example |
|---|---|---|
| Project ID | GCP project ID | my-gcp-project |
| Service Account Key (JSON) | Complete JSON key file content (stored in the options credential field) | { "type": "service_account", ... } |

Optional Fields

| Field | Description | Example |
|---|---|---|
| Dataset ID | Default dataset to use | analytics |

Credential Field Names

In the creation form, the service account key field is labeled "Service Account Key (JSON)" but is stored internally in the options credential field. The Project ID is stored as projectId and Dataset ID as datasetId. These are the field names you will see in STRONGLY_SERVICES.

Configuration Example

When creating a BigQuery data source, provide the following information:

| Field | Example Value | Notes |
|---|---|---|
| Data source label | prod-bigquery | Kebab-case unique identifier (used as both name and label) |
| Project ID | my-gcp-project | GCP project ID |
| Dataset ID | analytics | Optional default dataset |
| Service Account Key (JSON) | { "type": "service_account", ... } | Full JSON key pasted as text |

Creating a Service Account Key

To create a service account key for BigQuery:

  1. Go to GCP Console
  2. Navigate to IAM & Admin -> Service Accounts
  3. Click Create Service Account or select an existing one
  4. Grant necessary roles (see Required Permissions)
  5. Click Keys tab -> Add Key -> Create New Key
  6. Select JSON format
  7. Download the JSON key file
  8. Copy and paste the entire JSON content into the Service Account Key field
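
Before pasting, you can sanity-check that the downloaded file is a complete, parseable service account key. A minimal Python sketch (key.json is a placeholder for wherever you saved the download):

import json

from google.oauth2 import service_account

# Load the downloaded key file (filename is an example)
with open("key.json") as f:
    key = json.load(f)

# A valid service account key declares its type and project
assert key.get("type") == "service_account"
print("Project:", key.get("project_id"))

# Building credentials fails if required fields are missing or malformed
credentials = service_account.Credentials.from_service_account_info(key)
print("Service account:", credentials.service_account_email)
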
Security

Keep your service account key secure. Never commit it to version control or share it publicly.

Required Permissions

Grant the following IAM roles to your service account:

For Read-Only Access

- BigQuery Data Viewer (roles/bigquery.dataViewer)
- BigQuery Job User (roles/bigquery.jobUser)

For Read-Write Access

- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Job User (roles/bigquery.jobUser)

For Full Access

- BigQuery Admin (roles/bigquery.admin)

Least Privilege

Use the minimum required permissions. For most applications, read-only access is sufficient.
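
To confirm the read-only pair is actually sufficient, you can exercise each role from code: listing tables needs BigQuery Data Viewer, and starting a query job (a dry run costs nothing) needs BigQuery Job User. A rough Python sketch, assuming the key from the previous section is saved as key.json and the project and dataset match the examples on this page:

from google.cloud import bigquery
from google.oauth2 import service_account

# key.json is the key downloaded in the previous section (example filename)
credentials = service_account.Credentials.from_service_account_file("key.json")
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

# BigQuery Data Viewer: metadata reads such as listing tables in a dataset
for table in client.list_tables("analytics"):
    print(table.table_id)

# BigQuery Job User: permission to run query jobs; a dry run verifies it at no cost
client.query("SELECT 1", job_config=bigquery.QueryJobConfig(dry_run=True))
print("Read-only roles verified")
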

Test Connection

When you create or test a BigQuery data source, the platform uses the @google-cloud/bigquery Node.js library. It creates a BigQuery client with the provided projectId and parses the service account JSON from the options field (or serviceAccountKey field). The test executes SELECT 1 to verify connectivity.
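
If a connection test fails in the UI, you can reproduce the same probe yourself. The platform runs it through the Node.js client; the sketch below is a Python equivalent, assuming the key file and project ID from the earlier examples:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")  # example path
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

# Same probe the platform runs: a trivial query proves auth, project ID, and network access
rows = list(client.query("SELECT 1").result())
print("Connection OK:", rows[0][0] == 1)
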

Schema Discovery

BigQuery has full native schema discovery support. Clicking Refresh Metadata returns:

  • Schemas: All dataset names in the project (BigQuery datasets are analogous to schemas)
  • Tables: Tables within the specified dataset (only populated if datasetId is provided)
  • Databases: The project ID itself
  • Size: Total bytes across tables in the specified dataset
  • Row count: Total row count across tables in the specified dataset
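
The platform gathers this through its own metadata calls; the following Python sketch shows roughly equivalent client-library calls (not the platform's internal implementation), assuming the example project, dataset, and key file used elsewhere on this page:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")  # example path
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

# Schemas: dataset names in the project
datasets = [d.dataset_id for d in client.list_datasets()]

# Tables in the default dataset ("analytics" is the example datasetId)
tables = [t.table_id for t in client.list_tables("analytics")]

# Size and row count are summed from per-table metadata
total_bytes = total_rows = 0
for item in client.list_tables("analytics"):
    t = client.get_table(item.reference)
    total_bytes += t.num_bytes
    total_rows += t.num_rows

print(datasets, tables, total_bytes, total_rows)
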

Column-Level Metadata

You can fetch column details for individual tables, which returns:

  • Column name and BigQuery data type (e.g., STRING, INTEGER, TIMESTAMP)
  • Nullability (based on field mode: REQUIRED vs NULLABLE)
  • Default value expressions
  • Table row count and size in bytes
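
The same details are exposed on the table's schema in the client library. A minimal Python sketch, assuming the example project, dataset, and table used in the query examples below (default value expressions are only exposed on recent library versions):

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")  # example path
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

table = client.get_table("my-gcp-project.analytics.users")  # example table

# Column name, BigQuery type, nullability (REQUIRED vs NULLABLE), and default expression
for field in table.schema:
    nullable = field.mode != "REQUIRED"
    default = getattr(field, "default_value_expression", None)  # newer library versions only
    print(field.name, field.field_type, nullable, default)

# Table-level row count and size in bytes
print(table.num_rows, table.num_bytes)
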

Usage in Workflows (STRONGLY_SERVICES)

When a BigQuery data source is attached to a workflow, credentials are injected via STRONGLY_SERVICES with all fields at the top level (not nested under a credentials key). No connectionString is generated for BigQuery:

{
  "datasources": {
    "prod_bigquery": {
      "type": "bigquery",
      "name": "prod-bigquery",
      "projectId": "my-gcp-project",
      "datasetId": "analytics",
      "options": "{ \"type\": \"service_account\", \"project_id\": \"my-gcp-project\", ... }"
    }
  }
}

Note that the service account JSON is delivered in the options field as a string; parse it before passing it to the client as credentials.

Python Example

import os, json
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source (key is sanitized name: hyphens become underscores)
bq = datasources['prod_bigquery']

# Parse service account key from the 'options' field
service_account_info = json.loads(bq['options'])

# Create credentials from service account key
credentials = service_account.Credentials.from_service_account_info(service_account_info)

# Create BigQuery client
client = bigquery.Client(
    credentials=credentials,
    project=bq['projectId']
)

# Execute query
query = """
SELECT *
FROM `my-project.analytics.users`
WHERE active = TRUE
"""
results = client.query(query).result()

# Process results
for row in results:
    print(f"{row.id}: {row.name}")

Python with Pandas

import os, json
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source
bq = datasources['prod_bigquery']

# Parse service account key from 'options' field
service_account_info = json.loads(bq['options'])

# Create credentials
credentials = service_account.Credentials.from_service_account_info(service_account_info)

# Create client
client = bigquery.Client(
    credentials=credentials,
    project=bq['projectId']
)

# Query into DataFrame
query = """
SELECT *
FROM `my-project.analytics.users`
WHERE active = TRUE
LIMIT 1000
"""
df = client.query(query).to_dataframe()
print(df.head())

Node.js Example

const { BigQuery } = require('@google-cloud/bigquery');

// Parse STRONGLY_SERVICES environment variable
const services = JSON.parse(process.env.STRONGLY_SERVICES || '{}');
const datasources = services.datasources || {};

// Get BigQuery data source (key is sanitized name)
const bq = datasources['prod_bigquery'];

// Parse service account key from 'options' field
const credentials = JSON.parse(bq.options);

// Create BigQuery client using top-level fields
const bigquery = new BigQuery({
  projectId: bq.projectId,
  credentials: credentials
});

// Execute query
const query = `
SELECT *
FROM \`my-project.analytics.users\`
WHERE active = TRUE
`;

async function runQuery() {
  const [rows] = await bigquery.query(query);
  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

runQuery();

Query Syntax

Fully Qualified Table Names

Always use fully qualified table names in the format:

`project-id.dataset-id.table-name`

Example:

SELECT COUNT(*)
FROM `my-project.analytics.users`
WHERE created_date >= '2024-01-01'

Standard SQL vs Legacy SQL

BigQuery supports two SQL dialects. Always use Standard SQL (default):

# Standard SQL (recommended)
query = """
SELECT * FROM `project.dataset.table`
WHERE date >= '2024-01-01'
"""

# Legacy SQL (not recommended)
query = """
#legacySQL
SELECT * FROM [project:dataset.table]
WHERE date >= '2024-01-01'
"""

Cost Optimization

Query Cost Estimation

Before running expensive queries, estimate costs:

# Dry run to estimate bytes processed
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = client.query(query, job_config=job_config)

print(f"This query will process {query_job.total_bytes_processed} bytes")
# Approximate cost: $5 per TB processed
estimated_cost = (query_job.total_bytes_processed / (1024**4)) * 5
print(f"Estimated cost: ${estimated_cost:.2f}")

Optimize Query Costs

  1. Use Partitioning: Query only necessary partitions

     SELECT * FROM `project.dataset.table`
     WHERE _PARTITIONTIME >= TIMESTAMP('2024-01-01')

  2. Select Specific Columns: Avoid SELECT *

     -- Bad (expensive)
     SELECT * FROM `project.dataset.table`

     -- Good (cheaper)
     SELECT id, name, email FROM `project.dataset.table`

  3. Use Clustering: Optimize queries with WHERE clauses

     CREATE TABLE `project.dataset.table`
     PARTITION BY DATE(created_date)
     CLUSTER BY user_id, region
     AS SELECT * FROM source_table

  4. Limit Results During Development:

     SELECT * FROM `project.dataset.table`
     LIMIT 100

  5. Use Table Preview: Free preview of table data

     # Preview table (free)
     table = client.get_table("project.dataset.table")
     rows = client.list_rows(table, max_results=10)
     for row in rows:
         print(row)

Common Issues

Permission Denied

  • Verify service account has required BigQuery roles
  • Check project ID is correct
  • Ensure service account key is valid and not expired
  • Grant dataset-level permissions if needed
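
When debugging from code, a 403 Forbidden from the client confirms the key itself is valid and the problem is a missing role or dataset grant rather than bad credentials. A rough Python sketch, assuming the example key file and table name used elsewhere on this page:

from google.api_core.exceptions import Forbidden
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")  # example path
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

try:
    client.query("SELECT COUNT(*) FROM `my-gcp-project.analytics.users`").result()
    print("Query permissions OK")
except Forbidden as e:
    # 403: authentication succeeded, so add the missing role or dataset-level grant
    print("Missing role or dataset permission:", e)
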

Invalid Credentials

  • Ensure service account key JSON is complete and valid
  • Verify the key hasn't been deleted or revoked
  • Check that the service account still exists
  • Regenerate key if necessary

Table Not Found

  • Verify table name is fully qualified: `project.dataset.table`
  • Check if table exists in BigQuery console
  • Ensure service account has access to the dataset
  • Verify dataset ID in your query matches actual dataset name
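
A cheap way to check a qualified name from code is get_table, which raises NotFound when any part of project.dataset.table is wrong or inaccessible. A minimal Python sketch, assuming the example names used on this page:

from google.api_core.exceptions import NotFound
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")  # example path
client = bigquery.Client(credentials=credentials, project="my-gcp-project")

try:
    # Metadata lookup only, so this check is free
    table = client.get_table("my-gcp-project.analytics.users")  # example table
    print("Found:", table.full_table_id)
except NotFound:
    print("Check the project, dataset, and table parts of the qualified name")
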

Query Timeout

  • Break large queries into smaller chunks
  • Use appropriate WHERE clauses to filter data
  • Consider using materialized views for complex queries
  • Increase query timeout in your client:
job_config = bigquery.QueryJobConfig(
    use_query_cache=True,
    use_legacy_sql=False,
    maximum_bytes_billed=1000000000  # 1 GB limit
)
query_job = client.query(query, job_config=job_config)
results = query_job.result(timeout=300)  # 5 minute timeout

Best Practices

  1. Use Standard SQL: Always use Standard SQL, not Legacy SQL
  2. Partition Large Tables: Use date partitioning for time-series data
  3. Cluster Tables: Use clustering for frequently filtered columns
  4. Select Specific Columns: Avoid SELECT * to reduce costs
  5. Use Query Cache: Enable query cache to avoid reprocessing
  6. Monitor Costs: Set up billing alerts and monitor query costs
  7. Least Privilege: Grant minimal required permissions to service accounts
  8. Secure Keys: Never commit service account keys to version control
  9. Query Preview: Use preview for development to avoid charges
  10. Batch Queries: Batch multiple queries when possible to reduce overhead