
Google BigQuery Configuration

Connect to Google BigQuery for analytics, large-scale queries, and business intelligence workloads.

Connection Parameters

Required Fields

  • Project ID: GCP project ID
  • Service Account Key: Complete JSON key file content

Optional Fields

  • Dataset ID: Default dataset to use when a query doesn't specify one

Configuration Example

When creating a BigQuery data source, provide the following information:

Field | Example Value | Notes
Name | prod-bigquery | Unique identifier
Label | Production BigQuery | Display name
Project ID | my-gcp-project | GCP project ID
Dataset ID | analytics | Optional default dataset
Service Account Key | { "type": "service_account", ... } | Full JSON key
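
The usage examples later on this page read this configuration from the STRONGLY_SERVICES environment variable. As a rough sketch, the entry for the data source above is assumed to look like the following; the field names are inferred from those examples, not from an official schema:

# Assumed shape of the STRONGLY_SERVICES payload for this data source.
# Only the fields the examples below actually read (project_id and
# service_account_key) are shown; treat this as an illustration.
strongly_services = {
    "datasources": {
        "prod-bigquery": {
            "credentials": {
                "project_id": "my-gcp-project",
                "service_account_key": '{ "type": "service_account", ... }'
            }
        }
    }
}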

Creating a Service Account Key

To create a service account key for BigQuery:

  1. Go to GCP Console
  2. Navigate to IAM & Admin → Service Accounts
  3. Click Create Service Account or select an existing one
  4. Grant necessary roles (see Required Permissions)
  5. Click the Keys tab → Add Key → Create New Key
  6. Select JSON format
  7. Download the JSON key file
  8. Copy and paste the entire JSON content into the Service Account Key field
Security

Keep your service account key secure. Never commit it to version control or share it publicly.

Required Permissions

Grant the following IAM roles to your service account:

For Read-Only Access

- BigQuery Data Viewer (roles/bigquery.dataViewer)
- BigQuery Job User (roles/bigquery.jobUser)

For Read-Write Access

- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Job User (roles/bigquery.jobUser)

For Full Access

- BigQuery Admin (roles/bigquery.admin)
Least Privilege

Use the minimum required permissions. For most applications, read-only access is sufficient.

Usage in Applications

Python Example

import os, json
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source
bq_config = datasources['prod-bigquery']
creds_dict = json.loads(bq_config['credentials']['service_account_key'])

# Create credentials from service account key
credentials = service_account.Credentials.from_service_account_info(creds_dict)

# Create BigQuery client
client = bigquery.Client(
    credentials=credentials,
    project=bq_config['credentials']['project_id']
)

# Execute query
query = """
SELECT *
FROM `my-project.analytics.users`
WHERE active = TRUE
"""
results = client.query(query).result()

# Process results
for row in results:
    print(f"{row.id}: {row.name}")

Python with Pandas

import os, json
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source
bq_config = datasources['prod-bigquery']
creds_dict = json.loads(bq_config['credentials']['service_account_key'])

# Create credentials
credentials = service_account.Credentials.from_service_account_info(creds_dict)

# Create client
client = bigquery.Client(
    credentials=credentials,
    project=bq_config['credentials']['project_id']
)

# Query into DataFrame
query = """
SELECT *
FROM `my-project.analytics.users`
WHERE active = TRUE
LIMIT 1000
"""
df = client.query(query).to_dataframe()
print(df.head())

Node.js Example

const { BigQuery } = require('@google-cloud/bigquery');

// Parse STRONGLY_SERVICES environment variable
const services = JSON.parse(process.env.STRONGLY_SERVICES || '{}');
const datasources = services.datasources || {};

// Get BigQuery data source
const bqConfig = datasources['prod-bigquery'];
const credentials = JSON.parse(bqConfig.credentials.service_account_key);

// Create BigQuery client
const bigquery = new BigQuery({
  projectId: bqConfig.credentials.project_id,
  credentials: credentials
});

// Execute query
const query = `
SELECT *
FROM \`my-project.analytics.users\`
WHERE active = TRUE
`;

async function runQuery() {
  const [rows] = await bigquery.query(query);
  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

runQuery();

Java Example

import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.*;
import org.json.*;
import java.io.*;

public class BigQueryExample {
    public static void main(String[] args) throws Exception {
        // Parse STRONGLY_SERVICES environment variable
        String servicesJson = System.getenv("STRONGLY_SERVICES");
        JSONObject services = new JSONObject(servicesJson);
        JSONObject datasources = services.getJSONObject("datasources");
        JSONObject bqConfig = datasources.getJSONObject("prod-bigquery");
        JSONObject creds = bqConfig.getJSONObject("credentials");

        // Parse service account key
        String keyJson = creds.getString("service_account_key");
        InputStream keyStream = new ByteArrayInputStream(keyJson.getBytes());
        ServiceAccountCredentials credentials =
                ServiceAccountCredentials.fromStream(keyStream);

        // Create BigQuery client
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setCredentials(credentials)
                .setProjectId(creds.getString("project_id"))
                .build()
                .getService();

        // Execute query
        String query = "SELECT * FROM `my-project.analytics.users` WHERE active = TRUE";
        QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();
        TableResult results = bigquery.query(queryConfig);

        // Process results
        for (FieldValueList row : results.iterateAll()) {
            System.out.println(row.toString());
        }
    }
}

Query Syntax

Fully Qualified Table Names

Always use fully qualified table names in the format:

`project-id.dataset-id.table-name`

Example:

SELECT COUNT(*)
FROM `my-project.analytics.users`
WHERE created_date >= '2024-01-01'

Standard SQL vs Legacy SQL

BigQuery supports two SQL dialects. Always use Standard SQL (default):

# Standard SQL (recommended)
query = """
SELECT * FROM `project.dataset.table`
WHERE date >= '2024-01-01'
"""

# Legacy SQL (not recommended)
query = """
#legacySQL
SELECT * FROM [project:dataset.table]
WHERE date >= '2024-01-01'
"""

Cost Optimization

Query Cost Estimation

Before running expensive queries, estimate costs:

# Dry run to estimate bytes processed
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = client.query(query, job_config=job_config)

print(f"This query will process {query_job.total_bytes_processed} bytes")
# Approximate cost: $5 per TB processed
estimated_cost = (query_job.total_bytes_processed / (1024**4)) * 5
print(f"Estimated cost: ${estimated_cost:.2f}")

Optimize Query Costs

  1. Use Partitioning: Query only necessary partitions
SELECT * FROM `project.dataset.table`
WHERE _PARTITIONTIME >= TIMESTAMP('2024-01-01')
  2. Select Specific Columns: Avoid SELECT *
-- Bad (expensive)
SELECT * FROM `project.dataset.table`

-- Good (cheaper)
SELECT id, name, email FROM `project.dataset.table`
  3. Use Clustering: Optimize queries with WHERE clauses
CREATE TABLE `project.dataset.table`
PARTITION BY DATE(created_date)
CLUSTER BY user_id, region
AS SELECT * FROM source_table
  4. Limit Results During Development:
SELECT * FROM `project.dataset.table`
LIMIT 100
  5. Use Table Preview: Free preview of table data
# Preview table (free)
table = client.get_table("project.dataset.table")
rows = client.list_rows(table, max_results=10)
for row in rows:
    print(row)

Common Issues

Permission Denied

  • Verify service account has required BigQuery roles
  • Check project ID is correct
  • Ensure service account key is valid and not expired
  • Grant dataset-level permissions if needed:
# Grant dataset access
bq show --format=prettyjson project:dataset > /tmp/dataset.json
# Edit JSON to add service account email to access list
bq update --source /tmp/dataset.json project:dataset
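
If you prefer to stay in Python, a similar dataset-level grant can be sketched with the BigQuery client's access-entry API; the dataset name and service account email below are placeholders:

# Append a READER access entry for the service account, then push back
# only the modified access list.
dataset = client.get_dataset("my-project.analytics")
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="my-service-account@my-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])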

Invalid Credentials

  • Ensure service account key JSON is complete and valid
  • Verify the key hasn't been deleted or revoked
  • Check that the service account still exists
  • Regenerate key if necessary
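
A quick way to confirm the key JSON itself is well formed is to try building credentials from it before creating a client. A sketch, reusing the creds_dict parsed in the examples above; the required-field set is a heuristic, not the full key schema:

from google.oauth2 import service_account

# Spot-check a few fields every service account key file contains
required = {"type", "project_id", "private_key", "client_email"}
missing = required - creds_dict.keys()
if missing:
    raise ValueError(f"Service account key is missing fields: {missing}")

# Raises ValueError if the key material itself is malformed
credentials = service_account.Credentials.from_service_account_info(creds_dict)
print(f"Key parses correctly for {credentials.service_account_email}")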

Table Not Found

  • Verify table name is fully qualified: `project.dataset.table`
  • Check if table exists in BigQuery console
  • Ensure service account has access to the dataset
  • Verify dataset ID in your query matches actual dataset name
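
You can also check what the service account actually sees by listing datasets and tables with the same client; the dataset name below is illustrative:

# Datasets visible to this service account in the configured project
for dataset in client.list_datasets():
    print(dataset.dataset_id)

# Tables in the dataset referenced by the failing query
for table in client.list_tables("my-project.analytics"):
    print(table.full_table_id)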

Query Timeout

  • Break large queries into smaller chunks
  • Use appropriate WHERE clauses to filter data
  • Consider using materialized views for complex queries
  • Increase query timeout in your client:
job_config = bigquery.QueryJobConfig(
    use_query_cache=True,
    use_legacy_sql=False,
    maximum_bytes_billed=1000000000  # 1 GB limit
)
query_job = client.query(query, job_config=job_config)
results = query_job.result(timeout=300)  # 5 minute timeout

Best Practices

  1. Use Standard SQL: Always use Standard SQL, not Legacy SQL
  2. Partition Large Tables: Use date partitioning for time-series data
  3. Cluster Tables: Use clustering for frequently filtered columns
  4. Select Specific Columns: Avoid SELECT * to reduce costs
  5. Use Query Cache: Enable query cache to avoid reprocessing
  6. Monitor Costs: Set up billing alerts and monitor query costs
  7. Least Privilege: Grant minimal required permissions to service accounts
  8. Secure Keys: Never commit service account keys to version control
  9. Query Preview: Use preview for development to avoid charges
  10. Batch Queries: Batch multiple queries when possible to reduce overhead
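
One way to act on the last point is BigQuery's batch priority, which queues a query to run when idle capacity is available instead of interactively. A minimal sketch, reusing the Python client from the earlier examples:

# Submit at batch priority; BigQuery schedules the job when slots are free
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
batch_job = client.query(
    "SELECT COUNT(*) AS n FROM `my-project.analytics.users`",
    job_config=job_config,
)
rows = batch_job.result()  # Blocks until the batch job finishes
for row in rows:
    print(row.n)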