Google BigQuery Configuration
Connect to Google BigQuery for analytics, large-scale queries, and business intelligence workloads.
Connection Parameters
Required Fields
- Project ID: The ID of the Google Cloud project that hosts your BigQuery datasets
- Service Account Key: Complete JSON key file content
Optional Fields
- Dataset ID: Default dataset used when a query does not specify one
Configuration Example
When creating a BigQuery data source, provide the following information:
| Field | Example Value | Notes |
|---|---|---|
| Name | prod-bigquery | Unique identifier |
| Label | Production BigQuery | Display name |
| Project ID | my-gcp-project | GCP project ID |
| Dataset ID | analytics | Optional default dataset |
| Service Account Key | { "type": "service_account", ... } | Full JSON key |
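The examples in Usage in Applications below read this configuration from the STRONGLY_SERVICES environment variable. As a rough sketch (inferred from those examples; the exact shape and key names in your environment may differ), the configured data source surfaces like this:

```json
{
  "datasources": {
    "prod-bigquery": {
      "credentials": {
        "project_id": "my-gcp-project",
        "service_account_key": "{ \"type\": \"service_account\", ... }"
      }
    }
  }
}
```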
Creating a Service Account Key
To create a service account key for BigQuery:
1. Go to the GCP Console
2. Navigate to IAM & Admin → Service Accounts
3. Click Create Service Account, or select an existing one
4. Grant the necessary roles (see Required Permissions)
5. Open the Keys tab → Add Key → Create New Key
6. Select JSON format
7. Download the JSON key file
8. Copy and paste the entire JSON content into the Service Account Key field
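Before pasting the key, you can optionally sanity-check the downloaded file. A minimal sketch (the filename is an example; the field names are the standard ones found in Google service account key files):

```python
import json

# Path to the downloaded key file (example name)
with open("my-gcp-project-key.json") as f:
    key = json.load(f)

# Standard fields present in every service account key
required = {"type", "project_id", "private_key", "client_email", "token_uri"}
missing = required - key.keys()

assert key.get("type") == "service_account", "Not a service account key"
assert not missing, f"Key file is missing fields: {missing}"
print(f"Key for {key['client_email']} (project {key['project_id']}) looks complete")
```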
Security
Keep your service account key secure. Never commit it to version control or share it publicly.
Required Permissions
Grant the following IAM roles to your service account:
For Read-Only Access
- BigQuery Data Viewer (roles/bigquery.dataViewer)
- BigQuery Job User (roles/bigquery.jobUser)
For Read-Write Access
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Job User (roles/bigquery.jobUser)
For Full Access
- BigQuery Admin (roles/bigquery.admin)
Least Privilege
Use the minimum required permissions. For most applications, read-only access is sufficient.
Usage in Applications
Python Example
```python
import os, json
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source
bq_config = datasources['prod-bigquery']
creds_dict = json.loads(bq_config['credentials']['service_account_key'])

# Create credentials from service account key
credentials = service_account.Credentials.from_service_account_info(creds_dict)

# Create BigQuery client
client = bigquery.Client(
    credentials=credentials,
    project=bq_config['credentials']['project_id']
)

# Execute query
query = """
    SELECT *
    FROM `my-project.analytics.users`
    WHERE active = TRUE
"""
results = client.query(query).result()

# Process results
for row in results:
    print(f"{row.id}: {row.name}")
```
Python with Pandas
```python
import os, json
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

# Parse STRONGLY_SERVICES environment variable
services = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))
datasources = services.get('datasources', {})

# Get BigQuery data source
bq_config = datasources['prod-bigquery']
creds_dict = json.loads(bq_config['credentials']['service_account_key'])

# Create credentials
credentials = service_account.Credentials.from_service_account_info(creds_dict)

# Create client
client = bigquery.Client(
    credentials=credentials,
    project=bq_config['credentials']['project_id']
)

# Query into DataFrame
query = """
    SELECT *
    FROM `my-project.analytics.users`
    WHERE active = TRUE
    LIMIT 1000
"""
df = client.query(query).to_dataframe()
print(df.head())
```
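Note: depending on your google-cloud-bigquery version, `to_dataframe()` may require the optional pandas dependencies (`pip install 'google-cloud-bigquery[pandas]'`, which pulls in `db-dtypes`).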
Node.js Example
```javascript
const { BigQuery } = require('@google-cloud/bigquery');

// Parse STRONGLY_SERVICES environment variable
const services = JSON.parse(process.env.STRONGLY_SERVICES || '{}');
const datasources = services.datasources || {};

// Get BigQuery data source
const bqConfig = datasources['prod-bigquery'];
const credentials = JSON.parse(bqConfig.credentials.service_account_key);

// Create BigQuery client
const bigquery = new BigQuery({
  projectId: bqConfig.credentials.project_id,
  credentials: credentials
});

// Execute query
const query = `
  SELECT *
  FROM \`my-project.analytics.users\`
  WHERE active = TRUE
`;

async function runQuery() {
  const [rows] = await bigquery.query(query);
  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

runQuery().catch(console.error);
```
Java Example
```java
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.bigquery.*;
import org.json.*;
import java.io.*;

public class BigQueryExample {
    public static void main(String[] args) throws Exception {
        // Parse STRONGLY_SERVICES environment variable
        String servicesJson = System.getenv("STRONGLY_SERVICES");
        JSONObject services = new JSONObject(servicesJson);
        JSONObject datasources = services.getJSONObject("datasources");
        JSONObject bqConfig = datasources.getJSONObject("prod-bigquery");
        JSONObject creds = bqConfig.getJSONObject("credentials");

        // Parse service account key
        String keyJson = creds.getString("service_account_key");
        InputStream keyStream = new ByteArrayInputStream(keyJson.getBytes());
        ServiceAccountCredentials credentials =
                ServiceAccountCredentials.fromStream(keyStream);

        // Create BigQuery client
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setCredentials(credentials)
                .setProjectId(creds.getString("project_id"))
                .build()
                .getService();

        // Execute query
        String query = "SELECT * FROM `my-project.analytics.users` WHERE active = TRUE";
        QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();
        TableResult results = bigquery.query(queryConfig);

        // Process results
        for (FieldValueList row : results.iterateAll()) {
            System.out.println(row.toString());
        }
    }
}
```
Query Syntax
Fully Qualified Table Names
Always use fully qualified table names in the format:
`project-id.dataset-id.table-name`
Example:
```sql
SELECT COUNT(*)
FROM `my-project.analytics.users`
WHERE created_date >= '2024-01-01'
```
Standard SQL vs Legacy SQL
BigQuery supports two SQL dialects. Always use Standard SQL (default):
```python
# Standard SQL (recommended)
query = """
    SELECT * FROM `project.dataset.table`
    WHERE date >= '2024-01-01'
"""

# Legacy SQL (not recommended)
query = """
    #legacySQL
    SELECT * FROM [project:dataset.table]
    WHERE date >= '2024-01-01'
"""
```
Cost Optimization
Query Cost Estimation
Before running expensive queries, estimate how many bytes they will process with a dry run (a dry run does not execute the query and does not incur charges):
```python
# Dry run to estimate bytes processed
job_config = bigquery.QueryJobConfig(dry_run=True)
query_job = client.query(query, job_config=job_config)
print(f"This query will process {query_job.total_bytes_processed} bytes")

# Approximate cost: $5 per TB processed
estimated_cost = (query_job.total_bytes_processed / (1024**4)) * 5
print(f"Estimated cost: ${estimated_cost:.2f}")
```
Optimize Query Costs
- Use Partitioning: Query only necessary partitions
  ```sql
  SELECT * FROM `project.dataset.table`
  WHERE _PARTITIONTIME >= TIMESTAMP('2024-01-01')
  ```
- Select Specific Columns: Avoid `SELECT *`
  ```sql
  -- Bad (expensive)
  SELECT * FROM `project.dataset.table`

  -- Good (cheaper)
  SELECT id, name, email FROM `project.dataset.table`
  ```
- Use Clustering: Optimize queries with WHERE clauses
  ```sql
  CREATE TABLE `project.dataset.table`
  PARTITION BY DATE(created_date)
  CLUSTER BY user_id, region
  AS SELECT * FROM source_table
  ```
- Limit Results During Development:
  ```sql
  SELECT * FROM `project.dataset.table`
  LIMIT 100
  ```
- Use Table Preview: Free preview of table data
  ```python
  # Preview table (free)
  table = client.get_table("project.dataset.table")
  rows = client.list_rows(table, max_results=10)
  for row in rows:
      print(row)
  ```
Common Issues
Permission Denied
- Verify service account has required BigQuery roles
- Check project ID is correct
- Ensure service account key is valid and not expired
- Grant dataset-level permissions if needed:
  ```bash
  # Grant dataset access
  bq show --format=prettyjson project:dataset > /tmp/dataset.json
  # Edit JSON to add service account email to access list
  bq update --source /tmp/dataset.json project:dataset
  ```
Invalid Credentials
- Ensure service account key JSON is complete and valid
- Verify the key hasn't been deleted or revoked
- Check that the service account still exists
- Regenerate key if necessary
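To check whether the stored key is usable at all, here is a minimal sketch (assuming the same STRONGLY_SERVICES structure as the examples above) that parses the key and runs a trivial query:

```python
import os, json
from google.cloud import bigquery
from google.oauth2 import service_account

bq_config = json.loads(os.getenv('STRONGLY_SERVICES', '{}'))['datasources']['prod-bigquery']

try:
    # Fails here if the key JSON is truncated or missing required fields
    creds_dict = json.loads(bq_config['credentials']['service_account_key'])
    credentials = service_account.Credentials.from_service_account_info(creds_dict)
except (json.JSONDecodeError, ValueError) as exc:
    raise SystemExit(f"Service account key is invalid or incomplete: {exc}")

client = bigquery.Client(
    credentials=credentials,
    project=bq_config['credentials']['project_id']
)

# Fails here if the key has been revoked or the service account was deleted
client.query("SELECT 1").result()
print("Credentials OK")
```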
Table Not Found
- Verify the table name is fully qualified: `project.dataset.table`
- Check that the table exists in the BigQuery console
- Ensure service account has access to the dataset
- Verify dataset ID in your query matches actual dataset name
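If the name looks right, it can help to list what the service account can actually see. A minimal sketch, assuming the `client` created in the Python example above:

```python
# List datasets in the client's project, and the tables inside each one
for dataset in client.list_datasets():
    print(dataset.dataset_id)
    for table in client.list_tables(dataset):
        print(f"  {table.full_table_id}")
```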
Query Timeout
- Break large queries into smaller chunks
- Use appropriate WHERE clauses to filter data
- Consider using materialized views for complex queries
- Increase query timeout in your client:
  ```python
  job_config = bigquery.QueryJobConfig(
      use_query_cache=True,
      use_legacy_sql=False,
      maximum_bytes_billed=1000000000  # 1 GB limit
  )
  query_job = client.query(query, job_config=job_config)
  results = query_job.result(timeout=300)  # 5 minute timeout
  ```
Best Practices
- Use Standard SQL: Always use Standard SQL, not Legacy SQL
- Partition Large Tables: Use date partitioning for time-series data
- Cluster Tables: Use clustering for frequently filtered columns
- Select Specific Columns: Avoid `SELECT *` to reduce costs
- Use Query Cache: Enable query cache to avoid reprocessing
- Monitor Costs: Set up billing alerts and monitor query costs
- Least Privilege: Grant minimal required permissions to service accounts
- Secure Keys: Never commit service account keys to version control
- Query Preview: Use preview for development to avoid charges
- Batch Queries: Batch multiple queries when possible to reduce overhead