28 changes: 28 additions & 0 deletions .github/workflows/build-deploy-cloudrun-google-sheets-trigger.yml
@@ -0,0 +1,28 @@
name: Build & Deploy Google Sheets Trigger

on:
  push:
    branches:
      - main
      - poc
      - staging
    paths:
      - 'google-sheets-trigger/**'
  workflow_dispatch:

jobs:
  build_and_deploy_google_sheets_trigger:
    uses: CruGlobal/.github/.github/workflows/build-deploy-cloudrun-function.yml@v1
    with:
      function_name: google-sheets-trigger
      entry_point: trigger_sheets_check
      runtime: python312
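      # Pick the GitHub environment from the pushed branch: main → production, poc → poc, staging → staging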
      environment: ${{ github.ref == 'refs/heads/main' && 'production' || github.ref == 'refs/heads/poc' && 'poc' || github.ref == 'refs/heads/staging' && 'staging' }}
    secrets:
      GCP_PROJECT_ID: ${{ vars.GCP_PROJECT_ID }}
      GCP_PROJECT_NUMBER: ${{ vars.GCP_PROJECT_NUMBER }}
      WORKLOAD_IDENTITY_POOL: ${{ vars.WORKLOAD_IDENTITY_POOL }}
      WORKLOAD_IDENTITY_PROVIDER: ${{ vars.WORKLOAD_IDENTITY_PROVIDER }}
      GCP_SERVICE_ACCOUNT: ${{ vars.GCP_SERVICE_ACCOUNT }}
      GCP_SERVICE_ACCOUNT_EMAIL: ${{ vars.GCP_SERVICE_ACCOUNT_EMAIL }}
      GCP_REGION: ${{ vars.GCP_REGION }}
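
Because the workflow declares `workflow_dispatch`, it can also be started by hand, for example with the GitHub CLI (a sketch, assuming `gh` is authenticated against this repo):

```bash
# Manually dispatch the build/deploy workflow against a branch
gh workflow run build-deploy-cloudrun-google-sheets-trigger.yml --ref staging
```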
5 changes: 4 additions & 1 deletion .gitignore
@@ -161,4 +161,7 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.vscode/
.vscode/

# Claude Code local settings
CLAUDE.local.md
126 changes: 123 additions & 3 deletions README.md
@@ -57,8 +57,38 @@ Creating a new secret version does not automatically take effect.
You'll need to trigger a new deployment.
See below, except ignore the advice about "only doing this in the POC env".

## Deploy new code manually:
You probably should only be doing this in the POC env.
## Environments

This project uses three environments, each tied to a Git branch. Pushing code to a branch
automatically deploys the corresponding function(s) via GitHub Actions.

| Environment | Branch | GCP Project | Purpose |
|-------------|--------|-------------|---------|
| **Staging** | `staging` | staging GCP project | Testing and validation before production. Use this for all pre-production testing. |
| **Production** | `main` | production GCP project | Live environment. Deploy by merging PRs to `main`. |
| **POC** | `poc` | `cru-data-orchestration-poc` | Experimental sandbox. Requires local Terraform setup (see below). Not needed for regular testing. |

### Testing in staging

Staging is the recommended environment for testing new or updated functions:

1. Merge your feature branch into `staging` and push (or push directly to `staging`):
   ```bash
   git checkout staging
   git merge feature/your-feature
   git push origin staging
   ```
2. GitHub Actions will automatically deploy the function to the staging GCP project.
3. Use Cloud Scheduler in the [GCP Console](https://console.cloud.google.com/cloudscheduler) to **"Force Run"** the scheduled job and verify behavior.
4. Check [Cloud Logging](https://console.cloud.google.com/logs) for function output and errors.
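
Steps 3 and 4 can also be done from the CLI. A sketch, assuming a scheduler job named `google-sheets-trigger-job` in `us-central1` (confirm the real name with `gcloud scheduler jobs list`):

```bash
# Force-run the scheduled job (job name is illustrative)
gcloud scheduler jobs run google-sheets-trigger-job --location=us-central1

# Read recent logs for the function
gcloud functions logs read google-sheets-trigger --region=us-central1 --limit=50
```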

### Deploying to production

After testing in staging, merge your PR to `main`. GitHub Actions will deploy to the production GCP project.

### Deploy new code manually

You should only need to do this for experimental work in the POC environment:

```bash
gcloud functions deploy fivetran-trigger --source=. --entry-point=hello_http --runtime=python312 --gen2 --region=us-central1
```

@@ -67,6 +97,8 @@ gcloud functions deploy fivetran-trigger --source=. --entry-point=hello_http --r

## POC environment infrastructure

> **Note:** The POC environment is for experimental work only. For regular testing, use the **staging** environment (see above).

The POC environment is contained within the [cru-data-orchestration-poc](https://console.cloud.google.com/welcome?project=cru-data-orchestration-poc) GCP project.
The project and its integrations with Datadog and GitHub are managed by Terraform and Atlantis in the [cru-terraform repo](https://github.com/CruGlobal/cru-terraform/tree/master/applications/data-warehouse/dot/poc).
However, the functions and related infrastructure are not managed that way. Instead, devs can spin up the functions by running Terraform locally with local state, then use the web console or the gcloud CLI to experiment and learn.
@@ -84,4 +116,92 @@ To spin up the POC infrastructure:
To clean up when you're done:
* `terraform destroy`

Infrastructure learnings here can be applied to the terraform config for the beta and production environments.
Infrastructure learnings here can be applied to the terraform config for the production environment.


## Service Accounts

Service accounts for this project are managed in a separate Terraform repository:
- **Location**: `cru-terraform/applications/data-warehouse/dot/prod/permissions.tf`
- **Pattern**: One service account per Cloud Run function/job
- **Naming convention**: `{function-name}@{project-id}.iam.gserviceaccount.com`

### Adding a New Service Account

1. Add the service account resource to `permissions.tf` in the cru-terraform repo
2. Add required IAM bindings (Pub/Sub publisher, BigQuery access, etc.)
3. If the function needs access to Google Sheets, share those files with the service account email
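
After Atlantis applies the Terraform change, you can confirm the account exists before sharing any sheets with it; a sketch with illustrative names:

```bash
# Verify the new service account was created (project and name are illustrative)
gcloud iam service-accounts list \
  --project=YOUR_PROJECT_ID \
  --filter="email:your-function-name@"
```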


## Pub/Sub Topics

### cloud-run-job-completed
This topic triggers dbt jobs after a Cloud Run job completes.

**Publishers**: okta-sync, woo-sync, process-geography, google-sheets-trigger

**Subscriber**: A Cloud Function (not in this repo) that calls dbt-trigger

To trigger a dbt job from your function:
```python
from google.cloud import pubsub_v1
import json
import os

def publish_pubsub_message(data: dict, topic_name: str) -> None:
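    """Publish a JSON payload to a Pub/Sub topic in the current GCP project."""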
    google_cloud_project_id = os.environ.get("GOOGLE_CLOUD_PROJECT")
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(google_cloud_project_id, topic_name)
    data_encoded = json.dumps(data).encode("utf-8")
    future = publisher.publish(topic_path, data_encoded)
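    # result() blocks until Pub/Sub acknowledges the publish, surfacing any errors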
    future.result()

# Trigger dbt job:
publish_pubsub_message({"job_id": "YOUR_DBT_JOB_ID"}, "cloud-run-job-completed")
```
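
For ad-hoc testing, the same payload can be published from the command line; a sketch (run against staging, with a real dbt job id substituted):

```bash
# Publish a test message to the topic (use staging, not production)
gcloud pubsub topics publish cloud-run-job-completed \
  --message='{"job_id": "YOUR_DBT_JOB_ID"}'
```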


## google-sheets-trigger

A reusable Cloud Function that checks Google Sheets for changes and triggers dbt jobs.

### Usage

Configure schedules in Terraform (`poc-terraform/functions.tf`). Each schedule specifies:
- Which sheets to monitor (by ID and name)
- Which dbt job to trigger
- When to run (cron schedule)
- Whether to include weekends in change detection

Example Terraform configuration:
```hcl
module "google-sheets-trigger" {
  source = "..."
  schedule = {
    my_sheets: {
      cron: "0 17 * * 1-5", # M-F 5pm
      argument = {
        "sheets" = [
          { "id" = "your-sheet-id", "name" = "My Sheet" }
        ],
        "dbt_job_id" = "123456",
        "include_weekends" = false
      }
    }
  }
}
```

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sheets` | array | Yes | List of sheets to monitor, each with `id` and `name` |
| `dbt_job_id` | string | Yes | The dbt job ID to trigger when changes are detected |
| `include_weekends` | boolean | No | If `true`, every run looks back 24 hours. If `false` (default), Monday runs look back 72 hours so changes made over the weekend are still detected. |
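
The lookback rule is easy to check from a shell; this snippet is purely illustrative of the default (`include_weekends = false`) behavior:

```bash
# Monday (day-of-week 1) looks back 72h; other weekdays look back 24h
if [ "$(date +%u)" -eq 1 ]; then echo "lookback: 72h"; else echo "lookback: 24h"; fi
```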

### Permissions

Share all monitored Google Sheets with the service account:
- `google-sheets-trigger@{project-id}.iam.gserviceaccount.com`
- Grant "Viewer" access (read-only is sufficient)