Deploy Rehydrations
Rehydration requires a dedicated Kubernetes deployment separate from your Edge Delta agents. This guide walks you through setting up the rehydration infrastructure.
Prerequisites
Before deploying rehydration, ensure you have:
- A Kubernetes cluster (GKE, EKS, or AKS)
- kubectl configured with cluster access
- Archive integrations configured in Edge Delta (S3, GCS, or MinIO)
- Destination integrations configured in Edge Delta (Splunk, Elasticsearch, Dynatrace, or Google Cloud Logging)
Infrastructure Requirements
Recommended Node Configurations
| Environment | Node Count | Node Type | Specs |
|---|---|---|---|
| GKE | 4 | n2-standard-16 | 16 vCPUs, 64 GB Memory |
| AWS (EKS) | 4 | m5.4xlarge | 16 vCPUs, 64 GB Memory |
| Azure (AKS) | 4 | Standard_D16s_v3 | 16 vCPUs, 64 GB Memory |
Generate Required Credentials
API Token
Create an API token with the minimum required permissions:
1. In the Edge Delta UI, navigate to Admin > My Organization > API Tokens.
2. Click + Create Token.
3. Configure the following minimum permissions:
   - Integrations: Read
   - Accesses: Read
   - Rehydrations: Write
   - Organization: Read
4. Save the token securely.
Cluster ID
Generate a unique cluster ID for your rehydration deployment. This ID ensures multiple rehydration clusters don’t pick up the same rehydration job:
uuidgen
Base64 Encode Credentials
Encode your API token and cluster ID for use in Kubernetes secrets:
echo -n 'your-api-token' | base64
echo -n 'your-cluster-id' | base64
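The two commands above can be combined into a single pass that generates the cluster ID and captures both encoded values; a sketch (the token value is a placeholder and the variable names are illustrative):

```shell
# Generate a cluster ID and base64-encode both credentials.
# printf '%s' avoids a trailing newline, matching echo -n.
API_TOKEN='your-api-token'
CLUSTER_ID=$(uuidgen)
ENCODED_TOKEN=$(printf '%s' "$API_TOKEN" | base64)
ENCODED_CLUSTER_ID=$(printf '%s' "$CLUSTER_ID" | base64)
echo "$ENCODED_TOKEN"   # prints eW91ci1hcGktdG9rZW4=
```

Note that a stray newline in the encoded value is a common cause of authentication failures, which is why the examples use echo -n or printf '%s'.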
Deployment Steps
Contact Edge Delta Support to obtain the rehydration deployment files. The deployment package includes configuration files for all required components.
1. Create Namespace
kubectl create namespace edgedelta-rehydration
2. Deploy ClickHouse Operator
kubectl apply -f ed-ch-operator.yml
3. Deploy NATS JetStream
NATS provides the message queue for manager-worker communication:
kubectl apply -f js_deploy.yaml -n edgedelta-rehydration
4. Deploy KEDA
KEDA manages autoscaling of worker pods based on queue depth:
kubectl apply --server-side -f keda-2.17.1-core.yaml
5. Configure Autoscaling
kubectl apply -f autoscale.yaml
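For orientation, a KEDA ScaledObject for this setup has roughly the following shape. This is an illustrative sketch only: the trigger settings and stream/consumer names here are assumptions, and the autoscale.yaml supplied by Edge Delta Support is authoritative.

```yaml
# Illustrative shape only - trigger metadata values are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rehydrate-scaled-object
  namespace: edgedelta-rehydration
spec:
  scaleTargetRef:
    name: rehydration-worker   # scales the worker deployment
  minReplicaCount: 0           # workers idle at zero between jobs
  maxReplicaCount: 15
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.edgedelta-rehydration.svc:8222"
        account: "$G"
        stream: "rehydration"
        lagThreshold: "10"
```

With minReplicaCount set to 0, worker pods exist only while the JetStream queue has pending work, which is why the verification step later in this guide expects zero worker pods initially.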
6. Deploy ClickHouse
kubectl apply -f ch_deploy.yaml
7. Configure and Deploy Rehydration
Before applying rehydrate_deploy.yaml, update the following:
API Token Secret (around line 7):
data:
  ed-api-token: "<base64-encoded-api-token>"
Cluster ID Secret (around line 15):
data:
  ed-cluster-id: "<base64-encoded-cluster-id>"
Organization ID (lines 44 and 92):
- name: ORG_ID
  value: "<your-org-id>"
Then apply the deployment:
kubectl apply -f rehydrate_deploy.yaml
GCS Authentication (GKE)
For GKE deployments reading from Google Cloud Storage, you have two authentication options:
Option A: Workload Identity (Recommended)
Workload Identity eliminates the need to manage service account keys.
1. Enable Workload Identity on Your Cluster
For new clusters:
gcloud container clusters create CLUSTER_NAME \
--workload-pool=PROJECT_ID.svc.id.goog \
--region=REGION
For existing clusters:
gcloud container clusters update CLUSTER_NAME \
--workload-pool=PROJECT_ID.svc.id.goog \
--region=REGION
2. Create a Google Service Account
# Create the service account
gcloud iam service-accounts create rehydration-gcs-reader \
--display-name="Rehydration GCS Reader" \
--project=PROJECT_ID
# Grant GCS permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectViewer"
3. Create and Bind Kubernetes Service Account
# Create KSA
kubectl create serviceaccount rehydration-workload-identity \
-n edgedelta-rehydration
# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:PROJECT_ID.svc.id.goog[edgedelta-rehydration/rehydration-workload-identity]"
# Annotate the KSA
kubectl annotate serviceaccount rehydration-workload-identity \
-n edgedelta-rehydration \
iam.gke.io/gcp-service-account=rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com
4. Deploy with Workload Identity
Use the rehydrate_deploy_workload_identity.yaml file instead of the standard deployment. This file includes serviceAccountName: rehydration-workload-identity in the pod spec.
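The relevant structural difference from the standard deployment is the pod-level service account; a minimal excerpt of the shape involved (surrounding fields follow the file from Edge Delta Support):

```yaml
# Minimal excerpt - only serviceAccountName differs from the standard deployment.
spec:
  template:
    spec:
      serviceAccountName: rehydration-workload-identity
```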
Option B: Service Account JSON Key
For non-GKE environments or when Workload Identity is not available:
# Create a Kubernetes secret with your service account key
kubectl create secret generic gcp-sa-key \
--from-file=sa-key.json=/path/to/your/service-account-key.json \
-n edgedelta-rehydration
Then mount the secret in your deployment and set GOOGLE_APPLICATION_CREDENTIALS to the key path.
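A sketch of that mount and environment variable, assuming the gcp-sa-key secret created above; the container name and mount path are illustrative:

```yaml
# Illustrative deployment excerpt - container name and mount path are assumptions.
containers:
  - name: rehydration-worker
    env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /var/secrets/google/sa-key.json   # must match the mounted key file
    volumeMounts:
      - name: gcp-sa-key
        mountPath: /var/secrets/google
        readOnly: true
volumes:
  - name: gcp-sa-key
    secret:
      secretName: gcp-sa-key
```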
AWS-Specific Configuration
For AWS EKS deployments, configure EBS for persistent storage:
Enable EBS CSI Driver
Install the Amazon EBS CSI Driver addon in your EKS cluster.
Create StorageClass
If your cluster doesn’t have a default StorageClass, create one:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "50"
  encrypted: "true"
Verify Deployment
Check that all components are running:
# Check manager
kubectl get pods -n edgedelta-rehydration -l app=rehydration-manager
# Check NATS
kubectl get pods -n edgedelta-rehydration -l app.kubernetes.io/name=nats
# Check ClickHouse
kubectl get pods -n edgedelta-rehydration -l app=clickhouse
# Workers should be at 0 initially (they scale up when jobs are queued)
kubectl get pods -n edgedelta-rehydration -l app=rehydration-worker
Performance Tuning
ClickHouse Resources
In ch_deploy.yaml:
requests:
  cpu: 15000m
  memory: 28Gi
KEDA Autoscaler
In autoscale.yaml, adjust maxReplicaCount to control maximum worker pods:
maxReplicaCount: 15
Rehydration Manager
In rehydrate_deploy.yaml, key settings for the manager:
| Setting | Default | Description |
|---|---|---|
| ED_REHYDRATION_LIST_CONCURRENCY | 1000 | Increase for rehydrations over long time ranges |
| ED_REHYDRATION_MAX_CHUNK_SIZE | 50MB | Increase for very large files (raise worker memory accordingly) |
| ED_REHYDRATION_ALLOWED_CONCURRENT_REHYDRATIONS | 1 | Sequential processing is recommended for best performance |
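These settings are plain environment variables on the manager container, so tuning them is an edit to its env block; for example, raising the listing concurrency for a long time range might look like this (values are illustrative):

```yaml
# Illustrative env excerpt for the manager container; values are examples only.
env:
  - name: ED_REHYDRATION_LIST_CONCURRENCY
    value: "2000"
  - name: ED_REHYDRATION_ALLOWED_CONCURRENT_REHYDRATIONS
    value: "1"
```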
Manager resource requests:
requests:
  cpu: 2000m
  memory: 3Gi
Rehydration Workers
In rehydrate_deploy.yaml, key settings for workers:
| Setting | Default | Description |
|---|---|---|
| ED_REHYDRATION_TRANSFORM_CONCURRENCY | 1024 | Parallel transformers per worker |
| ED_REHYDRATION_PUSH_CONCURRENCY | 256 | Parallel pushers per worker |
| ED_TERMINATION_GRACE_SECONDS | 60 | Graceful shutdown timeout |
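Worker settings are configured the same way. One interaction worth noting: if you raise ED_TERMINATION_GRACE_SECONDS, keep the pod-level terminationGracePeriodSeconds at least as large, since Kubernetes force-kills the pod once its own grace period expires regardless of the application's timeout. A sketch with illustrative values:

```yaml
# Illustrative excerpt - keep the pod grace period >= the worker's own timeout.
spec:
  terminationGracePeriodSeconds: 90
  containers:
    - name: rehydration-worker
      env:
        - name: ED_TERMINATION_GRACE_SECONDS
          value: "60"
```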
Worker resource requests:
requests:
  cpu: 3000m
  memory: 5Gi
Performance Expectations
Example performance on GKE with 4 x n2-standard-16 nodes:
| Replicas | Data Size | Run Time | Source Throughput | Destination Throughput |
|---|---|---|---|---|
| 1 | 6.55 GB | 17:33 | 371.55 MB/min | 3.14 GB/min |
| 2 | 6.55 GB | 8:47 | 741.99 MB/min | 6.26 GB/min |
| 15 | 6.55 GB | 3:03 | 2.18 GB/min | 18.71 GB/min |
| 15 | 11.51 GB | 5:06 | 2.26 GB/min | 22.95 GB/min |
Performance scales approximately linearly with worker count up to infrastructure limits.
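The throughput columns follow directly from data size and run time, so you can sanity-check your own runs the same way. For the single-replica row (treating GB as decimal, so small rounding differences against the table's 371.55 MB/min are expected):

```shell
# 6.55 GB in 17 min 33 s, expressed in MB/min (decimal units assumed)
awk 'BEGIN { printf "%.2f MB/min\n", (6.55 * 1000) / (17 + 33/60) }'
# prints 373.22 MB/min
```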
Troubleshooting
Manager Not Polling
kubectl logs -n edgedelta-rehydration deployment/rehydration-manager
Common causes:
- Invalid API token
- Incorrect organization ID
- NATS connection failed
Workers Not Scaling
kubectl get scaledobjects -n edgedelta-rehydration
kubectl describe scaledobject rehydrate-scaled-object -n edgedelta-rehydration
Check KEDA operator logs:
kubectl logs -n keda deployment/keda-operator
Workers Failing to Read Archives
kubectl logs -n edgedelta-rehydration -l app=rehydration-worker --tail=100
Common causes:
- Missing cloud credentials (GCS service account, AWS IAM role)
- Incorrect bucket permissions
- Network connectivity issues
Next Steps
Once deployed, you can run rehydrations from the Edge Delta UI.