Deploy Rehydrations

Set up the rehydration infrastructure in your Kubernetes cluster to enable querying and replaying archived logs.

Rehydration requires a dedicated Kubernetes deployment, separate from your Edge Delta agents. This guide covers the required credentials, the deployment steps, cloud-specific configuration, verification, and performance tuning.

Prerequisites

Before deploying rehydration, ensure you have:

  • A Kubernetes cluster (GKE, EKS, or AKS)
  • kubectl configured with cluster access
  • Archive integrations configured in Edge Delta (S3, GCS, or MinIO)
  • Destination integrations configured in Edge Delta (Splunk, Elasticsearch, Dynatrace, or Google Cloud Logging)

Infrastructure Requirements

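Environment | Node Count | Node Type       | Specs
------------|------------|-----------------|------------------------
GKE         | 4          | n2-standard-16  | 16 vCPUs, 64 GB Memory
AWS (EKS)   | 4          | m5.4xlarge      | 16 vCPUs, 64 GB Memory
Azure (AKS) | 4          | Standard_D16s_v3 | 16 vCPUs, 64 GB Memory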

Generate Required Credentials

API Token

Create an API token with the minimum required permissions:

  1. In the Edge Delta UI, navigate to Admin > My Organization > API Tokens.
  2. Click + Create Token.
  3. Configure the following minimum permissions:
    • Integrations: Read
    • Accesses: Read
    • Rehydrations: Write
    • Organization: Read
  4. Save the token securely.

Cluster ID

Generate a unique cluster ID for your rehydration deployment. This ID ensures multiple rehydration clusters don’t pick up the same rehydration job:

uuidgen

Base64 Encode Credentials

Encode your API token and cluster ID for use in Kubernetes secrets:

echo -n 'your-api-token' | base64
echo -n 'your-cluster-id' | base64
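
Some base64 implementations wrap long output across several lines, which would break the value when pasted into a secret. The following is a minimal sketch of helper functions that keep the encoded value on one line and decode it again as a check. The function names encode_secret and decode_secret are illustrative, not part of the Edge Delta tooling, and the -d decode flag is assumed to be available (it is on GNU coreutils and recent macOS).

```shell
# Encode a value as single-line base64, stripping any line wraps
# that the base64 tool adds to long output.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value again to confirm that it round-trips unchanged.
decode_secret() {
  printf '%s' "$1" | base64 -d
}

# Usage (placeholders; replace with your real values):
#   API_TOKEN_B64="$(encode_secret 'your-api-token')"
#   CLUSTER_ID_B64="$(encode_secret "$(uuidgen)")"
#   decode_secret "$API_TOKEN_B64"   # should print the original token
```

Using printf '%s' has the same effect as echo -n here: no trailing newline ends up inside the encoded value.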

Deployment Steps

Contact Edge Delta Support to obtain the rehydration deployment files. The deployment package includes configuration files for all required components.

1. Create Namespace

kubectl create namespace edgedelta-rehydration

2. Deploy ClickHouse Operator

kubectl apply -f ed-ch-operator.yml

3. Deploy NATS JetStream

NATS provides the message queue for manager-worker communication:

kubectl apply -f js_deploy.yaml -n edgedelta-rehydration

4. Deploy KEDA

KEDA manages autoscaling of worker pods based on queue depth:

kubectl apply --server-side -f keda-2.17.1-core.yaml

5. Configure Autoscaling

kubectl apply -f autoscale.yaml

6. Deploy ClickHouse

kubectl apply -f ch_deploy.yaml
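
Steps 1 through 6 can be chained so that a failure stops the sequence before later components are applied. This is only a sketch that reuses the commands shown above. The function name deploy_prerequisites and the KUBECTL override variable are illustrative assumptions, and the script assumes it runs from the directory that contains the deployment files.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"
NS="edgedelta-rehydration"

# Apply steps 1-6 in order; && stops the chain at the first failure.
deploy_prerequisites() {
  $KUBECTL create namespace "$NS" &&
  $KUBECTL apply -f ed-ch-operator.yml &&
  $KUBECTL apply -f js_deploy.yaml -n "$NS" &&
  $KUBECTL apply --server-side -f keda-2.17.1-core.yaml &&
  $KUBECTL apply -f autoscale.yaml &&
  $KUBECTL apply -f ch_deploy.yaml
}

# Usage:
#   KUBECTL=echo; deploy_prerequisites    # dry preview, prints each command
#   KUBECTL=kubectl; deploy_prerequisites # run against the current context
```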

7. Configure and Deploy Rehydration

Before applying rehydrate_deploy.yaml, update the following:

API Token Secret (around line 7):

data:
  ed-api-token: "<base64-encoded-api-token>"

Cluster ID Secret (around line 15):

data:
  ed-cluster-id: "<base64-encoded-cluster-id>"

Organization ID (lines 44 and 92):

- name: ORG_ID
  value: "<your-org-id>"

Then apply the deployment:

kubectl apply -f rehydrate_deploy.yaml
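
A quick check before applying can catch a placeholder that was left unedited. The sketch below searches for the three placeholder strings exactly as they are written in this guide. The file you receive from Support may use different placeholder text, so treat the patterns as assumptions. The function name has_placeholders is illustrative.

```shell
# Succeeds (exit status 0) if the file still contains any placeholder.
has_placeholders() {
  grep -qF -e '<base64-encoded-api-token>' \
           -e '<base64-encoded-cluster-id>' \
           -e '<your-org-id>' "$1"
}

# Usage:
#   if has_placeholders rehydrate_deploy.yaml; then
#     echo "rehydrate_deploy.yaml still has unreplaced placeholders" >&2
#   else
#     kubectl apply -f rehydrate_deploy.yaml
#   fi
```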

GCS Authentication (GKE)

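For GKE deployments reading from Google Cloud Storage, there are two authentication options: Workload Identity (Option A) or a service account JSON key (Option B).

Option A: Workload Identity

Workload Identity eliminates the need to manage service account keys.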

1. Enable Workload Identity on Your Cluster

For new clusters:

gcloud container clusters create CLUSTER_NAME \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=REGION

For existing clusters:

gcloud container clusters update CLUSTER_NAME \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=REGION

2. Create a Google Service Account

# Create the service account
gcloud iam service-accounts create rehydration-gcs-reader \
  --display-name="Rehydration GCS Reader" \
  --project=PROJECT_ID

# Grant GCS permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

3. Create and Bind Kubernetes Service Account

# Create KSA
kubectl create serviceaccount rehydration-workload-identity \
  -n edgedelta-rehydration

# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
  rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[edgedelta-rehydration/rehydration-workload-identity]"

# Annotate the KSA
kubectl annotate serviceaccount rehydration-workload-identity \
  -n edgedelta-rehydration \
  iam.gke.io/gcp-service-account=rehydration-gcs-reader@PROJECT_ID.iam.gserviceaccount.com
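
The binding and annotation commands repeat the same project ID, namespace, and account names in two different string formats, which makes typos easy. As a sketch, the strings can be built from their parts by small helpers. The function names gsa_email and wi_member and the project ID my-project are illustrative placeholders.

```shell
# Build the Google service account email.
# $1 = project ID, $2 = Google service account name
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$2" "$1"
}

# Build the Workload Identity member string for the IAM binding.
# $1 = project ID, $2 = namespace, $3 = Kubernetes service account
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Usage (my-project is a placeholder):
#   gsa_email my-project rehydration-gcs-reader
#   wi_member my-project edgedelta-rehydration rehydration-workload-identity
```

The namespace and Kubernetes service account in the member string must match the ones used in the kubectl create serviceaccount command, otherwise the binding does not apply to the pods.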

4. Deploy with Workload Identity

Use the rehydrate_deploy_workload_identity.yaml file instead of the standard deployment. This file includes serviceAccountName: rehydration-workload-identity in the pod spec.

Option B: Service Account JSON Key

For non-GKE environments or when Workload Identity is not available:

# Create a Kubernetes secret with your service account key
kubectl create secret generic gcp-sa-key \
  --from-file=sa-key.json=/path/to/your/service-account-key.json \
  -n edgedelta-rehydration

Then mount the secret in your deployment and set GOOGLE_APPLICATION_CREDENTIALS to the key path.

AWS-Specific Configuration

For AWS EKS deployments, configure EBS for persistent storage:

Enable EBS CSI Driver

Install the Amazon EBS CSI driver add-on in your EKS cluster.

Create StorageClass

If your cluster doesn’t have a default StorageClass, create one:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "50"
  encrypted: "true"
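
Before creating the class above, it may help to confirm whether the cluster already has a default StorageClass. kubectl marks the default class with "(default)" next to its name in the output of kubectl get storageclass. The helper below is a minimal sketch that looks for that marker; the function name has_default_storageclass is illustrative.

```shell
# Reads `kubectl get storageclass` output on stdin and succeeds if any
# class is marked "(default)".
has_default_storageclass() {
  grep -qF '(default)'
}

# Usage:
#   kubectl get storageclass | has_default_storageclass \
#     || echo "No default StorageClass found"
```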

Verify Deployment

Check that all components are running:

# Check manager
kubectl get pods -n edgedelta-rehydration -l app=rehydration-manager

# Check NATS
kubectl get pods -n edgedelta-rehydration -l app.kubernetes.io/name=nats

# Check ClickHouse
kubectl get pods -n edgedelta-rehydration -l app=clickhouse

# Workers should be at 0 initially (they scale up when jobs are queued)
kubectl get pods -n edgedelta-rehydration -l app=rehydration-worker
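
To scan the whole namespace in one pass, the pod list can be filtered down to pods that are not in the Running or Completed state. This is a sketch that relies on the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE). The function name pods_not_running is illustrative.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# name and status of every pod that is neither Running nor Completed.
pods_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   kubectl get pods -n edgedelta-rehydration --no-headers | pods_not_running
```

Empty output means every listed pod is Running or Completed. It does not confirm that containers are ready, so the READY column is still worth checking.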

Performance Tuning

ClickHouse Resources

In ch_deploy.yaml:

requests:
  cpu: 15000m
  memory: 28Gi

KEDA Autoscaler

In autoscale.yaml, adjust maxReplicaCount to control maximum worker pods:

maxReplicaCount: 15

Rehydration Manager

In rehydrate_deploy.yaml, key settings for the manager:

Setting | Default | Description
--------|---------|------------
ED_REHYDRATION_LIST_CONCURRENCY | 1000 | Increase for long time range rehydrations
ED_REHYDRATION_MAX_CHUNK_SIZE | 50MB | Increase for very large files (along with worker memory)
ED_REHYDRATION_ALLOWED_CONCURRENT_REHYDRATIONS | 1 | Sequential processing recommended for best performance

Manager resource requests:

requests:
  cpu: 2000m
  memory: 3Gi

Rehydration Workers

In rehydrate_deploy.yaml, key settings for workers:

Setting | Default | Description
--------|---------|------------
ED_REHYDRATION_TRANSFORM_CONCURRENCY | 1024 | Parallel transformers per worker
ED_REHYDRATION_PUSH_CONCURRENCY | 256 | Parallel pushers per worker
ED_TERMINATION_GRACE_SECONDS | 60 | Graceful shutdown timeout

Worker resource requests:

requests:
  cpu: 3000m
  memory: 5Gi
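
The resource requests in this section can be summed to see how a given maxReplicaCount compares with cluster capacity. The sketch below assumes one ClickHouse pod and one manager pod with the requests shown in this guide. It ignores NATS, KEDA, the ClickHouse operator, system pods, and per-node scheduling limits, so treat the result as a rough lower bound. The function names are illustrative.

```shell
# Requests from this guide, in millicores and GiB.
CH_CPU=15000;  CH_MEM=28    # ClickHouse (assumed single pod)
MGR_CPU=2000;  MGR_MEM=3    # rehydration manager (assumed single pod)
WRK_CPU=3000;  WRK_MEM=5    # one rehydration worker

# $1 = number of worker replicas
total_cpu_millicores() {
  echo $(( CH_CPU + MGR_CPU + WRK_CPU * $1 ))
}

# $1 = number of worker replicas
total_memory_gib() {
  echo $(( CH_MEM + MGR_MEM + WRK_MEM * $1 ))
}

# total_cpu_millicores 15   # prints 62000
# total_memory_gib 15       # prints 106
```

With 15 workers the summed CPU requests come to 62 vCPUs, close to the 64 vCPUs that four 16-vCPU nodes provide, so CPU rather than memory is likely the first limit when raising maxReplicaCount on this node layout.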

Performance Expectations

Example performance on GKE with 4 x n2-standard-16 nodes:

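Replicas | Data Size | Run Time | Source Throughput | Destination Throughput
---------|-----------|----------|-------------------|-----------------------
1        | 6.55 GB   | 17:33    | 371.55 MB/min     | 3.14 GB/min
2        | 6.55 GB   | 8:47     | 741.99 MB/min     | 6.26 GB/min
15       | 6.55 GB   | 3:03     | 2.18 GB/min       | 18.71 GB/min
15       | 11.51 GB  | 5:06     | 2.26 GB/min       | 22.95 GB/min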

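In this example, throughput roughly doubles from one to two replicas. At 15 replicas it is about six times the single-replica rate, so scaling is close to linear at low worker counts and flattens as infrastructure limits are approached.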
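
The run times in the table above can be turned into speedup factors to see how far scaling departs from linear. The following is a small sketch; the function names to_seconds and speedup are illustrative.

```shell
# Convert "MM:SS" to seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Ratio of a baseline run time to another run time, to two decimals.
# $1 = baseline MM:SS, $2 = other MM:SS
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

# speedup 17:33 8:47   # 1 -> 2 replicas, prints 2.00
# speedup 17:33 3:03   # 1 -> 15 replicas, prints 5.75
```

For the 6.55 GB data set, two replicas finish in half the time of one, while 15 replicas finish about 5.75 times faster rather than 15 times faster.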

Troubleshooting

Manager Not Polling

kubectl logs -n edgedelta-rehydration deployment/rehydration-manager

Common causes:

  • Invalid API token
  • Incorrect organization ID
  • NATS connection failed

Workers Not Scaling

kubectl get scaledobjects -n edgedelta-rehydration
kubectl describe scaledobject rehydrate-scaled-object -n edgedelta-rehydration

Check KEDA operator logs:

kubectl logs -n keda deployment/keda-operator

Workers Failing to Read Archives

kubectl logs -n edgedelta-rehydration -l app=rehydration-worker --tail=100

Common causes:

  • Missing cloud credentials (GCS service account, AWS IAM role)
  • Incorrect bucket permissions
  • Network connectivity issues

Next Steps

Once deployed, you can run rehydrations from the Edge Delta UI.