GitHub API Integration
Overview
GitHub’s REST API provides comprehensive access to repository data, organization events, issues, pull requests, and more. The Edge Delta HTTP Pull source can efficiently retrieve this data using GitHub’s standard pagination and authentication mechanisms.
Endpoints
| Example | Description | Features |
|---|---|---|
| Organization Events | Monitor org activity | Time-based filtering |
| Repositories | List all repos | Link header pagination |
| Pull Requests | Track PR activity | State filtering |
| Workflow Runs | CI/CD monitoring | Status filtering |
| Issues | Issue management | Label and state filters |
Authentication
GitHub requires authentication for most API endpoints. Use a personal access token (PAT) or GitHub App token.
Environment Variables
Set these environment variables for secure credential management:
# GitHub Personal Access Token or GitHub App Token
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Optional: Organization name
export GITHUB_ORG="your-organization"
# Optional: Default repository
export GITHUB_REPO="your-repository"
# Optional: GitHub Enterprise Server URL (if not using github.com)
export GITHUB_API_URL="https://api.github.com"
Using Environment Variables in Configuration
header_expressions:
  - header: "Authorization"
    value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
endpoint_expression: Concat([EDXEnv("GITHUB_API_URL", "https://api.github.com"), "/orgs/", EDXEnv("GITHUB_ORG", "edgedelta"), "/events"], "")
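To make the effect of these expressions concrete, the following Python sketch mimics how the endpoint and Authorization header would resolve. It assumes that `EDXEnv(name, default)` returns the default when the variable is unset, which is how the expressions above read; the function names `resolve_endpoint` and `resolve_auth_header` are hypothetical and are not part of the agent.

```python
import os


def resolve_endpoint(env=os.environ):
    """Mimic the endpoint_expression: env value if present, else the default."""
    base = env.get("GITHUB_API_URL", "https://api.github.com")
    org = env.get("GITHUB_ORG", "edgedelta")
    # Concat([...], "") joins the parts with an empty separator.
    return "".join([base, "/orgs/", org, "/events"])


def resolve_auth_header(env=os.environ):
    """Mimic the Authorization value_expression."""
    return "Bearer " + env.get("GITHUB_TOKEN", "")


if __name__ == "__main__":
    print(resolve_endpoint())
```

With no variables set, the endpoint falls back to the `edgedelta` organization on `https://api.github.com`, so setting `GITHUB_ORG` is worth doing explicitly in most deployments.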
Basic GitHub Events Pull
Monitor organization events with dynamic time windows:
nodes:
  - name: github_events
    type: http_pull_input
    endpoint: https://api.github.com/orgs/edgedelta/events
    method: GET
    # Headers
    headers:
      - header: Accept
        value: application/vnd.github.v3+json
    # Dynamic authentication
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    # Time-based filtering
    parameter_expressions:
      - name: "since"
        value_expression: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
      - name: "per_page"
        value_expression: "100"
    pull_interval: 5m
This configuration:
- Polls GitHub every 5 minutes
- Retrieves events from the last 10 minutes (overlapping windows)
- Uses secure token authentication from environment variables
- Requests up to 100 events per page
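Because the 10-minute lookback is longer than the 5-minute poll interval, consecutive pulls can return the same event twice. The sketch below shows how the `since` value is derived and one possible way a downstream consumer might drop repeats. It assumes each event carries a unique `id` field; `since_param` and `drop_seen` are hypothetical helper names, not agent features.

```python
from datetime import datetime, timedelta, timezone


def since_param(now, lookback=timedelta(minutes=10)):
    """Format (now - lookback) the same way as the FormatTime expression."""
    return (now - lookback).strftime("%Y-%m-%dT%H:%M:%SZ")


def drop_seen(events, seen_ids):
    """Return only events whose id has not been seen; updates seen_ids."""
    fresh = []
    for event in events:
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])
            fresh.append(event)
    return fresh


if __name__ == "__main__":
    print(since_param(datetime.now(timezone.utc)))
```

The overlap trades a little duplicate traffic for protection against events that would otherwise fall between two polls.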
Repository Data with Pagination
Retrieve all repositories from an organization using Link header pagination:
nodes:
  - name: github_repos
    type: http_pull_input
    endpoint: https://api.github.com/orgs/edgedelta/repos
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameter_expressions:
      - name: "per_page"
        value_expression: "100"
      - name: "sort"
        value_expression: "updated"
      - name: "direction"
        value_expression: "desc"
    # Automatic pagination
    pagination:
      link_relation: "next"
      max_parallel: 3
    pull_interval: 1h
GitHub returns Link headers like:
Link: <https://api.github.com/organizations/123/repos?page=2>; rel="next",
<https://api.github.com/organizations/123/repos?page=10>; rel="last"
The agent automatically follows these links to retrieve all pages.
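For readers who want to see what following a Link header involves, here is a minimal Python sketch that extracts the URL for each `rel` value. It illustrates the header format only and is not the agent's implementation; `parse_link_header` is a hypothetical name.

```python
import re


def parse_link_header(value):
    """Map each rel value (for example "next") to its URL."""
    links = {}
    # Entries are comma-separated; split only where a new <url> begins.
    for part in re.split(r",\s*(?=<)", value.strip()):
        match = re.match(r'<([^>]+)>\s*;\s*rel="([^"]+)"', part.strip())
        if match:
            links[match.group(2)] = match.group(1)
    return links


if __name__ == "__main__":
    header = (
        '<https://api.github.com/organizations/123/repos?page=2>; rel="next", '
        '<https://api.github.com/organizations/123/repos?page=10>; rel="last"'
    )
    # Pagination continues while a "next" link is present.
    print(parse_link_header(header).get("next"))
```

When the response for the last page arrives, the header has no `rel="next"` entry, which is the signal to stop.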
Pull Requests Monitoring
Monitor pull requests across repositories:
nodes:
  - name: github_pull_requests
    type: http_pull_input
    endpoint: https://api.github.com/repos/edgedelta/edgedelta/pulls
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameters:
      - name: state
        value: "all"
    parameter_expressions:
      - name: "since"
        value_expression: FormatTime(Now() - Duration("24h"), "%Y-%m-%dT%H:%M:%SZ")
      - name: "per_page"
        value_expression: "50"
    pagination:
      link_relation: "next"
      max_parallel: 2
    pull_interval: 15m
Issues and Comments
Track issues and their comments:
nodes:
  - name: github_issues
    type: http_pull_input
    endpoint: https://api.github.com/repos/edgedelta/edgedelta/issues
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameters:
      - name: state
        value: "all"
      - name: sort
        value: "updated"
    parameter_expressions:
      - name: "since"
        value_expression: FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")
      - name: "per_page"
        value_expression: "100"
    pagination:
      link_relation: "next"
    pull_interval: 10m
Commit Activity
Monitor repository commits:
nodes:
  - name: github_commits
    type: http_pull_input
    endpoint: https://api.github.com/repos/edgedelta/edgedelta/commits
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameter_expressions:
      - name: "since"
        value_expression: FormatTime(Now() - Duration("6h"), "%Y-%m-%dT%H:%M:%SZ")
      - name: "per_page"
        value_expression: "100"
    pagination:
      link_relation: "next"
      max_parallel: 2
    pull_interval: 30m
Workflow Runs (GitHub Actions)
Monitor CI/CD workflow executions:
nodes:
  - name: github_workflows
    type: http_pull_input
    endpoint: https://api.github.com/repos/edgedelta/edgedelta/actions/runs
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameter_expressions:
      - name: "created"
        value_expression: Concat([">", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")], "")
      - name: "per_page"
        value_expression: "100"
    pagination:
      link_relation: "next"
    pull_interval: 5m
Release Monitoring
Track new releases and deployments:
nodes:
  - name: github_releases
    type: http_pull_input
    endpoint: https://api.github.com/repos/edgedelta/edgedelta/releases
    method: GET
    header_expressions:
      - header: "Authorization"
        value_expression: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameters:
      - name: per_page
        value: "30"
    pagination:
      link_relation: "next"
    pull_interval: 1h
Rate Limiting Considerations
GitHub API has rate limits:
- Authenticated requests: 5,000 per hour
- Unauthenticated: 60 per hour
Best practices:
- Always authenticate requests
- Use appropriate pull_interval values
- Limit max_parallel for pagination
- Add rate limit handling:
retry_http_code:
  - 403 # Forbidden (often rate limit)
  - 429 # Too Many Requests
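To check whether a set of pull_interval values fits within the 5,000 requests-per-hour limit, a rough budget can be estimated as pulls per hour multiplied by pages per pull. The Python sketch below does that arithmetic. The page counts in the example are hypothetical, since they depend on how much data each endpoint returns, and the function names are illustrative only.

```python
def requests_per_hour(pull_interval_minutes, pages_per_pull):
    """Estimate hourly requests for one source: pulls per hour times pages."""
    return (60 / pull_interval_minutes) * pages_per_pull


def within_budget(sources, limit=5000):
    """sources is a list of (pull_interval_minutes, pages_per_pull) pairs."""
    total = sum(requests_per_hour(i, p) for i, p in sources)
    return total, total <= limit


if __name__ == "__main__":
    # Events every 5m (1 page), repos every 60m (10 pages), PRs every 15m (4 pages).
    print(within_budget([(5, 1), (60, 10), (15, 4)]))
```

Retries and any other tools sharing the same token also draw on the same hourly limit, so it is sensible to leave headroom rather than plan right up to it.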
Environment Setup
Set up required environment variables:
# GitHub Personal Access Token
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
# Optional: Custom API base URL for GitHub Enterprise (read by the endpoint_expression example above)
export GITHUB_API_URL="https://api.github.company.com"
Common Parameters
| Parameter | Description | Example |
|---|---|---|
| since | ISO 8601 timestamp | 2024-01-01T00:00:00Z |
| per_page | Results per page (max 100) | 100 |
| sort | Sort field | created, updated, pushed |
| direction | Sort direction | asc, desc |
| state | Filter by state | open, closed, all |
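As an illustration of how these parameters end up in the request URL, the Python sketch below assembles a query string and caps per_page at the documented maximum of 100. `build_query` is a hypothetical helper; the agent builds the query from `parameters` and `parameter_expressions` itself.

```python
from urllib.parse import urlencode


def build_query(since=None, per_page=100, sort=None, direction=None, state=None):
    """Build a URL-encoded query string, skipping parameters left unset."""
    params = {
        "since": since,
        "per_page": min(per_page, 100),  # values above 100 are capped
        "sort": sort,
        "direction": direction,
        "state": state,
    }
    return urlencode({k: v for k, v in params.items() if v is not None})


if __name__ == "__main__":
    print(build_query(since="2024-01-01T00:00:00Z", state="open"))
```

Note that the colons in the timestamp are percent-encoded as `%3A` in the resulting query string.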
Troubleshooting
401 Unauthorized:
- Verify the GITHUB_TOKEN environment variable is set
- Check the token has the required scopes for the endpoint
403 Forbidden:
- Often indicates rate limiting
- Check the X-RateLimit-Remaining response header
- Increase pull_interval if hitting limits
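When debugging a 403 by hand, the response headers show whether it is a rate limit or a permissions problem. The Python sketch below encodes that check. It assumes the headers are available as a plain dictionary with the exact names shown, and `rate_limit_wait` is a hypothetical helper rather than an agent feature.

```python
def rate_limit_wait(status, headers, now_epoch):
    """Return seconds to wait if the response looks rate limited, else None."""
    if status not in (403, 429):
        return None
    # Retry-After, when present, gives the wait in seconds directly.
    if "Retry-After" in headers:
        return int(headers["Retry-After"])
    # X-RateLimit-Reset is the reset time in UTC epoch seconds.
    if headers.get("X-RateLimit-Remaining") == "0":
        reset = int(headers.get("X-RateLimit-Reset", now_epoch))
        return max(0, reset - now_epoch)
    return None  # a 403 with quota remaining is likely a permissions issue


if __name__ == "__main__":
    hdrs = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
    print(rate_limit_wait(403, hdrs, 1700000000))
```

A `None` result for a 403 suggests checking token scopes instead of increasing pull_interval.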
422 Unprocessable Entity:
- Check the timestamp format in the since parameter
- Verify query parameters are valid for the endpoint
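A quick way to rule out a malformed timestamp is to parse it against the same pattern the configurations use. This is a small Python sketch with a hypothetical helper name, `is_valid_since`.

```python
from datetime import datetime


def is_valid_since(value):
    """True if value matches the %Y-%m-%dT%H:%M:%SZ pattern used above."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(is_valid_since("2024-01-01T00:00:00Z"))
```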
Incomplete data:
- Ensure pagination.link_relation is set to "next"
- Check max_parallel isn’t too high for rate limits