GitHub API Integration

Configure HTTP Pull to retrieve data from GitHub API endpoints with authentication, pagination, and time-based filtering.

Overview

GitHub’s REST API provides comprehensive access to repository data, organization events, issues, pull requests, and more. The Edge Delta HTTP Pull source can efficiently retrieve this data using GitHub’s standard pagination and authentication mechanisms.

Authentication

GitHub requires authentication for private data and rate-limits unauthenticated requests far more aggressively. Use a personal access token (PAT) or GitHub App token, stored in an environment variable rather than in the configuration file.

header_expressions:
  Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
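
For readers who want to check the header value outside the agent, here is a rough Python equivalent of the expression above. It assumes that the second argument of EDXEnv is a default used when the variable is unset, which the syntax suggests but this page does not state. The function name github_auth_headers is hypothetical. Unlike the expression, the sketch fails fast on a missing token instead of sending an empty credential.

```python
import os


def github_auth_headers(env=os.environ):
    """Build the Authorization header, failing fast if the token is missing."""
    token = env.get("GITHUB_TOKEN", "")
    if not token:
        # With an empty default, the header would be just "Bearer ",
        # which GitHub is expected to reject as bad credentials.
        raise RuntimeError("GITHUB_TOKEN is not set")
    return {"Authorization": "Bearer " + token}
```

Calling github_auth_headers({"GITHUB_TOKEN": "abc"}) returns a dictionary with the single header "Authorization: Bearer abc".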

Basic GitHub Events Pull

Monitor organization events with dynamic time windows:

nodes:
- name: github_events
  type: http_pull_input
  endpoint: https://api.github.com/orgs/edgedelta/events
  method: GET

  # Headers
  headers:
    - header: Accept
      value: application/vnd.github.v3+json

  # Dynamic authentication
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  # Time-based filtering
  parameter_expressions:
    since: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "100"

  pull_interval: 5m

This configuration:

  • Polls GitHub every 5 minutes
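  • Sends a since timestamp 10 minutes in the past, so consecutive 5-minute polls overlap. GitHub's events endpoints document only per_page and page as query parameters, so confirm that since has an effect; if it does not, filter on each event's created_at field downstream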
  • Uses secure token authentication from environment variables
  • Requests up to 100 events per page
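
Because the lookback window (10 minutes) is longer than the pull interval (5 minutes), the same event can arrive in two consecutive pulls. The sketch below shows how the since value is derived and one way to drop repeats by event id. Both function names are hypothetical, and the deduplication step is an assumption about downstream handling, not something the HTTP Pull source is documented to do.

```python
from datetime import datetime, timedelta, timezone


def since_param(now, lookback=timedelta(minutes=10)):
    """Format the start of the lookback window like the since expression."""
    return (now - lookback).strftime("%Y-%m-%dT%H:%M:%SZ")


def drop_seen(events, seen_ids):
    """Return events whose id is new, recording each new id in seen_ids."""
    fresh = []
    for event in events:
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])
            fresh.append(event)
    return fresh


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(since_param(now))  # 2024-01-01T11:50:00Z
```

In a long-running process the seen_ids set would need to be bounded, for example by discarding ids older than the lookback window.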

Repository Data with Pagination

Retrieve all repositories from an organization using Link header pagination:

nodes:
- name: github_repos
  type: http_pull_input
  endpoint: https://api.github.com/orgs/edgedelta/repos
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameter_expressions:
    per_page: "100"
    sort: "updated"
    direction: "desc"

  # Automatic pagination
  pagination:
    link_relation: "next"
    max_parallel_requests: 3

  pull_interval: 1h

GitHub returns Link headers like:

Link: <https://api.github.com/organizations/123/repos?page=2>; rel="next",
      <https://api.github.com/organizations/123/repos?page=10>; rel="last"

The agent automatically follows these links to retrieve all pages.
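
To illustrate what following the links involves, the sketch below parses a Link header of the shape shown above and extracts the rel="next" URL. It is a simplified parser written for this guide, not the agent's implementation, and the function names are hypothetical. It handles only the `<url>; rel="name"` form that GitHub emits.

```python
import re

# Matches one `<url>; rel="name"` entry in a Link header.
LINK_PART = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map each rel name to its URL, e.g. {"next": ..., "last": ...}."""
    return {rel: url for url, rel in LINK_PART.findall(value)}


def next_page_url(value):
    """Return the rel="next" URL, or None when there is no next page."""
    return parse_link_header(value).get("next")
```

A client would request the first page, call next_page_url on the response's Link header, and repeat until it returns None.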

Pull Requests Monitoring

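Monitor pull requests in a single repository (add one node per repository to cover several). GitHub's list-pull-requests endpoint documents state, head, base, sort, and direction filters but not since, so treat the since parameter below as best-effort: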

nodes:
- name: github_pull_requests
  type: http_pull_input
  endpoint: https://api.github.com/repos/edgedelta/edgedelta/pulls
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameters:
    - name: state
      value: "all"

  parameter_expressions:
    since: FormatTime(Now() - Duration("24h"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "50"

  pagination:
    link_relation: "next"
    max_parallel_requests: 2

  pull_interval: 15m

Issues and Comments

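Track issues in a repository. GitHub also includes pull requests in this list, each marked by a pull_request key; comments are served by the separate /issues/comments endpoint: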

nodes:
- name: github_issues
  type: http_pull_input
  endpoint: https://api.github.com/repos/edgedelta/edgedelta/issues
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameters:
    - name: state
      value: "all"
    - name: sort
      value: "updated"

  parameter_expressions:
    since: FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "100"

  pagination:
    link_relation: "next"

  pull_interval: 10m

Commit Activity

Monitor repository commits:

nodes:
- name: github_commits
  type: http_pull_input
  endpoint: https://api.github.com/repos/edgedelta/edgedelta/commits
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameter_expressions:
    since: FormatTime(Now() - Duration("6h"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "100"

  pagination:
    link_relation: "next"
    max_parallel_requests: 2

  pull_interval: 30m

Workflow Runs (GitHub Actions)

Monitor CI/CD workflow executions:

nodes:
- name: github_workflows
  type: http_pull_input
  endpoint: https://api.github.com/repos/edgedelta/edgedelta/actions/runs
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameter_expressions:
    created: Concat([">", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")], "")
    per_page: "100"

  pagination:
    link_relation: "next"

  pull_interval: 5m
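
The created filter above combines a comparison operator with a timestamp. The sketch below builds the same value in Python and shows how it looks once percent-encoded in a query string. Whether the agent applies this encoding itself is an assumption; the function name workflow_runs_query is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def workflow_runs_query(now, lookback=timedelta(hours=1), per_page=100):
    """Build the query string for runs created after now - lookback."""
    cutoff = (now - lookback).strftime("%Y-%m-%dT%H:%M:%SZ")
    return urlencode({"created": ">" + cutoff, "per_page": per_page})


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(workflow_runs_query(now))
# created=%3E2024-01-01T11%3A00%3A00Z&per_page=100
```

The ">" and ":" characters are not safe in a query string, which is why they appear as %3E and %3A.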

Release Monitoring

Track new releases (deployments are exposed by the separate /deployments endpoint):

nodes:
- name: github_releases
  type: http_pull_input
  endpoint: https://api.github.com/repos/edgedelta/edgedelta/releases
  method: GET

  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")

  parameters:
    - name: per_page
      value: "30"

  pagination:
    link_relation: "next"

  pull_interval: 1h

Rate Limiting Considerations

The GitHub REST API enforces primary rate limits:

  • Authenticated requests (personal access token): 5,000 per hour per user
  • Unauthenticated requests: 60 per hour per IP address

Best practices:

  1. Always authenticate requests
  2. Use appropriate pull_interval values
  3. Limit max_parallel_requests for pagination
  4. Add rate limit handling:

     retry_http_code:
       - 403  # Forbidden (often rate limit)
       - 429  # Too Many Requests
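
To judge whether a set of nodes fits inside the hourly quota, it helps to estimate the request volume: pulls per hour multiplied by pages per pull. The sketch below does that arithmetic for three of the nodes in this guide. The page counts are hypothetical placeholders; substitute the values observed in your organization.

```python
import math


def requests_per_hour(pull_interval_minutes, pages_per_pull):
    """Estimate hourly API calls for one source node."""
    pulls = math.ceil(60 / pull_interval_minutes)
    return pulls * pages_per_pull


# Hypothetical page counts per pull; intervals match the examples above.
nodes = {
    "github_events": requests_per_hour(5, 3),
    "github_pull_requests": requests_per_hour(15, 4),
    "github_workflows": requests_per_hour(5, 2),
}
total = sum(nodes.values())
print(total, "of 5000 requests per hour")  # 76 of 5000 requests per hour
```

All nodes that share one token draw from the same quota, so the totals should be summed per token rather than per node.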

Environment Setup

Set up required environment variables:

# GitHub Personal Access Token
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"

# Optional: Custom API host for GitHub Enterprise
export GITHUB_API_HOST="api.github.company.com"

Common Parameters

Parameter   Description                   Example
since       ISO 8601 timestamp            2024-01-01T00:00:00Z
per_page    Results per page (max 100)    100
sort        Sort field                    created, updated, pushed
direction   Sort direction                asc, desc
state       Filter by state               open, closed, all

Troubleshooting

401 Unauthorized:

  • Verify GITHUB_TOKEN environment variable is set
  • Check token has required scopes for the endpoint

403 Forbidden:

  • Often indicates rate limiting
  • Check X-RateLimit-Remaining response header
  • Increase pull_interval if hitting limits
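
When inspecting a 403 response by hand, the X-RateLimit-Remaining and X-RateLimit-Reset headers tell you whether the quota is exhausted and when it refills (the reset value is a UTC epoch timestamp in seconds). The sketch below turns those two headers into a wait time. The function name is hypothetical, and the plain dictionary lookup is case-sensitive, so it assumes the header names are spelled exactly as shown.

```python
def seconds_until_reset(headers, now_epoch):
    """Return how long to wait if the quota is exhausted, else 0."""
    # A missing header is treated as "quota available".
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0
    reset_epoch = int(headers.get("X-RateLimit-Reset", str(now_epoch)))
    # Guard against a reset time that has already passed.
    return max(0, reset_epoch - now_epoch)
```

If the function returns 0 for a 403 response, rate limiting is probably not the cause, and the token's permissions are the next thing to check.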

422 Unprocessable Entity:

  • Check timestamp format in since parameter
  • Verify query parameters are valid for the endpoint

Incomplete data:

  • Ensure pagination.link_relation is set to "next"
  • Check max_parallel_requests isn’t too high for rate limits