ExtractGrokPatterns
The ExtractGrokPatterns converter uses Grok patterns to extract data from a string.
Syntax: ExtractGrokPatterns(string, grokPattern)
- string: the bracket notation location of the string field
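- grokPattern: the grok pattern to use for extraction
The first example below also passes an optional third boolean argument (true). In the upstream OpenTelemetry OTTL function this position is namedCapturesOnly, which restricts the result to named captures.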
Input
{
"_type": "log",
"attributes": {
"decoded_body": "time=1724177404|hostname=CPLPOL32|product=Firewall|layer_name=ENGCORE_MASTER"
},
"body": "time=1724177404|hostname=CPLPOL32|product=Firewall|layer_name=ENGCORE_MASTER",
"resource": {
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.org.id": "987654321",
"ed.source.name": "__ed_dummy_test_input",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "ed-tester",
"src_type": "memory_input"
},
"timestamp": 1733727200176
}
Statement
set(attributes["grokked"], ExtractGrokPatterns(attributes["decoded_body"], "time=(?P<log_timestamp>\\d+)\\|hostname=(?P<log_hostname>[^|]+)\\|product=(?P<log_product>[^|]+)\\|layer_name=(?P<log_layer_name>[^|]+)", true))
Output
{
"_type": "log",
"attributes": {
"decoded_body": "time=1724177404|hostname=CPLPOL32|product=Firewall|layer_name=ENGCORE_MASTER",
"grokked": {
"log_hostname": "CPLPOL32",
"log_layer_name": "ENGCORE_MASTER",
"log_product": "Firewall",
"log_timestamp": "1724177404"
}
},
"body": "time=1724177404|hostname=CPLPOL32|product=Firewall|layer_name=ENGCORE_MASTER",
"resource": {
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.org.id": "987654321",
"ed.source.name": "__ed_dummy_test_input",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "ed-tester",
"src_type": "memory_input"
},
"timestamp": 1733727245095
}
In this example, ExtractGrokPatterns parses the decoded_body attribute, which holds the log as a single pipe-delimited string. Each named capture group in the pattern (log_timestamp, log_hostname, log_product, log_layer_name) becomes a key in the resulting map, which set stores in attributes["grokked"]. Every extracted value is a string, including log_timestamp.
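To check a pattern like this before putting it in a pipeline, the same named-capture extraction can be approximated outside the pipeline. The following is a minimal Python sketch, not the converter's implementation: extract_named_groups is a hypothetical helper, and returning None on a failed match is a choice made for this sketch rather than documented ExtractGrokPatterns behavior.

```python
import re

# Same pattern as the Statement above. In a Python raw string the doubled
# backslashes of the OTTL string literal become single backslashes.
PATTERN = re.compile(
    r"time=(?P<log_timestamp>\d+)"
    r"\|hostname=(?P<log_hostname>[^|]+)"
    r"\|product=(?P<log_product>[^|]+)"
    r"\|layer_name=(?P<log_layer_name>[^|]+)"
)


def extract_named_groups(text):
    """Hypothetical helper: return the named captures as a dict, or None if no match."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


decoded_body = "time=1724177404|hostname=CPLPOL32|product=Firewall|layer_name=ENGCORE_MASTER"
grokked = extract_named_groups(decoded_body)
print(grokked)
```

The resulting dictionary has the same keys and string values as the grokked map in the Output above.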
Example: Extracting and Parsing Nested Data
This example shows how to extract a message from JSON, parse it with grok patterns, and then use ParseKeyValue to further parse component data.
Input
{
"_type": "log",
"timestamp": 1762912482420,
"body": {
"message": "2025-11-12T01:54:41Z INFO service{component=api,version=1.2.3,region=us-west}:request{id=req-94}: User authentication successful",
"seq": 94
},
"resource": {
"ed.source.name": "kubernetes_input_e389",
"ed.source.type": "kubernetes_input",
"k8s.namespace.name": "busy",
"k8s.pod.name": "test-app"
},
"attributes": {}
}
Statements
set(cache["message"], body["message"])
set(attributes["message_data"], ExtractGrokPatterns(cache["message"], "^(?P<log_timestamp>.*Z) (?P<log_level>[A-Z]+)\\s[a-z]+\\{(?P<component>[^}]+)}:request\\{(?<request>[^}]+)}: (?P<message_new>[^}]+)"))
// put component data in cache
set(cache["component"], attributes["message_data"]["component"])
// update component info - parse key=value pairs separated by commas
set(attributes["message_data"]["component"], ParseKeyValue(cache["component"], "=", ","))
Output
{
"_type": "log",
"timestamp": 1762912482420,
"body": {
"message": "2025-11-12T01:54:41Z INFO service{component=api,version=1.2.3,region=us-west}:request{id=req-94}: User authentication successful",
"seq": 94
},
"resource": {
"ed.source.name": "kubernetes_input_e389",
"ed.source.type": "kubernetes_input",
"k8s.namespace.name": "busy",
"k8s.pod.name": "test-app"
},
"attributes": {
"message_data": {
"component": {
"component": "api",
"region": "us-west",
"version": "1.2.3"
},
"log_level": "INFO",
"log_timestamp": "2025-11-12T01:54:41Z",
"message_new": "User authentication successful",
"request": "id=req-94"
}
}
}
This example demonstrates:
- Extracting the “message” string from a parsed JSON body field
- Using ExtractGrokPatterns to parse the structured log message into named fields
- Further parsing the component field using ParseKeyValue to convert key=value,key=value format into a nested map
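The two-stage flow above (pattern extraction, then key-value parsing of one captured field) can be sketched in Python to test the pattern against sample messages. This is an illustration under assumptions, not the pipeline's implementation: parse_key_value and parse_message are hypothetical helpers, and because Python's re module only accepts the (?P<name>...) form, the request group is written that way here.

```python
import re

# Pattern from the Statements above, as a Python raw string.
MESSAGE_PATTERN = re.compile(
    r"^(?P<log_timestamp>.*Z) (?P<log_level>[A-Z]+)\s[a-z]+"
    r"\{(?P<component>[^}]+)}:request\{(?P<request>[^}]+)}: (?P<message_new>[^}]+)"
)


def parse_key_value(text, delimiter="=", pair_delimiter=","):
    """Hypothetical stand-in for ParseKeyValue: split pairs, then split key from value."""
    result = {}
    for pair in text.split(pair_delimiter):
        key, _, value = pair.partition(delimiter)
        result[key.strip()] = value.strip()
    return result


def parse_message(message):
    """Extract named fields, then replace the component string with a nested map."""
    match = MESSAGE_PATTERN.search(message)
    if match is None:
        return None  # assumption of this sketch, not documented converter behavior
    data = match.groupdict()
    data["component"] = parse_key_value(data["component"])
    return data


message = (
    "2025-11-12T01:54:41Z INFO service{component=api,version=1.2.3,region=us-west}"
    ":request{id=req-94}: User authentication successful"
)
message_data = parse_message(message)
print(message_data)
```

As in the Output above, only component is expanded into a nested map; request stays the raw string id=req-94 because it is never passed through the key-value step.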