StringSplitter
The goal of this presentation is to introduce the features of the StringSplitter and to show how to configure it.
The challenges
I want to split strings of varying length that are contained in a source field.
Given this preprocessed log entry:
[9]:
document = {
    "ip_addresses": "192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1"
}
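To make the goal concrete, here is a minimal plain-Python sketch of the intended transformation. It does not use logprep: `split_field` and its parameters are hypothetical names chosen for illustration, and the `", "` delimiter is an assumption based on the sample data.

```python
from copy import deepcopy


def split_field(event, source_field, target_field, delimiter=", "):
    """Return a copy of event with source_field split into a list under target_field.

    Hypothetical helper for illustration only, not part of logprep.
    """
    result = deepcopy(event)  # leave the caller's event untouched
    result[target_field] = result[source_field].split(delimiter)
    return result


document = {"ip_addresses": "192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1"}
split_document = split_field(document, "ip_addresses", "ip_address_list")
print(split_document["ip_address_list"])
```

The number of values in the source string does not matter here, since `str.split` returns as many items as the delimiter separates.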
Create rules and processor
create the rules:
[10]:
import sys

# make the logprep package importable (path is relative to this notebook)
sys.path.append("../../../../../")

from logprep.processor.string_splitter.rule import StringSplitterRule

# one rule: match events that contain "ip_addresses" and split that field
rules_definitions = [
    {
        "filter": "ip_addresses",
        "string_splitter": {
            "source_fields": ["ip_addresses"],
            "target_field": "ip_address_list"
        },
    }
]
rules = [StringSplitterRule.create_from_dict(rule_dict) for rule_dict in rules_definitions]
rules
create the processor config:
[ ]:
processor_config = {
    "allmighty_string_splitter": {
        "type": "string_splitter",
        "rules": ["/dev"],
    }
}
create the processor with the factory:
[ ]:
from logging import getLogger
from logprep.factory import Factory
logger = getLogger()
processor = Factory.create(processor_config)
processor
string_splitter
load the rules into the processor:
[ ]:
for rule in rules:
    processor._rule_tree.add_rule(rule)
processor.rules
[filter="ip_addresses", StringSplitterRule.Config(description='', regex_fields=[], tests=[], tag_on_failure=['_string_splitter_failure'], source_fields=['ip_addresses'], target_field='ip_addresses', delete_source_fields=False, overwrite_target=True, extend_target_list=False, delimeter=' ')]
Process event
[ ]:
from copy import deepcopy
mydocument = deepcopy(document)
processor.process(mydocument)
Check Results
[ ]:
document
{'ip_addresses': '192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1'}
[ ]:
mydocument
{'ip_addresses': ['192.168.5.1,', '10.10.2.1,', 'fe80::,', '127.0.0.1']}
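The trailing commas in this result come from splitting on a single space, which is the `delimeter=' '` value shown in the rule representation above. The plain-Python sketch below reproduces that behavior with `str.split` and shows that splitting on `", "` gives clean values. The rule fragment at the end is an assumption: it reuses the `delimeter` key exactly as it is spelled in that representation and has not been verified against a specific logprep version.

```python
raw = "192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1"

# Splitting on a single space keeps the comma attached to each value.
split_on_space = raw.split(" ")
print(split_on_space)

# Splitting on comma plus space removes the separators completely.
split_on_comma_space = raw.split(", ")
print(split_on_comma_space)

# Assumed rule fragment (unverified): same rule as before, with an explicit
# delimiter. The key name "delimeter" is copied from the rule representation.
rule_with_delimiter = {
    "filter": "ip_addresses",
    "string_splitter": {
        "source_fields": ["ip_addresses"],
        "target_field": "ip_address_list",
        "delimeter": ", ",
    },
}
```

If the key is accepted as assumed, loading `rule_with_delimiter` in place of the earlier rule definition should yield a list without trailing commas.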