Configuration
Configuration is done via YAML or JSON files or via HTTP API resources.
Logprep searches for the file /etc/logprep/pipeline.yml if no
configuration file is passed.
You can pass multiple configuration files via valid file paths or URLs.
logprep run /different/path/file.yml
logprep run http://url-to-our-yaml-file-or-api
logprep run http://api/v1/pipeline http://api/v1/addition_processor_pipline /path/to/connector.yaml
Security Best Practice - Configuration - Combining multiple configuration files
Note that when using multiple configuration files, Logprep rejects all of them if one cannot be retrieved or is not valid. If you use multiple files, ensure that all of them can be loaded safely and that all endpoints (if using HTTP resources) are accessible.
Security Best Practice - Configuration - Authenticity and Integrity
Ensure that all configuration files are retrieved from trusted sources and have not been
tampered with. Use TLS to encrypt the transmission of configuration files and use the authentication
described in Authentication for HTTP Getters to ensure confidentiality and integrity.
Configuration File Structure
version: config-1.0
process_count: 2
restart_count: 5
timeout: 5
logger:
  level: INFO
input:
  kafka:
    type: confluentkafka_input
    topic: consumer
    offset_reset_policy: smallest
    kafka_config:
      bootstrap.servers: localhost:9092
      group.id: test
output:
  kafka:
    type: confluentkafka_output
    topic: producer
    flush_timeout: 30
    send_timeout: 2
    kafka_config:
      bootstrap.servers: localhost:9092
pipeline:
  - labelername:
      type: labeler
      schema: examples/exampledata/rules/labeler/schema.json
      include_parent_labels: true
      rules:
        - examples/exampledata/rules/labeler/rules
  - dissectorname:
      type: dissector
      rules:
        - examples/exampledata/rules/dissector/rules
  - dropper:
      type: dropper
      rules:
        - examples/exampledata/rules/dropper/rules
        - filter: "test_dropper"
          dropper:
            drop:
              - drop_me
          description: "..."
  - pre_detector:
      type: pre_detector
      rules:
        - examples/exampledata/rules/pre_detector/rules
      outputs:
        - opensearch: sre
      tree_config: examples/exampledata/rules/pre_detector/tree_config.json
      alert_ip_list_path: examples/exampledata/rules/pre_detector/alert_ips.yml
  - amides:
      type: amides
      rules:
        - examples/exampledata/rules/amides/rules
      models_path: examples/exampledata/models/model.zip
      num_rule_attributions: 10
      max_cache_entries: 1000000
      decision_threshold: 0.32
  - pseudonymizer:
      type: pseudonymizer
      pubkey_analyst: examples/exampledata/rules/pseudonymizer/example_analyst_pub.pem
      pubkey_depseudo: examples/exampledata/rules/pseudonymizer/example_depseudo_pub.pem
      regex_mapping: examples/exampledata/rules/pseudonymizer/regex_mapping.yml
      hash_salt: a_secret_tasty_ingredient
      outputs:
        - opensearch: pseudonyms
      rules:
        - examples/exampledata/rules/pseudonymizer/rules
      max_cached_pseudonyms: 1000000
  - calculator:
      type: calculator
      rules:
        - filter: "test_label: execute"
          calculator:
            target_field: "calculation"
            calc: "1 + 1"
The options under input, output and pipeline are passed
to factories in Logprep.
They contain settings for each separate processor and connector.
Details for configuring connectors are described in
Output and Input and for processors in Processors.
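Since configurations can also be written as JSON, a deployment pipeline can sanity-check a file before handing it to Logprep. The following is a hypothetical helper (not part of Logprep, which performs its own validation) that checks for the expected top-level sections:

```python
import json

# Hypothetical sanity check (not part of Logprep): verify that a JSON
# configuration contains the expected top-level sections before deploying it.
EXPECTED_KEYS = {"input", "output", "pipeline"}

def missing_sections(config_text: str) -> set:
    """Return the expected top-level sections missing from a JSON config."""
    config = json.loads(config_text)
    return EXPECTED_KEYS - config.keys()

config_text = '{"version": "1", "input": {}, "output": {}, "pipeline": []}'
print(missing_sections(config_text))  # set() -> all sections present
```

Logprep itself validates configurations far more thoroughly on startup; a check like this only catches missing sections early.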
Environment variables can be used in all configuration
and rule files, in all places.
Environment variables have to be set in uppercase and prefixed
with LOGPREP_, GITHUB_, PYTEST_ or
CI_. Lowercase variables are ignored. Forbidden
variable names are: ["LOGPREP_LIST"], as it is already used internally.
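The substitution rules above can be sketched in a few lines. This is an illustrative re-implementation, not Logprep's actual code: only uppercase variables with one of the allowed prefixes are expanded, and unmatched text is left untouched.

```python
import os
import re

# Only uppercase variables with an allowed prefix are expanded
# (illustrative sketch of the documented rules, not Logprep's code).
ALLOWED = re.compile(r"\$(LOGPREP_|GITHUB_|PYTEST_|CI_)[A-Z0-9_]*")

def expand(text: str) -> str:
    def replace(match):
        name = match.group(0)[1:]  # strip the leading "$"
        # leave the placeholder as-is if the variable is not set
        return os.environ.get(name, match.group(0))
    return ALLOWED.sub(replace, text)

os.environ["LOGPREP_LOG_LEVEL"] = "DEBUG"
print(expand("level: $LOGPREP_LOG_LEVEL"))  # level: DEBUG
print(expand("level: $logprep_log_level"))  # unchanged: lowercase is ignored
```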
Security Best Practice - Configuration - Environment Variables
As it is possible to replace all configuration options with environment variables, it is
recommended to use them especially for sensitive information like usernames, passwords, secrets
or hash salts.
Examples where this could be useful would be the key for the hmac calculation (see
input > preprocessing) or the user/secret for the opensearch
connectors.
The following config file will be valid by setting the given environment variables:
version: $LOGPREP_VERSION
process_count: $LOGPREP_PROCESS_COUNT
timeout: 0.1
logger:
  level: $LOGPREP_LOG_LEVEL
$LOGPREP_PIPELINE
$LOGPREP_INPUT
$LOGPREP_OUTPUT
export LOGPREP_VERSION="1"
export LOGPREP_PROCESS_COUNT="1"
export LOGPREP_LOG_LEVEL="DEBUG"
export LOGPREP_PIPELINE="
pipeline:
  - labelername:
      type: labeler
      schema: examples/exampledata/rules/labeler/schema.json
      include_parent_labels: true
      rules:
        - examples/exampledata/rules/labeler/rules"
export LOGPREP_OUTPUT="
output:
  kafka:
    type: confluentkafka_output
    topic: producer
    flush_timeout: 30
    send_timeout: 2
    kafka_config:
      bootstrap.servers: localhost:9092"
export LOGPREP_INPUT="
input:
  kafka:
    type: confluentkafka_input
    topic: consumer
    offset_reset_policy: smallest
    kafka_config:
      bootstrap.servers: localhost:9092
      group.id: test"
- class logprep.util.configuration.Configuration
The configuration class.
- version: str
It is optionally possible to set a version in your configuration file, which can be printed via
logprep run --version config/pipeline.yml. This has no effect on the execution of Logprep but is used as a hook for reloading the configuration. Defaults to unset.
- config_refresh_interval: int | None
Configures the interval in seconds at which Logprep tries to reload the configuration. If not configured, Logprep will not reload the configuration automatically. If configured, the configuration is only reloaded when the configuration version changes. If HTTP errors occur on a configuration reload, config_refresh_interval is reduced to a quarter of its current value until a minimum of 5 seconds is reached. Defaults to None, which means that the configuration will not be refreshed.
Security Best Practice - Configuration - Refresh Interval
The refresh interval for the configuration should not be set too high in production environments. It is suggested not to set a value higher than 300 (5 min). That way configuration updates are propagated fairly quickly instead of once a day. Note also that a new configuration file will be loaded as long as it is a valid config; there is no further check to ensure credibility.
If a new configuration cannot be retrieved successfully and config_refresh_interval has already been reduced automatically to 5 seconds, this can lead to blocking behavior or a significant reduction in performance, as Logprep keeps retrying to reload the configuration. Because of that, ensure that the configuration endpoint is always available.
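The documented back-off on failed reloads can be sketched as follows (an illustration of the described behaviour, not Logprep's actual implementation): on each HTTP error the interval is quartered, but it never drops below 5 seconds.

```python
# Sketch of the documented back-off behaviour (illustrative, not
# Logprep's actual implementation).
def reduced_refresh_interval(current_seconds: float) -> float:
    """Quarter the refresh interval, with a floor of 5 seconds."""
    return max(current_seconds / 4, 5)

interval = 300.0
for _ in range(4):  # successive failed reloads
    interval = reduced_refresh_interval(interval)
    print(interval)  # 75.0, 18.75, 5, 5
```

At the 5-second floor Logprep polls the endpoint very frequently, which is why the endpoint's availability matters.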
- process_count: int
Number of logprep processes to start. Defaults to 1.
- restart_count: int
Number of restarts before Logprep exits. Defaults to 5. If this value is set to a negative number, Logprep will always restart immediately.
Security Best Practice - Configuration - Restart Counter
The restart counter should be set to a value greater than 0 to ensure that Logprep exits gracefully in case of repeated failures. This ensures that resources are released properly and any necessary cleanup is performed. Additionally, the process will exit with a non-zero exit code to indicate that an error occurred. This is especially useful if you use an external orchestrator like Kubernetes or systemd to manage the Logprep process, so you get notified about failures via their respective monitoring and alerting systems.
- timeout: float
Logprep tries to react to signals (like those sent by CTRL+C) within the given time. The time taken for some processing steps is not always predictable, so it is not possible to guarantee that this time will be adhered to. Logprep reacts quickly for small values (< 1.0), but this requires more processing power, which can be useful for testing and debugging. Larger values (like 5.0) slow the reaction time down but require less processing power, which makes them preferable for continuous operation. Defaults to 5.0.
- logger: LoggerConfig
Logger configuration.
- class LoggerConfig
The logger config class used in Configuration. The schema for this class is derived from the python logging module: https://docs.python.org/3/library/logging.config.html#dictionary-schema-details
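Because the schema follows the stdlib dictionary schema, an equivalent configuration can be expressed directly with `logging.config.dictConfig`. This is a stdlib analogy for illustration, not how Logprep wires its loggers internally:

```python
import logging
import logging.config

# Stdlib analogy for the LoggerConfig defaults shown below:
# root at INFO, a noisy third-party logger silenced to ERROR.
logging.config.dictConfig(
    {
        "version": 1,
        "root": {"level": "INFO"},
        "loggers": {"filelock": {"level": "ERROR"}},
    }
)

print(logging.getLogger().level)            # 20 (INFO)
print(logging.getLogger("filelock").level)  # 40 (ERROR)
```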
- LoggerConfig.level: str
The log level of the root logger. Defaults to INFO.
Security Best Practice - Configuration - Log-Level
The log level of the root logger should be set to INFO or higher in production environments to avoid exposing sensitive information in the logs.
- LoggerConfig.format: str
The format of the log message as supported by the LogprepFormatter. Defaults to "%(asctime)-15s %(name)-10s %(levelname)-8s: %(message)s".
- class LogprepFormatter
A custom formatter for Logprep logging with additional attributes.
The formatter can be initialized with a format string that makes use of the LogRecord attributes - e.g. the default value mentioned above relies on the fact that the user's message and arguments are pre-formatted into a LogRecord's message attribute. The available attributes are listed in the Python documentation. Additionally, the formatter provides the following Logprep-specific attributes:
- %(hostname): the hostname of the machine where the log was emitted (Logprep specific)
- LoggerConfig.datefmt: str
The date format of the log message. Defaults to "%Y-%m-%d %H:%M:%S".
- LoggerConfig.loggers: dict
The log level configuration of individual loggers. Defaults to:
root: INFO
filelock: ERROR
urllib3.connectionpool: ERROR
opensearch: ERROR
uvicorn: INFO
uvicorn.access: INFO
uvicorn.error: INFO
You can alter the log level of the loggers by adding them to the loggers mapping like in the example. Logprep opts out of hierarchical loggers, so it is possible to set the log level for all loggers in general via the root logger to INFO and then set the log level for specific loggers like Runner to DEBUG to get DEBUG messages only from the Runner instance. If you want to silence other loggers like py.warnings, you can set their log level to ERROR here.
Example of a custom logger configuration:
logger:
  level: ERROR
  format: "%(asctime)-15s %(hostname)-5s %(name)-10s %(levelname)-8s: %(message)s"
  datefmt: "%Y-%m-%d %H:%M:%S"
  loggers:
    "py.warnings": {"level": "ERROR"}
    "Runner": {"level": "DEBUG"}
Note
The effective log level of the root logger is controlled via logger.level. By default, logger.level is set to INFO if not configured explicitly. A value configured under loggers.root.level is currently ignored for the root logger, because it will always be overwritten by logger.level. Providing loggers.root.level therefore has no effect (except for triggering a warning during startup).
- input: dict
Input connector configuration. Defaults to {}. For detailed configurations see Input.
- output: dict
Output connector configuration. Defaults to {}. For detailed configurations see Output.
- error_output: dict
Error output connector configuration. Defaults to {}. This is optional. If no error output is configured, Logprep will not handle events that could not be processed by the pipeline, could not be parsed correctly by input connectors, or could not be stored correctly by output connectors. For detailed configurations see Output.
- pipeline: list[dict]
Pipeline configuration. Defaults to []. See Processors for a detailed overview on how to configure a pipeline.
- metrics: MetricsConfig
Metrics configuration. Defaults to {"enabled": False, "port": 8000, "uvicorn_config": {}}.
The key uvicorn_config can be configured with any uvicorn config parameters. For further information see the uvicorn documentation.
Security Best Practice - Configuration - Metrics Configuration
In addition to the configuration below, it is recommended to configure SSL on the metrics server endpoint.
metrics:
  enabled: true
  port: 9000
  uvicorn_config:
    access_log: true
    server_header: false
    date_header: false
    workers: 1
- profile_pipelines: bool
Start the profiler to profile the pipeline. Defaults to False. This can be used to profile Logprep in near-production environments to inspect performance bottlenecks.
- error_backlog_size: int
Size of the error backlog. Defaults to 15000.
Security Best Practice - Configuration - Error Backlog Size
Ensure that this value adheres to your overall system resource limits: if the backlog grows too large in failure situations, it can cause OOM (Out Of Memory) errors. Reserve memory for this backlog to avoid DOS (Denial of Service) attacks through deliberately failing logs.
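A rough back-of-the-envelope estimate helps when budgeting memory for the backlog. The average event size below is an assumption for illustration, not a Logprep constant; measure your own event sizes before relying on such a figure.

```python
# Rough estimate of the memory a full error backlog can consume.
error_backlog_size = 15000       # the documented default
assumed_avg_event_bytes = 2048   # assumption: ~2 KiB per failed event

backlog_bytes = error_backlog_size * assumed_avg_event_bytes
print(f"{backlog_bytes / 1024 / 1024:.1f} MiB")  # 29.3 MiB
```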
- Input
- Output
- Processors
- Amides
- Calculator
- Clusterer
- Concatenator
- DatetimeExtractor
- Deleter
- Decoder
- Dissector
- DomainLabelExtractor
- DomainResolver
- Dropper
- FieldManager
- GenericAdder
- GenericResolver
- GeoipEnricher
- Grokker
- IpInformer
- KeyChecker
- Labeler
- ListComparison
- PreDetector
- Pseudonymizer
- Replacer
- Requester
- SelectiveExtractor
- StringSplitter
- TemplateReplacer
- Timestamper
- TimestampDiffer
- Rules
- Getters
- Metrics
- YAML Tags