Services
A service is a named instance of a service type (e.g. DatabaseConnector, StorageConnector) with a specific configuration.
Services are defined in service files and can be executed using the CLI application or they can be scheduled and automatically launched by the server application.
Creating a service
A service is created by defining its name, a service type, and the specific configuration in a service file.
Example: Service MyService
services:
  MyService:
    type: DatabaseConnector
The name and service type are mandatory. The specific configuration depends on the service type.
Service type
Each service type implements a specific task in a predefined, fixed sequence of steps (e.g. extraction, transformation, or reorganization). The application supports the following service types:
| Service type | Description |
|---|---|
| DatabaseConnector | Connects to a database using JDBC, extracts metadata from the database catalog, and transforms and uploads the metadata to the destination application (dataspot) using the upload API. |
| StorageConnector | Connects to a storage system (e.g. AWS S3, Google Cloud Storage, Azure Storage), extracts metadata from files (e.g. CSV, Parquet), and transforms and uploads the metadata to a destination application (dataspot) using the upload API. |
| DatabricksConnector | Connects to a Databricks Unity Catalog instance, extracts metadata and runtime lineage information from the workspace, and transforms and uploads the metadata to a destination application (dataspot) using the upload API. |
| FileUpload | Uploads a payload file to a destination application (dataspot) using the upload API. |
| ApplicationReorg | Reorganizes the application by deleting expired jobs. |
Service file
Services are defined in service files in YAML format.
The root property services contains a map of service configurations, with the map key being the service name.
Example: Service file with a single service
services:
  MyDatabaseService:
    type: DatabaseConnector
    source:
      url: jdbc:sqlserver://myserver:1433;DatabaseName=mydatabase
A service file may contain multiple services, possibly with different service types. Within the service file, each service must be identified by a unique name.
Example: Service file with multiple services
services:
  MyDatabaseService:
    type: DatabaseConnector
    source:
      url: jdbc:sqlserver://myserver:1433;DatabaseName=mydatabase
  MyStorageService:
    type: StorageConnector
    source:
      url: s3a://my-s3-bucket
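The structural rules above (a root property services, one entry per service name, a mandatory type) can be illustrated with a small check. This is only a sketch, not the application's own validation: the function name validate_services is hypothetical, and the service file is assumed to have already been parsed into a nested mapping by a YAML parser.

```python
# Service types listed in the service type table.
SUPPORTED_TYPES = {
    "DatabaseConnector",
    "StorageConnector",
    "DatabricksConnector",
    "FileUpload",
    "ApplicationReorg",
}


def validate_services(document):
    """Return a list of problems found in an already-parsed service file."""
    services = document.get("services")
    if not isinstance(services, dict):
        return ["root property 'services' must be a map"]
    problems = []
    for name, config in services.items():
        service_type = config.get("type") if isinstance(config, dict) else None
        if service_type is None:
            problems.append(f"{name}: property 'type' is missing")
        elif service_type not in SUPPORTED_TYPES:
            problems.append(f"{name}: unknown service type '{service_type}'")
    return problems


# The multi-service example, written as the mapping a YAML parser would produce.
document = {
    "services": {
        "MyDatabaseService": {
            "type": "DatabaseConnector",
            "source": {"url": "jdbc:sqlserver://myserver:1433;DatabaseName=mydatabase"},
        },
        "MyStorageService": {
            "type": "StorageConnector",
            "source": {"url": "s3a://my-s3-bucket"},
        },
    }
}
print(validate_services(document))  # []
```

Because the services are keys of a map, two services with the same name cannot coexist in the parsed result, which is why the name has to be unique within a file.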
Executing a service
A service can be executed using the command-line interface (CLI) application by specifying the service name and the service file. Services can also be automatically launched by the server application, based on the schedules defined in the services.
The application loads and executes the service:
- The application validates the service configuration.
- The application starts a job that, depending on the service type, performs a predefined, fixed sequence of steps.
- Each step of the job typically processes a different section of the service configuration.
- Finally, the job finishes successfully or terminates with an error message.
Each service type specifies a predefined, fixed sequence of steps.
A service type (e.g. DatabaseConnector) might extract and transform metadata from a source, while another service type (e.g. ApplicationReorg) might perform maintenance work.
During execution, the placeholders in the service configuration are resolved by searching for them in the configured placeholder sources. The execution details and the statuses of the currently running or finished services are stored in the working database.
Service file monitor
The server application automatically starts a service file monitor. The service file monitor periodically scans the default service directory (including its subdirectories) and the default service file for created, modified, or deleted services.
The application configuration property connector.config.service.monitor.interval specifies how often the service file monitor scans the file system, or whether the service file monitor is disabled.
By keeping track of service files, the server application can extract the schedules defined in the services and automatically launch scheduled services. Each time a service file is created, modified, or deleted, the file is parsed and the scheduled services associated with the file are updated. If a service file cannot be parsed, all existing schedules of that file are canceled.
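The bookkeeping described above can be pictured as a registry that maps each service file to the schedules it defines. The following is a minimal sketch under stated assumptions, not the server's implementation: the function apply_file_event, the registry layout, and the path services/db.yaml are all hypothetical.

```python
def apply_file_event(schedules, path, services):
    """Update the schedule registry after one service file changed.

    schedules: dict mapping a file path to {service name: cron expression}
    services:  the parsed 'services' map of the file, or None if the file
               was deleted or could not be parsed
    """
    if services is None:
        # Deleted or unparsable file: cancel every schedule that came from it.
        schedules.pop(path, None)
        return schedules
    # Created or modified file: replace the file's schedules with the new ones.
    schedules[path] = {
        name: config["schedule"]
        for name, config in services.items()
        if isinstance(config, dict) and config.get("schedule")
    }
    return schedules


registry = {}
apply_file_event(registry, "services/db.yaml", {
    "MyDatabaseConnector": {"type": "DatabaseConnector", "schedule": "30 09,12,15,18 * * *"},
    "MyFileUpload": {"type": "FileUpload"},  # no schedule: not launched automatically
})
print(registry)

apply_file_event(registry, "services/db.yaml", None)  # the file became unparsable
print(registry)  # {}
```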
Configuration
A service is configured by defining its unique name, the service type, and the configuration.
While YAML itself doesn't enforce any naming style for property names, multi-word properties (for example, access key) are typically specified in lowercase separated by hyphens (for example, access-key).
This naming style - commonly referred to as kebab-case - is used in the following descriptions and examples.
However, all multi-word properties can also be specified in camelCase (for example, accessKey).
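To see why the two spellings can be treated as the same property, consider a helper that rewrites kebab-case names to camelCase. How the application maps the names internally is not stated here; kebab_to_camel is a hypothetical illustration only.

```python
def kebab_to_camel(name):
    """Rewrite a kebab-case property name to camelCase; other names pass through."""
    head, *tail = name.split("-")
    return head + "".join(part[:1].upper() + part[1:] for part in tail)


print(kebab_to_camel("access-key"))  # accessKey
print(kebab_to_camel("accessKey"))   # accessKey
print(kebab_to_camel("dry-run"))     # dryRun
```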
Services have the following configuration in common. Specific service types have additional configurations.
Properties marked with * are required for the service to run.
🔑 Property type *
The service type.
Required. The service type identifies the predefined, fixed sequence of steps that are performed by the job when the service is executed.
Example: Property type
services:
  MyService:
    type: DatabaseConnector
🔑 Property schedule
The service schedule specified as a cron expression.
Optional. The default is null (not scheduled).
If a schedule is defined, the server application automatically launches the service when the specified cron expression matches the current date and time.
Example: Property schedule
services:
  MyDatabaseConnector:
    type: DatabaseConnector
    schedule: "30 09,12,15,18 * * *"  # daily, at 09:30, 12:30, 15:30, 18:30
  MyApplicationReorg:
    type: ApplicationReorg
    schedule: "0 1 1,15 * *"          # on the 1st and 15th of every month, at 01:00
While each service type has its own specific configuration, the following general concepts apply to all service configurations.
Placeholders
Placeholders are tokens used in service configurations to represent string values which are stored in external sources.
Placeholders are specified in the format ${key}.
Their actual values are resolved when the service is executed.
| Format | Description |
|---|---|
| ${key} | The key is an identifier that maps to a value in one of the external sources. |
As a recommendation, credentials or sensitive data - such as passwords, access tokens or client secrets - should not be stored in service files (where they might be compromised). Instead, they should reside separately in external sources.
Example: Placeholders
services:
  MyService:
    type: DatabaseConnector
    upload:
      authentication:
        method: password
        username: ${basic.username}
        password: ${basic.password}
Placeholder sources
When a service is executed, the application attempts to resolve each placeholder ${key} and replace it with an actual value by searching for key in the following sources:
| Source | Description |
|---|---|
| connector.config.placeholders.files | Additional properties files specified in connector.config.placeholders.files. |
| Command-line arguments | Placeholders defined as command-line arguments using --key=value. |
| System properties | Placeholders defined as system properties (JVM) using java -Dkey=value. |
| Environment variables | Placeholders defined as environment variables (e.g. export key=value on Linux). |
| application.properties | Placeholders defined in the configuration file application.properties. |
| application.yaml | Placeholders defined in the configuration file application.yaml. |
The sources are searched in the above order - from top to bottom.
The additional properties files specified in the application configuration connector.config.placeholders.files have the highest precedence.
The configuration file application.yaml has the lowest precedence.
If a placeholder ${key} is not found in any of the sources, the placeholder isn't replaced but remains in the string as ${key}.
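The lookup order and the handling of unresolved placeholders can be sketched as follows. This is an illustration rather than the application's resolver: each source is modelled as a plain dictionary, the function name resolve and the sample values are assumptions, and the renaming of keys for environment variables is left out.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(text, sources):
    """Replace each ${key} with the value from the first source that defines key."""

    def lookup(match):
        key = match.group(1)
        for source in sources:  # searched from highest to lowest precedence
            if key in source:
                return str(source[key])
        return match.group(0)  # not found anywhere: keep ${key} unchanged

    return PLACEHOLDER.sub(lookup, text)


sources = [
    {"basic.username": "fileuser"},  # additional properties files (highest)
    {},                              # command-line arguments
    {},                              # system properties
    {},                              # environment variables
    {},                              # application.properties
    {"basic.username": "yamluser", "basic.password": "yamlpass"},  # application.yaml (lowest)
]
print(resolve("${basic.username}:${basic.password}:${missing.key}", sources))
# fileuser:yamlpass:${missing.key}
```

The value of basic.username comes from the properties file because that source is searched first, even though application.yaml also defines it.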
Example: Additional properties file basic.properties
basic.username=myuser
basic.password=mypassword
Example: Command-line arguments
--basic.username=myuser --basic.password=mypassword
Example: System properties
-Dbasic.username=myuser -Dbasic.password=mypassword
Example: Environment variables
export BASIC_USERNAME=myuser
export BASIC_PASSWORD=mypassword
Placeholders defined as environment variables are specified in snake_case or SCREAMING_SNAKE_CASE.
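Judging from the example above, the key basic.username corresponds to the environment variable BASIC_USERNAME. A minimal sketch of that renaming is shown below; the helper env_name is hypothetical, and the treatment of keys containing hyphens is not covered because the text above does not specify it.

```python
def env_name(key):
    """Derive a SCREAMING_SNAKE_CASE variable name from a dotted placeholder key."""
    return key.replace(".", "_").upper()


print(env_name("basic.username"))  # BASIC_USERNAME
print(env_name("basic.password"))  # BASIC_PASSWORD
```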
Built-in placeholders
The following properties have built-in, predefined placeholders.
| Property | Built-in placeholder |
|---|---|
| utilities.dump | ${utilities.dump} |
| utilities.restore.landing | ${utilities.restore.landing} |
| utilities.restore.staging | ${utilities.restore.staging} |
| upload.options.dry-run | ${upload.options.dry-run} |
Properties with built-in placeholders can be set without ever specifying them in the service file. If such a property is not specified in the service file, its built-in placeholder is resolved by default:
- If the built-in placeholder is found in one of the external sources, the property is set to the resolved value.
- Otherwise, the built-in placeholder is ignored (i.e. the property is set to the property's default value).
Built-in placeholders allow certain features, typically related to maintenance (e.g. writing dumps or performing dry runs), to be enabled or disabled without ever modifying the service file.
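The fallback logic for built-in placeholders can be sketched as below. It is an illustration under assumptions, not the application's code: the function names are hypothetical, the external sources are collapsed into a single dictionary, resolved values are kept as strings, and the default "false" is an assumed value chosen for the demonstration.

```python
_MISSING = object()

# Properties that have a built-in placeholder.
BUILT_IN = {
    "utilities.dump",
    "utilities.restore.landing",
    "utilities.restore.staging",
    "upload.options.dry-run",
}


def configured_value(service, path):
    """Walk a dotted property path through the nested service configuration."""
    node = service
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def effective_value(service, path, external, default):
    explicit = configured_value(service, path)
    if explicit is not _MISSING:
        return explicit        # specified in the service file
    if path in BUILT_IN and path in external:
        return external[path]  # built-in placeholder found in an external source
    return default             # built-in placeholder ignored


service = {"type": "DatabaseConnector"}  # utilities.dump is not specified
print(effective_value(service, "utilities.dump", {"utilities.dump": "true"}, "false"))  # true
print(effective_value(service, "utilities.dump", {}, "false"))                          # false
```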
Example: Built-in placeholder
A property (e.g. utilities.dump) could be specified in a service file with a fixed value (e.g. true).
Changing the property's value would always involve modifying the service file:
services:
  MyService:
    type: DatabaseConnector
    utilities:
      dump: true
Alternatively, the property could be specified with a custom placeholder (e.g. ${dump.enabled}) and subsequently be set without modifying the service file, but by defining the placeholder value in an external source (e.g. java -Ddump.enabled=true):
services:
  MyService:
    type: DatabaseConnector
    utilities:
      dump: ${dump.enabled}
More conveniently, using its built-in placeholder (e.g. ${utilities.dump}), the property can be set by removing it from the service file altogether and instead only defining the built-in placeholder value in an external source (e.g. java -Dutilities.dump=true):
services:
  MyService:
    type: DatabaseConnector
Cron expressions
A recurring point in time (for example, a service schedule) can be specified using a UNIX-style cron expression with five fields:
* * * * *
| | | | |
| | | | └ day of week (0-6, Sunday = 0)
| | | |
| | | └ month (1-12)
| | |
| | └ day of month (1-31)
| |
| └ hour (0-23)
|
└ minute (0-59)
A cron expression is evaluated by matching the specified values with the current date and time.
- An asterisk * matches all possible values for a field (i.e. no restriction). For example, hour with the value * means "match every hour", day of month with the value * means "match every day".
- A comma , specifies a list of values. For example, minute with the value list 0,15,30,45 means "match at the minutes 0, 15, 30, and 45".
- A dash - specifies a range of values. For example, 1-6 means "match the values from 1 to 6" (equivalent to 1,2,3,4,5,6).
- A slash / specifies a step, so that only every n-th value is matched. For example, minute with the value */15 means "match every 15 minutes" (equivalent to 0,15,30,45).
If ambiguous entries are defined, only one of the criteria must be met.
For example, if both day of month and day of week are restricted (i.e. not *), then the current date only needs to match one of these two criteria.
| Example | Description |
|---|---|
| * * * * * | match every minute |
| */10 * * * * | match every 10 minutes |
| 0 17 * * 0 | match every Sunday, at 17:00 |
| 0 1 * * 2-5 | match every Tuesday to Friday, at 01:00 |
| 30 07,09,13,15 * * * | match every day, at 07:30, 09:30, 13:30, 15:30 |
| 0 */4 * * * | match every four hours (at minute 0) |
| 0 3 1,15 * * | match every month on days 1 and 15, at 03:00 |
| 0 22 1 * 0 | match every Sunday or on day 1 of every month, at 22:00 |
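The matching rules above, including the special handling when both day of month and day of week are restricted, can be sketched in a few lines. This is a simplified illustration and not the scheduler used by the server application; the function names expand and matches are hypothetical, and no validation of out-of-range values is performed.

```python
from datetime import datetime

# (minimum, maximum) per field: minute, hour, day of month, month, day of week
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand(field, low, high):
    """Expand one cron field into the set of integers it matches."""
    values = set()
    for item in field.split(","):
        base, _, step = item.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(part) for part in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def matches(expression, moment):
    """Return True if the five-field cron expression matches the given datetime."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("a cron expression must have exactly five fields")
    minute, hour, dom, month, dow = (
        expand(field, low, high) for field, (low, high) in zip(fields, RANGES)
    )
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday = 0, cron: Sunday = 0
    dom_ok = moment.day in dom
    dow_ok = cron_weekday in dow
    if fields[2] != "*" and fields[4] != "*":
        day_ok = dom_ok or dow_ok  # both restricted: one match is enough
    else:
        day_ok = dom_ok and dow_ok
    return (moment.minute in minute and moment.hour in hour
            and moment.month in month and day_ok)


print(matches("0 22 1 * 0", datetime(2024, 6, 2, 22, 0)))  # Sunday -> True
print(matches("0 22 1 * 0", datetime(2024, 6, 1, 22, 0)))  # day 1 (a Saturday) -> True
print(matches("0 22 1 * 0", datetime(2024, 6, 3, 22, 0)))  # Monday, day 3 -> False
```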
Pattern filters
A pattern filter is an advanced pattern matching mechanism used for properties that define filters (e.g. names, types) in service configurations.
Rather than using a single regular expression to match a given value, a pattern filter allows the value to be matched against multiple regular expressions.
- The property accept defines a list of regular expressions to accept the value.
- The property reject defines a list of regular expressions to reject the value.
Example: Pattern filter
names:                #
  accept:             #
    - Finance         #
    - Sales.*         # match the name 'Finance' and names starting with 'Sales'
  reject:             #
    - SalesTest       #
    - .*_Temp         # except the name 'SalesTest' and names ending with '_Temp'
A given value matches the pattern filter if it is accepted and is not rejected:
- Evaluate the accept list
  - If the accept list is null, all values are accepted.
  - If the accept list is empty, no values are accepted.
  - Otherwise, the value is accepted if it matches any regular expression in the accept list.
- Evaluate the reject list
  - If the reject list is null, no values are rejected.
  - If the reject list is empty, no values are rejected.
  - Otherwise, the value is rejected if it matches any regular expression in the reject list.
An empty pattern filter (accept and reject are null) matches any value (all values are accepted and no values are rejected).
Patterns are automatically anchored by adding ^ at the beginning and $ at the end (corresponding to the start and end of the value).
These anchor characters should not be specified in the pattern.
The entire value, from beginning to end, must match the pattern - nothing can come before or after the pattern.
To match a value that merely contains a specific pattern (e.g. TEMP) - allowing something to come before or after the pattern - the pattern should be embedded in .* (e.g. .*TEMP.*).
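The evaluation rules and the implicit anchoring described above can be sketched with Python's re module. This is an illustration, not the application's matcher: the function matches_filter is hypothetical, re.fullmatch stands in for the automatic anchoring, regular expression dialects may differ in details, and null values or null patterns are not handled.

```python
import re


def matches_filter(value, accept=None, reject=None):
    """Return True if the value is accepted and not rejected."""

    def any_full_match(patterns):
        # re.fullmatch requires the whole value to match, like an anchored pattern.
        return any(re.fullmatch(pattern, value) for pattern in patterns)

    accepted = True if accept is None else any_full_match(accept)   # empty list: nothing accepted
    rejected = False if reject is None else any_full_match(reject)  # empty list: nothing rejected
    return accepted and not rejected


accept = ["Finance", "Sales.*"]
reject = ["SalesTest", ".*_Temp"]
for name in ["Finance", "SalesEU", "SalesTest", "Sales_Temp", "Marketing"]:
    print(name, matches_filter(name, accept, reject))

# Anchoring: a bare TEMP only matches the exact value 'TEMP'.
print(matches_filter("MY_TEMP_TABLE", accept=["TEMP"]))      # False
print(matches_filter("MY_TEMP_TABLE", accept=[".*TEMP.*"]))  # True
```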
Example: Pattern filter
types:                #
  accept:             #
    - .*TABLE         # match types ending with 'TABLE'
  reject:             #
    - SYSTEM.*        #
    - .*TEMP.*        # except types starting with 'SYSTEM' or containing 'TEMP'
Both the value and the pattern may be null.
The value null only matches a pattern if the pattern is also null - and vice versa.
In the simplest case, a pattern filter could define an accept list with a single regular expression and no reject list.
This is equivalent to matching against a single regular expression.
Example: Pattern filter
names:                #
  accept: Sales       # match the name 'Sales'
Notice how accept is a single-value list - containing only the single value Sales.
Single-value lists
Specific properties in service configurations may contain lists of values.
Lists are typically formatted using -, even when they contain only a single value.
Example: Single-value lists
extensions:
  - parquet             # primitive property
datatypes:
  - stereotype: type    #
    restricted: true    # structured property
For convenience, a single-value list in a service configuration can be formatted as a single value, rather than as a list with a single value.
The single value is automatically treated as a list behind the scenes.
Example: Single-value lists (compact format)
extensions: parquet     # primitive property
datatypes:
  stereotype: type      #
  restricted: true      # structured property
Notice how this applies not only to primitive properties, such as strings or numbers, but also to structured properties.
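The compact format amounts to wrapping a lone value in a list before it is processed. A minimal sketch of that normalization is shown below; the helper as_list is hypothetical, and null is deliberately passed through unchanged because null and empty lists can carry different meanings (as in pattern filters).

```python
def as_list(value):
    """Treat a single value as a one-element list; leave lists and null untouched."""
    if value is None or isinstance(value, list):
        return value
    return [value]


print(as_list("parquet"))                                  # ['parquet']
print(as_list(["parquet"]))                                # ['parquet']
print(as_list({"stereotype": "type", "restricted": True}))
```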