Services

A service is a named instance of a service type (e.g. DatabaseConnector, StorageConnector) with a specific configuration. Services are defined in service files. They can be executed using the CLI application, or scheduled and launched automatically by the server application.

Creating a service

A service is created by defining its name, a service type, and the specific configuration in a service file.

Example: Service MyService

services:
  MyService:
    type: DatabaseConnector

The name and service type are mandatory. The specific configuration depends on the service type.

Service type

Each service type implements a specific task in a predefined, fixed sequence of steps (e.g. extraction, transformation, or reorganization). The application supports the following service types:

Service type	Description
`DatabaseConnector`	Connects to a database using JDBC, extracts metadata from the database catalog, and transforms and uploads the metadata to a destination application (dataspot) using the upload API.
`StorageConnector`	Connects to a storage system (e.g. AWS S3, Google Cloud Storage, Azure Storage), extracts metadata from files (e.g. CSV, Parquet), and transforms and uploads the metadata to a destination application (dataspot) using the upload API.
`DatabricksConnector`	Connects to a Databricks Unity Catalog instance, extracts metadata and runtime lineage information from the workspace, and transforms and uploads the metadata to a destination application (dataspot) using the upload API.
`FileUpload`	Uploads a payload file to a destination application (dataspot) using the upload API.
`ApplicationReorg`	Reorganizes the application by deleting expired jobs.

Service file

Services are defined in service files in YAML format. The root property services contains a map of service configurations, with the map key being the service name.

Example: Service file with a single service

services:
  MyDatabaseService:
    type: DatabaseConnector
    source:
      url: jdbc:sqlserver://myserver:1433;DatabaseName=mydatabase

A service file may contain multiple services, possibly with different service types. Within the service file, each service must be identified by a unique name.

Example: Service file with multiple services

services:
  MyDatabaseService:
    type: DatabaseConnector
    source:
      url: jdbc:sqlserver://myserver:1433;DatabaseName=mydatabase

  MyStorageService:
    type: StorageConnector
    source:
      url: s3a://my-s3-bucket
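
The structure described above can be checked with a few lines of code. The following is a minimal sketch in Python, assuming the service file has already been parsed into a dictionary; the function name validate_services and the wording of the messages are hypothetical, and the application's own validation may check more (or differently):

```python
# Service types listed in the section "Service type".
SERVICE_TYPES = {
    "DatabaseConnector",
    "StorageConnector",
    "DatabricksConnector",
    "FileUpload",
    "ApplicationReorg",
}


def validate_services(document):
    """Return a list of problems found in a parsed service file (a dict)."""
    problems = []
    services = document.get("services") or {}
    for name, config in services.items():
        service_type = (config or {}).get("type")
        if service_type is None:
            problems.append(f"{name}: missing required property 'type'")
        elif service_type not in SERVICE_TYPES:
            problems.append(f"{name}: unknown service type '{service_type}'")
    return problems


# Parsed equivalent of a service file with one valid and one incomplete service.
document = {
    "services": {
        "MyDatabaseService": {"type": "DatabaseConnector"},
        "MyBrokenService": {"source": {"url": "s3a://my-s3-bucket"}},
    }
}
print(validate_services(document))
```

Because the services are held in a map keyed by name, a parsed file cannot contain two services with the same name; how duplicate keys in the YAML text itself are reported is not covered by this sketch.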

Executing a service

A service can be executed using the command-line interface (CLI) application by specifying the service name and the service file. Services can also be automatically launched by the server application, based on the schedules defined in the services.

The application loads and executes the service:

The application validates the service configuration.
The application starts a job that, depending on the service type, performs a predefined, fixed sequence of steps.
Each step of the job typically processes a different section of the service configuration.
Finally, the job finishes successfully or terminates with an error message.

Each service type specifies a predefined, fixed sequence of steps. One service type (e.g. DatabaseConnector) might extract and transform metadata from a source, while another (e.g. ApplicationReorg) might perform maintenance work.

Note

During execution, the placeholders in the service configuration are resolved by searching for them in the configured placeholder sources. The execution details and the statuses of the currently running or finished services are stored in the working database.

Service file monitor

The server application automatically starts a service file monitor. The service file monitor periodically scans the default service directory (including its subdirectories) and the default service file for created, modified, or deleted services.

Tooltip

The application configuration connector.config.service.monitor.interval specifies how often the service file monitor scans the file system, or whether the service file monitor is disabled.

By keeping track of service files, the server application can extract the schedules defined in the services and automatically launch scheduled services. Each time a service file is created, modified, or deleted, the file is parsed and the scheduled services associated with the file are updated. If a service file cannot be parsed, all existing schedules of that file are canceled.

Configuration

A service is configured by defining its unique name, the service type, and the configuration.

Tooltip

While YAML itself doesn't enforce any naming style for property names, multi-word properties (for example, access key) are typically specified in lowercase separated by hyphens (for example, access-key). This naming style - commonly referred to as kebab-case - is used in the following descriptions and examples. However, all multi-word properties can also be specified in camelCase (for example, accessKey).
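
The relationship between the two naming styles can be illustrated with a short conversion. This is only a sketch in Python; the function name kebab_to_camel is hypothetical and says nothing about how the application itself maps property names:

```python
def kebab_to_camel(name):
    """Convert a lowercase kebab-case property name to camelCase."""
    head, *tail = name.split("-")
    # Capitalize every word after the first and join them without separators.
    return head + "".join(part.capitalize() for part in tail)


print(kebab_to_camel("access-key"))  # accessKey
print(kebab_to_camel("dry-run"))     # dryRun
```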

Services have the following configuration in common. Specific service types have additional configurations.

Tooltip

Properties marked with * are required for the service to run.

🔑 Property `type` *

The service type.

required

The service type identifies the predefined, fixed sequence of steps that are performed by the job when the service is executed.

Example: Property type

services:
  MyService:
    type: DatabaseConnector

🔑 Property `schedule`

The service schedule specified as a cron expression.

optional

The default is null (not scheduled).

If a schedule is defined, the server application automatically launches the service when the specified cron expression matches the current date and time.

Example: Property schedule

services:
  MyDatabaseConnector:
    type: DatabaseConnector
    schedule: "30 09,12,15,18 * * *" # daily, at 09:30, 12:30, 15:30, 18:30

  MyApplicationReorg:
    type: ApplicationReorg
    schedule: "0 1 1,15 * *"         # on the 1st and 15th of every month, at 01:00

While each service type has its own specific configuration, the following general concepts apply to all service configurations.

Placeholders

Placeholders are tokens used in service configurations to represent string values which are stored in external sources. Placeholders are specified in the format ${key}. Their actual values are resolved when the service is executed.

Format	Description
`${key}`	The `key` is an identifier that maps to a value in one of the external sources.

Attention

Credentials and other sensitive data - such as passwords, access tokens, or client secrets - should not be stored in service files, where they might be compromised. Instead, they should be kept separately in external sources.

Example: Placeholders

services:
  MyService:
    type: DatabaseConnector
    upload:
      authentication:
        method: password
        username: ${basic.username}
        password: ${basic.password}

Placeholder sources

When a service is executed, the application attempts to resolve each placeholder ${key} and replace it with an actual value by searching for key in the following sources:

Source	Description
`connector.config.placeholders.files`	Additional properties files specified in `connector.config.placeholders.files`.
Command-line arguments	Placeholders defined as command-line arguments using `--key=value`.
System properties	Placeholders defined as system properties (JVM) using `java -Dkey=value`.
Environment variables	Placeholders defined as environment variables (e.g. `export key=value` on Linux).
`application.properties`	Placeholders defined in the configuration file `application.properties`.
`application.yaml`	Placeholders defined in the configuration file `application.yaml`.

Note

The sources are searched in the above order - from top to bottom. The additional properties files specified in the application configuration connector.config.placeholders.files have the highest precedence. The configuration file application.yaml has the lowest precedence.

If a placeholder ${key} is not found in any of the sources, the placeholder isn't replaced but remains in the string as ${key}.
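
The lookup order and the handling of unresolved placeholders can be sketched as follows. This is a minimal illustration in Python, not the application's implementation: the function name resolve_placeholders is hypothetical, and each source is modelled as a plain dictionary, ordered from highest to lowest precedence.

```python
import re

# Matches ${key}, where key is any run of characters other than '}'.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(text, sources):
    """Replace each ${key} in text with the first value found in sources.

    sources is a list of dicts ordered from highest to lowest precedence.
    Placeholders that are not found in any source are left unchanged.
    """
    def lookup(match):
        key = match.group(1)
        for source in sources:
            if key in source:
                return str(source[key])
        return match.group(0)  # unresolved: keep ${key} as-is

    return PLACEHOLDER.sub(lookup, text)


# Hypothetical sources, highest precedence first.
properties_file = {"basic.username": "fileuser"}
command_line = {"basic.username": "cliuser", "basic.password": "clipass"}
sources = [properties_file, command_line]

print(resolve_placeholders("${basic.username}", sources))  # fileuser
print(resolve_placeholders("${basic.password}", sources))  # clipass
print(resolve_placeholders("${missing.key}", sources))     # ${missing.key}
```

The first call shows the precedence rule: the key exists in both sources, and the value from the additional properties file wins over the command-line argument.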

Example: Additional properties file basic.properties

basic.username=myuser
basic.password=mypassword

Example: Command-line arguments

--basic.username=myuser --basic.password=mypassword

Example: System properties

-Dbasic.username=myuser -Dbasic.password=mypassword

Example: Environment variables

export BASIC_USERNAME=myuser
export BASIC_PASSWORD=mypassword

Tooltip

Placeholders defined as environment variables are specified in snake_case or SCREAMING_SNAKE_CASE.
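
As an illustration, the environment variable names that correspond to a dotted placeholder key can be derived as shown below. This Python sketch assumes that dots are simply replaced by underscores, as in the example above (basic.username and BASIC_USERNAME); the function name env_var_candidates is hypothetical, and keys containing other characters (for example, hyphens) are not covered:

```python
def env_var_candidates(key):
    """Return the snake_case and SCREAMING_SNAKE_CASE names for a dotted key."""
    snake = key.replace(".", "_")
    return [snake, snake.upper()]


print(env_var_candidates("basic.username"))  # ['basic_username', 'BASIC_USERNAME']
```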

Built-in placeholders

The following properties have built-in, predefined placeholders.

Property	Built-in placeholder
`utilities.dump`	`${utilities.dump}`
`utilities.restore.landing`	`${utilities.restore.landing}`
`utilities.restore.staging`	`${utilities.restore.staging}`
`upload.options.dry-run`	`${upload.options.dry-run}`

Properties with built-in placeholders can be set without ever specifying them in the service file. If the property is not specified in the service file, its built-in placeholder is resolved by default:

If the built-in placeholder is found in one of the external sources, the property is set to the resolved value.
Otherwise, the built-in placeholder is ignored (i.e. the property is set to the property's default value).
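
The fallback logic can be sketched as follows. This Python example is an illustration only: the function name effective_value is hypothetical, the service configuration is modelled as a flat dictionary of dotted property names to keep the sketch short, and the external sources are plain dictionaries ordered from highest to lowest precedence.

```python
def effective_value(service_config, prop, sources, default=None):
    """Return the value of a property that has a built-in placeholder."""
    if prop in service_config:
        return service_config[prop]   # explicitly set in the service file
    for source in sources:            # resolve the built-in placeholder ${prop}
        if prop in source:
            return source[prop]
    return default                    # placeholder ignored: use the default


system_properties = {"utilities.dump": "true"}

# Not set in the service file, but defined externally: resolved value is used.
print(effective_value({}, "utilities.dump", [system_properties], default="false"))  # true

# Not set anywhere: the property's default value is used.
print(effective_value({}, "utilities.dump", [{}], default="false"))                 # false
```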

Tooltip

Built-in placeholders allow certain features, typically related to maintenance (e.g. writing dumps or performing dry runs), to be enabled or disabled without ever modifying the service file.

Example: Built-in placeholder

A property (e.g. utilities.dump) could be specified in a service file with a fixed value (e.g. true). Changing the property's value would always involve modifying the service file:

services:
  MyService:
    type: DatabaseConnector
    utilities:
      dump: true

Alternatively, the property could be specified with a custom placeholder (e.g. ${dump.enabled}) and subsequently be set without modifying the service file, but by defining the placeholder value in an external source (e.g. java -Ddump.enabled=true):

services:
  MyService:
    type: DatabaseConnector
    utilities:
      dump: ${dump.enabled}

More conveniently, using its built-in placeholder (e.g. ${utilities.dump}), the property can be set by removing it from the service file altogether and instead only defining the built-in placeholder value in an external source (e.g. java -Dutilities.dump=true):

services:
  MyService:
    type: DatabaseConnector

Cron expressions

A recurring point in time (for example, a service schedule) can be specified using a UNIX-style cron expression with five fields:

* * * * *
| | | | |
| | | | └ day of week (0-6, Sunday = 0)
| | | |
| | | └ month (1-12)
| | |
| | └ day of month (1-31)
| |
| └ hour (0-23)
|
└ minute (0-59)

A cron expression is evaluated by matching the specified values with the current date and time.

An asterisk * matches all possible values for a field (i.e. no restriction).
For example, hour with the value * means "match every hour", day of month with the value * means "match every day".
A comma , specifies a list of values.
For example, minute with the value list 0,15,30,45 means "match at the minutes 0, 15, 30, and 45".
A dash - specifies a range of values.
For example, 1-6 means "match the values from 1 to 6" (equivalent to 1,2,3,4,5,6).
A slash / specifies a step, so that only every n-th value of the field is matched.
For example, minute with the value */15 means "match every 15 minutes" (equivalent to 0,15,30,45).

Tooltip

If ambiguous entries are defined, only one of the criteria must be met. For example, if both day of month and day of week are restricted (i.e. not *), then the current date only needs to match one of these two criteria.

Example	Description
`* * * * *`	match every minute
`*/10 * * * *`	match every 10 minutes
`0 17 * * 0`	match every Sunday, at 17:00
`0 1 * * 2-5`	match every Tuesday to Friday, at 01:00
`30 07,09,13,15 * * *`	match every day, at 07:30, 09:30, 13:30, 15:30
`0 */4 * * *`	match every four hours (at minute 0)
`0 3 1,15 * *`	match every month on days 1 and 15, at 03:00
`0 22 1 * 0`	match every Sunday or on day 1 of every month, at 22:00
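
The matching rules above can be sketched in a few lines of Python. This is a simplified illustration, not the scheduler used by the application: the function names expand_field and cron_matches are hypothetical, and features not described above (such as month or weekday names) are not supported.

```python
from datetime import datetime


def expand_field(field, low, high):
    """Expand one cron field into the set of integers it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression, moment):
    """Return True if the five-field cron expression matches the datetime."""
    minute, hour, dom, month, dow = expression.split()
    if moment.minute not in expand_field(minute, 0, 59):
        return False
    if moment.hour not in expand_field(hour, 0, 23):
        return False
    if moment.month not in expand_field(month, 1, 12):
        return False
    dom_ok = moment.day in expand_field(dom, 1, 31)
    # datetime.weekday() uses Monday = 0; cron uses Sunday = 0.
    dow_ok = (moment.weekday() + 1) % 7 in expand_field(dow, 0, 6)
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok  # both restricted: either one may match
    return dom_ok and dow_ok


print(cron_matches("30 09,12,15,18 * * *", datetime(2024, 1, 2, 12, 30)))  # True
print(cron_matches("0 22 1 * 0", datetime(2024, 1, 7, 22, 0)))             # True (a Sunday)
print(cron_matches("0 22 1 * 0", datetime(2024, 1, 2, 22, 0)))             # False
```

The last two calls show the rule for the two day fields: 7 January 2024 is a Sunday but not the first day of the month, and it still matches because only one of the two restricted fields has to match.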

Pattern filters

A pattern filter is an advanced pattern matching mechanism used for properties that define filters (e.g. names, types) in service configurations. Rather than using a single regular expression to match a given value, a pattern filter allows the value to be matched against multiple regular expressions.

The property accept defines a list of regular expressions to accept the value.
The property reject defines a list of regular expressions to reject the value.

Example: Pattern filter

names:          #
  accept:       #
    - Finance   #
    - Sales.*   # match the name 'Finance' and names starting with 'Sales'
  reject:       #
    - SalesTest #
    - .*_Temp   # except the name 'SalesTest' and names ending with '_Temp'

A given value matches the pattern filter if it is accepted and is not rejected:

Evaluate the accept list
- If the accept list is null, all values are accepted.
- If the accept list is empty, no values are accepted.
- Otherwise, the value is accepted if it matches any regular expression in the accept list.
Evaluate the reject list
- If the reject list is null, no values are rejected.
- If the reject list is empty, no values are rejected.
- Otherwise, the value is rejected if it matches any regular expression in the reject list.

Note

An empty pattern filter (accept and reject are null) matches any value (all values are accepted and no values are rejected).

Patterns are automatically anchored by adding ^ at the beginning and $ at the end (matching the start and the end of the value). These anchor characters should not be specified in the pattern. The entire value, from beginning to end, must match the pattern - nothing can come before or after the pattern.

Tooltip

To match a value that merely contains a specific pattern (e.g. TEMP) - allowing something to come before or after the pattern - the pattern should be embedded in .* (e.g. .*TEMP.*).
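
The evaluation rules and the automatic anchoring can be sketched as follows. This Python example is an illustration under assumptions: the function name matches_filter is hypothetical, Python's re module stands in for whatever regular expression engine the application uses (dialects differ in details), and only string values are handled.

```python
import re


def matches_filter(value, accept=None, reject=None):
    """Return True if value is accepted and not rejected.

    accept and reject are lists of regular expressions, or None.
    re.fullmatch requires the entire value to match, which mirrors the
    automatic anchoring with ^ and $.
    """
    def any_match(patterns):
        return any(re.fullmatch(pattern, value) for pattern in patterns)

    accepted = True if accept is None else any_match(accept)
    rejected = False if reject is None else any_match(reject)
    return accepted and not rejected


accept = ["Finance", "Sales.*"]
reject = ["SalesTest", ".*_Temp"]

print(matches_filter("SalesEurope", accept, reject))   # True
print(matches_filter("SalesTest", accept, reject))     # False (rejected)
print(matches_filter("Marketing", accept, reject))     # False (not accepted)
print(matches_filter("MY_TEMP_TABLE", ["TEMP"]))       # False (pattern is anchored)
print(matches_filter("MY_TEMP_TABLE", [".*TEMP.*"]))   # True
```

An empty accept list accepts nothing because no pattern can match, whereas a null accept list skips the check entirely and accepts every value.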

Example: Pattern filter

types:          #
  accept:       #
    - .*TABLE   # match types ending with 'TABLE'
  reject:       #
    - SYSTEM.*  #
    - .*TEMP.*  # except types starting with 'SYSTEM' or containing 'TEMP'

Note

Both the value and the pattern may be null. The value null only matches a pattern if the pattern is also null - and vice versa.

In the simplest case, a pattern filter could define an accept list with a single regular expression and no reject list. This is equivalent to matching against a single regular expression.

Example: Pattern filter

names:           #
  accept: Sales  # match the name 'Sales'

Tooltip

Notice how accept is a single-value list - containing only the single value Sales.

Single-value lists

Specific properties in service configurations may contain lists of values. Lists are typically formatted using -, even when they contain only a single value.

Example: Single-value lists

extensions:
  - parquet           # primitive property

datatypes:
  - stereotype: type  #
    restricted: true  # structured property

For convenience, a single-value list in a service configuration can be formatted as a single value, rather than as a list with a single value.

Info

The single value is automatically treated as a list behind the scenes.

Example: Single-value lists (compact format)

extensions: parquet   # primitive property

datatypes:
  stereotype: type    #
  restricted: true    # structured property

Tooltip

Notice how this applies not only to primitive properties, such as strings or numbers, but also to structured properties.
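
The automatic conversion can be pictured as a small normalization step applied to list-valued properties after the service file is parsed. The following Python sketch is an assumption about how such a step could look; the function name as_list is hypothetical. A null value is deliberately left unchanged, because null and empty lists have different meanings elsewhere (for example, in pattern filters).

```python
def as_list(value):
    """Wrap a single value in a list; return lists and None unchanged."""
    if value is None or isinstance(value, list):
        return value
    return [value]


print(as_list("parquet"))                                  # ['parquet']
print(as_list(["csv", "parquet"]))                         # ['csv', 'parquet']
print(as_list({"stereotype": "type", "restricted": True})) # [{'stereotype': 'type', 'restricted': True}]
```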
