
Connectors

In general, connectors are a category of service types that extract metadata from an external source, transform the metadata, and finally upload the transformed metadata to a destination application (dataspot).

| Connector | Description |
| --- | --- |
| DatabaseConnector | Connects to a database using JDBC, extracts metadata from the database catalog, and transforms and uploads the metadata to the destination application (dataspot) using the upload API. |
| StorageConnector | Connects to a storage system (e.g. AWS S3, Google Cloud Storage, Azure Storage), extracts metadata from files (e.g. CSV, Parquet), and transforms and uploads the metadata to the destination application (dataspot) using the upload API. |
| DatabricksConnector | Connects to a Databricks Unity Catalog instance, extracts metadata and runtime lineage information from the workspace, and transforms and uploads the metadata to the destination application (dataspot) using the upload API. |

While concrete connectors (e.g. DatabaseConnector, StorageConnector, DatabricksConnector) differ in their supported sources and ingestion options, this section describes the common architecture and configuration shared by all connectors. Refer to the documentation of each specific connector for details on the sources and ingestion options it supports.

Architecture

Connectors adhere to an architectural blueprint, implementing a workflow with similar steps, and sharing the same components.

[Figure: Connector]

Workflow

Connectors implement a workflow with similar steps, where metadata is extracted from a source, transformed, and asynchronously uploaded to a destination.

| Step | Description |
| --- | --- |
| Extract | The connector connects to the specified source and extracts metadata in accordance with the selected extraction options and filters. The connector writes the extracted metadata to the landing repository. |
| Transform | The connector reads the extracted metadata from the landing repository and transforms the metadata in accordance with the selected transformation options. The connector writes the transformed metadata to the staging repository. |
| Upload | The connector reads the transformed metadata from the staging repository and creates the payload. The connector connects to the specified destination application using the selected authentication settings and upload options. The connector sends the payload to the destination application, where an asynchronous import job is submitted to process the uploaded payload. |
| Poll Job | The connector polls the asynchronous import job and displays the job progress and statistics. When the import job has finished, the connector terminates. If the import job fails, the connector displays the error logs and exits. |

The landing and staging repositories serve as intermediate storage for the metadata entities, so that these entities do not have to be held in memory. Connectors write each entity to the repository as soon as it is extracted or transformed, which keeps their memory footprint minimal and allows processing to scale to large volumes of metadata.

Tooltip

The import job is a regular, asynchronous job in dataspot. It can be monitored in the dashboard of the user.
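The poll step described above can be pictured as a simple loop. The following is a minimal sketch, not the connector's actual implementation: the state names (`QUEUED`, `RUNNING`, `FINISHED`, `FAILED`), the `get_status` callback, and the `max_polls` limit are all assumptions made for illustration.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real job states are not listed in this document.
TERMINAL_STATES = {"FINISHED", "FAILED"}


def poll_job(get_status: Callable[[], str],
             interval_seconds: float = 0.0,
             max_polls: int = 100) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        status = get_status()
        print(f"import job status: {status}")
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("import job did not reach a terminal state")


# A fake status source stands in for the destination application.
fake_statuses = iter(["QUEUED", "RUNNING", "RUNNING", "FINISHED"])
final_state = poll_job(lambda: next(fake_statuses))
```

In a real setup, `get_status` would query the destination application, and a `FAILED` result would lead to displaying the error logs before exiting.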

Repositories

A connector typically uses the working database to store entities during processing:

| Repository | Description |
| --- | --- |
| Landing | Stores the metadata extracted from the source. |
| Staging | Stores the metadata transformed from the landing repository. |

Connectors can stream large-scale metadata without ever materializing the full volume in memory, by using a landing repository to store raw extracts, and a staging repository to hold transformed entities before the final upload.
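To illustrate the streaming idea, here is a minimal sketch in which each repository is a temporary file with one JSON document per line. This is an assumption for illustration only: the connectors use a working database, and the `Repository` class, the `table_name` field, and the mapping in `transform` are hypothetical.

```python
import json
import tempfile
from typing import Iterable, Iterator


class Repository:
    """Append-only store that keeps entities on disk, one JSON document per line."""

    def __init__(self) -> None:
        self._file = tempfile.TemporaryFile(mode="w+", encoding="utf-8")

    def write(self, entity: dict) -> None:
        self._file.write(json.dumps(entity) + "\n")

    def read(self) -> Iterator[dict]:
        self._file.seek(0)  # seek flushes pending writes and rewinds
        for line in self._file:
            yield json.loads(line)

    def close(self) -> None:
        self._file.close()  # the temporary file is deleted on close


def extract(source: Iterable[dict], landing: Repository) -> None:
    for raw in source:  # each entity is written as soon as it is extracted
        landing.write(raw)


def transform(landing: Repository, staging: Repository) -> None:
    for raw in landing.read():  # one entity at a time, never the full volume
        staging.write({"label": raw["table_name"].upper(), "kind": "Table"})


landing, staging = Repository(), Repository()
try:
    extract(({"table_name": f"t{i}"} for i in range(3)), landing)
    transform(landing, staging)
    payload = list(staging.read())  # materialized only because the demo is tiny
finally:
    # Mirrors the cleanup on termination, whether processing succeeded or not.
    landing.close()
    staging.close()
```

Because both `extract` and `transform` handle one entity at a time, memory use stays flat no matter how many entities pass through.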

Note

The metadata in the landing and staging repositories is automatically deleted when the connector terminates, regardless of whether the service completed successfully or not.

Dump and restore

Connectors can write dump files during processing or resume from previous dumps.

If the property utilities.dump is enabled, the extraction, transformation, and upload steps automatically write dump files of the landing repository, the staging repository, and the payload.

Tooltip

The dump directory is determined by connector.config.dump.directory and is created automatically if it doesn't exist.

| Step | Dump | Type | Format |
| --- | --- | --- | --- |
| Extract | Landing repository | landing | JSON (.json) |
| Transform | Staging repository | staging | JSON (.json) |
| Upload | Payload | payload | JSON + gzip (.json.gz) |

Tooltip

The dump filename is determined by connector.config.dump.template where the placeholder ${dump} is automatically replaced by the corresponding dump type (landing, staging, or payload).
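The placeholder substitution can be sketched as follows. The template value `MyService-${dump}` is a made-up example, and appending the file extension after the substitution is an assumption; the document does not state whether the extension is part of the template.

```python
from string import Template

# File extensions per dump type, taken from the table of dump formats.
EXTENSIONS = {"landing": ".json", "staging": ".json", "payload": ".json.gz"}


def dump_filename(template: str, dump_type: str) -> str:
    """Replace ${dump} in the template and append the extension (assumed behavior)."""
    if dump_type not in EXTENSIONS:
        raise ValueError(f"unknown dump type: {dump_type}")
    # string.Template uses the same ${name} placeholder syntax.
    return Template(template).substitute(dump=dump_type) + EXTENSIONS[dump_type]


names = [dump_filename("MyService-${dump}", t)
         for t in ("landing", "staging", "payload")]
```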

If the property utilities.restore.landing is defined, the connector restores the landing repository from the specified landing dump file, rather than extracting metadata from the source. In this case, the extraction step is skipped altogether and processing resumes with transforming the landing repository that was restored from the dump file.

If the property utilities.restore.staging is defined, the connector restores the staging repository from the specified staging dump file, rather than extracting metadata from the source and transforming it. In this case, the extraction and transformation steps are skipped altogether and processing resumes with uploading the staging repository that was restored from the dump file.

Note

For troubleshooting and traceability, the dumps of the landing and staging repositories also include the relevant ingestion options from the configuration. When the landing or staging repository is restored from a dump file, the relevant ingestion options are also restored and processing resumes with these options.
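The effect of the two restore properties on the workflow can be summarized in a small decision function. This is a sketch only: the step names are informal labels, and the precedence of a staging restore when both properties are set is an assumption that the document does not confirm.

```python
from typing import Optional


def planned_steps(restore_landing: Optional[str] = None,
                  restore_staging: Optional[str] = None) -> list[str]:
    """Return the steps that still run for the given restore settings."""
    if restore_staging is not None:
        # Assumption: a staging restore takes precedence if both are set.
        return ["restore staging", "upload", "poll job"]
    if restore_landing is not None:
        return ["restore landing", "transform", "upload", "poll job"]
    return ["extract", "transform", "upload", "poll job"]


full_run = planned_steps()
from_landing = planned_steps(restore_landing="dumps/landing.json")
from_staging = planned_steps(restore_staging="dumps/staging.json")
```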

Configuration

A connector is configured by defining its unique name, the service type, and the configuration.

Tooltip

While YAML itself doesn't enforce any naming style for property names, multi-word properties (for example, access key) are typically specified in lowercase separated by hyphens (for example, access-key). This naming style - commonly referred to as kebab-case - is used in the following descriptions and examples. However, all multi-word properties can also be specified in camelCase (for example, accessKey).
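The relation between the two naming styles is mechanical, as the following sketch shows. The helper `kebab_to_camel` is hypothetical and only illustrates the mapping; how the connector itself resolves property names is not described here.

```python
def kebab_to_camel(name: str) -> str:
    """Convert a kebab-case property name to its camelCase equivalent."""
    head, *tail = name.split("-")
    return head + "".join(part.capitalize() for part in tail)


converted = [kebab_to_camel(n) for n in ("access-key", "on-insert", "provider-url", "url")]
```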

In addition to the general service configuration, connectors have the following configuration in common. Specific connectors have additional configurations.

Tooltip

Properties marked with * are required for the connector to run.

Upload

Connectors upload metadata by specifying the destination application (e.g. the URL, the database and the scheme) as well as the upload options and the authentication settings.

🔑 Property upload.url *

The URL of the destination application.

required

The connector uploads the metadata to the specified URL.

Example: Property upload.url

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
```

Tooltip

If the destination application has a context path (e.g. dataspot), the URL must also include the context path (e.g. https://my.server.com/dataspot).

🔑 Property upload.database *

The database of the destination application.

required

The connector uploads the metadata to the specified database.

Example: Property upload.database

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
```

🔑 Property upload.tenant

The tenant in the database.

optional

The default is null (determine the tenant automatically).

If a tenant is defined, the connector uploads the metadata to the specified tenant. Otherwise, the tenant is automatically determined based on the credentials (e.g. login ID) or on the access key.

Note

If the user does not have access to the tenant, or the tenant does not exist, the request is rejected.

Example: Property upload.tenant

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      tenant: mytenant
```

Tooltip

If an access key is specified, the tenant is determined only by the access key. If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.
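The tenant rules above can be expressed as a short decision function. This is a sketch of the documented behavior from the caller's point of view; the function name, the `RequestRejected` exception, and the parameters are hypothetical, and the actual check happens in the destination application.

```python
from typing import Optional


class RequestRejected(Exception):
    """Stands in for the destination application rejecting the upload request."""


def resolve_tenant(configured: Optional[str],
                   access_key_tenant: Optional[str],
                   credentials_tenant: Optional[str]) -> Optional[str]:
    if access_key_tenant is not None:
        # With an access key, only the key decides; a conflicting upload.tenant is rejected.
        if configured is not None and configured != access_key_tenant:
            raise RequestRejected("upload.tenant differs from the tenant of the access key")
        return access_key_tenant
    # Without an access key, an explicit tenant is used; otherwise the credentials decide.
    return configured if configured is not None else credentials_tenant


by_key = resolve_tenant(None, "tenant-a", "tenant-b")
by_config = resolve_tenant("mytenant", None, "tenant-b")
by_credentials = resolve_tenant(None, None, "tenant-b")
```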

🔑 Property upload.access-key

The access key to identify a user in the destination application.

Tooltip

An access key is a secret character sequence that is created, assigned to a user, and managed in the destination application. An access key replaces the login with credentials (e.g. login ID and password).

optional

The default is null (no access key).

If an access key is defined, the access key is used to identify the user and tenant. Otherwise, the user in the destination application is determined based on the credentials (e.g. login ID).

Example: Property upload.access-key

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      access-key: ${dataspot.access-key}
```

Tooltip

If an access key is specified, the tenant is determined only by the access key. If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.

🔑 Property upload.scheme *

The scheme in the tenant.

required

The connector uploads the metadata to the specified scheme.

Attention

The scheme must conform to the uploaded metadata. For example, data objects require the scheme to be a data model.

Example: Property upload.scheme

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
```

Note

The scheme must already exist in the specified database - it will not be created automatically.

🔑 Property upload.collection

The collection in the specified scheme.

optional

The default is null (no collection).

If a collection is defined, the connector uploads the metadata to the specified collection. Otherwise, the metadata is uploaded to the root of the specified scheme.

Example: Property upload.collection

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      collection: My Collection
```

Note

The collection must already exist in the specified scheme - it will not be created automatically.

Options

Reconciliation

Connectors can specify how uploaded metadata is reconciled with existing metadata.

🔑 Property upload.options.agent-id

The identification of the upload agent.

Tooltip

An agent identification is an arbitrary string that identifies the upload agent and allows the origin of metadata to be traced back to the service that uploaded it.

optional

The default is null (no agent identification).

If an agent identification is defined, the connector "takes ownership" of the uploaded metadata by storing the agent identification alongside the metadata. Otherwise, the connector does not modify the ownership of the uploaded metadata.

Note

When determining obsolete metadata, only metadata with the specified agent identification is taken into account. Metadata that already exists in the scheme, but is not uploaded by the connector, is only considered obsolete if it was previously uploaded by the same agent.

Example: Property upload.options.agent-id

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        agent-id: ${dataspot.agent-id}
```

Tooltip

The agent identification allows metadata to be uploaded (e.g. into the same scheme or collection) from different sources by different agents without deleting or deactivating each other's metadata. This prevents a service from accidentally deleting or deactivating metadata that was uploaded by a different agent or that was created manually in the user interface.

🔑 Property upload.options.operation

The reconciliation mode of the upload.

optional

The default is ADD.

The reconciliation mode defines whether metadata that already exists in the scheme, but is not uploaded by the connector, is considered obsolete.

Note

Metadata that was previously uploaded by another agent is not considered obsolete.

| operation | Description |
| --- | --- |
| ADD | Existing metadata that was not uploaded is not modified. |
| REPLACE | Existing metadata that was not uploaded is considered obsolete if its superordinate object was uploaded. |
| FULL_LOAD | Existing metadata that was not uploaded is considered obsolete. |

Attention

The property upload.options.on-delete defines whether obsolete metadata is actually deleted or is instead set to a specific workflow status.
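The interplay of the reconciliation mode and the agent identification can be sketched as follows. This is an illustrative model, not the import job's implementation: the `Entity` class is hypothetical, "superordinate object" is interpreted as the direct parent, and the case without a configured agent identification is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    key: str
    parent: Optional[str]  # key of the superordinate object, None at the root
    agent: Optional[str]   # agent-id stored with the entity, None if created manually


def obsolete_keys(existing: list[Entity], uploaded: set[str],
                  operation: str, agent_id: str) -> set[str]:
    """Return the keys of existing entities that the upload considers obsolete."""
    if operation not in ("ADD", "REPLACE", "FULL_LOAD"):
        raise ValueError(f"unknown operation: {operation}")
    if operation == "ADD":
        return set()
    result = set()
    for entity in existing:
        if entity.key in uploaded or entity.agent != agent_id:
            continue  # uploaded again, owned by another agent, or created manually
        if operation == "FULL_LOAD" or entity.parent in uploaded:
            result.add(entity.key)  # REPLACE requires the parent to be uploaded
    return result


existing = [
    Entity("db1", None, "a"), Entity("db1.t1", "db1", "a"),
    Entity("db1.t2", "db1", "a"), Entity("db1.t3", "db1", "b"),
    Entity("db2", None, "a"), Entity("db2.t1", "db2", "a"),
]
uploaded = {"db1", "db1.t1"}
replace_result = obsolete_keys(existing, uploaded, "REPLACE", "a")
full_load_result = obsolete_keys(existing, uploaded, "FULL_LOAD", "a")
```

In this example, `db1.t3` is never obsolete for agent `a` because it belongs to agent `b`, and `db2` with its table only becomes obsolete under `FULL_LOAD` because `db2` itself was not uploaded.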

Example: Property upload.options.operation

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
```

Workflow

Connectors can specify the workflow statuses of inserted, updated, or deleted metadata.

Tooltip

By defining a corresponding workflow in the destination application, the modifications performed by the upload can, for example, trigger notifications or be integrated in an approval process.

🔑 Property upload.options.on-insert

The new status for inserted metadata.

optional

The default is null (no insert status).

If an insert status is defined, inserted metadata is set to the specified workflow status. Otherwise, inserted metadata is automatically set to the initial status of the scheme's workflow.

Example: Property upload.options.on-insert

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-insert: WORKING
```

🔑 Property upload.options.on-update

The new status for updated metadata.

optional

The default is null (no update status).

If an update status is defined, updated metadata is set to the specified workflow status. Otherwise, the status of updated metadata is not modified.

Example: Property upload.options.on-update

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-update: SUBMITTED
```

🔑 Property upload.options.on-delete

The new status for deleted metadata.

optional

The default is null (no delete status).

If a delete status is defined, obsolete metadata is set to the specified workflow status. Otherwise, obsolete metadata is actually deleted.
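The three status properties follow the same pattern, which the sketch below condenses into one function. The function and its parameters are hypothetical, and `DRAFT` is a made-up initial status; the statuses available in practice depend on the workflow of the scheme.

```python
from typing import Optional


def resulting_status(change: str, options: dict,
                     current: Optional[str], initial: str) -> Optional[str]:
    """Return the workflow status after the change, or None if the object is deleted."""
    if change == "insert":
        return options.get("on-insert") or initial   # fall back to the initial status
    if change == "update":
        return options.get("on-update") or current   # fall back to the unchanged status
    if change == "delete":
        return options.get("on-delete")              # None: obsolete metadata is deleted
    raise ValueError(f"unknown change: {change}")


options = {"on-insert": "WORKING", "on-delete": "INACTIVE"}
inserted = resulting_status("insert", options, None, "DRAFT")
updated = resulting_status("update", options, "SUBMITTED", "DRAFT")
deleted = resulting_status("delete", options, "SUBMITTED", "DRAFT")
```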

Example: Property upload.options.on-delete

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
        on-delete: INACTIVE
```

Dry run

🔑 Property upload.options.dry-run

The flag that specifies if the upload is performed as a dry run.

optional

The default is false (disabled).

If the flag is true, the upload is performed as a dry run - without changing any data. Otherwise, the upload is actually performed.

Tooltip

The dry run can be used to test or check the connector, by querying the job statistics and logs to see which metadata would actually be changed or which errors and warnings would occur.

Example: Property upload.options.dry-run

```yaml
services:
  MyService:
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        dry-run: true
```

Tooltip

For convenience, the dry run can also be enabled using the built-in placeholder ${upload.options.dry-run}, without ever specifying upload.options.dry-run in the service file. For example, as a system property:

```shell
java -Dupload.options.dry-run=true
```

Headers

Connectors can specify additional HTTP header fields.

🔑 Property upload.headers

The additional HTTP header fields of the upload request, specified as a map of fields.

Note

The map key is the field name. The map value is the list of field values (or a single field value).

optional

The default is null (no additional headers).

If additional headers are defined, the connector sets the corresponding HTTP header fields of the upload request.

Tooltip

Additional headers can be used to modify the upload request, by setting HTTP header fields that are not natively supported by the connector configuration.

Example: Property upload.headers

```yaml
services:
  MyService:
    upload:
      headers:
        Accept-Language: en
        X-Data:
          - ${data.first}
          - ${data.second}
```

Note

Notice how Accept-Language is a single-value list - containing only the single value en - and can be formatted as a single value, rather than as a list with a single value.
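The equivalence of a single value and a single-value list can be sketched as a normalization step. The helper below is hypothetical and only shows one way to treat both forms uniformly; it is not taken from the connector.

```python
from typing import Union


def normalize_headers(headers: dict[str, Union[str, list[str]]]) -> dict[str, list[str]]:
    """Turn every header value into a list, wrapping single values."""
    return {
        name: [value] if isinstance(value, str) else list(value)
        for name, value in headers.items()
    }


normalized = normalize_headers({
    "Accept-Language": "en",       # single value
    "X-Data": ["first", "second"]  # list of values
})
```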

Authentication

Connectors can specify the authentication settings for uploading to the destination application.

🔑 Property upload.authentication

The authentication settings for the upload.

optional

The default is null (no authentication).

If an authentication is defined, the connector uploads to the destination application with the specified authentication. Otherwise, the connector uploads without authentication.

🔑 Property upload.authentication.method

The authentication method.

required

The property is required if upload.authentication is specified.

Connectors support the following authentication methods:

| Authentication method | method |
| --- | --- |
| Username and password | password |
| Bearer token | token |
| OAuth 2.0 | oauth |

Example: Property upload.authentication.method

```yaml
services:
  MyService:
    upload:
      authentication:
        method: password
```

Username and password

Connectors can use basic authentication with username and password for uploading to the destination application.
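For orientation, HTTP basic authentication sends the username and password as a Base64-encoded `Authorization` header. The sketch below shows how such a header value is formed in general; the function name is hypothetical, and the values `user` and `pass` are placeholders.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header = basic_authorization("user", "pass")
```

Because Base64 is an encoding and not encryption, basic authentication should only be used over HTTPS.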

🔑 Property upload.authentication.username

The username.

required

The property can only be specified and is required if upload.authentication.method is password.

The connector uses the specified username and password for authentication.

Note

The password is specified in upload.authentication.password.

Example: Property upload.authentication.username

```yaml
services:
  MyService:
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
```

🔑 Property upload.authentication.password

The password.

required

The property can only be specified and is required if upload.authentication.method is password.

The connector uses the specified username and password for authentication.

Note

The username is specified in upload.authentication.username.

Example: Property upload.authentication.password

```yaml
services:
  MyService:
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
```

Bearer token

Connectors can use authentication with a bearer token for uploading to the destination application.

🔑 Property upload.authentication.token

The bearer token.

required

The property can only be specified and is required if upload.authentication.method is token.

The connector uses the specified bearer token for authentication.

Example: Property upload.authentication.token

```yaml
services:
  MyService:
    upload:
      authentication:
        method: token
        token: ${dataspot.token}
```

OAuth 2.0

Connectors can use OAuth 2.0 authentication for uploading to the destination application. The application supports non-interactive (machine to machine) grants to obtain an access token as a client application or to obtain an ID token as an end-user.

Note

If the destination application is hosted in the cloud (e.g. as Software-as-a-Service in Microsoft Azure), the server can be protected by an authentication module (e.g. Azure Easy Auth). When uploading to the destination application, a valid access or ID token must be provided so that the request can pass the authentication module.

🔑 Property upload.authentication.provider-url

The URL of the identity provider that supports OAuth 2.0.

required

The property can only be specified and is required if upload.authentication.method is oauth.

The connector uses the provider URL and the client ID to contact the identity provider.

Note

The client ID is specified in upload.authentication.client-id.

Example: Property upload.authentication.provider-url

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
```

🔑 Property upload.authentication.client-id

The OAuth 2.0 client ID.

required

The property can only be specified and is required if upload.authentication.method is oauth.

The connector uses the provider URL and the client ID to contact the identity provider.

Note

The provider URL is specified in upload.authentication.provider-url.

Example: Property upload.authentication.client-id

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
```

🔑 Property upload.authentication.credentials.type

The credentials type.

required

The property can only be specified and is required if upload.authentication.method is oauth.

Connectors support the following credentials types to obtain an access or ID token:

| Credentials type | type |
| --- | --- |
| Client credentials with certificate | client-certificate |
| Client credentials with client secret | client-secret |
| Resource owner password credentials | password |

Example: Property upload.authentication.credentials.type

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
```

Client credentials with certificate

The OAuth 2.0 client credentials grant with a certificate (and optional password) is a non-interactive (machine to machine) authentication. The connector authenticates as a client application, rather than as an end-user, to obtain an access token. The access token does not contain any end-user information.

Note

Because the access token doesn't contain any end-user information, an access key should also be provided to identify the user in the application.

🔑 Property upload.authentication.credentials.file

The absolute or relative path to the certificate file.

required

The property can only be specified and is required if upload.authentication.credentials.type is client-certificate.

The connector uses the specified certificate and password to obtain an access token from the identity provider.

Note

The password is specified in upload.authentication.credentials.password.

Example: Property upload.authentication.credentials.file

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
```

🔑 Property upload.authentication.credentials.password

The certificate password.

optional

The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is null (no password).

The connector uses the specified certificate and password to obtain an access token from the identity provider.

Note

The certificate is specified in upload.authentication.credentials.file.

Example: Property upload.authentication.credentials.password

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          password: ${dataspot.certificate.password}
```

🔑 Property upload.authentication.credentials.scope

The requested scope.

optional

The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is api:// + client ID + /.default.

The connector can request access to a specific scope, such as an exposed API.
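The default scope is composed from the client ID, which the following sketch makes explicit. The function name is hypothetical; only the composition rule `api://` + client ID + `/.default` comes from the description above.

```python
from typing import Optional


def effective_scope(client_id: str, scope: Optional[str] = None) -> str:
    """Return the configured scope, or the default derived from the client ID."""
    return scope if scope is not None else f"api://{client_id}/.default"


default_scope = effective_scope("6731de76-14a6-49ae-97bc-6eba6914391e")
custom_scope = effective_scope(
    "6731de76-14a6-49ae-97bc-6eba6914391e",
    "api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default",
)
```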

Example: Property upload.authentication.credentials.scope

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
```

Client credentials with client secret

The OAuth 2.0 client credentials grant with a client secret is a non-interactive (machine to machine) authentication. The connector authenticates as a client application, rather than as an end-user, to obtain an access token. The access token does not contain any end-user information.

Note

Because the access token doesn't contain any end-user information, an access key should also be provided to identify the user in the application.

🔑 Property upload.authentication.credentials.client-secret

The client secret.

required

The property can only be specified and is required if upload.authentication.credentials.type is client-secret.

The connector uses the specified client secret to obtain an access token from the identity provider.
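For background, an OAuth 2.0 client credentials request is a form-encoded POST to the token endpoint of the identity provider. The sketch below builds only the request body with standard OAuth 2.0 parameters; it is not the connector's code, the function name is hypothetical, and sending the client secret in the body rather than in an `Authorization` header is one of the options the standard allows.

```python
from urllib.parse import parse_qs, urlencode


def client_credentials_form(client_id: str, client_secret: str, scope: str) -> str:
    """Build the form-encoded body of an OAuth 2.0 client credentials token request."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })


body = client_credentials_form(
    "6731de76-14a6-49ae-97bc-6eba6914391e",
    "example-secret",  # placeholder value; never hard-code a real secret
    "api://6731de76-14a6-49ae-97bc-6eba6914391e/.default",
)
fields = parse_qs(body)  # decode again to inspect the individual fields
```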

Example: Property upload.authentication.credentials.client-secret

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
```

🔑 Property upload.authentication.credentials.scope

The requested scope.

optional

The property can only be specified if upload.authentication.credentials.type is client-secret.
The default is api:// + client ID + /.default.

The connector can request access to a specific scope, such as an exposed API.

Example: Property upload.authentication.credentials.scope

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
```

Resource owner password credentials

The OAuth 2.0 resource owner password credentials (ROPC) grant with a username and password is a non-interactive (machine to machine) authentication. The connector authenticates as an end-user and uses the OpenID Connect (OIDC) authentication layer to obtain an ID token. The ID token contains end-user information.

Attention

If the identity provider requires multi-factor authentication (MFA), and therefore user interaction, resource owner password credentials (ROPC) are not a suitable machine to machine authentication method. In that case, a non-interactive authentication must be used instead (e.g. client credentials with certificate or client credentials with client secret).

🔑 Property upload.authentication.credentials.username

The username.

Note

Typically, the username is an e-mail address.

required

The property can only be specified and is required if upload.authentication.credentials.type is password.

The connector uses the specified username and password to obtain an ID token from the identity provider.

Note

The password is specified in upload.authentication.credentials.password.

Example: Property upload.authentication.credentials.username

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
```

🔑 Property upload.authentication.credentials.password

The password.

required

The property can only be specified and is required if upload.authentication.credentials.type is password.

The connector uses the specified username and password to obtain an ID token from the identity provider.

Note

The username is specified in upload.authentication.credentials.username.

Example: Property upload.authentication.credentials.password

```yaml
services:
  MyService:
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
```

Dump

Connectors can write or restore dump files during processing.

Tooltip

Dump files can be used to manually inspect the extracted, transformed, or uploaded metadata - but also to restore a repository and resume processing from a specific step.

🔑 Property utilities.dump

The flag that enables dumps.

optional

The default is false (disabled).

If the flag is true, dump files of the landing repository, the staging repository, and the payload are automatically written during processing. The dump files are written as JSON; the payload dump is additionally compressed with gzip.

Tooltip

The dump directory and the dump filename can be specified in the application configuration.

Example: Property utilities.dump

```yaml
services:
  MyService:
    utilities:
      dump: true
```

Tooltip

For convenience, dumps can also be enabled using the built-in placeholder ${utilities.dump}, without ever specifying utilities.dump in the service file. For example, as a system property:

```shell
java -Dutilities.dump=true
```

🔑 Property utilities.restore.landing

The absolute or relative path to the dump file to restore the landing repository.

optional

The default is null (don't restore).

If a dump file is defined, the connector restores the landing repository and skips the extraction step.

Example: Property utilities.restore.landing

```yaml
services:
  MyService:
    utilities:
      restore:
        landing: /home/connector/dumps/DatabaseConnector-MyService-landing-1.json
```

Tooltip

For convenience, the landing repository can also be restored using the built-in placeholder ${utilities.restore.landing}, without ever specifying utilities.restore.landing in the service file. For example, as a system property:

```shell
java -Dutilities.restore.landing=/home/connector/dumps/DatabaseConnector-MyService-landing-1.json
```

🔑 Property utilities.restore.staging

The absolute or relative path to the dump file to restore the staging repository.

optional

The default is null (don't restore).

If a dump file is defined, the connector restores the staging repository and skips the extraction and transformation steps.

Example: Property utilities.restore.staging

```yaml
services:
  MyService:
    utilities:
      restore:
        staging: /home/connector/dumps/DatabaseConnector-MyService-staging-1.json
```

Tooltip

For convenience, the staging repository can also be restored using the built-in placeholder ${utilities.restore.staging}, without ever specifying utilities.restore.staging in the service file. For example, as a system property:

```shell
java -Dutilities.restore.staging=/home/connector/dumps/DatabaseConnector-MyService-staging-1.json
```