Examples
You're welcome to copy the following examples and adapt them to your needs.
Application
The application configuration contains static application settings, such as the locations of application directories and resources, as well as global execution options.
▶ Example: application.yaml
connector:
config:
database:
# path to the working database file
file: /home/connector/db
dump:
# path to the dump directory
directory: ./temp
# dump filename template, store the dumps in separate directories for each service type
template: "${type}/${name}-${dump}-${date}T${time}"
service:
# path to the default service file
file: /home/connector/services.yaml
# path to the default service directory
directory: /home/connector/services
# scan the default service file and directory every 60 seconds
monitor:
interval: 60
placeholders:
# paths to additional properties files containing placeholders
files:
- ./resources/oauth.properties
- ./resources/secrets.properties
execution:
thread:
# number of threads used for parallel processing
count: 32
# maximum execution time for parallel processing in milliseconds (60 seconds)
timeout: 60000
# additional jdbc drivers
jars:
- type: jdbc
file: /home/connector/mssql-jdbc-12.10.0.jre11.jar
- type: jdbc
file: /home/connector/postgresql-42.7.5.jar
▶ Example: application.properties
# path to the working database file
connector.config.database.file=/home/connector/db
# path to the dump directory
connector.config.dump.directory=./temp
# dump filename template, store the dumps in separate directories for each service type
connector.config.dump.template=${type}/${name}-${dump}-${date}T${time}
# path to the default service file
connector.config.service.file=/home/connector/services.yaml
# path to the default service directory
connector.config.service.directory=/home/connector/services
# scan the default service file and directory every 60 seconds
connector.config.service.monitor.interval=60
# paths to additional properties files containing placeholders
connector.config.placeholders.files=./resources/oauth.properties,./resources/secrets.properties
# number of threads used for parallel processing
connector.config.execution.thread.count=32
# maximum execution time for parallel processing in milliseconds (60 seconds)
connector.config.execution.thread.timeout=60000
# additional jdbc drivers
connector.config.jars[0].type=jdbc
connector.config.jars[0].file=/home/connector/mssql-jdbc-12.10.0.jre11.jar
connector.config.jars[1].type=jdbc
connector.config.jars[1].file=/home/connector/postgresql-42.7.5.jar
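The files listed under placeholders are ordinary properties files; their keys can be referenced as ${...} placeholders in the service definitions below. A minimal sketch of ./resources/secrets.properties with illustrative keys and dummy values (your keys and values will differ):
# illustrative placeholder values, referenced as ${database.username}, ${dataspot.access-key}, etc.
database.username=connector_user
database.password=changeit
dataspot.access-key=0123456789abcdef
dataspot.oauth.provider-url=https://login.example.com/oauth2/v2.0/token
dataspot.oauth.client-id=my-client-id
dataspot.oauth.client-secret=changeit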
Services
DatabaseConnector
▶ Example: DatabaseConnector - Microsoft SQL Server
The following service connects to a Microsoft SQL Server database using basic authentication, extracts metadata from specific schemas, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to microsoft sql server database 'Sales'
url: jdbc:sqlserver://www.myserver.com:1433;DatabaseName=Sales;trustServerCertificate=true
# authenticate with username and password
authentication:
method: password
username: ${database.username}
password: ${database.password}
extensions:
# microsoft sql server extension to extract comments from extended property 'MS_Description'
Microsoft SQL Server:
comment: MS_Description
# this section specifies the extraction and transformation options
ingestion:
# schema filter
schemas:
# extract only these schemas
names:
accept:
- Finance
- Product
- KPI
# transform into collections with stereotype 'schema'
stereotype: schema
# table filter
tables:
# extract only these table types
types:
accept:
- TABLE
- VIEW
# transform into data objects with stereotype 'table'
stereotype: table
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
# we're authenticating with client credentials (see below)
# therefore we need an access key to identify the service user
access-key: ${dataspot.access-key}
options:
# reconciliation mode
operation: REPLACE
# agent identification of this connector
agent-id: sales-connector
# authenticate with OAuth 2.0, using a client certificate (no password)
authentication:
method: oauth
provider-url: ${dataspot.oauth.provider-url}
client-id: ${dataspot.oauth.client-id}
credentials:
type: client-certificate
file: /home/connector/dataspot.pfx
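The client-certificate credential above references a .pfx file, which is typically a PKCS#12 keystore. If your certificate and private key are delivered as separate PEM files, one way to bundle them is with openssl (file names are illustrative):
openssl pkcs12 -export -in client-cert.pem -inkey client-key.pem -out /home/connector/dataspot.pfx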
Connecting to Microsoft SQL Server requires a suitable JDBC driver. Be sure to download the driver and specify the additional JAR in the application configuration:
connector:
config:
jars:
- type: jdbc
file: /home/connector/mssql-jdbc-12.10.0.jre11.jar
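The driver is available from Maven Central, among other sources; for example (adjust the version to the one you actually use):
curl -L -o /home/connector/mssql-jdbc-12.10.0.jre11.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/12.10.0.jre11/mssql-jdbc-12.10.0.jre11.jar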
▶ Example: DatabaseConnector - PostgreSQL
The following service connects to a PostgreSQL database, authenticating with username and password and passing a server/client certificate pair via additional driver properties.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to postgres database 'sales'
url: jdbc:postgresql://www.myserver.com:5432/sales
# authenticate with username and password...
authentication:
method: password
username: ${database.username}
password: ${database.password}
# ...but also pass the database-specific server/client certificate pair
properties:
sslmode: verify-full
sslrootcert: /home/connector/certs/server-ca.pem
sslcert: /home/connector/certs/client-cert.pem
sslkey: /home/connector/certs/client-key.pk8
# ingestion and upload options go here :)
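Note that the PostgreSQL JDBC driver generally expects the client key (sslkey) in PKCS#8 DER format, which is why the example references a .pk8 file. A PEM-encoded key can be converted with openssl, for example:
openssl pkcs8 -topk8 -inform PEM -outform DER -in client-key.pem -out client-key.pk8 -nocrypt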
Connecting to PostgreSQL requires a suitable JDBC driver. Be sure to download the driver and specify the additional JAR in the application configuration:
connector:
config:
jars:
- type: jdbc
file: /home/connector/postgresql-42.7.5.jar
▶ Example: DatabaseConnector - Google BigQuery
The following service connects to Google BigQuery using additional properties to authenticate with a keyfile.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to google big query
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0
# the authentication properties could also be specified in the connection url
properties:
ProjectId: ${bigquery.projectId}
OAuthServiceAcctEmail: ${bigquery.service.account.email}
OAuthPvtKeyPath: /home/connector/bigquery-keyfile.json
ingestion:
# extract all projects and datasets (the JDBC driver maps projects to catalogs and datasets to schemas)
catalogs:
names:
accept: .*
Connecting to Google BigQuery requires a suitable JDBC driver. Be sure to download the driver and specify the additional JARs in the application configuration:
connector:
config:
jars:
- type: jdbc
directory: /home/connector/google-big-query # directory containing JDBC driver and 3rd party JARs
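The ${bigquery.projectId} and ${bigquery.service.account.email} placeholders can be resolved from one of the configured placeholders files; illustrative entries (values are examples only):
bigquery.projectId=my-gcp-project
bigquery.service.account.email=connector@my-gcp-project.iam.gserviceaccount.com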
DatabricksConnector
▶ Example: DatabricksConnector
The following service connects to a Databricks Unity Catalog instance, authenticates with a personal access token, extracts metadata from a workspace, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: DatabricksConnector
# this section specifies the databricks unity catalog instance
source:
url: https://dbc-00182f59-66ea.cloud.databricks.com
# the warehouse id is required to extract runtime data lineage
warehouse-id: b079313fa6222088
# authenticate with personal access token
authentication:
method: token
token: ${databricks.pat}
# this section specifies the extraction and transformation options
ingestion:
# catalog filter
catalogs:
# extract only this catalog
names:
accept: my_workspace
# schema filter
schemas:
# extract these schemas, transform with different stereotypes
- names:
accept: gold
stereotype: gold
- names:
accept: silver
stereotype: silver
- names:
accept: bronze
stereotype: bronze
# extract runtime data lineage
lineages:
# extract runtime lineage since cutoff date
options:
cutoff-date: 2025-05-21
streams:
# extract column lineage
- type: column
filter:
# extract lineage created by a specific entity (e.g. a pipeline)
entity:
id: 939430187383108
# create transformations and rules
transformation:
label: Pipeline
stereotype: pipeline
rule:
stereotype: mapping
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
collection: Databricks
# we're authenticating with client credentials (see below)
# therefore we need an access key to identify the service user
access-key: ${dataspot.access-key}
options:
# reconciliation mode
operation: REPLACE
# agent identification of this connector
agent-id: databricks-connector
# authenticate with OAuth 2.0, using a client secret
authentication:
method: oauth
provider-url: ${dataspot.oauth.provider-url}
client-id: ${dataspot.oauth.client-id}
credentials:
type: client-secret
client-secret: ${dataspot.oauth.client-secret}
StorageConnector
▶ Example: StorageConnector
The following service connects to an AWS S3 bucket, authenticates with an access key and secret key, extracts metadata from specific file types, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: StorageConnector
# this section specifies the source storage system
source:
# connect to aws s3 bucket
url: s3a://my-s3-bucket
# authenticate with access key and secret key
authentication:
method: secret-key
access-key: ${aws.s3.access-key}
secret-key: ${aws.s3.secret-key}
# use a specific aws region
properties:
fs.s3a.endpoint.region: eu-central-1
# this section specifies the extraction and transformation options
ingestion:
options:
# process all objects added since the last run
read: since-last-run
directories:
# extract data objects from these paths
paths:
- sales/annual
- finance/*
files:
# partitioned parquet files
- type: parquet
partition:
delimiter: "="
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
collection: AWS
options:
# reconciliation mode and workflow statuses
operation: REPLACE
on-insert: WORKING
on-update: SUBMITTED
on-delete: INACTIVE
# agent identification of this connector
agent-id: s3-connector
# authenticate with username and password
authentication:
method: password
username: ${dataspot.basic.username}
password: ${dataspot.basic.password}
FileUpload
▶ Example: FileUpload
The following service uploads an existing payload file to a destination application.
services:
MyService:
type: FileUpload
source:
# use a placeholder to get the file path from an external source
file: ${payload.file}
upload:
url: https://my.dataspot.io
database: mydatabase
scheme: MyModel
# authenticate with username and password
authentication:
method: password
username: ${dataspot.basic.username}
password: ${dataspot.basic.password}
Start the CLI application to upload a payload file, for example /home/connector/upload.json.gz:
java -Dpayload.file=/home/connector/upload.json.gz -jar dataspot-connector.jar --service=MyService
ApplicationReorg
▶ Example: ApplicationReorg
The following service deletes jobs that are older than two weeks.
services:
MyService:
type: ApplicationReorg
job:
retention: 2w # delete jobs older than 2 weeks
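Like the other services, a reorg can be started ad hoc from the CLI, for example:
java -jar dataspot-connector.jar --service=MyService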