Examples
You're welcome to copy the following examples and adapt them to your needs.
Application
The application configuration contains static application settings, such as the locations of application directories and resources, as well as global execution options.
▶ Example: application.yaml
connector:
config:
database:
# path to the working database file
file: /home/connector/db
dump:
# path to the dump directory
directory: ./temp
# dump filename template, store the dumps in separate directories for each service type
template: "${type}/${name}-${dump}-${date}T${time}"
service:
# path to the default service file
file: /home/connector/services.yaml
# path to the default service directory
directory: /home/connector/services
# scan the default service file and directory every 60 seconds
monitor:
interval: 60
placeholders:
# paths to additional properties files containing placeholders
files:
- ./resources/oauth.properties
- ./resources/secrets.properties
execution:
thread:
# number of threads used for parallel processing
count: 32
# maximum execution time for parallel processing in milliseconds (60 seconds)
timeout: 60000
# additional jdbc drivers
jars:
- type: jdbc
file: /home/connector/mssql-jdbc-12.10.0.jre11.jar
- type: jdbc
file: /home/connector/postgresql-42.7.5.jar
▶ Example: application.properties
# path to the working database file
connector.config.database.file=/home/connector/db
# path to the dump directory
connector.config.dump.directory=./temp
# dump filename template, store the dumps in separate directories for each service type
connector.config.dump.template=${type}/${name}-${dump}-${date}T${time}
# path to the default service file
connector.config.service.file=/home/connector/services.yaml
# path to the default service directory
connector.config.service.directory=/home/connector/services
# scan the default service file and directory every 60 seconds
connector.config.service.monitor.interval=60
# paths to additional properties files containing placeholders
connector.config.placeholders.files=./resources/oauth.properties,./resources/secrets.properties
# number of threads used for parallel processing
connector.config.execution.thread.count=32
# maximum execution time for parallel processing in milliseconds (60 seconds)
connector.config.execution.thread.timeout=60000
# additional jdbc drivers
connector.config.jars[0].type=jdbc
connector.config.jars[0].file=/home/connector/mssql-jdbc-12.10.0.jre11.jar
connector.config.jars[1].type=jdbc
connector.config.jars[1].file=/home/connector/postgresql-42.7.5.jar
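The files listed under placeholders are ordinary properties files; their keys can be referenced as ${...} placeholders in the service definitions below. A minimal sketch of ./resources/secrets.properties with illustrative keys and dummy values (your keys and values will differ):
# illustrative placeholder values, referenced as ${database.username}, ${dataspot.access-key}, etc.
database.username=connector_user
database.password=changeit
dataspot.access-key=0123456789abcdef
dataspot.oauth.provider-url=https://login.example.com/oauth2/v2.0/token
dataspot.oauth.client-id=my-client-id
dataspot.oauth.client-secret=changeit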
Services
DatabaseConnector
▶ Example: DatabaseConnector - Microsoft SQL Server
The following service connects to a Microsoft SQL Server database using basic authentication, extracts metadata from specific schemas, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to microsoft sql server database 'Sales'
url: jdbc:sqlserver://www.myserver.com:1433;DatabaseName=Sales;trustServerCertificate=true
# authenticate with username and password
authentication:
method: password
username: ${database.username}
password: ${database.password}
extensions:
# microsoft sql server extension to extract comments from extended property 'MS_Description'
Microsoft SQL Server:
comment: MS_Description
# this section specifies the extraction and transformation options
ingestion:
# schema filter
schemas:
# extract only these schemas
names:
accept:
- Finance
- Product
- KPI
# transform into collections with stereotype 'schema'
stereotype: schema
# table filter
tables:
# extract only these table types
types:
accept:
- TABLE
- VIEW
# transform into data objects with stereotype 'table'
stereotype: table
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
# we're authenticating with client credentials (see below)
# therefore we need an access key to identify the service user
access-key: ${dataspot.access-key}
options:
# reconciliation mode
operation: REPLACE
# agent identification of this connector
agent-id: sales-connector
# authenticate with OAuth 2.0, using a client certificate (no password)
authentication:
method: oauth
provider-url: ${dataspot.oauth.provider-url}
client-id: ${dataspot.oauth.client-id}
credentials:
type: client-certificate
file: /home/connector/dataspot.pfx
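The client-certificate credential above references a .pfx file, which is typically a PKCS#12 keystore. If your certificate and private key are delivered as separate PEM files, one way to bundle them is with openssl (file names are illustrative):
openssl pkcs12 -export -in client-cert.pem -inkey client-key.pem -out /home/connector/dataspot.pfx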
Connecting to Microsoft SQL Server requires a suitable JDBC driver. Be sure to download the driver and specify the additional JAR in the application configuration:
connector:
config:
jars:
- type: jdbc
file: /home/connector/mssql-jdbc-12.10.0.jre11.jar
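The driver is available from Maven Central, among other sources; for example (adjust the version to the one you actually use):
curl -L -o /home/connector/mssql-jdbc-12.10.0.jre11.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/12.10.0.jre11/mssql-jdbc-12.10.0.jre11.jar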
▶ Example: DatabaseConnector - PostgreSQL
The following service connects to a PostgreSQL database, authenticating with username and password and passing a server/client certificate pair via additional driver properties.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to postgres database 'sales'
url: jdbc:postgresql://www.myserver.com:5432/sales
# authenticate with username and password...
authentication:
method: password
username: ${database.username}
password: ${database.password}
# ...but also pass the database-specific server/client certificate pair
properties:
sslmode: verify-full
sslrootcert: /home/connector/certs/server-ca.pem
sslcert: /home/connector/certs/client-cert.pem
sslkey: /home/connector/certs/client-key.pk8
# ingestion and upload options go here :)
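Note that the PostgreSQL JDBC driver generally expects the client key (sslkey) in PKCS#8 DER format, which is why the example references a .pk8 file. A PEM-encoded key can be converted with openssl, for example:
openssl pkcs8 -topk8 -inform PEM -outform DER -in client-key.pem -out client-key.pk8 -nocrypt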
Connecting to PostgreSQL requires a suitable JDBC driver. Be sure to download the driver and specify the additional JAR in the application configuration:
connector:
config:
jars:
- type: jdbc
file: /home/connector/postgresql-42.7.5.jar
▶ Example: DatabaseConnector - Google BigQuery
The following service connects to Google BigQuery using additional properties to authenticate with a keyfile.
services:
MyService:
type: DatabaseConnector
# this section specifies the source database
source:
# connect to google big query
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=0
# the authentication properties could also be specified in the connection url
properties:
ProjectId: ${bigquery.projectId}
OAuthServiceAcctEmail: ${bigquery.service.account.email}
OAuthPvtKeyPath: /home/connector/bigquery-keyfile.json
ingestion:
# extract all projects and datasets (the JDBC driver maps projects to catalogs and datasets to schemas)
catalogs:
names:
accept: .*
Connecting to Google BigQuery requires a suitable JDBC driver. Be sure to download the driver and specify the additional JARs in the application configuration:
connector:
config:
jars:
- type: jdbc
directory: /home/connector/google-big-query # directory containing JDBC driver and 3rd party JARs
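The ${bigquery.projectId} and ${bigquery.service.account.email} placeholders can be resolved from one of the configured placeholders files; illustrative entries (values are examples only):
bigquery.projectId=my-gcp-project
bigquery.service.account.email=connector@my-gcp-project.iam.gserviceaccount.com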
DatabricksConnector
▶ Example: DatabricksConnector
The following service connects to a Databricks Unity Catalog instance, authenticates with a personal access token, extracts metadata from a workspace, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: DatabricksConnector
# this section specifies the databricks unity catalog instance
source:
url: https://dbc-00182f59-66ea.cloud.databricks.com
# the warehouse id is required to extract runtime data lineage
warehouse-id: b079313fa6222088
# authenticate with personal access token
authentication:
method: token
token: ${databricks.pat}
# this section specifies the extraction and transformation options
ingestion:
# catalog filter
catalogs:
# extract only this catalog
names:
accept: my_workspace
# schema filter
schemas:
# extract these schemas, transform with different stereotypes
- names:
accept: gold
stereotype: gold
- names:
accept: silver
stereotype: silver
- names:
accept: bronze
stereotype: bronze
# extract runtime data lineage
lineages:
# extract runtime lineage since cutoff date
options:
cutoff-date: 2025-05-21
streams:
# extract column lineage
- type: column
filter:
# extract lineage created by a specific entity (e.g. a pipeline)
entity:
id: 939430187383108
# create transformations and rules
transformation:
label: Pipeline
stereotype: pipeline
rule:
stereotype: mapping
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
collection: Databricks
# we're authenticating with client credentials (see below)
# therefore we need an access key to identify the service user
access-key: ${dataspot.access-key}
options:
# reconciliation mode
operation: REPLACE
# agent identification of this connector
agent-id: databricks-connector
# authenticate with OAuth 2.0, using a client secret
authentication:
method: oauth
provider-url: ${dataspot.oauth.provider-url}
client-id: ${dataspot.oauth.client-id}
credentials:
type: client-secret
client-secret: ${dataspot.oauth.client-secret}
StorageConnector
▶ Example: StorageConnector
The following service connects to an AWS S3 bucket, authenticates with an access key and secret key, extracts metadata from specific file types, and then uploads the transformed metadata to a destination application.
services:
MyService:
type: StorageConnector
# this section specifies the source storage system
source:
# connect to aws s3 bucket
url: s3a://my-s3-bucket
# authenticate with access key and secret key
authentication:
method: secret-key
access-key: ${aws.s3.access-key}
secret-key: ${aws.s3.secret-key}
# use a specific aws region
properties:
fs.s3a.endpoint.region: eu-central-1
# this section specifies the extraction and transformation options
ingestion:
options:
# process all objects added since the last run
read: since-last-run
directories:
# extract data objects from these paths
paths:
- sales/annual
- finance/*
files:
# partitioned parquet files
- type: parquet
partition:
delimiter: "="
# this section specifies the destination application
upload:
# upload to my.dataspot.io into prod database
url: https://my.dataspot.io
database: prod
# the scheme must be a data model
scheme: Enterprise Model
collection: AWS
options:
# reconciliation mode and workflow statuses
operation: REPLACE
on-insert: WORKING
on-update: SUBMITTED
on-delete: INACTIVE
# agent identification of this connector
agent-id: s3-connector
# authenticate with username and password
authentication:
method: password
username: ${dataspot.basic.username}
password: ${dataspot.basic.password}
FileUpload
▶ Example: FileUpload
The following service uploads an existing payload file to a destination application.
services:
MyService:
type: FileUpload
source:
# use a placeholder to get the file path from an external source
file: ${payload.file}
upload:
url: https://my.dataspot.io
database: mydatabase
scheme: MyModel
# authenticate with username and password
authentication:
method: password
username: ${dataspot.basic.username}
password: ${dataspot.basic.password}
Start the CLI application to upload a payload file, for example /home/connector/upload.json.gz:
java -Dpayload.file=/home/connector/upload.json.gz -jar dataspot-connector.jar --service=MyService
ApplicationReorg
▶ Example: ApplicationReorg
The following service deletes jobs that are older than two weeks.
services:
MyService:
type: ApplicationReorg
job:
retention: 2w # delete jobs older than 2 weeks
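Like the other services, a reorg can be started ad hoc from the CLI, for example:
java -jar dataspot-connector.jar --service=MyService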