FileUpload

FileUpload is a service that uploads a payload file to a destination application (dataspot) using the upload API.

FileUpload automatically handles the connection to the destination application, the authentication settings, the upload options, and the polling of the asynchronous import job. This makes it a convenient way to upload payload files created by, for example, third-party tools or external extraction and transformation processes.

Instead of implementing the upload to dataspot (including the supported authentication methods and upload options) in the 3rd-party tool or external process, use a FileUpload service.

java -Dpayload.file=upload.json -jar dataspot-connector.jar --service=MyService --file=myservices.yaml
Tooltip

Find FileUpload configuration examples here.

Architecture

FileUpload has an architecture with the following steps and components.

[Figure: FileUpload architecture]

Note

In contrast to connectors, which extract and transform metadata from an external source, FileUpload reads metadata from an existing payload file, skipping the extraction and transformation steps altogether.

Workflow

FileUpload implements a simple workflow, where metadata is read from a payload file and asynchronously uploaded to a destination.

Step | Description
Upload | FileUpload connects to the specified destination application using the selected authentication settings and upload options. FileUpload sends the payload to the destination application, where an asynchronous import job is submitted to process the uploaded payload.
Poll Job | FileUpload polls the asynchronous import job and displays the job progress and statistics. When the import job has finished, FileUpload terminates. If the import job fails, FileUpload displays the error logs and exits.
Tooltip

The import job is a regular, asynchronous job in dataspot. It can be monitored in the dashboard of the user.

Configuration

A FileUpload service is configured by defining its unique name, the service type FileUpload, and the configuration.

Example: FileUpload

services:
  MyService:
    type: FileUpload
Tooltip

While YAML itself doesn't enforce any naming style for property names, multi-word properties (for example, access key) are typically specified in lowercase separated by hyphens (for example, access-key). This naming style - commonly referred to as kebab-case - is used in the following descriptions and examples. However, all multi-word properties can also be specified in camelCase (for example, accessKey).
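
As a brief illustration of the two naming styles, the following sketch sets the same property once in kebab-case and once in camelCase. The service names MyKebabService and MyCamelService are hypothetical and exist only to show both spellings side by side; the placeholder is assumed to be supplied externally, and all other properties are omitted for brevity.

```yaml
services:
  # hypothetical service: multi-word property written in kebab-case
  MyKebabService:
    type: FileUpload
    upload:
      access-key: ${dataspot.access-key}
  # hypothetical service: the same property written in camelCase
  MyCamelService:
    type: FileUpload
    upload:
      accessKey: ${dataspot.access-key}
```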

In addition to the general service configuration, FileUpload has the following configuration to specify the payload file as well as the destination application.

Tooltip

Properties marked with * are required for FileUpload to run.

Source

FileUpload uploads a payload file.

🔑 Property source.file *

The absolute or relative path to the payload file.

required

FileUpload verifies the payload file exists and uploads it to the destination application.

Example: Property source.file

services:
  MyService:
    type: FileUpload
    source:
      # use a placeholder to get the file path from an external source, such as a command-line argument
      file: ${payload.file}

Upload

FileUpload uploads metadata to the destination application, which is specified by, for example, the URL, the database, and the scheme. The upload options and the authentication settings are configured here as well.

🔑 Property upload.url *

The URL of the destination application.

required

FileUpload uploads the metadata to the specified URL.

Example: Property upload.url

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
Tooltip

If the destination application has a context path (e.g. dataspot), the URL must also include the context path (e.g. https://my.server.com/dataspot).

🔑 Property upload.database *

The database of the destination application.

required

FileUpload uploads the metadata to the specified database.

Example: Property upload.database

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
🔑 Property upload.tenant

The tenant in the database.

optional

The default is null (determine the tenant automatically).

If a tenant is defined, FileUpload uploads the metadata to the specified tenant. Otherwise, the tenant is automatically determined based on the credentials (e.g. login ID) or on the access key.

Note

If the user does not have access to the tenant, or the tenant does not exist, the request is rejected.

Example: Property upload.tenant

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      tenant: mytenant
Tooltip

If an access key is specified, the tenant is determined only by the access key. If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.

🔑 Property upload.access-key

The access key to identify a user in the destination application.

Tooltip

An access key is a secret character sequence that is created, assigned to a user, and managed in the destination application. An access key replaces the login with credentials (e.g. login ID and password).

optional

The default is null (no access key).

If an access key is defined, the access key is used to identify the user and tenant. Otherwise, the user in the destination application is determined based on the credentials (e.g. login ID).

Example: Property upload.access-key

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      access-key: ${dataspot.access-key}
Tooltip

If an access key is specified, the tenant is determined only by the access key. If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.

🔑 Property upload.scheme *

The scheme in the tenant.

required

FileUpload uploads the metadata to the specified scheme.

Attention

The scheme must conform to the uploaded metadata. For example, data objects require the scheme to be a data model.

Example: Property upload.scheme

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
Note

The scheme must already exist in the specified database - it will not be created automatically.

🔑 Property upload.collection

The collection in the specified scheme.

optional

The default is null (no collection).

If a collection is defined, FileUpload uploads the metadata to the specified collection. Otherwise, the metadata is uploaded to the root of the specified scheme.

Example: Property upload.collection

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      collection: My Collection
Note

The collection must already exist in the specified scheme - it will not be created automatically.
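
The source and destination properties described so far can be combined in a single service. The following is a minimal sketch that reuses the illustrative values from the individual examples (my.dataspot.io, mydatabase, mytenant, My Model, My Collection) and assumes that the payload file and the access key are supplied through placeholders.

```yaml
services:
  MyService:
    type: FileUpload
    source:
      # required: path to the payload file, resolved from a placeholder
      file: ${payload.file}
    upload:
      # required: destination application, database and scheme
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      # optional: upload into a collection instead of the root of the scheme
      collection: My Collection
      # optional: the access key identifies the user and the tenant
      access-key: ${dataspot.access-key}
      # optional: must match the tenant determined by the access key,
      # otherwise the request is rejected
      tenant: mytenant
```

Such a service could then be started with the command shown in the introduction, passing the payload file as a system property (for example, -Dpayload.file=upload.json).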

Options

Reconciliation

FileUpload can specify how uploaded metadata is reconciled with existing metadata.

🔑 Property upload.options.agent-id

The identification of the upload agent.

Tooltip

An agent identification is an arbitrary string that identifies the upload agent and allows the origin of metadata to be traced back to the service that uploaded it.

optional

The default is null (no agent identification).

If an agent identification is defined, FileUpload "takes ownership" of the uploaded metadata by storing the agent identification alongside the metadata. Otherwise, FileUpload does not modify the ownership of the uploaded metadata.

Note

When determining obsolete metadata, only metadata with the specified agent identification is taken into account. Metadata that already exists in the scheme, but is not uploaded by FileUpload, is only considered obsolete if it was previously uploaded by the same agent.

Example: Property upload.options.agent-id

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        agent-id: ${dataspot.agent-id}
Tooltip

The agent identification allows metadata to be uploaded (e.g. into the same scheme or collection) from different sources by different agents without deleting or deactivating each other's metadata. This prevents a service from accidentally deleting or deactivating metadata that was uploaded by a different agent or that was created manually in the user interface.
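
The scenario described above might be configured as follows. This is only a sketch: the service names, the placeholders for the payload files, and the agent identifications first-source and second-source are hypothetical. Both services upload into the same scheme, but each one stores its own agent identification alongside its metadata.

```yaml
services:
  # hypothetical service for payloads produced from a first source
  FirstSourceUpload:
    type: FileUpload
    source:
      file: ${first.payload.file}
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        # metadata uploaded by this service is owned by this agent
        agent-id: first-source
  # hypothetical service for payloads produced from a second source
  SecondSourceUpload:
    type: FileUpload
    source:
      file: ${second.payload.file}
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        # a different agent, so neither service treats the other's metadata as obsolete
        agent-id: second-source
```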

🔑 Property upload.options.operation

The reconciliation mode of the upload.

optional

The default is ADD.

The reconciliation mode defines whether metadata that already exists in the scheme, but is not uploaded by FileUpload, is considered obsolete.

Note

Metadata that was previously uploaded by another agent is not considered obsolete.

operation | Description
ADD | Existing metadata that was not uploaded is not modified.
REPLACE | Existing metadata that was not uploaded is considered obsolete if its superordinate object was uploaded.
FULL_LOAD | Existing metadata that was not uploaded is considered obsolete.
Attention

The property upload.options.on-delete defines whether obsolete metadata is actually deleted or is instead set to a specific workflow status.

Example: Property upload.options.operation

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
Workflow

FileUpload can specify the workflow statuses of inserted, updated, or deleted metadata.

Tooltip

By defining a corresponding workflow in the destination application, the modifications performed by the upload can, for example, trigger notifications or be integrated in an approval process.

🔑 Property upload.options.on-insert

The new status for inserted metadata.

optional

The default is null (no insert status).

If an insert status is defined, inserted metadata is set to the specified workflow status. Otherwise, inserted metadata is automatically set to the initial status of the scheme's workflow.

Example: Property upload.options.on-insert

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-insert: WORKING
🔑 Property upload.options.on-update

The new status for updated metadata.

optional

The default is null (no update status).

If an update status is defined, updated metadata is set to the specified workflow status. Otherwise, the status of updated metadata is not modified.

Example: Property upload.options.on-update

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-update: SUBMITTED
🔑 Property upload.options.on-delete

The new status for deleted metadata.

optional

The default is null (no delete status).

If a delete status is defined, obsolete metadata is set to the specified workflow status. Otherwise, obsolete metadata is actually deleted.

Example: Property upload.options.on-delete

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
        on-delete: INACTIVE
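
The reconciliation and workflow options can also be combined. The following sketch assumes that the statuses WORKING, SUBMITTED and INACTIVE, taken from the individual examples, exist in the workflow of the scheme, and that the agent identification is supplied through a placeholder.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        # only metadata previously uploaded by this agent can become obsolete
        agent-id: ${dataspot.agent-id}
        # existing metadata that is not part of the upload is considered obsolete
        operation: FULL_LOAD
        # inserted metadata gets this status instead of the initial workflow status
        on-insert: WORKING
        # updated metadata gets this status instead of keeping its current one
        on-update: SUBMITTED
        # obsolete metadata gets this status instead of being deleted
        on-delete: INACTIVE
```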
Dry run
🔑 Property upload.options.dry-run

The flag that specifies if the upload is performed as a dry run.

optional

The default is false (disabled).

If the flag is true, the upload is performed as a dry run - without changing any data. Otherwise, the upload is actually performed.

Tooltip

The dry run can be used to test or check FileUpload, by querying the job statistics and logs to see which metadata would actually be changed or which errors and warnings would occur.

Example: Property upload.options.dry-run

services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        dry-run: true
Tooltip

For convenience, the dry run can also be enabled using the built-in placeholder ${upload.options.dry-run}, without ever specifying upload.options.dry-run in the service file. For example, as a system property:

java -Dupload.options.dry-run=true

Headers

FileUpload can specify additional HTTP header fields.

🔑 Property upload.headers

The additional HTTP header fields of the upload request, specified as a map of fields.

Note

The map key is the field name. The map value is the list of field values (or a single field value).

optional

The default is null (no additional headers).

If additional headers are defined, FileUpload sets the corresponding HTTP header fields of the upload request.

Tooltip

Additional headers can be used to modify the upload request, by setting HTTP header fields that are not natively supported by the FileUpload configuration.

Example: Property upload.headers

services:
  MyService:
    type: FileUpload
    upload:
      headers:
        Accept-Language: en
        X-Data:
          - ${data.first}
          - ${data.second}
Note

Accept-Language has only the single value en, so it can be written as a plain value rather than as a list with a single entry.
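
For comparison, the same header could also be written in the list form. This sketch is based on the statement that a map value is a list of field values or a single field value, so it should be equivalent to the plain value en.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      headers:
        # list with a single entry, equivalent to "Accept-Language: en"
        Accept-Language:
          - en
```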

Authentication

FileUpload can specify the authentication settings for uploading to the destination application.

🔑 Property upload.authentication

The authentication settings for the upload.

optional

The default is null (no authentication).

If an authentication is defined, FileUpload uploads to the destination application with the specified authentication. Otherwise, FileUpload uploads without authentication.

🔑 Property upload.authentication.method

The authentication method.

required

The property is required if upload.authentication is specified.

FileUpload supports the following authentication methods:

Authentication method | method
Username and password | password
Bearer token | token
OAuth 2.0 | oauth

Example: Property upload.authentication.method

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
Username and password

FileUpload can use basic authentication with username and password for uploading to the destination application.

🔑 Property upload.authentication.username

The username.

required

The property can only be specified and is required if upload.authentication.method is password.

FileUpload uses the specified username and password for authentication.

Note

The password is specified in upload.authentication.password.

Example: Property upload.authentication.username

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
🔑 Property upload.authentication.password

The password.

required

The property can only be specified and is required if upload.authentication.method is password.

FileUpload uses the specified username and password for authentication.

Note

The username is specified in upload.authentication.username.

Example: Property upload.authentication.password

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
Bearer token

FileUpload can use authentication with a bearer token for uploading to the destination application.

🔑 Property upload.authentication.token

The bearer token.

required

The property can only be specified and is required if upload.authentication.method is token.

FileUpload uses the specified bearer token for authentication.

Example: Property upload.authentication.token

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: token
        token: ${dataspot.token}
OAuth 2.0

FileUpload can use OAuth 2.0 authentication for uploading to the destination application. The application supports non-interactive (machine to machine) grants to obtain an access token as a client application or to obtain an ID token as an end-user.

Note

If the destination application is hosted in the cloud (e.g. as Software-as-a-Service in Microsoft Azure), the server can be protected by an authentication module (e.g. Azure Easy Auth). When uploading to the destination application, a valid access or ID token must be provided so that the request can pass the authentication module.

🔑 Property upload.authentication.provider-url

The URL of the identity provider that supports OAuth 2.0.

required

The property can only be specified and is required if upload.authentication.method is oauth.

FileUpload uses the provider URL and the client ID to contact the identity provider.

Note

The client ID is specified in upload.authentication.client-id.

Example: Property upload.authentication.provider-url

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
🔑 Property upload.authentication.client-id

The OAuth 2.0 client ID.

required

The property can only be specified and is required if upload.authentication.method is oauth.

FileUpload uses the provider URL and the client ID to contact the identity provider.

Note

The provider URL is specified in upload.authentication.provider-url.

Example: Property upload.authentication.client-id

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
🔑 Property upload.authentication.credentials.type

The credentials type.

required

The property can only be specified and is required if upload.authentication.method is oauth.

FileUpload supports the following credentials types to obtain an access or ID token:

Credentials type | type
Client credentials with certificate | client-certificate
Client credentials with client secret | client-secret
Resource owner password credentials | password

Example: Property upload.authentication.credentials.type

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
Client credentials with certificate

The OAuth 2.0 client credentials grant with a certificate (and optional password) is a non-interactive (machine to machine) authentication. FileUpload authenticates as a client application, rather than as an end-user, to obtain an access token. The access token does not contain any end-user information.

Note

Because the access token does not contain any end-user information, an access key should also be provided to identify the user in the application.

🔑 Property upload.authentication.credentials.file

The absolute or relative path to the certificate file.

required

The property can only be specified and is required if upload.authentication.credentials.type is client-certificate.

FileUpload uses the specified certificate and password to obtain an access token from the identity provider.

Note

The password is specified in upload.authentication.credentials.password.

Example: Property upload.authentication.credentials.file

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
🔑 Property upload.authentication.credentials.password

The certificate password.

optional

The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is null (no password).

FileUpload uses the specified certificate and password to obtain an access token from the identity provider.

Note

The certificate is specified in upload.authentication.credentials.file.

Example: Property upload.authentication.credentials.password

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          password: ${dataspot.certificate.password}
🔑 Property upload.authentication.credentials.scope

The requested scope.

optional

The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is api:// + client ID + /.default.

FileUpload can request access to a specific scope, such as an exposed API.

Example: Property upload.authentication.credentials.scope

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
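
To make the default concrete: with the client ID used in these examples, omitting upload.authentication.credentials.scope should be equivalent to writing out the scope below. This sketch is derived from the stated default (api:// + client ID + /.default), not from a separately documented example.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          # the default written out explicitly: api:// + client ID + /.default
          scope: api://6731de76-14a6-49ae-97bc-6eba6914391e/.default
```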
Client credentials with client secret

The OAuth 2.0 client credentials grant with a client secret is a non-interactive (machine to machine) authentication. FileUpload authenticates as a client application, rather than as an end-user, to obtain an access token. The access token does not contain any end-user information.

Note

Because the access token does not contain any end-user information, an access key should also be provided to identify the user in the application.

🔑 Property upload.authentication.credentials.client-secret

The client secret.

required

The property can only be specified and is required if upload.authentication.credentials.type is client-secret.

FileUpload uses the specified client secret to obtain an access token from the identity provider.

Example: Property upload.authentication.credentials.client-secret

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
🔑 Property upload.authentication.credentials.scope

The requested scope.

optional

The property can only be specified if upload.authentication.credentials.type is client-secret.
The default is api:// + client ID + /.default.

FileUpload can request access to a specific scope, such as an exposed API.

Example: Property upload.authentication.credentials.scope

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
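
Since a client credentials access token carries no end-user information, a complete service would typically combine the OAuth 2.0 settings with an access key. The following sketch shows one possible combination; it reuses the illustrative destination values and identifiers from the examples above and assumes that the secrets are supplied through placeholders.

```yaml
services:
  MyService:
    type: FileUpload
    source:
      file: ${payload.file}
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      # identifies the user (and the tenant) in the destination application
      access-key: ${dataspot.access-key}
      authentication:
        # obtains an access token as a client application
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
```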
Resource owner password credentials

The OAuth 2.0 resource owner password credentials (ROPC) grant with a username and password is a non-interactive (machine to machine) authentication. FileUpload authenticates as an end-user and uses the OpenID Connect (OIDC) authentication layer to obtain an ID token. The ID token contains end-user information.

Attention

If the identity provider requires multi-factor authentication (MFA), and therefore user interaction, resource owner password credentials (ROPC) are not a suitable machine to machine authentication method. In that case, a non-interactive authentication must be used instead (e.g. client credentials with certificate or client credentials with client secret).

🔑 Property upload.authentication.credentials.username

The username.

Note

Typically, the username is an e-mail address.

required

The property can only be specified and is required if upload.authentication.credentials.type is password.

FileUpload uses the specified username and password to obtain an ID token from the identity provider.

Note

The password is specified in upload.authentication.credentials.password.

Example: Property upload.authentication.credentials.username

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
🔑 Property upload.authentication.credentials.password

The password.

required

The property can only be specified and is required if upload.authentication.credentials.type is password.

FileUpload uses the specified username and password to obtain an ID token from the identity provider.

Note

The username is specified in upload.authentication.credentials.username.

Example: Property upload.authentication.credentials.password

services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}