FileUpload
FileUpload is a service that uploads a payload file to a destination application (dataspot) using the upload API.
FileUpload automatically handles the connection to the destination application, the authentication settings, the upload options, and the polling of the asynchronous import job - making it a very convenient approach to uploading payload files created by, for example, 3rd-party tools or external extraction and transformation processes.
Instead of implementing the upload to dataspot (including the supported authentication methods and upload options) in the 3rd-party tool or external process, use a FileUpload service.
- Specify the payload file (for example, using a placeholder such as ${payload.file}).
- Specify the destination application, authentication settings and upload options.
- Start the CLI application to execute the service that automatically uploads the payload file and polls the import job:
java -Dpayload.file=upload.json -jar dataspot-connector.jar --service=MyService --file=myservices.yaml
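As an illustration, a minimal myservices.yaml matching the command above might look like the following sketch. It only combines properties described in the Configuration section; the URL, database, and scheme values are placeholders, and ${dataspot.access-key} is assumed to be supplied externally (for example, as another system property).

```yaml
# myservices.yaml: minimal sketch, all values are placeholder assumptions
services:
  MyService:
    type: FileUpload
    source:
      # resolved from -Dpayload.file=upload.json on the command line
      file: ${payload.file}
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      # assumed to be passed in from outside the service file
      access-key: ${dataspot.access-key}
```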
Find FileUpload configuration examples here.
Architecture
FileUpload has an architecture with the following steps and components.
In contrast to connectors, which extract and transform metadata from an external source, FileUpload reads metadata from an existing payload file - skipping the extraction and transformation steps altogether.
Workflow
FileUpload implements a simple workflow, where metadata is read from a payload file and asynchronously uploaded to a destination.
| Step | Description |
|---|---|
| Upload | FileUpload connects to the specified destination application using the selected authentication settings and upload options. FileUpload sends the payload to the destination application, where an asynchronous import job is submitted to process the uploaded payload. |
| Poll Job | FileUpload polls the asynchronous import job and displays the job progress and statistics. When the import job has finished, FileUpload terminates. If the import job fails, FileUpload displays the error logs and exits. |
The import job is a regular, asynchronous job in dataspot. It can be monitored in the dashboard of the user.
Configuration
A FileUpload service is configured by defining its unique name, the service type FileUpload, and the configuration.
Example: FileUpload
services:
  MyService:
    type: FileUpload
While YAML itself doesn't enforce any naming style for property names, multi-word properties (for example, access key) are typically specified in lowercase separated by hyphens (for example, access-key).
This naming style - commonly referred to as kebab-case - is used in the following descriptions and examples.
However, all multi-word properties can also be specified in camelCase (for example, accessKey).
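For illustration, the following sketch writes a few multi-word properties in camelCase. Apart from accessKey, which is named above, the camelCase spellings (agentId, onDelete, dryRun) are derived from the stated rule and are assumptions, not spellings confirmed property by property.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      accessKey: ${dataspot.access-key}   # same as access-key
      options:
        agentId: ${dataspot.agent-id}     # assumed equivalent of agent-id
        onDelete: INACTIVE                # assumed equivalent of on-delete
        dryRun: true                      # assumed equivalent of dry-run
```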
In addition to the general service configuration, FileUpload has the following configuration to specify the payload file as well as the destination application.
Properties marked with * are required for FileUpload to run.
Source
FileUpload uploads a payload file.
🔑 Property source.file *
The absolute or relative path to the payload file.
required
FileUpload verifies the payload file exists and uploads it to the destination application.
Example: Property source.file
services:
  MyService:
    type: FileUpload
    source:
      # use a placeholder to get the file path from an external source, such as a command-line argument
      file: ${payload.file}
Upload
To upload metadata, FileUpload requires the destination application to be specified (e.g. the URL, the database and the scheme), along with the upload options and the authentication settings.
🔑 Property upload.url *
The URL of the destination application.
required
FileUpload uploads the metadata to the specified URL.
Example: Property upload.url
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
If the destination application has a context path (e.g. myapp), the URL must also include the context path (e.g. https://my.server.com/myapp).
🔑 Property upload.database *
The database of the destination application.
required
FileUpload uploads the metadata to the specified database.
Example: Property upload.database
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
🔑 Property upload.tenant
The tenant in the database.
optional
The default is null (determine the tenant automatically).
If a tenant is defined, FileUpload uploads the metadata to the specified tenant.
Otherwise, the tenant is automatically determined based on the credentials (e.g. login ID) or on the access key.
If the user does not have access to the tenant, or the tenant does not exist, the request is rejected.
Example: Property upload.tenant
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      tenant: mytenant
If an access key is specified, the tenant is determined only by the access key.
If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.
🔑 Property upload.access-key
The access key to identify a user in the destination application.
An access key is a secret character sequence that is created, assigned to a user, and managed in the destination application. An access key replaces the login with credentials (e.g. login ID and password).
optional
The default is null (no access key).
If an access key is defined, the access key is used to identify the user and tenant. Otherwise, the user in the destination application is determined based on the credentials (e.g. login ID).
Example: Property upload.access-key
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      access-key: ${dataspot.access-key}
If an access key is specified, the tenant is determined only by the access key.
If a tenant specified in upload.tenant is different from the tenant determined by the access key, the request is rejected.
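A sketch of a configuration that specifies both properties: here, upload.tenant acts only as a safeguard, because the tenant is determined by the access key. The tenant name mytenant is a placeholder and is assumed to be the tenant the access key belongs to; with any other tenant, the request would be rejected.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      # must match the tenant determined by the access key, otherwise the request is rejected
      tenant: mytenant
      access-key: ${dataspot.access-key}
      scheme: My Model
```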
🔑 Property upload.scheme *
The scheme in the tenant.
required
FileUpload uploads the metadata to the specified scheme.
The scheme must conform to the uploaded metadata. For example, data objects require the scheme to be a data model.
Example: Property upload.scheme
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
The scheme must already exist in the specified database - it will not be created automatically.
🔑 Property upload.collection
The collection in the specified scheme.
optional
The default is null (no collection).
If a collection is defined, FileUpload uploads the metadata to the specified collection.
Otherwise, the metadata is uploaded to the root of the specified scheme.
Example: Property upload.collection
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      collection: My Collection
The collection must already exist in the specified scheme - it will not be created automatically.
Options
Reconciliation
FileUpload can specify how uploaded metadata is reconciled with existing metadata.
🔑 Property upload.options.agent-id
The identification of the upload agent.
An agent identification is an arbitrary string that identifies the upload agent and allows the origin of metadata to be traced back to the service that uploaded it.
optional
The default is null (no agent identification).
If an agent identification is defined, FileUpload "takes ownership" of the uploaded metadata by storing the agent identification alongside the metadata.
Otherwise, FileUpload does not modify the ownership of the uploaded metadata.
Example: Property upload.options.agent-id
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        agent-id: ${dataspot.agent-id}
The agent identification allows metadata to be uploaded (e.g. into the same scheme or collection) from different sources by different agents without deleting or deactivating each other's metadata. This prevents a service from accidentally deleting or deactivating metadata that was uploaded by a different agent or that was created manually in the user interface.
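The following sketch shows two services uploading into the same scheme under different agent identifications. The service names, agent identifications, and file placeholders are hypothetical; upload.options.operation is described below.

```yaml
services:
  CrmUpload:                      # hypothetical service name
    type: FileUpload
    source:
      file: ${crm.payload.file}   # hypothetical placeholder
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        agent-id: crm-extract     # hypothetical agent identification
        # metadata uploaded by erp-extract is not considered obsolete by this service
        operation: REPLACE
  ErpUpload:                      # hypothetical service name
    type: FileUpload
    source:
      file: ${erp.payload.file}   # hypothetical placeholder
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        agent-id: erp-extract     # hypothetical agent identification
        operation: REPLACE
```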
🔑 Property upload.options.operation
The reconciliation mode of the upload.
optional
The default is ADD.
The reconciliation mode defines whether metadata that already exists in the scheme, but is not uploaded by FileUpload, is considered obsolete.
Metadata that was previously uploaded by another agent is not considered obsolete.
| operation | Description |
|---|---|
| ADD | Existing metadata that was not uploaded is not modified. |
| REPLACE | Existing metadata that was not uploaded is considered obsolete if the superordinate object was uploaded. |
| FULL_LOAD | Existing metadata that was not uploaded is considered obsolete. |
The property upload.options.on-delete defines whether obsolete metadata is actually deleted or is instead set to a specific workflow status.
Example: Property upload.options.operation
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
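As a worked illustration of the three modes, consider the hypothetical scheme below. The collection and object names are invented, the sketch tracks only the three objects (not the collections themselves), and the outcomes are one reading of the mode descriptions above, not verified output of an import job.

```yaml
# Hypothetical content of the scheme before the upload
existing:
  Collection A:
    - Object X
    - Object Y
  Collection B:
    - Object Z

# Hypothetical payload: Collection A is uploaded, containing only Object X
uploaded:
  Collection A:
    - Object X

# Objects considered obsolete after the upload, per mode.
# Object X is uploaded and therefore never obsolete.
obsolete:
  # existing metadata is not modified
  ADD: []
  # Object Y: its superordinate Collection A was uploaded
  # Object Z is kept: its superordinate Collection B was not uploaded
  REPLACE: [Object Y]
  # everything that was not uploaded
  FULL_LOAD: [Object Y, Object Z]
```

Whether an obsolete object is deleted or only set to a workflow status is then decided by upload.options.on-delete.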
Workflow
FileUpload can specify the workflow statuses of inserted, updated, or deleted metadata.
By defining a corresponding workflow in the destination application, the modifications performed by the upload can, for example, trigger notifications or be integrated in an approval process.
🔑 Property upload.options.on-insert
The new status for inserted metadata.
optional
The default is null (no insert status).
If an insert status is defined, inserted metadata is set to the specified workflow status. Otherwise, inserted metadata is automatically set to the initial status of the scheme's workflow.
Example: Property upload.options.on-insert
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-insert: WORKING
🔑 Property upload.options.on-update
The new status for updated metadata.
optional
The default is null (no update status).
If an update status is defined, updated metadata is set to the specified workflow status. Otherwise, the status of updated metadata is not modified.
Example: Property upload.options.on-update
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        on-update: SUBMITTED
🔑 Property upload.options.on-delete
The new status for deleted metadata.
optional
The default is null (no delete status).
If a delete status is defined, obsolete metadata is set to the specified workflow status. Otherwise, obsolete metadata is actually deleted.
Example: Property upload.options.on-delete
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
        on-delete: INACTIVE
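The three status properties can be combined in one service. The sketch below reuses the status names from the examples above (WORKING, SUBMITTED, INACTIVE); whether these statuses exist depends on the workflow defined in the destination application, so treat them as placeholders.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        operation: REPLACE
        on-insert: WORKING     # status of inserted metadata
        on-update: SUBMITTED   # status of updated metadata
        on-delete: INACTIVE    # obsolete metadata gets this status instead of being deleted
```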
Dry run
🔑 Property upload.options.dry-run
The flag that specifies if the upload is performed as a dry run.
optional
The default is false (disabled).
If the flag is true, the upload is performed as a dry run - without changing any data.
Otherwise, the upload is actually performed.
The dry run can be used to test or check FileUpload, by querying the job statistics and logs to see which metadata would actually be changed or which errors and warnings would occur.
Example: Property upload.options.dry-run
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        dry-run: true
For convenience, the dry run can also be enabled using the built-in placeholder ${upload.options.dry-run}, without ever specifying upload.options.dry-run in the service file. For example, as a system property:
java -Dupload.options.dry-run=true
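A dry run is particularly useful before a destructive upload. The sketch below, with placeholder values, previews a FULL_LOAD without an on-delete status, so the job statistics and logs can be checked for the metadata that would be deleted before any data is changed.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      options:
        # without on-delete, obsolete metadata would actually be deleted
        operation: FULL_LOAD
        # preview only: no data is changed while this flag is true
        dry-run: true
```

Once the statistics and logs look as expected, the flag can be set to false (or removed) to perform the upload.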
Headers
FileUpload can specify additional HTTP header fields.
🔑 Property upload.headers
The additional HTTP header fields of the upload request, specified as a map of fields.
The map key is the field name. The map value is the list of field values (or a single field value).
optional
The default is null (no additional headers).
If additional headers are defined, FileUpload sets the corresponding HTTP header fields of the upload request.
Additional headers can be used to modify the upload request, by setting HTTP header fields that are not natively supported by the FileUpload configuration.
Example: Property upload.headers
services:
  MyService:
    type: FileUpload
    upload:
      headers:
        Accept-Language: en
        X-Data:
          - ${data.first}
          - ${data.second}
Notice how Accept-Language is a single-value list - containing only the single value en - and can be formatted as a single value, rather than as a list with a single value.
Authentication
FileUpload can specify the authentication settings for uploading to the destination application.
🔑 Property upload.authentication
The authentication settings for the upload.
optional
The default is null (no authentication).
If an authentication is defined, FileUpload uploads to the destination application with the specified authentication.
Otherwise, FileUpload uploads without authentication.
🔑 Property upload.authentication.method
The authentication method.
required
The property is required if upload.authentication is specified.
FileUpload supports the following authentication methods:
| Authentication method | method |
|---|---|
| Username and password | password |
| Bearer token | token |
| OAuth 2.0 | oauth |
Example: Property upload.authentication.method
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
Username and password
FileUpload can use basic authentication with username and password for uploading to the destination application.
🔑 Property upload.authentication.username
The username.
required
The property can only be specified and is required if upload.authentication.method is password.
FileUpload uses the specified username and password for authentication.
The password is specified in upload.authentication.password.
Example: Property upload.authentication.username
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
🔑 Property upload.authentication.password
The password.
required
The property can only be specified and is required if upload.authentication.method is password.
FileUpload uses the specified username and password for authentication.
The username is specified in upload.authentication.username.
Example: Property upload.authentication.password
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: password
        username: ${dataspot.basic.username}
        password: ${dataspot.basic.password}
Bearer token
FileUpload can use authentication with a bearer token for uploading to the destination application.
🔑 Property upload.authentication.token
The bearer token.
required
The property can only be specified and is required if upload.authentication.method is token.
FileUpload uses the specified bearer token for authentication.
Example: Property upload.authentication.token
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: token
        token: ${dataspot.token}
OAuth 2.0
FileUpload can use OAuth 2.0 authentication for uploading to the destination application.
The application supports non-interactive (machine to machine) grants to obtain an access token as a client application or to obtain an ID token as an end-user.
If the destination application is hosted in the cloud (e.g. as Software-as-a-Service in Microsoft Azure), the server can be protected by an authentication module (e.g. Azure Easy Auth). When uploading to the destination application, a valid access or ID token must be provided so that the request can pass the authentication module.
🔑 Property upload.authentication.provider-url
The URL of the identity provider that supports OAuth 2.0.
required
The property can only be specified and is required if upload.authentication.method is oauth.
FileUpload uses the provider URL and the client ID to contact the identity provider.
The client ID is specified in upload.authentication.client-id.
Example: Property upload.authentication.provider-url
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
🔑 Property upload.authentication.client-id
The OAuth 2.0 client ID.
required
The property can only be specified and is required if upload.authentication.method is oauth.
FileUpload uses the provider URL and the client ID to contact the identity provider.
The provider URL is specified in upload.authentication.provider-url.
Example: Property upload.authentication.client-id
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
🔑 Property upload.authentication.credentials.type
The credentials type.
required
The property can only be specified and is required if upload.authentication.method is oauth.
FileUpload supports the following credentials types to obtain an access or ID token:
| Credentials type | type |
|---|---|
| Client credentials with certificate | client-certificate |
| Client credentials with client secret | client-secret |
| Resource owner password credentials | password |
Example: Property upload.authentication.credentials.type
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
Client credentials with certificate
The OAuth 2.0 client credentials grant with a certificate (and optional password) is a non-interactive (machine to machine) authentication.
FileUpload authenticates as a client application, rather than as an end-user, to obtain an access token.
The access token does not contain any end-user information.
Because the access token doesn't contain any end-user information, an access key should also be provided to identify the user in the application.
🔑 Property upload.authentication.credentials.file
The absolute or relative path to the certificate file.
required
The property can only be specified and is required if upload.authentication.credentials.type is client-certificate.
FileUpload uses the specified certificate and password to obtain an access token from the identity provider.
The password is specified in upload.authentication.credentials.password.
Example: Property upload.authentication.credentials.file
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
🔑 Property upload.authentication.credentials.password
The certificate password.
optional
The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is null (no password).
FileUpload uses the specified certificate and password to obtain an access token from the identity provider.
The certificate is specified in upload.authentication.credentials.file.
Example: Property upload.authentication.credentials.password
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          password: ${dataspot.certificate.password}
🔑 Property upload.authentication.credentials.scope
The requested scope.
optional
The property can only be specified if upload.authentication.credentials.type is client-certificate.
The default is api:// + client ID + /.default.
FileUpload can request access to a specific scope, such as an exposed API.
Example: Property upload.authentication.credentials.scope
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-certificate
          file: /home/connector/certificate.pfx
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
Client credentials with client secret
The OAuth 2.0 client credentials grant with a client secret is a non-interactive (machine to machine) authentication.
FileUpload authenticates as a client application, rather than as an end-user, to obtain an access token.
The access token does not contain any end-user information.
Because the access token doesn't contain any end-user information, an access key should also be provided to identify the user in the application.
🔑 Property upload.authentication.credentials.client-secret
The client secret.
required
The property can only be specified and is required if upload.authentication.credentials.type is client-secret.
FileUpload uses the specified client secret to obtain an access token from the identity provider.
Example: Property upload.authentication.credentials.client-secret
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
🔑 Property upload.authentication.credentials.scope
The requested scope.
optional
The property can only be specified if upload.authentication.credentials.type is client-secret.
The default is api:// + client ID + /.default.
FileUpload can request access to a specific scope, such as an exposed API.
Example: Property upload.authentication.credentials.scope
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
          scope: api://8241ca67-24b5-12ac-24fe-7dab2154361a/.default # other client id
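Because an access token obtained with client credentials carries no end-user information, a complete configuration would typically pair the OAuth 2.0 settings with upload.access-key. The sketch below uses the client secret variant; the same pairing should apply to client-certificate. The IDs are the sample values used in the examples above, and both placeholders are assumed to be supplied externally.

```yaml
services:
  MyService:
    type: FileUpload
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      # identifies the user (and tenant) in the destination application
      access-key: ${dataspot.access-key}
      authentication:
        # the access token only identifies the client application
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: client-secret
          client-secret: ${dataspot.client-secret}
```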
Resource owner password credentials
The OAuth 2.0 resource owner password credentials (ROPC) grant with a username and password is a non-interactive (machine to machine) authentication.
FileUpload authenticates as an end-user and uses the OpenID Connect (OIDC) authentication layer to obtain an ID token.
The ID token contains end-user information.
If the identity provider requires multi-factor authentication (MFA), and therefore user interaction, resource owner password credentials (ROPC) are not a suitable machine to machine authentication method. Instead, a non-interactive authentication must be used (e.g. client credentials with certificate or client credentials with client secret).
🔑 Property upload.authentication.credentials.username
The username.
Typically, the username is an e-mail address.
required
The property can only be specified and is required if upload.authentication.credentials.type is password.
FileUpload uses the specified username and password to obtain an ID token from the identity provider.
The password is specified in upload.authentication.credentials.password.
Example: Property upload.authentication.credentials.username
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
🔑 Property upload.authentication.credentials.password
The password.
required
The property can only be specified and is required if upload.authentication.credentials.type is password.
FileUpload uses the specified username and password to obtain an ID token from the identity provider.
The username is specified in upload.authentication.credentials.username.
Example: Property upload.authentication.credentials.password
services:
  MyService:
    type: FileUpload
    upload:
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
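Putting the pieces together, a complete service file might look like the sketch below. It combines the source, destination, options, and OAuth 2.0 settings described on this page; the values are the sample values from the examples above, and the agent identification my-agent is hypothetical. No access key is specified, on the assumption that the end-user information in the ID token is sufficient to determine the user.

```yaml
services:
  MyService:
    type: FileUpload
    source:
      file: ${payload.file}
    upload:
      url: https://my.dataspot.io
      database: mydatabase
      scheme: My Model
      collection: My Collection
      options:
        agent-id: my-agent        # hypothetical agent identification
        operation: REPLACE
        on-delete: INACTIVE
      authentication:
        method: oauth
        provider-url: https://login.microsoftonline.com/b0ebd953-fb5f-425e-98fc-ec46bf8ce2f1
        client-id: 6731de76-14a6-49ae-97bc-6eba6914391e
        credentials:
          # resource owner password credentials: FileUpload authenticates as an end-user
          type: password
          username: ${dataspot.oauth.username}
          password: ${dataspot.oauth.password}
```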