This issue deals with creating a specification of the Data Management API based on the design considerations and recommendations stated in the text listed below. ~~I didn't want to paste directly here, to avoid walls of text.~~
Core aspects
1. Endpoints vs Use cases
Following the REST principles, every endpoint should modify the state of an object rather than trigger behaviour. For example, to initiate a data transfer the endpoint should be `POST http://your-connector.com/api/v1/transfer/transfer-request`
rather than `POST http://your-connector.com/api/v1/transfer/start-transfer`.
Consequently, use cases (as outlined below) might not have a 1-to-1 translation into API endpoints; they might even involve several consecutive API calls.
2. `delete` is only possible sometimes
Objects may only be deleted if they are not involved in ongoing contracts, etc. Updating is (initially) done through "delete-and-recreate".
3. Support pagination
Every `GET` endpoint should support pagination, sorting and filtering.
4. Desired outcome
We should have one or several YAML files containing the specification of each of the endpoints. In addition, we should have `Controller` classes that implement that spec, even if most of them are just stubs returning dummy data for the immediate future. So, by referencing the `data-management`-BOM, a controller implicitly exposes the entire API.
Note: we will NOT consider the standalone deployment scenario, especially the communication channel, in this issue
5. Module structure
Every domain object should be reflected in the module structure:
extensions/api/data-management/asset //includes DataAddress
extensions/api/data-management/contractdefinition
extensions/api/data-management/contractnegotiation
extensions/api/data-management/transferprocess
extensions/api/data-management/policy //includes PolicyDefinition
A BOM should exist for the entire API.
Full proposal
Cross-cutting aspects
Terminology
- domain object: (or "business object") an object such as `Asset`, `DataAddress`, `ContractDefinition`, etc. May contain potentially internal information.
- entity: separate database-specific object that is used to persist a domain object. May differ from business objects and may contain database-specific information, such as IDs and foreign keys. For example, the `ContractAgreementEntity` would not contain the entire `Asset`, but merely maintain a foreign key to an `AssetEntity`. Entities are never to be passed outside the persistence layer.
- DTO: public representation of a domain object that is specific to a use case. Should contain only the information that is necessary. Different use cases may warrant different DTOs for the same business object.
Object Lifecycle
Most domain objects go through a lifecycle where transitions might or might not be possible at a given time. In addition, certain transitions may trigger subsequent actions, which could potentially be long-running or even involve human interaction. For example, decommissioning an `Asset` that is involved in a `ContractAgreement` might cause the contract to expire, or might not be possible at all. A similar situation exists for `ContractDefinition`s.
We'll assume that domain objects have some sort of state associated with them, regardless of whether that is implemented using an actual `state` field, a computed state, or anything else.
We propose the following definition of states domain objects can make use of:
- `INITIAL`: object is not included in any communication with other dataspace participants. This state is optional, API clients may choose to "publish immediately".
- `PUBLISHED`: object is (potentially) visible to other participants.
- `DEPRECATED`: object is only available to existing relations, e.g. an Asset is not included in new contract offers anymore.
- `DECOMMISSIONED`: object is not in use anymore (i.e. not included in any active contract agreement or offer).
- `DELETED`: object is not visible to clients anymore, just exists for traceability reasons.
Decommissioning: as a starting point let's go with option a) as it is simpler to implement. If necessary, we can change this later.
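The proposed states could be modeled as an enum with a transition check. This is only a sketch: the proposal names the states but does not spell out which transitions are legal, so the `allowedTargets()` rules below (and the method names) are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Lifecycle states as proposed above. The allowed transitions are an
// assumption made for this sketch; the proposal does not define them.
enum LifecycleState {
    INITIAL, PUBLISHED, DEPRECATED, DECOMMISSIONED, DELETED;

    // Hypothetical rule: an object only ever moves forward through the lifecycle.
    Set<LifecycleState> allowedTargets() {
        switch (this) {
            case INITIAL: return EnumSet.of(PUBLISHED, DELETED);
            case PUBLISHED: return EnumSet.of(DEPRECATED, DECOMMISSIONED);
            case DEPRECATED: return EnumSet.of(DECOMMISSIONED);
            case DECOMMISSIONED: return EnumSet.of(DELETED);
            default: return EnumSet.noneOf(LifecycleState.class); // DELETED is final
        }
    }

    boolean canTransitionTo(LifecycleState target) {
        return allowedTargets().contains(target);
    }
}
```

Whether the state lives in an actual field or is computed, a check of this kind would give every service one place to ask whether a requested transition is possible.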
Authentication
This solely deals with access to the API and is completely separate from the `IdentityService` and connector-to-connector authentication. There should be a thin `authentication-spi` extension that contains an `AuthenticationService`, which could then be backed by an implementation extension for LDAP, AAD, Apache Shiro, Keycloak, OAuth2/OIDC, etc. In a first PoC it could be sufficient to have an API-Key-based authentication implementation.
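A minimal sketch of what the thin SPI and an API-key-based PoC implementation might look like. The proposal only names `AuthenticationService`; the method signature, the class `ApiKeyAuthenticationService` and the header name `X-Api-Key` are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical shape of the SPI; the proposal only names the service.
interface AuthenticationService {
    // headers: request header names mapped to their values
    boolean isAuthenticated(Map<String, String> headers);
}

// API-key-based implementation as suggested for a first PoC.
// The header name "X-Api-Key" is an assumption made for this sketch.
class ApiKeyAuthenticationService implements AuthenticationService {
    static final String API_KEY_HEADER = "X-Api-Key";
    private final Set<String> validKeys;

    ApiKeyAuthenticationService(Set<String> validKeys) {
        this.validKeys = Set.copyOf(validKeys);
    }

    @Override
    public boolean isAuthenticated(Map<String, String> headers) {
        String key = headers.get(API_KEY_HEADER);
        // A missing header is treated as "not authenticated"
        return key != null && validKeys.contains(key);
    }
}
```

Because controllers would depend only on the interface, an LDAP or OAuth2 backed extension could later replace the API-key implementation without touching the endpoints.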
Updating and deleting items
Modifying certain domain objects could potentially have complicated and legally relevant consequences, therefore there is no `UPDATE` API for those domain objects. There will be `DELETE` endpoints, which return an error when the item in question cannot be deleted, e.g. when an `Asset` is already involved in a `ContractAgreement`.
Consequently, whether or not an item can be "updated" (i.e. deleted and recreated) depends on its state, which could be an explicit field on the domain object, or it could be a computed property, e.g. by checking whether an `Asset` is involved in a `ContractAgreement` or `ContractNegotiation`.
That said, creating new items is always possible, but they will have a different identity.
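The "delete fails when the item is in use" rule with a computed state could be sketched as follows. This is an in-memory illustration only; the class `AssetDeletionService` and all of its methods are hypothetical and do not correspond to existing code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal in-memory sketch; all names here are hypothetical.
class AssetDeletionService {
    private final Map<String, String> assets = new HashMap<>();  // asset ID -> name
    private final Set<String> assetsInUse = new HashSet<>();     // IDs referenced by agreements or negotiations

    void create(String id, String name) { assets.put(id, name); }

    void markInvolved(String id) { assetsInUse.add(id); }

    // Computed "state": an asset is deletable only while nothing references it.
    boolean isDeletable(String id) {
        return assets.containsKey(id) && !assetsInUse.contains(id);
    }

    // Throws instead of deleting when the asset is still in use;
    // an endpoint would map this exception to an error response.
    void delete(String id) {
        if (!assets.containsKey(id)) throw new IllegalArgumentException("unknown asset: " + id);
        if (!isDeletable(id)) throw new IllegalStateException("asset is in use: " + id);
        assets.remove(id);
    }

    boolean exists(String id) { return assets.containsKey(id); }
}
```

An "update" would then be a `delete` followed by a `create`, which only succeeds while the item is still deletable.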
API layer
In order to decouple the API from the internal domain objects and entities there should be a dedicated API layer, that contains:
- domain objects: those are the data objects we have now, e.g. `Asset`, `ContractAgreement`, etc. They are not database entities!
- DTOs: specific to a use case. Should contain as little information as possible (e.g. ID instead of full object). Cannot contain internal information.
- transformers: convert DTOs into the corresponding domain objects (`Asset`, `DataAddress`, `ContractDefinition`, etc.) and back.
- services: must model a particular use case. For example, there should be an `AssetService` that has methods `uploadAsset()`, `publishAsset()`, and `rescindAsset()`. Such a service would then maintain references to the `AssetIndex`, the `ContractDefinitionStore`, etc. and be responsible for transactional consistency.
- Pagination/Sorting: all `Read` endpoints must support filtering, sorting and pagination. Even though cursor-based pagination is widely used nowadays, it is mostly used for real-time applications (things like Facebook) where data is expected to change constantly. Here, we'll deal with mostly static data, so we can reap the benefits of offset pagination as it is easier to implement. A paging object could look like this:
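// @Nullable comes from whichever nullability annotation library the project uses.
// SortOrder is assumed to be an enum with the values ASC and DESC.
interface PagingSpec {
    int getOffset(); // default: 0
    int getLimit(); // default: 50
    @Nullable String getSortCriteria(); // refers to one particular field of the object
    @Nullable String getFilterExpression();
    SortOrder getSortOrder(); // can be ASC or DESC
}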
This will greatly improve usability and performance in UI views such as tables and lists, and will reduce stress on the involved systems.
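A minimal sketch of how offset, limit and sort order might be applied, here against an in-memory list of strings. The class `OffsetPaging` and its `page` method are hypothetical; a real store would push these parameters down into the database query rather than sort in memory.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of offset pagination over an in-memory list.
class OffsetPaging {
    // Sorts the items, skips "offset" elements and returns at most "limit" elements.
    static List<String> page(List<String> items, int offset, int limit, boolean ascending) {
        Comparator<String> order = Comparator.naturalOrder();
        if (!ascending) {
            order = order.reversed();
        }
        return items.stream()
                .sorted(order)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

An offset beyond the end of the data simply yields an empty page, which is convenient for UI tables that request "the next page" until nothing comes back.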
On a related note, the domain objects that we have now, e.g. `Asset`, `ContractDefinition`, will most certainly differ from the actual database entities. For example, a `ContractDefinition` entity would not contain the `Asset` as a nested object, but would maintain a foreign key to it. Thus, the way domain objects are modeled may/will differ from how database entities are modeled.
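The separation between domain objects, DTOs and transformers described in this section could be sketched like this. The fields of `Asset`, the `AssetDto` class and the `AssetTransformer` methods are all assumptions made for illustration, not the existing classes.

```java
import java.util.Map;

// Domain object: may carry internal information. Fields are hypothetical.
class Asset {
    final String id;
    final String name;
    final Map<String, String> internalProperties;

    Asset(String id, String name, Map<String, String> internalProperties) {
        this.id = id;
        this.name = name;
        this.internalProperties = internalProperties;
    }
}

// DTO for a hypothetical "list assets" use case: exposes only what the client needs.
class AssetDto {
    final String id;
    final String name;

    AssetDto(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Transformer: converts between the DTO and the domain object.
class AssetTransformer {
    // Internal properties are deliberately dropped on the way out.
    AssetDto toDto(Asset asset) {
        return new AssetDto(asset.id, asset.name);
    }

    // A DTO carries no internal information, so the domain object starts without any.
    Asset toDomain(AssetDto dto) {
        return new Asset(dto.id, dto.name, Map.of());
    }
}
```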
Assets + DataAddress
Create
- request must contain the `Asset` and the `DataAddress`, both stored transactionally. The `DataAddress` may initially be an empty skeleton.
- `DataLoader` can be used to do that.
Read
- `GET` endpoints for `Asset` and `DataAddress` should exist.
- use the `AssetIndex` with a `SELECT_ALL` query
- UI implementations could create filter expressions -> use `Criterion` and add pagination
Update
- Updating the `DataAddress` is done through a separate endpoint. Copy/clone is not needed, `DataAddress`es can just be modified as they are an internal concept and do not modify the characteristics of an `Asset`.
- `Assets` cannot be updated.
Delete
- Deleting `Asset`s is only possible while they are not included in an ongoing `ContractNegotiation` or `ContractAgreement`. The request will fail otherwise.
- If an `Asset` is still "unpublished" in this sense, deleting it is straightforward.
- Again, a Danger Zone could be implemented to blindly delete an `Asset` without looking left or right.
Danger Zone
As a convenience feature we could implement a danger zone, where deleting assets is possible but no validation is performed, which could put the application's
data backend in an invalid or inconsistent state.
Policies + PolicyTemplates
Policies are pre-loaded as so-called `PolicyTemplates` and are stored in a `PolicyTemplateStore`. Upon instantiation of a policy, e.g. in the `ContractDefinition`, the template is copied into the `ContractDefinition` with a new policy ID.
The reason we propose to actually copy the policy, as opposed to referencing it, is that referencing them would clutter the `Read` endpoint with a potentially high number of almost identical policies. Assuming that this endpoint will be used to select policies for the creation of `ContractDefinition`s, it is simply a matter of usability. Furthermore, a `ContractDefinition` is legally binding and should remain in place, even if the backing policy changes at some point.
In order to bring this feature back in line with other domain endpoints, there could be an additional `PolicyStore`, where all instantiated policies are stored, so as not to persist them embedded in `ContractDefinition`s.
Create
No special considerations. Copies a `PolicyTemplate` into the `PolicyStore`, effectively "instantiating" it.
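Instantiating a template by copying it under a new ID could look roughly like this. The sketch uses plain maps in place of the `PolicyTemplateStore` and `PolicyStore`; the `Policy` class with its `rule` field and the `PolicyInstantiator` class are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, heavily simplified policy: an ID plus a placeholder for the content.
class Policy {
    final String id;
    final String rule;

    Policy(String id, String rule) {
        this.id = id;
        this.rule = rule;
    }
}

class PolicyInstantiator {
    private final Map<String, Policy> templateStore = new HashMap<>(); // stands in for the PolicyTemplateStore
    private final Map<String, Policy> policyStore = new HashMap<>();   // stands in for the PolicyStore

    void addTemplate(Policy template) {
        templateStore.put(template.id, template);
    }

    // Copies the template under a fresh ID. The copy keeps no reference back to
    // its template, so later template changes do not affect it.
    Policy instantiate(String templateId) {
        Policy template = templateStore.get(templateId);
        if (template == null) {
            throw new IllegalArgumentException("unknown template: " + templateId);
        }
        Policy copy = new Policy(UUID.randomUUID().toString(), template.rule);
        policyStore.put(copy.id, copy);
        return copy;
    }

    Policy findInstance(String id) {
        return policyStore.get(id);
    }
}
```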
Update
`PolicyTemplates` can simply be updated using `copy/clone` semantics, without special considerations. Instantiated `Policy` objects are immutable.
Delete
`PolicyTemplates` can simply be decommissioned (using the lifecycle process), without special considerations.
ContractDefinitions
Create
- request must contain the IDs of `contractPolicy` and `accessPolicy` (cf. previous section)
- Policies must pre-exist in a `PolicyTemplateStore`, and upon creating the `ContractDefinition` the policy is simply copied.
- Those copies have different `PolicyId` values, so they are technically different policies; there is no way to back-reference the `PolicyTemplate`.
Read
- the `GET` endpoint should accept an optional paging object
- if we want to re-use the `ContractDefinitionStore`, the interface needs to be amended to accept those parameters
Update
No update, just "create-new".
Delete
Deleting a `ContractDefinition` is only possible as long as there is no `ContractAgreement` based on it. There is no way of knowing whether a `ContractDefinition` is involved in an ongoing `ContractNegotiation`, so deleting it will cause the negotiation to fail.
Lifecycle aspects
Decommissioning: The `ContractDefinitionStore` cannot return decommissioned or deprecated definitions.
ContractNegotiations
Read
No special considerations, filtering and pagination must be implemented.
Update
Not possible through the API.
Delete
Not possible through the API.
Cancel
There could be a `CANCEL` endpoint which would effectively move the negotiation to a (new) `CANCELLING/CANCELLED` state.
Decline
See `Cancel`; would move the negotiation to the `DECLINING/DECLINED` state.
Q: Should the `ContractNegotiationManager` use `copy/clone` semantics as well?
ContractOffers
Create/Read/Update/Delete
`ContractOffers` are not domain objects in the conventional sense; they are generated on the fly for every participant, so we cannot directly "touch" them. They are protocol details rather than domain objects. Rescinding an offer would have to be done by updating the `ContractDefinition`. To browse other participants' offerings, see use case "Browse Catalog".
TransferProcesses
Create/Update/Delete
Not possible.
Read
No special considerations, filtering and paging must be implemented.
Cancel
Since there is no `delete` operation, the `CANCEL` endpoint would effectively send a `TransferProcess` to a `CANCELLED` state in the next tick-over. The `CommandQueue` must be used for this.
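The "enqueue now, apply on the next tick" behaviour could be sketched as below. This is not the actual `CommandQueue` API; the class `TransferProcessManagerSketch`, its methods and the use of plain strings for states are assumptions chosen to keep the illustration short.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch only: this is not the real CommandQueue API, just an illustration of the idea.
class TransferProcessManagerSketch {
    private final Map<String, String> states = new HashMap<>();     // process ID -> state
    private final Queue<String> cancelCommands = new ArrayDeque<>(); // IDs of processes to cancel

    void register(String processId, String initialState) {
        states.put(processId, initialState);
    }

    // Called by the CANCEL endpoint: only enqueues a command, the state is untouched.
    void enqueueCancel(String processId) {
        cancelCommands.add(processId);
    }

    // One iteration of the state machine loop: drains and applies pending commands.
    void tick() {
        String id;
        while ((id = cancelCommands.poll()) != null) {
            // Unknown process IDs are silently ignored in this sketch
            states.computeIfPresent(id, (key, state) -> "CANCELLED");
        }
    }

    String stateOf(String processId) {
        return states.get(processId);
    }
}
```

Routing the cancellation through a queue means the endpoint never mutates a process concurrently with the state machine; the change becomes visible only after the next tick.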
Note: there will be other APIs pertaining to TransferProcesses where they can be updated. Necessary endpoints would be:
- PROVISIONING/DEPROVISIONING done (= state changes, could be done with status checkers that listen on a webhook)
- USAGE POLICY expired
- DELETION confirmed
- DATA HAS CHANGED (sending notifications to EDC, e.g. when an AI model has improved)
- The API of backend data services needs to be done as well.
Use Cases
- Upload an Asset
- Publish Asset
- Rescind/unpublish an Asset
- Browse Catalog
- Request access to an Asset (i.e. start negotiation)
- Cancel request (= decline or cancel negotiation)
- Transfer an asset, potentially enter data destination on-the-fly
- Cancel data transfer, up until it's actually in `IN_PROGRESS`
- Deprovision data transfer
Linked Issues - Preliminary work:
- [x] #583
- [x] #584
- [x] #593
- [x] #586
Linked Issues - endpoints
- [x] #587
- [x] #588
- [x] #589
- [x] #590
- [x] #591
This Issue replaces/supersedes #476
[EDIT 1] kicked "copy/clone"
[EDIT 2] corrected some whitespaces
[EDIT 3] update description and changed "TL;DR" -> "Core aspects", added sub-issues