Cognite Data Fusion API

Create entity matcher model

Train a model that predicts matches between entities (for example, time series names to asset names). This is also known as fuzzy joining. If no trueMatches (labeled data) are provided, a static (unsupervised) model is trained; otherwise, a machine-learned (supervised) model is trained.
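
As a rough illustration of the rule above, the following Python sketch assembles a request body and reports which kind of model it would lead to. The helper names `build_model_request` and `model_kind` are hypothetical and not part of any SDK; the item limits and `featureType` values are taken from the request body schema on this page.

```python
FEATURE_TYPES = {
    "simple", "insensitive", "bigram", "frequencyweightedbigram",
    "bigramextratokenizers", "bigramcombo",
}
MAX_ITEMS = 2_000_000


def build_model_request(sources, targets, true_matches=None, feature_type="simple"):
    """Assemble a request body dict (hypothetical helper, not an SDK call)."""
    if len(sources) > MAX_ITEMS:
        raise ValueError("sources allows at most 2000000 items")
    if not 1 <= len(targets) <= MAX_ITEMS:
        raise ValueError("targets requires between 1 and 2000000 items")
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown featureType: {feature_type!r}")
    body = {"sources": sources, "targets": targets, "featureType": feature_type}
    if true_matches:
        # Labeled data present: a supervised model is trained.
        body["trueMatches"] = true_matches
    return body


def model_kind(body):
    """Return the kind of model the body leads to, per the description above."""
    return "supervised" if body.get("trueMatches") else "unsupervised"


sources = [{"id": 10, "name": "a_name"}]
targets = [{"id": 6, "name": "some_name"}]

unsupervised_body = build_model_request(sources, targets)
supervised_body = build_model_request(
    sources, targets, true_matches=[{"sourceId": 10, "targetId": 6}]
)
print(model_kind(unsupervised_body), model_kind(supervised_body))
```

Omitting the `trueMatches` key entirely when there is no labeled data keeps the body consistent with the schema, which requires at least one item whenever the array is present.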

Security: oidc-token or oauth2-client-credentials or oauth2-open-industrial-data or oauth2-auth-code

Request

Request Body schema: application/json

sources required	Array of objects (Sources) [ 0 .. 2000000 ] items List of custom source objects to match from, for example, time series. String key -> value. Only string values are considered in the matching. Both `id` and `externalId` fields are optional; they are mandatory only if the item is to be referenced in `trueMatches`.
targets required	Array of objects (Targets) [ 1 .. 2000000 ] items List of custom target objects to match to, for example, assets. String key -> value. Only string values are considered in the matching. Both `id` and `externalId` fields are optional; they are mandatory only if the item is to be referenced in `trueMatches`.
trueMatches	Array of objects (TrueMatches) [ 1 .. 2000000 ] items A list of confirmed source/target matches, which will be used to train the model. If omitted, an unsupervised model is trained.
externalId	string (CogniteExternalId) <= 255 characters The external ID provided by the client. Must be unique for the resource type.
name	string (ModelName) <= 256 characters User defined name.
description	string (ModelDescription) <= 500 characters User defined description.
featureType	string Default: "simple" Each feature type defines one combination of features that is created and used in the entity matcher model. All features are based on matching tokens. Tokens are defined at the top of the Entity matching section. The options are:
	Simple: Calculates the cosine-distance similarity score for each of the pairs of fields defined in `matchFields`. This is the fastest option.
	Insensitive: Similar to `simple`, but ignores lowercase/uppercase differences.
	Bigram: Similar to `simple`, but adds a similarity score based on matching bigrams of the tokens.
	FrequencyWeightedBigram: Similar to `bigram`, but gives higher weights to less commonly occurring tokens.
	BigramExtraTokenizers: Similar to `bigram`, but able to learn that leading zeros, spaces, and uppercase/lowercase differences should be ignored in matching.
	BigramCombo: Calculates all of the above options, relying on the model to determine the appropriate features to use. Hence, this option is only appropriate if there is labeled data (trueMatches). This is the slowest option.
	Enum: "simple" "insensitive" "bigram" "frequencyweightedbigram" "bigramextratokenizers" "bigramcombo"
matchFields	Array of objects (MatchFields) List of pairs of fields from the target and source items, used to calculate features. All source and target items should have all the `source` and `target` fields specified here.
classifier	string (Classifier) The classifier used in the model. Only relevant if there are trueMatches/labeled data and a supervised model is fitted. Enum: "randomforest" "decisiontree" "logisticregression" "augmentedlogisticregression" "augmentedrandomforest"
ignoreMissingFields	boolean (IgnoreMissingFields) If true, missing fields in `sources` or `targets` entities, for fields set in `matchFields`, are replaced with empty strings. Otherwise, an error is returned if any data is missing.
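
To make the `featureType` options more concrete, here is a minimal Python sketch of a token-based cosine similarity, roughly in the spirit of `simple`, `insensitive`, and `bigram`. It is not the service's implementation. It assumes that a token is a maximal run of letters or a maximal run of digits, and that a bigram is a pair of adjacent tokens; the actual token definition is given in the Entity matching section and may differ. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text, lowercase=False):
    # Assumption: a token is a maximal run of letters or a maximal run of digits.
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z]+|[0-9]+", text)


def bigrams(tokens):
    # Assumption: a bigram is a pair of adjacent tokens.
    return list(zip(tokens, tokens[1:]))


def cosine_similarity(items_a, items_b):
    # Cosine similarity between the count vectors of two item lists.
    a, b = Counter(items_a), Counter(items_b)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


source_name = "21PT1019"
target_name = "21-pt-1019"

# Case-sensitive comparison: "PT" and "pt" count as different tokens.
simple_score = cosine_similarity(tokenize(source_name), tokenize(target_name))

# Case-insensitive comparison: all three tokens match.
insensitive_score = cosine_similarity(
    tokenize(source_name, lowercase=True), tokenize(target_name, lowercase=True)
)

# Additional score from adjacent token pairs.
bigram_score = cosine_similarity(
    bigrams(tokenize(source_name, lowercase=True)),
    bigrams(tokenize(target_name, lowercase=True)),
)
print(simple_score, insensitive_score, bigram_score)
```

In this sketch the case-sensitive score is lower than the case-insensitive one because only two of the three tokens match exactly, which is the kind of difference the choice of `featureType` is meant to control.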

Responses

200

Success

400

The response for a failed request.

POST /context/entitymatching

Request samples

Payload

application/json

{
  "sources": [
    {
      "id": 10,
      "name": "a_name",
      "field": "value",
      "ignoredfield": {
        "key": "value"
      }
    }
  ],
  "targets": [
    {
      "id": 6,
      "name": "some_name",
      "somefield": "value",
      "ignoredfield": {
        "key": "value"
      }
    }
  ],
  "trueMatches": [
    {
      "sourceId": 23,
      "targetExternalId": "my.known.id"
    }
  ],
  "externalId": "my.known.id",
  "name": "simple_model_1",
  "description": "Simple model 1",
  "featureType": "simple",
  "matchFields": [
    {
      "source": "name",
      "target": "name"
    },
    {
      "source": "name",
      "target": "someField"
    }
  ],
  "classifier": "randomforest",
  "ignoreMissingFields": true
}
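
A payload like the one above could be wrapped in an HTTP request as in the following Python sketch, which uses only the standard library. The base URL, the `/api/v1/projects/{project}` prefix, the project name, and the bearer token are placeholders and assumptions for illustration; only the `POST` method and the `/context/entitymatching` path come from this page. The request is built but not sent.

```python
import json
import urllib.request


def make_create_request(base_url, project, token, payload):
    """Build (but do not send) a POST request. Hypothetical helper, not an SDK call."""
    # Assumption: the endpoint path is prefixed with /api/v1/projects/{project}.
    url = f"{base_url}/api/v1/projects/{project}/context/entitymatching"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
    )


payload = {
    "sources": [{"id": 10, "name": "a_name"}],
    "targets": [{"id": 6, "name": "some_name"}],
    "featureType": "simple",
    "matchFields": [{"source": "name", "target": "name"}],
    "ignoreMissingFields": True,
}

request = make_create_request("https://api.example.com", "my-project", "<token>", payload)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid host and credentials):
# with urllib.request.urlopen(request) as response:
#     model = json.load(response)
```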

Response samples

application/json

{
  "id": 1,
  "externalId": "my.known.id",
  "status": "Queued",
  "createdTime": 1730204346000,
  "startTime": 1730204346000,
  "statusTime": 1730204346000,
  "errorMessage": null,
  "name": "simple_model_1",
  "description": "Simple model 1",
  "featureType": "simple",
  "matchFields": [
    {
      "source": "name",
      "target": "name"
    },
    {
      "source": "name",
      "target": "someField"
    }
  ],
  "ignoreMissingFields": true,
  "classifier": "randomforest",
  "originalId": 111
}
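
The response can be read as in the Python sketch below. It assumes that the time fields are milliseconds since the Unix epoch, which the sample values suggest but this page does not state. It also assumes that a status other than `Queued` or `Running` means the job has stopped; only `Queued` appears in the sample, so the other status name is a guess.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the response sample above.
response_text = """
{
  "id": 1,
  "externalId": "my.known.id",
  "status": "Queued",
  "createdTime": 1730204346000,
  "errorMessage": null,
  "featureType": "simple"
}
"""

model = json.loads(response_text)


def to_datetime(epoch_ms):
    # Assumption: timestamps are milliseconds since the Unix epoch (UTC).
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


created = to_datetime(model["createdTime"])

# Assumption: "Queued" and "Running" are the in-progress states.
still_pending = model["status"] in {"Queued", "Running"}

print(model["id"], model["status"], created.isoformat(), still_pending)
```

Because the model is created asynchronously, a client would typically keep the returned `id` and check the status again later rather than expect a trained model in this response.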
