This endpoint lets you search for relevant passages in up to 100 PDF documents by using advanced filters and semantic search queries.
Whereas the documents search endpoint returns a list of relevant documents, this endpoint returns the most relevant passages within a set of documents.
For each query you have to specify a list of document ids using the in filter (see example).
Semantic search works by comparing the semantic meaning of the search query to the semantic meaning of the document passages. The document passages are automatically derived and indexed upon file uploads (PDF only).
A natural flow would be:
/documents/status to check if the document is ready (see beta documentation)
/documents/passages/search to find relevant passages in the document(s)

This endpoint requires a list of document ids that you want to run semantic search on. You cannot currently search through all passages for the entire project in a single query.
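Because every query must name its documents, searching a larger set means splitting the ids into several requests. The following is a minimal sketch, assuming the "up to 100 documents" limit applies to the number of ids in the in filter of a single request. The names `chunk_ids` and `batch_request_bodies` are hypothetical helpers, not part of the API, and the code only builds request bodies without sending them.

```python
from __future__ import annotations

from typing import Iterator

# Assumption: the "up to 100 PDF documents" limit applies per request.
MAX_DOCUMENTS_PER_QUERY = 100


def chunk_ids(document_ids: list[int], size: int = MAX_DOCUMENTS_PER_QUERY) -> Iterator[list[int]]:
    """Yield consecutive slices of document_ids with at most `size` ids each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(document_ids), size):
        yield document_ids[start:start + size]


def batch_request_bodies(query: str, document_ids: list[int]) -> list[dict]:
    """Build one passages search request body per batch of document ids."""
    return [
        {
            "filter": {
                "and": [
                    {"semanticSearch": {"property": ["content"], "value": query}},
                    {"in": {"property": ["document", "id"], "values": batch}},
                ]
            }
        }
        for batch in chunk_ids(document_ids)
    ]
```

Each batch would return its own ranked list; how to merge rankings across batches is left open here.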
Search results are ranked by relevance, with the most relevant appearing first. How relevance is determined depends on the filter you choose:
Lexical Search Filter: This filter looks for exact word matches, making it ideal for finding specific dates, names, or keywords. Use this when you know exactly what you’re looking for. This is the default search behavior for the Documents API.
Semantic Search Filter: This filter understands the meaning behind words and phrases, even if they don’t match exactly. It’s useful for broader searches or when you’re looking for related ideas or concepts.
Choose the filter based on your needs: use Lexical for precise results and Semantic for more general, context-based searches.
When both filters are used together, the results are combined to give you a balanced, relevant list. For best results, we recommend starting with both filters active.
The following request will return document passages matching the specified search query for the document ids 1, 2, and 3.
{
  "filter": {
    "and": [
      {
        "semanticSearch": {
          "property": ["content"],
          "value": "I have an overheating pump that is 85 degrees celsius, what dangers are there with higher temperatures"
        }
      },
      {
        "in": {
          "property": ["document", "id"],
          "values": [1, 2, 3]
        }
      }
    ]
  }
}
If you need keyword matching, you can add a lexicalSearch filter. This helps with edge cases where it's hard to extract meaning, such as numbers (e.g. dates), names, etc.
{
  "filter": {
    "and": [
      {
        "semanticSearch": {
          "property": ["content"],
          "value": "I have an overheating pump that is 85 degrees celsius, what dangers are there with higher temperatures"
        }
      },
      {
        "lexicalSearch": {
          "property": ["content"],
          "value": "pump AJ523-253-133"
        }
      },
      {
        "in": {
          "property": ["document", "id"],
          "values": [1, 2, 3]
        }
      }
    ]
  }
}
Doing just lexical search is also possible:
{
  "filter": {
    "and": [
      {
        "lexicalSearch": {
          "property": ["content"],
          "value": "pump AJ523-253-133"
        }
      },
      {
        "in": {
          "property": ["document", "id"],
          "values": [1, 2, 3]
        }
      }
    ]
  }
}
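The three request bodies above differ only in which search filters sit next to the in filter. The sketch below makes that explicit with a small builder. `build_passage_filter` is a hypothetical helper, not part of any SDK; it only produces JSON in the shape shown in the examples.

```python
from __future__ import annotations

import json
from typing import Optional


def build_passage_filter(
    document_ids: list[int],
    semantic_query: Optional[str] = None,
    lexical_query: Optional[str] = None,
) -> dict:
    """Combine optional semantic/lexical search filters with the required in filter."""
    if not document_ids:
        # The endpoint requires a list of document ids for every query.
        raise ValueError("at least one document id is required")

    clauses: list[dict] = []
    if semantic_query is not None:
        clauses.append({"semanticSearch": {"property": ["content"], "value": semantic_query}})
    if lexical_query is not None:
        clauses.append({"lexicalSearch": {"property": ["content"], "value": lexical_query}})
    clauses.append({"in": {"property": ["document", "id"], "values": list(document_ids)}})
    return {"filter": {"and": clauses}}


if __name__ == "__main__":
    body = build_passage_filter(
        [1, 2, 3],
        semantic_query="what dangers are there with higher temperatures",
        lexical_query="pump AJ523-253-133",
    )
    print(json.dumps(body, indent=2))
```

Passing only `lexical_query` gives the lexical-only body, and passing both gives the combined form recommended as a starting point.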
Filtering uses a special JSON filtering language.
It's flexible and consists of a number of different "leaf" filters, which can be combined arbitrarily using the boolean clause and.
These are the same filters used in the search, list and aggregate endpoints, with some restrictions and the addition of the "semanticSearch" filter.
Leaf filter | Supported fields | Description |
---|---|---|
equals | Non-array type fields | Only includes results that are equal to the specified value. |
in | Non-array type fields | Only includes results that are equal to one of the specified values. |
semanticSearch | content | |
lexicalSearch | content | |
The following overview shows the properties you can filter on and which filter applies to which property.
Property | Type | Applicable filters |
---|---|---|
["document", "id"] | integer | equals, in |
["document", "externalId"] | string | equals, in |
["document", "instanceId"] | object | equals, in |
["type"] | string | equals |
["content"] | string | semanticSearch, lexicalSearch |
{
  "filter": {
    "and": [
      {
        "in": {
          "property": ["document", "instanceId"],
          "values": [
            {"space": "space1", "externalId": "my-ext-id-1"},
            {"space": "space1", "externalId": "my-ext-id-45"},
            {"space": "space1", "externalId": "some-other-ext-id"}
          ]
        }
      },
      {
        "semanticSearch": {
          "property": ["content"],
          "value": "A person walking a dog at night time"
        }
      },
      {
        "lexicalSearch": {
          "property": ["content"],
          "value": "11:23pm"
        }
      }
    ]
  }
}
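The property overview above can be turned into a client-side check that catches a mismatched filter before a request is sent. This is only a sketch: `APPLICABLE_FILTERS` and `find_filter_problems` are hypothetical names, the mapping is copied from the table, and the function assumes the body uses a single top-level and clause containing leaf filters, as in the examples on this page.

```python
from __future__ import annotations

# Copied from the property overview: property path -> applicable leaf filters.
APPLICABLE_FILTERS: dict[tuple[str, ...], set[str]] = {
    ("document", "id"): {"equals", "in"},
    ("document", "externalId"): {"equals", "in"},
    ("document", "instanceId"): {"equals", "in"},
    ("type",): {"equals"},
    ("content",): {"semanticSearch", "lexicalSearch"},
}


def find_filter_problems(body: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: list[str] = []
    for clause in body.get("filter", {}).get("and", []):
        for leaf_name, leaf in clause.items():
            prop = tuple(leaf.get("property", []))
            allowed = APPLICABLE_FILTERS.get(prop)
            if allowed is None:
                problems.append(f"unknown property {list(prop)}")
            elif leaf_name not in allowed:
                problems.append(f"{leaf_name} is not applicable to {list(prop)}")
    return problems
```

A local check like this does not replace the server's own validation; it only mirrors the table as written here.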
Fields to be set for the search request.
List of the most relevant document passages for a given query. The results are sorted by relevance and contain metadata such as page numbers.
The response for a failed request.
{
  "filter": {
    "and": [
      {
        "prefix": {
          "property": ["name"],
          "value": "Report"
        }
      },
      {
        "equals": {
          "property": ["type"],
          "value": "PDF"
        }
      }
    ]
  },
  "expansionStrategy": {
    "strategy": "symmetric",
    "chunk_count": 1
  },
  "limit": 10
}
{
  "items": [
    {
      "text": "Pump installation\nFollow these 15 steps:\n ...",
      "document": {
        "id": 1234
      },
      "locations": [
        {
          "page_number": 3,
          "left": 68.78,
          "right": 478.56,
          "top": 75.04,
          "bottom": 386.1
        },
        {
          "page_number": 4,
          "left": 68.78,
          "right": 478.56,
          "top": 75.04,
          "bottom": 386.1
        }
      ]
    }
  ]
}
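A passage can span several pages, so each item carries a list of locations. The sketch below condenses a response of the shape shown above into one summary per passage. `summarize_passages` is a hypothetical helper, and it assumes every item has the text, document.id and locations fields seen in the sample.

```python
from __future__ import annotations


def summarize_passages(response: dict) -> list[dict]:
    """Return document id, sorted page numbers and first text line for each passage."""
    summaries: list[dict] = []
    for item in response.get("items", []):
        # Collect distinct page numbers across all locations of this passage.
        pages = sorted({loc["page_number"] for loc in item.get("locations", [])})
        text = item.get("text", "")
        summaries.append(
            {
                "document_id": item["document"]["id"],
                "pages": pages,
                "preview": text.splitlines()[0] if text else "",
            }
        )
    return summaries
```

The order of the input items is preserved, so the summaries stay sorted by relevance.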