Semantic search

This endpoint lets you search for relevant parts (passages) of up to 100 PDF documents by using advanced filters and semantic search queries. Every query must specify a list of document ids with the "in" filter (see the examples).

Semantic search works by comparing the meaning of the search query with the meaning of the document passages. Passages are derived and indexed automatically when a file is uploaded (PDF only).

A typical workflow is:

  1. Upload a document
  2. Query /documents/status to check if the document is ready (see alpha documentation)
  3. Query /documents/semantic/search to find relevant passages in the document(s)
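The three steps can be sketched as a wait-then-search loop. This is a minimal sketch with several assumptions. `is_ready` and `search` are hypothetical callables that stand in for your calls to /documents/status and /documents/semantic/search, because the status response format is described in the alpha documentation and not here. The timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def search_when_ready(
    document_ids: Sequence[int],
    is_ready: Callable[[Sequence[int]], bool],
    search: Callable[[Sequence[int]], T],
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
) -> T:
    """Wait until the uploaded documents are indexed, then run the search.

    `is_ready` (hypothetical) should wrap a call to /documents/status, and
    `search` (hypothetical) a call to /documents/semantic/search.
    """
    deadline = time.monotonic() + timeout_s
    # Step 2: poll the status endpoint until the passages are indexed.
    while not is_ready(document_ids):
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"documents {list(document_ids)} not ready after {timeout_s} s"
            )
        time.sleep(poll_interval_s)
    # Step 3: the documents are ready, so search their passages.
    return search(document_ids)
```

Injecting the two calls as parameters keeps the waiting logic independent of any particular HTTP client.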

This endpoint requires a list of the document ids you want to search. You cannot currently search all passages in a project with a single query.
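Each request is limited to an explicit list of at most 100 document ids, so searching a larger set means sending one request per batch. The following is a minimal sketch that assumes the request body shape used in the Examples section. The helper name `batched_request_bodies` is hypothetical, and the sketch does not cover how to merge results from separate batches.

```python
from typing import Any, Iterator, Sequence

MAX_IDS_PER_REQUEST = 100  # the endpoint accepts up to 100 documents per query


def batched_request_bodies(
    document_ids: Sequence[int], query: str
) -> Iterator[dict[str, Any]]:
    """Yield one semantic search request body per batch of <= 100 document ids."""
    unique_ids = list(dict.fromkeys(document_ids))  # drop duplicates, keep order
    for start in range(0, len(unique_ids), MAX_IDS_PER_REQUEST):
        batch = unique_ids[start:start + MAX_IDS_PER_REQUEST]
        yield {
            "filter": {
                "and": [
                    {"semanticSearch": {"property": ["content"], "value": query}},
                    # Every request must restrict the search to explicit ids.
                    {"in": {"property": ["id"], "values": batch}},
                ]
            }
        }
```

For example, 250 ids produce three request bodies, with 100, 100 and 50 ids.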

Ordering

Search results are ranked by relevance, with the most relevant appearing first. How relevance is determined depends on the filter you choose:

  • Lexical Search Filter: This filter looks for exact word matches, making it ideal for finding specific dates, names, or keywords. Use this when you know exactly what you’re looking for. This is the default search behavior for the Documents API.

  • Semantic Search Filter: This filter understands the meaning behind words and phrases, even if they don’t match exactly. It’s useful for broader searches or when you’re looking for related ideas or concepts.

Choose the filter based on your needs: use Lexical for precise results and Semantic for more general, context-based searches.

When both filters are used together, the results are combined to give you a balanced, relevant list. For best results, we recommend starting with both filters active.
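The choice between lexical search, semantic search, or both can be expressed as a small helper that assembles the "and" clause. This is only a sketch. The name `build_filter` is hypothetical. Requiring at least one search filter is an assumption of the sketch, not a documented rule.

```python
from typing import Any, Optional, Sequence


def build_filter(
    document_ids: Sequence[int],
    semantic_query: Optional[str] = None,
    lexical_query: Optional[str] = None,
) -> dict[str, Any]:
    """Combine semantic and/or lexical search with the required "in" filter."""
    clauses: list[dict[str, Any]] = []
    if semantic_query:
        # Meaning-based matching: suited to broad, context-based questions.
        clauses.append(
            {"semanticSearch": {"property": ["content"], "value": semantic_query}}
        )
    if lexical_query:
        # Exact word matching: suited to dates, names and keywords.
        clauses.append(
            {"lexicalSearch": {"property": ["content"], "value": lexical_query}}
        )
    if not clauses:
        # Assumption of this sketch: at least one search filter must be given.
        raise ValueError("give a semantic_query, a lexical_query, or both")
    # Every request must restrict the search to explicit document ids.
    clauses.append({"in": {"property": ["id"], "values": list(document_ids)}})
    return {"filter": {"and": clauses}}
```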

Examples

The following request returns the passages that match the search query in the documents with ids 1, 2, and 3.

{
    "filter":{
        "and": [
            {
                "semanticSearch":{
                    "property":["content"],
                    "value":"I have an overheating pump that is 85 degrees celsius, what dangers are there with higher temperatures"
                }
            },
            {
                "in":{
                    "property":["id"],
                    "values":[1, 2, 3]
                }
            }
        ]
    }
}

If you need keyword matching, you can add a lexicalSearch filter. This helps with edge cases where meaning is hard to extract, such as numbers (e.g. dates) and names.

{
    "filter":{
        "and": [
            {
                "semanticSearch":{
                    "property":["content"],
                    "value":"I have an overheating pump that is 85 degrees celsius, what dangers are there with higher temperatures"
                }
            },
            {
                "lexicalSearch":{
                    "property":["content"],
                    "value":"pump AJ523-253-133"
                }
            },
            {
                "in":{
                    "property":["id"],
                    "values":[1, 2, 3]
                }
            }
        ]
    }
}
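One way to decide when a lexical filter is worth adding is to look for tokens that carry little semantic meaning, such as identifiers, times or numbers that contain digits. The heuristic below is purely illustrative and not part of the API. The helper names `exact_match_terms` and `filter_with_optional_lexical` are hypothetical.

```python
import string
from typing import Any


def exact_match_terms(query: str) -> list[str]:
    """Return whitespace-separated tokens that contain a digit (rough heuristic)."""
    terms = []
    for token in query.split():
        # Drop surrounding punctuation such as trailing commas or periods.
        token = token.strip(string.punctuation)
        if any(ch.isdigit() for ch in token):
            terms.append(token)
    return terms


def filter_with_optional_lexical(query: str, document_ids: list[int]) -> dict[str, Any]:
    """Always search semantically; add a lexicalSearch clause for digit-bearing terms."""
    clauses: list[dict[str, Any]] = [
        {"semanticSearch": {"property": ["content"], "value": query}}
    ]
    terms = exact_match_terms(query)
    if terms:
        clauses.append(
            {"lexicalSearch": {"property": ["content"], "value": " ".join(terms)}}
        )
    clauses.append({"in": {"property": ["id"], "values": document_ids}})
    return {"filter": {"and": clauses}}
```

For the pump question used above, only "85" would be picked out for exact matching.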

Lexical search can also be used on its own:

{
    "filter":{
        "and": [
            {
                "lexicalSearch":{
                    "property":["content"],
                    "value":"pump AJ523-253-133"
                }
            },
            {
                "in":{
                    "property":["id"],
                    "values":[1, 2, 3]
                }
            }
        ]
    }
}

Filtering

Filtering uses a special JSON filtering language. It is flexible and consists of a number of "leaf" filters, which can be combined arbitrarily using the boolean clause "and". These are the same filters used by the search, list, and aggregate endpoints, with some restrictions and the addition of the "semanticSearch" filter.

Supported leaf filters

equals (supported fields: non-array type fields)
Only includes results that are equal to the specified value.
{
    "equals":{
        "property":["property"],
        "value":"example"
    }
}

in (supported fields: non-array type fields)
Only includes results that are equal to one of the specified values.
{
    "in":{
        "property":["property"],
        "values":[1, 2, 3]
    }
}

semanticSearch (supported fields: content)
Matches passages by the meaning of the query, even when the words do not match exactly.
{
    "semanticSearch":{
        "property":["property"],
        "value":"example"
    }
}

lexicalSearch (supported fields: content)
Matches passages that contain the exact words of the query, such as specific dates, names, or keywords.
{
    "lexicalSearch":{
        "property":["property"],
        "value":"example"
    }
}
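The leaf filters differ only in their key and in whether they take "value" or "values", so they are easy to generate programmatically. The following is a minimal sketch with hypothetical helper names, and it rebuilds the Full example below. Fixing the search filters to the "content" property follows the Properties table.

```python
import json
from typing import Any, Sequence

Filter = dict[str, Any]


def equals(prop: str, value: Any) -> Filter:
    return {"equals": {"property": [prop], "value": value}}


def in_(prop: str, values: Sequence[Any]) -> Filter:
    # "in" is a Python keyword, hence the trailing underscore.
    return {"in": {"property": [prop], "values": list(values)}}


def semantic_search(text: str) -> Filter:
    # Only the "content" property supports semanticSearch.
    return {"semanticSearch": {"property": ["content"], "value": text}}


def lexical_search(text: str) -> Filter:
    # Only the "content" property supports lexicalSearch.
    return {"lexicalSearch": {"property": ["content"], "value": text}}


def and_(*filters: Filter) -> Filter:
    return {"and": list(filters)}


full_example = {
    "filter": and_(
        in_("id", range(1, 10)),
        semantic_search("A person walking a dog at night time"),
        lexical_search("11:23pm"),
    )
}
print(json.dumps(full_example, indent=4))
```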

Properties

The following overview shows the properties you can filter on and which filter applies to which property.

Property      Type      Applicable filters
["id"]        integer   equals, in
["type"]      string    equals
["content"]   string    semanticSearch, lexicalSearch
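Each property accepts only certain filters, so a client can check a filter before sending it. The following sketch encodes the Properties table above. The name `validate_filter` is hypothetical, and the server still decides what it accepts.

```python
from typing import Any

LEAF_FILTERS = {"equals", "in", "semanticSearch", "lexicalSearch"}

# Which leaf filters apply to which property, per the Properties table.
APPLICABLE_FILTERS: dict[str, set[str]] = {
    "id": {"equals", "in"},
    "type": {"equals"},
    "content": {"semanticSearch", "lexicalSearch"},
}


def validate_filter(node: dict[str, Any]) -> None:
    """Raise ValueError for unsupported filters or inapplicable properties."""
    if len(node) != 1:
        raise ValueError(f"expected exactly one filter key, got {sorted(node)}")
    kind, body = next(iter(node.items()))
    if kind == "and":
        # Boolean clause: validate every nested filter recursively.
        for child in body:
            validate_filter(child)
        return
    if kind not in LEAF_FILTERS:
        raise ValueError(f"unsupported filter {kind!r}")
    prop = body.get("property", [])
    if len(prop) != 1 or prop[0] not in APPLICABLE_FILTERS:
        raise ValueError(f"unknown property {prop}")
    if kind not in APPLICABLE_FILTERS[prop[0]]:
        raise ValueError(f"{kind} cannot be applied to {prop}")
```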

Full example

{
    "filter":{
        "and": [
            {
                "in":{
                    "property":["id"],
                    "values":[1, 2, 3, 4, 5, 6, 7, 8, 9]
                }
            },
            {
                "semanticSearch":{
                    "property":["content"],
                    "value":"A person walking a dog at night time"
                }
            },
            {
                "lexicalSearch":{
                    "property":["content"],
                    "value":"11:23pm"
                }
            }
        ]
    }
}
Security: oidc-token or oauth2-client-credentials or oauth2-open-industrial-data or oauth2-auth-code
Request

Header parameters

cdf-version (string)
CDF version header. Use this to specify the requested CDF release.
Example: alpha
Request body schema: application/json (required)

Fields to be set for the search request.

filter (required)
bool filters (and (object)) or (leaf filters (equals (object) or in (object) or semanticSearch (object) or lexicalSearch (object))) (DocumentSemanticFilter)

expansionStrategy (required)
DocumentSemanticSearchPassageExpansionSymmetric (object)
An expansion strategy that widens the text returned for each passage. This is helpful for giving an LLM more context.

limit
integer <int32> [ 1 .. 10 ]
Default: 10
Maximum number of items.
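This page does not spell out how the symmetric strategy widens a passage. The request sample below uses "symmetric" with chunk_count 1. One plausible reading is that chunk_count neighbouring chunks are added on each side of the matched chunk. The sketch assumes that reading purely for illustration and applies it to a local list of chunks. It is not the server's implementation, and `expand_symmetric` is a hypothetical name.

```python
from typing import Sequence


def expand_symmetric(chunks: Sequence[str], hit: int, chunk_count: int = 1) -> list[str]:
    """Return the matched chunk plus up to `chunk_count` neighbours on each side.

    Assumed semantics, for illustration only. The window is clipped at the
    start and end of the document.
    """
    if not 0 <= hit < len(chunks):
        raise IndexError("hit is outside the document")
    if chunk_count < 0:
        raise ValueError("chunk_count must be non-negative")
    start = max(0, hit - chunk_count)
    return list(chunks[start:hit + chunk_count + 1])
```

Under this reading, chunk_count 1 returns up to three chunks: the hit and one neighbour on each side.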

Responses

200
List of the most relevant document passages for a given query. The results are sorted by relevance and contain metadata such as page numbers.

400
The response for a failed request.

POST /documents/semantic/search
Request sample (application/json)
{
    "filter": {
        "and": [ ]
    },
    "expansionStrategy": {
        "strategy": "symmetric",
        "chunk_count": 1
    },
    "limit": 10
}
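The following sketch builds a request like the sample above with Python's standard library. The `cdf-version: alpha` header, the body fields and the 1 to 10 limit range come from this page. The URL pattern `{base_url}/api/v1/projects/{project}/documents/semantic/search`, the bearer-token Authorization header and the helper name are assumptions for illustration. The request is only constructed, not sent, so you can inspect it first.

```python
import json
import urllib.request
from typing import Any, Optional


def semantic_search_request(
    base_url: str,
    project: str,
    token: str,
    search_filter: dict[str, Any],
    limit: int = 10,
    expansion_strategy: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the semantic search endpoint."""
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    if expansion_strategy is None:
        # Same values as the request sample above.
        expansion_strategy = {"strategy": "symmetric", "chunk_count": 1}
    body = {
        "filter": search_filter,
        "expansionStrategy": expansion_strategy,
        "limit": limit,
    }
    # Assumed URL layout; adjust to your cluster and project.
    url = f"{base_url.rstrip('/')}/api/v1/projects/{project}/documents/semantic/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed OIDC bearer token
            "Content-Type": "application/json",
            "cdf-version": "alpha",  # request the alpha release of the endpoint
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen`. The JSON response holds the "items" list shown in the response sample.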
Response sample (application/json)
{
    "items": [
        { }
    ]
}