Supplier Integration Guide

Overview

This document outlines the integration process for suppliers who wish to expose their data securely while allowing our system to fetch necessary information. The integration follows a pull-based model, meaning suppliers do not need to send data proactively but must implement two API endpoints that we will query when needed.

Key Integration Principles

Data Stays on Your Servers
Suppliers maintain full control over their data. Our system will only retrieve what is needed, when it is needed.
Flat Query Structure
The request format will always be a flat dictionary, ensuring consistency in how queries are made across different systems.
Standardized Matching Process
The supplier’s system will be responsible for processing and returning the closest matches based on similarity.
Read-Only Access
The integration is designed for data retrieval only—our system will never write or modify supplier data.

Required Endpoints

1. Matching Endpoint (`/match`)

Purpose

This endpoint receives a structured query representing a company record and returns the closest matches from the supplier’s dataset based on similarity scores.

Expected Behavior

Accepts a query containing structured attributes like company name, tax identifiers, and location details.
Processes the query using a matching algorithm.
Returns a ranked list of the top k closest matches, each with a similarity score between 0 and 1.

Considerations

The similarity function must be designed to handle minor variations in company details (e.g., different spellings or abbreviations).
Data normalization and standardization may improve accuracy.

Data Matching Strategy

Suppliers need to implement a matching function that processes queries and identifies the most relevant records. Below are the key elements:

1. Query Structure

The request will be a flat dictionary with attributes such as:

{
  "company_name": "Example Logistics",
  "SIREN": "123456789",
  "VAT_number": "FR123456789",
  "city": "Paris",
  "postal_code": "75001",
  "country": "FR",
  "email": "contact@example.com"
}

2. Matching Process

Normalize input fields (e.g., case normalization, removal of special characters).
Compare query attributes against internal records.
Rank results by similarity score.

3. Response Format

The endpoint must return an array of the best-matching records, each structured as follows:

{
  "record": {
    "company_name": "Example Logistics",
    "SIREN": "123456789",
    "VAT_number": "FR123456789",
    "city": "Paris",
    "postal_code": "75001",
    "country": "FR",
    "email": "contact@example.com"
  },
  "similarity": 0.95
}

2. Agencies Endpoint (`/agencies`)

Purpose

Returns a list of available agency names from the supplier’s system. This helps define the scope of available entities.

Expected Behavior

Returns a predefined or dynamically retrieved list of agency names.
Can be hardcoded or fetched from an internal database.

Considerations

The list should be up to date to reflect operational agencies.
If agencies frequently change, consider automating list updates.

Implementation Notes

Suppliers are free to implement the service using any technology stack (e.g., Python, Node.js, Java, Go).
Consider caching results if processing is expensive.

Next Steps

Implement the two required endpoints.
Test the endpoints with sample queries.
Provide the API base URL and authentication details for integration.
Validate integration using test queries provided by our team.

For further questions, refer to the API documentation:

Example Implementation

Here is an example implementation of the integration:

from fastapi import APIRouter, Query
from pydantic import BaseModel, ConfigDict, Field
from typing import Dict, Any, List, cast
import pandas as pd

from code_matcher import find_top_k_neighbors, MatchResult


# The person will have to specify how to create a dictionary that creates the right query. The query will have to be flat. 

router = APIRouter()

# Pydantic model for a match result that wraps a generic record and a similarity score.
class MatchResultModel(BaseModel):
    record: Dict[str, Any] = Field(
        description="The matched record containing company information",
        examples=[{
            "company_name": "HEPPNER - SOCIETE DE TRANSPORTS",
            "SIREN": "763200202",
            "reasoning___VAT_number": "Found VAT number FR42169800202 in header of the document.",
            "VAT_number": "FR42169800202",
            "city": "ST PIERRE DES CORPS",
            "postal_code": "37700",
            "country": "FR",
            "email": "jean-michel@heppner-group.com"
        }]
    )
    similarity: float = Field(
        description="Similarity score between 0 and 1, where 1 indicates a perfect match",
        ge=0,
        le=1,
        examples=[0.95]
    )

    model_config = ConfigDict(
        from_attributes=True
    )

# Load data once at module level.
df = pd.read_csv("data/client_codes.csv")

# Cast the records to the expected type.
database: List[Dict[str, Any]] = cast(List[Dict[str, Any]], df.to_dict("records"))

# Create a Pydantic model for the request body
class MatchRequest(BaseModel):
    record: Dict[str, Any] = Field(
        description="Record to match against the database",
        examples=[{
            "company_name": "HEPPNER - SOCIETE DE TRANSPORTS",
            "SIREN": "763200202",
            "reasoning___VAT_number": "Found VAT number FR42169800202 in header of the document.",
            "VAT_number": "FR42169800202",
            "city": "ST PIERRE DES CORPS",
            "postal_code": "37700",
            "country": "FR",
            "email": "jean-michel@heppner-group.com"
        }]
    )
    k: int = Field(
        default=20,
        description="Number of matching results to return",
        ge=1,
        le=100,
        examples=[20]
    )

@router.post(
    "/match", 
    response_model=List[MatchResultModel],
    description="Find the k closest matching records for a given query record",
    responses={
        200: {
            "description": "Successful matching results",
            "content": {
                "application/json": {
                    "example": [
                        {
                            "record": {
                                "code": "HEPP37ST",
                                "company_name": "HEPPNER - SOCIETE DE TRANSPORTS",
                                "SIREN": "763200202",
                                "VAT_number": "FR42169800202",
                                "city": "ST PIERRE DES CORPS",
                                "postal_code": "37700",
                                "country": "FR",
                                "email": "jean-michel@heppner-group.com"
                            },
                            "similarity": 0.95
                        },
                        {
                            "record": {
                                "code": "HEPP75PA",
                                "company_name": "HEPPNER TRANSPORT SAS",
                                "SIREN": "763200203",
                                "VAT_number": "FR42169800203",
                                "city": "PARIS",
                                "postal_code": "75001",
                                "country": "FR",
                                "email": "contact@heppner-group.com"
                            },
                            "similarity": 0.85
                        }
                    ]
                }
            }
        }
    }
)
async def find_matches(
    request: MatchRequest
) -> List[MatchResultModel]:


    """
    Endpoint to find the k closest matching records for a given query.

    
    Example of record received from the frontend:
    Record = {
       "company_name": "HEPPNER - SOCIETE DE TRANSPORTS",
       "SIREN": "763200202", 
       "reasoning___VAT_number": "Found VAT number FR42169800202 in header of the document.",
       "VAT_number": "FR42169800202",
       "city": "ST PIERRE DES CORPS",
       "postal_code": "37700",
       "country": "FR",
       "email": "jean-michel@heppner-group.com"
    }

    
    You can also adapt the record to the query if necessary:
    adapted_record = {
        "name": record["company_name"],
        "SIREN": record["SIREN"],
        "VAT": record["VAT_number"],
        "city": record["city"],
        "zip_code": record["postal_code"],
        "country": record["country"],
        "email": record["email"]
    }

    Args:
        query: Query object containing search criteria.
        k: Number of results to return (default 5).

    Returns:
        A list of the k most similar records with their similarity scores.
    """

    # Convert the generic query model to a dictionary.
    results = find_top_k_neighbors(request.record, database, request.k)
    
    # Convert each result to a MatchResultModel.
    return [
        MatchResultModel(record=res["record"], similarity=res["similarity"])
        for res in results
    ]



@router.get(
    "/agencies", 
    response_model=List[str],
    description="Fetch a list of available agencies",
    responses={
        200: {
            "description": "List of agency names",
            "content": {
                "application/json": {
                    "example": [
                        "AGENCY_1",
                        "AGENCY_2",
                        "AGENCY_3",
                        "AGENCY_4",
                        "AGENCY_5"
                    ]
                }
            }
        }
    }
)
async def fetch_agencies() -> List[str]:
    """
    Endpoint to fetch a hardcoded list of agencies.
    Returns:
        A list of agency names.
    """

    agencies = [
      "AGENCE_1",
      "AGENCE_2",
      "AGENCE_3",
      "AGENCE_4",
      "AGENCE_5"
    ]
    
    return agencies

API Documentation

Endpoints

​Supplier Integration Guide

​Overview

​Key Integration Principles

​Required Endpoints

​1. Matching Endpoint (/match)

​Purpose

​Expected Behavior

​Considerations

​Data Matching Strategy

​1. Query Structure

​2. Matching Process

​3. Response Format

​2. Agencies Endpoint (/agencies)

​Purpose

​Expected Behavior

​Considerations

​Implementation Notes

​Next Steps

​Example Implementation

Supplier Integration Guide

Overview

Key Integration Principles

Required Endpoints

1. Matching Endpoint (`/match`)

Purpose

Expected Behavior

Considerations

Data Matching Strategy

1. Query Structure

2. Matching Process

3. Response Format

2. Agencies Endpoint (`/agencies`)

Purpose

Expected Behavior

Considerations

Implementation Notes

Next Steps

Example Implementation