Advanced Recognition API

Overview

The Advanced Recognition API provides high-precision text recognition with detailed position information. Unlike standard text recognition, it returns not only the extracted text but also precise coordinates for each text block, both as a rotated rectangle and as four-point coordinates.

It uses a unified JSON request format, accepting either URL references or base64-encoded image data.

Authentication

The API supports one authentication method:

API Key: Pass your API key in the key query parameter, for example ?key=YOUR_API_KEY

Extract Text with Position Data

Send an image by URL or as base64-encoded data and receive the recognized text together with a rotated rectangle and four-point coordinates for each text block.

Request

POST /api/advanced-recognition

Parameters:

Parameter	Type	Required	Description
document	object	Yes	Document object
document.type	string	Yes	Fixed value "image_url"
document.image_url	string	Yes	Image URL, or base64 image data as a data URI (for example data:image/jpeg;base64,...)
filename	string	No	Filename (recommended for base64 data)
key	string	No	API key (query parameter, optional for logged-in users)

Examples:

Using Image URL:

curl -X POST "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "type": "image_url",
      "image_url": "https://llmocr.com/image.jpg"
    }
  }'

Using Base64 Image Data:

curl -X POST "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "type": "image_url",
      "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA..."
    },
    "filename": "document.jpg"
  }'
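
The base64 request above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library, not an official client. The names API_URL and build_request are hypothetical, and guessing the MIME type from the filename is an assumption of this sketch rather than documented API behavior. The function only builds the request and does not send it.

```python
import base64
import json
import mimetypes
import urllib.parse
import urllib.request

# Endpoint taken from the curl examples above.
API_URL = "https://llmocr.com/api/advanced-recognition"


def build_request(image: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying base64 image data."""
    # Assumption: the MIME type in the data URI is derived from the filename.
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot determine MIME type for {filename!r}")

    encoded = base64.b64encode(image).decode("ascii")
    body = {
        "document": {
            "type": "image_url",  # fixed value per the parameter table
            "image_url": f"data:{mime};base64,{encoded}",
        },
        "filename": filename,  # recommended for base64 data
    }

    # The API key travels as a query parameter, not in the JSON body.
    url = f"{API_URL}?{urllib.parse.urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the response body with json.load.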

Response

Parameters:

Parameter	Type	Description
id	string	Database record ID
filename	string	Filename
content	string	Extracted text content (all text blocks joined by newlines)
ocrResult	object	Detailed OCR results with position information
format	string	Output format, fixed as "json"
timestamp	number	Processing completion timestamp (the example value is consistent with Unix epoch milliseconds)
payload	string	API endpoint URL

ocrResult.words_info Structure:

Each item in the words_info array contains:

Field	Type	Description
text	string	Text content of the block
location	number[]	Four-point coordinates [x1,y1,x2,y2,x3,y3,x4,y4] (top-left → top-right → bottom-right → bottom-left)
rotate_rect	number[]	Rotated rectangle [center_x, center_y, width, height, angle], angle range: [-90, 90]
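
To illustrate how the two coordinate fields relate, here is a small Python sketch that splits location into four (x, y) points and derives the same four corners from rotate_rect. The function names are hypothetical. The documentation does not state the angle unit or rotation direction, so the sketch assumes degrees, with positive angles rotating clockwise in image coordinates (y pointing down). Verify this against real responses before relying on it for rotated text.

```python
import math


def location_to_points(location):
    """Split [x1,y1,...,x4,y4] into four (x, y) tuples (TL, TR, BR, BL)."""
    if len(location) != 8:
        raise ValueError("location must contain exactly 8 numbers")
    return [(location[i], location[i + 1]) for i in range(0, 8, 2)]


def rotate_rect_to_points(rotate_rect):
    """Corners of [center_x, center_y, width, height, angle] in TL, TR, BR, BL order.

    Assumption: angle is in degrees and positive values rotate clockwise
    on screen (y axis pointing down). The API docs do not confirm this.
    """
    cx, cy, w, h, angle = rotate_rect
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    corners = []
    # Offsets of the unrotated corners relative to the center.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners
```

For an unrotated block (angle 0) both functions describe the same rectangle: location [150, 80, 400, 80, 400, 120, 150, 120] and rotate_rect [275, 100, 250, 40, 0] share the same four corners.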

Example:

{
  "id": "12345",
  "filename": "document.jpg",
  "content": "Line 1 text\nLine 2 text",
  "ocrResult": {
    "words_info": [
      {
        "text": "Line 1 text",
        "location": [150, 80, 400, 80, 400, 120, 150, 120],
        "rotate_rect": [275, 100, 250, 40, 0]
      },
      {
        "text": "Line 2 text",
        "location": [150, 150, 400, 150, 400, 190, 150, 190],
        "rotate_rect": [275, 170, 250, 40, 0]
      }
    ]
  },
  "format": "json",
  "timestamp": 1640995200000,
  "payload": "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY"
}
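
A response like the one above can be consumed as follows. This Python sketch parses a trimmed copy of the example, collects the text of each block, and converts the timestamp. The summarize helper is hypothetical, and treating timestamp as milliseconds since the Unix epoch is an assumption based on the magnitude of the example value.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example response (raw string keeps \n as a JSON escape).
EXAMPLE = r"""
{
  "id": "12345",
  "content": "Line 1 text\nLine 2 text",
  "ocrResult": {"words_info": [
    {"text": "Line 1 text",
     "location": [150, 80, 400, 80, 400, 120, 150, 120],
     "rotate_rect": [275, 100, 250, 40, 0]},
    {"text": "Line 2 text",
     "location": [150, 150, 400, 150, 400, 190, 150, 190],
     "rotate_rect": [275, 170, 250, 40, 0]}
  ]},
  "timestamp": 1640995200000
}
"""


def summarize(result: dict) -> dict:
    """Collect per-block text and a timezone-aware completion time."""
    blocks = result["ocrResult"]["words_info"]
    texts = [block["text"] for block in blocks]
    return {
        "texts": texts,
        # content is documented as all text blocks joined by newlines.
        "joined": "\n".join(texts),
        # Assumption: timestamp is milliseconds since the Unix epoch.
        "completed_at": datetime.fromtimestamp(
            result["timestamp"] / 1000, tz=timezone.utc
        ),
    }


result = json.loads(EXAMPLE)
summary = summarize(result)
```

In this example summary["joined"] equals the content field, which matches the documented relationship between content and words_info.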
