API Documentation
Advanced Recognition API
High-precision text recognition with position detection. Extracts text content and provides detailed coordinate information for each text block.
Overview
The Advanced Recognition API provides high-precision text recognition with detailed position information. Unlike standard text recognition, this API returns not only the extracted text but also precise coordinates for each text block, including rotation rectangles and four-point coordinates.
It uses a unified JSON request format, accepting either URL references or base64-encoded image data.
Authentication
The API supports the following authentication method:
- API Key: Pass your API key as a query parameter
?key=YOUR_API_KEY
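As a minimal sketch of the query-parameter scheme above, the snippet below appends the key to the endpoint URL. The host and path are copied from the curl examples in this document; the helper name `build_endpoint` is hypothetical and not part of the API.

```python
from urllib.parse import urlencode

# Host and path as they appear in this document's curl examples.
BASE_URL = "https://llmocr.com/api/advanced-recognition"


def build_endpoint(api_key: str, base_url: str = BASE_URL) -> str:
    """Return the endpoint URL with the API key in the `key` query parameter.

    Hypothetical helper for illustration only.
    """
    # urlencode percent-escapes characters that are unsafe in a query string.
    return f"{base_url}?{urlencode({'key': api_key})}"


print(build_endpoint("YOUR_API_KEY"))
# https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY
```

Escaping the key through `urlencode` is a defensive choice: a key that happens to contain `&` or `+` would otherwise corrupt the query string.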
Extract Text with Position Data
Extract text from an image file and get detailed position information for each text block, including rotation rectangles and four-point coordinates.
Request
POST /api/advanced-recognition
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| document | object | Yes | Document object |
| document.type | string | Yes | Fixed value "image_url" |
| document.image_url | string | Yes | Image URL, or base64 image data as a data URI (for example "data:image/jpeg;base64,...") |
| filename | string | No | Filename (recommended for base64 data) |
| key | string | No | API key, passed as a query parameter rather than in the JSON body (optional for logged-in users) |
Examples:
Using Image URL:
curl -X POST "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "type": "image_url",
      "image_url": "https://llmocr.com/image.jpg"
    }
  }'
Using Base64 Image Data:
curl -X POST "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "type": "image_url",
      "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA..."
    },
    "filename": "document.jpg"
  }'
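The base64 request above can be assembled from a local file along these lines. This is a sketch using only the Python standard library. The helper names `build_document` and `build_request` are hypothetical, and guessing the MIME type from the filename is an assumption made here, not something the API is documented to require.

```python
import base64
import json
import mimetypes
import urllib.request


def build_document(image_bytes: bytes, filename: str) -> dict:
    """Build the JSON body for a base64 upload (hypothetical helper)."""
    # Assumption: the MIME type in the data URI is derived from the filename.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "document": {
            "type": "image_url",  # fixed value per the parameter table
            "image_url": f"data:{mime};base64,{encoded}",
        },
        "filename": filename,
    }


def build_request(url: str, body: dict) -> urllib.request.Request:
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The first three bytes of a JPEG file encode to "/9j/" in base64.
body = build_document(b"\xff\xd8\xff", "document.jpg")
print(body["document"]["image_url"])  # data:image/jpeg;base64,/9j/
```

To send the request, pass the object returned by `build_request` to `urllib.request.urlopen` and decode the response body as JSON. Whether the server accepts the `application/octet-stream` fallback for unknown file types is not stated in this document.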
Response
Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Database record ID |
| filename | string | Filename |
| content | string | Extracted text content (all text blocks joined by newlines) |
| ocrResult | object | Detailed OCR results with position information |
| format | string | Output format, fixed as "json" |
| timestamp | number | Processing completion timestamp (the example value below is a Unix epoch time in milliseconds) |
| payload | string | API endpoint URL |
ocrResult.words_info Structure:
Each item in the words_info array contains:
| Field | Type | Description |
|---|---|---|
| text | string | Text content of the block |
| location | number[] | Four-point coordinates [x1, y1, x2, y2, x3, y3, x4, y4], ordered top-left → top-right → bottom-right → bottom-left |
| rotate_rect | number[] | Rotation rectangle [center_x, center_y, width, height, angle]; angle range: [-90, 90] |
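To make the `location` layout concrete, the sketch below splits the flat eight-number array into four (x, y) corner points and derives an axis-aligned bounding box. The function names are hypothetical, and the sample values are taken from the example response in this document.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def location_to_points(location: List[float]) -> List[Point]:
    """Split [x1, y1, ..., x4, y4] into four (x, y) corner points."""
    if len(location) != 8:
        raise ValueError(f"expected 8 numbers, got {len(location)}")
    # Order is preserved: top-left, top-right, bottom-right, bottom-left.
    return [(location[i], location[i + 1]) for i in range(0, 8, 2)]


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) enclosing all corner points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


points = location_to_points([150, 80, 400, 80, 400, 120, 150, 120])
print(points)                # [(150, 80), (400, 80), (400, 120), (150, 120)]
print(bounding_box(points))  # (150, 80, 400, 120)
```

For this unrotated sample, the box is 250 wide and 40 high with its center at (275, 100), which agrees with the corresponding `rotate_rect` value `[275, 100, 250, 40, 0]`. For rotated text the axis-aligned box will be larger than the rotated rectangle.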
Example:
{
  "id": "12345",
  "filename": "document.jpg",
  "content": "Line 1 text\nLine 2 text",
  "ocrResult": {
    "words_info": [
      {
        "text": "Line 1 text",
        "location": [150, 80, 400, 80, 400, 120, 150, 120],
        "rotate_rect": [275, 100, 250, 40, 0]
      },
      {
        "text": "Line 2 text",
        "location": [150, 150, 400, 150, 400, 190, 150, 190],
        "rotate_rect": [275, 170, 250, 40, 0]
      }
    ]
  },
  "format": "json",
  "timestamp": 1640995200000,
  "payload": "https://llmocr.com/api/advanced-recognition?key=YOUR_API_KEY"
}
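A response shaped like the example above could be consumed as follows. This is a sketch rather than an official client: the helper `summarize` is hypothetical, and treating `timestamp` as milliseconds since the Unix epoch is an assumption inferred from the magnitude of the example value.

```python
import json
from datetime import datetime, timezone


def summarize(response_text: str) -> dict:
    """Extract block texts and a UTC datetime from a response body."""
    data = json.loads(response_text)
    blocks = data["ocrResult"]["words_info"]
    texts = [block["text"] for block in blocks]
    return {
        "texts": texts,
        # Per the response table, content is the block texts joined by newlines.
        "content_matches": "\n".join(texts) == data["content"],
        # Assumption: timestamp is in milliseconds since the Unix epoch.
        "processed_at": datetime.fromtimestamp(
            data["timestamp"] / 1000, tz=timezone.utc
        ),
    }


# Trimmed copy of the example response, serialized to mimic a response body.
sample = json.dumps({
    "content": "Line 1 text\nLine 2 text",
    "ocrResult": {"words_info": [
        {"text": "Line 1 text", "rotate_rect": [275, 100, 250, 40, 0]},
        {"text": "Line 2 text", "rotate_rect": [275, 170, 250, 40, 0]},
    ]},
    "timestamp": 1640995200000,
})

result = summarize(sample)
print(result["texts"])                     # ['Line 1 text', 'Line 2 text']
print(result["processed_at"].isoformat())  # 2022-01-01T00:00:00+00:00
```

The `content_matches` flag is a cheap sanity check that the per-block results and the joined `content` string describe the same text.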