Task: PDF Parse
Short description: Parse an uploaded PDF and return the extracted text or structured results. The file is sent as multipart/form-data.
Overview
- Method: POST
- Path: /task/gi/pdf-parse
- Content-Type: multipart/form-data
Authentication
- Header: Authorization: Bearer <token>
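The examples in this document hard-code a placeholder key (`sk-xxxx`). Below is a minimal sketch of building the same header from an environment variable instead. The variable name `PDF_PARSE_API_KEY` and the function name are hypothetical choices for illustration; the API does not define them.

```python
import os


def auth_headers(env_var: str = "PDF_PARSE_API_KEY") -> dict[str, str]:
    """Build the Authorization header from a token kept in an environment variable.

    PDF_PARSE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early instead of sending an empty Bearer token.
        raise RuntimeError(f"set {env_var} to your API token")
    # The endpoint expects a standard Bearer token.
    return {"Authorization": f"Bearer {token}"}
```

The returned dict can be passed as `headers=` to the Python example in place of the literal header.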
Request Example
Form Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF file to parse |
| end_pages | integer | Yes | Number of PDF pages to process |
| is_ocr | boolean | No | Whether to enable OCR |
| language | string | No | Document language hint used to improve recognition accuracy; defaults to automatic detection. Supported values: ch, en, korean, japan, chinese_cht, ta, te, ka, latin, arabic, cyrillic, devanagari |
| formula_enable | boolean | No | Whether to enable formula parsing |
| table_enable | boolean | No | Whether to enable table parsing |
| layout_model | string | No | Layout analysis model: layoutlmv3 or doclayout_yolo |
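Because multipart form fields are plain text, the boolean parameters have to be serialized; the examples in this document send lowercase `"true"`/`"false"`. A minimal sketch of a helper that builds the non-file fields from typed Python values and checks `language` and `layout_model` against the values listed in the table. The function name is hypothetical, and rejecting `end_pages` values below 1 is an assumption, not documented behavior.

```python
from typing import Optional

# Values copied from the parameter table.
SUPPORTED_LANGUAGES = {
    "ch", "en", "korean", "japan", "chinese_cht", "ta", "te",
    "ka", "latin", "arabic", "cyrillic", "devanagari",
}
LAYOUT_MODELS = {"layoutlmv3", "doclayout_yolo"}


def build_form_fields(
    end_pages: int,
    is_ocr: Optional[bool] = None,
    language: Optional[str] = None,
    formula_enable: Optional[bool] = None,
    table_enable: Optional[bool] = None,
    layout_model: Optional[str] = None,
) -> dict[str, str]:
    """Return the non-file form fields as strings (hypothetical helper)."""
    # Assumption: end_pages must be a positive integer.
    if isinstance(end_pages, bool) or not isinstance(end_pages, int) or end_pages < 1:
        raise ValueError("end_pages must be a positive integer")
    fields = {"end_pages": str(end_pages)}

    # Optional flags are only sent when set, leaving server defaults otherwise.
    for name, value in (
        ("is_ocr", is_ocr),
        ("formula_enable", formula_enable),
        ("table_enable", table_enable),
    ):
        if value is not None:
            # Serialize booleans the same way the curl and fetch examples do.
            fields[name] = "true" if value else "false"

    # Omitting language keeps the documented default (automatic detection).
    if language is not None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        fields["language"] = language

    if layout_model is not None:
        if layout_model not in LAYOUT_MODELS:
            raise ValueError(f"unsupported layout_model: {layout_model!r}")
        fields["layout_model"] = layout_model

    return fields
```

The result can be passed as `data=` to `requests.post` alongside `files=`, as in the Python example.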
curl Example
```bash
curl -X POST "https://api.gpt.ge/task/gi/pdf-parse" \
  -H "Authorization: Bearer sk-xxxx" \
  -F "file=@/path/to/document.pdf" \
  -F "end_pages=10" \
  -F "is_ocr=false" \
  -F "language=en" \
  -F "formula_enable=false" \
  -F "table_enable=true" \
  -F "layout_model=layoutlmv3"
```

JavaScript (fetch) Example
```javascript
// fileInput is an <input type="file"> element on the page.
const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('end_pages', '10');
formData.append('is_ocr', 'false');
formData.append('language', 'en');
formData.append('formula_enable', 'false');
formData.append('table_enable', 'true');
formData.append('layout_model', 'layoutlmv3');

// Do not set Content-Type manually; the browser adds the multipart boundary.
fetch('https://api.gpt.ge/task/gi/pdf-parse', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-xxxx'
  },
  body: formData
})
  .then(r => {
    if (!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
  })
  .then(console.log)
  .catch(console.error);
```

Python (requests) Example
```python
import requests

url = 'https://api.gpt.ge/task/gi/pdf-parse'
headers = {
    'Authorization': 'Bearer sk-xxxx'
}
data = {
    'end_pages': 10,
    'is_ocr': 'false',
    'language': 'en',
    'formula_enable': 'false',
    'table_enable': 'true',
    'layout_model': 'layoutlmv3'
}

# Open the PDF in a with-block so the file handle is closed after the upload.
with open('document.pdf', 'rb') as pdf:
    files = {'file': pdf}
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())
```

Response Example (200)
```json
{
  "text": "I am morphogen API intelligent assistant, small vv\nDo you need my help?"
}
```

Note: This endpoint uses `multipart/form-data` to upload the PDF file. The response contains a `text` field with the extracted text.
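For environments without `requests`, here is a minimal standard-library sketch of what the `multipart/form-data` upload looks like on the wire and how the `text` field might be read back. It assumes the response is a JSON object with a top-level `text` string, as in the 200 example. The function names are hypothetical, and the part layout (one text part per field, one `file` part labeled `application/pdf`) mirrors what form-upload clients such as curl typically send rather than anything this API documents.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.gpt.ge/task/gi/pdf-parse"


def encode_multipart(fields: dict[str, str], filename: str,
                     file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one PDF part as a multipart/form-data body."""
    # A random hex boundary is, with overwhelming probability, absent from the payload.
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += value.encode() + b"\r\n"
    body += (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'
    ).encode()
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()  # closing delimiter
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def extract_text(payload: dict) -> str:
    """Return the `text` field, failing loudly if the response lacks it."""
    text = payload.get("text")
    if not isinstance(text, str):
        raise ValueError(f"response has no 'text' field: {payload!r}")
    return text


def parse_pdf(path: str, fields: dict[str, str], token: str) -> str:
    """Upload a PDF and return the extracted text (hypothetical helper)."""
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(fields, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

A call such as `parse_pdf("document.pdf", {"end_pages": "10", "language": "en"}, "sk-xxxx")` would send the same fields as the curl example.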