The number of requests you can make to a model at the same time is determined by the number of parallel requests allowed for your account.
If you send more requests in parallel than allowed, the requests over the limit will be blocked.
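One way to stay under the limit from the client side is to cap the number of requests in flight. The sketch below is only an illustration: `MAX_PARALLEL`, `run_limited`, and the caller-supplied `send` function are hypothetical names, not part of the API, and the value 2 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 2  # hypothetical value; use the limit that applies to your account


def run_limited(send, jobs, max_parallel=MAX_PARALLEL):
    """Call send(job) for every job, with at most max_parallel calls in flight.

    Results are returned in the same order as jobs.
    """
    # The pool never runs more than max_parallel workers at once,
    # so no more than max_parallel requests are open at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(send, jobs))
```

Here `send` could be a small function that posts one payload with `requests` and returns the response, and `jobs` the list of payloads to send.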
Check which models are available to you using the models endpoint.
import requests

ARLIAI_API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own key

url = "https://api.arliai.com/v1/models"
headers = {
    "Content-Type": "application/json",
    # f-string, so the key is actually substituted into the header
    "Authorization": f"Bearer {ARLIAI_API_KEY}"
}
# A GET request needs no body
response = requests.get(url, headers=headers)
print(response.text)
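To pick out just the model names from the printed response, something like the following may help. This is a sketch under an assumption: it expects an OpenAI-style body of the form `{"data": [{"id": ...}]}`, which this page does not confirm, and `model_ids` is a hypothetical helper name.

```python
import json


def model_ids(response_text):
    """Return the model IDs found in a models-endpoint response body.

    Assumes an OpenAI-style body: {"data": [{"id": "..."}, ...]}.
    """
    body = json.loads(response_text)
    return [entry["id"] for entry in body.get("data", [])]


# Illustrative body only; the real response may contain other fields.
sample = '{"object": "list", "data": [{"id": "Meta-Llama-3.1-8B-Instruct"}]}'
print(model_ids(sample))
```

If the real body is shaped differently, only the two lines inside `model_ids` need to change.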
Tokenize text and get its token count using the tokenize endpoint.
import requests
import json

ARLIAI_API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own key

url = "https://api.arliai.com/v1/tokenize"
payload = json.dumps({
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi! How can I help you today?"},
        {"role": "user", "content": "Say hello!"}
    ]
})
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {ARLIAI_API_KEY}"
}
response = requests.post(url, headers=headers, data=payload)
print(response.text)
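A common use of the token count is checking that the prompt plus the requested completion fits in the model's context window. The sketch below shows only that arithmetic. `fits_context`, `max_completion_tokens`, and `context_length` are hypothetical names, and `prompt_tokens` is whatever count you read from the tokenize response (its exact format is not described on this page).

```python
def fits_context(prompt_tokens, max_tokens, context_length):
    """Return True if the prompt plus the requested completion fits the window."""
    return prompt_tokens + max_tokens <= context_length


def max_completion_tokens(prompt_tokens, context_length):
    """Largest max_tokens value that still fits, never below zero."""
    return max(0, context_length - prompt_tokens)


# Example numbers are arbitrary.
print(fits_context(prompt_tokens=30, max_tokens=1024, context_length=8192))
print(max_completion_tokens(prompt_tokens=30, context_length=8192))
```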
Use the examples on the Quick-Start page for working, copy-pastable code, and copy the parameters you need from here.
These example API requests show how to use the parameters; some options might conflict, and the values are arbitrary.
import requests
import json

ARLIAI_API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own key

url = "https://api.arliai.com/v1/chat/completions"  # Can also use /v1/completions endpoint
payload = json.dumps({
    "model": "Meta-Llama-3.1-8B-Instruct",
    # Use messages for /chat/completions
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi! How can I help you today?"},
        {"role": "user", "content": "Say hello!"}
    ],
    # Use prompt for /completions
    "prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are an assistant AI.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello there!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    # Most important parameters
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "max_tokens": 1024,
    "stream": True,
    # Extra parameters
    "seed": 0,
    "presence_penalty": 0.6,
    "frequency_penalty": 0.6,
    "dynatemp_range": 0.5,
    "dynatemp_exponent": 1,
    "smoothing_factor": 0.0,
    "smoothing_curve": 1.0,
    "top_a": 0,
    "min_p": 0,
    "tfs": 1,
    "eta_cutoff": 1e-4,
    "epsilon_cutoff": 1e-4,
    "typical_p": 1,
    "mirostat_mode": 0,
    "mirostat_tau": 1,
    "mirostat_eta": 1,
    "use_beam_search": False,
    "length_penalty": 1.0,
    "early_stopping": False,
    "stop": [],
    "stop_token_ids": [],
    "include_stop_str_in_output": False,
    "ignore_eos": False,
    "logprobs": 5,
    "prompt_logprobs": 0,
    "custom_token_bans": [],
    "skip_special_tokens": True,
    "spaces_between_special_tokens": True,
    "logits_processors": [],
    "xtc_threshold": 0.1,
    "xtc_probability": 0,
    "guided_json": {"type": "object", "properties": {"response": {"type": "string"}}},
    # Raw strings keep the regex backslashes intact in Python
    "guided_regex": r"^\w+$",
    "guided_choice": ["Yes", "No", "Maybe"],
    "guided_grammar": "S -> 'yes' | 'no'",
    # Must be "outlines" or "lm-format-enforcer"
    "guided_decoding_backend": "outlines",
    "guided_whitespace_pattern": r"\s+"
})
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {ARLIAI_API_KEY}"
}
# "stream" is True above, so read the response body incrementally
response = requests.post(url, headers=headers, data=payload, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
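Since `messages` belongs to /chat/completions and `prompt` to /completions, a small helper can keep a request from carrying both. This is a minimal sketch: `build_request`, `CHAT_URL`, and `COMPLETIONS_URL` are hypothetical names introduced here for illustration, and the helper does no validation of the sampling parameters.

```python
import json

CHAT_URL = "https://api.arliai.com/v1/chat/completions"
COMPLETIONS_URL = "https://api.arliai.com/v1/completions"


def build_request(model, messages=None, prompt=None, **sampling):
    """Return (url, json_body) for exactly one of messages or prompt."""
    if (messages is None) == (prompt is None):
        raise ValueError("pass exactly one of messages or prompt")
    # Sampling parameters (temperature, top_p, ...) are passed through unchanged.
    body = {"model": model, **sampling}
    if messages is not None:
        body["messages"] = messages
        return CHAT_URL, json.dumps(body)
    body["prompt"] = prompt
    return COMPLETIONS_URL, json.dumps(body)


url, payload = build_request(
    "Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello!"}],
    temperature=0.7,
    max_tokens=1024,
)
```

The returned `url` and `payload` can be passed to `requests` together with the same headers as in the example request.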
Parameter | Explanation |
---|---|
presence_penalty | Penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. Disabled: 0. |
frequency_penalty | Penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens. Disabled: 0. |
repetition_penalty | Penalizes new tokens based on their frequency in the generated text so far. Applied multiplicatively. Must be in [1, inf). Set to 1 to disable the effect. Disabled: 1. |
temperature | Controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling. Disabled: 1. |
top_p | Controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens. Disabled: 1. |
top_k | Controls the number of top tokens to consider. Set to -1 to consider all tokens. Disabled: -1. |
top_a | Controls the cutoff for Top-A sampling. Exact cutoff is top_a * max_prob**2. Must be in [0, inf). Set to 0 to disable. Disabled: 0. |
min_p | Controls the cutoff for min-p sampling. Exact cutoff is min_p * max_prob. Must be in [0, 1]. Set to 0 to disable. Disabled: 0. |
tfs | Controls the cumulative approximate curvature of the distribution to retain for Tail Free Sampling. Must be in (0, 1]. Set to 1 to disable. Disabled: 1. |
eta_cutoff | Controls the cutoff threshold for Eta sampling (a form of entropy adaptive truncation sampling). Threshold is computed as min(eta, sqrt(eta) * entropy(probs)). Specified in units of 1e-4. Set to 0 to disable. Disabled: 0. |
epsilon_cutoff | Controls the cutoff threshold for Epsilon sampling (simple probability threshold truncation). Specified in units of 1e-4. Set to 0 to disable. Disabled: 0. |
typical_p | Controls the cumulative probability of tokens closest in surprise to the expected surprise. Must be in (0, 1]. Set to 1 to disable. Disabled: 1. |
mirostat_mode | Can either be 0 (disabled) or 2 (Mirostat v2). Disabled: 0. |
mirostat_tau | The target "surprisal" that Mirostat works towards. Range [0, inf). |
mirostat_eta | The rate at which Mirostat updates its internal surprisal value. Range [0, inf). |
dynatemp_min | Minimum temperature for dynamic temperature sampling. Range [0, inf). |
dynatemp_max | Maximum temperature for dynamic temperature sampling. Range [0, inf). |
dynatemp_exponent | Exponent for dynamic temperature sampling. Range [0, inf). |
smoothing_factor | Smoothing factor for Quadratic Sampling. Disabled: 0. |
smoothing_curve | Smoothing curve for Cubic Sampling. Disabled: 1.0. |
seed | Random seed to use for the generation. Set to None to disable. |
use_beam_search | Whether to use beam search instead of sampling. Disabled: False. |
length_penalty | Penalizes sequences based on their length. Used in beam search. Default: 1.0. |
early_stopping | Controls the stopping condition for beam search. Accepts: True (stops as soon as best_of complete candidates are found), False (uses heuristic for stopping), or "never" (canonical beam search). Default: False. |
stop | List of strings that stop the generation when they are generated. The returned output will not contain the stop strings. |
stop_token_ids | List of token IDs that stop the generation when they are generated. The returned output will contain the stop tokens unless they are special tokens (e.g., EOS). |
include_stop_str_in_output | Whether to include the stop strings in the output text. Default: False. |
ignore_eos | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. Default: False. |
max_tokens | The maximum number of tokens to generate per output sequence. |
min_tokens | The minimum number of tokens to generate per output sequence before EOS or stop tokens are generated. Default: 0. |
logprobs | Number of log probabilities to return per output token. When set to None, no probability is returned. Default: None. |
prompt_logprobs | Number of log probabilities to return per prompt token. Default: 0. |
detokenize | Whether to detokenize the output. Defaults to True. |
custom_token_bans | List of token IDs to ban from being generated. |
skip_special_tokens | Whether to skip special tokens in the output. Defaults to True. |
spaces_between_special_tokens | Whether to add spaces between special tokens in the output. Defaults to True. |
logits_processors | List of functions that modify logits based on previously generated tokens and optionally prompt tokens. |
truncate_prompt_tokens | If set to an integer k, will use only the last k tokens from the prompt (left-truncation). Default: None (no truncation). |
xtc_threshold | In XTC sampling, if 2 or more tokens have a probability above this threshold, consider removing all of them except the least probable one. Disabled: 0. |
xtc_probability | The probability that the removal will happen in XTC sampling. Set to 0 to disable. Default: 0. |
guided_json | If specified, the output will follow the JSON schema. Can be a JSON string or a Python dictionary. |
guided_regex | If specified, the output will follow the regex pattern. |
guided_choice | If specified, the output will be exactly one of the provided choices (a list of strings). |
guided_grammar | If specified, the output will follow the context-free grammar provided in the string. |
guided_decoding_backend | Overrides the default guided decoding backend for this specific request. Must be either "outlines" or "lm-format-enforcer". |
guided_whitespace_pattern | Overrides the default whitespace pattern for guided JSON decoding. |
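
The cutoff formulas in the table for `min_p`, `top_a`, and `eta_cutoff` can be tried on a toy distribution. This sketch only illustrates the arithmetic and is not the server's implementation. It assumes that tokens strictly below a cutoff are dropped, that entropy is measured in nats, and that `eta` is passed already scaled; the function names are hypothetical.

```python
import math


def min_p_keep(probs, min_p):
    """Keep tokens with probability >= min_p * max_prob."""
    cutoff = min_p * max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= cutoff}


def top_a_keep(probs, top_a):
    """Keep tokens with probability >= top_a * max_prob**2."""
    cutoff = top_a * max(probs.values()) ** 2
    return {tok: p for tok, p in probs.items() if p >= cutoff}


def eta_threshold(probs, eta):
    """Threshold min(eta, sqrt(eta) * entropy(probs)), entropy in nats."""
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return min(eta, math.sqrt(eta) * entropy)


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(sorted(min_p_keep(probs, 0.2)))  # cutoff is 0.2 * 0.5 = 0.1
print(sorted(top_a_keep(probs, 0.8)))  # cutoff is 0.8 * 0.5**2 = 0.2
print(eta_threshold(probs, 1e-4))
```

With these numbers, min-p keeps `a`, `b`, and `c`, while Top-A keeps only `a` and `b`, because its cutoff scales with the square of the top probability.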