Unlimited
No rate limits, no censorship, and unlimited token generation.
Zero-log
Absolutely no logs are kept of requests or generations.
Money-back guarantee
Flat monthly pricing with money-back guarantee if you are not satisfied.
The most unrestricted LLM Platform.
Frequently asked questions
What is Arli AI?
Arli AI is a cost-effective LLM inference API platform with unlimited generations and a zero-log policy.
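As a rough illustration, a request to an OpenAI-compatible chat-completion endpoint might be assembled like this. The endpoint URL, model name, and API key here are placeholders, not Arli AI's documented specifics; check the official API docs for the real values.

```python
import json

# Assumed OpenAI-style endpoint; the actual URL may differ.
API_URL = "https://example.invalid/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, str]:
    """Return (headers, JSON body) for a chat-completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",   # API key from your account page
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    })
    return headers, body

headers, body = build_request("some-model", "Hello!", "YOUR_API_KEY")
```

From here the request would be sent with any HTTP client (e.g. `requests.post(API_URL, headers=headers, data=body)`).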
How can there be no limits?
Because we own and run our own GPUs, we limit plans by the number of parallel requests rather than by the number of requests or tokens.
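Conceptually, per-plan parallel-request limiting can be sketched with a semaphore sized to the plan's parallel-slot count: a request waits for a free slot instead of being rejected, and no request or token totals are ever tracked. This is a toy illustration of the idea, not Arli AI's actual serving code.

```python
import threading

class ParallelLimiter:
    """Admit at most `max_parallel` requests at once; never count totals."""

    def __init__(self, max_parallel: int):
        self._slots = threading.Semaphore(max_parallel)

    def run(self, fn, *args):
        with self._slots:   # blocks until a parallel slot frees up
            return fn(*args)

limiter = ParallelLimiter(max_parallel=2)   # e.g. a 2-parallel-request plan
results = []
threads = [
    threading.Thread(target=lambda i=i: results.append(limiter.run(lambda x: x * x, i)))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 5 requests complete; only their concurrency was capped, not their count.
```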
Do you keep logs of prompts and generations?
We strictly do not keep any logs of user requests or generations. User requests and the responses never touch storage media.
How do you have so many models?
We use high-rank LoRA loading for our finetuned models. This allows us to hot-swap LoRAs on the fly as needed.
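The idea behind LoRA hot-swapping can be shown numerically. A LoRA adapter stores two small matrices A (r×k) and B (d×r), and the effective weight is W + B·A, so switching models only means swapping the small A/B pair while the large shared base weight W stays resident. The adapter names and shapes below are hypothetical, and real serving stacks apply the low-rank update inside fused GPU kernels rather than in Python.

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for the tiny illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def effective_weight(W, adapter):
    """Apply a LoRA delta B @ A on top of the shared base weight W."""
    B, A = adapter
    delta = matmul(B, A)
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # shared base weight (2x2), loaded once
adapters = {                   # hypothetical finetunes, each just a tiny B, A pair
    "roleplay-lora": ([[1.0], [0.0]], [[0.0, 2.0]]),   # rank-1: B is 2x1, A is 1x2
    "coding-lora":   ([[0.0], [1.0]], [[3.0, 0.0]]),
}

# "Hot-swap": pick a different small A/B pair per request; W never moves.
w_roleplay = effective_weight(W, adapters["roleplay-lora"])
w_coding = effective_weight(W, adapters["coding-lora"])
```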
Why is Arli AI better than other LLM providers?
We provide the most unrestricted LLM platform, with no rate limits on tokens or requests, which makes us one of the most affordable LLM inference platforms. This is on top of our zero-log privacy policy.
Is there a hidden limit imposed?
We don't have any hidden limits, but generation times are subject to current request traffic load.
Why use Arli AI API instead of self-hosting LLMs?
Using Arli AI will cost you significantly less than renting GPUs or paying for the electricity to run your own.
What if I want to use a model that's not here?
If a model you want to use is not on our Models page, you can contact us to request that we add it.