Unlimited
No rate limits, no censorship, and unlimited token generation.
Zero-log
No-logging policy to keep your data safe.
Money-back guarantee
Flat monthly pricing with money-back guarantee if you are not satisfied.
The most unrestricted LLM Platform.
Assistant
Chat with an LLM to use as a personal assistant.
AI Agents
Create agentic workflows using LLMs without worrying about token use.
Roleplay
Chat with your AI companions without censorship or token counting. SillyTavern, RisuAI, and other RP frontends are supported!
Data Processing
Run any LLM over your data without limits or censorship.
Code Completion
Connect to code completion plugins and write code faster.
Applications
Create AI-powered applications cost-effectively.
Frequently asked questions
How can there be no limits?
Because our pricing is based on parallel requests, we can precisely calculate and scale how many GPUs we need, so we never have to throttle tokens or requests.
How do I use this?
Arli AI is an LLM inference platform that provides API and chat access to interact with the latest models. Check out our quick-start page in our docs!
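API access in this style typically looks like the sketch below, assuming an OpenAI-compatible chat completions endpoint. The base URL, model name, and the `ARLIAI_API_KEY` environment variable here are assumptions for illustration; check the quick-start page in the docs for the actual values.

```python
import json
import os
import urllib.request

# Assumed endpoint; confirm against the quick-start docs.
API_URL = "https://api.arliai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "Meta-Llama-3.1-8B-Instruct") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (model name is a placeholder)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('ARLIAI_API_KEY', '')}",
        },
    )

if __name__ == "__main__":
    # Sending the request requires a valid ARLIAI_API_KEY in the environment.
    with urllib.request.urlopen(build_request("Hello!")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client that speaks the OpenAI chat format (including the RP frontends listed above) can be pointed at the same endpoint with your API key.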
How do I contact Arli AI support?
You can contact us at contact@arliai.com or through our contact form.
Do you keep logs of prompts and generation?
We strictly do not keep any logs of user requests or generations.
Why is Arli AI better than other LLM providers?
We provide the most unrestricted LLM platform: no rate limits on tokens or requests, and no censorship.
Is there a hidden limit imposed?
Our only limit is on how many parallel requests a user can make, as explained on the pricing page.
Why use Arli AI API instead of self-hosting LLMs?
Using Arli AI will cost you significantly less than other inference platforms, let alone renting GPUs in the cloud or paying for the electricity to run your own.
What if I want to use a model that's not here?
If a model you want to use is not on our Models page, you can contact us to request that we add it.