LiteLLM

LiteLLM is a simple way to talk to a wide range of large language models (LLMs) from providers such as OpenAI, Anthropic, and Cohere. It uses the LiteLLM Python package to send your messages to the chosen model and receive a reply. You can tweak how creative the model is, how many answers you want, and how many tokens it can produce.

How it Works

When you drop the LiteLLM component into your dashboard, you give it a few pieces of information:

  1. Model name – the exact name of the LLM you want to use (e.g., gpt-3.5-turbo).
  2. Provider – which service the model comes from (OpenAI, Azure, Anthropic, etc.).
  3. API key – the secret key that lets your dashboard talk to that provider.
  4. Other settings – optional knobs that control creativity, length, retries, and streaming.

The component then creates a LiteLLM client, passes your message to the chosen model, and returns the model’s reply. If you enable Stream, the reply comes back piece‑by‑piece so you can show it live.
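
Conceptually, the round trip boils down to one call to the litellm package. Below is a minimal sketch, not the component’s actual internal code; it assumes litellm is installed and that the placeholder key is replaced with a real one:

    import litellm

    # Send one user message and read back the model's reply.
    response = litellm.completion(
        model="gpt-3.5-turbo",
        api_key="sk-...",  # placeholder; use your real provider key
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)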

Inputs

  • Input: The text or conversation you want the model to respond to.
  • Model name: The name of the model to use. For example, gpt-3.5-turbo.
  • API key: The secret key that authenticates your request to the provider’s API.
  • Provider: The service the model comes from and that issued your API key. Options are OpenAI, Azure, Anthropic, Replicate, Cohere, and OpenRouter.
  • Temperature: Controls how random the model’s responses are. Lower values make the output more deterministic; higher values make it more creative.
  • Model kwargs: A dictionary of extra keyword arguments you want to pass to the model. Leave it empty if you’re not sure.
  • Top p: Nucleus sampling. The model samples only from the smallest set of most probable tokens whose cumulative probability reaches this value. Useful for controlling randomness.
  • Top k: Restricts sampling to the k most probable tokens at each step. Another way to control randomness; not all providers support it.
  • N: Number of chat completions to generate for each prompt. The API may return fewer if duplicates are produced.
  • Max tokens: The maximum number of tokens the model can generate for each reply. Helps keep responses short and cost‑effective.
  • Max retries: How many times the component will try again if the API call fails.
  • Verbose: If true, the component will log detailed debugging information.
  • Stream: Enable streaming of the model’s reply so you can display it as it arrives.
  • System Message: A message that sets the overall tone or behavior of the model (e.g., “You are a helpful assistant.”). The sketch after this list shows how these inputs map onto an underlying completion call.
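
In the litellm package itself, the provider is usually selected by prefixing the model name (for example, anthropic/claude-2); the component’s Provider dropdown presumably performs that mapping for you. As referenced above, here is a hedged sketch of how the inputs map onto a single completion call. The parameters temperature, top_p, n, max_tokens, num_retries, and stream are genuine litellm.completion arguments; the values shown are arbitrary examples, and top_k is omitted because not every provider accepts it:

    import litellm

    response = litellm.completion(
        model="gpt-3.5-turbo",  # Model name
        api_key="sk-...",       # API key (placeholder)
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},  # System Message
            {"role": "user", "content": "Summarize LiteLLM in one line."},  # Input
        ],
        temperature=0.7,   # creativity
        top_p=1.0,         # nucleus sampling cutoff
        n=1,               # number of completions to generate
        max_tokens=256,    # cap on reply length
        num_retries=2,     # Max retries on transient failures
        stream=False,      # Stream toggle
        # Extra entries from Model kwargs would be passed as additional
        # keyword arguments here, e.g. **model_kwargs.
    )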

Outputs

  • Text: The raw text reply from the language model. You can feed this into other components or display it to users (see the sketch after this list).
  • Model: The underlying LiteLLM client object. Useful if you want to chain multiple calls or inspect the configuration later.
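
As referenced in the Text bullet, here is a small self‑contained sketch of how these outputs relate to a raw litellm response (the prompt and n value are arbitrary):

    import litellm

    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Name three colors."}],
        n=2,  # ask for two alternative completions
    )

    # The Text output corresponds to the first choice's message content.
    text = response.choices[0].message.content

    # With N > 1, each alternative completion arrives as a separate choice.
    all_texts = [choice.message.content for choice in response.choices]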

Usage Example

  1. Add the LiteLLM component to your workflow.
  2. Set the inputs:
    • Input: “What is the capital of France?”
    • Model name: gpt-3.5-turbo
    • Provider: OpenAI
    • API key: (enter your OpenAI key)
    • Temperature: 0.5 (moderate creativity)
    • Max tokens: 50 (keep the answer short)
  3. Connect the Text output to a “Display Text” component so the answer shows up on the dashboard.
  4. Run the workflow. The model will reply with “Paris” (or a short explanation); a sketch of the equivalent litellm call follows below.
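
As noted in step 4, the equivalent direct litellm call for this example would look roughly like this (a sketch; the key is a placeholder):

    import litellm

    response = litellm.completion(
        model="gpt-3.5-turbo",
        api_key="sk-...",  # your OpenAI key
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        temperature=0.5,   # moderate creativity
        max_tokens=50,     # keep the answer short
    )
    print(response.choices[0].message.content)  # e.g. "The capital of France is Paris."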

Related Components

  • ChatGPT – A pre‑configured component that only works with OpenAI’s GPT models.
  • Anthropic Claude – Connects to Anthropic’s Claude models.
  • Cohere – Uses Cohere’s language models.
  • Replicate – Calls models hosted on Replicate’s platform.
  • OpenRouter – Accesses models available through OpenRouter.

Tips and Best Practices

  • Keep your API key hidden by using the SecretStrInput field; it won’t show up in the UI.
  • Use a lower Temperature for factual queries and a higher one for creative writing.
  • Set Max tokens to avoid unexpectedly long responses that cost more.
  • Enable Stream if you want to show answers in real time, especially for long replies (see the streaming sketch after this list).
  • If you need multiple answers, set N to a higher number but remember the API may return duplicates.
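
For the streaming tip above, here is a minimal sketch of consuming a streamed reply with the litellm package; the chunks follow the OpenAI streaming format, and the delta content can be None on bookkeeping chunks:

    import litellm

    stream = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a haiku about dashboards."}],
        stream=True,  # tokens arrive as they are generated
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # skip chunks that carry no text
            print(delta, end="", flush=True)
    print()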

Security Considerations

  • Never expose your API key in public dashboards or share screenshots that reveal it.
  • Store keys in environment variables or secure vaults when deploying the system (a sketch follows this list).
  • Use the Verbose option only when troubleshooting; it can reveal sensitive request details in logs.
  • Review provider terms of service to ensure your usage complies with their policies.
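
As mentioned in the list above, here is a minimal sketch of keeping the key out of your workflow definition by reading it from an environment variable (litellm also picks up standard variables such as OPENAI_API_KEY on its own):

    import os
    import litellm

    # Set the key outside your code, e.g.:  export OPENAI_API_KEY="sk-..."
    api_key = os.environ["OPENAI_API_KEY"]  # raises KeyError if unset

    response = litellm.completion(
        model="gpt-3.5-turbo",
        api_key=api_key,  # never hard-code this value
        messages=[{"role": "user", "content": "ping"}],
    )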