Dev tools · 20 results · Refreshed monthly · Free preview

Companies building inference APIs for large language models for mid-market teams

Explore companies matching “companies building inference APIs for large language models for mid-market teams” through Canonical’s structured public company search.

Company Description: inference APIs for large language models

Canonical uses these criteria to surface a broader shortlist, including niche and emerging companies that are easy to miss with generic search.

Top 20 results

Company	Domain	Description	HQ Location
OpenLM.ai	openlm.ai	OpenLM is an AI company focused on accelerating Generative AI inference and training. They provide a platform and access to various large language models, aiming to support developers, researchers, and businesses in their AI endeavors.
TensorZero	tensorzero.com	TensorZero provides an open-source stack designed for industrial-grade Large Language Model (LLM) applications. Their platform enables businesses and developers to integrate, monitor, optimize, and experiment with various LLM providers seamlessly. The core mission is to facilitate the production-ready deployment of	New York, New York, United States
Sutro	sutro.sh	Sutro provides a platform and Python SDK designed for high-throughput inference and scaling of Large Language Model (LLM) workflows. They aim to help AI and data teams rapidly prototype, reduce costs, and effortlessly scale their LLM batch processing.
COGA AI	coga.ai	COGA AI provides a platform for structured inference in Large Language Models (LLMs), enabling reliable and predictable outputs through dynamic grammar-constrained decoding. Their technology addresses a key challenge in LLM development by ensuring structured and efficient results.	Orpington, England, United Kingdom
Routr	routr.dev	Routr provides an AI gateway platform that unifies access to various Large Language Model (LLM) providers through a single API. It aims to simplify AI application development by offering features like safety guardrails, load balancing, and cost optimization.	Dover, Delaware, United States
Neuratek	neuramorphic.ai	Neuramorphic AI provides a production-ready AI gateway platform designed for efficient Large Language Model (LLM) inference. Their platform leverages a neuramorphic architecture and intelligent routing to optimize AI model deployment for businesses and developers.	San Francisco, California, United States
Vessl AI	vessl.ai	Vessl.ai offers a Platform-as-a-Service (PaaS) solution for deploying and managing large language models (LLMs) and other AI workloads. Its revenue model is usage-based, likely charging per GPU hour or by resource consumption. Targeting developers and businesses globally, Vessl.ai simplifies AI model deployment by abstracting away complex infrastructure, supporting LLMs like Llama 3.2 and integrating with technologies such as LlamaParse and Pinecone. The platform emphasizes ease of use, scalability, and cost-effectiveness within a competitive market.	San Francisco, California, United States
Gelu	gelu.ai	Gelu AI provides production-grade inference solutions for Large Language Models (LLMs), focusing on delivering low latency, high throughput, and reduced costs. Their service enables businesses and developers to efficiently deploy and operate LLMs with sub-second response times.	New York, New York, United States
Infuzu	infuzu.com	Infuzu offers a SaaS platform that provides unified access to multiple Large Language Models (LLMs) through a single subscription. Their core technology, the Intelligent Model Selection (IMS) system, dynamically chooses the optimal LLM for each user prompt, aiming to deliver the best AI response for any given task.
TreeScale	treescale.com	TreeScale provides an all-in-one development platform for building Large Language Model (LLM) applications. Their platform allows users to deploy LLM-enhanced APIs quickly and easily, without requiring extensive coding or infrastructure management. TreeScale aims to simplify AI integration for developers and	San Francisco, California, United States
Kuzco	kuzco.xyz	Kuzco offers API-based access to a range of pre-trained large language models (LLMs) for text processing, using a pay-as-you-go pricing model based on token usage. Targeting developers and businesses globally, Kuzco facilitates LLM integration into applications by providing inference services supporting FP8, FP16, and INT8 precision for potentially faster and more cost-effective operation. The company's revenue is generated through a B2B SaaS model.	San Francisco, California, United States
Munruh	munruh.com	Munruh provides a unified gateway for accessing various large language models (LLMs) through a single API endpoint. The company aims to simplify LLM integration for developers and businesses by offering better pricing, enhanced stability, and no subscription requirements.
Cloudfog API	yunwu.ai	Yunwu.ai provides a unified LLM API gateway, offering stable and reliable relay services for multiple AI models. Their mission is to simplify access to various large language models for developers and businesses.
Meteron.ai	meteron.ai	Meteron provides an AI platform designed to manage and optimize the deployment of Large Language Models (LLMs) and generative AI. Their solution simplifies AI infrastructure by offering features like metering, load balancing, and storage, enabling businesses to efficiently scale and monetize their AI applications.
Hicap	hicap.ai	Hicap.ai provides a secure, enterprise-ready platform for high-performance inference of leading Large Language Models (LLMs). Their service aims to offer cost savings and reliable, low-latency access to AI models for businesses.	San Francisco, California, United States
Avian.io	avian.io	Avian provides high-performance, private, and secure AI inference deployments for enterprises. They specialize in enabling rapid deployment of large language models with industry-leading inference speeds.
vLLM	vllm.ai	vLLM offers an open-source library for efficient Large Language Model (LLM) inference and serving. It uses its PagedAttention algorithm to optimize memory usage and throughput, serving small research teams and others who need efficient LLM serving. The library supports several popular LLMs and offers OpenAI API compatibility. It currently operates under an open-source model, with potential future revenue streams through commercial support or enterprise licensing. It serves the AI industry globally, addressing the need for faster and more memory-efficient LLM deployment.
TextSynth	textsynth.com	TextSynth is a SaaS platform that provides developers, businesses, and researchers with API access to a variety of large language, text-to-image, text-to-speech, and speech-to-text AI models. They aim to make advanced AI accessible and efficient for integration into various applications and workflows.
Spectral	spectral.io	Spectral.io offers a platform-as-a-service (PaaS) for deploying and managing large language models (LLMs), targeting developers and businesses seeking to integrate LLMs into their applications. The company likely employs a Software as a Service (SaaS) business model, emphasizing ease of use and speed in LLM deployment, although specific pricing and revenue details remain unconfirmed. The platform serves the AI/ML industry globally.
Anymod.Ai	anymod.ai	AnyMod provides a high-performance LLM API offering unified access to various open-source large language models. They aim to simplify AI integration for developers and businesses by offering a consistent and reliable service.