Dev tools · 20 results · Refreshed monthly

Companies building inference APIs for large language models with 20-100 employees

Explore companies matching “companies building inference APIs for large language models with 20-100 employees” through Canonical’s structured public company search.

Query

Company Description: inference APIs for large language models
Employee Size Range: 11-50 employees

Canonical uses these criteria to surface a broader shortlist, including niche and emerging companies that a generic search can easily miss.

Top 20 results

Company	Domain	Description	HQ Location
Inception Labs	inceptionlabs.ai	Inception Labs is an artificial intelligence company that develops advanced large language models (LLMs). It is known for its "Diffusion LLMs," which aim to deliver faster, higher-quality output for AI applications, particularly coding assistance.	Palo Alto, California, United States
Martian	withmartian.com	Martian provides a platform that optimizes the use of Large Language Models (LLMs) for developers and companies. Their solution aims to enhance AI integration by dynamically routing prompts to the most suitable LLM, thereby improving performance, reducing costs, and ensuring compliance.	San Francisco, California, United States
BentoML	bentoml.com	BentoML provides a comprehensive platform for AI inference, enabling organizations to deploy, manage, monitor, and optimize their AI models efficiently. Their solution focuses on delivering speed and control for AI inference at scale, simplifying complex infrastructure challenges.	San Francisco, California, United States
Maitai	trymaitai.ai	Maitai provides enterprise-grade large language models (LLMs) with fast inference and low latency, which it describes as industry-leading. It focuses on reliable, efficient AI for businesses that deploy and scale AI agents and LLMs.	San Francisco, California, United States
Neural Magic	neuralmagic.com	Neural Magic offered B2B software and SaaS products for optimizing AI model inference, particularly for large language models, on both CPU and GPU architectures. It served enterprise clients and developers across industries who wanted more efficient and scalable AI deployments, through tools such as DeepSparse and Neural Magic Compress and model-compression techniques such as GPTQ and SparseGPT. Its revenue model combined licensing fees and subscriptions.	Somerville, Massachusetts, United States
Deep Infra	deepinfra.com	DeepInfra provides developer-friendly APIs for AI inference, focusing on performance and cost-efficiency. Their platform enables businesses to accelerate AI deployment, scale to trillions of tokens, and host AI models efficiently.	Palo Alto, California, United States
SiliconFlow	siliconflow.cn	SiliconFlow is an AI company focused on accelerating AI model deployment and inference. They provide a suite of AI services designed to enhance performance and reduce costs for AI applications. Their mission is to make advanced AI more accessible and efficient for developers and enterprises.	Singapore, Central Region, Singapore
Vessl AI	vessl.ai	Vessl.ai offers a Platform-as-a-Service (PaaS) for deploying and managing large language models (LLMs) and other AI workloads. Its revenue model is usage-based, likely charging per GPU hour or by resource consumption. Aimed at developers and businesses globally, Vessl.ai abstracts away complex infrastructure, supports LLMs such as Llama 3.2, and integrates with tools such as LlamaParse and Pinecone, with an emphasis on ease of use, scalability, and cost-effectiveness.	San Francisco, California, United States
Adaptive ML	adaptive-ml.com	Adaptive ML provides the "Adaptive Engine," a platform designed to evaluate, tune, and serve large language models (LLMs) for enterprise applications. They focus on accelerating the production deployment of AI models and offering advanced fine-tuning capabilities.	Paris, Île-de-France, France
Chai	chai-research.com	Chai Research is building a platform for Social AI, providing tools and infrastructure for AI development and deployment. Their focus is on creating AI that is both informative and engaging, particularly for applications involving large language models.	Palo Alto, California, United States
Embedded LLM	embeddedllm.com	Embedded LLM provides JamAI Base, a platform designed to accelerate and secure AI workflows, particularly for Large Language Models (LLMs). Their solution focuses on optimizing LLM pipelines for businesses and enterprises.	Singapore, Central Region, Singapore
Recursal.ai	featherless.ai	Recursal.ai's Featherless AI platform gives developers and businesses API access to a library of more than 12,100 open AI models. The platform supports instant deployment for inference, fine-tuning, testing, and production, aiming to make open models broadly accessible.	San Francisco, California, United States
Inceptron	inceptron.io	Inceptron provides a platform for building, optimizing, and deploying AI models, focusing on compiler-driven performance to achieve significant cost efficiencies and lower latency. Their solution aims to make AI model deployment more accessible and cost-effective for businesses and developers.
Super Protocol	superprotocol.com	Super Protocol provides a confidential computing platform that enables secure AI inference, particularly for Large Language Models (LLMs). Their solution allows businesses and developers to integrate advanced AI capabilities into their products while ensuring the privacy and security of sensitive data.	New York, New York, United States
Tensorfuse	tensorfuse.io	Tensorfuse provides a platform for fine-tuning, deploying, and auto-scaling generative AI models on AWS. It offers serverless inference, job queues, and development containers to streamline the AI model lifecycle for developers and organizations.	San Francisco, California, United States
Elastix.ai	elastix.ai	Elastix.ai provides an AI inference platform for businesses that need to deploy and scale AI applications efficiently. It emphasizes what it calls breakthrough total cost of ownership (TCO) per token, dynamic adaptability, and continuous evolution to keep pace with evolving AI use.	Seattle, Washington, United States
Doubleword	doubleword.ai	Doubleword provides an LLMOps platform for enterprises to deploy and manage private, production-grade Generative AI (GenAI) APIs. Their solution allows businesses to run open-source and custom language models securely at scale, supporting various deployment environments.	London, England, United Kingdom
AI/ML API	aimlapi.com	AIMLAPI provides developers and businesses with API access to a vast library of over 300 AI models. Their platform enables seamless integration of AI capabilities, including chat, content generation, and data analysis, into various applications and services.
ZML	zml.ai	ZML provides a high-performance AI model inference platform optimized for any model and any hardware. Their solution simplifies deployment and enhances performance for businesses deploying AI models in production.
GaiaNet	gaianet.ai	GaiaNet is building a decentralized ecosystem for the development, deployment, and scaling of AI applications. Their platform aims to foster a collaborative environment where AI models can learn, improve, and grow, offering a more open and scalable alternative to traditional centralized AI infrastructure.	Berkeley, California, United States