Companies building inference APIs for large language models with 20-100 employees
Explore companies matching “companies building inference APIs for large language models with 20-100 employees” through Canonical’s structured public company search.
Top 20 results
| Company | Domain | Description | HQ Location |
|---|---|---|---|
| | inceptionlabs.ai | Inception Labs is an artificial intelligence company specializing in the development of advanced Large Language Models (LLMs). They are known for their "Diffusion LLMs," which aim to provide breakthroughs in speed and quality for AI applications, particularly in coding assistance. | Palo Alto, California, United States |
| Martian | withmartian.com | Martian provides a platform that optimizes the use of Large Language Models (LLMs) for developers and companies. Their solution aims to enhance AI integration by dynamically routing prompts to the most suitable LLM, thereby improving performance, reducing costs, and ensuring compliance. | San Francisco, California, United States |
| | bentoml.com | BentoML provides a comprehensive platform for AI inference, enabling organizations to deploy, manage, monitor, and optimize their AI models efficiently. Their solution focuses on delivering speed and control for AI inference at scale, simplifying complex infrastructure challenges. | San Francisco, California, United States |
| Maitai | trymaitai.ai | Maitai provides enterprise-grade Large Language Models (LLMs) with industry-leading inference speeds and low latency. They focus on delivering reliable and efficient AI solutions for businesses looking to deploy and scale AI agents and LLMs. | San Francisco, California, United States |
| | neuralmagic.com | Neural Magic offered B2B software and SaaS solutions for optimizing AI model inference, particularly for large language models, across CPU and GPU architectures. They served enterprise clients and developers in various industries seeking to improve the efficiency and scalability of AI deployments by providing tools like DeepSparse and Neural Magic Compress, leveraging proprietary techniques such as GPTQ and SparseGPT for enhanced performance. Their revenue model combined licensing fees and subscription revenue. | Somerville, Massachusetts, United States |
| | deepinfra.com | DeepInfra provides developer-friendly APIs for AI inference, focusing on performance and cost-efficiency. Their platform enables businesses to accelerate AI deployment, scale to trillions of tokens, and host AI models efficiently. | Palo Alto, California, United States |
| SiliconFlow | siliconflow.cn | SiliconFlow is an AI company focused on accelerating AI model deployment and inference. They provide a suite of AI services designed to enhance performance and reduce costs for AI applications. Their mission is to make advanced AI more accessible and efficient for developers and enterprises. | Singapore, Central Region, Singapore |
| | vessl.ai | Vessl.ai offers a Platform-as-a-Service (PaaS) solution for deploying and managing large language models (LLMs) and other AI workloads, using a usage-based revenue model likely charging per GPU hour or based on resource consumption. Targeting developers and businesses globally, Vessl.ai simplifies AI model deployment by abstracting away complex infrastructure, supporting LLMs like Llama 3.2 and integrating with technologies such as LlamaParse and Pinecone. The platform emphasizes ease of use, scalability, and cost-effectiveness within a competitive market. | San Francisco, California, United States |
| Adaptive ML | adaptive-ml.com | Adaptive ML provides the "Adaptive Engine," a platform designed to evaluate, tune, and serve large language models (LLMs) for enterprise applications. They focus on accelerating the production deployment of AI models and offering advanced fine-tuning capabilities. | Paris, Île-de-France, France |
| | chai-research.com | Chai Research is building a platform for Social AI, providing tools and infrastructure for AI development and deployment. Their focus is on creating AI that is both informative and engaging, particularly for applications involving large language models. | Palo Alto, California, United States |
| | embeddedllm.com | Embedded LLM provides JamAI Base, a platform designed to accelerate and secure AI workflows, particularly for Large Language Models (LLMs). Their solution focuses on optimizing LLM pipelines for businesses and enterprises. | Singapore, Central Region, Singapore |
| Recursal.ai | featherless.ai | Featherless AI provides developers and businesses with access to a vast library of over 12,100 open AI models through an API. Their platform enables instant deployment for inference, fine-tuning, testing, and production, aiming to democratize AI model utilization. | San Francisco, California, United States |
| | inceptron.io | Inceptron provides a platform for building, optimizing, and deploying AI models, focusing on compiler-driven performance to achieve significant cost efficiencies and lower latency. Their solution aims to make AI model deployment more accessible and cost-effective for businesses and developers. | |
| | superprotocol.com | Super Protocol provides a confidential computing platform that enables secure AI inference, particularly for Large Language Models (LLMs). Their solution allows businesses and developers to integrate advanced AI capabilities into their products while ensuring the privacy and security of sensitive data. | New York, New York, United States |
| Tensorfuse | tensorfuse.io | TensorFuse provides a platform for fine-tuning, deploying, and auto-scaling generative AI models on AWS. It offers serverless inference, job queues, and development containers to streamline the AI model lifecycle for developers and organizations. | San Francisco, California, United States |
| | elastix.ai | Elastix.ai provides a next-generation AI inference platform designed for businesses looking to deploy and scale AI applications efficiently. Their platform focuses on delivering breakthrough total cost of ownership (TCO) per token, dynamic adaptability, and continuous evolution to meet the demands of evolving AI use | Seattle, Washington, United States |
| Doubleword | doubleword.ai | Doubleword provides an LLMOps platform for enterprises to deploy and manage private, production-grade Generative AI (GenAI) APIs. Their solution allows businesses to run open-source and custom language models securely at scale, supporting various deployment environments. | London, England, United Kingdom |
| AI/ML API | aimlapi.com | AIMLAPI provides developers and businesses with API access to a vast library of over 300 AI models. Their platform enables seamless integration of AI capabilities, including chat, content generation, and data analysis, into various applications and services. | |
| ZML | zml.ai | ZML provides a high-performance AI model inference platform optimized for any model and any hardware. Their solution simplifies deployment and enhances performance for businesses deploying AI models in production. | |
| GaiaNet | gaianet.ai | GaiaNet is building a decentralized ecosystem for the development, deployment, and scaling of AI applications. Their platform aims to foster a collaborative environment where AI models can learn, improve, and grow, offering a more open and scalable alternative to traditional centralized AI infrastructure. | Berkeley, California, United States |
Go deeper
Run your own company query with custom filters
This page is a free preview. Sign up to modify criteria, refresh results, and export your shortlist.
Why this is useful
This page helps teams discover companies tied to a specific capability or workflow.
What Canonical interprets
Canonical turns the use-case query into structured criteria that can surface long-tail company matches beyond simple keyword search.
How to adapt for your use case
Use this example as a starting point, then refine by segment, geography, company size, customer type, or funding signal after signup.
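As a rough illustration of what refining by structured criteria could look like, the sketch below filters three rows abridged from the listing above by description keyword and HQ location. The `Company` and `Criteria` types and the `search` function are hypothetical names invented for this example; they are not Canonical's schema or API, and a real system would presumably support many more fields (company size, customer type, funding signal).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    domain: str
    description: str
    hq: str  # empty string when the listing shows no HQ location


@dataclass(frozen=True)
class Criteria:
    # Hypothetical structured form of a query; not Canonical's actual schema.
    any_keywords: tuple[str, ...] = ()  # match if any keyword appears in the description
    hq_contains: str = ""               # empty string means "no geography filter"

    def matches(self, company: Company) -> bool:
        text = company.description.lower()
        keyword_ok = not self.any_keywords or any(
            k.lower() in text for k in self.any_keywords
        )
        hq_ok = self.hq_contains.lower() in company.hq.lower()
        return keyword_ok and hq_ok


# Three rows abridged from the listing above (descriptions shortened).
COMPANIES = [
    Company(
        "deepinfra.com",
        "developer-friendly APIs for AI inference",
        "Palo Alto, California, United States",
    ),
    Company(
        "doubleword.ai",
        "LLMOps platform for enterprises to deploy and manage private GenAI APIs",
        "London, England, United Kingdom",
    ),
    Company("zml.ai", "high-performance AI model inference platform", ""),
]


def search(companies: list[Company], criteria: Criteria) -> list[str]:
    """Return the domains of companies that satisfy every criterion."""
    return [c.domain for c in companies if criteria.matches(c)]


if __name__ == "__main__":
    # Keyword only, then keyword plus a geography refinement.
    print(search(COMPANIES, Criteria(any_keywords=("inference",))))
    print(search(COMPANIES, Criteria(any_keywords=("inference",), hq_contains="California")))
```

Adding the `hq_contains` refinement narrows the keyword match from two domains to one, which mirrors the kind of geography filter described above.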