Arize Phoenix

OpenTelemetry-native open-source tracing and evaluation, plus the Arize AX enterprise platform on top.

Positioning

Phoenix is the open-source side of the Arize stack: OpenTelemetry-native tracing, eval primitives, datasets, and a notebook-friendly UI. Arize AX is the enterprise platform built on the same trace model and adds production observability, alerting, and team workflows. Phoenix's license is the Elastic License v2, which permits internal use but restricts hosting Phoenix as a commercial service ¹.

The product surface covers tracing, prompt experimentation, datasets, LLM-judge and code evaluators, and an evals library you can call from a notebook or CI.

Strengths

OTel-native. Traces are emitted as OpenTelemetry spans, so the same instrumentation can fan out to Phoenix, AX, or any other OTel backend. This is the rare eval platform that does not lock you into a proprietary trace schema.
Wide adoption. Phoenix is widely downloaded, and its public roster of OSS users includes GitHub, Microsoft, MongoDB, and ByteDance. The Arize AX enterprise roster includes Apple, DoorDash, Uber, Booking.com, and Reddit.
Strong notebook story. The tutorials and cookbook walk through eval programs that run locally with no service dependency, which makes Phoenix easy to try on a Friday afternoon ².

Limitations

ELv2 restricts commercial hosting. You cannot resell Phoenix as a managed service. Most teams will not hit this limit, but it is worth knowing if you build platforms that expose Phoenix to external customers.
Two products to keep straight. Phoenix and Arize AX are different SKUs with overlapping but distinct UIs. Picking one and sticking with it avoids confusion.
Self-interested benchmarks. Performance and quality benchmarks come from Arize itself; treat them as positioning rather than independent evidence.

Best fit

Hybrid architectures where Phoenix runs as the OSS observability hub and CI pushes eval results back through the OTel trace surface, RAG evaluation with the Phoenix evals library, and notebook-driven prototype evals that may later graduate to Arize AX without changing instrumentation.

Getting started

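import phoenix as px
from phoenix.evals import HallucinationEvaluator, OpenAIModel, run_evals
from phoenix.otel import register

tracer_provider = register(project_name="rag-prod")
px.launch_app()  # local UI at http://localhost:6006

# Placeholders: run_rag_pipeline and questions stand in for your own pipeline
# and inputs. Spans are captured once that pipeline is instrumented.
answers = run_rag_pipeline(questions)

# Fetch the captured spans from the same project the tracer writes to.
queries_df = px.Client().get_spans_dataframe(
    "name == 'rag.answer'", project_name="rag-prod"
)

# HallucinationEvaluator reads "input", "reference", and "output" columns,
# so rename or derive them from your span attributes before this call.
# OpenAIModel needs OPENAI_API_KEY set in the environment.
results = run_evals(
    dataframe=queries_df,
    evaluators=[HallucinationEvaluator(model=OpenAIModel(model="gpt-4o-mini"))],
)
print(results[0].head())  # run_evals returns one DataFrame per evaluator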

The Phoenix cookbook covers RAG and agent evaluation end-to-end, including dataset construction and the OTel-native span schema ².
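For the CI use case, one way to act on evaluator output is a simple pass/fail gate over the labels. The sketch below uses only the standard library. The function names and the 5% threshold are hypothetical, and the "factual" and "hallucinated" label values are an assumption about what the evaluator emits; check the labels in your own results before relying on them.

```python
from typing import Iterable


def hallucination_rate(labels: Iterable[str], bad_label: str = "hallucinated") -> float:
    """Return the fraction of eval labels equal to bad_label."""
    items = list(labels)
    if not items:
        raise ValueError("no eval labels to score")
    return sum(1 for label in items if label == bad_label) / len(items)


def gate(labels: Iterable[str], max_rate: float = 0.05) -> int:
    """Return a process exit code: 0 if the rate is within max_rate, else 1."""
    rate = hallucination_rate(labels)
    print(f"hallucination rate: {rate:.1%} (limit {max_rate:.1%})")
    return 0 if rate <= max_rate else 1


# Illustrative labels only; in practice these would come from the "label"
# column of an evaluator's results.
sample_labels = ["factual"] * 24 + ["hallucinated"]
exit_code = gate(sample_labels)
```

In a CI job, passing `exit_code` to `sys.exit` fails the build when the rate exceeds the limit.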

Pricing notes

Phoenix is free under the Elastic License v2 for internal use. Arize AX is enterprise-priced with no public price list; contact the vendor for a quote. Self-hosting Phoenix is a Docker container plus a Postgres-compatible store. There is also a free Arize Phoenix Cloud tier for hosted use with limited retention; confirm current terms on the vendor's pricing page before relying on them in production.

Alternatives

Consider Langfuse for a more permissively licensed self-hostable hub (MIT core). Consider LangSmith for tighter LangChain integration. Consider Braintrust for a hosted, framework-agnostic console. Consider Comet Opik for an Apache 2.0 alternative.

Citations and last verified

Verified 2026-05-29 against Phoenix tutorials and the Arize cookbook.

¹ Phoenix Tutorials and Evals, https://github.com/Arize-ai/phoenix/tree/main/tutorials/evals.
² Phoenix Cookbook, "Evaluation," https://arize.com/docs/phoenix/cookbook/evaluation.