Reconnaissance in the Age of AI: Exploring Modern ML Infrastructure

Vulnerability Assessment and Penetration Testing (VAPT)

29 Jun 2026

AI Reconnaissance, ML Infrastructure, Model Security, Vector Databases, AI Attack Surface

Executive Summary

Modern AI infrastructure introduces a completely different reconnaissance landscape from traditional applications. Instead of simple web servers and databases, AI environments expose inference servers, vector databases, orchestration platforms, model registries, notebook environments, and GPU-backed services.

Resecurity warns about ongoing malicious activity originating from foreign adversaries probing national AI infrastructure worldwide. These adversaries aim to map exposed AI and ML instances to specific organizations and execute data breaches by exploiting misconfigured services.

This guide covers:

AI infrastructure components
AI service fingerprinting
MLflow and vector database enumeration
gRPC reconnaissance
Metrics and metadata leakage
Jupyter notebook exposure
AI attack surface mapping
Supply chain reconnaissance
AI-focused pentesting methodology

One of the biggest differences in AI environments is the amount of operational intelligence exposed through APIs, metrics, metadata, and orchestration systems. Gaining unauthorized access to such environments can lead to massive data breaches, exposing records stored in data lakes, datasets used for training, and even operator queries, which may reveal confidential and proprietary information.

Introduction to AI Reconnaissance

AI Reconnaissance is the process of identifying, fingerprinting, and analyzing artificial intelligence infrastructure to map the attack surface of modern machine learning environments.

Unlike traditional reconnaissance, AI reconnaissance focuses on discovering:

Inference servers
Vector databases
Model registries
AI orchestration platforms
Notebook environments
GPU-backed infrastructure

It combines automation, machine learning awareness, and security reconnaissance techniques to collect and correlate intelligence at scale.

Understanding the Modern AI Attack Surface

Now that we understand what AI reconnaissance is, the next step is learning how modern AI infrastructure operates in production environments. This is where many traditional security assumptions begin to fail.

In a conventional network assessment, the landscape is predictable:

Web servers on ports 80 and 443
SSH on port 22
MySQL on 3306
PostgreSQL on 5432

Security professionals have spent years building tools, workflows, and intuition around these patterns.

Why Attackers Care About AI Infrastructure

Modern AI infrastructure contains far more than machine learning models. It often includes sensitive datasets, cloud credentials, internal APIs, orchestration systems, GPU resources, and business-critical automation pipelines.

For attackers, AI environments represent a high-value target because compromising a single AI component can expose an organisation’s entire machine learning ecosystem.

Common attacker objectives include:

Model theft and intellectual property extraction
Dataset theft and sensitive information exposure
GPU hijacking for cryptomining or unauthorized AI workloads
Supply chain poisoning through malicious models or dependencies
RAG manipulation and vector database tampering
Cloud credential theft from notebook environments
Access to internal Kubernetes or orchestration infrastructure
Operational intelligence gathering through metrics and metadata leakage

Unlike traditional applications, AI systems are highly interconnected. An exposed notebook, orchestration platform, or model registry may provide visibility into storage systems, inference servers, vector databases, and deployment pipelines simultaneously.

Core Components of AI Infrastructure

A production AI deployment is not a single application or server. It is an interconnected collection of specialised systems that manage different stages of the machine learning lifecycle:

Data ingestion
Model training
Experiment tracking
Inference serving
Vector search
Orchestration
Monitoring
Artifact storage

Understanding these components is the foundation of effective AI reconnaissance.

1. Model Serving Endpoints

Model serving frameworks are the operational layer of machine learning systems. Their job is to load trained models into memory and expose prediction APIs that applications can query in real time.

These services are effectively the “front door” of AI deployments.

Unlike traditional web applications, model servers often expose:

Multiple protocols simultaneously
Binary streaming interfaces
GPU metrics
Internal management APIs
Model metadata endpoints

This creates a much larger reconnaissance surface than standard REST applications.

1.1 NVIDIA Triton Inference Server

One of the most common enterprise inference frameworks is NVIDIA Triton Inference Server.

Typical exposed ports include:

Service	Port
HTTP API	8000
gRPC API	8001
Prometheus Metrics	8002

Three separate interfaces for a single AI service.

During reconnaissance, this matters because each interface exposes different information:

The HTTP API handles prediction requests
gRPC supports high-performance internal communication
Prometheus metrics reveal operational telemetry

An attacker can often map the AI deployment architecture without ever interacting with the actual inference API.

1.2 TensorFlow Serving

TensorFlow Serving is widely used in production ML environments.

Default ports:

gRPC → 8500
REST API → 8501

TensorFlow Serving deployments frequently expose model versioning information, prediction schemas, and inference signatures that can help fingerprint the organisation’s ML capabilities.

1.3 TorchServe

TorchServe powers many PyTorch deployments.

Typical configuration:

Service	Port
Inference API	8080
Management API	8081
Metrics Endpoint	8082

The management API is especially valuable during reconnaissance because it may expose:

Loaded models
Deployment status
Worker configurations
Scaling parameters
Snapshot information

Misconfigured management endpoints can sometimes allow remote model registration or unloading operations.

1.4 LLM-Specific Infrastructure

Large Language Model deployments introduce another layer of specialised infrastructure.

1.5 Ollama

Ollama commonly runs on: Port 11434

It provides lightweight local LLM serving and exposes OpenAI-compatible APIs.

1.6 vLLM

vLLM deployments typically use: Port 8000

vLLM is designed for high-throughput inference and efficient GPU memory management for transformer-based models.

2. AI Orchestration and Experiment Tracking

Orchestration and experiment tracking platforms manage the entire machine learning lifecycle, from model training to deployment. These systems are some of the highest-value targets during AI reconnaissance because they centralise critical information about an organisation’s AI operations.

They commonly store:

Training experiments
Hyperparameter configurations
Model artifacts
Deployment stages
Pipeline definitions
Infrastructure metadata

2.1 MLflow Tracking Server

MLflow commonly runs on: Port 5000

It records:

Experiment runs
Training metrics
Hyperparameters
Model versions
Artifact storage locations

Finding an exposed MLflow instance can provide visibility into:

Proprietary AI projects
Dataset references
Cloud storage paths
Internal model names

2.2 Kubeflow

Kubeflow typically operates over:

Ports 80 / 443

Built on Kubernetes, Kubeflow orchestrates:

Training pipelines
Notebook servers
Distributed workloads
Model deployments

Because it integrates deeply with Kubernetes infrastructure, an exposed Kubeflow environment can reveal details of the underlying cluster as well as the ML workloads running on it.

2.3 Ray Framework

Ray commonly exposes:

Dashboard → 8265
Serving Endpoint → 8000

The Ray dashboard provides operational visibility into distributed AI clusters, including:

Active jobs
GPU allocation
Worker nodes
Runtime environments

The ShadowRay campaign specifically targeted exposed Ray dashboards, demonstrating how dangerous publicly accessible AI orchestration systems can become.

3. Vector Databases

Vector databases are a core component of modern AI systems, especially Retrieval-Augmented Generation (RAG) pipelines. They store embeddings — numerical representations of documents, images, or text — that allow AI systems to perform semantic search instead of simple keyword matching.

If an organisation operates:

AI chatbots
Knowledge assistants
Semantic search systems
Internal AI copilots

there is almost certainly a vector database behind it.

Unlike traditional databases, vector databases are designed to search by similarity, helping AI systems retrieve contextually relevant information for large language models.

Common Vector Databases

Several vector databases appear frequently during AI reconnaissance:

Platform	Common Ports
Qdrant	6333 (HTTP), 6334 (gRPC)
Weaviate	8080
Milvus	19530
Chroma	8000

Some platforms, such as Weaviate, also expose GraphQL APIs that provide additional schema visibility.

4. Model Registries and Artifact Management

Model registries store the actual machine learning models used across an organisation’s AI infrastructure. These systems manage model lifecycle operations, version control, deployment stages, and artifact storage.

A registry typically contains:

Serialized model files
Version history
Deployment status
Artifact locations
Metadata about model creation and ownership

Common model formats include:

.pkl
.pt
.onnx
.mar

In production environments, model registries act as the central inventory for an organisation’s AI assets.

5. Supporting AI Infrastructure

Beyond model servers and orchestration platforms, AI environments rely heavily on supporting infrastructure that is frequently overlooked during security assessments. These systems often expose sensitive operational data, credentials, and internal architecture details.

In many cases, supporting services become easier reconnaissance targets than the AI models themselves.

5.1 Jupyter Notebook Environments

Jupyter notebooks commonly run on: Port 8888

They are widely used by data scientists for:

Model development
Data analysis
Experimentation
Internal testing

However, notebooks are frequently misconfigured.

A common issue is running Jupyter with:

--ip=0.0.0.0

without proper authentication enabled.

This can provide anyone who reaches the port with:

Interactive notebook access
Terminal access
Python execution
Internal network visibility

In many environments, notebook cells also contain:

Cleartext credentials
API keys
Cloud access tokens
Database connection strings

Because notebooks are designed for convenience rather than security, they often become one of the weakest points in AI infrastructure.
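
The credential exposure described above can be illustrated with a minimal sketch that scans the code cells of a notebook file for secret-like strings. The function name `scan_notebook` and the three patterns are assumptions made for this example; they are deliberately small and will miss many real secrets.

```python
import json
import re

# Illustrative patterns only; real secret scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assignment": re.compile(
        r"(?i)\b(password|passwd|api_key|secret|token)\s*=\s*['\"][^'\"]+['\"]"
    ),
    "connection_string": re.compile(r"\b\w+://[^\s:/@]+:[^\s@/]+@[^\s'\"]+"),
}


def scan_notebook(ipynb_text):
    """Return (cell_index, pattern_name) pairs for code cells that match."""
    notebook = json.loads(ipynb_text)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a list of lines or one string.
        if isinstance(source, list):
            source = "".join(source)
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(source):
                findings.append((index, name))
    return findings
```

Run against notebooks you are authorised to review, this gives a quick first pass before a dedicated secret scanner is used.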

5.2 MinIO Object Storage

MinIO commonly runs on:

Port 9000
Port 9001

It provides S3-compatible object storage used for:

Model artifacts
Training datasets
Experiment outputs
Checkpoints
Deployment packages

During reconnaissance, exposed MinIO instances may reveal:

Bucket names
Artifact structures
Internal project identifiers
Stored model files

Because MinIO often stores the actual artifacts powering AI systems, exposure can lead to significant data leakage.

5.3 Prometheus Metrics Endpoints

Many AI model servers expose Prometheus metrics endpoints for monitoring and observability.

Examples include:

NVIDIA Triton → Port 8002
TorchServe → Port 8082
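
To illustrate why these endpoints matter, the sketch below parses Prometheus text output and lists which model names appear as label values. The function name `models_from_metrics` is hypothetical. The sample metric used in the usage note, `nv_inference_request_success`, is one that Triton publishes with a `model` label; other servers use different metric and label names, so treat the `model` label as an assumption.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def models_from_metrics(metrics_text):
    """Return {model_name: sorted metric names} from Prometheus text output."""
    seen = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        metric = match.group(1)
        labels = dict(LABEL.findall(match.group(2) or ""))
        if "model" in labels:
            seen.setdefault(labels["model"], set()).add(metric)
    return {model: sorted(names) for model, names in seen.items()}
```

Feeding it a line such as `nv_inference_request_success{model="fraud_detector",version="1"} 1042` yields the model name without any inference request being sent.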

AI Infrastructure Port & Protocol Reference

The following reference table covers the most common AI infrastructure components encountered during reconnaissance. These services often expose non-standard ports, gRPC interfaces, metrics endpoints, and AI-specific APIs that traditional scanners may overlook.

Component	Default Port(s)	Protocol(s)	Common Recon Endpoints
NVIDIA Triton Inference Server	8000, 8001, 8002	HTTP, gRPC, Prometheus	/v2/models, /v2/health/live, /metrics
TensorFlow Serving	8500, 8501	gRPC, HTTP/REST	/v1/models, /metadata
TorchServe	8080, 8081, 8082	HTTP, Management API, Metrics	/models, /metrics, /ping
Ollama	11434	HTTP	/api/tags, /api/generate
vLLM	8000	HTTP/OpenAI-Compatible API	/v1/models, /v1/chat/completions
Open WebUI	3000	HTTP	/api/models, /login
LM Studio Server	1234	HTTP	/v1/models
Text Generation Inference (TGI)	3000	HTTP	/generate, /health
MLflow Tracking Server	5000	HTTP	/api/2.0/mlflow/experiments/search
Kubeflow	80, 443	HTTP/HTTPS	Dashboard UI, Pipelines UI
Ray Dashboard	8265	HTTP	/api/jobs, /nodes, /logical/actors
Airflow (ML Pipelines)	8080	HTTP	/admin, /api/v1/dags
Prefect	4200	HTTP	/api, Dashboard
Qdrant	6333, 6334	HTTP, gRPC	/collections, /metrics
Weaviate	8080	HTTP, GraphQL	/v1/schema, /v1/meta
Milvus	19530	gRPC	Collection APIs
Chroma	8000	HTTP	/api/v1/collections
Pinecone Gateway	443	HTTPS	Index APIs
Redis Vector Store	6379	RESP/TCP	Key Enumeration
Elasticsearch (Vector Search)	9200	HTTP	/_cat/indices, /_search
Jupyter Notebook	8888	HTTP	/tree, /lab, /terminals
JupyterHub	8000	HTTP	/hub/login
VS Code Server	8080, 8443	HTTP	/login, /static
MinIO	9000, 9001	HTTP	/minio, Bucket Listings
S3-Compatible Storage	443	HTTPS	Bucket Enumeration
Prometheus	9090	HTTP	/metrics, /targets
Grafana	3000	HTTP	/login, /api/search
NVIDIA DCGM Exporter	9400	HTTP	/metrics
Kubernetes API Server	6443	HTTPS	/api, /version
Kubelet API	10250	HTTPS	/pods, /metrics
Docker Remote API	2375, 2376	HTTP/HTTPS	/containers/json
RabbitMQ	15672	HTTP	/api/overview
Kafka	9092	TCP	Broker Enumeration
Neo4j	7474, 7687	HTTP, Bolt	/browser, Cypher Queries
Apache Spark UI	4040	HTTP	/jobs, /stages
Hadoop YARN	8088	HTTP	/cluster
Label Studio	8080	HTTP	/projects, /tasks
BentoML	3000	HTTP	/healthz, /docs
FastAPI ML Services	8000	HTTP	/docs, /openapi.json
Gradio Apps	7860	HTTP	/config, /api/predict
Streamlit Apps	8501	HTTP	/healthz, WebSocket Endpoints
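
As a rough illustration, a subset of the table above can be turned into a lookup that maps open ports to candidate components. The names `AI_PORT_HINTS` and `candidates_for_ports` are hypothetical, chosen for this sketch. A port match is only a hint, because several of these ports are shared by unrelated services.

```python
# Subset of the reference table: port -> components that commonly use it.
AI_PORT_HINTS = {
    5000: ["MLflow Tracking Server"],
    6333: ["Qdrant (HTTP)"],
    6334: ["Qdrant (gRPC)"],
    8000: ["NVIDIA Triton (HTTP)", "vLLM", "Chroma", "JupyterHub",
           "FastAPI ML Services"],
    8001: ["NVIDIA Triton (gRPC)"],
    8002: ["NVIDIA Triton (metrics)"],
    8265: ["Ray Dashboard"],
    8500: ["TensorFlow Serving (gRPC)"],
    8501: ["TensorFlow Serving (REST)", "Streamlit Apps"],
    8888: ["Jupyter Notebook"],
    11434: ["Ollama"],
    19530: ["Milvus"],
}


def candidates_for_ports(open_ports):
    """Return {port: [candidate components]} for ports found in the table."""
    return {
        port: AI_PORT_HINTS[port]
        for port in sorted(set(open_ports))
        if port in AI_PORT_HINTS
    }
```

Ports with several candidates, such as 8000, still need the fingerprinting techniques described in the next section to be resolved.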

Fingerprinting AI Services

Identifying AI infrastructure requires a different reconnaissance mindset than traditional web application fingerprinting. Standard enterprise services usually reveal themselves through familiar banners, technologies, and ports. AI systems, however, expose unique behavioural patterns that become obvious once you know what to look for.

Instead of focusing only on open ports, AI reconnaissance relies heavily on:

HTTP response headers
JSON response structures
Error message formatting
Endpoint naming conventions
Protocol behaviour
Metrics exposure

Each AI framework leaves behind distinct fingerprints. Once these patterns are recognised, most AI services can be identified from a handful of requests.

1. HTTP Header Fingerprinting

HTTP response headers are often the fastest and most reliable way to identify AI infrastructure during reconnaissance. Many AI inference frameworks unintentionally reveal their identity unless administrators specifically hide them behind reverse proxies or API gateways.

Unlike traditional web applications, AI services frequently expose framework-specific headers, telemetry data, and inference-related metadata that make fingerprinting straightforward.

1.1 TorchServe

TorchServe commonly returns a response header such as:

Server: TorchServe/0.x.x

This header directly identifies a PyTorch serving environment.

Combined with endpoints like:

/models
/predictions
/metrics

TorchServe becomes easy to fingerprint during reconnaissance.

1.2 NVIDIA Triton Inference Server

Triton Inference Server exposes one of the most distinctive fingerprints in AI infrastructure.

A common indicator is the presence of:

NV-Status response headers

Triton also supports a unique request header:

endpoint-load-metrics-format: text

When this header is sent, Triton may return:

CPU utilisation
GPU utilisation
Load metrics
Hardware telemetry

1.3 FastAPI-Based ML Services

Many custom AI APIs are built using FastAPI and commonly return:

server: uvicorn

On its own, uvicorn only indicates a Python ASGI server. However, when combined with AI-related routes such as:

/predict
/embeddings
/generate

it strongly suggests a machine learning backend.

FastAPI services also frequently expose:

/docs
/openapi.json

which may reveal the full API schema during reconnaissance.

1.4 OpenAI-Compatible APIs

Modern LLM serving frameworks often imitate the OpenAI API specification for compatibility with existing tools and applications.

Frameworks such as:

Ollama
vLLM
LiteLLM

commonly expose:

/v1/models
/v1/chat/completions

and return:

x-request-id headers
Structured OpenAI-style JSON responses

Example response structure:

{"object": "model"}

This is a strong fingerprint of an OpenAI-compatible inference service.
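
The header indicators in this section can be combined into one small check. This is a sketch under stated assumptions: the function name `fingerprint_headers` is hypothetical, the header values are the ones quoted above, and `x-request-id` is treated as a weak hint because many unrelated services also send it.

```python
def fingerprint_headers(headers):
    """Map response headers to the framework hints described in this section."""
    lowered = {key.lower(): value for key, value in headers.items()}
    server = lowered.get("server", "").lower()
    hints = []
    if server.startswith("torchserve"):
        hints.append("TorchServe")
    if "uvicorn" in server:
        # uvicorn alone only proves a Python ASGI server is present.
        hints.append("Python ASGI service (possibly FastAPI)")
    if "nv-status" in lowered:
        hints.append("NVIDIA Triton Inference Server")
    if "x-request-id" in lowered:
        hints.append("weak hint: OpenAI-compatible API (confirm via /v1/models)")
    return hints
```

An empty result does not rule out AI infrastructure; reverse proxies and API gateways often strip these headers.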

2. API Response Signatures

AI frameworks often reveal their identity through the structure of their API responses. Even when headers are hidden or proxy servers are used, the JSON format itself can act as a reliable fingerprint.

Once you become familiar with common response patterns, identifying AI services from the response body alone becomes straightforward.

2.1 TensorFlow Serving

TensorFlow Serving commonly returns responses such as:

{
  "model_version_status": [
    { "version": "1", "state": "AVAILABLE" }
  ]
}

The presence of:

model_version_status
state
version tracking fields

is a strong indicator of TensorFlow Serving.

2.2 NVIDIA Triton Inference Server

Triton responses often include detailed model metadata:

{
  "name": "fraud_detector",
  "versions": ["1"],
  "platform": "tensorflow_graphdef"
}

Common Triton indicators include:

platform
versions
model backend identifiers
tensor configuration metadata

These fields help identify both the inference server and the underlying ML framework.

2.3 MLflow Error Responses

Even error messages can fingerprint AI infrastructure.

MLflow frequently exposes stack traces referencing namespaces such as:

mlflow.server
mlflow.tracking

This means that even failed requests may reveal:

The orchestration platform
Backend technologies
Internal application structure

In many AI environments, verbose error handling unintentionally becomes a reconnaissance source.

2.4 OpenAI-Compatible APIs

LLM-serving frameworks that mimic the OpenAI API format typically return structured responses like:

{
  "object": "model",
  "id": "llama-3.1-8b",
  "created": 1700000000
}

The presence of:

"object": "model"
model identifiers
OpenAI-style schemas

strongly indicates an OpenAI-compatible service.

This usually narrows the technology stack down to:

vLLM
LiteLLM
Ollama
Custom OpenAI wrappers
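
The response signatures in this section can be checked mechanically once a body has been decoded from JSON. The sketch below is illustrative: `fingerprint_body` is a hypothetical name, and the checks cover only the fields quoted above plus the `{"object": "list", "data": [...]}` wrapper that OpenAI-style `/v1/models` endpoints return.

```python
def fingerprint_body(body):
    """Guess the serving framework from a decoded JSON response body."""
    if not isinstance(body, dict):
        return "unknown"
    if "model_version_status" in body:
        return "TensorFlow Serving"
    if "platform" in body and "versions" in body:
        return "NVIDIA Triton Inference Server"
    if body.get("object") == "model":
        return "OpenAI-compatible API"
    # Model listings wrap individual model objects in a "data" array.
    data = body.get("data")
    if body.get("object") == "list" and isinstance(data, list):
        if any(isinstance(m, dict) and m.get("object") == "model" for m in data):
            return "OpenAI-compatible API"
    return "unknown"
```

An "OpenAI-compatible API" result still leaves vLLM, LiteLLM, Ollama, and custom wrappers as candidates, so it narrows the stack rather than naming it.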

3. Error Message Fingerprinting

Error message fingerprinting is one of the most reliable techniques for identifying AI infrastructure during reconnaissance. Unlike traditional web applications, AI inference systems are highly sensitive to input structure, tensor shapes, and data types.

Most AI APIs expect very specific payload formats. When they receive malformed input, they often return verbose debugging information that reveals exactly which framework is running behind the service.

The technique is simple:

Send an intentionally malformed request and analyze the error response.

3.1 TensorFlow Serving

TensorFlow Serving produces highly recognizable validation errors.

For example, sending an invalid tensor payload may trigger responses referencing:

tensorinfo_map

This string is strongly associated with TensorFlow Serving internals and immediately identifies the backend framework.

Tensor-related validation messages may also reveal:

Expected tensor shapes
Input dimensions
Datatypes
Model signatures

3.2 MLflow

MLflow error responses frequently expose internal namespaces such as:

mlflow.server
mlflow.tracking
databricks

Even failed requests can reveal:

Backend architecture
File paths
Internal modules
Deployment structure

Certain vulnerabilities, such as historical MLflow path traversal issues, have also exposed full filesystem paths during error handling.

3.3 Databricks Mosaic AI

Databricks Mosaic AI deployments may return Java exception traces like:

io.jsonwebtoken.IncorrectClaimException

This is an immediate fingerprint of a Java-based authentication backend associated with Databricks infrastructure.

These stack traces often expose:

Framework dependencies
Authentication libraries
Internal implementation details
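
A minimal sketch of the matching step, assuming the error body has already been captured as text. The marker strings are the ones named in this section; the names `ERROR_MARKERS` and `fingerprint_error` are hypothetical.

```python
# Marker substring (lower case) -> framework it suggests.
ERROR_MARKERS = [
    ("tensorinfo_map", "TensorFlow Serving"),
    ("mlflow.server", "MLflow"),
    ("mlflow.tracking", "MLflow"),
    ("io.jsonwebtoken", "Java JWT backend (reported for Databricks Mosaic AI)"),
]


def fingerprint_error(error_text):
    """Return the frameworks whose marker strings appear in an error body."""
    text = error_text.lower()
    seen = []
    for marker, framework in ERROR_MARKERS:
        if marker in text and framework not in seen:
            seen.append(framework)
    return seen
```

Substring matching is crude, so a hit is best treated as a lead to confirm with the header and response-body checks rather than as proof.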

4. Endpoint Naming Conventions

One of the most overlooked reconnaissance techniques in AI infrastructure is endpoint naming analysis. While traditional web applications usually follow predictable REST design patterns, AI systems expose entirely different API structures that immediately stand out once you know what to look for.

Conventional APIs typically use resource-based endpoints such as:

/users
/accounts
/products
/orders

AI frameworks rarely follow this pattern.

Instead, machine learning services expose action-oriented endpoints focused on inference, generation, embeddings, and model execution. These routes are often highly distinctive and can reveal the presence of AI infrastructure even when banners, headers, and metadata are hidden.

For reconnaissance professionals, endpoint naming conventions become a powerful fingerprinting mechanism during directory enumeration and content discovery.

4.1 AI Inference Endpoints

Inference APIs commonly expose endpoints such as:

/predict
/infer
/generate
/embeddings
/score
/invocations

Unlike traditional REST APIs, these routes describe computational actions rather than resources.

Each endpoint often corresponds to a specific AI capability:

/predict → Classification or regression inference
/generate → Text or image generation
/embeddings → Vector embedding creation
/score → Model scoring or ranking
/infer → Generic inference execution

The /invocations endpoint is particularly important because it is strongly associated with Amazon SageMaker deployments.

Finding one of these paths during brute-forcing may indicate machine learning infrastructure. Finding several together almost certainly confirms it.

4.2 Model Management APIs

Many AI serving frameworks expose model inventory and management endpoints through highly predictable paths:

/v1/models
/v2/models

These endpoints are commonly associated with:

NVIDIA Triton
TensorFlow Serving
vLLM
OpenAI-compatible APIs
Custom inference gateways

Depending on configuration, these routes may reveal:

Loaded model names
Available versions
Backend frameworks
Deployment status
Inference capabilities

In some environments, simply querying the model endpoint provides a complete inventory of deployed AI systems.

4.3 OpenAI-Compatible Endpoint Patterns

Modern LLM-serving frameworks increasingly emulate the OpenAI API specification because it simplifies integration with existing tooling and applications.

Common endpoints include:

/v1/chat/completions
/v1/completions
/v1/embeddings
/v1/models

These routes are commonly exposed by:

Ollama
vLLM
LiteLLM
Open WebUI
Custom LLM wrappers

Even if the backend framework is hidden, OpenAI-style endpoint structures strongly indicate the presence of an LLM inference service.

4.4 MLflow API Fingerprinting

MLflow exposes one of the most distinctive API namespaces in AI infrastructure:

/api/2.0/mlflow/

This prefix is highly recognizable and rarely appears outside MLflow deployments.

Its presence typically indicates:

Experiment tracking systems
Model registries
AI orchestration platforms
Artifact management infrastructure

Because MLflow centralises training metrics, model artifacts, and deployment history, discovering this endpoint during reconnaissance is often extremely valuable.

4.5 Kubeflow Pipeline APIs

Kubeflow deployments expose their orchestration functionality through recognizable pipeline routes such as:

/pipeline/apis/v1beta1/

These endpoints may reveal:

Training pipelines
Workflow definitions
Experiment orchestration
Kubernetes-integrated AI infrastructure

In enterprise environments, Kubeflow often acts as the operational layer coordinating the organisation’s entire machine learning workflow.
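
The naming conventions in this section can be applied to the output of a content-discovery run. The sketch below is one possible way to do that; `PREFIX_HINTS`, `ACTION_ENDPOINTS`, and `classify_paths` are hypothetical names, and the rule that two or more action endpoints indicate an ML service is a simplification of the "finding several together" heuristic above.

```python
# Path prefixes named in this section, mapped to what they suggest.
PREFIX_HINTS = [
    ("/api/2.0/mlflow/", "MLflow"),
    ("/pipeline/apis/v1beta1/", "Kubeflow Pipelines"),
    ("/v1/chat/completions", "OpenAI-compatible LLM API"),
    ("/v1/completions", "OpenAI-compatible LLM API"),
    ("/v1/embeddings", "OpenAI-compatible LLM API"),
    ("/v2/models", "model management API (v2 style)"),
    ("/v1/models", "model management API (v1 style)"),
    ("/invocations", "SageMaker-style inference"),
]
ACTION_ENDPOINTS = {"/predict", "/infer", "/generate", "/embeddings",
                    "/score", "/invocations"}


def classify_paths(found_paths):
    """Summarise which discovered paths point at ML infrastructure."""
    hints = sorted({
        label
        for path in found_paths
        for prefix, label in PREFIX_HINTS
        if path.startswith(prefix)
    })
    actions = sorted(p for p in set(found_paths) if p in ACTION_ENDPOINTS)
    return {
        "platform_hints": hints,
        "action_endpoints": actions,
        # One action endpoint is only suggestive; several together are stronger.
        "likely_ml_service": bool(hints) or len(actions) >= 2,
    }
```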

5. gRPC Fingerprinting

Many modern AI frameworks expose gRPC services alongside traditional HTTP APIs. These gRPC interfaces are heavily used for internal communication because they provide faster, low-latency data transfer for tensor-based workloads.

Common examples include:

Framework	Default gRPC Port
NVIDIA Triton Inference Server	8001
TensorFlow Serving	8500

Unlike REST APIs, gRPC uses binary protobuf communication over HTTP/2. Because of this, traditional HTTP scanners often fail to detect or correctly fingerprint these services.

A normal HTTP probe may return:

Empty responses
Protocol errors
Unknown services

even when a fully functional AI inference API is running behind the port.

Using grpcurl

The primary reconnaissance tool for gRPC services is:

grpcurl

If gRPC reflection is enabled — which is common in development environments pushed into production without hardening — researchers can enumerate the exposed protobuf schema.

Example:

grpcurl -plaintext target:8001 list

This lists all available RPC services exposed by the server.

Detailed service descriptions can then be retrieved:

grpcurl -plaintext target:8001 describe inference.GRPCInferenceService

Enabled gRPC reflection may expose:

RPC method names
Service definitions
Input tensor structures
Output schemas
Datatypes
Inference formats
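
The output of `grpcurl ... list` is one service name per line, so it is easy to post-process. The sketch below maps listed names to frameworks. `classify_grpc_services` is a hypothetical helper, the Triton service name is the one shown above, and the TensorFlow Serving names are taken from its public protobuf definitions rather than from the original text.

```python
# Fully qualified gRPC service name -> framework that defines it.
KNOWN_GRPC_SERVICES = {
    "inference.GRPCInferenceService": "NVIDIA Triton Inference Server",
    "tensorflow.serving.PredictionService": "TensorFlow Serving",
    "tensorflow.serving.ModelService": "TensorFlow Serving",
}


def classify_grpc_services(grpcurl_list_output):
    """Map each listed service name to a framework, or None if unrecognised."""
    services = [
        line.strip()
        for line in grpcurl_list_output.splitlines()
        if line.strip()
    ]
    return {name: KNOWN_GRPC_SERVICES.get(name) for name in services}
```

Any service mapped to `None` is still worth describing with grpcurl, since custom inference gateways define their own service names.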

6. TLS Fingerprinting (JA3/JA4)

AI infrastructure can also be identified at the network level through TLS fingerprinting techniques such as JA3 and JA4.

Unlike traditional web traffic, AI environments generate large amounts of automated service-to-service communication using:

Python libraries
gRPC clients
ML orchestration tools
API automation frameworks

This traffic behaves very differently from normal browser-based activity.

What JA3 and JA4 Measure

JA3 and JA4 create fingerprints based on TLS handshake characteristics, including:

Cipher suites
TLS extensions
Protocol versions
Client negotiation behaviour
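
To make the idea concrete, JA3 joins five ClientHello fields as decimal values and takes the MD5 hash of the resulting string. The sketch below shows that calculation only. The function name `ja3_fingerprint` is hypothetical, the input values in the usage note are illustrative, and the caller is assumed to have already parsed the handshake and removed GREASE values.

```python
import hashlib


def ja3_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for decimal ClientHello field values."""
    fields = [
        str(tls_version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    # Fields are comma-separated; values inside a field are dash-separated.
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

For example, `ja3_fingerprint(771, [4865, 4866], [0, 10, 11], [29, 23], [0])` builds the string `771,4865-4866,0-10-11,29-23,0` before hashing. Clients built on the same TLS library and configuration tend to share a hash, which is why automated Python or gRPC traffic stands apart from browsers.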

Enumerating AI Systems

Once you have identified the framework behind an AI service, the next step is enumeration — extracting as much information as possible from exposed APIs, metadata endpoints, and management interfaces.

This is where AI reconnaissance becomes significantly more valuable.

Fingerprinting answers:

“What is this service?”

Enumeration answers:

“What does this service expose?”

In many AI environments, enumeration exposes far more intelligence than traditional infrastructure reconnaissance.

From Identification to Intelligence

Fingerprinting may tell you:

This is an MLflow server.

Enumeration may reveal:

- 12 active experiments
- 5 production LLM models
- Artifact storage paths
- Internal project names
- GPU training configurations
- Deployment environments

For example:

s3://nova-ai-prod-models/
Created by: Sarah.Kim (ML Engineer)
Project: customer-support-rag
Model: llama3-support-assistant-v2
Stage: Production

At this point, reconnaissance moves beyond service discovery and starts exposing the organisation’s internal AI operations.

1. MLflow Enumeration

MLflow is one of the most valuable targets during AI reconnaissance because it centralises nearly every part of the machine learning lifecycle and exposes it through a structured REST API.

If an MLflow instance is publicly accessible, a small number of API requests can reveal:

Experiments
Models
Artifact locations
Training metrics
Deployment stages
Internal project names
User attribution metadata

In many environments, enumerating MLflow effectively maps the organisation’s entire AI portfolio.

Step 1: Enumerate Experiments

MLflow experiments can be listed through:

POST /api/2.0/mlflow/experiments/search

The response typically includes:

Experiment names
Experiment IDs
Creation metadata

Experiment names often reveal internal projects and business functions.

Example:

fraud-detection-v3
customer-support-rag
internal-risk-scoring
llm-finetune-testing

Even experiment naming conventions alone can expose valuable operational intelligence.
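
A minimal sketch of this step, using only the standard library. The helper names `build_experiment_search` and `summarise_experiments` and the host in the usage note are hypothetical. The response layout (an `experiments` array with `experiment_id`, `name`, and `artifact_location`) follows the MLflow REST API, and the request is only prepared here, not sent.

```python
import json
import urllib.request


def build_experiment_search(base_url, max_results=100):
    """Prepare (but do not send) the experiments/search request."""
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/2.0/mlflow/experiments/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise_experiments(response_json):
    """Extract (id, name, artifact_location) from a decoded search response."""
    return [
        (e.get("experiment_id"), e.get("name"), e.get("artifact_location"))
        for e in response_json.get("experiments", [])
    ]
```

During an authorised assessment, the prepared request for a host such as `http://mlflow.example:5000` could be passed to `urllib.request.urlopen` and the decoded body handed to `summarise_experiments`.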

Step 2: Enumerate Registered Models

Model inventories can be retrieved through:

GET /api/2.0/mlflow/registered-models/search

Older MLflow 1.x servers exposed the same inventory at /api/2.0/mlflow/registered-models/list.

This endpoint may reveal:

Registered model names
Descriptions
Creation timestamps
Deployment stages

Example:

Model: finance-forecast-transformer
Stage: Production
Created: 2026-02-18

At this stage, researchers can begin mapping the organisation’s active AI systems.

Step 3: Retrieve Model Version Metadata

Detailed version information can be queried through:

GET /api/2.0/mlflow/model-versions/search

This is often the most valuable enumeration step.

Responses may include:

Artifact URIs
Cloud storage paths
Version history
User attribution
Deployment stages

Example:

Source: s3://nova-ai-models/experiments/4/artifacts/
User: emily.chen
Stage: Production

Artifact paths frequently expose:

S3 bucket names
Internal storage structure
Model artifact locations

This provides visibility into both infrastructure and deployment architecture.
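
The storage details can be pulled out of a response with a few lines of parsing. This sketch assumes the decoded response has a `model_versions` array whose entries carry a `source` URI, as in the MLflow REST API; the function name `storage_locations` is hypothetical.

```python
from urllib.parse import urlparse


def storage_locations(model_versions_json):
    """Return sorted (scheme, bucket_or_host) pairs seen in version sources."""
    locations = set()
    for version in model_versions_json.get("model_versions", []):
        parsed = urlparse(version.get("source") or "")
        # Local paths have no scheme or netloc and are skipped.
        if parsed.scheme and parsed.netloc:
            locations.add((parsed.scheme, parsed.netloc))
    return sorted(locations)
```

For the example above, the `s3://nova-ai-models/experiments/4/artifacts/` source yields the pair `("s3", "nova-ai-models")`.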

Step 4: Search Training Runs

Training run metadata can be queried through:

POST /api/2.0/mlflow/runs/search

This may reveal:

Hyperparameters
Accuracy metrics
Training configurations
GPU usage
Custom tags

Tags are especially valuable because teams often store:

Internal codenames
Git commit hashes
Environment identifiers
Deployment labels

Example:

env=production-gpu
git_commit=4f2c9ab
team=fraud-analytics

These details can help correlate AI infrastructure with internal development workflows.
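
Tag values become easier to review when grouped by key across all runs. The sketch below assumes the decoded `runs/search` response nests tags as `runs[].data.tags[]` objects with `key` and `value` fields, which is the MLflow REST layout; `collect_tag_values` is a hypothetical name.

```python
from collections import defaultdict


def collect_tag_values(runs_json):
    """Group the distinct values of each tag key across all returned runs."""
    values = defaultdict(set)
    for run in runs_json.get("runs", []):
        for tag in run.get("data", {}).get("tags", []):
            values[tag["key"]].add(tag["value"])
    return {key: sorted(vals) for key, vals in values.items()}
```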

Step 5: Enumerate Artifacts

Artifact listings can be retrieved through:

GET /api/2.0/mlflow/artifacts/list

This endpoint may expose downloadable artifacts such as:

Serialized models
Training outputs
Checkpoints
Configuration files

Example:

model.pkl
tokenizer.json
training_config.yaml

At this point, the organisation’s machine learning environment has effectively been mapped through a small number of API calls.

2. Inference Server Metadata

AI inference servers often expose metadata endpoints that reveal exactly how deployed models operate. These endpoints are designed to help developers integrate applications with machine learning models, but during reconnaissance they become an extremely valuable intelligence source.

Frameworks such as:

NVIDIA Triton Inference Server
TensorFlow Serving

provide detailed model configuration data through publicly accessible APIs.

In many cases, these endpoints expose enough information to fully reconstruct valid inference requests without needing documentation or source code.

Triton Model Configuration Enumeration

Triton exposes detailed model metadata through:

GET /v2/models/<name>/config

The response commonly includes:

Input tensor names
Tensor shapes
Datatypes
Batch size limits
Backend framework information

Example fields may include:

FP32
UINT64
INT8
tensorflow_graphdef
pytorch_libtorch
onnxruntime

This effectively provides a blueprint for interacting with the model.
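
To show how far this metadata goes, the sketch below builds a zero-filled inference request body from a model description. It is an illustration under assumptions: `build_infer_request` is a hypothetical helper, and the input dictionary uses the `name` / `datatype` / `shape` layout of the v2 model metadata response (`GET /v2/models/<name>`), whose field names differ slightly from the `/config` output.

```python
def build_infer_request(metadata):
    """Build a zero-filled v2 inference request body from model metadata."""
    inputs = []
    for tensor in metadata.get("inputs", []):
        # A dimension of -1 is variable; use 1 as a placeholder size.
        shape = [1 if int(d) < 0 else int(d) for d in tensor["shape"]]
        count = 1
        for dim in shape:
            count *= dim
        datatype = tensor["datatype"]
        if datatype == "BYTES":
            fill = ""
        elif datatype == "BOOL":
            fill = False
        elif datatype.startswith("FP"):
            fill = 0.0
        else:
            fill = 0
        inputs.append({
            "name": tensor["name"],
            "shape": shape,
            "datatype": datatype,
            "data": [fill] * count,  # flattened, row-major
        })
    return {"inputs": inputs}
```

The resulting dictionary matches the JSON shape accepted by `POST /v2/models/<name>/infer`, which is why exposed metadata is enough to start sending well-formed requests.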

TensorFlow Serving Metadata

TensorFlow Serving exposes similar functionality through:

GET /v1/models/<name>/metadata

These responses may reveal:

Input tensor names
Output tensor names
Expected shapes
Datatypes
Model signatures

Example metadata often includes:

Tensor dimensions
Float and integer types
Prediction output structures

This allows researchers to understand precisely how the inference API expects requests to be constructed.
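
A sketch of how the metadata response can be reduced to a compact summary. It assumes the decoded body nests signatures under `metadata.signature_def.signature_def`, with each tensor carrying a `dtype` and a `tensor_shape.dim` list of string sizes, as in the TensorFlow Serving REST documentation; `describe_signatures` is a hypothetical name.

```python
def describe_signatures(metadata_json):
    """Return {signature: {"inputs": ..., "outputs": ...}} with dtype and shape."""
    signatures = (
        metadata_json.get("metadata", {})
        .get("signature_def", {})
        .get("signature_def", {})
    )

    def tensors(group):
        # Sizes arrive as strings such as "-1"; convert them to integers.
        return {
            name: (
                info.get("dtype"),
                [int(d["size"]) for d in info.get("tensor_shape", {}).get("dim", [])],
            )
            for name, info in group.items()
        }

    return {
        sig: {
            "inputs": tensors(body.get("inputs", {})),
            "outputs": tensors(body.get("outputs", {})),
        }
        for sig, body in signatures.items()
    }
```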

3. Vector Database Enumeration

Vector databases are one of the most valuable reconnaissance targets in modern AI environments because they reveal what data an AI system is built around and which embedding models power its semantic search capabilities.

These databases are commonly used in:

RAG pipelines
AI assistants
Enterprise search systems
Chatbots
Knowledge retrieval platforms

Unlike traditional databases, vector stores expose metadata about embeddings, collections, and indexing structures that can reveal significant operational intelligence.

3.1 Weaviate Enumeration

Weaviate exposes several useful reconnaissance endpoints.

Server metadata can be retrieved through:

GET /v1/meta

This may reveal:

Server version
Installed modules
Backend configuration
Enabled vectorisation components

Schema enumeration is available through:

GET /v1/schema

This endpoint returns:

Class definitions
Property names
Data structures
Vectoriser configuration

The vectoriser field is especially important because it identifies which embedding model or embedding provider the system uses.

Weaviate also commonly exposes:

/v1/graphql

On unauthenticated deployments, this may allow:

Schema introspection
Metadata enumeration
Data querying

This can provide direct visibility into the organisation’s AI knowledge base.
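The schema response can be reduced to a short inventory with a few lines of code. The sketch below assumes the usual `classes` / `vectorizer` / `properties` layout of a Weaviate schema document; the class names, module name, and the helper `summarize_schema` are illustrative only.

```python
def summarize_schema(schema: dict) -> list:
    """List each class with its vectoriser module and property names."""
    rows = []
    for cls in schema.get("classes", []):
        rows.append({
            "class": cls.get("class"),
            # The JSON key uses the American spelling "vectorizer".
            "vectorizer": cls.get("vectorizer", "unknown"),
            "properties": [p.get("name") for p in cls.get("properties", [])],
        })
    return rows


# Hypothetical schema, shaped like a GET /v1/schema response.
sample_schema = {
    "classes": [
        {
            "class": "PolicyDocument",
            "vectorizer": "text2vec-openai",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "body", "dataType": ["text"]},
            ],
        },
        {"class": "Ticket", "vectorizer": "none", "properties": []},
    ]
}
for row in summarize_schema(sample_schema):
    print(row)
```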

3.2 Qdrant Enumeration

Qdrant exposes collection information through:

GET /collections

This returns all available collection names.

Detailed collection metadata can then be queried using:

GET /collections/<name>

Responses may reveal:

Vector dimensions
Distance metrics
Point counts
Collection configuration

For example:

Collection: internal-hr-policies
Vectors: 768 dimensions
Points: 50,000

Even without direct document access, this reveals:

The likely use case
The scale of indexed data
The probable embedding model family

A 768-dimensional embedding size is consistent with BERT-base-class transformer embedding models, which are commonly used in RAG systems, although the dimension alone does not identify a specific model.
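The inference from vector size to model family can be sketched as a simple lookup. The mapping below is a heuristic assumption, not something Qdrant reports, and the sample response and the function name `summarize_collection` are hypothetical; the response layout follows the `result.config.params.vectors` structure of a collection info document.

```python
# Heuristic only: several unrelated models share these output sizes.
DIMENSION_HINTS = {
    384: "MiniLM-class sentence embedding models",
    768: "BERT-base-class sentence embedding models",
    1536: "OpenAI text-embedding-ada-002 / text-embedding-3-small",
    3072: "OpenAI text-embedding-3-large",
}


def summarize_collection(name: str, response: dict) -> dict:
    """Summarise vector size, distance metric and point count for a collection."""
    result = response.get("result", {})
    vectors = result.get("config", {}).get("params", {}).get("vectors", {})
    # A single unnamed vector is {"size": ..., "distance": ...};
    # named vectors are {"<vector name>": {"size": ..., "distance": ...}}.
    if "size" in vectors:
        vectors = {"": vectors}
    summary = []
    for vector_name, params in vectors.items():
        size = params.get("size")
        summary.append({
            "vector": vector_name or "(default)",
            "size": size,
            "distance": params.get("distance"),
            "hint": DIMENSION_HINTS.get(size, "no hint"),
        })
    return {"collection": name, "points": result.get("points_count"), "vectors": summary}


# Hypothetical response, shaped like GET /collections/<name>.
sample_response = {
    "result": {
        "status": "green",
        "points_count": 50000,
        "config": {"params": {"vectors": {"size": 768, "distance": "Cosine"}}},
    },
    "status": "ok",
}
print(summarize_collection("internal-hr-policies", sample_response))
```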

3.3 Chroma Enumeration

Older Chroma deployments frequently exposed:

GET /api/v1/collections

without authentication enabled by default.

This endpoint may reveal:

Collection inventories
Internal project names
AI application structures
Retrieval system organisation

Because many vector databases prioritize developer usability and rapid deployment, authentication is often weak or entirely absent in development environments pushed to production.

4. Prometheus Metrics as Intelligence

Many AI inference servers expose Prometheus metrics endpoints for monitoring and observability. These endpoints are often available on dedicated ports and provide a surprisingly detailed view into production AI systems.

Common examples include:

Framework	Metrics Port
NVIDIA Triton Inference Server	8002
TorchServe	8082

These services typically expose metrics in Prometheus text format, usually at /metrics.
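Metric labels are often the most revealing part of such output, because they name deployed models and versions. The sketch below is a simplified parser for the Prometheus text format that collects the values of one label; the metric names follow Triton's `nv_inference_*` naming, while the model names and the helper `label_values` are hypothetical.

```python
import re
from collections import defaultdict

# Simplified patterns: metric name, optional {labels}, then a value.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def label_values(metrics_text: str, label: str) -> dict:
    """Return {label value: set of metric names} for one label name."""
    found = defaultdict(set)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or not match.group(2):
            continue  # sample without labels
        for key, value in LABEL.findall(match.group(2)):
            if key == label:
                found[value].add(match.group(1))
    return dict(found)


# Hypothetical scrape output; the model names are invented.
sample_metrics = """\
# HELP nv_inference_request_success Number of successful inference requests
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="fraud-scoring",version="3"} 1042
nv_inference_request_success{model="doc-embedder",version="1"} 88
nv_inference_count{model="fraud-scoring",version="3"} 4100
process_cpu_seconds_total 12.5
"""
print(sorted(label_values(sample_metrics, "model")))
```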

5. Debug Interfaces and Information Leakage

One of the most common weaknesses in AI infrastructure is excessive debugging functionality left enabled in production environments.

AI platforms are heavily optimized for:

Rapid experimentation
Developer usability
Model debugging
Internal observability

As a result, many frameworks expose interfaces and verbose error handling that unintentionally provide a rich source of reconnaissance data.

In many cases, these leaks reveal more operational intelligence than the primary APIs themselves.

5.1 FastAPI Debug Interfaces

Many custom AI services are built using FastAPI, which automatically generates interactive API documentation endpoints.

Common exposed routes include:

/docs
/openapi.json

These endpoints may reveal:

Full API schemas
Request formats
Response structures
Authentication requirements
Example payloads
Internal endpoint names

For reconnaissance professionals, this is effectively free documentation of the entire inference API.
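To show how directly an OpenAPI document translates into an endpoint inventory, the following sketch lists every operation and whether it declares a security requirement. The sample specification, route names, and the function `list_operations` are hypothetical; the handling of `security` follows the OpenAPI rule that an operation-level value overrides the document-level one.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list:
    """Return sorted (METHOD, path, requires_auth) tuples from an OpenAPI doc."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Operation-level security overrides the global setting;
            # an empty list means no authentication is declared.
            security = operation.get("security", spec.get("security", []))
            operations.append((method.upper(), path, bool(security)))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hypothetical spec, shaped like a FastAPI /openapi.json document.
sample_spec = {
    "openapi": "3.1.0",
    "paths": {
        "/predict": {"post": {"summary": "Run inference"}},
        "/internal/reload-model": {
            "post": {"security": [{"APIKeyHeader": []}]},
        },
        "/health": {"get": {}},
    },
}
for method, path, requires_auth in list_operations(sample_spec):
    print(method, path, "auth declared" if requires_auth else "no auth declared")
```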

5.2 MLflow GraphQL Exposure

Some MLflow deployments historically exposed GraphQL functionality through:

/graphql

In certain configurations, GraphQL resolvers could bypass standard REST API authentication controls.

Accessible resolvers may expose:

Experiment inventories
Training runs
User metadata
Source code paths
Internal project names

Relevant queries include:

mlflowSearchRuns
mlflowGetRun

Even metadata tags like:

mlflow.source.name

can expose internal development structure and proprietary project organisation.

5.3 Verbose Debug Parameters

AI gateways and inference APIs sometimes expose additional debugging output through parameters such as:

?debug=true
?verbose=1

In poorly hardened environments, these parameters may trigger:

Raw stack traces
Filesystem paths
Installed package versions
Python exceptions
Environment variable loading errors

6. Jupyter Notebook Enumeration

Jupyter environments are especially valuable reconnaissance targets because they combine:

Interactive code execution
Development workflows
Infrastructure access
Credential storage

On exposed Jupyter instances, endpoints such as:

GET /api/kernels

may reveal:

Active kernel IDs
Notebook activity timestamps
Running sessions
Execution state

Even this metadata can help infer:

Which notebooks are actively used
What workloads are running
Which users are connected
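A small sketch of how kernel metadata turns into an activity picture is shown below. It assumes the list-of-kernels layout returned by the Jupyter Server REST API (`id`, `name`, `last_activity`, `execution_state`, `connections`); the sample values and the helper `summarize_kernels` are hypothetical.

```python
from datetime import datetime, timezone


def summarize_kernels(kernels: list, now: datetime) -> list:
    """Return kernels sorted by idle time, most recently active first."""
    rows = []
    for kernel in kernels:
        # Timestamps are ISO 8601 in UTC with a trailing "Z".
        last = datetime.fromisoformat(kernel["last_activity"].replace("Z", "+00:00"))
        rows.append({
            "id": kernel["id"],
            "state": kernel.get("execution_state"),
            "connections": kernel.get("connections", 0),
            "idle_minutes": (now - last).total_seconds() / 60,
        })
    return sorted(rows, key=lambda row: row["idle_minutes"])


# Hypothetical response, shaped like GET /api/kernels.
sample_kernels = [
    {"id": "kernel-a", "name": "python3", "execution_state": "idle",
     "last_activity": "2026-01-01T09:00:00.000000Z", "connections": 0},
    {"id": "kernel-b", "name": "python3", "execution_state": "busy",
     "last_activity": "2026-01-01T11:30:00.000000Z", "connections": 1},
]
reference_time = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
for row in summarize_kernels(sample_kernels, reference_time):
    print(row)
```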

Credential Leakage in Notebook Cells

The real value of exposed Jupyter environments is often inside the notebook content itself.

Data scientists frequently store credentials directly in notebook cells for convenience, including:

MLflow credentials
Cloud storage access keys
API tokens
Database passwords
Hugging Face tokens

Examples commonly encountered include:

MLFLOW_TRACKING_USERNAME
MLFLOW_TRACKING_PASSWORD
AWS access keys
Hugging Face API tokens

Because notebooks are designed for experimentation rather than security, secrets management practices are often weak.
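The same check is useful defensively before a notebook is shared. The sketch below scans the code cells of a notebook document for three credential patterns; the patterns are deliberately loose assumptions (an `AKIA`-prefixed access key ID, an `hf_`-prefixed token, and an assignment to an MLflow tracking variable), and the sample notebook and function name `scan_notebook` are hypothetical.

```python
import re

# Loose illustrative patterns; real scanners use broader, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hugging_face_token": re.compile(r"\bhf_[A-Za-z0-9]{20,}\b"),
    "mlflow_credential": re.compile(
        r"""MLFLOW_TRACKING_(?:USERNAME|PASSWORD)["'\]]*\s*=\s*["'][^"']+["']"""
    ),
}


def scan_notebook(notebook: dict) -> list:
    """Return (cell index, pattern name) pairs for code cells that match."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        for name, pattern in PATTERNS.items():
            if pattern.search(source):
                findings.append((index, name))
    return findings


# Hypothetical notebook with placeholder (non-working) credentials.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Training run"},
        {"cell_type": "code", "source": [
            "import os\n",
            'os.environ["MLFLOW_TRACKING_PASSWORD"] = "example-password"\n',
        ]},
        {"cell_type": "code", "source":
            'key = "AKIAIOSFODNN7EXAMPLE"\ntoken = "hf_' + "x" * 30 + '"'},
    ]
}
print(scan_notebook(sample_notebook))
```

Only the cell index and pattern name are reported, so the findings can be logged without copying the secret itself.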

Mapping the AI Attack Surface

At this stage of reconnaissance, you have already:

Identified AI components on the network
Fingerprinted the frameworks behind them
Enumerated APIs and metadata
Extracted operational intelligence

Those are individual findings.

The next step is turning those isolated findings into a complete attack surface map.

This is where reconnaissance becomes significantly more powerful.

From Individual Findings to Infrastructure Mapping

A single exposed MLflow server is useful.

But the real value comes from understanding how that MLflow instance connects to:

Model registries
Object storage
Inference servers
Vector databases
Notebook environments
Kubernetes clusters
GPU-backed workloads

The difference between a vulnerability list and a true AI attack surface map is the relationships between components.

For example:

MLflow → S3 Artifact Storage → Triton Inference Server → Vector Database → Internal RAG Assistant

Once these connections are identified, the organisation’s entire machine learning architecture begins to emerge.
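One way to record these relationships is as a directed graph, where each edge carries the evidence that justified it. The sketch below uses the example chain above; the evidence strings and the helper names `build_graph` and `reachable_from` are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges: list) -> dict:
    """Build an adjacency map from (source, target, evidence) tuples."""
    graph = defaultdict(set)
    for source, target, _evidence in edges:
        graph[source].add(target)
    return graph


def reachable_from(graph: dict, start: str) -> set:
    """Breadth-first search: every component reachable from the start node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# Edges mirror the example chain; the evidence notes are invented.
edges = [
    ("mlflow", "s3-artifact-storage", "artifact_location in experiment metadata"),
    ("s3-artifact-storage", "triton", "model repository path"),
    ("triton", "vector-database", "embedding model name in config"),
    ("vector-database", "internal-rag-assistant", "collection name"),
]
graph = build_graph(edges)
print(sorted(reachable_from(graph, "mlflow")))
```

Keeping the evidence alongside each edge makes the map auditable: every inferred connection can be traced back to the finding that supports it.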

How AI Expands the Traditional Attack Surface

Traditional web applications usually expose a relatively small and predictable attack surface:

HTTP/HTTPS services
Authentication systems
Databases
SSH access

In most environments, that means roughly:

4–5 primary exposed ports
A handful of backend services
Limited internal service communication

AI Systems Are Built as Service Meshes

Modern AI environments rely on continuous communication between components.

For example:

Inference servers query vector databases
MLflow pushes artifacts to object storage
Kubeflow orchestrates training pipelines
Ray distributes workloads across clusters
Jupyter notebooks connect to every internal service
Prometheus continuously scrapes metrics from the entire environment

Unlike traditional applications, AI systems are designed around constant high-volume internal communication.

This creates a dense mesh of trusted internal traffic.

At the center of the architecture are the core AI services:

Inference servers handling model predictions
Vector databases powering semantic search and RAG pipelines
Model registries storing trained models and artifacts
Training platforms and distributed compute clusters processing machine learning workloads

Surrounding these systems are orchestration and operational components such as:

MLflow experiment tracking
Kubernetes orchestration platforms
Jupyter notebook environments
Prometheus monitoring systems
Object storage platforms like MinIO or S3

These components are linked by constant internal communication flows:

Inference requests and embedding lookups
Model artifact transfers
Metrics collection and monitoring
Configuration synchronization
Storage access between services

One Weak Service Can Expose Everything

In many AI deployments, internal services assume they are operating inside a trusted environment.

As a result:

Authentication is weak or missing
Internal APIs are openly accessible
Metrics endpoints are unauthenticated
gRPC services trust internal traffic
Notebook environments have broad access

If even one component accidentally binds to:

0.0.0.0

instead of:

127.0.0.1

that service becomes reachable from outside the host, and through it the rest of the internal AI mesh may become externally reachable.

This is one of the biggest differences between traditional infrastructure and AI environments:

Internal exposure quickly becomes external exposure.
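The difference between the two bind addresses can be checked mechanically, for example when reviewing service configuration files. This is a minimal sketch using the standard `ipaddress` module; the function name `bind_exposure` and the category labels are hypothetical.

```python
import ipaddress


def bind_exposure(address: str) -> str:
    """Classify a listen address by how widely it exposes the service."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback only"      # 127.0.0.0/8 or ::1
    if ip.is_unspecified:
        return "all interfaces"     # 0.0.0.0 or ::
    return "single interface"


for candidate in ("127.0.0.1", "0.0.0.0", "::", "10.0.0.5"):
    print(candidate, "->", bind_exposure(candidate))
```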

Platform Misconfigurations That Attackers Map

The most dangerous AI exposures are often not advanced zero-days or novel attacks. They are routine deployment mistakes repeated across thousands of environments.

Modern AI platforms prioritize:

Rapid experimentation
Ease of deployment
Developer convenience
Internal collaboration

Security hardening frequently comes later — if it happens at all.

As a result, many AI services are deployed with:

Weak authentication
Exposed management interfaces
Overly permissive network access
Dangerous default configurations

For reconnaissance professionals, these misconfigurations become high-value mapping targets.

1. MLflow Misconfigurations

MLflow historically shipped with no built-in authentication; optional basic authentication arrived only during the 2.x release line and must be explicitly enabled.

This meant that publicly exposed MLflow instances often provided unrestricted access to:

Experiments
Model registries
Artifact metadata
Training runs
Internal project information

Even after authentication support was introduced, additional security issues emerged.

One vulnerability exposed the risk of default credentials stored inside basic_auth.ini.

This allowed attackers scanning port 5000 to authenticate using predictable or hardcoded credentials in improperly configured environments.

Additional vulnerabilities in artifact handling mechanisms demonstrated how unsafe file operations inside ML workflows could escalate from information disclosure into remote code execution.

The key lesson is that AI orchestration systems frequently combine:

Sensitive metadata
Artifact management
File handling
Execution pipelines

inside a single platform.

2. Kubeflow Dashboard Exposure

Kubeflow deployments are commonly exposed through:

Kubernetes LoadBalancers
NodePorts
Public ingress controllers

In many environments, authentication mechanisms such as:

OIDC
Identity-aware proxies
RBAC restrictions

are disabled or incompletely configured.

An exposed Kubeflow dashboard may allow attackers to:

View pipelines
Access notebook environments
Launch workloads
Interact with Kubernetes-connected services

The most dangerous part is often the notebook integration.

Notebook servers frequently inherit:

Kubernetes service accounts
Cluster permissions
Access to internal APIs

This creates a direct path from:

Exposed dashboard → Notebook access → Kubernetes infrastructure

In AI environments, orchestration exposure often becomes infrastructure exposure.

3. TorchServe Management API

TorchServe exposes a management API by default on:

Port 8081

This interface supports:

Dynamic model registration
Model loading
Model unloading
Worker management

If publicly accessible, the server may be instructed to:

Download external model archives
Register arbitrary .mar files
Load attacker-controlled models

TorchServe executes initialization code during model loading.

This means that loading a malicious model archive can lead to:

Arbitrary code execution
Server compromise
Internal network access

4. SageMaker Notebook Exposure

Cloud-hosted notebook environments introduce another major attack surface.

Amazon SageMaker notebooks configured with:

DirectInternetAccess: Enabled

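route their traffic through a SageMaker-managed network interface with internet access rather than solely through the customer VPC, so their exposure depends on the surrounding network policy configuration.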

These notebook environments commonly contain:

Training code
Cloud credentials
API tokens
Access to internal ML systems

Because notebooks are designed for convenience and collaboration, they are often deployed with broad access permissions.

A single exposed notebook can become an entry point into:

Object storage
Model registries
Training infrastructure
Cloud orchestration systems

Supply Chain Reconnaissance

Modern AI systems depend heavily on external platforms, third-party packages, pretrained models, and cloud-hosted datasets. These dependencies create a large and often overlooked supply chain attack surface.

During reconnaissance, attackers are not only mapping internal infrastructure — they are also identifying:

External model sources
Dependency pipelines
Package registries
Access tokens
CI/CD integrations
Model distribution workflows

In AI environments, supply chain visibility frequently becomes operational visibility.

Hugging Face Token Exposure

One of the most common findings during AI reconnaissance is exposed Hugging Face access tokens.

These tokens often appear in:

.env files
GitHub repositories
Notebook cells
CI/CD pipeline logs
Kubernetes secrets
Docker build files

Simple GitHub dorks such as:

filename:.env HF_TOKEN

can reveal accidentally exposed credentials.

A compromised Hugging Face token may provide:

Access to private models
Dataset downloads
Model uploads
Repository modification permissions

Because many organisations store proprietary LLMs and fine-tuned models on Hugging Face, token exposure can directly compromise the AI supply chain.

1. Dependency Confusion in ML Pipelines

Machine learning environments are especially vulnerable to dependency confusion attacks.

ML projects commonly contain large:

requirements.txt
environment.yml
pyproject.toml

files with internal package references.

Example:

company-data-utils
internal-ml-common
corp-feature-engineering

If these internal package names are not registered publicly on:

PyPI
npm
other package registries

an attacker may register malicious versions externally.

This becomes especially dangerous in:

Kubeflow pipelines
Automated training jobs
Dynamic container builds

where dependencies are installed automatically at runtime.

A malicious package can execute code directly inside:

Training clusters
GPU nodes
CI/CD environments
Model build pipelines
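A defensive version of this check can be sketched as follows: extract package names from a requirements file and list those that do not appear in a set of names known to exist on the public index. The `known_public` set is a stand-in assumption for a real index lookup, and the helper names and sample file are hypothetical.

```python
import re

NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalise a package name (runs of -, _ and . collapse to a single -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text: str) -> list:
    """Extract normalised package names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        # Skip blanks, pip options (-r, --index-url) and direct URL references.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = NAME.match(line)
        if match:
            names.append(normalize(match.group(1)))
    return names


def unclaimed(names: list, known_public: set) -> list:
    """Names absent from the supplied public set: candidates for review."""
    public = {normalize(n) for n in known_public}
    return [n for n in names if n not in public]


# Hypothetical requirements file using the internal names from the text.
sample_requirements = """\
numpy==1.26.4
company-data-utils>=0.3   # internal
internal_ml_common
--index-url https://pypi.example.internal/simple
corp-feature-engineering[gpu]~=2.0
"""
print(unclaimed(requirement_names(sample_requirements), {"numpy"}))
```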

2. Reconnaissance of Model Download Sources

AI systems frequently download pretrained models from external sources during deployment or training.

Common sources include:

Hugging Face Hub
PyTorch Hub
GitHub releases
External artifact repositories

These download locations are often visible inside:

Configuration files
Notebook cells
Dockerfiles
Build logs
Training scripts

Example:

from_pretrained("company/private-llama-model")

or:

https://huggingface.co/org/model-name

This allows researchers to identify:

Which external dependencies the organisation trusts
Which model providers are used
Which repositories contain production AI assets
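Extracting these references from scripts, Dockerfiles, or logs is a simple pattern-matching exercise. The sketch below covers only the two forms shown above; the regular expressions are simplified assumptions, and the sample text and the function name `model_references` are hypothetical.

```python
import re

# Matches from_pretrained("owner/name") with single or double quotes.
FROM_PRETRAINED = re.compile(r"""from_pretrained\(\s*["']([^"']+)["']""")
# Matches the first two path segments of a huggingface.co URL. Note that
# dataset URLs (/datasets/...) would need separate handling.
HUB_URL = re.compile(r"https://huggingface\.co/([A-Za-z0-9][\w.-]*/[\w.-]+)")


def model_references(text: str) -> list:
    """Return the sorted, de-duplicated repository IDs referenced in text."""
    refs = set(FROM_PRETRAINED.findall(text))
    # Strip a trailing full stop picked up from surrounding prose.
    refs.update(ref.rstrip(".") for ref in HUB_URL.findall(text))
    return sorted(refs)


# Hypothetical training script and Dockerfile fragments.
sample_text = '''
model = AutoModel.from_pretrained("company/private-llama-model")
RUN curl -L https://huggingface.co/org/model-name/resolve/main/config.json
tokenizer = AutoTokenizer.from_pretrained('company/private-llama-model')
'''
print(model_references(sample_text))
```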

AI Reconnaissance Methodology

This methodology provides a structured workflow for identifying, fingerprinting, and enumerating AI infrastructure during security assessments.

Phase 1. Passive Reconnaissance

Before interacting with the target, identify publicly exposed AI infrastructure.

Search Engines

Use:

Shodan
Censys
FOFA

Example queries:

port:5000 "MLflow"
port:8888 title:"Home Page - Select or create a notebook"
http.title:"Ray Dashboard"

GitHub Secret Hunting

Search for leaked AI credentials:

filename:.env MLFLOW_TRACKING_URI
filename:.env HF_TOKEN
filename:config.json model_name

Look for:

MLflow credentials
Hugging Face tokens
Cloud storage keys
Model configurations

Public Research & Job Posts

Check:

arXiv papers
Engineering blogs
Conference talks
Job postings

These often reveal:

Frameworks in use
AI architecture
Orchestration platforms
Infrastructure choices

Phase 2. Active Scanning

Scan common AI infrastructure ports.

Example:

nmap -p 5000,6333,8000,8001,8002,8080,8265,8500,8501,8888,9000,11434,19530 -sV --script=http-title,http-headers <target>

Common targets include:

MLflow
Triton
TensorFlow Serving
Qdrant
Ray
Jupyter
MinIO
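For scripted follow-up, open ports can be annotated with the service that most commonly uses each one. The mapping below reflects common default ports and is only a first guess that API fingerprinting should confirm; the helper names are hypothetical, and `is_open` is a plain TCP connect check intended for hosts that are in scope.

```python
import socket

# Common defaults only; operators frequently change these.
LIKELY_SERVICE = {
    5000: "MLflow",
    6333: "Qdrant HTTP API",
    8000: "Triton HTTP",
    8001: "Triton gRPC",
    8002: "Triton metrics",
    8080: "ambiguous (TorchServe inference, Weaviate, generic web)",
    8265: "Ray dashboard",
    8500: "TensorFlow Serving gRPC",
    8501: "TensorFlow Serving REST",
    8888: "Jupyter",
    9000: "MinIO",
    11434: "Ollama",
    19530: "Milvus",
}


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def label_open_ports(open_ports) -> dict:
    """Annotate open ports with the most likely AI service."""
    return {port: LIKELY_SERVICE.get(port, "unknown") for port in sorted(open_ports)}


print(label_open_ports({8001, 5000, 4444}))
```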

gRPC Enumeration

Check:

Port 8001 (Triton gRPC)
Port 8500 (TensorFlow Serving gRPC)

Phase 3. API Fingerprinting

Run:

ffuf
feroxbuster
dirsearch

with AI-specific endpoints.

Example paths:

/v1/models
/v2/models
/api/2.0/mlflow/
/v1/schema
/openapi.json
/docs
/graphql
/api/kernels
/metrics

For every response:

Inspect headers
Parse JSON structure
Analyze error messages
Identify framework-specific patterns
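Framework-specific patterns can be expressed as small rules over the requested path and the parsed JSON body. The sketch below covers three of the paths listed above; the rules are simplified assumptions that will produce false positives on their own, and the function name `fingerprint` and sample bodies are hypothetical.

```python
def fingerprint(path: str, body) -> "str | None":
    """Guess the framework from a request path and its parsed JSON response."""
    if path == "/v1/schema" and isinstance(body, dict) and "classes" in body:
        return "Weaviate"
    if (
        path == "/api/kernels"
        and isinstance(body, list)
        and body
        and all(isinstance(k, dict) and "execution_state" in k for k in body)
    ):
        return "Jupyter"
    if (
        path == "/openapi.json"
        and isinstance(body, dict)
        and "openapi" in body
        and "paths" in body
    ):
        return "OpenAPI-described service (FastAPI is one possibility)"
    return None


# Hypothetical parsed responses.
print(fingerprint("/v1/schema", {"classes": []}))
print(fingerprint("/api/kernels", [{"id": "k1", "execution_state": "idle"}]))
print(fingerprint("/docs", {"detail": "Not Found"}))
```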

Phase 4. Metadata Extraction

Enumerate confirmed AI services.

MLflow

Extract:

Experiments
Registered models
Artifact URIs
Training runs
User metadata
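Artifact URIs are worth grouping by storage backend, since each distinct bucket or server is another node in the attack surface map. The sketch below assumes an experiment list with `name` and `artifact_location` fields, as returned by MLflow's experiment search API; the bucket name, experiment names, and the helper `artifact_stores` are hypothetical.

```python
from urllib.parse import urlparse


def artifact_stores(experiments_response: dict) -> dict:
    """Group experiment names by (URI scheme, host or bucket)."""
    stores = {}
    for experiment in experiments_response.get("experiments", []):
        location = experiment.get("artifact_location", "")
        parsed = urlparse(location)
        # Plain filesystem paths have no scheme; treat them as local files.
        key = (parsed.scheme or "file", parsed.netloc)
        stores.setdefault(key, []).append(experiment.get("name"))
    return stores


# Hypothetical response, shaped like an MLflow experiment search result.
sample_experiments = {
    "experiments": [
        {"experiment_id": "1", "name": "churn-model",
         "artifact_location": "s3://example-ml-artifacts/1"},
        {"experiment_id": "2", "name": "fraud-scoring",
         "artifact_location": "s3://example-ml-artifacts/2"},
        {"experiment_id": "0", "name": "Default",
         "artifact_location": "/mlruns/0"},
    ]
}
for store, names in artifact_stores(sample_experiments).items():
    print(store, names)
```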

Inference Servers

Extract:

Tensor schemas
Input/output formats
Backend frameworks

Vector Databases

Extract:

Collections
Embedding dimensions
Vectoriser configuration

Jupyter

Extract:

Kernel activity
Notebook contents
Stored credentials

Phase 5. Supply Chain Review

Review:

Hugging Face dependencies
External model downloads
Package dependencies
Container registries

Check for:

Public artifact buckets
Dependency confusion risks
Exposed AI tokens
Public container access

Conclusion

Modern AI infrastructure introduces a much larger and more interconnected attack surface than traditional applications. Instead of isolated web servers and databases, AI environments rely on inference engines, vector databases, orchestration platforms, notebooks, and model registries that continuously communicate with each other.

Throughout this guide, we explored how AI reconnaissance involves fingerprinting frameworks, enumerating APIs, extracting metadata, and mapping relationships between services.

One of the biggest challenges in AI security is the amount of operational intelligence exposed through APIs, metrics, model registries, debug interfaces, and notebook environments. As AI adoption continues to grow, security teams must treat AI infrastructure as a dedicated attack surface that requires specialised reconnaissance techniques and security assessments.

To mitigate the risks of data breaches and AI service misconfigurations, Resecurity assists businesses and government agencies through Vulnerability Assessment and Penetration Testing (VAPT). Conducting timely Red Teaming exercises and implementing Managed Threat Detection adds confidence that your infrastructure is properly protected. These proactive measures help identify and address potential blind spots and vulnerabilities at an early stage, preventing attackers from exploiting them.