
How To Use

Installation

From PyPI:

pip install hephaes

From source:

cd hephaes
python -m pip install .

For local development:

cd hephaes
python -m pip install -e ".[dev]"

Typical Workflow

The standard flow is:

  1. Profile a log
  2. Build a stable mapping template
  3. Configure conversion output and row strategy
  4. Convert to dataset files
  5. Optionally stream rows for validation

1) Profile a Log

from hephaes import Profiler

profile = Profiler(["data/run_001.mcap"], max_workers=1).profile()[0]
print(profile.ros_version)
print(profile.duration_seconds)
print(profile.start_time_iso, profile.end_time_iso)
print([(topic.name, topic.message_type, topic.rate_hz) for topic in profile.topics])

Use this step to verify what topics exist and which ones should map into your canonical fields.
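For example, the printed topic list can be filtered to shortlist candidates for the mapping step. A minimal sketch using plain-Python stand-ins for the topic records (the attribute names mirror the ones printed above; the 1 Hz threshold is an arbitrary example, not a hephaes default):

```python
from collections import namedtuple

# Stand-in for the topic records a profile exposes (hypothetical, for illustration).
Topic = namedtuple("Topic", ["name", "message_type", "rate_hz"])

topics = [
    Topic("/camera/front/image_raw", "sensor_msgs/msg/Image", 30.0),
    Topic("/imu/data", "sensor_msgs/msg/Imu", 200.0),
    Topic("/diagnostics", "diagnostic_msgs/msg/DiagnosticArray", 0.2),
]

# Shortlist topics publishing at 1 Hz or more as mapping candidates.
candidates = [t.name for t in topics if t.rate_hz >= 1.0]
print(candidates)  # ['/camera/front/image_raw', '/imu/data']
```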

2) Build a Mapping Template

Auto-build from discovered topics:

from hephaes import build_mapping_template

mapping = build_mapping_template(profile.topics)

Or explicitly define canonical fields with topic fallbacks:

from hephaes import build_mapping_template_from_json

mapping = build_mapping_template_from_json(
    profile.topics,
    {
        "front_camera": ["/camera/front/image_raw", "/sensors/front_cam"],
        "imu": ["/imu/data", "/sensors/imu"],
        "vehicle_twist": ["/cmd_vel", "/vehicle/twist"],
    },
    strict_unknown_topics=False,
)

This keeps downstream schema stable even when source topic names differ between robots or runs.
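Conceptually, each canonical field resolves to the first fallback topic that actually appears in the log. A rough sketch of that first-match idea (not the library's actual implementation):

```python
def resolve_field(fallbacks, discovered):
    """Return the first fallback topic present in the log, else None."""
    for topic in fallbacks:
        if topic in discovered:
            return topic
    return None

# Topics discovered in one particular run.
discovered = {"/sensors/front_cam", "/imu/data", "/vehicle/twist"}

fields = {
    "front_camera": ["/camera/front/image_raw", "/sensors/front_cam"],
    "imu": ["/imu/data", "/sensors/imu"],
    "vehicle_twist": ["/cmd_vel", "/vehicle/twist"],
}

mapping = {field: resolve_field(fb, discovered) for field, fb in fields.items()}
print(mapping)
# {'front_camera': '/sensors/front_cam', 'imu': '/imu/data', 'vehicle_twist': '/vehicle/twist'}
```

The canonical field names stay fixed even though this run publishes `/sensors/front_cam` rather than `/camera/front/image_raw`.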

3) Configure Conversion

from hephaes import Converter, ResampleConfig, TFRecordOutputConfig

converter = Converter(
    ["data/run_001.mcap"],
    mapping,
    output_dir="dataset/processed",
    output=TFRecordOutputConfig(image_payload_contract="bytes_v2"),
    resample=ResampleConfig(freq_hz=10.0, method="interpolate"),
    robot_context={"robot_id": "alpha-01", "platform": "spot"},
    max_workers=1,
)

Resampling Modes

  • resample=None: preserve observed timestamps
  • method="downsample": fixed-rate buckets, latest sample per bucket
  • method="interpolate": fixed-rate numeric interpolation
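The difference between the two fixed-rate modes can be pictured with a toy signal. This is a rough, self-contained sketch of the semantics described above, not hephaes's actual implementation:

```python
def downsample(samples, freq_hz):
    """Fixed-rate buckets, keeping the latest sample per bucket."""
    period = 1.0 / freq_hz
    buckets = {}
    for t, v in samples:
        buckets[int(t // period)] = v  # later samples in a bucket overwrite earlier ones
    return [(k * period, v) for k, v in sorted(buckets.items())]


def interpolate(samples, freq_hz):
    """Linear interpolation of a numeric signal onto a fixed-rate grid."""
    period = 1.0 / freq_hz
    out, i = [], 0
    t = samples[0][0]
    while t <= samples[-1][0]:
        # Advance to the segment that brackets grid time t.
        while samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += period
    return out


signal = [(0.0, 0.0), (0.3, 3.0), (0.7, 7.0), (1.0, 10.0)]
print(downsample(signal, 2.0))   # [(0.0, 3.0), (0.5, 7.0), (1.0, 10.0)]
print(interpolate(signal, 2.0))  # values at the grid times 0.0, 0.5, 1.0
```

Downsampling keeps only observed values (one per bucket), while interpolation synthesizes values exactly on the grid; the first suits discrete payloads like images, the second numeric streams.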

Image Payload Contract

For TFRecord output:

  • bytes_v2 (default): image bytes in raw bytes features with metadata fields
  • legacy_list_v1: list-based compatibility mode for migration windows

4) Convert

dataset_paths = converter.convert()
print(dataset_paths[0])
print(dataset_paths[0].with_suffix(".manifest.json"))

Each input log produces one output data file plus one manifest sidecar.
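Since convert() returns Path objects, the sidecar path shown in step 4 follows from pathlib's with_suffix. A small illustration with a hypothetical output path:

```python
from pathlib import Path

# Hypothetical output path standing in for dataset_paths[0].
data_path = Path("dataset/processed/run_001.tfrecord")

# Replacing the data suffix yields the manifest sidecar next to it.
manifest_path = data_path.with_suffix(".manifest.json")
print(manifest_path.name)  # run_001.manifest.json
```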

5) Stream Rows For Verification

TFRecord:

from hephaes import stream_tfrecord_rows

for row in stream_tfrecord_rows(dataset_paths[0]):
    print(row)
    break

Parquet:

from hephaes import stream_wide_parquet_rows

for row in stream_wide_parquet_rows(dataset_paths[0]):
    print(row)
    break

Additional APIs You Can Use

  • Direct reading: RosReader.open(...)
  • Inspection: inspect_bag(...), inspect_reader(...)
  • Drafting/preview: build_draft_conversion_spec(...), preview_conversion_spec(...)
  • Spec lifecycle: load_conversion_spec(...), dump_conversion_spec(...)
  • Contract migration: set_tfrecord_image_payload_contract(...)

What Is Implemented Today

  • End-to-end conversion from ROS logs to Parquet/TFRecord
  • Topic-to-field schema normalization
  • Resampling and interpolation strategies
  • TFRecord image payload contract modes (bytes_v2, legacy_list_v1)
  • Manifest generation and output row streaming helpers