Metrics
For metrics, we recommend using TensorBoard to log metrics directly to cloud storage alongside your model. As the model trains, you can launch a TensorBoard instance locally to monitor its progress:
$ tensorboard --logdir provider://path/to/logs
Alternatively, you can use the torchx.components.metrics.tensorboard() component as part of your pipeline.
See the Trainer Example for how to use the PyTorch Lightning TensorBoardLogger.
Reference
- PyTorch Tensorboard Tutorial https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html 
- PyTorch Lightning Loggers https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html 
torchx.components.metrics.tensorboard(logdir: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', timeout: float = 3600, port: int = 6006, start_on_file: str = '', exit_on_file: str = '') -> AppDef
This component runs a TensorBoard server that renders the logs specified by logdir.

Since TensorBoard runs as a service, you need to specify its termination conditions: a timeout and, optionally, an exit_on_file, which causes the service to quit when that path is created.

The files are periodically polled for existence via fsspec and trigger the corresponding behavior when created.

Parameters:
- logdir – fsspec path to the TensorBoard logs
- image – image to use 
- timeout – maximum time to run before exiting (seconds) 
- start_on_file – start the server when the fsspec path is created 
- exit_on_file – shut down the server when the fsspec path is created
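
The termination conditions described above (a timeout plus an optional exit_on_file that is polled for existence) can be illustrated with a minimal, standard-library sketch. This is not the component's actual implementation: the function name wait_for_exit, the poll_interval parameter, and the pluggable exists callback are hypothetical, and os.path.exists stands in for the fsspec existence check that the real component uses.

```python
import os
import time
from typing import Callable


def wait_for_exit(
    exit_on_file: str,
    timeout: float,
    poll_interval: float = 1.0,
    exists: Callable[[str], bool] = os.path.exists,  # stand-in for an fsspec check
) -> str:
    """Block until exit_on_file exists or the timeout elapses.

    Returns "exit_on_file" if the marker path was seen, otherwise "timeout".
    An empty exit_on_file disables the marker check, so only the timeout applies.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if exit_on_file and exists(exit_on_file):
            return "exit_on_file"
        # Sleep for one poll interval, but never past the deadline.
        remaining = max(0.0, deadline - time.monotonic())
        time.sleep(min(poll_interval, remaining))
    return "timeout"


if __name__ == "__main__":
    # With no marker file configured, only the timeout can end the wait.
    print(wait_for_exit("", timeout=0.1, poll_interval=0.05))
```

In this sketch the timeout acts as a hard upper bound even when a marker path is configured, which reflects the description of timeout as the maximum time to run before exiting.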