Disk cache
Replay an expensive step without recomputing
`.cached_task(fn, cache_dir=...)` computes `fn(x)` and serializes the result to `.pkl` with a SHA-256 key of the input. On the next run, if the key exists, the result is read from disk — the function is never called.
python
import tempfile
from olympipe import Pipeline
def expensive_transform(x: int) -> dict:
time.sleep(0.5) # simulation calcul lourd
return {"value": x ** 3, "meta": compute_stats(x)}
with tempfile.TemporaryDirectory() as cache_dir:
# 1er run : calcule et persiste sur disque
results = (
Pipeline(range(1000))
.cached_task(expensive_transform, cache_dir=cache_dir)
.uncache()
.wait_for_result()
)
# ~500s
# 2ème run : lecture disque uniquement
results_again = (
Pipeline(range(1000))
.cached_task(expensive_transform, cache_dir=cache_dir)
.uncache()
.wait_for_result()
)
# < 2s⚡ Performance
1 000 items, 0.5s/item computation
First run 8.3min
Second run (cache) 1.8s
🚀 277.8× faster
How it works
The key is the SHA-256 of the pickled input. Files are stored at `cache_dir/<step_hash[:16]>/<key>.pkl`. Works with multi-step pipelines: each step has its own cache subfolder.
