Skip to content

Overview

Meadowrun is a library for data scientists and data engineers who run python code on AWS, Azure, or Kubernetes. Meadowrun:

  • scales from a single function to thousands of distributed tasks.
  • syncs your local code and libraries for a faster, easier iteration loop. Edit your code and rerun your analysis without worrying about building packages or Docker images.
  • optimizes for cost, choosing the cheapest instance types and turning them off when they're no longer needed.

For more context, see our case studies of how Meadowrun is used in real life, or see the project homepage.

Quickstart

This section provides snippets for running your first job with Meadowrun. Alternatively, you can start with more in-depth tutorials on AWS EC2, Azure VMs, GKE, or Kubernetes.

1. Install the Meadowrun package

pip install meadowrun
conda install -c defaults -c conda-forge -c meadowdata meadowrun

If you're using conda on a Windows or Mac, Meadowrun won't be able to mirror your local environment because conda environments aren't cross-platform and Meadowrun runs the remote jobs on Linux. If you're in this situation, you can either switch to Pip or Poetry, or create a CondaEnvironmentFile that's built for Linux and pass that in to the mirror_local call in the next step below, like mirror_local(interpreter=CondaEnvironmentFile(...))

poetry add meadowrun

2. Configure your cloud environment:

Make sure you're logged into the AWS CLI as a root/administrator account. Then, set up Meadowrun resources in your AWS account:

meadowrun-manage-ec2 install

Make sure you're logged into the Azure CLI as a root/administrator account. Then, set up Meadowrun resources in your Azure account:

meadowrun-manage-azure-vm install
  • Configure kubectl to work with your GKE cluster
  • Create a Google Storage bucket called something like my-meadowrun-bucket
  • Create a Kubernetes service account called my-service-account that is linked to a Google Cloud service account that has permissions to read and write to my-meadowrun-bucket
  • Configure kubectl to work with your Kubernetes cluster
  • We'll assume you have an S3-compatible object storage system that is accessible from outside and inside the cluster (e.g. Minio)
  • Create a bucket called meadowrun-bucket in your object storage system
  • Create a Kubernetes secret called storage-credentials that has a username and password field to provide access to the object storage system

3. Run your first job

import meadowrun
import asyncio

print(
    asyncio.run(
        meadowrun.run_function(
            lambda: sum(range(1000)) / 1000,
            meadowrun.AllocEC2Instance(),
            meadowrun.Resources(logical_cpu=1, memory_gb=4, max_eviction_rate=80),
            meadowrun.Deployment.mirror_local()
        )
    )
)
import meadowrun
import asyncio

print(
    asyncio.run(
        meadowrun.run_function(
            lambda: sum(range(1000)) / 1000,
            meadowrun.AllocAzureVM(),
            meadowrun.Resources(logical_cpu=1, memory_gb=4, max_eviction_rate=80),
            meadowrun.Deployment.mirror_local()
        )
    )
)
import meadowrun
import asyncio

def pod_customization(pod_template):
    pod_template.spec.service_account_name = "my-service-account"
    return pod_template

print(
    asyncio.run(
        meadowrun.run_function(
            lambda: sum(range(1000)) / 1000,
            meadowrun.Kubernetes(
                meadowrun.GoogleBucketSpec("my-meadowrun-bucket"),
                reusable_pods=True,
                pod_customization=pod_customization,
            ),
            meadowrun.Resources(logical_cpu=1, memory_gb=4),
            meadowrun.Deployment.mirror_local()
        )
    )
)
import meadowrun
import asyncio

print(
    asyncio.run(
        meadowrun.run_function(
            lambda: sum(range(1000)) / 1000,
            meadowrun.Kubernetes(
                GenericStorageBucketSpec(
                    "meadowrun-bucket",
                    endpoint_url="http://127.0.0.1:9000",
                    endpoint_url_in_cluster="http://storage-service:9000",
                    username_password_secret="storage-credentials",
                )
            ),
            meadowrun.Resources(logical_cpu=1, memory_gb=4),
            meadowrun.Deployment.mirror_local()
        )
    )
)

Next steps

For more background, read about How Meadowrun works.