Run a Python function on GKE (Google Kubernetes Engine)

This page walks you through running a function on a GKE cluster using your local code and libraries. No need to build any containers!

Prerequisites

  • If you don't already have a GKE cluster, create a new one. We recommend creating an Autopilot cluster, and we'll assume you're working with one below. Standard clusters are fully supported as well, but require a bit more configuration.
  • Install kubectl and configure it to work with your GKE cluster.
  • Create a Google Storage bucket called something like my-meadowrun-bucket. Meadowrun will use this storage bucket to communicate with the remote processes running in the GKE cluster.
  • Create a Kubernetes service account and give it read/write permissions to your new storage bucket. (One way to do this is sketched after the install commands below.)
  • Install Meadowrun

    Pip:

    pip install meadowrun

    Conda:

    conda install -c defaults -c conda-forge -c meadowdata meadowrun

    If you're using conda on Windows or Mac, Meadowrun won't be able to mirror your local environment, because conda environments aren't cross-platform and Meadowrun runs remote jobs on Linux. In that case, you can either switch to Pip or Poetry, or create a CondaEnvironmentFile that's built for Linux and pass it to the mirror_local call in the next step below, like mirror_local(interpreter=CondaEnvironmentFile(...)).

    Poetry:

    poetry add meadowrun
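If you haven't completed the kubectl, bucket, and service account prerequisites above, here's one possible command-line sketch. It assumes Workload Identity, which is enabled by default on Autopilot clusters, and uses placeholder names throughout (my-gke-cluster, us-central1, MY_PROJECT, and my-gcp-service-account are all assumptions to adapt to your setup); consult the GKE documentation for the authoritative steps.

    # configure kubectl to talk to your GKE cluster
    gcloud container clusters get-credentials my-gke-cluster --region us-central1

    # create the storage bucket Meadowrun will use to communicate with remote processes
    gsutil mb gs://my-meadowrun-bucket

    # create a Google service account and grant it read/write access to the bucket
    gcloud iam service-accounts create my-gcp-service-account
    gsutil iam ch \
        serviceAccount:my-gcp-service-account@MY_PROJECT.iam.gserviceaccount.com:roles/storage.objectAdmin \
        gs://my-meadowrun-bucket

    # create the Kubernetes service account and link it to the Google service
    # account via Workload Identity
    kubectl create serviceaccount my-k8s-service-account
    gcloud iam service-accounts add-iam-policy-binding \
        my-gcp-service-account@MY_PROJECT.iam.gserviceaccount.com \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:MY_PROJECT.svc.id.goog[default/my-k8s-service-account]"
    kubectl annotate serviceaccount my-k8s-service-account \
        iam.gke.io/gcp-service-account=my-gcp-service-account@MY_PROJECT.iam.gserviceaccount.com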

Write a Python script to run a function remotely

Create a file called mdr.py:

import meadowrun
import asyncio

def pod_customization(pod_template):
    # You must replace this with the Kubernetes service account you created above. This
    # gives the Meadowrun-managed pods permission to access the Google Storage bucket
    pod_template.spec.service_account_name = "my-k8s-service-account"
    return pod_template

print(
    asyncio.run(
        meadowrun.run_function(
            # the function to run remotely
            lambda: sum(range(1000)) / 1000,
            meadowrun.Kubernetes(
                # you must replace this with your Google Storage bucket
                meadowrun.GoogleBucketSpec("my-meadowrun-bucket"),
                pod_customization=pod_customization,
            ),
            # resource requirements when creating or reusing a pod
            meadowrun.Resources(logical_cpu=1, memory_gb=4),
        )
    )
)

Run the script

Assuming you saved the file above as mdr.py:

> python -m mdr 
Waiting for pods to be created for the job mdr-reusable-03290565-a610-4d1d-95a2-e09d6056c34f
Waiting for pod mdr-reusable-03290565-a610-4d1d-95a2-e09d6056c34f-0-sq9ks to start running: Unschedulable, 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
Waiting for pod mdr-reusable-03290565-a610-4d1d-95a2-e09d6056c34f-0-sq9ks to start running: Unschedulable, 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Waiting for pod mdr-reusable-03290565-a610-4d1d-95a2-e09d6056c34f-0-sq9ks to start running: ContainerCreating (pulling image)
Started 1 new pods
Result: 499.5

The output will walk you through what Meadowrun's run_function is doing:

  • Meadowrun creates a pod which will run our function. (Specifically, this pod is created via a Kubernetes Job. The Job is just to make it easy to launch multiple pods quickly. In this case, because we're using run_function, we only need a single pod. Using run_map would require multiple pods.)
  • With an autopilot GKE cluster, the pod will be "Unschedulable" until the autoscaler kicks in and provisions a node to run this pod.
  • We didn't provide the deployment parameter, which is equivalent to specifying deployment=meadowrun.Deployment.mirror_local (made explicit in the first sketch after this list). This tells Meadowrun to copy your local environment and code to the pod. Meadowrun detects what kind of environment (conda, pip, or poetry) you're currently in and calls the equivalent of pip freeze to capture the libraries installed in that environment. The pod then creates a virtualenv/conda environment that matches your local one and caches it in the Google Storage bucket (using venv-pack/conda-pack) for reuse. Meadowrun also zips up your local code and sends it to the pod.
  • Meadowrun then runs the specified function in that environment in the pod and returns the result.
  • The pod can be reused by subsequent jobs. If a pod isn't reused for a few minutes, it terminates on its own. (There is a reusable_pods parameter on meadowrun.Kubernetes if you prefer a pod-per-job model; see the second sketch below.)
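For example, to make the default deployment explicit, or to point Meadowrun at a Linux-built conda environment file as described in the prerequisites, you can pass the deployment parameter yourself. This is a sketch, not the definitive API: the file name linux-environment.yml is a placeholder, and it assumes CondaEnvironmentFile is importable from the top-level meadowrun package, so check the API reference for the exact signature.

    import asyncio

    import meadowrun

    # same pod customization as in mdr.py above
    def pod_customization(pod_template):
        pod_template.spec.service_account_name = "my-k8s-service-account"
        return pod_template

    print(
        asyncio.run(
            meadowrun.run_function(
                lambda: sum(range(1000)) / 1000,
                meadowrun.Kubernetes(
                    meadowrun.GoogleBucketSpec("my-meadowrun-bucket"),
                    pod_customization=pod_customization,
                ),
                meadowrun.Resources(logical_cpu=1, memory_gb=4),
                # the default is deployment=meadowrun.Deployment.mirror_local;
                # passing interpreter= swaps in a conda environment file built for Linux
                deployment=meadowrun.Deployment.mirror_local(
                    interpreter=meadowrun.CondaEnvironmentFile("linux-environment.yml")
                ),
            )
        )
    )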
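Similarly, for a pod-per-job model, the reusable_pods parameter mentioned in the last bullet would be passed to meadowrun.Kubernetes. The exact semantics here are an assumption (check the API reference); presumably something like:

    meadowrun.Kubernetes(
        meadowrun.GoogleBucketSpec("my-meadowrun-bucket"),
        pod_customization=pod_customization,
        # assumption: reusable_pods=False requests a fresh pod for each job
        reusable_pods=False,
    )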

Next steps