Introduction
The MorphoCut library lets you process thousands of images almost as you would process a single image. It was created out of the need to process large collections of images, but it can also handle other data types.
MorphoCut is data-type agnostic, modular, and easily parallelizable.
Writing a MorphoCut program
First, a Pipeline is defined that contains all operations that should be carried out on the objects of the stream.
These operations are then applied to a whole stream of images.
MorphoCut allows concise definitions of heavily nested image processing pipelines:
import os.path

from morphocut import Call, Pipeline
from morphocut.contrib.ecotaxa import EcotaxaWriter
from morphocut.contrib.zooprocess import CalculateZooProcessFeatures
from morphocut.file import Glob
from morphocut.image import FindRegions, ImageReader
from morphocut.parallel import ParallelPipeline
from morphocut.str import Format
from morphocut.stream import Enumerate, Unpack

# First, a Pipeline is defined that contains all operations
# that should be carried out on the objects of the stream.
with Pipeline() as p:
    # Corresponds to `for base_path in ["/path/a", "/path/b", "/path/c"]:`
    base_path = Unpack(["/path/a", "/path/b", "/path/c"])

    # Number the objects in the stream
    running_number = Enumerate()

    # Call calls regular Python functions.
    # Here, a subpath is appended to base_path.
    pattern = Call(os.path.join, base_path, "subpath/to/input/files/*.jpg")

    # Corresponds to `for path in glob(pattern):`
    path = Glob(pattern)

    # Remove path and extension from the filename
    source_basename = Call(lambda x: os.path.splitext(os.path.basename(x))[0], path)

    with ParallelPipeline():
        # The following operations are distributed among multiple
        # worker processes to speed up the calculations.

        # Read the image
        image = ImageReader(path)

        # Do some thresholding
        mask = image < 128

        # Find regions in the image
        region = FindRegions(mask, image)

        # Extract just the object
        roi_image = region.intensity_image

        # An object is identified by its label
        roi_label = region.label

        # Calculate a filename for the ROI image:
        # "RUNNING_NUMBER-SOURCE_BASENAME-ROI_LABEL"
        roi_name = Format(
            "{:d}-{}-{:d}.jpg", running_number, source_basename, roi_label
        )

        meta = CalculateZooProcessFeatures(region, prefix="object_")
        # End of parallel execution

    # Store results
    EcotaxaWriter("archive.zip", (roi_name, roi_image), meta)

# After the Pipeline was defined, it can be executed.
# A stream is created and transformed by the operations
# defined in the Pipeline.
p.run()
While the pipeline is being defined, everything is a placeholder: the actions to perform are only recorded, not yet applied. The Nodes therefore do not return real values, but identifiers for the values that will later flow through the stream.
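The "record first, run later" idea can be illustrated with a small toy model in plain Python. This is only a sketch of the concept, not MorphoCut's implementation: ToyPipeline, ToyVariable and the call() method are hypothetical names invented for this example.

```python
# Toy illustration of deferred execution. This is NOT MorphoCut code;
# ToyVariable, ToyPipeline and call() are hypothetical names.


class ToyVariable:
    """Identifier for a value that only exists while the pipeline runs."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<ToyVariable {self.name}>"


class ToyPipeline:
    def __init__(self):
        # Each recorded step is (function, arguments, output placeholder).
        self.steps = []

    def call(self, func, *args):
        """Record a function call and return a placeholder for its result."""
        out = ToyVariable(f"v{len(self.steps)}")
        self.steps.append((func, args, out))
        return out

    def run(self):
        """Execute the recorded steps, replacing placeholders by real values."""
        values = {}
        for func, args, out in self.steps:
            real_args = [
                values[a] if isinstance(a, ToyVariable) else a for a in args
            ]
            values[out] = func(*real_args)
        return values


p = ToyPipeline()
length = p.call(len, "morphocut")          # nothing is computed yet
doubled = p.call(lambda n: n * 2, length)  # refers to the placeholder

print(length)            # <ToyVariable v0>, a placeholder, not a number
results = p.run()        # only now are the functions actually called
print(results[doubled])  # 18
```

The point of the sketch is that `length` can be passed to later steps before any value exists, which is the same role the identifiers returned by Nodes play while a Pipeline is being defined.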
Concepts
An operation in the Pipeline is called a “Node”. It usually returns one (or multiple) Variables.
These are the Nodes used in this example:
Unpack | Stream: Unpack values from a collection into the stream.
Enumerate | Enumerate objects in the stream.
Call | Call a function with the supplied parameters.
Glob | Stream: Find files matching pattern.
ParallelPipeline | Parallel processing of the stream in multiple processes.
ImageReader | Read and open the image from a given path.
FindRegions | Stream: Find regions in a mask and calculate properties.
Format | Format strings using str.format().
CalculateZooProcessFeatures | Calculate descriptive features similar to ZooProcess using skimage.measure.regionprops.
EcotaxaWriter | Create an archive of images and metadata that is importable to EcoTaxa.
Note
Nodes that change the stream are labeled with “Stream”.
Unpack, Glob and FindRegions all introduce new objects into the stream.
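The difference between a stream-changing step and an ordinary one can be sketched with plain Python generators. This is a toy model under our own assumptions, not how MorphoCut is implemented: the functions unpack() and call(), the dictionary representation of an object, and the "channel" field are all hypothetical.

```python
# Toy model of a stream of objects. The function names and the "channel"
# field are hypothetical; this is not MorphoCut's internal implementation.


def unpack(stream, key, values):
    """Stream-changing step: emit one new object per value for each input."""
    for obj in stream:
        for value in values:
            yield {**obj, key: value}


def call(stream, key, func, *arg_keys):
    """One-to-one step: add a computed field to every object."""
    for obj in stream:
        yield {**obj, key: func(*(obj[k] for k in arg_keys))}


stream = [{}]  # in this toy model, the stream starts with one empty object
stream = unpack(stream, "base_path", ["/path/a", "/path/b"])
stream = unpack(stream, "channel", ["red", "green", "blue"])
stream = call(stream, "label", lambda p, c: f"{p}:{c}", "base_path", "channel")

objects = list(stream)
print(len(objects))         # 6 objects: 2 base paths x 3 channels
print(objects[0]["label"])  # /path/a:red
```

Each unpack() step multiplies the number of objects, just as an additional nested loop would, while call() leaves the number of objects unchanged.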
Traditionally, this would be written using nested for-loops.
MorphoCut, on the other hand, applies a sequence of processing steps (Nodes) which allows for easy parallelization and nicely decouples the individual steps in the pipeline.
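For comparison, the file-discovery part of the example might look as follows when written with nested for-loops. This is a sketch under stated assumptions: iter_inputs() and roi_name() are hypothetical helper names, numbering is assumed to start at 0 as with Python's enumerate, the number is attached per base path because Enumerate sits between Unpack and Glob in the example, and sorted() is added here only to make the order deterministic. The image processing itself is left out.

```python
import glob
import os.path


def iter_inputs(base_paths, subpattern="subpath/to/input/files/*.jpg"):
    """Yield (running_number, path, source_basename) using nested for-loops."""
    # Outer loop: plays the role of Unpack followed by Enumerate.
    for running_number, base_path in enumerate(base_paths):
        pattern = os.path.join(base_path, subpattern)
        # Inner loop: plays the role of Glob.
        # sorted() is an addition of this sketch for a deterministic order.
        for path in sorted(glob.glob(pattern)):
            # Remove path and extension from the filename.
            source_basename = os.path.splitext(os.path.basename(path))[0]
            yield running_number, path, source_basename


def roi_name(running_number, source_basename, roi_label):
    """Build a ROI filename with the same template as the Format node."""
    return "{:d}-{}-{:d}.jpg".format(running_number, source_basename, roi_label)


if __name__ == "__main__":
    base_paths = ["/path/a", "/path/b", "/path/c"]
    for running_number, path, source_basename in iter_inputs(base_paths):
        # A third loop over the regions found in each image would go here;
        # the label 1 is only a stand-in for a real region label.
        print(path, roi_name(running_number, source_basename, 1))
```

In this form, the loops, the per-image work and the output are interleaved in one function body, which is what makes such code harder to parallelize than a sequence of independent Nodes.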