HIGH-SPEED “BIG DATA” IMAGERY PROCESSING

The OSU Plankton Ecology lab works closely with the Center for Genome Research and Biocomputing (CGRB) to develop and continually refine the pipeline for automatically processing the very large volume of plankton imagery collected by the ISIIS. Together with collaborator Chris Sullivan at CGRB, we are improving our deep learning pipeline for image analysis and building high-throughput databases that allow easy access to our billions of plankton images.

While we have access to high-performance computing machines at CGRB (CPU machines as well as K80, P100, and V100 GPU machines), we also work with the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE). Our XSEDE grant (46,600 GPU hours) provides access to many more GPU machines, allowing us to process plankton images on XSEDE's P100 GPU nodes (each node houses four P100 GPUs). Our current pipeline can process 6 million single plankton images per hour per node, and we routinely split our jobs across 5 XSEDE nodes, enabling us to process up to 30 million images per hour. Given that a 10-day ISIIS cruise can yield upwards of a billion plankton images, this speed is an important milestone toward our goal of real-time classification of imagery.
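The throughput figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of our pipeline: the function names are hypothetical, it assumes throughput scales linearly with the number of nodes, and it assumes one GPU hour is charged per GPU per wall-clock hour, which may not match the actual allocation accounting.

```python
# Back-of-the-envelope throughput estimate using the figures quoted above.
# Assumption: throughput scales linearly with node count.
# Assumption: one GPU hour is charged per GPU per wall-clock hour.

IMAGES_PER_HOUR_PER_NODE = 6_000_000  # stated pipeline throughput
GPUS_PER_NODE = 4                     # each node houses four P100 GPUs


def aggregate_rate(nodes, per_node=IMAGES_PER_HOUR_PER_NODE):
    """Images processed per hour across all nodes."""
    return nodes * per_node


def hours_to_process(n_images, nodes):
    """Wall-clock hours needed to process n_images on the given nodes."""
    return n_images / aggregate_rate(nodes)


def gpu_hours_used(n_images, nodes, gpus_per_node=GPUS_PER_NODE):
    """GPU hours consumed, under the charging assumption stated above."""
    return hours_to_process(n_images, nodes) * nodes * gpus_per_node


if __name__ == "__main__":
    cruise_images = 1_000_000_000  # roughly one 10-day ISIIS cruise
    print(f"Rate on 5 nodes: {aggregate_rate(5):,} images/hour")
    print(f"Wall-clock time: {hours_to_process(cruise_images, 5):.1f} hours")
    print(f"GPU hours used:  {gpu_hours_used(cruise_images, 5):.0f}")
```

Under these assumptions, a billion-image cruise takes about 33 hours of wall-clock time on 5 nodes, well short of the 10 days needed to collect it.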

Undergraduate students working with us at CGRB:

  • Dominic Daprano: Improving the deep learning aspects of the pipeline.
  • Giovanni Petroni: Working on high-throughput databases adhering to Darwin Core principles.

Past undergraduate students include:

  • Kyler Jacobson
  • Michaela Buchanan