MediaPipe Web Test

Introduction to MediaPipe

MediaPipe is an open-source framework developed by Google for building multimedia machine-learning applications.
The table below lists the solutions MediaPipe provides and the platforms each one supports.

	Android	iOS	C++	Python	JS	Coral
Face Detection	✅	✅	✅	✅	✅	✅
Face Mesh	✅	✅	✅	✅	✅
Iris	✅	✅	✅
Hands	✅	✅	✅	✅	✅
Pose	✅	✅	✅	✅	✅
Holistic	✅	✅	✅	✅	✅
Selfie Segmentation	✅	✅	✅	✅	✅
Hair Segmentation	✅		✅
Object Detection	✅	✅	✅			✅
Box Tracking	✅	✅	✅
Instant Motion Tracking	✅
Objectron	✅		✅	✅	✅
KNIFT	✅
AutoFlip			✅
MediaSequence			✅
YouTube 8M			✅

Testing MediaPipe

Here I test the JS API of the Pose solution in a web page.
The react-webcam library captures video from the camera, and a canvas is used to draw the recognition results.

import Webcam from "react-webcam";
import React, { useRef, useEffect } from "react";
import { drawConnectors, drawLandmarks } from "@mediapipe/drawing_utils";
import { Camera } from "@mediapipe/camera_utils";

import { Pose, POSE_CONNECTIONS } from "@mediapipe/pose/pose";

const MPHolistic = () => {
  const webcamRef = useRef(null);
  const canvasRef = useRef(null);

  useEffect(() => {
    const pose = new Pose({
      // Model and WASM assets are loaded from the "pose/" path relative to the page.
      locateFile: (file) => {
        return `pose/${file}`;
      },
    });
    pose.setOptions({
      modelComplexity: 1,
      smoothLandmarks: true,
      enableSegmentation: true,
      smoothSegmentation: true,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5,
    });

    pose.onResults(onResults);

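    // Start the camera loop only once the webcam element is mounted.
    let camera = null;
    if (webcamRef.current && webcamRef.current.video) {
      camera = new Camera(webcamRef.current.video, {
        onFrame: async () => {
          // The ref can be cleared on unmount while a frame is still pending.
          if (webcamRef.current && webcamRef.current.video) {
            await pose.send({ image: webcamRef.current.video });
          }
        },
        width: 1280,
        height: 720,
      });
      camera.start();
    }

    // Release the camera and the Pose instance when the component unmounts.
    return () => {
      if (camera) camera.stop();
      pose.close();
    };
  }, []);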

  // Draw each processed frame and its pose overlay onto the canvas.
  const onResults = (results) => {
    const canvasElement = canvasRef.current;
    if (!canvasElement) return; // component already unmounted
    canvasElement.width = 1280;
    canvasElement.height = 720;

    const canvasCtx = canvasElement.getContext("2d");

    canvasCtx.save();
    // Clear and mirror using the canvas size, not the video's intrinsic size.
    canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
    canvasCtx.translate(canvasElement.width, 0);
    canvasCtx.scale(-1, 1);
    canvasCtx.drawImage(
      results.image,
      0,
      0,
      canvasElement.width,
      canvasElement.height
    );


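    // poseLandmarks is undefined when no person is detected in the frame.
    if (results.poseLandmarks) {
      drawConnectors(canvasCtx, results.poseLandmarks, POSE_CONNECTIONS, { color: "#00FF00", lineWidth: 4 });
      drawLandmarks(canvasCtx, results.poseLandmarks, { color: "#FF0000", lineWidth: 2 });
    }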

    canvasCtx.restore();
  };

  const videoConstraints = {
    width: 1280,
    height: 720,
    facingMode: "user",
  };

  return (
    <>
      <div
        style={{
          position: "relative",
          width: "100%",
          height: "100%",
        }}
      >
        <Webcam
          audio={false}
          mirrored={true}
          ref={webcamRef}
          style={{
            position: "absolute",
            marginLeft: "auto",
            marginRight: "auto",
            left: 0,
            right: 0,
            textAlign: "center",
            zIndex: 9,
            width: 1280,
            height: 720,
          }}
          videoConstraints={videoConstraints}
        />
        <canvas
          ref={canvasRef}
          style={{
            position: "absolute",
            marginLeft: "auto",
            marginRight: "auto",
            left: 0,
            right: 0,
            textAlign: "center",
            zIndex: 9,
            width: 1280,
            height: 720,
          }}
        ></canvas>
      </div>
    </>
  );
};

export default MPHolistic;
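
The landmarks that Pose returns are normalized: x and y are fractions of the image width and height, not pixels. drawLandmarks handles the scaling itself, but a custom overlay (a label next to a joint, for example) needs pixel coordinates. Here is a minimal sketch of that conversion. The helper name toCanvasPoint and its mirrored flag are my own assumptions for illustration and are not part of MediaPipe.

```javascript
// Hypothetical helper (not part of MediaPipe): convert one normalized
// landmark ({ x, y } in the range 0..1) to canvas pixel coordinates.
function toCanvasPoint(landmark, width, height, mirrored = false) {
  // When the context is NOT already flipped, mirror x manually.
  const x = mirrored ? 1 - landmark.x : landmark.x;
  return { x: x * width, y: landmark.y * height };
}

// Example with a made-up landmark at 25% of the width and 50% of the height.
const point = toCanvasPoint({ x: 0.25, y: 0.5 }, 1280, 720);
console.log(point.x, point.y);
```

Inside onResults the context has already been flipped with scale(-1, 1), so there mirrored would stay false, for example toCanvasPoint(results.poseLandmarks[0], canvasElement.width, canvasElement.height) for the nose (index 0).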

Results

A screenshot from the test is shown below.

I also measured the frame rate on a local web page. It held steady at roughly 100 fps, and the recognition quality was quite acceptable.
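
One way to take such a frame-rate reading is to count how often the results callback fires. The sketch below is not the exact code I used; createFpsMeter is a hypothetical helper that keeps a sliding window of frame timestamps and reports the average rate over that window.

```javascript
// Hypothetical helper (not part of MediaPipe): sliding-window FPS estimate.
function createFpsMeter(windowSize = 60) {
  const timestamps = []; // most recent frame times, in milliseconds
  return {
    // Call once per processed frame; returns the current FPS estimate.
    tick(nowMs) {
      timestamps.push(nowMs);
      if (timestamps.length > windowSize) timestamps.shift();
      if (timestamps.length < 2) return 0; // need two frames for an interval
      const elapsedMs = timestamps[timestamps.length - 1] - timestamps[0];
      const intervals = timestamps.length - 1;
      return elapsedMs > 0 ? (intervals * 1000) / elapsedMs : 0;
    },
  };
}

// Example with synthetic timestamps spaced 10 ms apart.
const meter = createFpsMeter(5);
let fps = 0;
for (const t of [0, 10, 20, 30]) {
  fps = meter.tick(t);
}
console.log(fps);
```

In the component, the meter would be created once outside onResults and called as meter.tick(performance.now()) at the top of the callback, so the number reflects frames that Pose actually finished processing.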