Deep Learning Model Fast Serving

IMCOMKING 2020. 4. 28. 18:55

Fast Model Inference

TensorFlow Lite

A version of TensorFlow built for fast inference. All operations needed only for training have been removed.

- TFLite is for mobile devices. It works on CPUs and a few mobile GPUs, plus TPUs, including Edge TPUs.
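
As a rough illustration of the workflow, here is a minimal sketch that converts a tiny Keras model to a TFLite flatbuffer and runs it with the TFLite interpreter on the CPU. The helper names (`convert_keras_to_tflite`, `run_tflite`, `demo`) and the toy model are my own assumptions for illustration, not part of any official example. It assumes a TensorFlow 2.x install where `tf.lite.TFLiteConverter` and `tf.lite.Interpreter` are available.

```python
def convert_keras_to_tflite(model):
    """Convert a Keras model to a TFLite flatbuffer (returned as bytes)."""
    import tensorflow as tf  # imported lazily so this file loads without TF

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    return converter.convert()


def run_tflite(tflite_bytes, x):
    """Run one forward pass through the TFLite interpreter."""
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Cast the input to the dtype the converted model expects.
    interpreter.set_tensor(inp["index"], x.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])


def demo():
    """Hypothetical toy model: one Dense layer mapping 4 features to 2."""
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(2),
    ])
    tflite_bytes = convert_keras_to_tflite(model)
    x = np.ones((1, 4), dtype=np.float32)
    return run_tflite(tflite_bytes, x)
```

Calling `demo()` should return an array with one row and two columns. On a real device the flatbuffer bytes would be written to a `.tflite` file and loaded by the mobile runtime instead.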

PyTorch Mobile

PyTorch is also building a project with the same goal as TF Lite, called PyTorch Mobile.

https://pytorch.org/mobile/home/
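
A minimal sketch of exporting a model for PyTorch Mobile might look like the following. It traces the model to TorchScript, applies `optimize_for_mobile`, and saves it for the lite interpreter. The function names, the toy `Linear` model, and the file name `tiny_model.ptl` are assumptions made for illustration only.

```python
def export_for_mobile(model, example_input, path):
    """Trace a model to TorchScript, optimize it, and save it for mobile."""
    import torch  # imported lazily so this file loads without PyTorch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    model.eval()
    traced = torch.jit.trace(model, example_input)
    optimized = optimize_for_mobile(traced)
    # Writes the file format consumed by the mobile lite interpreter.
    optimized._save_for_lite_interpreter(path)
    return optimized


def demo(path="tiny_model.ptl"):
    """Hypothetical toy model: one Linear layer mapping 4 features to 2."""
    import torch

    model = torch.nn.Linear(4, 2)
    example = torch.ones(1, 4)
    optimized = export_for_mobile(model, example, path)
    return optimized(example)
```

The saved `.ptl` file is what would be bundled into an Android or iOS app. Tracing records one execution path, so a model with data-dependent control flow would probably need `torch.jit.script` instead.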

TensorRT

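TensorRT is NVIDIA's inference optimizer and runtime built on top of CUDA, so it runs only on NVIDIA GPUs. TensorFlow also supports TensorRT as a backend: some operations cannot be used, but in exchange inference is very fast (TensorRT targets inference, not training). Reportedly, a TensorFlow model can be loaded, converted to TensorRT, and then served.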

- TensorRT is from Nvidia, *only for Nvidia GPUs*.

https://www.reddit.com/r/MLQuestions/comments/b1rljt/what_is_the_difference_between_tflite_and_tensorrt/
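
The TensorFlow-to-TensorRT conversion mentioned above can be sketched roughly as follows, using the TF-TRT converter that ships inside TensorFlow. This assumes a TensorFlow 2.x build with TensorRT support and an NVIDIA GPU. The function names and directory arguments are placeholders I made up, and I have not benchmarked this.

```python
def convert_saved_model_with_tftrt(saved_model_dir, output_dir):
    """Convert a TensorFlow SavedModel into a TF-TRT optimized SavedModel."""
    # Imported lazily: this module only exists in TensorFlow builds
    # that include TensorRT integration.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_dir)
    converter.convert()         # replace supported subgraphs with TensorRT ops
    converter.save(output_dir)  # write the converted SavedModel
    return output_dir


def load_serving_fn(converted_dir):
    """Load the converted SavedModel and return its default serving function."""
    import tensorflow as tf

    loaded = tf.saved_model.load(converted_dir)
    return loaded.signatures["serving_default"]
```

Operations that TensorRT does not support stay as ordinary TensorFlow ops in the converted graph, so the result is still a regular SavedModel that can be served the usual way.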

* NVIDIA is also working on a project that converts PyTorch models to TensorRT.

https://github.com/NVIDIA-AI-IOT/torch2trt
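
Based on the usage shown in the torch2trt README, a conversion could look like the sketch below. It needs an NVIDIA GPU with CUDA, TensorRT, and the `torch2trt` package installed. The wrapper functions and the small convolutional model are my own assumptions for illustration.

```python
def convert_to_trt(model, example_input):
    """Convert a PyTorch module to a TensorRT-backed module with torch2trt."""
    from torch2trt import torch2trt  # imported lazily; requires TensorRT

    model = model.eval().cuda()
    example_input = example_input.cuda()
    # torch2trt runs the model once on the example input to build the engine.
    return torch2trt(model, [example_input])


def max_abs_difference(model, model_trt, x):
    """Compare outputs of the original and converted modules on the same input."""
    import torch

    x = x.cuda()
    with torch.no_grad():
        y = model.eval().cuda()(x)
        y_trt = model_trt(x)
    return torch.max(torch.abs(y - y_trt)).item()


def demo():
    """Hypothetical toy model: a single convolution followed by ReLU."""
    import torch

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU(),
    )
    x = torch.ones((1, 3, 32, 32))
    model_trt = convert_to_trt(model, x)
    return max_abs_difference(model, model_trt, x)
```

The value returned by `demo()` is the largest elementwise gap between the two outputs, which is a quick sanity check that the converted module still matches the original.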

License: Attribution, NonCommercial, NoDerivatives