Simulating Execution Time of Tensor Programs using Graph Neural Networks

1. Introduction

DL frameworks such as TensorFlow and PyTorch optimize the computational graph representation.

However, they do not handle hardware-specific, operator-level transformations. Compiler frameworks such as TVM have recently been closing this gap by supporting both graph-level and operator-level optimization.

This paper proposes representing the configuration of a tensor operator as an abstract syntax tree (AST, a term from compiler theory) and uses TVM to extract the node features.
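To make the AST idea concrete, here is a minimal sketch of flattening a tree into the node list and edge list that a GNN consumes. The `ASTNode` class, the node types and the single loop-extent feature are hypothetical stand-ins; in the paper the node features are extracted with TVM, and this sketch does not call any TVM API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ASTNode:
    """Hypothetical AST node: a node type plus a few numeric features."""
    node_type: str
    features: list[float]
    children: list[ASTNode] = field(default_factory=list)


def ast_to_graph(root: ASTNode):
    """Flatten a tree into node types, per-node features and parent-child edges."""
    types, feats, edges = [], [], []

    def visit(node: ASTNode, parent_idx: int | None) -> None:
        idx = len(types)                  # nodes are numbered in depth-first order
        types.append(node.node_type)
        feats.append(node.features)
        if parent_idx is not None:
            edges.append((parent_idx, idx))
        for child in node.children:
            visit(child, idx)

    visit(root, None)
    return types, feats, edges


# Toy loop nest: for i in range(64): for j in range(32): C[i, j] = ...
tree = ASTNode("for", [64.0], [
    ASTNode("for", [32.0], [
        ASTNode("store", [1.0]),
    ]),
])
types, feats, edges = ast_to_graph(tree)
print(types)  # ['for', 'for', 'store']
print(edges)  # [(0, 1), (1, 2)]
```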

Each node is transformed by a shared encoder, and a graph convolutional network (GCN) is used to propagate information between nodes.

Finally, all nodes are aggregated and a prediction y is made.

The authors report that this learnable graph data transformation outperforms the existing fixed feature extractor, the context relation feature (a paper I plan to review later).
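The pipeline described above (shared encoder, GCN propagation, aggregation, prediction) can be sketched in NumPy as follows. This is not the paper's implementation: the layer sizes, ReLU activations, the Kipf & Welling style normalization D^-1/2 (A + I) D^-1/2 and sum aggregation are all assumptions, and the weights are random rather than trained.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def normalized_adjacency(num_nodes, edges):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 over an undirected edge list."""
    a = np.eye(num_nodes)                       # A + I: self-loops keep each node's own state
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # degrees include the self-loop, so never zero
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def predict(x, edges, params):
    """Shared encoder -> GCN layers -> sum over nodes -> scalar prediction y."""
    h = relu(x @ params["enc"])                 # the same encoder weights are applied to every node
    a_hat = normalized_adjacency(x.shape[0], edges)
    for w in params["gcn"]:
        h = relu(a_hat @ h @ w)                 # each node mixes in its neighbours' states
    graph_vec = h.sum(axis=0)                   # permutation-invariant aggregation
    return float(graph_vec @ params["out"])


rng = np.random.default_rng(0)
in_dim, hidden = 4, 16                          # hypothetical sizes
params = {
    "enc": rng.normal(size=(in_dim, hidden)),
    "gcn": [0.1 * rng.normal(size=(hidden, hidden)) for _ in range(2)],
    "out": rng.normal(size=hidden),
}
x = rng.normal(size=(3, in_dim))                # 3 AST nodes with 4 features each
y = predict(x, [(0, 1), (1, 2)], params)
print(y)
```

Because the readout sums over nodes, relabelling the nodes (and their edges consistently) leaves y unchanged, which is the behaviour a graph-level predictor should have.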

2. Datasets

The dataset contains 6,852 configurations.

The distribution of execution times in the dataset does not match a normal distribution; instead, it resembles a Gaussian mixture.

This suggests two challenges. First, the simulator needs to learn a representation of every mode.

Second, generalizing to low-probability components is difficult. (I'm not sure why the second challenge follows from this; I should go back and review probability theory.)
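One common intuition for the second challenge (my own reading, not an argument stated in the paper): a mixture component with a small weight contributes only a small share of the training samples, so the simulator sees few examples from that regime. The sketch below draws 6,852 samples from a made-up three-component mixture over log execution time; the weights, means and standard deviations are hypothetical and are not fitted to the paper's data.

```python
import numpy as np

# Hypothetical mixture over log execution time (illustration values only).
weights = np.array([0.70, 0.25, 0.05])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.3, 0.5, 0.4])

rng = np.random.default_rng(0)
n = 6852                                              # dataset size reported in the paper
component = rng.choice(len(weights), size=n, p=weights)  # pick a mode for each sample
samples = rng.normal(means[component], stds[component])  # then draw from that mode

counts = np.bincount(component, minlength=len(weights))
for k, c in enumerate(counts):
    print(f"component {k}: weight={weights[k]:.2f}, samples={c}")
```

With a 5% weight, the rare mode ends up with only a few hundred of the 6,852 samples, far fewer than the dominant mode.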

3. Experiments

The experiments use two network classes: an MLP, which does not propagate information between nodes before aggregation,

and a GCN model, which adds that propagation. Both networks embed the node type into a 32-dimensional vector and concatenate this embedding to the node features. As a baseline, the models are compared against the context relation feature mentioned in the Introduction. The goal is to evaluate the cross-workload generalization capability of the surrogate model.
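The node-type embedding step can be sketched like this. Only the 32-dimensional embedding size comes from the paper; the node-type vocabulary, the two numeric features per node and the random initialization are hypothetical, and in a real model the lookup table would be a trained parameter.

```python
import numpy as np

EMBED_DIM = 32                                        # embedding size stated in the paper
NODE_TYPES = ["for", "store", "load", "mul", "add"]   # hypothetical node-type vocabulary

rng = np.random.default_rng(0)
# Lookup table with one 32-d row per node type (randomly initialized here).
type_table = rng.normal(scale=0.1, size=(len(NODE_TYPES), EMBED_DIM))
type_index = {t: i for i, t in enumerate(NODE_TYPES)}


def build_node_inputs(node_types, node_feats):
    """Concatenate each node's type embedding with its numeric features."""
    ids = np.array([type_index[t] for t in node_types])
    emb = type_table[ids]                             # (num_nodes, 32)
    feats = np.asarray(node_feats, dtype=float)       # (num_nodes, F)
    return np.concatenate([emb, feats], axis=1)       # (num_nodes, 32 + F)


x = build_node_inputs(["for", "for", "store"], [[64.0, 1.0], [32.0, 1.0], [1.0, 0.0]])
print(x.shape)  # (3, 34)
```

The same node inputs can then feed either the MLP or the GCN, so the two models differ only in whether nodes exchange information before aggregation.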

4. Results and Final Remarks

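The authors observed that the GCN generalizes better to the test data than the non-GCN model, because the MLP can share information between nodes only after aggregation. According to the authors, this indicates good generalization to unseen data and suggests that the learned representations could also help with transfer learning. (I still need to go through the paper's references to understand what GCN propagation has to do with transfer learning.) Finally, the dataset and task introduced in this paper establish learning on tensor programs as a problem in its own right.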
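The point that the MLP can share information only after aggregation can be checked with a toy example: two graphs with the same nodes but different edges. This is a simplified sketch (random weights, unnormalized adjacency with self-loops, a single propagation step), not the models from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
W_enc = rng.normal(size=(2, 8))      # shared per-node encoder (hypothetical sizes)
W_gcn = rng.normal(size=(8, 8))      # one propagation layer


def relu(z):
    return np.maximum(z, 0.0)


def adjacency(n, edges):
    a = np.eye(n)                    # self-loops, left unnormalized for brevity
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    return a


def mlp_readout(x, edges):
    """Per-node encoder, then sum: the edge list is never used."""
    return relu(x @ W_enc).sum(axis=0)


def gcn_readout(x, edges):
    """One propagation step before the sum, so graph structure can matter."""
    h = relu(x @ W_enc)
    return relu(adjacency(len(x), edges) @ h @ W_gcn).sum(axis=0)


x = rng.normal(size=(3, 2))
chain = [(0, 1), (1, 2)]             # 0 - 1 - 2
star = [(0, 1), (0, 2)]              # 1 - 0 - 2
print(np.allclose(mlp_readout(x, chain), mlp_readout(x, star)))  # True
print(np.allclose(gcn_readout(x, chain), gcn_readout(x, star)))
```

The MLP readout is identical for both graphs because it never reads the edges; the GCN readout mixes neighbours before summing, so in general it can tell a chain from a star.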

5. My View

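The POSTECH paper MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks handles the AST with a GCN, but some of its content and figures were hard to understand from the MetaTune paper alone. Reading this paper partly resolved that. However, I still don't know baselines such as the context relation feature, so I will need to trace the references further back. An implementation related to this paper is also available as a tutorial in the official TVM documentation. My grasp of the theory behind graph-based networks is still weak, but if the GCN described in this paper were improved to boost generalization performance at the application layer, it might be worth targeting a venue like NeurIPS rather than a compiler conference.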
