2018.02.28
This report includes:
The model comparison tool helps developers compare the differences between two models from different frameworks. The tool supports Caffe, TensorFlow, MXNet, Chainer, and PyTorch. We mainly compare variable node shapes and function node types; we also compare the computation graph and the padding strategy.
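As an illustration, the node-by-node comparison of shapes and types could be sketched as below. The dict-based node format and the function name are assumptions for illustration, not the tool's actual API:

```python
# Hypothetical sketch of the node-by-node comparison; the dict-based node
# format and this function are illustrative assumptions, not the tool's API.
def compare_nodes(nodes_a, nodes_b):
    """Compare two aligned node lists by function-node type and variable shape."""
    diffs = []
    for i, (a, b) in enumerate(zip(nodes_a, nodes_b)):
        if a["type"] != b["type"]:
            diffs.append((i, "type", a["type"], b["type"]))
        if a.get("shape") != b.get("shape"):
            diffs.append((i, "shape", a.get("shape"), b.get("shape")))
    if len(nodes_a) != len(nodes_b):
        diffs.append((min(len(nodes_a), len(nodes_b)), "length",
                      len(nodes_a), len(nodes_b)))
    return diffs
```

A returned empty list means the two graphs match on the compared attributes.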
We also developed a web version of the comparison tool: users simply upload their model files and see a visual graph comparison. Section 1 below covers the design details of the tool, Section 2 covers the web version, and the final section covers deployment.
Design Detail
Framework | Input Files | Loader File | Details |
---|---|------|---|
Caffe | train_val.prototxt | load_caffe_model.py | Reads model parameters based on caffe.proto / caffe_pb2. |
TensorFlow | saved model files | load_tf_model.py | Reads the computation graph from a stored model file, so the user must save the graph beforehand. |
Chainer | model class .py files | load_chainer_model.py | Collects all FunctionNodes by traversing the graph from the output. |
MXNet | model JSON file | load_mxnet_model.py | Reads the JSON file and parameters. |
SSD | SSD model | load_ssd_*.py | SSD models contain special layers. |
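The loader files in the table could be selected through a simple registry. The module names below come from the table; the dispatch helper itself is an illustrative assumption:

```python
# Sketch of a loader registry; the module names are taken from the table
# above, but this dispatch helper itself is an illustrative assumption.
LOADERS = {
    "caffe": "load_caffe_model",
    "tensorflow": "load_tf_model",
    "chainer": "load_chainer_model",
    "mxnet": "load_mxnet_model",
}

def loader_module_for(framework):
    """Return the loader module name for a framework, or raise if unsupported."""
    key = framework.strip().lower()
    if key not in LOADERS:
        raise ValueError("unsupported framework: %s" % framework)
    return LOADERS[key]
```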
Web Model Comparison Tool
For a better user experience, we developed a website: the user uploads the input files and gets a visual comparison result with detailed parameters. The system combines a Django back end with a Bootstrap front end, and uses D3.js to visualize the model graphs. All source code is in the web-model-comparison-tool folder.
The website has two pages: on page 1 the user uploads their files, and page 2 shows the graph comparison result.
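For the D3.js visualization, the back end presumably serializes each parsed graph into a node/link JSON structure. A minimal sketch of such a conversion follows; the node format and function name are assumptions, not the tool's actual wire format:

```python
import json

# Hypothetical conversion of a parsed graph into the node/link JSON that a
# D3.js graph layout typically consumes; not the tool's actual wire format.
def graph_to_d3(nodes, edges):
    return {
        "nodes": [{"id": n["name"], "type": n["type"]} for n in nodes],
        "links": [{"source": src, "target": dst} for src, dst in edges],
    }

payload = graph_to_d3(
    [{"name": "conv1", "type": "Convolution"}, {"name": "relu1", "type": "ReLU"}],
    [("conv1", "relu1")],
)
print(json.dumps(payload))
```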
Deploy Method
Our website version is deployed on the developer01 server. The root username is modeltools, password: abc110.
SSH to the web server:

```shell
ssh modeltools@developer01
```
cd to the work folder:

```shell
cd ~/web-model-comparison-tool/model_comparison_tool/
```
Start the uWSGI server:

```shell
uwsgi --socket /tmp2/model_comparison_tool.sock --module model_comparison_tool.wsgi --chmod-socket=777 --uid=www-data --gid=www-data
```
Restart the nginx service:

```shell
sudo /etc/init.d/nginx restart
```
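For reference, the nginx site configuration that forwards port 8000 to the uWSGI socket presumably looks roughly like the fragment below. Only the socket path and port come from this document; the rest is an assumed minimal setup, and the actual deployed config may differ:

```nginx
# Sketch of an nginx server block matching the socket path and port above;
# the actual deployed configuration may differ.
server {
    listen 8000;
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/tmp2/model_comparison_tool.sock;
    }
}
```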
Now you can browse the site at the server's local IP on port 8000 to use our tool.
If you want to know more, please click HELP.
In the ChainerMN master branch, multi-node training is implemented with blocking communication. In Chainer, we use data parallelism for multi-node training; the weights are updated as in the figure below.
With blocking communication, every iteration performs one allreduce for all layers together. With non-blocking communication, the weight allreduce is done per layer, so communication can be hidden behind computation, as shown below.
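The benefit can be illustrated with a toy timing model. This simulation is our own sketch, not the ChainerMN implementation: blocking communication pays compute plus communication in sequence, while per-layer non-blocking allreduce lets each layer's communication run while the remaining layers are still computing.

```python
# Toy timing model contrasting blocking vs. per-layer non-blocking allreduce.
# This is an illustrative sketch, not the ChainerMN implementation.

def blocking_total(compute, comm):
    """All layers compute, then one combined allreduce: nothing overlaps."""
    return sum(compute) + sum(comm)

def nonblocking_total(compute, comm):
    """Each layer's allreduce starts as soon as its gradient is ready and the
    communication channel is free, overlapping with later layers' compute."""
    t_compute = 0.0   # time at which each layer's gradient becomes available
    t_comm = 0.0      # time at which the communication channel becomes free
    for c, m in zip(compute, comm):
        t_compute += c
        t_comm = max(t_comm, t_compute) + m
    return max(t_compute, t_comm)

# Three layers, each with 1 unit of compute and 1 unit of communication:
# overlap reduces the total from 6 units to 4.
print(blocking_total([1, 1, 1], [1, 1, 1]))     # 6
print(nonblocking_total([1, 1, 1], [1, 1, 1]))  # 4
```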
Our code is based on the Intel Chainer master_v3 branch. All the differences are in the patch files.
I stored backup files on the developer01 server: the model comparison tool is in the ~/MCT folder, and non-blocking Chainer is in the ./NBC folder. Our tool is also in the dl_framework-dl_tools git repo.
The desktop machine assigned to me has asset number BB26468.