Zhe Work Report

2018.02.28

The report covers:

  1. Model comparison tool
  2. Non-blocking multi-node communication in Chainer master v3
  3. Source code on developer01
  4. Device label

Model Comparison Tool

  The model comparison tool helps developers compare two models built in different frameworks. It supports Caffe, TensorFlow, MXNet, Chainer, and PyTorch. The main comparison covers variable node shapes and function node types; the tool also compares the structure of the computation graph and the padding strategy.

  We also built a web version of the tool. Users only need to upload their model files to see a visual comparison of the two graphs. Section 1 below describes the design of the tool, Section 2 describes the web version, and the final section covers deployment.

  1. Design Detail

    • The overall architecture of the tool is shown in the figure below.

    [Figure: architecture]

    • The tool first loads the graph from each framework. The table below lists the loader script for each framework.

    Framework  | Input files            | Loader script          | Details
    -----------|------------------------|------------------------|------------------------------------------------------------
    Caffe      | train_val.prototxt     | load_caffe_model.py    | Reads model parameters with caffe.proto.caffe_pb2.
    TensorFlow | saved model files      | load_tf_model.py       | Reads the computation graph from the saved model files, so users must save the graph beforehand.
    Chainer    | model class .py files  | load_chainer_model.py  | Collects all FunctionNodes by tracing back from the output.
    MXNet      | model JSON file        | load_mxnet_model.py    | Reads the JSON file and the parameters.
    SSD        | SSD model              | load_ssd_*.py          | SSD models have special layers.
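    • The loaded graphs are converted to a unified data format, and the different padding strategies are added, as shown in the figure below. A later version added the padding (pw, ph), stride (sh, sw), and kernel size (kh, kw) parameters along the width and height axes. The source code is in rank_multi_port.py.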

    [Figure: unified representation]

    • Comparing graphs directly is a hard problem, so we convert each graph into an ordered list and find all differences between the two lists. The source code is in rank_multi_port.py.
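    The two steps above (a unified layer representation, then comparison of ordered lists) can be sketched in Python. This is not the code in rank_multi_port.py: the `Layer` record, `same_padding`, `conv_out_size`, and `compare_layers` are hypothetical names, each graph is assumed to be flattened into an ordered list already, and Python's standard `difflib.SequenceMatcher` stands in for whatever alignment the tool actually uses. The usage at the bottom shows why the padding strategy needs its own comparison: a 3x3, stride-2 convolution on a 224-pixel axis produces 112 outputs both with TensorFlow-style `SAME` padding (0 before, 1 after) and with Caffe-style `pad=1` (1 on each side), so a shape-only check would miss the difference.

```python
import difflib
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical unified record for one function node (frozen so it is hashable)."""
    op: str            # normalized op type, e.g. "Convolution"
    out_shape: tuple   # output variable shape, e.g. (N, C, H, W)
    kh: int = 0        # kernel height
    kw: int = 0        # kernel width
    sh: int = 1        # stride height
    sw: int = 1        # stride width
    ph: int = 0        # padding before the input along height (top)
    pw: int = 0        # padding before the input along width (left)


def same_padding(in_size, kernel, stride):
    """Split TF-style 'SAME' padding into (pad_before, pad_after); any odd pixel goes after."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    return total // 2, total - total // 2


def conv_out_size(in_size, kernel, stride, pad_before, pad_after):
    """Output length of a convolution along one axis with explicit padding."""
    return (in_size + pad_before + pad_after - kernel) // stride + 1


def compare_layers(a, b):
    """Align two ordered layer lists and return every non-matching block."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # A 3x3, stride-2 conv on a 224-pixel axis, padded two different ways.
    before, after = same_padding(224, kernel=3, stride=2)   # TF 'SAME' -> (0, 1)
    print(conv_out_size(224, 3, 2, before, after))          # 112
    print(conv_out_size(224, 3, 2, 1, 1))                   # 112 (Caffe-style pad=1)

    tf_model = [Layer("Convolution", (1, 64, 112, 112), 3, 3, 2, 2, 0, 0),
                Layer("ReLU", (1, 64, 112, 112))]
    caffe_model = [Layer("Convolution", (1, 64, 112, 112), 3, 3, 2, 2, 1, 1),
                   Layer("ReLU", (1, 64, 112, 112))]
    for tag, left, right in compare_layers(tf_model, caffe_model):
        print(tag, left, right)   # a single 'replace' for the convolution
```

    Storing only the "before" padding keeps the record small; a model with asymmetric padding on both sides would need extra fields.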
  2. Web Model Comparison Tool

    For a better user experience, we built a website. Users upload the input files and get a visual comparison result along with the detailed parameters. The system combines a Django backend with a Bootstrap frontend and uses D3.js to visualize the model graphs. All source code is in the web-model-comparison-tool folder.

    [Figure: comparison]

    The website has two pages: on page 1, users upload the files they want to compare; page 2 shows the graph comparison result.

    [Figure: upload page]

    [Figure: result page]
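    The report does not describe the data the Django backend hands to D3.js. A minimal sketch, assuming the backend serializes each graph into the node-link JSON that D3 force layouts commonly consume: a `nodes` array plus a `links` array whose `source`/`target` values are node ids, resolved on the frontend with `d3.forceLink().id(d => d.id)`. The function name `to_d3_json`, the input tuple format, and the `changed` flag for nodes that differ between the two models are all hypothetical.

```python
import json


def to_d3_json(layers, edges, changed=()):
    """Serialize one model graph as node-link JSON for a D3 force layout.

    layers:  list of (name, op_type, out_shape) tuples   (hypothetical format)
    edges:   list of (src_name, dst_name) tuples
    changed: names of nodes the comparison flagged as different
    """
    changed = set(changed)
    nodes = [
        {"id": name, "op": op, "shape": list(shape), "changed": name in changed}
        for name, op, shape in layers
    ]
    # D3's forceLink().id(d => d.id) resolves these string ids to node objects.
    links = [{"source": src, "target": dst} for src, dst in edges]
    return json.dumps({"nodes": nodes, "links": links})


if __name__ == "__main__":
    print(to_d3_json(
        [("conv1", "Convolution", (1, 64, 112, 112)), ("relu1", "ReLU", (1, 64, 112, 112))],
        [("conv1", "relu1")],
        changed={"conv1"},
    ))
```

    On the result page, nodes with `"changed": true` could then be drawn in a different color so that mismatches stand out.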

 

  3. Deployment

    The website version is deployed on the developer01 server. The root username is modeltools and the password is abc110.

    1. SSH into the web server.

       
    2. cd to the working folder.

       
    3. Start the uwsgi server.

       
    4. Restart the nginx service.

       
    5. Open <server local IP>:8000 in a browser to use the tool.

    For more information, click HELP.

     

Non-blocking Multi-Node Communication in Chainer Master v3

  In the ChainerMN master branch, multi-node training is implemented with blocking communication. For multi-node training in Chainer, we use data parallelism, and the weights are updated as shown in the figure below.

  [Figure: data parallelism]
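  As a worked form of the update in the figure, assuming plain SGD with learning rate η on N workers, where g_i is the gradient worker i computes on its own minibatch: the allreduce yields the mean gradient, and every worker applies the same update, so the model replicas stay identical. Because the mean is taken elementwise, it splits layer by layer over the L layers, which is what allows each layer's allreduce to start as soon as that layer's gradient is ready.

```latex
\begin{aligned}
\bar{g} &= \frac{1}{N} \sum_{i=1}^{N} g_i, \qquad w \leftarrow w - \eta\,\bar{g} \\
\bar{g}^{(l)} &= \frac{1}{N} \sum_{i=1}^{N} g_i^{(l)}, \qquad l = 1, \dots, L
\end{aligned}
```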

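  With blocking communication, the gradients of all layers are allreduced together once per iteration, after the backward pass. With non-blocking communication, each layer's gradients are allreduced as soon as they are computed, so communication is hidden behind the remaining computation, as shown below.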

  [Figure: non-blocking communication]

  Our code is based on the Intel Chainer master_v3 branch. All changes are in the patch files.
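  The per-layer overlap described above can be simulated in plain Python to check that it changes only the schedule, not the result. This is a hypothetical sketch, not the code in the patch files: `allreduce_mean` stands in for an MPI allreduce, `backward_layer` stands in for one layer's backward computation, and a single background thread from `ThreadPoolExecutor` plays the role of the communication channel. In a real MPI implementation, the submit/wait pair would correspond to a non-blocking collective and its completion call (for example `MPI_Iallreduce` followed by `MPI_Wait`).

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_mean(per_worker_grads):
    """Stand-in for an MPI allreduce: elementwise mean over workers."""
    n = len(per_worker_grads)
    return [sum(vals) / n for vals in zip(*per_worker_grads)]


def backward_layer(worker_grads, layer):
    """Stand-in for one layer's backward pass: fetch that layer's gradient on every worker."""
    return [grads[layer] for grads in worker_grads]


def sgd_update(weights, reduced, lr):
    """Apply w <- w - lr * mean_grad to every layer."""
    return [[w - lr * g for w, g in zip(weights[l], reduced[l])]
            for l in range(len(weights))]


def blocking_step(weights, worker_grads, lr):
    layers = reversed(range(len(weights)))           # backward runs last layer first
    computed = {l: backward_layer(worker_grads, l) for l in layers}
    # Communication starts only after the whole backward pass has finished.
    reduced = {l: allreduce_mean(g) for l, g in computed.items()}
    return sgd_update(weights, reduced, lr)


def nonblocking_step(weights, worker_grads, lr):
    pending = {}
    with ThreadPoolExecutor(max_workers=1) as comm:  # one communication thread
        for l in reversed(range(len(weights))):
            grads = backward_layer(worker_grads, l)
            # Launch this layer's reduction now; the loop moves on to the next backward step.
            pending[l] = comm.submit(allreduce_mean, grads)
        # Wait for all outstanding reductions before touching the weights.
        reduced = {l: fut.result() for l, fut in pending.items()}
    return sgd_update(weights, reduced, lr)


if __name__ == "__main__":
    weights = [[1.0, 1.0], [2.0]]                  # 2 layers
    worker_grads = [[[0.5, 1.0], [1.0]],           # worker 0, per layer
                    [[1.5, 0.0], [3.0]]]           # worker 1, per layer
    print(blocking_step(weights, worker_grads, 0.5))     # [[0.5, 0.75], [1.0]]
    print(nonblocking_step(weights, worker_grads, 0.5))  # [[0.5, 0.75], [1.0]]
```

  Both steps return the same weights; the only difference is that `nonblocking_step` starts the reduction for layer l while the layers before it are still being differentiated.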

Source Code on developer01

  I stored backup files on the developer01 server. The model comparison tool is in the ~/MCT folder, and the non-blocking Chainer code is in the ./NBC folder. The tool's code is also in the dl_framework-dl_tools Git repository.

Device

  One desktop is assigned to me; its number is BB26468.