|
|
# Network training
|
|
|
# Network Training
|
|
|
|
|
|
## Table of Contents
|
|
|
1. [Introduction](manual#introduction)
|
... | ... | @@ -7,31 +7,37 @@ |
|
|
|
|
|
3. [Overview](manual-overview)
|
|
|
|
|
|
4. [Network creation](manual-network)
|
|
|
4. [Network Creation](manual-network)
|
|
|
|
|
|
5. [Network training](manual-training)
|
|
|
5. [Network Training](manual-training)
|
|
|
- [Sessions](manual-training#sessions)
|
|
|
- [Input Manager](manual-training#input-manager)
|
|
|
- [Snapshots](manual-training#snapshots)
|
|
|
- [Session Cloning](manual-training#session-cloning)
|
|
|
- [Remote Training](manual-training#remote-training)
|
|
|
- [Input Manager](manual-training#input-manager)
|
|
|
- [Host Manager](manual-training#host-manager)
|
|
|
- [Console](manual-training#console)
|
|
|
- [Plotter](manual-training#plotter)
|
|
|
- [CSV-export](manual-training#csv-export)
|
|
|
- [Weight visualization](manual-training#weight-visualization)
|
|
|
- [CSV-Export](manual-training#csv-export)
|
|
|
- [Weight Visualization](manual-training#weight-visualization)
|
|
|
- [Deployment](manual-training#deployment)
|
|
|
|
|
|
6. [Miscellaneous](manual-miscellaneous)
|
|
|
|
|
|
## Sessions
|
|
|
|
|
|
A Barista session is a collection of a network topology (as shown in the [Node Editor](manual-network#navigation-in-the-node-editor)) all parameters set for the layers of the network (as shown in the [Layer Properties](manual-network#editing-layer-and-solver-parameters) dock), an optimization/learning method and its hyper parameters (as shown in the Solver Properties dock) and the data used for training and testing the performance (as defined in the [Input Manager](manual-training#input-manager)).
|
|
|
A Barista session bundles a network topology (as shown in the [Node Editor](manual-network#navigation-in-the-node-editor)), all parameters set for the layers of the network (as shown in the [Layer Properties](manual-network#editing-layer-and-solver-parameters) dock), an optimization/learning method with its hyperparameters (as shown in the Solver Properties dock), and the data used for training and for testing the performance (as defined in the [Input Manager](manual-training#input-manager)).
|
|
|
|
|
|
The Session List gives an overview of all currently defined Sessions:
|
|
|
|
|
|
![SessionList](SessionList.png)
|
|
|
![NewSessionList](NewSessionList.png)
|
|
|
|
|
|
A session can either be local or remote. In local sessions the network is trained on the machine that Barista is running on. In remote sessions the network can be trained on a distant machine, see [Remote Training](manual-training#remote-training).
|
|
|
|
|
|
Every session is represented by one entry in the list. Every entry provides basic information about the session status as well as controls. From top to bottom and left to right, the controls and indicators of a single session entry are:
|
|
|
|
|
|
* **Remote Host** information: For remote sessions, the host name and connected port are displayed.
|
|
|
* **Unlink Button**: Closes the connection to the remote session.
|
|
|
* **Session ID**: A running ID of the session. It helps to identify which session a given log line in the console belongs to.
|
|
|
* **State Label**: A colored marker indicating the session state. A Session can be in one of the following states:
|
|
|
|
... | ... | @@ -39,26 +45,65 @@ Every session is represented by one entry in the list. Every item provides basic |
|
|
* **Running**: The training process is running.
|
|
|
  * **Paused**: The session was paused and can be resumed.
|
|
|
* **Finished**: The maximum iteration has been reached and the training process is thus finished.
|
|
|
* **Pre-trained**: A copy of a session with pre-trained weights, ready to be trained (see [Session Cloning](manual-training#session-cloning)).
|
|
|
* **Failed**: The session failed with some error. Look in the error console for further information.
|
|
|
  * **Invalid**: Barista's internal checks found some faulty properties. More details are provided in an additional label that replaces the Progress Bar. Further details are shown when hovering over that label.
|
|
|
* **Not Connected**: A remote session lost its connection to a host. Make sure the remote machine is still running and its network connection is still active.
|
|
|
|
|
|
* **Snapshot Button**: When a session is running, the snapshot button can be used to create an unplanned (i.e. not defined in the solver properties) snapshot. Note: Your caffe version has to support the SIGHUP signal on Linux and Mac OS or SIGBREAK on Windows for this to work.
|
|
|
* **Snapshot Button**: When a session is running, the snapshot button can be used to create an unplanned (i.e. not defined in the solver properties) snapshot. Note: Your caffe version has to support the SIGHUP signal on Linux and Mac OS or SIGBREAK on Windows for this to work. See [Snapshots](manual-training#snapshots).
|
|
|
* **Delete Button**: This will delete a session and all its associated files and folders.
|
|
|
* **Context Menu**: More functionality like cloning and resetting sessions.
|
|
|
* **Play/Pause Button**: Start or Pause the training of a session.
|
|
|
* **Progress Bar**: Displays the iterations that have been trained so far and the maximum iterations as defined in the solver settings.
|
|
|
|
|
|
Remote sessions are not connected automatically, hence on loading a project, they have to be imported from the appropriate remote host, using the host manager.
|
|
|
Remote sessions are not connected automatically. After loading a project, they have to be imported from the appropriate remote host using the [Host Manager](manual-training#host-manager). Connections that have been closed (via the Unlink Button) can be reopened.
|
|
|
|
|
|
Once training of a session has been started, the session can no longer be edited. This ensures that the training results are always in line with the displayed network and settings. If you want to change settings for a network for which training was already started, you can either create a new session which will have the same settings and alter them, or you can clone pre-learned weights from an existing session to a new session using the context menu in the old session. If you know that you do not want to use the trained network state of one session, you could also select the reset option to throw away all training results and treat the session as new (Please note that you will lose all your training results for the selected session).
|
|
|
|
|
|
All files needed to train a network (except databases) and all files created during training are stored in the session folder. This is a subfolder of the project directory or, for remote sessions, a subfolder of the session directory provided when starting the host. Hence, a session folder can easily be transferred to another machine, where training can be resumed even if Barista is not installed.
|
|
|
|
|
|
### Snapshots
|
|
|
|
|
|
A snapshot will be created when you press the snapshot button (camera) or when the session is paused.
|
|
|
When a session is continued, training will start from the last saved snapshot.
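
As an illustration of "the last saved snapshot", the following sketch picks the most recent solver state from a list of file names. It assumes caffe's usual snapshot naming scheme `<prefix>_iter_<N>.solverstate`; the function name `latest_snapshot` is hypothetical and this is not necessarily how Barista itself selects the snapshot.

```python
import re

# Assumption: snapshots follow caffe's usual "<prefix>_iter_<N>.solverstate" naming.
SNAPSHOT_PATTERN = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_snapshot(snapshot_names):
    """Return the solver state with the highest iteration number, or None."""
    best_name, best_iteration = None, -1
    for name in snapshot_names:
        match = SNAPSHOT_PATTERN.search(name)
        if match is None:
            continue  # not a solver state (e.g. a .caffemodel file)
        iteration = int(match.group(1))
        if iteration > best_iteration:
            best_name, best_iteration = name, iteration
    return best_name
```

Comparing the iteration numbers as integers matters here: a plain string sort would rank `_iter_900` above `_iter_2500`.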
|
|
|
|
|
|
### Session Cloning
|
|
|
|
|
|
It is possible to copy learned weights from one session to a new one to start training a network with pre-learned weights.
|
|
|
|
|
|
To do so, click on the context-menu icon (three dots) for the session to be cloned in the session list and select **clone session**. You can then choose which snapshot you want to clone and a new session will be created in that state.
|
|
|
|
|
|
### Remote Training
|
|
|
|
|
|
It is also possible to create remote sessions, so that the training process happens on a headless server that allows the GUI to connect to it as a client.
|
|
|
|
|
|
##### Start a server instance
|
|
|
|
|
|
Run `server.py` to start a new server instance. It accepts optional `--ip` and `--port` arguments to specify where to listen for incoming connections. Remote sessions are stored on the server, so you can use the optional `--dir` argument to set the remote location in which the files are stored. For additional information, use the `--help` option.
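
If the server is started from a script, the call could be assembled as in the following sketch. Only the `--ip`, `--port` and `--dir` options come from this manual; the helper name `build_server_command` and the example values in the note below are assumptions for illustration.

```python
import sys


def build_server_command(ip=None, port=None, session_dir=None):
    """Build the argument list for launching server.py with optional arguments."""
    command = [sys.executable, "server.py"]
    if ip is not None:
        command += ["--ip", str(ip)]
    if port is not None:
        command += ["--port", str(port)]
    if session_dir is not None:
        command += ["--dir", str(session_dir)]
    return command
```

A call such as `build_server_command(ip="0.0.0.0", port=4200, session_dir="/data/sessions")` yields a list that can be passed to `subprocess.Popen`, provided the working directory contains `server.py`.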
|
|
|
|
|
|
Alternatively, you can run `main.py --server`. This starts the client and also spawns a server instance listening on `localhost:4200`. The server runs in the background and is automatically added to the client's list of remote hosts. The location in which session files are stored can be set with the optional `--dir` argument and defaults to the Barista root folder. If the specified folder does not exist, Barista exits with an error message.
|
|
|
|
|
|
##### Connect to a server from a client instance
|
|
|
|
|
|
If you know the IP and port of a Barista server instance, you can connect to it from the GUI. Start `main.py` as usual and open the [Host Manager](manual-training#host-manager). Click 'Add new Host' and enter the address. If the server was found, it will be labeled 'alive', otherwise 'dead'.
|
|
|
|
|
|
![host_manager_remote](host_manager_remote.png)
|
|
|
|
|
|
##### Working remotely
|
|
|
|
|
|
Once you have a valid remote server added to the list of hosts, you can load remote databases using the [Input Manager](manual-training#input-manager), just as you would do locally. A dialog will pop up, prompting you to select a host to load the database from. In order to train a remote session, the inputs have to be loaded from the same remote host that the session is running on.
|
|
|
|
|
|
![input_manager_remote](input_manager_remote.png)
|
|
|
*Please note: The `Search and add Database` feature currently only works locally.*
|
|
|
|
|
|
To create a remote session, select a remote host from the 'Create Session' dialog. You can work with the session the same way you would work with a local session. Please note, however, that remote sessions will not be reloaded automatically when you reload a saved project. Instead, you have to reimport remote sessions belonging to the current project from the [Host Manager](manual-training#host-manager).
|
|
|
|
|
|
![create_remote_session](create_remote_session.png)
|
|
|
|
|
|
## Input Manager
|
|
|
|
|
|
In order to save storage space, databases containing training and test data are not stored within a project or session folder. However, Barista offers an Input Manager that takes care of managing all your data.
|
|
|
The Input Manager can be accessed via menu bar: **Edit -> Input Manager** or pressing **Ctrl + I**.
|
|
|
In order to save storage space, databases containing training and test data are not stored in a project or session folder. However, Barista offers an Input Manager that takes care of managing all your data.
|
|
|
The Input Manager can be accessed via the menu bar (**Tools -> Input Manager**), by clicking the **database icon**, or by pressing **Ctrl + I**.
|
|
|
|
|
|
![InputManager](InputManager.png)
|
|
|
|
... | ... | @@ -71,27 +116,57 @@ For every database the Input Manager provides an overview of the contained data |
|
|
|
|
|
For every database, a unique ID is calculated by hashing its content. This prevents the same database from being loaded multiple times. It also allows the same database to be identified when it is located on multiple hosts.
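
The idea of a content-based ID can be sketched as follows. This is a minimal illustration that assumes SHA-256 and a database stored as a single file; the hash function Barista actually uses is not stated in this manual, and the name `database_id` is hypothetical.

```python
import hashlib


def database_id(path, chunk_size=1 << 20):
    """Return a hex digest of the file content (assumption: SHA-256, single file)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large databases do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the ID depends only on the content, two copies of the same file on different hosts receive the same ID, while renaming or moving a file leaves its ID unchanged.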
|
|
|
|
|
|
Every entry in the host manager has a number of manipulation options, from left to right the buttons are:
|
|
|
Every entry in the Input Manager has a number of manipulation options. From left to right, the buttons are:
|
|
|
|
|
|
- Edit: Change the Name of the Database. This is only for user support, the database name has no influence on the training.
|
|
|
- Delete: Delete the Database entry. This only deletes the Reference in the Input Manager, but not the referenced files.
|
|
|
- Reload: Reload a database and its content. This can be used if a database became unavailable/dead because a host connection was lost or a network mount failed.
|
|
|
- Move Up/Down: Change the order of the items in the list.
|
|
|
- Assign to layer: Assign the data base as source to one of the available data layers in the network.
|
|
|
- **Edit:** Change the name of the database. The name is only a label for the user and has no influence on the training.
|
|
|
- **Delete:** Delete the database entry. This only deletes the reference in the Input Manager, not the referenced files.
|
|
|
- **Change Location:** Change the path that is associated with this database. This is useful if the file's location was changed outside of Barista and you do not want to delete the database entry and add it again.
|
|
|
- **Reload:** Reload a database and its content. This can be used if a database became unavailable/dead because a host connection was lost or a network mount failed. Reloading does not rehash the database. That means, if the file was altered outside of Barista, you get a warning, but the ID won't be updated.
|
|
|
- **Move Up/Down:** Change the order of the items in the list.
|
|
|
- **Assign to layer:** Assign the database as a source to one of the available data layers in the network.
|
|
|
|
|
|
**HDF5** data base files can not directly be set as a source to a caffe HDF5 data layer. Barista supports creation and editing of the necessary txt files, that contain links to the actual HDF5 data files. Like for the databases themselves, HDF5 files can be added to these files one-by-one, as a group or by recursively searching a directory. In each case, the relevant files are hashed again and the ID is updated.
|
|
|
When an HDF5 text file is imported, two additional buttons are available for editing the .txt file:
|
|
|
|
|
|
## Snapshots
|
|
|
- **Open File:** The seventh icon from the left will open the selected .txt-file, so you can edit it by hand.
|
|
|
- **Open HDF5 Textfile Editor:** The eighth icon from the left will open the **HDF5 Textfile Editor** for this .txt-file.
|
|
|
|
|
|
A snapshot will be created when you press the snapshot button (camera) or when the session is paused.
|
|
|
When a session is continued, training will start from the last saved snapshot.
|
|
|
### HDF5 Textfile Editor
|
|
|
**HDF5** database files cannot be set directly as a source of a caffe HDF5 data layer. Instead, a .txt file containing the paths of the required HDF5 files is needed.
|
|
|
Barista supports creating and editing the necessary .txt files that contain the paths to the actual HDF5 data files. This function can be accessed by clicking on **New HDF5TXT file**.
|
|
|
|
|
|
It is possible to copy learned weights from one session to a new one to start training a network with pre-learned weights.
|
|
|
![hdf5_file_editor](hdf5_file_editor.png)
|
|
|
|
|
|
To do so, click on the context-menu icon (three dots) for the session to be cloned in the session list and select **clone session**. You can then choose which snapshot you want to clone and a new session will be created in that state.
|
|
|
- **Add Line** will add a new, empty line to the list of paths. This is only useful if you want to type the path of an HDF5 file by hand. Empty lines will not appear in the created .txt file.
|
|
|
- **Remove Line** will remove the selected line from the list of paths, whether it is empty or not.
|
|
|
- **Add File(s)** will have you select a .h5/.hdf5-File or a directory. If you select a file it will just add the path of the selected file to the list. Selecting a directory will recursively search this directory for .h5/.hdf5-Files and will add them to the list.
|
|
|
- **Save as** will ask you for a name of the .txt-File you want to create and will save it in a location you can select.
|
|
|
- **Cancel** will close the window and no .txt-File will be created.
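
The behaviour of **Add File(s)** and **Save as** described above can be sketched as follows. The function names `collect_hdf5_paths` and `write_hdf5_textfile` are hypothetical and the code is only an approximation of the editor under the rules listed above, not Barista's implementation.

```python
from pathlib import Path

HDF5_SUFFIXES = {".h5", ".hdf5"}


def collect_hdf5_paths(selection):
    """Return the selected file, or all .h5/.hdf5 files found recursively in a directory."""
    selection = Path(selection)
    if selection.is_dir():
        return sorted(
            str(path)
            for path in selection.rglob("*")
            if path.is_file() and path.suffix.lower() in HDF5_SUFFIXES
        )
    return [str(selection)]


def write_hdf5_textfile(paths, target):
    """Write one path per line to the target .txt file, skipping empty lines."""
    lines = [path.strip() for path in paths if path.strip()]
    Path(target).write_text("".join(line + "\n" for line in lines))
    return lines
```

The resulting file has the one-path-per-line layout that a caffe HDF5 data layer expects as its source.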
|
|
|
|
|
|
To edit a path by hand, double-click it in the list.
|
|
|
|
|
|
The relevant files are hashed again and the ID is updated.
|
|
|
|
|
|
## Host Manager
|
|
|
|
|
|
Barista offers a Host Manager that takes care of managing all your hosts. A host is a remote computer on which sessions can be trained (see [Remote Training](manual-training#remote-training)).
|
|
|
The Host Manager can be accessed via menu bar: **Tools -> Host Manager** or pressing **Ctrl + H**.
|
|
|
|
|
|
![HostManager](HostManager.png)
|
|
|
|
|
|
Use **Add new Host** to connect a new host via a dialog.
|
|
|
|
|
|
Every entry in the Host Manager has a number of manipulation options. From left to right, the buttons are:
|
|
|
|
|
|
- **Edit:** Change the name of the host. The name is only a label for the user and has no influence on the remote address.
|
|
|
- **Delete:** Delete the connection to the host. This only deletes the reference in the Host Manager.
|
|
|
- **Refresh:** Refresh the connection to the remote host.
|
|
|
- **Move Up/Down:** Change the order of the items in the list.
|
|
|
- **Set caffe path:** Opens a dialog to change the root path of the caffe version.
|
|
|
- **Select hardware:** Change the hardware used for the training. You can choose from all usable GPUs and CPUs of the remote machine.
|
|
|
- **Add session from host:** Connect a session from the remote host to the current project for training.
|
|
|
|
|
|
## Console
|
|
|
The console is the main output of barista. There are different callers, that register to the console and can write custom messages and errors. The messages should keep the user informed about the current state and actions. For example, sessions write the caffe output here, the input manger would write errors if the selected data is faulty, and other barista modules would also complain about errors.
|
|
|
The console is the main output of Barista. Different callers register with the console and can write custom messages and errors. The messages keep the user informed about the current state and actions. For example, sessions write the caffe output here, the Input Manager reports errors if the selected data is faulty, and other Barista modules report their errors here as well.
|
|
|
The messages are not filtered by default. They can be filtered by their type ("Text" or "Error") and caller. The filter is set by the two drop-down boxes.
|
|
|
If there is no filter set, every line has a prefix to identify the caller. It has the format: ``[HH:MM:ss, Caller] Message``
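
When console output is processed outside of Barista, for example after copying it into a file, the prefix can be split off with a regular expression. The sketch below assumes exactly the format shown above; the function name `parse_console_line` and the caller name used in the note are made up for illustration.

```python
import re

# Matches "[HH:MM:ss, Caller] Message" as described above.
LINE_PATTERN = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2}), ([^\]]+)\] (.*)$")


def parse_console_line(line):
    """Split a prefixed console line into time, caller and message, or return None."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None  # line without a prefix, e.g. when a filter is active
    hours, minutes, seconds, caller, message = match.groups()
    return {
        "time": (int(hours), int(minutes), int(seconds)),
        "caller": caller,
        "message": message,
    }
```

For a line such as `[12:30:05, Session 1] Iteration 100`, the function returns the time as a tuple of integers, the caller `Session 1` and the remaining message text.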
|
|
|
|
... | ... | @@ -106,7 +181,7 @@ For instance, one can compare the loss rate and the learning rate of a training |
|
|
|
|
|
Furthermore, the user can choose how to plot the data (linear, logarithmic) and to plot it against the time or the number of iterations. Both options can be set in the settings panel.
|
|
|
|
|
|
## CSV-export
|
|
|
### CSV-Export
|
|
|
It is possible to export the plotted data in the CSV-format. Only data which is currently selected for plotting will be written to a CSV-file. The file then contains one table for each session/log file and phase introduced by a comment for identification. To export the plots just click on 'Export as CSV'. The exported file could look similar to the following:
|
|
|
|
|
|
```
|
... | ... | @@ -127,7 +202,7 @@ Iterations; Time; logname2.train.key1; logname2.train.key2; ... |
|
|
...
|
|
|
```
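
An exported file can be read back with a few lines of Python. The sketch below rests on two assumptions that are not confirmed by this manual: columns are separated by `;`, and the comment line that introduces each table starts with `#`. Adjust `comment_prefix` if your export looks different. The name `parse_export` is hypothetical.

```python
def parse_export(text, comment_prefix="#"):
    """Split an exported file into tables: a title, a header row and data rows."""
    tables = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(comment_prefix):
            # Assumption: a comment line introduces the next table.
            title = line[len(comment_prefix):].strip()
            current = {"title": title, "header": None, "rows": []}
            tables.append(current)
            continue
        if current is None:
            # Data before any comment line: collect it in an untitled table.
            current = {"title": "", "header": None, "rows": []}
            tables.append(current)
        cells = [cell.strip() for cell in line.split(";")]
        if current["header"] is None:
            current["header"] = cells
        else:
            current["rows"].append(cells)
    return tables
```

All cells are returned as strings, so numeric columns still have to be converted before further processing.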
|
|
|
|
|
|
## Weight visualization
|
|
|
## Weight Visualization
|
|
|
The weight plotter can visualize the filters learned in the neural network (currently limited to convolutional layers).
|
|
|
After splitting a convolutional layer into its filters, the matrices for every filter are rendered as a block of grayscale pixels.
|
|
|
Lighter pixels represent weights with higher parameter values, darker pixels represent weights with lower parameter values.
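
The mapping from weights to gray values can be illustrated with a simple min-max scaling. This is a sketch under the assumption that each filter is scaled independently to the range 0 to 255; the scaling Barista actually applies may differ, and the name `filter_to_grayscale` is hypothetical.

```python
def filter_to_grayscale(weights):
    """Map a 2D filter matrix to gray values: lowest weight 0 (dark), highest 255 (light)."""
    flat = [value for row in weights for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # A constant filter has no contrast; render it as uniform mid-gray.
        return [[128 for _ in row] for row in weights]
    return [[round(255 * (value - low) / span) for value in row] for row in weights]
```

With per-filter scaling, gray values are only comparable within one filter, not across filters.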
|
... | ... | |