Data transfers
File transfer
The center provides several means of transferring data to and from its various resources.
- On Unix-like OSes (for instance Linux or macOS), use the scp or rsync commands.
- On Windows, several clients exist, for example PSCP or WinSCP.
Note
SFTP, which is used for example by WinSCP, is disabled for security reasons on European and FR login nodes.
Run the transfer commands from your local machine (local_host). The following sections describe the different ways to copy data to or from the supercomputer login node (remote_host). The remote host depends on the type of project; it is the same host used for SSH connections (see Interactive access for more details). Hereafter, remote_dir represents a valid directory on the remote host.
Note
- If you search […], see the Data management section of the Data spaces chapter and the MpiFileUtils sections of the Parallel IO chapter.
scp
scp copies files between hosts on a network. It uses ssh for data transfer, with the same authentication and the same security as ssh.
To transfer data from local machine to remote machine:
$ scp [options] <local_files> <login>@<remote_host>:<remote_dir>
To transfer data from remote machine to local machine:
$ scp [options] <login>@<remote_host>:<remote_dir>/<files> <local_dir>
Basic options are -v for verbose mode and -r to copy directories. For more information, type man scp from the command line.
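As an illustration, the two scp forms above can be filled in as follows. The login jdoe, the host login.example.org, and all paths are hypothetical placeholders, not real center values. The commands are only printed, so the sketch runs without network access; run the printed lines yourself to perform the transfers.

```shell
# Hypothetical values: replace with your own login, host and directories.
login="jdoe"
remote_host="login.example.org"
remote_dir="/path/to/remote_dir"

# Upload: recursively copy a local directory to the remote host (-r).
upload_cmd="scp -r ./results ${login}@${remote_host}:${remote_dir}"

# Download: copy one remote file into the current directory, verbosely (-v).
download_cmd="scp -v ${login}@${remote_host}:${remote_dir}/output.dat ."

# Print the commands instead of executing them (dry run).
echo "$upload_cmd"
echo "$download_cmd"
```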
rsync
rsync synchronizes two sets of files across a network. It sends only the differences between the source files and the destination files.
To transfer data from local machine to remote machine:
$ rsync -e ssh -avz <local_files> <login>@<remote_host>:<remote_dir>
To transfer data from remote machine to local machine:
$ rsync -e ssh -avz <login>@<remote_host>:<remote_dir>/<files> <local_dir>
For more information, type man rsync from the command line.
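To see which files rsync would send before starting a real transfer, the -n (dry run) and -i (itemize changes) options can help. The sketch below uses two temporary local directories as stand-ins for the local and remote sides, so it runs without network access; for a real transfer, the destination would be of the form <login>@<remote_host>:<remote_dir>.

```shell
# Local stand-ins for the two sides of a transfer (temporary directories).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "first" > "$src/a.txt"
echo "second" > "$src/b.txt"

if command -v rsync >/dev/null 2>&1; then
    # First pass copies both files; the trailing slash copies the
    # contents of src rather than the directory itself.
    rsync -a "$src/" "$dst/"

    # Modify one file, then preview the next pass:
    # -n performs a dry run, -i itemizes what would be transferred.
    echo "changed" > "$src/b.txt"
    rsync -a -n -i "$src/" "$dst/"
fi
```

Because the second pass is a dry run, the destination is left unchanged; dropping -n would apply the update.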
Note
Transferring data from or to the supercomputer may be more efficient when using archives instead of many small files.
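A minimal sketch of this approach: pack a directory of many small files into a single compressed archive with tar, then transfer that one file. The directory and file names are hypothetical demo data created in a temporary location.

```shell
# Create a demo directory with many small files (stand-in for real data).
workdir=$(mktemp -d)
mkdir "$workdir/small_files"
i=1
while [ "$i" -le 100 ]; do
    echo "sample $i" > "$workdir/small_files/file_$i.txt"
    i=$((i + 1))
done

# Pack the directory into a single gzip-compressed archive.
tar -czf "$workdir/small_files.tar.gz" -C "$workdir" small_files

# Count the archive members: 100 files plus the directory entry itself.
member_count=$(tar -tzf "$workdir/small_files.tar.gz" | wc -l | tr -d ' ')
echo "archive members: $member_count"

# The single archive can then be sent in one transfer, for example:
# scp "$workdir/small_files.tar.gz" <login>@<remote_host>:<remote_dir>
```

On the remote side, tar -xzf small_files.tar.gz restores the original directory.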
sftp
sftp connects to a remote host and can then transfer files in both directions. It uses ssh for data transfer, with the same authentication and the same security as ssh.
To connect to a remote host:
$ sftp [options] <login>@<remote_host>
A common option is -r to copy directories. For more information, type man sftp from the command line.
Once logged in, to transfer data from the remote host to the local host:
sftp> get [options] <remote_path> <local_path>
And to transfer data from the local host to the remote host:
sftp> put [options] <local_path> <remote_path>
If you did not pass the -r option to the sftp command, you can pass it directly to get and put. For more information, type help at the sftp prompt, or man sftp from the command line.
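For repeated transfers, the get and put commands can also be placed in a batch file and passed to sftp with its -b option. The sketch below only writes the batch file and prints the sftp command, so it runs without network access. The login, host, and paths are hypothetical, and it assumes a host where SFTP is enabled and non-interactive (key-based) authentication is set up, which batch mode needs.

```shell
# Write the sftp commands to a batch file (all paths are hypothetical).
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
put -r ./input_data /path/to/remote_dir
get /path/to/remote_dir/output.dat ./output.dat
bye
EOF

# Print the command that would run the batch; remove 'echo' to execute it.
echo sftp -b "$batch_file" "jdoe@login.example.org"
```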
parallel sftp
To speed up file transfers, you can use psftp, a tool developed by CEA that performs transfers over several sftp connections in parallel. To install it locally, follow the instructions on the dedicated website: https://github.com/cea-hpc/openssh-portable
Note
- psftp is not tested on Windows and macOS.
- psftp is only useful when your transfer saturates a CPU on one of the hosts (local or remote computer). On a modern processor, this happens when the transfer bandwidth reaches about 1 Gbps (check first that the network link between the two hosts can carry more than 1 Gbps of bandwidth).
psftp is installed on the login nodes of CEA clusters. Its usage is identical to that of sftp, except for the -n option, which lets you choose the number of ssh connections used for the parallel transfer.
For example, to make a parallel transfer with 5 ssh connections:
$ psftp -n 5 <login>@<remote_host>
If a single sftp transfer is limited to 1 Gbps, this parallel transfer can use at most 5 Gbps.
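As a rough worked example of this scaling, the sketch below estimates an ideal transfer time from the data size, the number of connections, and the per-connection rate. The function name estimate_seconds and the 500 GB figure are illustrative assumptions. It also assumes perfect scaling across connections, which real transfers may not reach.

```shell
# Estimate an ideal transfer time in seconds.
# Arguments: size in gigabytes, number of connections, rate per connection in Gbps.
estimate_seconds() {
    size_gb=$1
    connections=$2
    rate_gbps=$3
    # 1 gigabyte = 8 gigabits; integer division rounds down.
    echo $(( size_gb * 8 / (connections * rate_gbps) ))
}

single=$(estimate_seconds 500 1 1)     # one sftp connection at 1 Gbps
parallel=$(estimate_seconds 500 5 1)   # psftp -n 5, each connection at 1 Gbps
echo "1 connection: ${single} s, 5 connections: ${parallel} s"
```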
Note
If you are looking for ways to handle files within the cluster, see the MpiFileUtils section of the Parallel IO chapter.
CCFR infrastructure
Warning
Please refer to internal technical documentation to get information about this subject.
PRACE infrastructure
Warning
Please refer to internal technical documentation to get information about this subject.