FreeNAS: Move NFS home directories to new pool
Moving part of a dataset to a new pool and changing the exported NFS directories in bulk.
NFS home directories
In our group, every user has their home directory on a FreeNAS server as a separate ZFS dataset. Concurrent use of the home folders by some 30 users alongside molecular dynamics simulation data analysis leads to a problem: contention for disk IOPS. The solution is obvious: move the home directories to another pool. They make up only about 1 TB, while the simulation data has grown to 30 TB. Moving the simulation data to SSD would be cost-prohibitive, so we chose to move at least the home directories.
To do this we needed a new chassis, as the 16-bay system was already filled to capacity. We bought a used Supermicro rack server with 36 3.5” bays, which should leave plenty of room to expand in the future.
Used hardware
The server itself is relatively small:
- SuperMicro X10SDV-4C+-TP4F
- Intel® Xeon® processor D-1518
- 128GB RAM
- 64GB Intel Optane Memory M.2 2280 as SLOG device
- 1.2TB Intel SSD DC P3500 as L2ARC
- 16x 10TB HGST Ultrastar He10 (HUH721010ALE600)
The 16 disks are assembled into a single pool consisting of 8 two-way mirror vdevs.
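For illustration, here is a minimal sketch of how such a layout can be assembled; the pool name and the da0..da15 device names are placeholders, and FreeNAS normally builds the pool from its GUI using GPT labels rather than raw device names:

#!/usr/bin/env python
import subprocess

# Build the equivalent of:
#   zpool create <pool> mirror da0 da1 mirror da2 da3 ... mirror da14 da15
disks = ["da%d" % i for i in range(16)]      # placeholder device names
cmd = ["zpool", "create", "tank"]            # placeholder pool name
for a, b in zip(disks[0::2], disks[1::2]):   # pair the disks into 8 two-way mirrors
    cmd += ["mirror", a, b]
print(" ".join(cmd))
# subprocess.check_call(cmd)                 # uncomment to actually create the pool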
Procedure
In principle the procedure is simple:
1. transfer the disks and mainboard to the new case
2. create a new pool for the home directories
3. zfs send and receive the home directories to the new pool, including snapshots, with the NFS service stopped (see the sketch after this list)
4. rename the exported NFS paths
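For step 3, a minimal sketch of the replication, assuming the old dataset is tank/home (mounted at /mnt/tank/home), the new pool is called home (so the home directories end up under /mnt/home/<user>) and a placeholder snapshot name; the exact receive target depends on how the new pool is laid out:

#!/usr/bin/env python
import subprocess

# Take a recursive snapshot of the old home dataset; "migrate" is a placeholder name.
subprocess.check_call(["zfs", "snapshot", "-r", "tank/home@migrate"])

# Equivalent of: zfs send -R tank/home@migrate | zfs receive -F home
send = subprocess.Popen(["zfs", "send", "-R", "tank/home@migrate"],
                        stdout=subprocess.PIPE)
recv = subprocess.Popen(["zfs", "receive", "-F", "home"], stdin=send.stdout)
send.stdout.close()  # let zfs send notice if the receive side exits early
if recv.wait() != 0 or send.wait() != 0:
    raise RuntimeError("replication of tank/home failed")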
While steps 1 to 3 were easy enough, step 4 proved to be not that simple. How do you rename 120 exports? FreeNAS recreates the exports from its configuration database, so the usual way of editing the exports file with sed is futile. We could manipulate the configuration database directly, which is a somewhat scary thing to do. Another possibility is to use the well-documented FreeNAS API.
In our case we used a small Python script to get the share IDs of all home directories and then rewrite the nfs_paths attribute to point to the new location in the pool.
#!/usr/bin/env python
import json
import requests


class Startup(object):
    """Small wrapper around the FreeNAS v1.0 REST API."""

    def __init__(self, hostname, user, secret):
        self._hostname = hostname
        self._user = user
        self._secret = secret
        self._ep = 'http://%s/api/v1.0' % hostname

    def request(self, resource, method='GET', data=None, parameters="limit=1000"):
        if data is None:
            data = ""
        r = requests.request(
            method,
            '%s/%s/' % (self._ep, resource),
            data=json.dumps(data),
            headers={'Content-Type': "application/json"},
            auth=(self._user, self._secret),
            params=parameters
        )
        if r.ok:
            try:
                return r.json()
            except ValueError:
                # Some endpoints return an empty or non-JSON body
                print(r)
                return r.text
        raise ValueError(r)


freenas = Startup("nas1.cluster", "root", "secret")

# Fetch all NFS shares and rewrite the paths of the home directories
nfs_shares_list = freenas.request("sharing/nfs")
for share in nfs_shares_list:
    nfs_path = share["nfs_paths"][0]
    share_id = share["id"]
    print(share_id, nfs_path)
    if nfs_path.startswith("/mnt/tank/home/"):
        try:
            freenas.request(
                "sharing/nfs/%s/" % share_id,
                method='PUT',
                data={"nfs_paths": [nfs_path.replace("/mnt/tank/home", "/mnt/home")]}
            )
        except ValueError as e:
            print("ValueError", nfs_path, str(e))
After 200 days of usage
Here is some data after 200 days of use. The SLOG will need to be replaced soon: the Optane Memory module is not meant for enterprise use, so its write endurance is limited and it will stop working at around 100% of “Percentage Used”. On the other hand, it is much less expensive than any other solution and its capacity is certainly big enough.
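A small sketch for keeping an eye on that wear indicator, assuming smartmontools is available and /dev/nvme0 stands in for the actual SLOG device:

#!/usr/bin/env python
import re
import subprocess

# Read the NVMe "Percentage Used" endurance indicator from smartctl's output.
out = subprocess.check_output(["smartctl", "-a", "/dev/nvme0"], text=True)
match = re.search(r"Percentage Used:\s+(\d+)%", out)
if match:
    used = int(match.group(1))
    print("SLOG wear: %d%% used" % used)
    if used >= 90:
        print("WARNING: SLOG is close to its rated endurance, plan a replacement")
else:
    print("Could not find 'Percentage Used' in smartctl output")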
SMART data of SLOG:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 0%
Percentage Used: 45%
Data Units Read: 17 [8.70 MB]
Data Units Written: 327,474,485 [167 TB]
Host Read Commands: 682
Host Write Commands: 1,915,934,271
Controller Busy Time: 0
Power Cycles: 4
Power On Hours: 5,031
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
The ZIL wrote 167 TB to the SLOG in 200 days, roughly 0.8 TB per day on average.
SMART data of L2ARC:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 20 Celsius
Available Spare: 98%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 296,248,205 [151 TB]
Data Units Written: 250,591,045 [128 TB]
Host Read Commands: 2,666,780,540
Host Write Commands: 1,957,955,794
Controller Busy Time: 50
Power Cycles: 2
Power On Hours: 5,031
Unsafe Shutdowns: 4
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
Some utilization graphs of the ARC, L2ARC and SLOG: