@@ -199,6 +199,7 @@ dev.flashcache.ram3+ram4.pid_expiry_secs = 60
 dev.flashcache.ram3+ram4.max_pids = 100
 dev.flashcache.ram3+ram4.do_pid_expiry = 0
 dev.flashcache.ram3+ram4.io_latency_hist = 0
+dev.flashcache.ram3+ram4.skip_seq_thresh = 0
 
 Sysctls for a writeback mode cache :
 cache device /dev/sdb, disk device /dev/cciss/c0d2
@@ -218,6 +219,7 @@ dev.flashcache.sdb+c0d2.dirty_thresh_pct = 20
 dev.flashcache.sdb+c0d2.stop_sync = 0
 dev.flashcache.sdb+c0d2.do_sync = 0
 dev.flashcache.sdb+c0d2.io_latency_hist = 0
+dev.flashcache.sdb+c0d2.skip_seq_thresh = 0
 
 Sysctls common to all cache modes :
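The per-cache tunables listed in the two hunks above all live under dev.flashcache.<cachedev>. A quick way to inspect the current values is a sketch like the following; the cache name sdb+c0d2 is just the example device pair from this guide, not a fixed name:

```shell
# List every flashcache tunable for all configured cache devices
sysctl -a 2>/dev/null | grep '^dev\.flashcache\.'

# Read a single tunable for one cache device (the "sdb+c0d2" name
# is an assumption taken from the writeback example above)
sysctl dev.flashcache.sdb+c0d2.dirty_thresh_pct

# The same values are visible through procfs
cat /proc/sys/dev/flashcache/sdb+c0d2/dirty_thresh_pct
```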
@@ -243,13 +245,19 @@ dev.flashcache.<cachedev>.do_pid_expiry:
 Enable expiry on the list of pids in the white/black lists.
 dev.flashcache.<cachedev>.pid_expiry_secs:
 Set the expiry on the pid white/black lists.
+dev.flashcache.<cachedev>.skip_seq_thresh:
+Skip (don't cache) sequential IO larger than this number (in kb).
+0 (default) means cache all IO, both sequential and random.
+Sequential IO can only be determined 'after the fact', so
+this much of each sequential IO will be cached before we skip
+the rest. Does not affect lookups of IO already in the cache.
 
 Sysctls for writeback mode only :
 
 dev.flashcache.<cachedev>.fallow_delay = 900
 In seconds. Clean dirty blocks that have been "idle" (not
-read or written) for fallow_delay seconds. Default is 60
-seconds .
+read or written) for fallow_delay seconds. Default is 15
+minutes.
 
 Setting this to 0 disables idle cleaning completely.
 dev.flashcache.<cachedev>.fallow_clean_speed = 2
 The maximum number of "fallow clean" disk writes per set
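The skip_seq_thresh tunable documented above can be changed at runtime like any other flashcache sysctl. A minimal sketch, assuming the ram3+ram4 cache from the earlier example (the 1024 KB value is only an illustrative threshold):

```shell
# Skip sequential IO larger than 1024 KB; smaller and random IO
# is still cached as usual
sysctl -w dev.flashcache.ram3+ram4.skip_seq_thresh=1024

# Revert to the default: cache all IO, sequential and random
sysctl -w dev.flashcache.ram3+ram4.skip_seq_thresh=0
```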
@@ -350,13 +358,17 @@ not cache the IO. ELSE,
 2) If the tgid is in the blacklist, don't cache this IO. UNLESS
 3) The particular pid is marked as an exception (and entered in the
 whitelist, which makes the IO cacheable).
+4) Finally, even if the IO is cacheable up to this point, sequential
+IO is skipped if so configured via the skip_seq_thresh sysctl.
 
 Conversely, in "cache nothing" mode,
 1) If the pid of the process issuing the IO is in the whitelist,
 cache the IO. ELSE,
 2) If the tgid is in the whitelist, cache this IO. UNLESS
 3) The particular pid is marked as an exception (and entered in the
 blacklist, which makes the IO non-cacheable).
+4) Anything whitelisted is cached, regardless of whether the IO is
+sequential or random.
 
 Examples :
 --------
@@ -480,6 +492,34 @@ agsize * agcount ~= V
 
 Works just as well as the formula above.
 
+Tuning Sequential IO Skipping for better flashcache performance
+===============================================================
+Skipping sequential IO makes sense in two cases:
+1) The sequential write speed of your SSD is slower than
+the sequential write or read speed of your disk. In
+particular, for setups with RAID disks (especially
+modes 0, 10 or 5), sequential reads may be very fast. If
+'cache_all' mode is used, every disk read miss must also be
+written to the SSD. If you notice slower sequential reads
+and writes after enabling flashcache, this is likely your problem.
+2) Your 'resident set' of disk blocks that you want cached, i.e.
+those that you would hope to keep in cache, is smaller
+than the size of your SSD. You can check this by monitoring
+how quickly your cache fills up ('dmsetup table'). If this
+is the case, it makes sense to prioritize caching of random IO,
+since SSD performance vastly exceeds disk performance for
+random IO, but is typically not much better for sequential IO.
+
+In the above cases, start with a high value (say 1024k) for
+the sysctl dev.flashcache.<cachedev>.skip_seq_thresh, so that only
+the largest sequential IOs are skipped, and gradually reduce it
+if benchmarks show that it is helping. If it isn't helping, don't
+leave it set to a very high value; return it to 0 (the default),
+since there is some overhead in categorizing IO as random or sequential.
+
+If neither of the above cases holds, continue to cache all IO
+(the default); you will likely benefit from it.
+
 
 
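The tuning loop described above might look like this in practice. This is a sketch only: it assumes the writeback cache from the earlier examples, named sdb+c0d2 both for its sysctls and as the dm device name, and the specific threshold values are arbitrary starting points:

```shell
# Start high so only the largest sequential IOs are skipped
sysctl -w dev.flashcache.sdb+c0d2.skip_seq_thresh=1024

# Run your benchmark or production workload, then inspect the
# cache fill and hit statistics
dmsetup table sdb+c0d2
dmsetup status sdb+c0d2

# If benchmarks improve, try halving the threshold and re-measure
sysctl -w dev.flashcache.sdb+c0d2.skip_seq_thresh=512

# If there is no improvement, return to the default of 0 (cache all IO)
sysctl -w dev.flashcache.sdb+c0d2.skip_seq_thresh=0
```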
 Further Information
 ===================