Sort --parallel isn't parallelizing
Full question
https://superuser.com/questions/938558/s...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#linux #cpu #sorting #parallelprocessing
ACCEPTED ANSWER
Score 35
sort doesn't create a thread unless it needs to, and for small files the overhead just isn't worth it. Unfortunately, sort treats a pipe like a small file. If you want to feed enough data to 24 threads, you'll need to tell sort to use a large internal buffer with -S (sort does that automatically when presented with large files). This is something we should improve upstream (at least in the documentation). So you'll want something like:
(export LC_ALL=C; grep -E <files> | sort -S1G --parallel=24 -u | wc -m)
Note that I've set LC_ALL=C for all processes, since they'll all benefit with this data.
BTW you can monitor the sort threads with something like:
watch -n.1 ps -C sort -L -o pcpu
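As a rough, self-contained sketch of the same idea (not taken from the answer), the snippet below pipes generated data into sort with an explicit buffer size. It assumes GNU coreutils sort (for -S and --parallel); the 64M buffer, the 4 threads, and the input size are arbitrary values chosen for illustration, and the variable names are hypothetical.

```shell
# Sketch only: buffer size, thread count and input size are arbitrary assumptions.
export LC_ALL=C    # byte-wise comparison, as recommended in the answer
n=200000           # number of distinct lines to generate

# Emit every line twice so that -u has duplicates to remove.
# The input arrives through a pipe, so the buffer is set explicitly with -S.
unique_count=$( { seq "$n"; seq "$n"; } | sort -S 64M --parallel=4 -u | wc -l )

echo "unique lines: $unique_count"
```

Whether the extra threads actually get used still depends on how much data reaches sort, so the watch command above remains the way to check.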
ANSWER 2
Score 5
With parsort you can sort big files faster on a multi-core machine.
On a 48 core machine you should see a speedup of 3x over sort.
parsort is part of GNU Parallel and should be a drop-in replacement for sort.
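A minimal sketch of what "drop-in replacement" could look like in a script, assuming parsort accepts the same options and file arguments as sort. The fallback to plain sort, the temporary files, and the variable names are illustrative assumptions, not part of the answer.

```shell
# Sketch only: use parsort when it is installed, otherwise fall back to sort.
export LC_ALL=C
if command -v parsort >/dev/null 2>&1; then
  sorter=parsort
else
  sorter=sort
fi

input=$(mktemp)     # hypothetical sample input
output=$(mktemp)    # sorted result
seq 100000 -1 1 > "$input"          # numbers in descending order

"$sorter" -n "$input" > "$output"   # same invocation for either tool
first_line=$(head -n 1 "$output")
echo "sorted with $sorter, first line: $first_line"

rm -f "$input" "$output"
```

On an input this small no speedup should be expected; the point is only that the call site stays the same.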