The Computer Oracle

Sort --parallel isn't parallelizing

Chapters
00:00 Sort --parallel isn't parallelizing
01:38 Accepted Answer Score 35
02:19 Answer 2 Score 5
02:36 Thank you

--

Full question
https://superuser.com/questions/938558/s...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#linux #cpu #sorting #parallelprocessing




ACCEPTED ANSWER

Score 35


sort doesn't create a thread unless it needs to; for small inputs the overhead just isn't worth it. Unfortunately, sort treats a pipe like a small file. To feed enough data to 24 threads, you need to tell sort to use a large internal buffer with -S (sort does that automatically when it is given large files directly). This is something we should improve upstream, at least in the documentation. So you'll want something like:

(export LC_ALL=C; grep -E  <files> | sort -S1G --parallel=24 -u | wc -m)

(Note that I've set LC_ALL=C for all processes, since they will all benefit from it with this data.)
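
To check that an explicit buffer size changes only the speed and not the result, a rough comparison along the following lines may help. This is a sketch, assuming GNU coreutils (sort, seq, shuf); the input size, the 4 threads and the 64M buffer are arbitrary illustrative values, not taken from the answer.

```shell
#!/bin/sh
# Sketch: piped sort with the default buffer vs. an explicit -S buffer.
# Assumes GNU coreutils; sizes and thread counts are illustrative only.
export LC_ALL=C

tmp=$(mktemp -d)

# Generate 200,000 distinct integers in random order as test input.
seq 1 200000 | shuf > "$tmp/input.txt"

# cat is used on purpose so that sort reads from a pipe, not a file.
# Run 1: buffer size left to sort's default for piped input.
cat "$tmp/input.txt" | sort --parallel=4 -u > "$tmp/default.txt"

# Run 2: explicit larger buffer, in the spirit of -S1G above.
cat "$tmp/input.txt" | sort -S64M --parallel=4 -u > "$tmp/bigbuf.txt"

# Both runs must produce identical output; only the timing may differ.
if cmp -s "$tmp/default.txt" "$tmp/bigbuf.txt"; then
    same_output=yes
else
    same_output=no
fi
lines=$(wc -l < "$tmp/bigbuf.txt")
echo "identical output: $same_output, unique lines: $lines"

rm -rf "$tmp"
```

Prefixing each pipeline with time shows whether the larger buffer helps on a given machine; with input this small the difference may be negligible.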

BTW you can monitor the sort threads with something like:

watch -n.1 ps -C sort -L -o pcpu



ANSWER 2

Score 5


With parsort you can sort big files faster on a multi-core machine.

On a 48-core machine you should see a speedup of about 3x over sort.

parsort is part of GNU Parallel and should be a drop-in replacement for sort.
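
A minimal sketch of trying parsort while keeping plain sort as a fallback could look like this. It assumes parsort accepts a file argument the same way sort does; sort_big is a hypothetical helper name, not part of GNU Parallel, and the tiny input only demonstrates the call, not the speedup.

```shell
#!/bin/sh
# Sketch: use parsort when it is installed, otherwise fall back to sort.
# sort_big is a hypothetical helper name used only for illustration.
export LC_ALL=C

sort_big() {
    if command -v parsort >/dev/null 2>&1; then
        parsort "$@"
    else
        sort "$@"
    fi
}

tmp=$(mktemp -d)

# Tiny sample input; real gains need files with millions of lines.
printf '%s\n' pear apple fig banana > "$tmp/fruit.txt"

sorted=$(sort_big "$tmp/fruit.txt")
echo "$sorted"

rm -rf "$tmp"
```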