The Computer Oracle

How to extract a complete list of extension types within a directory?

Full question
https://superuser.com/questions/397943/h...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#windowsxp #script #shellscript #batchfile #fileextension




ACCEPTED ANSWER

Score 40


This batch script will do it.

@echo off

set target=%~1
if "%target%"=="" set target=%cd%

setlocal EnableDelayedExpansion

set LF=^


rem Previous two lines deliberately left blank for LF to work.

for /f "tokens=*" %%i in ('dir /b /s /a:-d "%target%"') do (
    set ext=%%~xi
    if "!ext!"=="" set ext=FileWithNoExtension
    echo !extlist! | find "!ext!:" > nul
    if not !ERRORLEVEL! == 0 set extlist=!extlist!!ext!:
)

echo %extlist::=!LF!%

endlocal

Save it as any .bat file, and run it with the command batchfile (substitute whatever you named it) to list the current directory, or specify a path with batchfile "path". It will search all subdirectories.

If you want to export to a file, use batchfile >filename.txt (or batchfile "path" >filename.txt).

Explanation

Everything before the for /f... line just sets things up: it gets the target directory to search, enables delayed expansion, which lets me update variables inside the loop, and defines a newline (LF) that I can use for neater output. Oh, and the %~1 means "get the first argument, removing quotes", which prevents doubled-up quotes - see call /? or for /?.

The loop uses that dir /b /s /a:-d "%target%" command, grabbing a list of all files in all subdirectories under the target.

%%~xi extracts the extension out of the full paths the dir command returns.

An empty extension is replaced with "FileWithNoExtension", so you know there is such a file - if I added an empty line instead, it wouldn't be quite as obvious.

The whole current list is sent through a find command to ensure uniqueness. The text output of the find command is sent to nul, essentially a black hole - we don't want it. Since we always append a : at the end of the list, the search query should also end with a : so that it doesn't match partial results - see comments.
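Why the trailing : matters is easy to see in a small sketch. The Python snippet below is only an illustration, not part of the batch script; seen_naive and seen_delimited are hypothetical names that mimic the substring search find performs on the colon-terminated list.

```python
def seen_naive(extlist: str, ext: str) -> bool:
    # Plain substring search, like find "!ext!" without the trailing colon.
    return ext in extlist


def seen_delimited(extlist: str, ext: str) -> bool:
    # Search for the extension plus its terminating colon, like find "!ext!:".
    return ext + ":" in extlist


extlist = ".cpp:.html:"  # extensions collected so far

print(seen_naive(extlist, ".c"))        # True: ".c" is wrongly found inside ".cpp"
print(seen_delimited(extlist, ".c"))    # False: ".c:" does not occur, so ".c" gets added
print(seen_delimited(extlist, ".cpp"))  # True: ".cpp:" is already on the list
```

Without the delimiter, a file ending in .c would be skipped as soon as .cpp had been seen.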

%ERRORLEVEL% is set by the find command; a value of 0 indicates there was a match. So if it's not 0, the current extension is not on the list so far and should be added.

The echo line outputs the list, replacing my placeholders (:) with newlines to make it look nice.
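For anyone who wants the same logic outside of cmd, here is a rough Python sketch of the steps described above: walk every subdirectory, take each file's extension, substitute a placeholder when there is none, and keep only the first occurrence. The function name list_extensions is made up for this example. One difference to be aware of: os.path.splitext treats a name like .gitignore as having no extension, which may not match what %%~xi reports.

```python
import os


def list_extensions(target: str = ".") -> list[str]:
    """Return each distinct file extension under target, in first-seen order."""
    seen = {}  # a dict keeps insertion order and gives fast membership checks
    for _dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            # Same placeholder as the batch script for files without an extension.
            seen.setdefault(ext or "FileWithNoExtension", None)
    return list(seen)
```

Printing "\n".join(list_extensions(path)) gives one extension per line, like the echo line in the batch file. As in the batch version, the comparison is case-sensitive, so .PDF and .pdf are listed separately.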




ANSWER 2

Score 38


Although not strictly meeting the requirement for a batch script, I have used a single-line piped PowerShell script:

Get-Childitem C:\MyDirectory -Recurse -File | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt

Where:

  1. Get-ChildItem C:\MyDirectory -Recurse -File retrieves all files in the directory and its subdirectories.
  2. Group Extension -NoElement groups the results by file extension.
  3. Sort Count -Desc sorts the results by the number of files in each group (from most to least).
  4. > FileExtensions.txt redirects the results to the specified file.

You could potentially run it from the command line/batch file:

Powershell -Command "& Get-Childitem C:\MyDirectory -Recurse -File | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt"

If you remove C:\MyDirectory it will execute in the current directory.

Edit 2021-04-20: As per the comment from @ManSamVampire, if you want to find hidden files as well, you should add -Force before -Recurse in the above command.

At the end it will produce a FileExtensions.txt containing something like the following:

+-------+------+
| Count | Name |
+-------+------+
| 8216  | .xml |
| 4854  | .png |
| 4378  | .dll |
| 3565  | .htm |
| ...   | ...  |
+-------+------+
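A count table like the one above can also be produced without PowerShell. The following is a minimal Python sketch of the same group-and-sort idea, not a translation of the cmdlets; count_extensions and format_table are hypothetical helpers, and files without an extension are counted under an empty name.

```python
import os
from collections import Counter


def count_extensions(target: str = ".") -> list[tuple[str, int]]:
    """Return (extension, file count) pairs, most frequent first."""
    counts = Counter(
        os.path.splitext(name)[1]
        for _dirpath, _dirnames, filenames in os.walk(target)
        for name in filenames
    )
    return counts.most_common()


def format_table(rows) -> str:
    """Render the pairs as a two-column text table with a header."""
    lines = ["Count Name", "----- ----"]
    lines += [f"{count:>5} {ext}" for ext, count in rows]
    return "\n".join(lines)
```

Writing format_table(count_extensions(path)) to a file gives roughly the same report as the redirect to FileExtensions.txt.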

Depending on your folder structure, you may occasionally get errors notifying you that a path is too long:

Get-ChildItem : The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.

Any subdirectories beneath such a path will not be parsed either, but the results for everything else will still show.
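If you would rather know exactly which directories were skipped, the same "carry on past errors" behaviour can be sketched in Python. This is an illustration under my own assumptions, not what Get-ChildItem does internally; walk_with_skips is a hypothetical helper. os.walk skips directories it cannot list, and an onerror callback receives the OSError so the path can be recorded.

```python
import os


def walk_with_skips(target: str):
    """Collect file names under target plus the directories that could not be read."""
    files = []
    skipped = []  # paths os.walk failed to list (too long, access denied, ...)

    def remember(error: OSError) -> None:
        # os.walk passes the OSError here instead of raising it.
        skipped.append(error.filename)

    for _dirpath, _dirnames, filenames in os.walk(target, onerror=remember):
        files.extend(filenames)
    return files, skipped
```

The first return value still contains everything that could be read, so the extension list is merely incomplete rather than lost.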

Notes

You will of course need PowerShell installed. It can also run on multiple operating systems.




ANSWER 3

Score 4


Here's a detailed answer using PowerShell (with Windows XP you'll have to install PowerShell):

Hey, Scripting Guy! How Can I Use Windows PowerShell to Pick Out the Unique File Extensions Used in a Collection of Files?




ANSWER 4

Score 4


To list all unique extensions from cmd under the path you're in, use:

Powershell -Command "Get-ChildItem . -Include *.* -Recurse | Select-Object Extension | Sort-Object -Property Extension -Unique"
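For comparison, "unique and sorted" can be sketched in a few lines of Python. The name unique_extensions_sorted is hypothetical, and folding case so that .PDF and .pdf count once is a choice made for this sketch, not something the command above is claimed to do.

```python
import os


def unique_extensions_sorted(target: str = ".") -> list[str]:
    """Return distinct extensions under target, sorted, ignoring letter case."""
    found = {
        os.path.splitext(name)[1].lower()
        for _dirpath, _dirnames, filenames in os.walk(target)
        for name in filenames
    }
    return sorted(found)
```

Files without an extension show up as an empty string at the start of the result.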



ANSWER 5

Score 0


I found it useful to change

if "!ext!"=="" set ext=FileWithNoExtension

to

if "!ext!"=="" set ext=.FileWithNoExtension

and to change

echo %extlist::=!LF!%

to

echo %extlist::=!LF!% > ext-list.txt

The generated file contained (no linefeeds, but no matter):

.bat.pdf.skp.ai.png.jpg.tif.pcp.txt.lst.ttf.dfont.psd.indd.docx.PDF.JPG.gif.jpeg.dwg.exr.FileWithNoExtension.vrlmap.sat.bak.ctb

which I was then able to use for my project.
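If the run-together output does need splitting later, the leading dot on every entry (including .FileWithNoExtension after the change above) makes that straightforward. A small Python sketch, with split_extensions as a made-up helper name:

```python
def split_extensions(joined: str) -> list[str]:
    """Split a run like '.bat.pdf.skp' back into ['.bat', '.pdf', '.skp']."""
    # Every entry starts with a dot, so split on dots and put the dot back.
    return ["." + part for part in joined.split(".") if part]


sample = ".bat.pdf.skp.ai.png"
print("\n".join(split_extensions(sample)))
```

This relies on each extension containing exactly one dot, which holds for what %%~xi returns.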