Not that I’m an expert in PowerShell by any means, but here’s my tip of the day … use the -ReadCount parameter with Get-Content!
From the Get-Content page on TechNet …
-ReadCount<Int64>
Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time.
This parameter does not change the content displayed, but it does affect the time it takes to display the content. As the value of ReadCount increases, the time it takes to return the first line increases, but the total time for the operation decreases. This can make a perceptible difference in very large items.
The line in bold above for me was pure gold! Right now I am working with parsing very large #FIM2010/#MIM2016 audit drop files to check change thresholds, and PowerShell is my tool of choice, not just because it integrates natively with MIM Event Broker (this idea will be the subject of another post a bit later).
Example for me just now with a 100Mb XML file:
- 5 mins 17 seconds to load WITHOUT specifying any -ReadCount, with a memory footprint growth to 3.5 Gb
- 39 seconds with -ReadCount 0, and a memory footprint growth of only 750 Mb.
Importantly, be aware that this setting is not appropriate when your script is still under construction, when the value should still be 1 (default). In my case I found that stepping through my script in debug with this set to 0 resulted in painful delays reloading variable XML variables. I am now thinking of using a $debug variable to toggle between 0 and 1 as appropriate – it should definitely be 0 once the script has been deployed.
So when loading large XML files in future I will certainly be using -ReadCount 0 unless I have a good reason not to – one that is worth taking 8 times longer and 5 times the resources :).