screen scraping with Internet Explorer
2012-Nov-15, Thursday 12:07 pm
PowerShell can be used with Internet Explorer to "screen scrape" websites. Launching these scripts through the Task Scheduler can be complicated to set up, especially if the destination URL is an HTTPS site. I wanted to avoid making the batch logon account a local machine administrator, so I found the following security changes necessary to get the script to work properly.
1) Update the logon authority
1. The end user account should not be an administrator account, just a regular domain (or local) user.
2. Run the local security policy editor (c:\windows\system32\secpol.msc)
3. Navigate to Security Settings / Local Policies / User Rights Assignment
4. Double-click "Log on as a batch job", add the userid or security group
Preferably, use a domain security group designed specifically for this purpose.
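One way to confirm the assignment took effect is to export the user-rights section of the local policy and look at the SeBatchLogonRight entry, which is the internal name for "Log on as a batch job". A rough sketch, assuming an elevated prompt; the helper name Get-BatchLogonEntries and the temp file name are made up for illustration:

```powershell
# Hypothetical helper: extract the accounts/SIDs listed for SeBatchLogonRight
# ("Log on as a batch job") from the lines of a secedit export.
function Get-BatchLogonEntries {
    param([Parameter(Mandatory = $true)][string[]]$InfLines)

    foreach ($line in $InfLines) {
        if ($line -match '^\s*SeBatchLogonRight\s*=\s*(.*)$') {
            # Entries are comma-separated; SIDs carry a leading '*'.
            return @(($Matches[1] -split ',') |
                ForEach-Object { $_.Trim().TrimStart('*') } |
                Where-Object { $_ })
        }
    }
    return @()
}

# Usage (elevated prompt): export only the user-rights area, then inspect it.
# $cfg = Join-Path $env:TEMP 'user-rights.inf'
# secedit /export /cfg $cfg /areas USER_RIGHTS | Out-Null
# Get-BatchLogonEntries -InfLines (Get-Content $cfg)
```

The output lists SIDs rather than names, so the batch account (or its group) has to be matched by SID.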
2) Update the COM policy
1. Run the DCOM policy editor (c:\windows\system32\dcomcnfg.exe)
2. Navigate to Console Root / Component Services / Computers / My Computer
3. Right-click on My Computer, select Properties
4. Navigate to COM Security tab.
5. Click "Edit Default" for Launch and Activation Permissions
6. Add the userid or group and Allow both "Local Launch" and "Local Activation".
3) Disable user IE enhanced security
1. Run the server manager (Start / Computer / right-click, Manage)
2. Click to select the top entry "Server Manager (servername)"
3. Click link on right panel for "Configure IE ESC"
4. Toggle OFF for Users
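The ESC state for users can also be read back from the registry, which is handy when checking several servers. This is only a sketch: the key path and GUID below are the ones commonly used for the non-administrator ESC flag, but treat them as an assumption and verify them on your own build. The helper name Test-IEEscEnabled is hypothetical.

```powershell
# Assumed registry location of the IE ESC flag for non-administrator users.
$escUserKey = 'HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A8-37EF-4b3f-8CFC-4F3A74704073}'

# Hypothetical helper: returns $true when the key reports IE ESC as enabled
# (IsInstalled = 1), and $false when it is off or the value is missing.
function Test-IEEscEnabled {
    param([string]$KeyPath = $escUserKey)

    $item = Get-ItemProperty -Path $KeyPath -Name 'IsInstalled' -ErrorAction SilentlyContinue
    return ($null -ne $item -and $item.IsInstalled -eq 1)
}

# Usage:
# if (Test-IEEscEnabled) { Write-Warning 'IE ESC is still ON for Users' }
```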
4) Update the IE settings
1. Run the Internet Options editor (Control Panel / Network and Internet / Internet Options)
2. Navigate to Security tab
3. Make sure you are viewing the "Internet" zone
4. UNcheck (if using IE8 or prior) "Enable Protected Mode".
5. View the "Trusted Sites" zone
6. UNcheck "Enable Protected Mode"
7. Click "Sites" button
8. Add your destination domain to the list
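These zone settings are stored per user, so they have to be made for the batch account itself, not for whoever is logged on when configuring the server. A minimal sketch of the same changes through the registry, assuming the usual zone numbering (2 = Trusted Sites, 3 = Internet) and that value 2500 holds the Protected Mode flag (3 = off, 0 = on). The function names and the example domain are hypothetical, and the script must run as the batch user because it writes to HKCU.

```powershell
# Hypothetical helper: build the per-user ZoneMap registry path for a domain.
function Get-ZoneMapPath {
    param([Parameter(Mandatory = $true)][string]$Domain)
    return "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains\$Domain"
}

# Hypothetical helper: turn off Protected Mode for the given zones and add
# a domain to Trusted Sites for https. Run as the batch account (HKCU).
function Set-IEScrapeZone {
    param(
        [Parameter(Mandatory = $true)][string]$Domain,
        [int[]]$Zones = @(2)   # 2 = Trusted Sites; pass 2,3 to include Internet
    )

    $zoneRoot = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones'
    foreach ($zone in $Zones) {
        # 2500 = Protected Mode; 3 disables it, 0 enables it (assumed values).
        New-ItemProperty -Path (Join-Path $zoneRoot $zone) -Name '2500' `
            -Value 3 -PropertyType DWord -Force | Out-Null
    }

    # Map the domain to zone 2 (Trusted Sites) for the https protocol.
    $path = Get-ZoneMapPath -Domain $Domain
    New-Item -Path $path -Force | Out-Null
    New-ItemProperty -Path $path -Name 'https' -Value 2 -PropertyType DWord -Force | Out-Null
}

# Usage:
# Set-IEScrapeZone -Domain 'example.com' -Zones 2, 3
```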
If you skip these steps, the PowerShell variable used to interact with IE will "break". Either IE will fail to launch (possibly because of the default web page it loads on startup), or PowerShell will be unable to interact with the browser that launches: OS security features invalidate the handle to IE's Document Object Model, leaving the variable with empty values instead.
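A script can detect this failure instead of silently scraping nothing. The following is a sketch only, assuming the scrape drives IE through the InternetExplorer.Application COM object (the original script is not shown here). The function name Get-IEDocument, the timeout, and the example URL are hypothetical.

```powershell
# Hypothetical helper: open a URL in IE via COM and return the browser and
# its DOM document, throwing when the document handle comes back empty.
function Get-IEDocument {
    param(
        [Parameter(Mandatory = $true)][string]$Url,
        [int]$TimeoutSeconds = 60
    )

    $ie = New-Object -ComObject InternetExplorer.Application
    $ie.Visible = $false
    $ie.Navigate($Url)

    # Poll until the page finishes loading (ReadyState 4 = complete)
    # or the timeout expires.
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (($ie.Busy -or $ie.ReadyState -ne 4) -and (Get-Date) -lt $deadline) {
        Start-Sleep -Milliseconds 250
    }

    $doc = $ie.Document
    if ($null -eq $doc -or $null -eq $doc.body) {
        # The handle may already be dead, so ignore errors while closing.
        try { $ie.Quit() } catch { }
        throw "IE returned an empty document for $Url; re-check the four security steps."
    }

    # Return both so the caller can call Quit() on the browser when done.
    return [pscustomobject]@{ Browser = $ie; Document = $doc }
}

# Usage:
# $page = Get-IEDocument -Url 'https://example.com/'
# $page.Document.title
# $page.Browser.Quit()
```

Failing loudly matters under the Task Scheduler, where nobody is watching the console: a thrown error gives the task a non-zero result, while an empty document would otherwise look like a successful run.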