Scrape Data From Local Web Files
WebSundew can extract data from local HTML files. You can create either an Agent or an Extractor; this walkthrough uses an Agent. If you are not familiar with Agent, Extractor and other concepts, you can read about them here.
If you have not downloaded and installed WebSundew yet, you can find information and instructions here. Once WebSundew is installed, start it by clicking the desktop icon or from the OS menu.
Step 1 - Create New Project
Click New Project in the application toolbar.
Step 2 - Create New Agent
Click New Agent in the application toolbar.
The New Agent dialog appears.
Select Local Files. The agent's start-up mode changes accordingly. Select the folder that contains the target HTML files. You can add several folders to process by clicking Add Folder. You can also configure the Allowed Files filter so that only files matching a pattern are included; in this case, files with the html or htm extension.
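Conceptually, the Allowed Files filter keeps only the files whose names match a pattern. The Python sketch below illustrates that idea outside WebSundew; it is not WebSundew's implementation. The function name `collect_files`, the case-insensitive matching, and the default recursive search into subfolders are assumptions made for illustration.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical patterns that mirror an "Allowed Files" filter for html/htm files.
ALLOWED_PATTERNS = ["*.html", "*.htm"]


def collect_files(folders, patterns=ALLOWED_PATTERNS, recursive=True):
    """Return the sorted paths of files in `folders` whose names match any pattern.

    Assumption: matching ignores case, and subfolders are searched when
    `recursive` is True. WebSundew's actual behavior may differ.
    """
    matches = []
    for folder in folders:
        root = Path(folder)
        # rglob("*") walks subfolders; iterdir() lists only the folder itself.
        candidates = root.rglob("*") if recursive else root.iterdir()
        for path in candidates:
            name = path.name.lower()
            if path.is_file() and any(fnmatch(name, p) for p in patterns):
                matches.append(path)
    return sorted(matches)
```

For example, `collect_files(["C:/pages", "C:/archive"])` would list every .html and .htm file found in both folders, similar in spirit to what the Preview button shows.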
You can preview the collected files by clicking Preview. Click Finish to create the agent. The agent editor opens, and the content of the first HTML file is shown in the browser pane of the editor.
Other Steps
You have configured an agent that searches for HTML files in the folders you provided. Now you need to capture the data and export it. These steps depend on the structure of the HTML files and on the required export format. You can read more about capture and about export, and you can also read our tutorials.
Edit Agent
You can modify the folder properties after you have created the agent:
- Open the agent for editing.
- Select Loop in the agent's graph.
- Open the Properties view and modify the folders.