Processing Documents Using Orchestration Framework
ProjectWise Design Integration Server delivers the following document processors (sometimes called automated file processors), which you can install, configure, and run as needed:
- Full Text Indexing - used to extract text from documents in the datasource, which can then be used to search for documents in the Search dialogs and in the Quick Search bar.
- Thumbnail Image Extractor - used to extract thumbnail images from documents in the datasource, which are then displayed on the Document Properties tab of the Preview Pane in ProjectWise Explorer, and also in integrated applications' document selection dialogs.
- File Property Extractor - used to extract file properties from documents in the datasource, which are then displayed on the File Properties tab of the Document Properties dialog in ProjectWise Explorer.
For each document processor you can:
- enable or disable extractions
- schedule document processing to automatically start and run continuously at specified intervals of time
- map unrecognized file type extensions to extensions recognized by the extraction engine
- prevent files of specified file type extensions from being processed
- start an extraction manually, whether a schedule has been defined or not
- force the reprocessing of all documents in a particular folder
How Extractions Work
When an extraction starts, the extraction engine inspects the datasource for documents to process. The first time you run an extraction on a datasource, all documents in the datasource are viable candidates for processing. During document inspection, the extraction engine filters out the documents that will not be processed, based on any extension mapping settings you may have configured, and queues the rest of the documents for processing.
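The inspection pass described above can be pictured with a minimal Python sketch. This is an illustration only, not ProjectWise code: the function names, the dictionary of mappings, and the sets of excluded and recognized extensions are all hypothetical, and the order in which exclusions and mappings are applied is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Set


def resolve_extension(filename: str,
                      mappings: Dict[str, str],
                      excluded: Set[str],
                      recognized: Set[str]) -> Optional[str]:
    """Return the extension a file would be processed as, or None if filtered out.

    All parameters are hypothetical stand-ins for the extension mapping settings.
    """
    _, dot, ext = filename.rpartition(".")
    ext = ext.lower() if dot else ""      # files without a dot have no extension
    if ext in excluded:                   # assumption: exclusions are checked first
        return None
    ext = mappings.get(ext, ext)          # map an unrecognized extension to a known one
    return ext if ext in recognized else None


def build_queue(filenames: Iterable[str],
                mappings: Dict[str, str],
                excluded: Set[str],
                recognized: Set[str]) -> List[str]:
    """Keep only the files that survive filtering; these are queued for processing."""
    return [name for name in filenames
            if resolve_extension(name, mappings, excluded, recognized) is not None]
```

In this sketch, a file with an unrecognized extension is queued only if a mapping points it at a recognized extension, and a file with an excluded extension is never queued.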
From the queue, the extraction engine sends a set number of documents at a time to be processed (you configure this number). When that set has been processed, the extraction engine sends the next set, and so on until all queued documents have been processed.
When extractions run through a schedule, this routine continues until all documents queued for processing have been processed, or until the schedule runs out of time, whichever happens first. If not all documents are processed during a scheduled extraction, the remaining documents are processed during the next scheduled extraction. When you start an extraction manually, the extraction engine processes only up to the number of documents you have set to be processed at a time. The next time an extraction starts, whether by schedule or manually, the extraction engine processes any documents still marked for processing from a previous extraction, inspects the datasource for new or updated documents that require processing, and marks and processes them accordingly.
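The difference between scheduled and manual runs can be sketched as a simple batch loop. The following Python is a hypothetical model, not the actual extraction engine: `run_extraction`, `batch_size`, `time_budget_s`, and the injectable `clock` are names invented for illustration, and the time limit is assumed to be checked only between batches.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def run_extraction(queue: Deque,
                   batch_size: int,
                   process: Callable[[object], None],
                   scheduled: bool,
                   time_budget_s: Optional[float] = None,
                   clock: Callable[[], float] = time.monotonic) -> int:
    """Process queued documents in batches and return how many were processed.

    The queue is modified in place; anything left in it waits for the next run.
    """
    start = clock()
    processed = 0
    while queue:
        # Scheduled run: stop once the (hypothetical) time budget is used up.
        if scheduled and time_budget_s is not None and clock() - start >= time_budget_s:
            break
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for doc in batch:
            process(doc)
        processed += len(batch)
        if not scheduled:
            break  # manual run: only one batch is processed
    return processed
```

Under these assumptions, a manual run against ten queued documents with a batch size of four processes four and leaves six for the next extraction, while a scheduled run keeps taking batches until the queue is empty or the time budget is spent.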
If you configure or change any extension mappings between extractions, any documents still queued for processing are processed according to the new extension mapping rules. However, documents that were already processed during a previous extraction, and that have not otherwise been updated, are not identified for processing during the routine inspection pass, so the new extension mapping rules are not applied to them. To reprocess documents that have already been processed, you must mark the folders containing them for reprocessing. You can mark as many folders as necessary; once marked, the documents in those folders remain queued for processing until they have been successfully processed.
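As a rough illustration of why marking a folder is needed, the sketch below models the inspection pass in Python. It is a simplification under stated assumptions: the `(folder, name, version)` tuples, the `processed_versions` dictionary, and the `marked_folders` set are hypothetical, and a version number stands in for however the engine actually detects new or updated documents.

```python
from typing import Dict, Iterable, List, Set, Tuple

DocKey = Tuple[str, str]  # (folder, document name)


def select_candidates(documents: Iterable[Tuple[str, str, int]],
                      processed_versions: Dict[DocKey, int],
                      marked_folders: Set[str]) -> List[DocKey]:
    """Return the documents an inspection pass would queue for processing.

    A document is picked up if it is new or updated since it was last
    processed, or if its folder has been marked for reprocessing.
    """
    candidates = []
    for folder, name, version in documents:
        key = (folder, name)
        is_new_or_updated = processed_versions.get(key) != version
        if is_new_or_updated or folder in marked_folders:
            candidates.append(key)
    return candidates
```

In this model, an unchanged document that was already processed is skipped by the routine pass no matter how the extension mappings have changed; it becomes a candidate again only when its folder appears in `marked_folders`.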