Chemical Exposure Information Corpus

We are constantly exposed to a large number of chemicals present in food, water, air, dust, soil and consumer products. These chemicals enter our bodies via several routes: ingestion, inhalation and dermal absorption. Many of them are known or suspected to have toxic effects that can cause disorders and diseases. Chemical risk assessment is the process of evaluating such risks, and it includes exposure assessment. Exposure assessment methods include both indirect methods, such as exposure modelling and exposure calculations based on environmental measurements and questionnaire data, and direct measurements, such as human biomonitoring (HBM) and personal monitoring. HBM is the measurement of exposure biomarkers (chemicals or chemical metabolites) and effect biomarkers (indicators of effects caused by chemical exposure) in human body tissues or fluids, such as blood, hair and urine.

To support assessing the total exposure to a chemical and evaluating the importance of different exposure routes, we have annotated a corpus of 3686 scientific publication abstracts with a novel classification taxonomy specific to exposure assessment. The taxonomy is divided into two main branches: Biomonitoring and Exposure routes.

The Chemical Exposure Information (CEI) Corpus consists of 3661 PubMed publication abstracts manually annotated by experts according to a taxonomy of 32 hierarchically organized classes. Each sentence in the corpus is assigned zero or more class labels. The labels are found under the “labels” directory, and the tokenized text is found under the “text” directory. Each file is named after the corresponding PubMed ID (PMID).
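The directory layout described above suggests a simple way to load the corpus: match files in “text” and “labels” by PMID. The following is a minimal Python sketch, not an official loader. The function name `load_cei_corpus` and the `label_sep` parameter are hypothetical, and the sketch assumes details the description does not state: that each text file holds one tokenized sentence per line, that line i of the matching label file holds the labels for sentence i, and that multiple labels on a line are separated by commas. Check these assumptions against the downloaded files and adjust as needed.

```python
from pathlib import Path


def load_cei_corpus(root, label_sep=","):
    """Pair each abstract's text file with its label file by PMID.

    Returns a dict mapping PMID -> list of (sentence, [labels]) tuples.
    """
    root = Path(root)
    # Index both directories by filename stem, which is the PMID.
    text_files = {p.stem: p for p in (root / "text").iterdir() if p.is_file()}
    label_files = {p.stem: p for p in (root / "labels").iterdir() if p.is_file()}

    corpus = {}
    # Only keep abstracts that have both a text file and a label file.
    for pmid in sorted(text_files.keys() & label_files.keys()):
        sentences = text_files[pmid].read_text(encoding="utf-8").splitlines()
        label_lines = label_files[pmid].read_text(encoding="utf-8").splitlines()
        # ASSUMPTION: files are line-aligned, one sentence per line, and an
        # empty label line means the sentence carries no labels.
        corpus[pmid] = [
            (sentence, [l.strip() for l in line.split(label_sep) if l.strip()])
            for sentence, line in zip(sentences, label_lines)
        ]
    return corpus
```

Calling `load_cei_corpus("path/to/corpus")` on the unpacked download would then give, for each PMID, the sentences paired with their (possibly empty) label lists. If the real label files use a different layout, only the list comprehension inside the loop should need to change.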

Download the Chemical Exposure Information corpus here

Please cite the following paper:

Text mining for improved exposure assessment