PubChem Assay Downloads
preprocessed for KNIME and CACTVS
with complete data and structures

What is this about? The page in the frame below links to PubChem assay tables which have been preprocessed for use with two popular data analysis systems. The files contain both the complete assay results and the associated structures.
How are these files generated? This is a pretty simple sample application of the CACTVS Toolkit. If you have the toolkit installed, you do not really need this page. You can directly script a single-line command like
table write [table create $aid] aid$aid.table knime {structures 1}
to fetch any PubChem assay, automatically augment it with structures, and write it out as a native KNIME table. The CACTVS Toolkit has extensive PubChem and Entrez interface capabilities via the NCBI PUG and Eutils gateways. Assay retrieval is just a small example of the things you can do. You can even open the full PubChem compound database as a virtual SD-style multi-record file and perform optimized structure queries and downloads on it with simple script commands.
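If you need more than one assay, the single-line command above can be wrapped in a loop. The following is only a minimal sketch: it assumes the script runs inside the CACTVS Toolkit interpreter (which supplies the table command), the helper name export_assays is made up for this illustration, and the AIDs in the example call are arbitrary.

```tcl
# Sketch: export several PubChem assays as KNIME tables in one run.
# Assumption: this runs in the CACTVS Toolkit interpreter, which provides
# the "table" command. export_assays is a hypothetical helper name.
proc export_assays {aids} {
    set failed {}
    foreach aid $aids {
        # The command from the text above, unchanged. catch keeps one
        # failing assay from aborting the rest of the batch.
        if {[catch {
            table write [table create $aid] aid$aid.table knime {structures 1}
        } msg]} {
            puts stderr "AID $aid failed: $msg"
            lappend failed $aid
        } else {
            puts "AID $aid written to aid$aid.table"
        }
    }
    # Return the AIDs that could not be exported, so the caller can retry.
    return $failed
}

# Example call with arbitrarily chosen AIDs.
set failed [export_assays {1 3 5}]
```

The procedure returns the list of AIDs that failed, so a later run can retry only those.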
Why should I use these table files, and not directly download the CSV data from PubChem? Two answers: You get the structures bundled, and the columns and their proper data types all nicely encapsulated in a native binary file. For example, in KNIME you can now use its native table reader node to import the file without worrying about complex I/O set-up.
The CACTVS files are notably larger. Why? They contain significant additional information which cannot be represented in KNIME tables. The two most important extra components are the complete assay description (as multi-field compound table property T_NCBI_ASSAY_DESCRIPTION), and the normalized PubChem compounds in addition to the deposited substances (as structure object property E_PUBCHEM_COMPOUND of the structures associated with the table).
How do you write binary KNIME tables? The CACTVS toolkit supports I/O of many native table file formats, including those of many well-known statistical packages, both for input and output. You can read native KNIME output tables for processing in CACTVS, too. Some of these formats are reasonably well documented, others (like KNIME) were more or less reverse engineered. The CACTVS KNIME table I/O module does not use any original KNIME source code.
I need an assay not listed below! Enter the desired assay ID below, and we queue it. If you give us an email address, the software notifies you when it's done. Currently we do not guarantee response times - if it is really urgent, download the toolkit and do it yourself. A supplied notification email address is not permanently stored or used for any other purpose. Come back later, reload the frame listing below, and start your download.
  AID: Your email (optional):
Click to display 20K AID sets starting with AID: 1 10001 20001 30001 40001 50001 60001 70001 80001 90001 100001 110001 120001 130001 140001 150001 160001 170001 180001 190001 200001 210001 220001 230001 240001 250001 260001 270001 280001 290001 300001 310001 320001 330001 340001 350001 360001 370001 380001 390001 400001 410001 420001 430001 440001 450001 460001 470001 480001 490001 500001 510001