Thursday, March 29, 2012

Chromatography

Chromatography lecture and my learning diary

Chromatography is a collective term for techniques that separate mixtures. The sample (the mixture) is dissolved in a mobile phase, which carries it through a stationary phase. Separation is achieved because the components of the mixture travel through the stationary phase at different speeds and therefore leave it at different times. Several chromatography methods have been developed over the years, including paper chromatography, gas chromatography and high performance liquid chromatography (HPLC). In this lecture, the teacher mainly introduced the principle of HPLC.
HPLC is a technique used in analytical chemistry and biochemistry to separate a mixture of compounds in order to identify, quantify and purify the individual components. The instrument consists of a pump (which moves the mobile phase and sample through the column), an injector (which adds the sample), a column and a detector.
Different types of liquid chromatography are used according to the properties of the protein or peptide: gel filtration separates by size, ion exchange by net surface charge, affinity chromatography by specific binding to an immobilized ligand, and reversed phase chromatography by hydrophobicity. Different methods can be combined depending on the purpose, but the most important rule is to keep the scheme simple, because a considerable amount of sample is lost at each step.
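To make "different travel speeds" a bit more concrete, here is a small Python sketch of two standard quantities used to describe an HPLC separation: the retention factor k = (tR - t0)/t0, where t0 is the dead time of an unretained compound, and the resolution Rs = 2(tR2 - tR1)/(w1 + w2), where w1 and w2 are baseline peak widths. The retention times and widths are made-up numbers for illustration, not data from the lecture.

```python
def retention_factor(t_r: float, t_0: float) -> float:
    """Retention factor k = (t_R - t_0) / t_0.

    t_r: retention time of the analyte; t_0: dead time of an unretained compound.
    """
    if t_0 <= 0:
        raise ValueError("dead time t_0 must be positive")
    return (t_r - t_0) / t_0


def resolution(t_r1: float, t_r2: float, w1: float, w2: float) -> float:
    """Resolution Rs = 2 * (t_R2 - t_R1) / (w1 + w2), using baseline peak widths."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)


if __name__ == "__main__":
    t0 = 1.0  # minutes, hypothetical dead time
    # Two hypothetical analytes that move through the column at different speeds
    k1 = retention_factor(4.0, t0)        # 3.0
    k2 = retention_factor(5.0, t0)        # 4.0
    rs = resolution(4.0, 5.0, 0.4, 0.4)   # 2.5
    print(f"k1={k1:.2f}, k2={k2:.2f}, selectivity={k2 / k1:.2f}, Rs={rs:.2f}")
```

A resolution of about 1.5 or more is usually taken to mean baseline separation of the two peaks.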
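The "keep it simple" advice follows from simple arithmetic: the recoveries of sequential steps multiply. The sketch below assumes hypothetical per-step recoveries (80% is only an example value) and also records the property-to-method pairing from the notes above.

```python
from math import prod

# Separation mode suited to each protein/peptide property (as summarized in the notes)
METHOD_FOR_PROPERTY = {
    "size": "gel filtration",
    "net surface charge": "ion exchange",
    "specific binding to a ligand": "affinity",
    "hydrophobicity": "reversed phase",
}


def overall_recovery(step_recoveries):
    """Fraction of the starting sample left after running the steps in sequence.

    Each entry is the (hypothetical) fraction recovered from one chromatography step.
    """
    for r in step_recoveries:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each recovery must be a fraction between 0 and 1")
    return prod(step_recoveries)


if __name__ == "__main__":
    scheme = ["ion exchange", "gel filtration", "reversed phase"]  # example three-step scheme
    recovery = overall_recovery([0.8] * len(scheme))
    print(f"{len(scheme)} steps at 80% each leave {recovery:.0%} of the sample")
```

With three steps at 80% each, overall_recovery([0.8, 0.8, 0.8]) is about 0.51, so roughly half of the sample is already gone.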

Introduction to MS

Mass spectrometry (MS) identifies the chemical composition of a sample based on the mass-to-charge ratio (m/z) of charged particles. The instrument has three main parts: an ion source, a mass analyzer and a detector. The sample is ionized in the ion source, for example by chemical ionization or electron ionization. The ions from the ion source are then separated according to their m/z ratios in the analyzer.
Two biological mass spectrometry techniques were introduced during the lecture: matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI). Both are soft ionization methods. In MALDI, the sample is embedded in a crystallized matrix, which is irradiated. In contrast, ionization in ESI is achieved by spraying a solution into an electric field. MS has been applied in many areas of protein research, including protein identification, molecular weight determination, characterization of post-translational modifications, relative quantification and the study of protein complexes.
It is important to note that the protein needs to be in a solution free of salts and detergents and should be purified before its molecular weight (MW) is measured. For protein identification, the protein must first be digested into peptides before MS analysis. Identification can be based on a peptide mass fingerprint (PMF) or on MS/MS data from one or more peptides.
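As a small worked example of "separated according to m/z": for a protonated ion [M + zH]z+, m/z = (M + z × 1.007276)/z, where 1.007276 Da is the proton mass. The same molecule therefore appears at very different m/z values depending on its charge. The 10 000 Da mass used below is a hypothetical value.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_protonated(neutral_mass: float, z: int) -> float:
    """m/z of an [M + zH]^z+ ion: (M + z * m_proton) / z."""
    if z < 1:
        raise ValueError("charge state z must be a positive integer")
    return (neutral_mass + z * PROTON_MASS) / z


if __name__ == "__main__":
    for z in (1, 2, 10):
        print(z, round(mz_protonated(10000.0, z), 4))
```

mz_protonated(10000.0, 1) is about 10001.007, while mz_protonated(10000.0, 10) is about 1001.007, which is why a highly charged protein can be measured in a modest m/z range.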
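ESI typically produces a series of multiply charged ions from one protein, whereas MALDI mostly produces singly charged ions. A minimal sketch, assuming two neighboring peaks of the same protein that differ by one proton: if the peak at lower m/z carries charge z + 1 and the one at higher m/z carries charge z, then z = (m_lower - 1.007276)/(m_higher - m_lower), and the neutral mass is M = z(m_higher - 1.007276). The function names are illustrative only.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_lower: float, mz_higher: float) -> int:
    """Charge z of the higher-m/z peak, given two neighboring peaks of one protein.

    Assumes mz_lower carries charge z + 1 and mz_higher carries charge z, i.e.
    mz_lower = (M + (z + 1) * H) / (z + 1) and mz_higher = (M + z * H) / z.
    """
    if mz_higher <= mz_lower:
        raise ValueError("mz_higher must be larger than mz_lower")
    z = (mz_lower - PROTON_MASS) / (mz_higher - mz_lower)
    return round(z)  # real spectra are noisy, so round to the nearest integer


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M recovered from an [M + zH]^z+ peak."""
    return z * (mz - PROTON_MASS)


if __name__ == "__main__":
    # Hypothetical peaks of a 10 000 Da protein at charges 11 and 10
    peaks = [(10000 + 11 * PROTON_MASS) / 11, (10000 + 10 * PROTON_MASS) / 10]
    z = charge_from_adjacent_peaks(*peaks)
    print(z, round(neutral_mass(peaks[1], z), 3))
```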
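Peptide mass fingerprinting compares measured peptide masses with masses predicted from an in-silico digest. The sketch below assumes trypsin as the protease with the common simplified rule (cleave after K or R, but not before P), no missed cleavages or modifications, monoisotopic residue masses, and singly protonated [M+H]+ ions such as MALDI usually gives. The sequence and the 0.02 Da tolerance are arbitrary examples.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da, added once per peptide (termini)
PROTON = 1.007276  # Da


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split (Python 3.7+): position preceded by K/R and not followed by P
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def pmf_matches(sequence, observed_mh, tol=0.02):
    """Return (observed m/z, peptide) pairs whose theoretical [M+H]+ lies within tol Da."""
    theoretical = {pep: peptide_mass(pep) + PROTON for pep in tryptic_digest(sequence)}
    hits = []
    for mz in observed_mh:
        for pep, mh in theoretical.items():
            if abs(mz - mh) <= tol:
                hits.append((mz, pep))
    return hits


if __name__ == "__main__":
    seq = "MAGKPLRSTK"  # hypothetical protein sequence
    print(tryptic_digest(seq))
    print(pmf_matches(seq, [335.19, 500.0]))
```

Real PMF search engines also score how many peptides match and allow missed cleavages and modifications; this only shows the matching idea.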

Using Perl scripts to download sequences from the NCBI database

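I recently found that sequences can be downloaded from NCBI with Perl scripts. Even though it is Perl, you don't need to type in any commands yourself; you only need to enter keywords, and have Perl installed on your computer.
This is Ebot!!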

Ebot is an interactive tool that generates a Perl script that implements an E-utility pipeline. Ebot will guide you step by step in building the pipeline and then will download the Perl script to your computer.
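Ebot writes the pipeline in Perl. For comparison, here is a rough Python sketch of the same kind of two-step E-utility pipeline: ESearch with usehistory=y stores the hit list on NCBI's History server, and EFetch then downloads the records in batches using the returned query_key and WebEnv. It assumes the current eutils.ncbi.nlm.nih.gov base URL and FASTA text output; the function names, batch size and pause length are my own choices, not something Ebot generates.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """ESearch URL that keeps the hit list on the NCBI History server."""
    query = urllib.parse.urlencode({"db": db, "term": term, "usehistory": "y"})
    return EUTILS + "esearch.fcgi?" + query


def parse_esearch(xml_text):
    """Extract Count, QueryKey and WebEnv from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count")), root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_url(db, query_key, webenv, retstart=0, retmax=500, rettype="fasta"):
    """EFetch URL that downloads one batch of records from the History server."""
    query = urllib.parse.urlencode({
        "db": db, "query_key": query_key, "WebEnv": webenv,
        "retstart": retstart, "retmax": retmax,
        "rettype": rettype, "retmode": "text",
    })
    return EUTILS + "efetch.fcgi?" + query


def download(db, term, batch=500):
    """Run ESearch, then EFetch in batches (needs network access)."""
    with urllib.request.urlopen(esearch_url(db, term)) as resp:
        count, query_key, webenv = parse_esearch(resp.read())
    chunks = []
    for start in range(0, count, batch):
        time.sleep(0.4)  # stay under NCBI's request-rate limit
        with urllib.request.urlopen(efetch_url(db, query_key, webenv, start, batch)) as resp:
            chunks.append(resp.read().decode())
    return "".join(chunks)
```

Usage would look like `print(download("protein", "insulin AND human[orgn]")[:500])`. NCBI asks users without an API key to send no more than three requests per second, hence the short pause between batches.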

http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi

Sunday, March 4, 2012

Modk-Prototypes for Simultaneous Clustering of Gene Expression Data with Clinical Chemistry and Pathological Evaluations

Overview


The modk-prototypes algorithm clusters biological samples by simultaneously considering microarray gene expression data and classes of known phenotypic variables, such as clinical chemistry evaluations and histopathologic observations. To measure the dissimilarity of samples, it constructs an objective function that uses the sum of squared Euclidean distances for the numeric microarray and clinical chemistry data and simple matching for the categorical histopathology values. Separate weighting terms for the microarray, clinical chemistry and histopathology measurements control the influence of each data domain on the clustering of the samples. A dynamic validity index for numeric data, modified with a category utility measure, is used to determine the number of clusters in the data sets. Each cluster's prototype, formed from the mean of the numeric feature values and the mode of the categorical values of all the samples in the group, represents the phenotype of the cluster members.
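The weighted dissimilarity and the mean/mode prototype described above can be sketched as follows. This is a generic k-prototypes-style illustration in Python under my own assumptions (each sample is a dict with hypothetical keys "gene", "clin" and "hist"; w_gene, w_clin and w_hist stand in for the three weighting terms), not the authors' Matlab code, and it leaves out the validity index used to choose the number of clusters.

```python
from collections import Counter


def sq_euclidean(x, y):
    """Sum of squared differences between two numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def simple_matching(x, y):
    """Number of categorical features whose values differ."""
    return sum(a != b for a, b in zip(x, y))


def dissimilarity(sample, proto, w_gene=1.0, w_clin=1.0, w_hist=1.0):
    """Weighted mixed-type dissimilarity between a sample and a prototype."""
    return (w_gene * sq_euclidean(sample["gene"], proto["gene"])
            + w_clin * sq_euclidean(sample["clin"], proto["clin"])
            + w_hist * simple_matching(sample["hist"], proto["hist"]))


def prototype(members):
    """Mean of numeric features and mode of categorical features over the members."""
    n = len(members)

    def mean_of(key):
        return [sum(col) / n for col in zip(*(m[key] for m in members))]

    # most_common(1) breaks ties by first occurrence
    hist = [Counter(col).most_common(1)[0][0] for col in zip(*(m["hist"] for m in members))]
    return {"gene": mean_of("gene"), "clin": mean_of("clin"), "hist": hist}


def assign(samples, prototypes, **weights):
    """Index of the closest prototype for each sample."""
    return [min(range(len(prototypes)),
                key=lambda j: dissimilarity(s, prototypes[j], **weights))
            for s in samples]


def objective(samples, prototypes, labels, **weights):
    """Total dissimilarity of samples to their assigned prototypes (to be minimized)."""
    return sum(dissimilarity(s, prototypes[k], **weights) for s, k in zip(samples, labels))
```

A full run would alternate assign() and prototype() until the objective stops decreasing; raising w_hist, for example, makes the histopathology calls weigh more heavily in the grouping.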

Reference for Citing


Bushel PR, Wolfinger RD and Gibson G. Simultaneous Clustering of Gene Expression Data with Clinical Chemistry and Pathological Evaluations Reveals Phenotypic Prototypes. BMC Bioinformatics 2006.

Data Types and Format


Gene expression data needs to be formatted (short and wide) in a tab-delimited text file with array observations as rows and gene, clinical chemistry and histopathology variables as columns. The first row is the column header; the second row contains an integer denoting each column's data type (1 = gene expression, 2 = clinical chemistry measurement, 3 = histopathology observation). Columns should be ordered by data type, from 3 to 2 to 1, with each type kept together in its own block.
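A minimal reader for the layout described above, written in Python and assuming that every column is a feature (no separate sample-ID column), that histopathology values are text and that types 1 and 2 are numeric. It checks that the data-type row runs 3, 2, 1 in contiguous blocks. The function name read_modk_table is hypothetical; the application itself relies on loadcell.m in Matlab.

```python
import csv
import io

TYPE_NAMES = {1: "gene expression", 2: "clinical chemistry", 3: "histopathology"}


def read_modk_table(text):
    """Parse: header row, data-type row, then one row per array observation."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, types = rows[0], [int(t) for t in rows[1]]
    if len(header) != len(types):
        raise ValueError("header and data-type rows must have the same number of columns")
    if any(t not in TYPE_NAMES for t in types):
        raise ValueError("data types must be 1, 2 or 3")
    # Blocks must run 3 -> 2 -> 1, i.e. the type row never increases
    if any(a < b for a, b in zip(types, types[1:])):
        raise ValueError("columns must be ordered by data type 3, then 2, then 1")
    columns = {t: [h for h, ty in zip(header, types) if ty == t] for t in TYPE_NAMES}
    samples = []
    for row in rows[2:]:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} fields, expected {len(header)}")
        sample = {t: [] for t in TYPE_NAMES}
        for value, t in zip(row, types):
            # histopathology stays categorical text; the other types are numeric
            sample[t].append(value if t == 3 else float(value))
        samples.append(sample)
    return columns, samples
```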

Limitations


Only one categorical feature value per observation is permitted. A feature can belong to only one data type. The application is optimized for clustering the samples and identifying phenotypic prototypes from the resulting groups, not for clustering the genes. The application is not guaranteed to find the optimal clustering of the samples; it returns an assignment of the samples to clusters that reduces the objective function to a value close to the global minimum.

Requirements


Modk-Prototypes is a set of Matlab functions and scripts tested in Matlab version 6.5.X.X R13.X for Windows (2000 and XP). You may encounter problems on other operating systems, platforms and/or Matlab versions. The application requires the Matlab Statistics Toolbox Version 4.0; the Resampling Stats Toolbox Version 1.0 by Daniel T. Kaplan (Department of Mathematics and Computer Science, Macalester College, St. Paul, Minnesota, USA); the adjusted Rand Index function by Tijl De Bie (February 2003); and two functions available at the Matlab Central File Exchange: loadcell.m, which loads mixed-type data, and cell2csv.m, which converts cell arrays to comma-separated values (CSV) files (File IDs 1965 and 7601, respectively). Be sure to add the Toolboxes to the Matlab path before running the application.

Downloads


Download the Matlab files and a stand-alone executable version of the program (http://www.niehs.nih.gov/research/resources/assets/docs/modkprototypesdistributionzip.zip) (101 MB). You will be required to register as a user of the application so that the distribution can be gauged and you can be kept informed of updates and revisions. A demo script, ReadMe file and sample data are provided in the distribution to help you get started with the application. Report bugs, corrections and suggestions to Pierre Bushel.

Info: http://www.niehs.nih.gov/research/resources/software/biostatistics/modk/index.cfm

Saturday, March 3, 2012

Clustering analysis of expression microarray data with Subio Platform and Basic Plug-in fwd

Subio Platform is a free, technology-independent omics data browser and software platform for sharing analysis results. Its integrated visualization tools greatly help in handling complex omics data and revealing biological insight.

The Basic Plug-in adds analytical functions that are widely used for microarray data analysis. This movie shows how to use hierarchical clustering analysis to get an overview of a large number of genes by grouping them into clusters.
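For readers curious about what hierarchical clustering does under the hood, here is a small pure-Python sketch of agglomerative clustering of gene profiles with average linkage and 1 - Pearson correlation as the distance, a common choice for expression data. This is not Subio's implementation; the distance, the linkage and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until n_clusters remain; returns lists of gene indices."""
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q is safe
        del clusters[q]
    return clusters


if __name__ == "__main__":
    genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]  # toy profiles
    print(average_linkage(genes, 2))
```

Each merge joins the two closest clusters; stopping at n_clusters corresponds to cutting the dendrogram at that level.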

You can watch it more clearly here:
http://www.screencast.com/t/N2NjZjA2Nzkt

For more information:
http://www.subio.jp/products/basicplugin



Friday, March 2, 2012

Microarray Analysis with R



This is a short video introducing R as a language and showing some of its capabilities with microarray data.



protein analysis fwd

Again, a lot of information, but it is better to have some things to choose from than nothing at all.

Protein analysis:

Molecular Modeling Software fwd

Maybe there are too many; selecting the proper one is also challenging!!

Modeling software (collected at http://www.mybiosoftware.com):

  • 3DNA 2.0 – Visualization of Three-Dimensional Nucleic Acid Structures
  • ActiveICM 1.1.6 – PowerPoint & Web Browsers Plugin to Display 3D Modules
  • AlloPathFinder 1.1 – Compute Likely Allosteric Pathways in Proteins
  • AlphaMol 1.0 – Tools for Biomolecular Geometry
  • AMBER 11 – Assisted Model Building with Energy Refinement
  • AmberTools 1.5 – Molecular Dynamics Simulation
  • ANTHEPROT 3D 1.0.162- Molecule Viewer to look at PDB files
  • APBS 1.3 – Evaluate Electrostatic Properties of Nanoscale Biomolecular Systems
  • ArgusLab 4.0.1 – Molecular Modeling, Graphics & Drug Design Program
  • Ascalaph 1.7.12 – Molecular Modelling Suite
  • AtVol 1.2 – Atomic Volume Calculation
  • AUDocker v1 – GUI for AutoDock Vina
  • Autobondrot 2.0 – Generate Multiple Molecular Conformations
  • AutoDock 4.2.3 / AutoDockTools 1.5.6 – Suite of Automated Docking Tools
  • AutoDock Vina 1.1.2 – Molecular Docking and Virtual Screening Program
  • Autodock/Vina plugin for PyMOL
  • AutoGrow 2.0.4 – Use AutoDock Vina in Protein Inhibitor Design
  • Avogadro 1.0.3 – Molecule Editor & Visualizer
  • AVP 1.3 – Calculate Protein Void Volumes and Packing Quality
  • AxPyMOL 1.0r1 – PowerPoint Plug-In for Embedding 3D Molecular Images & Animations
  • B 1.0alpha – Biomolecular Modeling Package
  • BALLView 2.0-r1 – Molecular Modeling & Visualization
  • Benchware® 3D Explorer 2.6 – 3D Chemical Visualization
  • Bioclipse 2.4 – Life Sciences Workbench
  • Biodesigner 0.75 – Molecular Modeling & Visualization
  • BioEditor 1.6.1 – Present Macromolecular Structure & Structural Annotation
  • BioViewer 1.5.7 – Read only version of BioEditor
  • Biskit 2.3.1 – Python Platform for Structural Bioinformatics
  • BndLst 1.6 – List Covalent & H-bonded Neighboring Atoms
  • C2A 1.0 – Coarse to Atomic
  • CCOMP 3.70 – Compare Ligand/Receptor Complexes
  • CHARMM 36 – Macromolecular Dynamics and Mechanics
  • ChemCraft 1.6 – Graphical Program for working with Quantum Chemistry Computation
  • Chemis3D 2.89b – Java 3D Molecular Viewer Applet
  • Chemitorium 3.5 – Molecule Editor & 3D Chemical Structure Viewer
  • Chime 2.6SP8 – Display 2D / 3D Molecules directly in Web Pages
  • ClashList 1.1 – Build Lists of van der Waals Clashes from PDB file
  • ClashScore 1.1 – R Script for VTF Percentile Plot
  • CLICK – Comparison of Biomolecular 3D Structures
  • Cluster 1.3 – Build Collections of Interacting Items
  • CN3D 4.3 – 3D Molecular Structure Viewer
  • CompuCell3D 3.6.0 – 3D Multiscale Multi-cell Simulations
  • Concoord 2.1 – Protein Structure Generation from Distance Constraint
  • CONSCRIPT – Generate Electron Density Isosurfaces in Protein Crystallography
  • Coot 0.6.2 – Macromolecular Model Building Tool
  • COSMOS 5.0 / COSMOS Viewer 3.0 – Computer Simulation & Visualisation of Molecular Structures
  • CueMol 2.0.1.161 – Macromolecular Structure Visualization
  • Dang 1.8 – Read PDB File & Generate Geometric Measurement Table
  • Dangle 0.63 – Read PDB File & Generate Geometric Measurement Table
  • DeepView 4.04 – Analyze the 3D Structures of Several Proteins at the Same Time
  • Desmond 2.4 – High-speed Molecular Dynamics Simulation
  • DINO 0.9.4 – Structural Biology Data 3D Visualization
  • DireX 0.5 – Low-resolution Structure Refinement
  • DOCK 6.4 – Docking Molecules to each other
  • DS Visualizer 3.1 & ActiveX Control 3.1 – Molecular Visualization
  • DTMM 4.2 – molecular modelling program
  • EDTSurf – Quick and Accurate Construction of Macromolecular Surfaces
  • EGO VIII – Molecular Dynamics Simulation
  • eMovie 1.04 – Make Molecular Movies
  • Facio 15.1.1 – 3D-Graphics program for Molecular Modeling and Visualization
  • FEATURE 2.0 – Examine Biological Structures
  • FiltRest3D – Filtering Protein Models by Fuzzy Restraints
  • FINDSITE 1.0 – Ligand-binding Site Prediction & Functional Annotation
  • FINDSITE-LHM 1.0 – Homology Modeling Approach to Flexible Ligand Docking
  • Flex-EM – Fitting and Refinement of Atomic Structures
  • FlexS 2.0.0 – Predict Ligand Superpositions
  • Flipkin 2.4 – Script to Make the Kinemages
  • FMA 0901 – Protein Functional Mode Analysis
  • FREEHELIX 98 – Analyze DNA bending
  • FRETsg 1.0 – Structure Building from Multiple FRET Distances
  • Friend 2.0 – Multiple Structure Visualization & Multiple Sequence Alignment
  • FTDock 2.0/ RPScore /MultiDock 1.0 – Protein Molecule 3D-Dock Suite
  • gOpenMol 3.0 – Molecules Visualization & Analysis
  • Gabedit 2.4.0 – Graphical User Interface to Computational Chemistry Packages
  • GAP 1.2.14 – Geometric Analysis of Proteins
  • GDIS 0.90 – Visualization Program for Molecular and Periodic Systems
  • Ghemical 2.99.2 – Molecular Modeling and Editing Package for GNOME
  • GPGPUFRAGFOLD 0.1 – CUDA Fragment Assembly Based Protein Structure Prediction
  • Graphite-MicroMégas – Model in 3D Assemblies of Proteins and DNA
  • GROMACS 4.5.4 – Molecular Simulation
  • Gromita 1.06 – GUI for GROMACS
  • g_correlation 1.02 – Generalized Correlation for Biomolecular Dynamics
  • g_permute 1.12 – Permutation-Reduced Phase Space Density Compaction
  • HAAD – Quick and Accurate Hydrogen Atom Addition
  • Hollow 1.1 – Illustration software for Proteins
  • ICM-Browser 3.7 2b – Molecules & sequence alignments Visualization
  • iMol 0.40 – Molecular Visualization Application for Mac OS X
  • iMolview 1.1 – iPhone & iPad App for Browsing Protein, DNA & Drug Molecules in 3D
  • IMP 1.0 – Integrative Modeling Platform
  • ISD 1.1 – Bayesian NMR Structure Calculation
  • ISIM – Simulation of Ions in the Grand Canonical Ensemble
  • ISIM Interface 1.3.2 – Graphical Interface for running the program ISIM
  • Jamberoo 11 – Cross-Platform Molecular Editor & Builder
  • Jimp 2 0.091 – Visualize and Manipulate Molecules
  • Jmol 12.0.50 – Java Viewer for Chemical Structures in 3D
  • JMVS 4 041122 – Java3D Molecular Visualisation System
  • jSim for Gromacs 0.63b – Graphical User Interface for Gromacs
  • JyMOL 1.0 – Java-based Molecular Visualization
  • Kin2Dcont 1.8 & Kin3Dcont 1.12 – Produce Molecule Contour Map
  • KiNG 2.20 – Three Dimensional Vector Graphics
  • KinImmerse 0.5 – Translate Kinemage Files into Software for Virtual Environment
  • LGscore/LGscore2 2.0 – Measure Quality of Protein Model
  • LifeExplorer 20100108 – 3D Navigation Tool for Cells
  • LigandScout 3.02 – Pharmacophore 3D Modeling
  • LoopTK 2.0.1 – Protein Loop Kinematic Toolkit
  • lrrr 1.4 beta1 – Determines Ligands on the Surface of Proteins
  • Mage 6.47 – Kinemage File 3D Display
  • Maptools 1.0 – Deal with Experimental (X-ray, EM) 3D Maps
  • MapVol 1.1 – Awk Script to Assign Volume by Atom
  • MaSK 1.3.0 – Molecular Modeling and Simulation Kit
  • MD Morphing 1.0 – Perform Molecular Dynamics Morphing Simulations
  • MDynaMix 5.2 – Molecular Dynamics Program
  • MetaTASSER – Protein Structure Prediction tool
  • MGLTools 1.5.6RC2 – Visualization & Analysis of Molecular Structures
  • MINT 3.2 – User Interface to Modeller
  • MMB 2.4 – Model the Structure and Dynamics of Macromolecules
  • mmPDBViewer 2009.3.20.4 – Protein Data Bank Viewer
  • MMPRO 0.7 – Molecule Visualization & Analysis Program
  • MMTK 2.7.4 – The Molecular Modelling Toolkit
  • MMTSB toolset – Multiscale Modeling Tools for Structural Biology
  • MMV 2.2.0 – Visualization of Molecules
  • ModeHunter 1.1 – Normal Mode Analysis of Coarse Grained Elastic Networks
  • MODELLER 9.10 – Comparative Protein Structure Modeling
  • Models@Home 4.5 – Distributed Computing Software for Protein Modeling
  • ModeRNA 1.6 – Comparative RNA 3D Modeling
  • ModPipe 2.2.0 – Calculate Protein Structure Model
  • ModRefiner 20111024 – High-resolution Protein Structure Refinement
  • ModView 0.903 – Visualization of Multiple Protein Sequences & Structures
  • MOIL 12.0.3671 – Molecular Modeling Software
  • Móilín 2011 – Molecular Modelling Software
  • Mol2Mol 5.6.3 – Molecule File Manipulation & Conversion
  • MOLA – System for Virtual Screening using AutoDock4/Vina on Computer Clusters
  • MolIDE 1.7 – Protein 3D Homology Modeling
  • MolPOV 2.0.8 – PDB to POV File Converter & Visualizer
  • MolScript 2.1.2 – Display Molecular 3D Structures
  • MolTalk 3.0.1 – Computational Environment for Structural Bioinformatics
  • MoluCAD 1.034 – Molecular Modeling & Visualization Tool
  • MoSART pr – NMR-based Biomolecular Structure Computation
  • MSMExplorer 0.02 – Visualization Application for Markov State Models for Folding
  • MVP/MVP-Fit 2.0 – Macromolecular Visualization and Processing
  • NAContacts 2.5 – Write Contact Information between Nucleic Acid Bases
  • NAST 1.0 – Nucleic Acid Simulation Tool
  • NMFF – Normal Mode Flexible Fitting
  • NOC 3.01 – Molecular Explorer for Protein Structure Visualization
  • OB Score 1.0 – Structural Genomics Target Ranking
  • OpenAstexViewer 3.0 – Software for Molecular Visualisation
  • OpenMM 3.1.1 – Library for Molecular Modeling Simulation
  • OpenMM Zephyr 2.0.3 – Molecular Simulation Application
  • OpenStructure 1.1.0 – Computational Structural Biology Framework
  • ORTEP-III 1.03 – Crystal Structure Illustration
  • Oscail 2011 – Crystallography & Molecular Modelling
  • OVOP 1.0 – View Generation for Protein Structures
  • PaDEL-ADV 1.6 – Facilitate Virtual Screening with AutoDock Vina
  • PDB Editor 090203 – PDB (Protein Data Bank) File Editor
  • PDBCNS 2.0 – Interconvert Atom Names between PDB & CNS formats
  • PDBlib 2.2 – C++ Macromolecular Class Library
  • PDBpy – Python Parser for PDB files
  • PeppeR 0.8.160 – Graphical 3D-EM DAS Client
  • PHENIX 1.7 – Python-based Hierarchical ENvironment for Integrated Xtallography
  • PovChem 2.1.1 – Chemical Visualization & Illustration & POV File Converter
  • Prekin 6.51 – Prepares Kinemages Files from PDB-format Files
  • PREPI 0.9 – Molecular 3D Representation
  • Probe 2.12 – Evaluate Atomic Packing & Contact Analysis
  • ProbeWithO 0.9.0 – Use Small Probe Contact Dots Within O
  • ProFit 3.1 – Protein Least Squares Fitting
  • ProSa 2003 – Protein Structure Research Tool
  • PROTEAND 1.0 – Display Macromolecular Structural Uncertainty
  • Protein Explorer 2.80 – Visualize 3D Structures of Macromolecules
  • ProteinGlimpse 1.6 – Visualize Macromolecules Retrieved from PDB
  • ProteinScope 1.0.5 – 3D Protein Structure Viewer
  • ProteinShader beta 0.9.4 – Illustrative Rendering of Macromolecules
  • ProtoMol 3.3 – Molecular Dynamics (MD) Simulation
  • PULCHRA 3.06 – All-atom Reconstruction & Refinement of Reduced Protein Models
  • pymacs 0.4 – Python Module for Dealing with Structure files from GROMACS
  • PyMOL 1.4.1 – Molecular Visualization System
  • PyOpenMM 3.0 – Python API of OpenMM Library
  • PyRx 0.8 – Virtual Screening software for Computer-Aided Drug Design
  • QTree 2.3 – Graphics Rendering using Quad-tree Algorithm
  • QuickPDB 20021101 – Java Applet for quickly viewing PROTEIN PDB Structure
  • QuteMol 0.41 – Molecular Visualization System
  • R.E.D. III.4 – Calculate RESP Charges
  • Ramachandran Plot Explorer 1.0 – Interactive Cross-platform Protein Viewer
  • Rasmol 2.7.5.2 – Molecular Graphics Visualisation
  • Raster3D 3.0-2 – Generate High Quality Raster Images of Proteins or other Molecules
  • RasTop 2.2 – Molecular Visualization Software Adapted for Rasmol
  • RDCvis 1.02 – Residual Dipolar Coupling Visualizer
  • Reduce 3.14 – Add Hydrogens to PDB Molecular Structure File
  • Remediator 1.60 – Convert PDB Files between PDBv2.3 & PDBv3.2 Formats
  • REMO 1.0 – Construct Full-atom Protein Models from C-alpha Traces
  • Ribbons 3.32 – Molecular Graphics Software
  • RIP 1.0 – Accelerated Molecular Dynamics
  • RNABC 1.11 – RNA Backbone Correction
  • Rosetta@home – Grid Software for Protein Folding
  • rTools 0.7.2 – PyMOL plugins
  • ScoreDotsAtAtom 1.0 – Bookkeep All-atom Contact Dot
  • Sculptor 2.0.2 – Docking & Visualization for Atomic Structures
  • SEQMOL 3.4.6 – Sequence Alignment & PDB Structure Analysis Utility
  • SHIFTS 4.3 – Predict Nitrogen, Carbon & Proton Chemical Shifts in Proteins
  • SimTK Core 2.1 – Simbios Biosimulation ToolKit
  • Situs 2.6 – Integration of Multi-Resolution Structures
  • Solvate 1.0 – Construct Atomic Solvent Environment Model for Given Atomic Macromolecule Model
  • StrukEd – Editor for Molecules & 3D Viewer
  • Suitename 0.3 – RNA Conformer
  • Superficial 1.2 – Identification of Potential Epitopes or Binding Sites
  • SuperMimic – Fit Peptide Mimetics into Protein Structures
  • TASSER-Lite 1.0 – Protein Structure Modeling tool
  • tCONCOORD 1.0 – Predict Protein Conformational Flexibility
  • Tessellator 1.0 – Software for Tessellation of 3D Volume in Biological Molecule
  • Theseus 1.6.1 – Superimpose Macromolecular Structures
  • THREADER 3.51 – Protein Fold Recognition by Threading
  • TimeScapes 1.2.2 – Molecular Dynamics Analysis tool
  • Tinker 5.1.09 – Software Tools for Molecular Design
  • Torsions – Calculates Backbone Torsion Angles from a PDB file
  • UCSF Chimera 1.5.3 – Molecular Modeling System
  • VcPpt – Protein Ligand Docking & in silico High-throughput Screening
  • VEGA ZZ 2.4.0 – Molecular Modeling Toolkit
  • VESTA 3.0 – 3D Visualization System for Electronic & Structural Analysis
  • Viewmol 2.4.1 – Molecule Viewer
  • ViewMol3D 5.00.alpha.3 – 3D OpenGL Viewer for Molecular Structures
  • VisProt3DS 3.03 – Stereoscopic Visual Analyzer of Biological Macromolecules
  • VMD 1.9 – Molecular Graphics Viewer
  • Voronoia 1.0 – Analyse Packing of Protein Structures
  • WebMol – JAVA PDB Viewer
  • WPDB 2.2 – The Protein Data Bank Through Windows
  • XCrySDen 1.5.24 – Crystalline & Molecular Structure Visualisation
  • XmMol 3.1 – Macromolecular Visualization and Modeling tool
  • XtalView 4.0 – Molecular Graphics Program
  • YAKUSA – Scan Structural database with Query Protein Structure
  • YASARA 11.9.18 – Molecular Graphics, Modeling & Simulation program
  • YUP 1.080827 / Yammp 2 – Molecular Simulation
  • Zodiac 0.6.5 – Molecular Modelling suite for Drug Design
  • HyperBalls Viewer – Molecular structures and trajectories visualization using GPU rendering