Andy Computer Blog: September 2010

Thursday, September 30, 2010

Terminal

1. If the terminal shows strange symbols (for example, after accidentally opening an image file in vi), run reset:
$ reset

Wednesday, September 29, 2010

Adobe Flash install

1. Download install_flash_player_10_linux.tar.gz and unpack it

2. Copy libflashplayer.so into the Mozilla plugins directory
{{{
[Andy@localhost adobe]$ cp libflashplayer.so ~/.mozilla/plugins/
}}}

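Saturday, September 25, 2010

Gap Alignment Discuss

0. Definition:
Two strings are split apart at some positions (by inserting gaps) so that they reach a better similarity (alignment score).

1. Explanation
The simplest alignment compares two strings linearly, position by position. Only two cases can occur: match and mismatch.
e.g.
Aligning CATCCGA with CAAAGCGA, only the first two letters end up matching.

CATCCGA
CAAAGCGA
CA******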
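
The linear comparison above can be sketched in a few lines of Perl. This is only an illustration; the function name count_linear_matches is made up for this example and is not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the positions where two strings carry the same letter,
# comparing position by position without any gaps.
# (count_linear_matches is a hypothetical helper name.)
sub count_linear_matches {
    my ($x, $y) = @_;
    my $len = length($x) < length($y) ? length($x) : length($y);
    my $matches = 0;
    for my $i (0 .. $len - 1) {
        $matches++ if substr($x, $i, 1) eq substr($y, $i, 1);
    }
    return $matches;
}

print count_linear_matches("CATCCGA", "CAAAGCGA"), "\n";   # prints 2
```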

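If gaps are allowed, the strings can be split apart to reach the best similarity, but then there are many different ways to place the gaps, for example:
gap 1:
C-A--TCCGA
CAAAG-CAGA
This alignment has 3 gaps; consecutive gap positions count as one gap.

gap 2:
CA---TCCGA
CAAAG-CAGA
This alignment has 2 gaps.

gap ...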

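2. Scoring the alignment once gaps are allowed:
Usually the scores of many differently gapped alignments have to be computed and compared
to find the one with the highest score, which is the most similar alignment of the two strings.

Score formula (the scoring scheme can vary with the application or the person):
if xi and yi match, the score is 2,
if xi and yi do not match, the score is -1,
if the region is a gap, the score is -(4 + k), where k is the number of consecutive gap positions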
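
The scoring rule above can be sketched as a small Perl function that scores an alignment that has already been gapped. This is a minimal sketch under a few assumptions: gaps are written as '-', both rows have the same length, no column has a gap in both rows, and the function name score_alignment is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Score an already gapped alignment: match = +2, mismatch = -1,
# a run of k consecutive '-' in one row = -(4 + k).
# (score_alignment is a hypothetical helper name.)
sub score_alignment {
    my ($top, $bottom) = @_;
    die "aligned rows must have equal length\n"
        unless length($top) == length($bottom);

    my $n     = length $top;
    my $score = 0;
    my $i     = 0;
    while ($i < $n) {
        my $t = substr($top,    $i, 1);
        my $u = substr($bottom, $i, 1);
        if ($t eq '-' || $u eq '-') {
            # Measure the run of '-' in the row that holds the gap
            my $row = $t eq '-' ? $top : $bottom;
            my $k   = 0;
            $k++ while $i + $k < $n && substr($row, $i + $k, 1) eq '-';
            $score -= 4 + $k;
            $i     += $k;
        }
        else {
            $score += ($t eq $u) ? 2 : -1;
            $i++;
        }
    }
    return $score;
}

print score_alignment("C-A--TCCGA", "CAAAG-CAGA"), "\n";   # prints -7
print score_alignment("CA---TCCGA", "CAAAG-CAGA"), "\n";   # prints -3
```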

For example:
gap 1 is
2 - (4+1) + 2 - (4+2) - (4+1) + 2 - 1 + 4 = -7
gap 2 is
2 + 2 - (4+3) - (4+1) + 2 - 1 + 4 = -3

The best alignment is
CA--TCCGA
CAAAGCAGA score = 2 + 2 - (4+2) - 1 + 2 - 1 + 2 + 2 = 2
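
Trying every gap placement by hand does not scale, so the best score is normally found with dynamic programming. Below is a rough sketch of a three-table DP (the approach usually credited to Gotoh) for the score formula in this post, where a gap of length k costs 4 + k, that is 5 to open and 1 per extra position. It only returns the best score, not the alignment itself. The names best_alignment_score, @M, @X and @Y are my own choices for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

my ($MATCH, $MISMATCH)      = (2, -1);
my ($GAP_OPEN, $GAP_EXTEND) = (5, 1);    # gap of length k costs 5 + (k - 1) = 4 + k
my $NEG_INF                 = -1e9;      # stands in for "impossible"

# Best global alignment score of $x and $y under the scheme above.
# M: last column pairs two letters; X: last column is a letter of $x over a gap;
# Y: last column is a gap over a letter of $y.
sub best_alignment_score {
    my ($x, $y) = @_;
    my ($n, $m) = (length $x, length $y);
    my (@M, @X, @Y);
    for my $i (0 .. $n) {
        for my $j (0 .. $m) {
            $M[$i][$j] = $X[$i][$j] = $Y[$i][$j] = $NEG_INF;
        }
    }
    $M[0][0] = 0;
    $X[$_][0] = -(4 + $_) for 1 .. $n;   # prefix of $x against gaps only
    $Y[0][$_] = -(4 + $_) for 1 .. $m;   # prefix of $y against gaps only

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = substr($x, $i - 1, 1) eq substr($y, $j - 1, 1)
                  ? $MATCH : $MISMATCH;
            $M[$i][$j] = $s + max($M[$i-1][$j-1], $X[$i-1][$j-1], $Y[$i-1][$j-1]);
            $X[$i][$j] = max($M[$i-1][$j] - $GAP_OPEN,
                             $Y[$i-1][$j] - $GAP_OPEN,
                             $X[$i-1][$j] - $GAP_EXTEND);
            $Y[$i][$j] = max($M[$i][$j-1] - $GAP_OPEN,
                             $X[$i][$j-1] - $GAP_OPEN,
                             $Y[$i][$j-1] - $GAP_EXTEND);
        }
    }
    return max($M[$n][$m], $X[$n][$m], $Y[$n][$m]);
}

print best_alignment_score("CATCCGA", "CAAAGCAGA"), "\n";
```

The running time is proportional to the product of the two string lengths, which is fine for short sequences like the ones in this post.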

3. Reference:
Gap Punishment Alignment Problem (DP):

4. Further reading:
affine gap cost: http://homepage.usask.ca/~ctl271/857/affine_gap_penalties.shtml

Online document (Algorithms for SP-optimal multiple alignments): http://lectures.molgen.mpg.de/online_lectures.html

Gap costs for multiple sequence alignment (paper)
Logarithmic gap costs decrease alignment accuracy (paper)

Sequencing problem:
http://seqcore.brcf.med.umich.edu/doc/dnaseq/interpret.html

Thursday, September 23, 2010

Sorting hash

Sorting a hash by its values
Assuming the hash %repeatRank is ("aaa" => 10, "bbb" => 60, "ccc" => 20)
foreach my $key (sort {$repeatRank{$b} <=> $repeatRank{$a}} (keys (%repeatRank)))
{
print "$key $repeatRank{$key}\n";
}

The call has the form sort(Algorithm Array)
Algorithm is {$repeatRank{$b} <=> $repeatRank{$a}}
1. It sorts from high to low (descending order), because $b is compared against $a
2. The <=> operator performs a numeric comparison

Array is (keys %repeatRank),
which returns the list of keys in the hash

---------------------------------------------

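my %hash = (john => 24, mary => 28, david => 22);
my @order = sort { $hash{$a} <=> $hash{$b} } keys %hash;
print "@order\n"; # in order: david john mary

Although this is only three lines of code, we should still explain what is going on, otherwise it can make your head spin. The first line should not be a problem. If it does look hard, you may want to go back to the chapter on hashes; at the very least you should know how to define a hash and assign its keys and values. Nothing here is special: we simply assign a list to a hash. The most complex part is the second line (unless printing an array on the last line is too hard for you). Look at the left side of the equals sign first. It declares an array, because we want an array of hash keys sorted by their hash values. That does not sound hard, so let us think about how to get such an array.
First we need the array that contains all the hash keys, which is what the keys function returns. Once we have that array, we can sort it. The heart of the sort is the small piece of code inside the block. We still use Perl's two default variables, $a and $b, which stand for the two values taken from the array (keys %hash) to be compared. However, we do not compare $a and $b directly. We use them as keys and sort by the hash values they look up.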
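
To make the role of $a and $b concrete, here is a small self-contained sketch that sorts the same data in both directions and adds a tie-break on the key name. The hash name %age and the tie-break are my own additions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = (john => 24, mary => 28, david => 22);

# Ascending by value: $a on the left of <=>
my @ascending  = sort { $age{$a} <=> $age{$b} } keys %age;

# Descending by value: swap $a and $b
my @descending = sort { $age{$b} <=> $age{$a} } keys %age;

# Ascending by value; keys with equal values fall back to alphabetical order
my @with_tiebreak = sort { $age{$a} <=> $age{$b} or $a cmp $b } keys %age;

print "@ascending\n";    # david john mary
print "@descending\n";   # mary john david
```

The tie-break matters because keys returns the keys in no particular order, so two keys with the same value could otherwise come out in a different order from run to run.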

---------------------------------------------
Reference:

perl learn: http://perl.hcchien.org/ch12.html
web teacher: http://devdaily.com/perl/edu/qanda/plqa00016/

Wednesday, September 22, 2010

RepeatElementCoverageInContigs_v1.pl

#!/usr/bin/perl
# Description:
# Take the contig and repeat type columns from a RepeatMasker .out file, map them to the
# coverage profile in the .depth file, and write the combined mapping to an output file.
#
# Author: Andy
#
# Input files:
# [0] RepeatMasker .out file
# [1] .depth file (from alang)
#
# Output format:
# contig RepeatName Coverage file ...
#
# Output files:
# ana.RepeatElementCoverageInContigs_v1.modifyRepeatMaskerOutFile is the .out file with the header filtered out and a row number added
# ana.RepeatElementCoverageInContigs_v1.IntegerOutFile is the table of integrated data
#
# Sample:
# perl RepeatElementCoverageInContigs_v1.pl 454AllContigs_change.fna.out 454AllContigs_change.depth
#
# Time: 2010.09.21

use strict;
use warnings;

die "Usage: $0 RepeatMasker.out(file) CoverageFile(.depth)\n" unless (@ARGV == 2);

# Open the two input files and the two output files
open (my $maskerOutFile, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open (my $depthFile, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
open (my $analysisMaskerOutFile, '>', $ARGV[0].".ana.RepeatElementCoverageInContigs_v1.modifyRepeatMaskerOutFile") or die "Cannot open output file: $!";
open (my $analysisIntegrateOutFile, '>', $ARGV[0].".ana.RepeatElementCoverageInContigs_v1.IntegerOutFile") or die "Cannot open output file: $!";

# Filter the masker .out file
my ($title, $title2, $space, @lines) = <$maskerOutFile>; # Discard the three header lines of the masker .out file

my $count = 1;
my @statistics = <$depthFile>; # Read the whole .depth file into memory

foreach my $line (@lines) {
    # Parse one line of the RepeatMasker .out file
    $line =~ s/^\s+//;
    my @token = split (/\s+/, $line);
    print $analysisMaskerOutFile "$count\t\t$line";

    # Pick the columns to process
    my ($contigName, $repeatName) = ($token[4], $token[9]); # token 4 is the contig name and token 9 is the repeat name

    # Compare the contig name against every line of the .depth file
    foreach my $statisticLine (@statistics) {
        # Print all columns of the matching line.
        # \Q...\E keeps special characters in the name literal, and \b stops
        # a name such as contig1 from also matching contig10.
        if ($statisticLine =~ /^\Q$contigName\E\b/ ) {
            print $analysisIntegrateOutFile "$count\t$contigName\t$repeatName\t$statisticLine";
        }
    } # End of comparing the contig against the .depth file

    $count++;
    print "." if ($count % 1000 == 0); # Progress indicator

} # End of foreach line in the masker .out file

# Close all files before the program ends
close $maskerOutFile;
close $depthFile;
close $analysisIntegrateOutFile;
close $analysisMaskerOutFile;
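
The script above scans the whole .depth file once for every line of the RepeatMasker output, which gets slow on large inputs. One possible alternative is to group the .depth lines by contig name in a hash first. The sketch below assumes that the contig name is the first whitespace-separated field of each .depth line; I have not checked that against the real file format, and the sample data and the helper name depth_lines_for are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines standing in for the contents of a .depth file.
# Assumption: the first whitespace-separated field is the contig name.
my @depth_lines = (
    "contig00001\t1\t12\n",
    "contig00001\t2\t15\n",
    "contig00010\t1\t7\n",
);

# Build the index once: contig name => list of its .depth lines
my %depth_by_contig;
for my $line (@depth_lines) {
    my ($name) = split ' ', $line;
    next unless defined $name;            # skip blank lines
    push @{ $depth_by_contig{$name} }, $line;
}

# Return all .depth lines of one contig (empty list if it is unknown).
# (depth_lines_for is a hypothetical helper name.)
sub depth_lines_for {
    my ($contig) = @_;
    return @{ $depth_by_contig{$contig} || [] };
}

print depth_lines_for("contig00001");
```

With the index in place, the inner foreach loop becomes a single hash lookup per RepeatMasker line, and because the lookup is an exact match, contig00001 can never pick up lines of contig00010.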

DNASequenceComplement.pl

{{{
#!/usr/bin/perl
#
# Andy 20100920 reverse and complement a DNA string
#
# Sample:
# echo "ell" | test.pl
# DNASequence.pl DNAFile
#
# Output:
# four different sequences:
# original, reverse,
# original complement, reverse complement

use strict;

# Read the first line of input as the DNA sequence
my $DNA = <>;
chomp($DNA);
print ("Original sequence: $DNA\n");

my $reverseDNA = DNAReverse($DNA);
print ("Reverse sequence: $reverseDNA\n");

my $complementDNA = DNAComplement($DNA);
print ("Positive complement sequence: $complementDNA\n");

my $reverseComplementDNA = DNAComplement($reverseDNA);
print ("Negative complement sequence: $reverseComplementDNA\n");

#=======================================================================
# Reverse a DNA sequence
# Parameter:
# [0] the DNA sequence to reverse
#=======================================================================
sub DNAReverse {
    my ($DNA) = @_;
    chomp($DNA);
    # reverse() in scalar context reverses the characters of the string
    my $reverse = reverse($DNA);
    return $reverse;
}

#=====================================================
# Build the complement of a DNA sequence
# Parameter:
# [0] the DNA sequence to complement
#
# Method:
# A->T,
# T->A,
# G->C,
# C->G
#
# Return:
# the complement sequence as a string
#
#=====================================================
sub DNAComplement {
    # Lookup table for the complement of each base (upper case only)
    my %complementHash = ("A" => "T", "T" => "A", "G" => "C", "C" => "G");
    # The complement sequence, built up one base at a time
    my $complementDNA = '';

    # Preprocess the parameter
    my ($DNA) = @_;
    chomp($DNA);

    # Replace each base with its complement
    for (my $i = 0; $i < length($DNA); $i++)
    {
        my $unit = $complementHash{ substr($DNA, $i, 1) };
        $complementDNA .= $unit;
    } # End of processing the string

    return $complementDNA;

}

}}}
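
As a side note, Perl's built-in tr/// operator can do the same base swap without a loop or a hash. The sketch below is an alternative to the script above, not a replacement for it; the function name reverse_complement is my own, and handling lower-case bases is an addition the original script does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string using tr///.
# Characters outside ACGT/acgt (for example N) are left unchanged.
# (reverse_complement is a hypothetical helper name.)
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;      # reverse the characters of the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;      # swap every base for its complement
    return $rc;
}

print reverse_complement("ATGC"), "\n";   # prints GCAT
```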

My Program

ID  ProgramName                           Time        Explanation
1.  DNASequenceComplement.pl              2010.09.20  Shows a sequence in its original, reverse, and complement order
2.  RepeatElementCoverageInContigs_v1.pl  2010.09.23  Integrates the RepeatMasker output and the depth file into one table

Thursday, September 16, 2010

CPAN

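Description:
Perl is a very flexible programming language, and many existing programs are written in it. One of its strengths is that you can use the many available add-on modules to create new functionality without writing the code yourself.

Programs reuse the code in these modules, so programmers can concentrate on their own code without taking on a large amount of extra work. However, any module a Perl program needs must be installed before the program runs.

Although many Linux vendors build RPM packages for various Perl modules, they do not package every existing module, only the ones the vendor itself requires. This is where the Comprehensive Perl Archive Network (CPAN) comes in.

With the CPAN module you can use Perl itself to install other modules. To do so, you need the name of the module you want to install, for example Time::HiRes or DBI. Typically, the README file of a given Perl program lists the names of the modules it requires.

To use CPAN, become root and run:

# perl -MCPAN -e shell

The first time you do this, you have to configure the CPAN module. Take some time to answer the questions it asks; in most cases pressing [Enter] is enough.

To install a module, type install followed by the module name at the cpan prompt. For example:

cpan> install Time::HiRes

This downloads, compiles, and installs the Time::HiRes Perl module. Check the CPAN website for the complete list of available modules.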
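
Before starting the CPAN shell it can be handy to check whether a module is already installed. A minimal sketch, assuming the helper name module_installed (my own, not part of CPAN or Perl):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# (module_installed is a hypothetical helper name.)
sub module_installed {
    my ($module) = @_;
    # require needs a file path such as Time/HiRes.pm, not Time::HiRes
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ("Time::HiRes", "DBI") {
    print "$module: ", (module_installed($module) ? "installed" : "missing"), "\n";
}
```

Any module reported as missing can then be installed from the cpan prompt as described above.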
