We wrote a couple of simple scripts together to get to grips with Perl.
 
 
===Running a Perl Script===
The first was (save it in a file called Script1.pl in the root of your R drive):
Or we can shell on to Bear and run it there:
Use PuTTY to connect to bear.haas.berkeley.edu (see [[Research Computing At Haas| here]]).
perl Script1.pl
 
 
===Processing Text Data===
 
Next we went to:
 
http://www.contractormisconduct.org/index.cfm/1,73,222,html?CaseID=2
 
And we created a file called Data.txt (saved next to the script) that contained the following:
 
Accenture
Potential Foreign Corrupt Practices Act Violation
Date: 07/01/2003 (Date of Incident Report)
Misconduct Type: Ethics
Enforcement Agency: SEC
Contracting Party: None
Court Type: Administrative
Amount: $0
Disposition: Pending
Synopsis: "As previously reported in July 2003, we became aware of an incident..."
Document(s):
•1. SEC 10-K (p. 34 of 137)
 
We then wrote the following script to process the data:
 
#!/usr/bin/perl -w
#Lines that start with a # are comments that aren't read by the interpreter
 
use strict;
#The strict pragma forces us to declare variables before we use them
 
my @Textfile;
#Declare an array called Textfile
 
open (DATA,"Data.txt") or die "Cannot open Data.txt: $!";
#Open a filehandle on our file, stopping with an error message if that fails
 
while (<DATA>) {
#Read the data from the filehandle, line by line
 
chomp $_;
#$_ is a special variable - here it holds the line just read from the filehandle; chomp removes its trailing newline
 
if (!$_) {next;}
#If the line is false (i.e. empty after chomp, or just "0") move to the next loop iteration
 
my $line = $_;
#Set a lexical (my) variable called line to $_
 
push (@Textfile, $line);
#Push the line onto the Textfile array
}
 
my $Doccell;
#Declare the Doccell variable
 
for (my $i=0; $i<=$#Textfile; $i++) {
#Do a for loop, starting from i=0, going while i is less than or equal to the
#last index of the Textfile array, and incrementing by one each time
 
if ($Textfile[$i]=~/^Document\(s\):/) {$Doccell=$i;}
#Test whether the entry matches a regular expression; if it does, record its index
}
 
my @docs = splice(@Textfile,$Doccell);
#Create a new array by splicing out everything from the index we just found to the end
 
shift @docs;
#Remove the first element of the docs array (the "Document(s):" line itself)
 
my $Firm = shift @Textfile;
#Set Firm equal to the first element of Textfile (which we just removed)
 
my $Violation =shift(@Textfile);
#Set Violation equal to the (new) first element of Textfile (which we just removed)
 
my $Offense={};
#Create a reference to an anonymous hash
 
foreach my $cell (@Textfile) {
#Iterate over Textfile, setting the current element to cell
 
my ($name,@value)=split(":",$cell);
#Split the cell on : (the first piece is the name, the rest go into the value array)
 
my $value=join(":",@value);
#Rejoin the value array on : so that any colons inside the value are preserved
 
$Offense->{$name}=$value;
#Set an entry in the Offense hash
}
 
$Offense->{"DocList"}=\@docs;
#Set the DocList entry in the Offense hash to a reference to the docs array

my $Master=[];
#Create a reference to an anonymous array
 
$Master->[0]={};
#Define an anonymous hash in the zeroth cell of the anonymous array
 
$Master->[0]->{FirmName}=$Firm;
#Set a hash entry
 
$Master->[0]->{Offense}=$Offense;
#Set a hash entry
 
$Master->[0]->{Violation}=$Violation;
#Set a hash entry

open(OUTPUT,">Result.txt") or die "Cannot open Result.txt: $!";
#Open a filehandle for writing (overwrite the file if it exists)
 
print OUTPUT $Master->[0]->{FirmName};
#Print to the output file an entry from the anonymous hash in the anonymous array
 
print OUTPUT "\t";
#Print a tab
 
print OUTPUT $Master->[0]->{Violation}."\t";
#Print another entry with another tab on the end
 
foreach my $key ( sort {$a cmp $b } (keys %{ $Master->[0]->{Offense} } )) {
#Iterate through the hash's keys, in alphabetical order, setting the current key to $key
 
print OUTPUT $Master->[0]->{Offense}->{$key}."\t";
#Print an entry, with a tab
}
 
print OUTPUT "\n";
#Print a new line
 
close OUTPUT;
#Close the output filehandle - this will flush the write buffer
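
The split-then-join step in the script above can also be written with a limit on split. The following is only a minimal sketch of that idea, not the script we wrote: the @lines array and the %offense hash are hypothetical names, and the third sample line is invented to show a value that contains a second colon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#Hypothetical sample lines: the first two are copied from Data.txt,
#the third is invented to show a value containing a colon
my @lines = (
    'Date: 07/01/2003 (Date of Incident Report)',
    'Misconduct Type: Ethics',
    'Synopsis: hypothetical text: with a second colon',
);

my %offense;
foreach my $cell (@lines) {
    #A limit of 2 splits on the first colon only, so later colons stay in the value
    my ($name, $value) = split(/:/, $cell, 2);

    #Skip any line that has no colon at all
    next unless defined $value;

    #Trim the whitespace that follows the colon
    $value =~ s/^\s+//;

    $offense{$name} = $value;
}

#Join the values with tabs, in alphabetical order of their names
my $row = join("\t", map { $offense{$_} } sort keys %offense);
print "$row\n";
```

Unlike the script above, this sketch trims the space after the colon, so the stored values do not begin with a blank.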