Friday, March 12, 2010

The Power of Shell

A friend of mine has acquired the nickname Ludo (from the film Labyrinth), so I decided to find the full extent of Ludo's lines in the film (to post on his Facebook page).

I found the script for Labyrinth on corky.net, and it is in the perfect format for what I was planning:

SARAH: WHERE DO THESE DOORS LEAD?
KNOCKER1: WHAT?
KNOCKER2: SEARCH ME. WE'RE JUST THE KNOCKERS.
SARAH: OH.
LUDO: RRR.
SARAH: HOW DO I GET THROUGH?
KNOCKER1: HUH?
KNOCKER2: KNOCK, AND THE DOOR WILL OPEN.
SARAH: OH.
LUDO: HUH?
SARAH: LUDO.

First I need to download it. I could simply have used Save As from the browser, but I find working in a shell quicker.

[MyComputerName]:Crap [MyUserName]$ wget http://corky.net/scripts/labyrinth.html
--2010-03-12 08:33:10-- http://corky.net/scripts/labyrinth.html
Resolving corky.net (corky.net)... 212.150.53.130
Connecting to corky.net (corky.net)|212.150.53.130|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60960 (60K) [text/html]
Saving to: `labyrinth.html'

100%[======================================>] 60,960 63.8K/s in 0.9s

2010-03-12 08:33:12 (63.8 KB/s) - `labyrinth.html' saved [60960/60960]

[MyComputerName]:Crap [MyUserName]$

'wget' takes a URL and then downloads the file to the current directory. I'm left with the plain text script in the file labyrinth.html.

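Then extracting Ludo's lines was also easy with 'grep':

[MyComputerName]:Crap [MyUserName]$ grep '^LUDO:' labyrinth.html > ludolines.txt
[MyComputerName]:Crap [MyUserName]$

'grep' takes a regular expression and then a file name, and displays all the lines that match the regular expression. Here the '>' redirects that output into ludolines.txt instead of the terminal.

The regular expression reads: '^' anchors the match to the start of the line, so only lines starting with 'LUDO:' are kept. Nothing is needed to cover the rest of the line, because 'grep' prints the whole line when any part of it matches. (In a regular expression '*' means "zero or more of the preceding character", not "anything", so a pattern like ^LUDO* would also match a line starting with 'LUD'.) The quotes stop the shell from treating the pattern as a file name wildcard.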
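To see what the anchor does without touching the real file, here is a small sketch on a few made-up lines. The sample text and the function name ludo_lines are my own inventions for illustration, not part of the script.

```shell
# Hypothetical sample: a few script-style lines, not taken from the real file.
sample='SARAH: LUDO.
LUDO: RRR.
LUDWIG: HELLO.
LUDO: HUH?'

# Keep only lines that begin with the literal text "LUDO:".
ludo_lines() {
  grep '^LUDO:'
}

# Prints the two LUDO lines; "SARAH: LUDO." is skipped because of the ^ anchor.
printf '%s\n' "$sample" | ludo_lines

# Count the lines matched by ^LUDO* instead: the LUDWIG line is included,
# because O* means "zero or more O", so the count is 3.
printf '%s\n' "$sample" | grep -c '^LUDO*'
```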

Unfortunately, due to the limited vocabulary of the character Ludo, there are a lot of repeated lines:

LUDO: SARAH?
LUDO: SARAH?
LUDO: SARAH BACK.
LUDO: GRR!
LUDO: GRR!
LUDO: GRR!
LUDO: HUH?


I decided to remove the duplicates:


[MyComputerName]:Crap [MyUserName]$ sort ludolines.txt | uniq > ludovocab.txt
[MyComputerName]:Crap [MyUserName]$


The 'uniq' command removes adjacent duplicate lines, so to make sure I got all the duplicates out I used 'sort' first to place identical lines together. The '|' character pipes the output of one command into the input of the next.
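A quick sketch of why the 'sort' matters, using a handful of made-up lines (the sample data and the function name dedupe are just for illustration):

```shell
# Hypothetical sample with a duplicate that is not adjacent to its twin.
lines='LUDO: GRR!
LUDO: HUH?
LUDO: GRR!
LUDO: GRR!'

# uniq alone only collapses runs of identical neighbouring lines,
# so the first GRR! survives alongside the later pair.
printf '%s\n' "$lines" | uniq

# Sorting first groups identical lines so uniq can remove them all.
dedupe() {
  sort | uniq
}
printf '%s\n' "$lines" | dedupe

# sort -u gives the same result in a single command.
printf '%s\n' "$lines" | sort -u
```

The first command still prints three lines, while the other two print only the two distinct ones.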

In fact we can chain all the commands together and never need to save anything locally:


[MyComputerName]:Crap [MyUserName]$ wget -qO- http://corky.net/scripts/labyrinth.html | grep '^LUDO:' | sort | uniq
LUDO: AAARGHH!
LUDO: AAH!
LUDO: AH!
LUDO: AH.
LUDO: AH. AH!
LUDO: AH. WHUAH!
LUDO: AH...UH!
LUDO: DOWN.
LUDO: EEOOWW!
LUDO: FRIEND?
LUDO: GOOD-BYE, SARAH.
LUDO: GRR!
LUDO: GRRR!
LUDO: HHRRH.
LUDO: HMM.
LUDO: HMM?
LUDO: HOGGLE AND LUDO FRIENDS.
LUDO: HUH.
LUDO: HUH. HUH.
LUDO: HUH?
LUDO: HUH? SURROUNDED?
LUDO: HUH? WHAT?
LUDO: HUNGRY.
LUDO: Hhrrr...
LUDO: LUDO GET BROTHER.
LUDO: LUDO SCARED.
LUDO: LUDO.
LUDO: LUDO...
LUDO: MMM...
LUDO: NNH!
LUDO: NNNNH!
LUDO: NO.
LUDO: OH!
LUDO: OH.
LUDO: OH. OH...
LUDO: OHH...
LUDO: OHHH.
LUDO: OOH.
LUDO: ROCKS FRIENDS.
LUDO: RRAGHH!
LUDO: RRR.
LUDO: SARAH BACK.
LUDO: SARAH FRIEND. YEAH!
LUDO: SARAH.
LUDO: SARAH?
LUDO: SMELL BAD!
LUDO: SMELL.
LUDO: SURE.
LUDO: THE SMELL!
LUDO: UHRR.
LUDO: UUUH.
LUDO: WHOO!
LUDO: YARGGH!
LUDO: YARRGH!
LUDO: YARRGH! YARRGH!
LUDO: YARRGH.
LUDO: YEAH.
LUDO: YEEIAAHH!
LUDO: YES!
LUDO: YRRURH!
[MyComputerName]:Crap [MyUserName]$
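One possible variation, if you also want to know how often Ludo says each thing: 'uniq -c' prefixes every line with its count, and a second 'sort -rn' puts the most frequent first. This is only a sketch on made-up input, and the function name line_counts is hypothetical.

```shell
# Count how often each distinct line occurs, most frequent first.
# Reads lines on stdin; the function name is made up for this sketch.
line_counts() {
  sort | uniq -c | sort -rn
}

# Hypothetical sample input rather than the real script.
printf '%s\n' \
  'LUDO: GRR!' 'LUDO: SARAH?' 'LUDO: GRR!' \
  'LUDO: GRR!' 'LUDO: SARAH?' 'LUDO: HUH?' | line_counts
```

In the chained command, this would take the place of the final 'sort | uniq'.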


If you have any ways to make the code more efficient, feel free to comment.

Thanks to:

http://corky.net/scripts/labyrinth.html
http://www.commandlinefu.com/commands/view/1913/redirect-wget-output-to-the-terminal-instead-of-a-file
