Garfield Nate

Big Fat Hairy Programmer

Schools Kill Creativity

| Comments

Ken Robinson Says Schools Kill Creativity

First post of my new linklog; I’ll be writing responses to articles from BYU’s CS 404.

There’s another reason, in my opinion, that schools kill creativity: they kill intrinsic motivation. The worst culprit is literature class. I did honors/AP all through high school and haven’t read a classic since (except Ender’s Game, which is a rather new classic). Who wants to read anything when it’s just required reading to make the grade? Kids should learn to enjoy the great works of mankind because they are enriching, instead of just developing an aversion to the requirements of “the system”. Sudbury Schools look nice. No classes!

Refactoring the Soar Lexer


After my last writeup on how Soar parses Soar code, I decided to dive in and try to refactor it, starting with the lexer (and here’s the resulting PR). I haven’t worked with C/C++ in a while, and a good part of the code I changed was close to my own age, so it was an interesting learning experience for me.

The lexer used global state stored in an agent to keep track of the current input character and lexeme, as well as to remember line and column numbers. The goal of the refactoring was, of course, to change that, but 30 commits later I still have not completely separated the agent from the lexer, because all printing logic (warnings, etc.) requires an agent. The result is that after every change I have to build the Tests target and watch the entire project be rebuilt because of the various interdependencies. Also, the lexer has no tests of its own, so the full test suite must be run to find possible problems. So every time I make a change and need to see if it works, I kick off a long compile/test process and watch more of some show while waiting for it to finish (you’ve probably seen this comic before). The global state is not only difficult to work with; it also slows development to a crawl.

The lexer had a lot of dead code which told me the story of how it used to work. Originally it would read Soar code straight from a file; later it was retrofitted to take production strings directly as an “alternate”. There must have been something off about the string input, however, because besides setting alternate_input_string, alternate_input_suffix was also required, and always needed to be set to ") ". Once the production string was finished, the lexer would read the suffix. The original reason for this was long gone, but the parser still expected productions to end with an R_PAREN lexeme, as did other lexer clients. Later developers copied the incantation to lex a string, not knowing why it was needed, and the refactored jSoar parser implementation has the check for the parenthesis commented and marked “this makes no sense to me”. So the current functionality started out as an alternative for testing or something and then became the only use case, leaving dead code everywhere.

Some of the updating work I did had already been done in the [9.5] branch, but it was still interesting. There were workarounds for the old Think C and Symantec C++ compilers, guarded by THINK_C and __SC__ #ifdefs. I had no idea what they were and found an excellent compiler macros list project on SourceForge. C had no boolean type at the time, either, so programmers used typedef char Bool and defined TRUE and FALSE manually; part of updating the code to C++ is changing those to real bool types.

The value of making small, working commits was reinforced when my initial pull request broke the automated build when --no-scu was set. This setting causes each .cpp file to be compiled separately, ensuring that the build fails if a #include directive was forgotten somewhere. I was able to bisect the Git history and discover the problem pretty quickly.

Nuggets:

  • Alex set up Jenkins to test pull requests, so each time I pushed I got live feedback from a fresh build.
  • The legacy spaghetti code wasn’t too difficult to untangle, and I’m confident I can do a lot more.

New TODOs:

  • A printer or printer manager class to decouple the agent code from everything else in Soar (jSoar is way ahead on that one).
  • Unit tests for the lexer, preferably data-driven to reduce the amount of required recompilation

Random Slice of Culture: Unicode Proposals


So a couple of weeks ago I had a casual use for hentaigana and went looking to see if there were Unicode code points for it. Alas, there were none, but through Wikipedia I found that there was a proposal to include them. The proposal document includes relevant uses such as transcribing old documents, restaurant names, and documents on calligraphy. Unfortunately the proposal hasn’t been accepted yet, but it did lead to an interesting discovery.

The containing directory has all of the Unicode proposals, some of them labeled urgent because the characters have come into common use. For example, http://std.dkuug.dk/JTC1/SC2/WG2/docs/n4584A.pdf calls for the addition of several Chinese characters for scientific use, such as 鎶, which is used for the new(ish) element Copernicium, along with some other weird-looking ones. http://std.dkuug.dk/JTC1/SC2/WG2/docs/n4583.pdf is a request for special Chinese characters that were used by a Russian mission in Beijing to transcribe Slavonic sounds. http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3734.pdf requests an Arabic character used in Urdu for a specific Indian calendar.

Also, apparently two years after the hentaigana proposal was shelved, the Japanese government sent word saying they actually needed it to encode names of some people born before 1948. Now that’s interesting! I’ve never met someone with a hentaigana name.

Each of these comes with cultural background, descriptions of the required glyphs, and pictures or screenshots of the glyphs in the wild.

Code Reading: pl2bat.pl


When you install a Perl distribution from CPAN that comes with an executable script, Perl adds it to your path so it can be treated like any other command line program. On a *nix system, all that needs to be done for this to work is to mark the script as executable. Windows doesn’t have executable script files with shebang lines like *nix does, however, so Perl generates a batch file.

Let’s see how this works. First, let’s pick a distribution that comes with an executable script. The App:: namespace is a great place to look for that.

cpan App::Cleo
...
Running make install
Installing C:\dev\strawberry-perl-5.18.2.2-64bit-portable\perl\site\lib\App\Cleo.pm
Installing C:\dev\strawberry-perl-5.18.2.2-64bit-portable\perl\site\bin\cleo
Installing C:\dev\strawberry-perl-5.18.2.2-64bit-portable\perl\site\bin\cleo.bat
...

Hmm, cleo.bat. That’s not in the distribution! Let’s take a look:

@rem = '--*-Perl-*--
@echo off
if "%OS%" == "Windows_NT" goto WinNT
IF EXIST "%~dp0perl.exe" (
"%~dp0perl.exe" -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
) ELSE IF EXIST "%~dp0..\..\bin\perl.exe" (
"%~dp0..\..\bin\perl.exe" -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
) ELSE (
perl -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
)

...

#!/usr/bin/env perl
#line 29

use strict;
use warnings;
use App::Cleo;

our $VERSION = 0.003;
...

It seems to be a batch script with the entirety of the cleo script pasted in at the bottom. How does that work? Well, first there’s a little bit of logic to differentiate requirements on Windows NT versus other Windows systems, and some logic to deal with Perl being located in different places (relative to %~dp0, the directory containing the script). %0 is the name of the script being run, and Perl is invoked on it with different arguments. The other %number arguments are the script arguments, which are passed along to the script once again. So this is a batch script that passes itself to Perl? But what’s Perl going to do with all of the batch garbage that comes before the actual Perl script?

The magic here is in the -x switch. From perlrun:

tells Perl that the program is embedded in a larger chunk of unrelated text, such as in a mail message. Leading garbage will be discarded until the first line that starts with #! and contains the string "perl". Any meaningful switches on that line will be applied.

Ooh! That’s an interesting feature. I wonder if anyone still uses it for emailed scripts.

So the batch script calls Perl on itself, and Perl skips all of the batch syntax and goes straight to the shebang line (#!/usr/bin/env perl) and runs the original cleo Perl script.

The mechanism used to create this is pl2bat.pl, and you’ll find interesting history, motivation, and gotchas in its documentation. Thanks to pl2bat, CPAN authors’ scripts are installed on Windows and added to the path. I can just open cmd and type cleo to start the app.

Code Reading: The Soar Parser


To investigate the possibility of an upcoming project, I’ve been wanting to know how Soar parses productions, and whether there’s any way to retrofit it to produce a parser usable by various IDEs. I have already tried twice to make my own parser for Soar code, and it just seems to be difficult to imitate the real thing. If you’d like to follow along, you can view or download the code on GitHub:

Disclaimer: My “critiques” of the code are areas that I think can use some contributions or TLC. Soar was written and is maintained by programmers and researchers far greater than I, and I am not dismissing the hard work and craft that went into its construction. The Soar code has evolved over 20-30 years in an academic environment, and I expect it to have a few rough edges.

Step 0: Some Background Knowledge

OK, so I guess some background knowledge would be useful for readers. Soar is a cognitive architecture used for creating agents: “cognitive” meaning it provides the basic faculties required for gaining and using knowledge, and “agents” meaning programs that make choices to accomplish a goal of some kind. Soar has its own programming language, and instead of the familiar loops and functions it has productions. Soar code looks something like this:

watch 5
source somefile.txt
sp {some*production
    (state <s> ^foo |want greeting|)
-->
    (<s> ^operator <o>)
    (<o> ^name hello-world ^greeting |hello world|)
}

The first two lines are commands; the first makes the command line interface much more verbose, and the second loads another Soar file. The sp {...} is a Soar production, which is where the bulk of the language features are. Productions do all the work (thus Soar is called a production system). Productions match the program state (working memory) and make changes to it if the match is successful. The production above matches a state that has a foo attribute with a value of want greeting; if that succeeds, it creates a special operator attribute, which in turn has two more attributes. Working memory is organized like a network of attributes and values, and Soar cycles over the productions, changing memory and working towards a goal. This is accomplished efficiently through the RETE algorithm, which I’d like to study another time.

That might all still be unclear, but at least understand that Soar is cool because it allows you to make programs that play Mario, fly planes, or talk (all of those with caveats, of course; work is ongoing).

Step 1: Try to Build It

The first thing I wanted was to build Soar on my own computer. After all, if I can’t even build it, what’s the point in trying to modify it? I’ve always had terrible experiences trying to build C/C++ projects on Windows. This was a little better because Windows is supported and a release is provided regularly, but there were still hiccups.

The build instructions are here. To do a full build you need Python, SWIG, Tcl, and a C++ compiler toolchain. The build is done via Scons, but batch and sh files are used to do initial checking of the available tools (and it’s easier to double-click a batch file on Windows than to call Python). My first attempt ended in an uncaught error dumping messages to the screen because I had not used the Visual Studio Prompt. Once I figured this out (and improved the message for future users) the build died again trying to generate C# bindings. This was reported and fixed, after which I was able to do a partial build. I had some more trouble because I was using the wrong MSVC toolchain(!), but now it builds without any issues.

Overall, the build system is very nice. It’s hands-off, and just works. The maintainers are very responsive and dedicated to a working cross-platform build. I do wish there were more comments in the build scripts, since I have never used Scons and had some trouble looking through it. Also it would be nice to have the build instructions in the repository instead of on a separate website.

Step 2: How does Soar parse Soar code?

The answer to this turned out to be more interesting than I expected. My own attempts at writing a single Soar parser didn’t work very well because Soar actually has two parsers! Well, actually it has 76 parsers: one to split commands into constituents, and then one for each of the 75 possible Soar commands. sp is just one of many commands.

Let’s start by looking at what happens when you source a file. The code for the source command is in Core/CLI/src/cli_source.cpp. DoSource (line 80) is called, and after loading the input file into memory and doing some error checking and logging, it calls Source (line 212). Only the first four lines matter for understanding the parser:

bool CommandLineInterface::Source(const char* buffer, bool printFileStack)
{
    soar::tokenizer tokenizer;
    tokenizer.set_handler(&m_Parser);
    if (tokenizer.evaluate(buffer))
        return true;

It creates a new tokenizer and sets its handler to m_Parser, the main CLI parser available from cli_CommandLineInterface.h. The parser can be passed to set_handler because it implements tokenizer_callback (declared in tokenizer.h) by having the handle_command method:

        /**
         * Implement to handle commands. The words of the command are in the
         * passed argv vector. The first entry in the vector is the command.
         * The vector is guaranteed to never be empty, though the first command
         * could be.
         * @return true if the command was ok, or false if there is an error.
         *         Returning false will stop parsing and cause
         *         tokenizer::evaluate to return false.
         */
        virtual bool handle_command(std::vector<std::string>& argv) = 0;

Next, tokenizer.h tells me why I got it wrong when I tried to make my own Soar parser:

 /**
     * Essentially implements a simple Tcl parser, with some exceptions.
     *
     * Takes a string and farily efficiently converts it in to a series of
     * callbacks with arguments separated in to a vector of strings (what Tcl
     * refers to as "words").
     *

So I had it backwards, building a Soar production parser and then adding methods to parse other commands as an afterthought. This “tokenizer” parses Tcl commands and enforces rules on the individual words: for instance, it checks that curly braces match within words, and it follows the escaping and quoting rules both inside and outside of quoted and curly-braced sections. This would certainly make some parts of a production parser simpler!

So tokenizer::evaluate parses individual Tcl commands and sends them to the handle_command routine of /Core/CLI/src/cli_Parser.h. This then finds a ParserCommand object using a prefix lookup on the first word of the command; i.e. wa gives you the watch command, pr gives you the print command, etc. The prefix search is basically this:

given a command string $str
    make a list of all commands that start with the same letter as $str
    for each next letter in $str
        remove the commands that don't have the same letter at the same spot
    return the list of matching commands

If more than one matching command is found, the input is ambiguous, so a warning is printed and no command is executed. Hmm, there’s a TODO note there about using a simpler lookup mechanism.

How is the list of commands populated in the first place? Using the parser’s AddCommand method. All of the normal Soar commands are added at runtime in cli_CommandLineInterface.cpp, and there are some other examples of AddCommand in the test code.

    m_Parser.AddCommand(new cli::AddWMECommand(*this));
    m_Parser.AddCommand(new cli::AliasCommand(*this));
    m_Parser.AddCommand(new cli::AllocateCommand(*this));
    m_Parser.AddCommand(new cli::BreakCommand(*this));
    m_Parser.AddCommand(new cli::CaptureInputCommand(*this));
    ...

The commands themselves are ParserCommand objects and are all declared in cli_Commands.h. The structure of a command class is given in cli_parser.h:

    class ParserCommand
    {
    public:
        virtual ~ParserCommand() {};
        virtual const char* GetString() const = 0;
        virtual const char* GetSyntax() const = 0;
        virtual bool Parse(std::vector<std::string>& argv) = 0;
    };

GetString is the name of the command and is used as the first word in the command invocation (watch, sp, etc.). This is used by the prefix lookup code discussed above. GetSyntax gives a usage statement in case the user invokes the command incorrectly. Parse is the meat of the command; it takes the list of command words and performs the action specified by them.

Although the Parse method could directly contain the command actions, all of the Parse implementations simply parse the command and then call DoXYZ. These methods are declared in cli_Cli.h and are implemented in their own files (cli_source.cpp, cli_break.cpp, etc.). Here is the implementation of the sp command as an example:

   class SPCommand : public cli::ParserCommand
    {
    public:
        SPCommand(cli::Cli& cli) : cli(cli), ParserCommand() {}
        virtual ~SPCommand() {}
        virtual const char* GetString() const { return "sp"; }
        virtual const char* GetSyntax() const
        {
            return
                "Syntax: sp {production_body}";
        }

        virtual bool Parse(std::vector< std::string >&argv)
        {
            // One argument (the stuff in the brackets, minus the brackets)
            if (argv.size() < 2)
                return cli.SetError(GetSyntax());
            if (argv.size() > 2)
                return cli.SetError(GetSyntax());

            return cli.DoSP(argv[1]);
        }

    private:
        cli::Cli& cli;

        SPCommand& operator=(const SPCommand&);
    };

The DoSP command is in cli_sp.cpp. Here, some craziness comes out. We find soarAlternateInput, which has no documentation and relates to functionality that is rather unclear. Then we have the use of a global agent provided by the global m_pAgentSML of cli_CommandLineInterface.h.

 agent* agnt = m_pAgentSML->GetSoarAgent();
    soarAlternateInput( agnt, productionString.c_str(), const_cast<char*>(") "), true );
    set_lexer_allow_ids( agnt, false );
    get_lexeme( agnt );

    production* p;
    unsigned char rete_addition_result = 0;
    p = parse_production( agnt, &rete_addition_result );

    set_lexer_allow_ids( agnt, true );
    soarAlternateInput( agnt, 0, 0, true );

The lexer and parser files in /Core/SoarKernel/src work together to load a production, with the parser repeatedly calling get_lexeme (line 747 of lexer.cpp) to find the next token. After checking for comments and doing some other stuff I don’t get yet (fake_rparen_at_eol), it calls a lexing method based on the current character in the buffer, using lexer_routines as a dispatch table.

  record_position_of_start_of_lexeme(thisAgent);
  if (thisAgent->current_char!=EOF)
    (*(lexer_routines[static_cast<unsigned char>(thisAgent->current_char)]))(thisAgent);
  else
    lex_eof(thisAgent);

The functionality of the tokenizer and parser is nicely separated, meaning that production tokenizing can be done context-free, without the parser sharing knowledge with the tokenizer. However, things are actually crazier than that, because the lexer is tied intimately to an agent:

  /* ----------------------- Lexer stuff -------------------------- */

  lexer_source_file * current_file; /* file we're currently reading */
  int                 current_char; /* holds current input character */
  struct lexeme_info  lexeme;       /* holds current lexeme */
  Bool                print_prompt_flag;

The get_lexeme method requires an agent as an argument, even though the agent was already provided via init_lexer. During tokenization, get_lexeme sets the lexeme and current_char fields on the agent. So even though there’s a nice separation of parser and lexer/tokenizer, there is potential for the parser to change the state of the lexer, the input buffer, etc. It doesn’t look like that happens, but it’s not a possibility you want left open. You also have to call get_lexeme before calling parse_production, and parse_production adds the input production to the RETE network directly instead of returning a parsed production. There’s severe coupling between the lexer, the agent, and the parser. This could probably be remedied fairly easily: ideally the lexer would need only text and would return a stream of tokens with no side effects, while the parser would instantiate the lexer and would have access to an agent for the RHS functions.

Step 3: How does gp work?

gp is a command that generates new Soar productions by permuting values inside of square brackets in an otherwise normal-looking production. gp was an added mystery to me because its syntax was almost the same as that for sp, but it allowed syntactic variations deep in the parse tree. Here is an example gp statement:

gp {gp*test1
(state <s> ^operator <o> +
           ^someflag [true false])   # some normal values
(<o> ^name foo
     ^att [val1 1.3 |another val|])  # a value with a space, in pipes
-->
(<s> ^operator <o> = 5)
}

This turned out to be simple once I understood CLI parsing. DoGP is located in /Core/CLI/src/cli_gp.cpp. It simply looks for the special [] syntax in a string and generates new strings to be loaded as productions by DoSP. This is simple and easy, but means that there may not be an easy way to do syntax coloring for it.

    if(!DoSP(generatedProduction))
    {
        return false;
    }

Conclusions

Reading the Soar parser code is an awesome exercise that I plan to continue. I probably won’t write up another post as detailed as this one, instead writing a code guide once I’ve grokked it well enough. That will probably be a better contribution to the writers, anyway.

Reading it was actually an interesting bit of archaeology, too. There are notes sprinkled about with author initials and dates going back to at least 1994; notes like that would not have been necessary if a version control system had been available or in use. There are also bug numbers referring to a long-gone database, references to code that doesn’t exist anymore, and comments commenting on other comments. I wish the project history from before 2010 were available.

The design of the Soar parser makes a lot of sense to me now. A Tcl parser parses commands, verifying quotes, escaping, comments and brace or quote matching; a command object is located and is given the command arguments; the command object then parses the arguments and takes whatever action is required. It should be a simple process to create and register new commands in Soar.

The parser and tokenizer for Soar productions are not difficult to understand as far as their operation goes. However, the strong coupling and use of global variables need improvement, especially if parallelization is a concern.

Low-Hanging Fruits

  • Rename the parsers and tokenizers so they don’t get confused:
    • Core/SoarKernel/src/lexer -> Core/SoarKernel/src/production_tokenizer
    • Core/SoarKernel/src/parser -> Core/SoarKernel/src/production_parser
    • Core/shared/tokenizer -> Core/CLI/src/cli_tokenizer
    • Core/CLI/src/cli_parser -> this one is fine
  • add a project README containing build instructions
  • consistency
    • choose tabs or spaces, but not both!
    • put declarations at either top or bottom of file
    • capitalize or don’t capitalize names; camelCase or snake_case
  • find and remove commented-out code
  • read and grok the code, then add comments about what it does

A less low-hanging fruit would be trying to address some of the many TODO/BUGBUG/FIXME/need notes sprinkled throughout. Interestingly, searching FIXME gives lots of results in the included Scons distribution, as well!

Using Octopress- One More Thing


“Aiiiiyaaaa! Jacky! You did not install Ruuuubyyy. You need that to write your blog.”

“I already install Ruby, uncle, but it still does not work. rake generate creates empty HTML file! Look, it’s broken!”

>rake generate
Generating... C:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/posix-spawn-0.3.8/lib/posix/spawn.rb:162: warning: cannot close fd before spawn
'which' is not recognized as an internal or external command, operable program or batch file.

Swat!

“Ow! What was that for?”

“These are not eeeeerroooorrs, just warnings! Your site is empty because you forgot to install Pyyyythoooon.”

“What? Python? But you said Ruby!”

Swat! “Ow!”

“You need Ruby and Python to use coooode hiiiighliiiighting on your blooooog!”

“What!? I have to install Ruby and Python?! But I don’t use either of those!”

“One mooooore thing.” Swat!

“Ow! What was that for?!”

Uncle swats Jackie

“For using Windooooows!”


So, if you tried using Octopress’s neat backtick codeblock syntax to display code on your blog, but you don’t have Python installed, you will get a mostly blank website as the Pygments plugin silently fails and messes up the whole generate process. The warnings can safely be ignored (and will hopefully be eliminated soon). I put in a request for a message of some kind so that the process wouldn’t fail so silently.

Good times. Those are all on Netflix, by the way.

July 4, 2014: Internet and a Bike Ride


Maybe I’ll manage to keep a somewhat OK journal if I write it on my blog…

Dear journal-that-everybody-can-read,

Today we finally got internet in our apartment, which I may explain another time. A guy showed up at 9 o'clock sharp and installed it, after which I had to call the support center for help because their installation disk didn’t work (because I have English Windows?). The internet is blazing fast, and now I can work on online classes without taking a trip to Erika’s work.

I needed to send a rental phone back to SoftBank, so I took a ride to the place on the main shopping street (商店街) that had a Kuro Neko sign outside. The shop serves as a drop-off point for packages, but mainly they sell hanko and religious items. I filled out the form and we talked for a while. I was looking for the translated version of Glenn as a hanko (small valley; 小谷) and she showed me the ones that her father carved. The light wood is coated with black ink before being carved and the dark wood with red, so that it is easy to see what shape has been carved. He must have carved thousands of hanko, but apparently his eyesight was still good in his 80’s. She talked about wanting to learn English, and having English-speaking friends.

Next I asked about the objects in her shop and she taught me about ihai (位牌), a Buddhist memorial tablet. When someone dies, the local priest(s) and family work together to make a posthumous name for the deceased. The name might have something about the person’s hobby or work. The piece she showed me had “ocean” (海) because the man was a fisherman, and because the wife was so diligent about paying regular respects at the shrine she was able to get her husband a rather good name which included the character “弘”. The character is nice because it immediately brings to mind the name “弘法大師”, the founder of the Shingon Buddhist sect. The top of the tablet has a single Sanskrit character (अ) representing the Shingon sect inscribed on it. (Each sect has its own bonji (梵字); the characters that go with each sect are given here.) The back of the tablet bears the deceased’s original name and death date, in esoteric(ish) Buddhist style. The tablet is placed in the home’s Buddhist altar (仏壇) for 50 years. The family meets on certain years to commemorate the death, and after the 50th year there is a final celebration and the name is inscribed in a special register book (帳面) instead. The ihai is burned ceremonially at the grave site, and the register that contains the names of many deceased relatives is kept in the altar ever after. She showed me the one for her family, and it had lots of names in it, some of them with old kana characters that haven’t been used for a long time (specifically a hentaigana form for “hana” using characters based on 者 and 奈). We exchanged names (笠原秀佳) and I told her I’d bring back my wife as a possible study mate.

Then I explored the town a little bit. There’s a park on top of a hill on the edge of town, but the steps to the top are full of weeds. There’s also a monument to someone, which I’ll have to go back and read when I have my dictionary. I found 4-8 small temple/shrines along the mountainside, and one seemed to be abandoned and overgrown with weeds. On my way back an old lady stopped me to sell me some cucumbers (きゅうり) she had just picked. Her skin was dark from working in the field, and she stuttered while she talked so she had to repeat some things for me. She gave me several handfuls of shiso leaves. “You’re married, right? Good! Have her cook these up for you.” We exchanged names (石岡) before parting.

Next I found a community center. They might only have activities for children and the elderly, but my interest was piqued because they have art, cooking, and several other categories. I picked up a copy of a city newsletter, and what I thought was a culturally interesting ad for consultation regarding being Ainu (mistreatment, racism, invasion of privacy), with a special guarantee that all communication would be completely secret. There was also a big sign outside with a list of foods that are unique to Kasaoka. I think I’ll go on a treasure hunt to try all of them!

Tonight Corey and Chun Mi picked up Bentley and me and we all went to Ooshima elementary (大島小学校) to play badminton. I think the Japanese players were all elementary school teachers. One of them has been playing badminton every Friday there for the last 20 years! It was my first time playing, so Bentley and Chun Mi helped me figure it out, and we formed teams to play until about 10:30. The gym was pretty warm and muggy, and so of course I was soaked.

Whew. I’m hungry!

Preparation: The Key to a Less Painful Windows Reinstall


Recently I decided to work on an easy ticket in StepMania to get into the source code. It turned out that I (gasp!) actually needed to build the project on my computer, but of course I was missing required libraries: DirectX 9 and the DirectX SDK. No matter what I tried, neither of these would successfully install on my computer, and I decided that the only way to install it would be to reinstall Windows (which worked, but ugh!). Usually reinstalling Windows is a terrible process that can take days and leave you lost on your own computer, which is now missing all of your favorite programs and customizations. This time, I thought, I should make it as painless as possible.

I should note that many Windows users can just pop in the restore disk they got from their computer manufacturer and be happy with the results. Users like myself, however, hate the extra programs and files that come from the manufacturer, opting instead for a vanilla install disk. If you’re like me, then you’ve also built up a library of programs and configurations over time, and reinstalling Windows threatens to destroy your precious setup.

The key to a less painful reinstall is preparation. The following will make your life a lot easier:

Use the Cloud

“The cloud”, of course, meaning some service that will store your files for you. I prefer not to have the entirety of my hard drive synced with an online service, so I cannot make recommendations there. But the most important files, my current coding projects, are all on GitHub, so I don’t have to worry about losing them. When I do manual backups, I don’t have to bother with local copies of GitHub projects. Some of my application settings are synced through Dropbox as well (see this tutorial to do it for Sublime Text), along with many documents I feel are too important to have just one copy of.

Organize Your Files

If you’re like me, you might try to stay organized but fail to do so completely. Reinstall time is a good opportunity to clean up unneeded files and to gather what’s left into centralized directories. I keep archived work (school, projects, research, work) in a single place. This turned out to be a good opportunity to decide that several projects were done and should be archived. Make your files as clean as possible before reinstalling so that backup is as simple as possible.

Backup on a Hard Drive Partition

Reinstalling Windows only wipes out the C: drive, so if you have another partition (D: on my computer) this will be saved during reinstall. This is useful for backing up large amounts of data. It is much faster to move files between partitions than it is to move them between completely different drives. I recommend you make a large partition for keeping permanent files (archived work, pictures, videos) that you need ready access to but don’t necessarily make constant use of. Ones you don’t need ready access to can be stored on external drives and put away somewhere safe.

Files that you do need constant access to are usually in My Documents, on the Desktop, or elsewhere in the C: drive. Back up these files in your other partition and they will be easy to replace after the reinstall.
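As a rough sketch, a couple of robocopy lines can mirror those folders to the other partition before the wipe (the D:\Backup target here is hypothetical; point it wherever your backup partition lives):

```bat
rem Hypothetical backup script: adjust source folders and the D:\Backup
rem target to match your own layout before running.
rem /E copies all subdirectories, including empty ones.
robocopy "%USERPROFILE%\Documents" "D:\Backup\Documents" /E
robocopy "%USERPROFILE%\Desktop" "D:\Backup\Desktop" /E
```

Saving this as a .bat file also doubles as a record of what you backed up, which is handy when you restore.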

Download Drivers Before Reinstalling

It is always so nice when your computer restarts with a brand new copy of Windows. Like a newborn, so clean and shiny and unsoiled by the world. But then you realize that it is also as helpless as a baby. The screen resolution is off, the sound doesn’t work, the touchpad is jumpy, etc., etc. Navigating the internet without good drivers is a pain in the butt, so do yourself a favor and download and save all of the driver installers before you reinstall Windows.

Find Chocolatey Packages for Your Favorite Programs

Usually after reinstalling Windows you end up reinstalling all of your programs as you need them for the next several weeks or months. This is time-consuming and distracting. Chocolatey, a package manager for Windows, can do the tedious part for you. After you install Chocolatey, find the names of the packages containing the programs you like on the web site. Then instead of downloading and running an installer, simply type cinst packageName into your cmd and let Chocolatey do the boring part!

For example, I know I will need Evernote again after the reinstall. Instead of installing it manually, I can type cinst Evernote5 into my cmd and be done with it!

Before you delete your Windows partition, look at all of the programs you have installed and decide which ones you actually want to reinstall later. That’s right, I know you have 5 bajillion programs that you never use. Don’t worry, me too. Save the names of Chocolatey packages you want and the names of other programs that don’t have an available Chocolatey package into a single file. This will easily save you hours of work later.
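To jog your memory while making that list, you can also dump what Windows itself knows is installed to your backup partition first (the path is hypothetical, and note that wmic only sees MSI-based installs, so skim Programs and Features too):

```bat
rem Save a list of installed programs for reference after the reinstall.
rem D:\Backup is a hypothetical location; use your own backup partition.
wmic product get name,version > D:\Backup\installed_programs.txt
```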

Save Product Keys with ProduKey

ProduKey comes as a Chocolatey package. When run, it gives you the product keys used for both Windows and Microsoft Office. These can sometimes be hard to track down, and retrieving and saving them this way before reinstall will save time.

Create an Executable Master Plan

As if each Chocolatey command didn’t cut out enough work, why not put all of the Chocolatey install commands into a batch script and have them all run while we go eat lunch or something? A simple batch script will work fine:

cinst GoogleChrome
cinst Evernote5
cinst dropbox
cinst adobereader

To make this file really useful, however, you should comment it with your complete reinstall plan. rem is how you do comments in batch scripts:

rem Windows post-installation setup
rem You must download and install Chocolatey before running this: chocolatey.com
rem Before reinstalling:
rem   1. backup xyz directory
rem   2. download and save drivers in external hard drive
rem   3. save registration keys using ProduKey

rem Fun Stuff
cinst GoogleChrome
cinst Evernote5
cinst dropbox

rem General Development
cinst kdiff3
cinst git
cinst git-credential-winstore
cinst sublimetext2

rem Java Development
cinst jdk8
cinst eclipse-standard-kepler
cinst ant
cinst maven
cinst gradle

rem Configure Git
git config --global user.name "Nathan Glenn"
git config --global user.email "garfieldnate@gmail.com"

echo "Now install KeyTweak by hand"

In the comments or echo statements you can record all of the programs that need to be installed manually, all of the settings you should copy or set manually, all of the files that you need to restore, etc. Usually trying to remember what your computer needs can take a lot of time and some stress (“I swear I’m forgetting something!”), so saving a master plan is really a must.

You can view or fork mine on GitHub.

Conclusion

This time around I had the least painful Windows reinstall ever. It still took some time, but I was able to cut out the guesswork and most of the manual installing. Taking a page out of the dev-ops book, an executable master plan really simplified the whole process. If you have lots of computers that need to be configured quickly and automatically, then take a look at BoxStarter.

List Assignment in Scalar Context

| Comments

(Cross-posted on blogs.perl.org)

This week I received some special help on SO in understanding how the goatse operator works. I was very thankful for everyone’s help. These two articles were also very helpful and I recommend reading them.

Part of my confusion over the goatse operator was not knowing the difference between the list and scalar assignment operators, both of which are indicated by ‘=’. Further confusing is the fact that each can be used in either scalar or list context, so you can have list assignment in scalar context or scalar assignment in list context.

The type of assignment is determined by what is being assigned to. As ikegami says, assignment to an aggregate is a list assignment, aggregate meaning an array, a hash, a parenthetical expression, or a my/our/local variable declared with parens.

The context of an assignment operator will really only matter when you are storing or checking the return value. You can store the value of an assignment operator by using another assignment operator: blah1 = blah2 = blah3, where blah1 is the value returned by assigning blah3 to blah2. The value gets checked in other contexts too, like inside a control structure condition: if(my $line = <>), etc. Here are examples for each combination of context and assignment operator:

# scalar assignment in scalar context
$thing = ($foo = 'bar'); # assignment returns $foo as lvalue
say $thing; # bar
# scalar assignment in list context
($thing) = ($foo = 'bar'); #assignment returns ($foo), $foo is lvalue
say $thing; # bar
# list assignment in scalar context;
# assignment returns number of items in RHS of list assignment
$thing = (($foo, $bar) = qw(foo bar));
say $thing; # 2
$thing = (() = qw(foo bar));
say $thing; # 2
$thing = () = qw(foo bar);
say $thing; # 2
# list assignment in list context
# assignment returns LHS list as lvalues
($thing) = (($foo, $bar) = qw(foo bar));
say $thing; # foo
($thing) = (() = qw(foo bar));
say $thing; # nothing ($thing is undef)

That third one is of course the goatse operator. By the way, for the record, I totally think it looks more like Saturn, though my wife disagrees and everyone seems to call it goatse. Anyway, though list assignment in scalar context is generally the rarest combination, there are other occurrences. Ysth mentions the each operator inside of a while loop:

while (my ($key, $value) = each %hash)

The aggregate on the left makes this list assignment, and while makes it scalar context. Once the hash is out of keys, each returns () so that the assignment operator returns 0, finishing the while loop.
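A complete, minimal sketch of that pattern (the hash contents here are made up):

```perl
use strict;
use warnings;
use feature 'say';

my %hash = (apple => 'red', banana => 'yellow');

# List assignment in scalar context: while() checks the number of
# items each() returned, not the values themselves.
while (my ($key, $value) = each %hash) {
    say "$key is $value";
}
# Once the hash runs out of pairs, each() returns the empty list,
# the assignment returns 0, and the loop terminates.
```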

I was pretty happy to finally understand this area I never quite realized I didn’t understand (though someone might still point out I don’t know what I’m talking about, as seems to be common with this subject). Today, though, I thought of one more usage of list assignment in scalar context that is probably used erroneously fairly often: quick and dirty parameter checking:

my ($input, $output) = @ARGV or die 'Usage: script <input> <output>';

I always thought that the assignment would return $output, probably by analogy with comma expression assignment to a scalar ($stuff = qw(foo bar)). However, if the user fails to provide a second parameter, the error would not be caught. This assignment will return the number of elements in @ARGV, which could be 1 instead of the required 2. So this use is only correct when unpacking @_ or @ARGV and expecting exactly one variable:

my ($input) = @ARGV or die 'Usage: script <input>';
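To see why the two-variable version is unsafe, here is a small sketch simulating a user who forgets the second argument (the filename is made up):

```perl
use strict;
use warnings;
use feature 'say';

# Simulate the user supplying only one of two required arguments.
local @ARGV = ('input.txt');

# List assignment in scalar context returns the number of elements
# on the right-hand side: 1 here, which is true, so an 'or die'
# attached to this assignment would never fire.
my $count = (my ($input, $output) = @ARGV);
say $count;                                    # 1
say defined $output ? 'defined' : '$output is undef';
```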

This is probably obvious to Perl old-timers, but to me it was a revelation. And it doesn’t look like I’m the only one, either. Grepping CPAN for assignment of an array to a parenthetical with ‘or’ after it turns up many misuses here.

Packaging XML::LibXML With PAR Packer on Windows

| Comments

PAR Packer is an excellent utility for delivering your Perl scripts as standalone executables. A standalone executable is highly desired in, for example, a corporate environment where everyone needs a program you wrote but you can’t expect anyone to learn how to run Perl programs.

A recent requirement at $work was for a standalone executable. Originally, I was supposed to let my coworker work his magic (and his ActiveState PerlPacker license), but the client required an all-open-source solution. Thus I turned to PAR Packer and its pp utility.

So far, the most difficult aspect of using pp is that it doesn’t detect all dependencies. It requires the user to explicitly list many required DLL’s. I needed to list DLL’s for two libraries: Wx and XML::LibXML.

Creating Wx apps with pp is a solved problem: wxpar, bundled with Wx::Perl::Packager, is a pp wrapper and adds all of the required Wx DLL’s.

Getting it to work with XML::LibXML required some trial and error. I would create the executable, move it to another computer without Perl or C, run it from the command line (clicking the file hid certain error messages), and write down the name of the library that was missing. It turned out that three DLL’s needed to be explicitly added: libxml2-2__.dll, libiconv-2__.dll and libz__.dll. On my computer these were located in C:\strawberry\c\bin. So, the final command I used to build my application was:

wxpar -o MyApp.exe -I lib -l C:/strawberry/c/bin/libxml2-2__.dll -l C:/strawberry/c/bin/libiconv-2__.dll -l C:/strawberry/c/bin/libz__.dll MyApp.pl

Is there a simpler way to do this? What’s with all the underscores? Comments and questions welcome below.