Testing Infinite Loops

I had an interesting problem today – well, interesting for me.

While working through adding tests to a code-base woefully bereft of any such safety nets, I happened upon a method that had an obvious risk of getting into an infinite loop (in an extreme edge case).

The method’s job is to return three random letters from a ‘secret word’ that a user has previously stored, and the position they were taken from in the word. The goal being to augment a login process.

  # general sense of the existing method
  def sample_of_secret_word_characters
    chosen_characters = {}

    until chosen_characters.length == 3 do
      index = rand(secret_word_characters.length)

      if !chosen_characters.has_key?(index)
        chosen_characters[index] = secret_word_characters[index]
      end
    end

    ## the rest of the code to sort and format the characters...
  end

Okay – I’m not stressed about the current code for now (or rather, I’m trying not to be), as it’s running live and has been for a couple of years. What I need to do is write tests to make sure it carries on doing what it’s supposed to. I can’t refactor until I have ‘green’ tests for all the method’s behaviour.

The tests for the happy-path are easy enough, but you may have spotted the “what if”… what if the chosen_characters hash never gets to have three elements in it? In that situation, the code would loop forever.

In practice, other methods ensure that the secret word has a minimum length, but it’s perfectly reasonable to imagine that at some point the database could be updated directly, or even that the validations could be (accidentally) removed – and since there were previously no tests for them either, it could have happened. If one of these – admittedly unlikely – events were to occur, there’d be a problem.

But how do I test the existing functionality that, given a fewer-than-three-character secret word, the code gets in a loop? I can’t rightly just start hacking away at code that has tens of thousands of users (because, honestly, the risk of a loop wasn’t the biggest issue, but it was the weirdest to test).

After a little Googling and StackOverflowing, I happened on what I thought, anyway, was a neat little solution: run the offending method in a new thread, and assume that if the thread doesn’t finish quickly, it must be looping.

  # temporary test of existing faulty operation
  it 'enters an infinite loop if the secret_word is fewer than three characters' do
    begin
      thread = Thread.new do
        user.secret_word = 'ab'
        user.sample_of_secret_word_characters
      end
      sleep 0.01
      expect(thread.status).to be_truthy
    ensure
      thread.kill
    end
  end
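
A variation on the same trick, as a rough sketch rather than what the project actually used: Thread#join accepts a time limit and returns nil if the thread hasn't finished by then, which saves the separate sleep and status check. The helper name still_running_after? is made up for illustration.

```ruby
# Hypothetical helper, not part of the original code base.
# Runs the block in its own thread and reports whether it is still
# running once `seconds` have passed.
def still_running_after?(seconds)
  thread = Thread.new { yield }
  # Thread#join(limit) returns nil when the limit expires before the thread ends.
  thread.join(seconds).nil?
ensure
  # Always clean up, otherwise a looping thread would outlive the check.
  thread&.kill
end

looping  = still_running_after?(0.05) { loop { rand(2) } } # never exits
finished = still_running_after?(1) { :done }               # exits straight away

puts looping  # true
puts finished # false
```

In a spec, the expectation would then be on the return value of the helper. The time limit is still a guess at what "quickly" means, so it carries the same caveat as the sleep-based version.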

So with that in place, I had tests for all the behaviour of the method (even the undesirable infinite looping). So I could get on with refactoring, and also alter the test for what to do in the case of the secret_word being fewer than three characters (I set it to raise a custom exception – it would be, after all, an exceptional situation).

  # refactored method
  def sample_of_secret_word_characters
    raise SecretWordError if secret_word.size < 3

    chosen_characters = secret_word_characters.each_with_index.to_a.shuffle.first(3)

    ## the rest of the code to sort and format the characters (equally refactored :-) ...
  end
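
To see why the refactored version can't loop, here is a self-contained sketch of the same idea in plain Ruby. It takes the word as an argument instead of reading it from the model, the SecretWordError class is a stand-in for the custom exception, and sorting by position at the end is only my guess at what "the rest of the code" does.

```ruby
# Hypothetical stand-in for the custom exception mentioned above.
class SecretWordError < StandardError; end

# Sketch only: the word is passed in rather than read from a model.
def sample_of_secret_word_characters(secret_word)
  raise SecretWordError if secret_word.size < 3

  # Pair every character with its position, e.g. [["s", 0], ["e", 1], ...]
  pairs = secret_word.chars.each_with_index.to_a

  # Shuffling and taking the first three gives three distinct positions
  # in a single pass, so there is nothing to retry.
  # Sorting by position afterwards is an assumption about the formatting step.
  pairs.shuffle.first(3).sort_by { |_char, index| index }
end

p sample_of_secret_word_characters('secret')
```

Because each position appears exactly once in the shuffled list, the three picks can never collide, which is what the original until loop was working around.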
Imposter Syndrome

Imposter Syndrome.

When I teach people, one of the biggest obstacles to success is the student’s own feeling that “they are a fraud, and can’t possibly become a developer”. That sense that you’re going to be “found out”, and have people point fingers and laugh at you as an interloper, is so strong that it makes people decide not to pursue a path of learning which could give them the skills they want.

As someone who loves the results of seeing people achieve the goal of forging a career in software development, there is little more disappointing to me than to have someone decide that learning is not for them because “they can’t do it” (based on nothing other than their feelings of being an imposter).

The doubly-frustrating part of this is that I’ve never found a successful way of expressing to that person that the feeling of being an imposter Never Goes Away. I empathise with them completely, and I understand their decision, because I, and every developer I’ve ever spoken to, feel exactly the same.

Whenever I sit in meetings where everyone is discussing Serious Business™, I look at all the terribly self-assured people around me and cringe at the prospect of being found out that I’ve blustered my way into the room, and that I have no right to be there giving my input to the discussion. At the times I’m about to give my opinion on something, I look around as if to find a responsible person to whom I can defer the decision - because they can’t possibly think that I’m able to make those choices. After all, I’m a complete imposter…

However, I have learned over long experience that it’s most likely that the vast majority of the others in the room are having the exact same doubts - and because those feelings are not unusual, I need to discover ways of turning them to my benefit. To this end, when the feelings of being an imposter start to creep over me, I reflect on the things I’m worrying about, and use them to make my preparations even more thorough: if I feel that I’m a fraud to be teaching a particular topic, I double-check my notes to ensure that there are no surprises in there for me; if I feel that a task is way beyond my capabilities, I review the similar things I’ve done before, and find the comparisons.

Over everything else, I listen to others’ views about my abilities and performance. Being conscious of the fact that people are far quicker to criticise than praise, when I get sincere praise for something I’ve done, I file that away for the next time I feel like I’m an imposter at that task - for if I am an imposter, and they’re the experts telling me I can do it, they must be right.

The Essential Interview Question

If there’s one thing to do during an interview it’s this…

I dislike interviews (with me as the interviewee – I’m a contractor, so I get quite a lot of reminders how much I dislike them): I’ve had some awful ones; where the interviewer has plainly told me they don’t like me; where I’ve frozen up and not known what to say; where I’ve been told they’re gonna offer the position to the previous candidate (but might as well go through the motions with me now I’m here); where I’ve been way out of my depth (normally having been sent in there by a lying recruiter).

And I’ve developed a fairly thick skin.

For a long time now, I’ve taken the approach of “just being me”, and they either hire me, or not – no harm, no foul. Needless to say, if you knew me, you’d know that wasn’t a great tactic for me to get a job – but I’d rather get turned down for jobs I’m not right for, than start and get sacked a few weeks in coz I pretended to be someone they liked.

Anyway. Although I’m fairly ambivalent about interviews, I do still get very down and frustrated if I think that the role would be perfect for me, but they don’t think I’m right for the role. I still recall one interview that I thought went AWESOMELY – they loved me, I nailed it. It was in the bag.

Until I got a phone call from the agent – nope… they didn’t think I could do part of the job; a part they never mentioned in the interview as being required. Had they done so, I would have regaled them with my suitability for it.

I clawed at the recruiter to go back to them with this information, but it was too late: they’d offered to someone else.

The question

So now, I always end an interview by asking, when they say, “well… is there anything else you’d like to ask?”

Is there anything that we've talked about today that would make you think I'm not right for this role?

Boom.

Simple.

Firstly, I’m showing great confidence (try it, it’s a hard question to ask). Secondly, I’m showing that I’m keen for the role, and am looking to remove any blockers.

This gives them a chance to reflect on their impression of me and clarify any nagging doubts. Which gives you a chance to finish the interview with a positive impression (gotta love that recency effect).

The extra benefit is that if they say “No, it’s all good”, it means one of two things:

  1. You’re gonna get the job - or at least you will unless there’s someone better suited. But you could do no better than your best, so nothing to be ashamed of.

  2. They reject you over some detail, which proves they’re lying scumbags (they told you to your face there were no concerns…), who don’t even have the integrity to talk to you honestly. So you don’t have to feel bad about not getting the job, because you didn’t want to work for liars anyway! ;-)

Win, win.

PS Obviously, if you don’t want the job by the end of the interview, don’t ask The Question!

Limiting ActiveRecord results

TL; DR

No, adding .limit(some_number) won’t cut it for this problem. Quick, go here.

What’s the scenario?

In my current role, I’m working with fairly large data-sets (not really “big data”, but no small shakes either). I’m writing reporting functionality on a PostgreSQL-backed Rails app – migrating existing functionality from a C# .Net app (I know… the thought-process must’ve gone something like this: “Let’s manipulate millions of records to generate totals and statistics. What framework is really not designed for that…” :-)

The main data table has around 2-million records, with a handful of associations on the base model (let’s call it Result). Users will filter based on attributes of the associations and results themselves, and then show grouped totals and subtotals for combinations of the associations (yes, I’m being deliberately vague), all split out by month and year.

Skip to the end…

So anyway, it was fairly clear to most of the team that the vast majority of this needs to be done in the DB, with the most minimal amount of Rails model instantiation. But it soon became apparent I had two broad problems:

Firstly, while I was coding and trying stuff out; the general day-to-day tweaks and twiddles, I didn’t want to have to wait for 2-million records to be operated on.

Secondly, one team member would not grok the absolute insanity of thinking that, with the size of datasets we had, iterating over enumerable collections of them was acceptable.

Solutions, solutions

The second problem was going to be solved with facts; cold, hard evidence in the form of benchmarking results. All accompanied with a nice chart showing how the time increased disproportionately the larger the sets you were iterating became (big O, baby).

The first problem was going to be solved as a side effect of solving the second - I was going to end up with a nice little scope I could configure globally, which limited the records any ActiveRecord query would operate on. I would then use this scope in development to only perform calculations on a few tens of thousands of records (enough to see numbers coming out), but I could switch back to the full set at the drop of a hat to see the effect with the whole data set.

Ideal.

Benchmarking setup

There’s a programming mantra of measure, don’t guess – whenever I get on my high-horse and start exclaiming what approach a code-base should take, I always have a nagging little voice in my head asking “how would I back this up if someone challenged me”.

So my first thought after pointing out the error of the nested map calls was to scuttle off and measure the operation compared to grouping in SQL.

But — I wanted to do this progressively; one of the arguments for the map was that “it’s fine for the moment” (on the developer’s dev machine with 10,000 records), and I wanted to show plainly how performance would degrade as the records accumulated.
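
As a rough idea of what "measuring progressively" can look like, here is a minimal sketch using Ruby's standard Benchmark module. It is an in-memory analogy only, not the benchmark I ran: Row, build_rows and both counting methods are invented for illustration, and the single-pass version merely stands in for letting the database do the grouping.

```ruby
require 'benchmark'

# Hypothetical in-memory stand-in for Result rows: each one has a foo_id.
Row = Struct.new(:foo_id)

def build_rows(size, groups)
  Array.new(size) { |i| Row.new(i % groups) }
end

# Nested iteration: for every distinct key, walk the whole collection again.
def counts_by_rescanning(rows)
  keys = rows.map(&:foo_id).uniq
  keys.map { |key| [key, rows.count { |row| row.foo_id == key }] }.to_h
end

# Single pass over the rows, closer in spirit to what a GROUP BY does.
def counts_in_one_pass(rows)
  rows.each_with_object(Hash.new(0)) { |row, totals| totals[row.foo_id] += 1 }
end

# Double the input each time and watch how the two timings grow.
[1_000, 2_000, 4_000].each do |size|
  rows = build_rows(size, size / 10)
  rescanning = Benchmark.realtime { counts_by_rescanning(rows) }
  one_pass   = Benchmark.realtime { counts_in_one_pass(rows) }
  puts format('%6d rows: rescanning %.4fs, one pass %.4fs', size, rescanning, one_pass)
end
```

The absolute numbers mean little; what matters is how each column changes as the row count doubles.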

Now you might say, “That’s fine, just limit the records so:”

Result.limit(10_000)

But unfortunately no – because by doing a lot of grouping and summing in SQL, any limit condition would apply to the number of records in the end relation, rather than the number of records being used for the grouping.

> Result.limit(2).group(:foo_id).count
=> {5=>23, 6=>19} # two items in the hash, not two records being grouped
> Result.limit(4).group(:foo_id).count
=> {5=>23, 6=>19, 12=>10, 13=>1}  # four items in the hash, see my problem?

I wanted to count just two records grouped by their foo_id, but I actually counted 42 (23 + 19, judging by the totals). And the second query brings in even more – because the limit caps the size of the hash (the output of the relation) instead of the records that go into generating that output.
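
The same distinction can be mimicked in plain Ruby, which might make it easier to see. This is only an analogy with made-up foo_id values, not what ActiveRecord does internally.

```ruby
# Hypothetical foo_id values standing in for rows of the results table.
foo_ids = [5, 5, 6, 5, 6, 12, 6, 13]

def count_by_group(values)
  values.each_with_object(Hash.new(0)) { |value, totals| totals[value] += 1 }
end

# Limit applied to the grouped output: every row has already been counted.
limited_output = count_by_group(foo_ids).first(2).to_h

# Limit applied to the input: only the first two rows are ever counted.
limited_input = count_by_group(foo_ids.first(2))

puts limited_output.values.sum # 6 rows counted, spread over two groups
puts limited_input.values.sum  # 2 rows counted
```

The first form is what limit followed by group gives you; the second is what I was actually after.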

Stop, hammer time

Second solution:

Blow away the DB, and then populate it with the first 10,000, measure, add another 10,000, measure again, repeat…

Except this is a bit labour-intensive (since I needed to step up – in increasing step sizes – to 2-million-ish), and also hard to repeat when someone queries my numbers, or offers an improvement in code and wants to measure again. And it certainly won’t offer me any chance to save time in my day-to-day work on the app.

Time to get down and dirty

v1

I had a little inkling in my head… there must be a way to add an elegant little scope to my ActiveRecord objects which would limit the records. The simplest way, I suppose, is to choose records by ID:

scope :limited_records, ->(amount) {
  # <= so that IDs 1..amount are included (assumes IDs start at 1)
  where("#{table_name}.id <= ?", amount.to_i)
}

Ta da!

Now my console checks look something like this:

> Result.limited_records(2).group(:foo_id).count
=> {5=>1, 6=>1} # two records grouped and counted
> Result.limited_records(4).group(:foo_id).count
=> {5=>1, 6=>2, 12=>1} # four records grouped and counted
> Result.limited_records(8).group(:foo_id).count
=> {5=>3, 6=>2, 12=>1, 13=>2} # eight records grouped and counted

Okay… what’s wrong with that? It seemed to work at first pass. But the assumption is that there won’t be any holes in the sequence of results, and that’s a bad assumption.

If there are “holes” in my dataset, I could end up with something like this:

> Result.count
=> 1000000 # one million exactly -- convenient
> Result.limited_records(10).count
=> 10 # all looks good here
> Result.limited_records(20).count
=> 18 # uh oh! in the first 20 records, two must have been deleted at some point
> Result.limited_records(30).count
=> 18 # OMG! it's more than just two that have been deleted :-(

v2

scope :limited_records, ->(amount) {
  id = order(:id).offset(amount.to_i).limit(1).pluck(:id).first
  where("#{table_name}.id < ?", id)
}

So this is a bit more like it – beasting the AR methods!

I’m plucking the ID of the record after the position I want to go up to, then using that as a filter to include only records below it.

> Result.limited_records(10).count
=> 10 # all looks good here
> Result.limited_records(20).count
=> 20 # still good
> Result.limited_records(999_999).count
=> 999999 # cool! well, that was easy enough
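
For anyone who wants to poke at the difference between filtering on the ID's value and filtering on its position without a database to hand, here is a small plain-Ruby model. The IDs and both method names are invented, and it assumes the amount is smaller than the number of IDs.

```ruby
# Hypothetical IDs with gaps: 4 and 7 were deleted at some point.
ids = [1, 2, 3, 5, 6, 8, 9, 10]

# v1 idea: treat the amount as a ceiling on the ID itself.
def limit_by_id_value(ids, amount)
  ids.select { |id| id <= amount }
end

# v2 idea: find the ID sitting at position `amount` and keep everything below it.
# Assumes amount < ids.size, otherwise there is no cutoff to compare against.
def limit_by_position(ids, amount)
  cutoff = ids.sort[amount] # plays the role of order(:id).offset(amount).limit(1)
  ids.select { |id| id < cutoff }
end

puts limit_by_id_value(ids, 5).size # 4, one short because ID 4 is missing
puts limit_by_position(ids, 5).size # 5
```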

Although, as nice as the abstraction of those ActiveRecord methods is, it would be best for the purposes of benchmarking (remember why we were doing this!) if our code has as small an overhead as possible. So if you can do that in SQL it would be better.

v3

scope :limited_records, ->(amount) {
  where("#{table_name}.id < (select id from #{table_name} order by id limit 1 offset ?)", amount.to_i)
}

So now what’s wrong?

What would you expect to happen if you asked to limit the records to as many as (or more than) there are in the DB? The unsurprising behaviour might be to just return everything. Unfortunately, that’s not what our scope does:

> Result.limited_records(999_999).count
=> 999999 # cool
> Result.limited_records(1_000_000).count
=> 0 # oh!
> Result.limited_records(1_000_001).count
=> 0 # oh, oh!

Since there is no record at that offset, the subquery comes back empty (NULL), so there’s no ID to be lower than – and no records to return. The quick fix for this is a slightly uglier scope:

v4

scope :limited_records, ->(amount) {
  if count > amount.to_i
    where("#{table_name}.id < (select id from #{table_name} order by id limit 1 offset ?)", amount.to_i)
  else
    scoped # done for a Rails 3 app... Rails 4 should use `all`
  end
}

> Result.count
=> 1000000 # cool
> Result.limited_records(999_999).count
=> 999999 # cool
> Result.limited_records(1_000_000).count
=> 1000000 # cool
> Result.limited_records(1_000_001).count
=> 1000000 # cool, cool, cool!
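
The guard is easier to reason about away from SQL. Below is a plain-Ruby model of it, purely as a sketch: limited_ids is a made-up helper, and a missing array element plays the part of the empty subquery.

```ruby
# Plain-Ruby model of the guard (hypothetical helper; the IDs are made up).
def limited_ids(ids, amount)
  sorted = ids.sort
  # With amount >= sorted.size there is no element at that position, just as
  # the subquery finds no row at that offset, so fall back to everything.
  return sorted if amount >= sorted.size

  cutoff = sorted[amount]
  sorted.select { |id| id < cutoff }
end

ids = [1, 2, 3, 5, 8]
puts limited_ids(ids, 3).size # 3
puts limited_ids(ids, 5).size # 5, asking for exactly as many as exist
puts limited_ids(ids, 9).size # 5, asking for more than exist
```

Note that the boundary is "greater than or equal to": asking for exactly the number of records that exist already runs off the end.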

One hiccup though, is that your model might have a default scope applied to it – and this might be filtering out records. Maybe not an issue, but be careful.

Rails 'Except' scope

If you want to be able to exclude certain records from an ActiveRecord query, this scope does the trick.

  scope :excluding, -> (*values) {
    where(
      "#{table_name}.id NOT IN (?)",
      (
        values.compact.flatten.map { |e|
          if e.is_a?(Integer)
            e
          else
            e.is_a?(self) ? e.id : raise("Element not the same type as #{self}.")
          end
        } << 0
      )
    )
  }

I’ve been putting it into pretty much every app I’ve written in the last couple of years (which is a bit of a clue I should have made a gem out of it). If you want to exclude a subset of records, you can pass it an array of IDs, or objects (of the type you’re querying), and the arguments passed to it will be excluded from the list.

For example, given the scope being included in a Post model, you could use the scope thus:

  @posts = Post.excluding(current_user.posts)

Or you could populate a list of people to invite to an event - but omitting those people that have already been invited (assuming the scope is in your Person model):

  @people = Person.excluding(@event.invitees)

And it can be chained with other scopes (as you should expect):

  @winner = Person.non_winners.excluding(Person.staff).sample

But what’s it doing?

Okay, I don’t blame you if you don’t just copy it, paste it, and use it in blind faith.

Let’s step through it.

scope :excluding, -> (*values) { . . . }

We define a scope, called ‘excluding’, and it takes some arguments. The arguments are all ‘splatted’ into an array, and later we can refer to them as the variable values.

where("#{table_name}.id NOT IN (?)", . . . )

The scope might be chained to other scopes, so any references to SQL field names are best ‘disambiguated’ by adding the table-name of the class to them.

This simple where condition is saying “get me all records except those that have an ID of …” – and the collection of IDs to exclude is passed in as the next argument.

values.compact.flatten.map { |e| . . . } << 0

Let’s take those values that were passed into the scope, compact out any nil values, and flatten any arrays that were passed in. Now we can iterate over it, and aim to get an array of integers to pass to the where condition.

But hang on… if no values are passed in, the map will return an empty array, and our id NOT IN (?) query will get no values replacing the question-mark placeholder – that would cause a SQL error, or (where Rails swaps an empty array for NULL) a NOT IN (NULL) that matches no rows at all. Oh noes!

To avoid this, we’re going to shovel a zero into the result of the map; so whatever happens, the array will have at least one value, and no records will (should…) ever have the ID ‘0’, so it will not exclude anything it shouldn’t (unless you have some very bad DBAs who start their sequences at zero… but in that case, while you work your notice period, you could shovel -1 in instead).

if e.is_a?(Integer) 
  e
else
  e.is_a?(self) ? e.id : raise("Element not the same type as #{self}.")
end

As we loop over the values, we’ll keep what we were given if it was an integer, otherwise we’ll ask the object for its ID. Which means these two give exactly the same result:

Person.excluding(1,2,3,4)
Person.excluding(Person.where(id: [1,2,3,4]))

The idea being that if you have a relation, you can use it, but if you have a collection of IDs, that’ll work too.

In the event that an object of the ‘wrong’ type is passed into the values, an exception should be raised – all sorts of pain and suffering would ensue if the code worked happily for something like this: @cats = Cat.excluding(Dog.all)
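
If you'd like to try the mapping step on its own, here is a sketch of it outside Rails. Person and Dog are plain Structs standing in for models, ids_to_exclude is a made-up name, and, unlike the scope above, it flattens before compacting so that nils tucked inside arrays are dropped too.

```ruby
# Hypothetical stand-ins for ActiveRecord models; only `id` matters here.
Person = Struct.new(:id)
Dog    = Struct.new(:id)

# Sketch of the mapping step: turns a mix of IDs and records into a list of IDs.
def ids_to_exclude(klass, *values)
  ids = values.flatten.compact.map do |element|
    next element if element.is_a?(Integer)
    raise ArgumentError, "Element not the same type as #{klass}." unless element.is_a?(klass)

    element.id
  end
  ids << 0 # sentinel, so the list handed to NOT IN is never empty
end

p ids_to_exclude(Person, 1, [Person.new(2), nil], Person.new(3))
p ids_to_exclude(Person)

begin
  ids_to_exclude(Person, Dog.new(1))
rescue ArgumentError => error
  puts error.message
end
```

The first call mixes integers, records and a stray nil; the second shows the sentinel doing its job when nothing is passed in.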

Aliases

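The name “excluding” makes sense to me, but if you feel that you’d rather call it “except” instead – if that trips off the tongue nicer for you; or you like to keep the interface of your ActiveRecord objects similar to Hash’s method except, then feel free to just name the scope something else (or alias it!). Do bear in mind that ActiveRecord relations already have an except method of their own (for stripping parts off a query), so check for clashes before settling on that name.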
