Captcha Audio with ASP.NET

The subject of captcha images has been well covered, and there are plenty of resources available for creating those warped letters. I created such a control, a combination of quite a few different ideas, and was quite proud of it until, not thirty seconds after I demonstrated it, someone asked whether it had a ‘play audio’ button for blind people. All of a sudden my fancy captcha control was not so fancy.

I tried to suggest using reCaptcha as a solution that had all the bells and whistles, but the customisation of my control trumped the somewhat fixed design of reCaptcha. So the problem of the audio captcha brewed a little in the back of my mind and a month or two later I turned back to it to see if I could find a solution.

A speech synthesis engine was not on the cards, so I figured that because the captcha image was a random collection of letters and numbers, the only way I could generate the appropriate audio was to have audio files of all the letters and numbers. I would then need to join them together, on demand, and play them from the web page.

Creating all the letters and numbers is straightforward enough (cue microphone and best-est speaking voice), and playing audio files from a web page has also become pretty easy thanks to the great SoundManager 2 JavaScript plugin.

The trickiest part was definitely joining the MP3 files that SoundManager requires. MP3 audio files are particularly tricky as they can contain ID3v1 or ID3v2 tag information, so just joining them back to back would not create a correct MP3 file. What I needed to do was determine whether a particular file had tag information in it and strip it out if necessary. This took a lot of Googling and studying of the MP3 specification, but I eventually managed to ensure the files had no ID3v1 or ID3v2 tags in them before joining them together:

Imports System.IO
 
Public Class MP3Concatenator
 
    Public Shared Function Join(ByVal MP3sToJoin As Generic.List(Of String)) As MemoryStream
        Dim ms As New MemoryStream()
        Dim bw As New BinaryWriter(ms)
 
        'loop around each file and remove the tags and then concatenate the files
        For Each mp3File As String In MP3sToJoin
            Dim bytes() As Byte
            Dim fs As New FileStream(mp3File, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            Dim br As New BinaryReader(fs)
            Dim audioStart As Integer = 0
 
            'Check for an ID3v2 tag at the start of the file
            fs.Position = 0
            If (System.Text.Encoding.ASCII.GetString(br.ReadBytes(3)).ToUpper = "ID3") Then
                'the four size bytes start at offset 6 of the 10-byte header
                fs.Position = 6
                'MSB of each size byte is set to 0 and ignored so we need to convert the value
                audioStart = BitsToLow(br.ReadBytes(4))
                'the size excludes the 10-byte header, so add it to get the start of the audio
                audioStart += 10
            End If
 
            'Check for an ID3v1 tag, which occupies the last 128 bytes of the file
            Dim audioEnd As Long = fs.Length
            If fs.Length >= 128 Then
                fs.Seek(-128, SeekOrigin.End)
                If (System.Text.Encoding.ASCII.GetString(br.ReadBytes(3)).ToUpper = "TAG") Then
                    'there is an ID3v1 tag on the end which needs removing
                    audioEnd -= 128
                End If
            End If

            'read only the audio that sits between the two tags
            fs.Position = audioStart
            bytes = br.ReadBytes(CInt(audioEnd - audioStart))
 
            bw.Write(bytes)
            br.Close()
            fs.Close()
        Next
 
        ms.Position = 0
        Return ms
    End Function
 
    Private Shared Function BitsToLow(ByVal Size() As Byte) As Integer
        Dim Ret As Integer
        Ret = Size(3)
        If Size(2) <> 0 Then
            If CBool(Size(2) And 1) Then Ret += 128
            If CBool(Size(2) And 2) Then Ret += 256
            If CBool(Size(2) And 4) Then Ret += 512
            If CBool(Size(2) And 8) Then Ret += 1024
            If CBool(Size(2) And 16) Then Ret += 2048
            If CBool(Size(2) And 32) Then Ret += 4096
            If CBool(Size(2) And 64) Then Ret += 8192
        End If
        If Size(1) <> 0 Then
            If CBool(Size(1) And 1) Then Ret += 16384
            If CBool(Size(1) And 2) Then Ret += 32768
            If CBool(Size(1) And 4) Then Ret += 65536
            If CBool(Size(1) And 8) Then Ret += 131072
            If CBool(Size(1) And 16) Then Ret += 262144
            If CBool(Size(1) And 32) Then Ret += 524288
            If CBool(Size(1) And 64) Then Ret += 1048576
        End If
        If Size(0) <> 0 Then
            If CBool(Size(0) And 1) Then Ret += 2097152
            If CBool(Size(0) And 2) Then Ret += 4194304
            If CBool(Size(0) And 4) Then Ret += 8388608
            If CBool(Size(0) And 8) Then Ret += 16777216
            If CBool(Size(0) And 16) Then Ret += 33554432
            If CBool(Size(0) And 32) Then Ret += 67108864
            If CBool(Size(0) And 64) Then Ret += 134217728
        End If
        BitsToLow = Ret
    End Function
End Class

The trickiest part was handling the ID3v2 tag, as the specification states:

The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored...

Which is why I have the BitsToLow function in the code above.
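
As a rough illustration of that encoding, here is a more compact sketch of the same conversion using bit shifts. The names SynchsafeExample and DecodeSynchsafe are made up for this example and are not part of the class above. The test value is the one the ID3v2 specification itself uses: a 257-byte tag is stored as the bytes 00 00 02 01.

```vbnet
Imports System

Module SynchsafeExample
    ' Hypothetical helper: decodes the four ID3v2 size bytes.
    ' Each byte contributes only its low 7 bits, most significant byte first.
    Function DecodeSynchsafe(ByVal size() As Byte) As Integer
        Return ((size(0) And &H7F) << 21) Or _
               ((size(1) And &H7F) << 14) Or _
               ((size(2) And &H7F) << 7) Or _
               (size(3) And &H7F)
    End Function

    Sub Main()
        ' 00 00 02 01 -> (2 << 7) Or 1 = 257
        Console.WriteLine(DecodeSynchsafe(New Byte() {0, 0, 2, 1}))
    End Sub
End Module
```

For well-formed input this should give the same result as BitsToLow, since both simply skip bit 7 of every byte.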

The resulting concatenation is returned as a memory stream because I knew I could output this directly to the Response without writing the concatenated file to disk:

            ....
            Dim ms As IO.MemoryStream = MP3Concatenator.Join(MP3FileList)
            Response.ContentType = "audio/mpeg"
            Response.ExpiresAbsolute = Date.MinValue
            If ms IsNot Nothing Then
                'write only the bytes in use, not the whole underlying buffer
                Response.OutputStream.Write(ms.GetBuffer, 0, CInt(ms.Length))
                ms.Close()
            End If
            Response.End()
            ....
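
The MP3FileList passed to the concatenator has to come from somewhere. The following is only a sketch of one way to build it, assuming one recording per character named after that character (A.mp3, 7.mp3 and so on) in a single folder. The module name, the BuildFileList function and the naming scheme are all assumptions for illustration, not the code from my control.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module CaptchaFileListExample
    ' Hypothetical helper: maps each letter or digit of the captcha text
    ' to "<audioFolder>/<character>.mp3". Other characters are skipped.
    Function BuildFileList(ByVal captchaText As String, ByVal audioFolder As String) As List(Of String)
        Dim files As New List(Of String)()
        For Each c As Char In captchaText.ToUpperInvariant()
            If Char.IsLetterOrDigit(c) Then
                files.Add(Path.Combine(audioFolder, c.ToString() & ".mp3"))
            End If
        Next
        Return files
    End Function
End Module
```

In the page, the result could then be handed to MP3Concatenator.Join, with the folder resolved through Server.MapPath and the captcha text read from wherever the image generator stored it (session, most likely).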

By wiring up the CaptchaAudio.aspx page, which returned the concatenated audio, to the SoundManager plugin, I could create a link next to the captcha image that played the letters in the image. Now my captcha control really was fancy.

 

4 thoughts on “Captcha Audio with ASP.NET”

  1. Thanks a bunch for this article, it has proven extremely helpful! 🙂 The only part which I struggled with (since I’m a .net newbie) was the actual code to build the generic list for “MP3FileList”. After enough experimentation I got it to work by using logic such as:

    Dim MP3FileList As New System.Collections.Generic.List(Of String)()

    MP3FileList.Add(Server.MapPath("/audio/file1.mp3"))
    MP3FileList.Add(Server.MapPath("/audio/file2.mp3"))

    etc.

    Thanks again!

  2. @theonlykenobi thanks for the comment. Unfortunately I cannot provide the full source for the page without crossing the line with my employer, for whom I wrote it. I have basically provided all the guts of the process here and there is very little else to do. Given the letters shown on the captcha image (presumably held in session), create a list of the letter/number mp3 files you need to join together and pass it into the concatenator. Output the returned memory stream to the response as shown.

  3. OMG! Awesome article, just stumbled across it.

    Is this the full source of the audio implementation? If not, is there any chance of providing it?

    Again awesome….

    Thanks for sharing 🙂
