computers | The Tokenizer

In the past few years I have developed a few NLP (Natural Language Processing) systems. Some were very simple, and others were extremely complicated. However, in both cases it all started when one of my clients or partners came and requested “A system that understands… (something)”. So I decided to dedicate this post to sharing my opinion about the relationship between the term “Understand” and NLP.

“Understand”

The Human Brain:

How does the human brain work? I guess nobody knows the real answer. However, I like the Dual Process theory and Daniel Kahneman’s approach, which simply states that the human brain is a combination of two systems:

#1 System:
“System 1 is fast; it’s intuitive, associative, metaphorical, automatic, impressionistic, and it can’t be switched off. Its operations involve no sense of intentional control, but it’s the “secret author of many of the choices and judgments you make””. [source]
“System 1 that decides whether you like a person, which thoughts or associations come to mind, and what you feel about something. All of this happens automatically. You can’t help it, and yet you often base your decisions on it.” [source]

#2 System:
“System 2 is slow, deliberate, effortful. Its operations require attention. To set it going now, ask yourself the question “What is 13 x 27?” And to see how it hogs attention” [source]
“System 2, on the other hand, is lazy and only becomes active when necessary. Slow, deliberate thinking is hard work.” [source]

The Computer “Brain”:

Obviously a computer doesn’t have a brain, but it has a very powerful (at least compared to humans) computing unit (CPU) and different types of memory. In a way we can say that a computer is a “Super #2 System”, but it still doesn’t have a #1 System at all!

My Personal Opinion: (in relation to the NLP world)

#1 System gives us as humans the ability to understand the relations between objects in our world. If you ask a six-year-old to give you a list of words that are related to the word “Car”, he can do it quite easily. This is, of course, not true in the case of a computer. A computer can store in memory a long list of words under the category “Car”, but it doesn’t have any built-in ability to understand the relations between the words in the list. Moreover, it doesn’t have the ability to take a completely new word/object and immediately understand if and how it is related to the list.
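To make the point concrete, here is a minimal Python sketch. The word list and the function name are made up for illustration; the only thing it shows is that a stored list answers "is this exact word in the list?" and nothing more.

```python
# Hypothetical hand-made list of words filed under the category "Car".
CAR_WORDS = {"wheel", "engine", "driver", "road", "garage"}


def is_car_related(word):
    """Pure membership test: no notion of meaning, only exact matches."""
    return word.lower() in CAR_WORDS


print(is_car_related("Engine"))      # the word was stored, so it is found
print(is_car_related("windshield"))  # a new word: the list cannot relate it to cars
```

A six-year-old would place "windshield" immediately. The lookup cannot, unless somebody adds the word by hand.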

The bottom line, in my opinion, is that a computer can’t really understand anything! This is simply because it doesn’t have a #1 System. Although there are a few theories (see the Turing test) that claim a #1 System can be built from a finite number of #2 Systems, I personally don’t agree with them. I truly believe that we need completely new technologies, models and approaches in order to build a machine/computer with a real #1 System.

So, What is NLP all about?

If a computer can’t understand anything, how can we build an NLP system, that is, a system that takes human text as input and returns a meaningful result?

The answer is quite simple. NLP is all about taking a very limited and well-defined task that is related to human language and building a “Model” that tries to simulate the behavior of #1 System under the circumstances of the given task. We build the “Model” using tools from Computer Science (data structures, algorithms…), Mathematics (Statistics, Probability…) and much more.

In my opinion, defining the task properly and building the right model is the “Art” of NLP.

For example, here is hashtagify.me, a visual graph that shows the relations between hashtags on Twitter. On top of this graph it is possible to build a model that recognizes and ranks the relationships between tweets and Twitter users.
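As a rough illustration of what such a graph could look like in code, here is a small Python sketch. It is not how hashtagify.me works (I don't know its internals). It simply assumes that two hashtags are "related" when they appear in the same tweet, and the function names and sample tweets are invented for the example.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def build_hashtag_graph(tweets):
    """Weighted graph: graph[a][b] = number of tweets containing both #a and #b."""
    graph = defaultdict(Counter)
    for tweet in tweets:
        # Lower-case and de-duplicate the hashtags found in one tweet.
        tags = sorted({tag.lower() for tag in re.findall(r"#(\w+)", tweet)})
        for a, b in combinations(tags, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related_hashtags(graph, tag, top_n=3):
    """Rank the neighbours of a hashtag by how often they co-occur with it."""
    return graph.get(tag.lower(), Counter()).most_common(top_n)


tweets = [
    "Loving #NLP and #Python",
    "#nlp with #python is fun",
    "#NLP meets #linguistics",
]
graph = build_hashtag_graph(tweets)
print(related_hashtags(graph, "nlp"))  # [('python', 2), ('linguistics', 1)]
```

The "Model" here is nothing more than counting, which is exactly the point: a limited task, a simple data structure, and a result that looks a little bit like understanding.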

Timeout – My life outside the NLP world:
Besides being a computer geek, I’m also an amateur cross-fitter. Take a look at what I do for fun in my free time:

Follow me on Twitter or contact me: shlomibabluki@gmail.com.

Shlomi Babluki

For some reason I feel that NLP (Natural Language Processing) is considered an “Academic” field. While I don’t have a degree in this field, I do have quite a bit of practical experience. In the past few years I have developed several NLP systems: a public transportation route planner, a remote television program recorder, an appointment scheduling system, as well as a few others. I am proud to say I have developed real products that thousands of people use every day!

First I want to apologize to the academics, as you may not agree with many of the things in this post (or the ones that follow).

Welcome to the NLP world

The goal of NLP systems:

In simple words, the goal of an NLP system is to convert (or “translate”) a human text, such as a news article, a text message, a search request or a Facebook status, into a well-defined data structure which is readable by a computer.

A very simple example – a system that recognizes flight search requests:

Possible inputs:

I’m looking for a flight
– from Madrid, Spain to London, England
– from Spain, Madrid to England, London
– to London, England, from Madrid, Spain
– to England, London, from Spain, Madrid

The result:

<from>
<city>Madrid</city>
<country>Spain</country>
</from>
<to>
<city>London</city>
<country>England</country>
</to>

* Different inputs in a human language, all with the same XML result.
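Here is a minimal Python sketch of how such a system could be built for exactly these four inputs. It is only an illustration under strong assumptions: the tiny `COUNTRIES` table, the regular expressions and the function names are all made up for this example, and a dictionary stands in for the XML structure.

```python
import re

# Hypothetical lookup table for this sketch: which place names are countries.
# A real system would need a far larger list.
COUNTRIES = {"Spain", "England"}

# "from A, B" / "to A, B": a keyword followed by two comma-separated names.
PLACE = r"([A-Z][a-z]+),\s*([A-Z][a-z]+)"
FROM_RE = re.compile(r"\bfrom\s+" + PLACE)
TO_RE = re.compile(r"\bto\s+" + PLACE)


def parse_place(first, second):
    """Decide which of the two names is the country, whatever their order."""
    if first in COUNTRIES:
        return {"city": second, "country": first}
    return {"city": first, "country": second}


def parse_flight_request(text):
    """Return {'from': {...}, 'to': {...}}, or None if a part is missing."""
    origin = FROM_RE.search(text)
    destination = TO_RE.search(text)
    if origin is None or destination is None:
        return None
    return {
        "from": parse_place(*origin.groups()),
        "to": parse_place(*destination.groups()),
    }


requests = [
    "I'm looking for a flight from Madrid, Spain to London, England",
    "I'm looking for a flight from Spain, Madrid to England, London",
    "I'm looking for a flight to London, England, from Madrid, Spain",
    "I'm looking for a flight to England, London, from Spain, Madrid",
]
for request in requests:
    print(parse_flight_request(request))
```

All four requests produce the same dictionary, with Madrid, Spain under "from" and London, England under "to". Anything outside this narrow pattern (a city with two words, a missing country) is simply not recognized, which shows how limited the task has to be.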

Usually the NLP system is not a standalone system, but one module of a larger system. In most cases the result of the NLP engine is used to retrieve some information (in our example: searching for a relevant flight in the schedule) and then to send the final result back to the user.

The cycle of a standard system:

User → User input → NLP system → Database, Information center → Final Result → User
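The cycle above can be sketched in a few lines of Python. Everything here is a stand-in chosen for the example: the `SCHEDULE` dictionary plays the role of the database, the departure times are invented, and the NLP step is a single regular expression that only handles requests of the form "from X to Y".

```python
import re

# Hypothetical in-memory "database": (origin, destination) -> departure times.
SCHEDULE = {
    ("Madrid", "London"): ["08:15", "17:40"],
}


def nlp_system(user_input):
    """NLP step: free text -> structured query, or None if not recognized."""
    match = re.search(r"from\s+(\w+).*?\bto\s+(\w+)", user_input)
    if match is None:
        return None
    return {"from": match.group(1), "to": match.group(2)}


def lookup(query):
    """Database step: retrieve the information the query asks for."""
    return SCHEDULE.get((query["from"], query["to"]), [])


def handle_request(user_input):
    """Full cycle: user input -> NLP system -> database -> final result."""
    query = nlp_system(user_input)
    if query is None:
        return "Sorry, I could not understand the request."
    flights = lookup(query)
    if not flights:
        return f"No flights found from {query['from']} to {query['to']}."
    return f"Flights from {query['from']} to {query['to']}: " + ", ".join(flights)


print(handle_request("I'm looking for a flight from Madrid to London"))
```

Note that the NLP module knows nothing about flights or schedules. It only produces the structured query, and the rest of the system does the retrieval.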

More NLP systems:

For further reading, here are some well-known uses of NLP:

– Semantic role labeling
– Named-entity recognition (NER)
– Document classification
– Language identification

Timeout – My life outside the NLP world:

I would like to introduce you to LEGO Mindstorms, a nice kit for learning about the world of robotics:

