There’s a core concept in machine learning called high-dimensional space

27.02.2023 0 By admin

Machine learning is pretty complex.

So we’ve been experimenting with ways to visualize what’s happening.

There’s a core concept in machine learning called high-dimensional space.

Here’s one way to wrap your head around this concept.

You can think about people as being high-dimensional.

For example, take famous scientists.

You can think about when they were born, where they were born, their fields of study.

Each of these is like a dimension of that person.

These dimensions become difficult to untangle when you think about different people, because someone might be similar in some ways, but very different in others.

MARTIN WATTENBERG: But this is the kind of thing you can use machine learning for.

With machine learning, the computer isn’t told the meaning of these dimensions.

It just sees them as numbers.

And it sees each set of numbers as a data point.

But by looking across all of these dimensions at once, it’s able to place related points closer together in high-dimensional space.

DANIEL SMILKOV: Here’s a concrete example where words are treated as high-dimensional data points.

The important thing to remember is that we haven’t told the computer the meaning of words.

Instead, we’ve shown it millions of sentences as examples of how words get used.

Here is a visualization of the results.

We’re looking at a subset of words that the computer has learned about.

Each dot represents one word.

Each word is a data point with 200 dimensions.

Using a technique called t-SNE, the computer clusters words together that it considers related.

And clusters form-base the meaning, even though we’ve never taught it the meaning of words.

READ  Artificial Intelligence Restores Einstein's Photos

Here is a cluster of numbers, months of the year, words related to space, people’s names, cities, and so on.

FERNANDA VIEGAS: We can also look closely at smaller sets of words.

If we search “piano,” we can run t-SNE only on words related to “piano.

” We get clusters of composers, genres, musical instruments, and more.

MARTIN WATTENBERG: And this approach doesn’t just work from words.

For example, you can also treat an image as a high-dimensional data point.

Here’s a dataset where lots of people wrote digits between 0 and 9.

People write in all kinds of ways.

So the question is, instead of us needing to manually code rules for all the ways people write, could a machine figure it out itself using machine learning? Each image is 784 pixels.

The computer treats each pixel as a dimension.

Again, using t-SNE, it clusters these images in a high-dimensional space.

We’ve color-coded them so that it’s easier for us to see what’s going on.

And you can see groups of digits clustering together.

It’s learned something about the meaning of these digits.

FERNANDA VIEGAS: These visualizations techniques we’ve been exploring can be useful for all kinds of things.

That’s why we’re working on open sourcing all of this as part of TensorFlow so that anyone can use these tools to explore their data.