FYI this site is keeping everything you upload in a Google storage bucket, which was unauthenticated up until a little bit ago. (Full disclosure, it's my tweet.)
thotDBSmash ??
Does this imply that the people behind the website specifically saved "juicy" user-uploaded images?
no, it looks like it was a separate project, but stored in the same bucket. In the time I had access to the bucket (it's no longer public), it looks like they were scraping images from a dating site/app and each directory represented a profile.
That doesn’t sound super sketchy or anything.
Yeah, just what we need. Scams on tinder based off LLMs + fake/stolen profile pics.
I uploaded pictures of a couple of street corners and it confidently identified them as being in Texas and Florida, based on text that was not in the pictures and, in the second case, foliage in a scene that included only concrete. Although in fairness to the model, a parking lot may be the dominant ecosystem in Jacksonville.
Anyways, these pictures were from Iowa.
Same, location identified by the architecture of the buildings in the background and the car's numberplate... of a car with no numberplate driving through a wood.
I gave it a picture from a bar in Austin. It nailed it, but with some interesting hallucinations in the description. The photo had a small Texas flag, but nobody was wearing cowboy hats, and there was nothing with "Austin" on it in the photo. Description was:
This photo was taken inside a bar. There are several clues that indicate this is Austin. First, there is a sign on the wall that says "Austin." Second, there is a Texas flag on the wall. Third, there are several people wearing cowboy hats, which is a common sight in Austin. The coordinates of this photo are
Is it possible that the sign that says Austin is on the wall and is known to the system but not visible in the actual photo?
Perhaps, but I've tried some rural landscapes without any signs and it came up with English signs as a hint pointing to England/Wales.
Even photos with signs in Irish were pointing to England, it's half funny, half offensive.
I uploaded a generic photo I took of a field, dandelions, and trees. It confidently stated Switzerland, and provided specific GPS coords.
Of course, it was entirely wrong.
Some level of confidence indication would make a system like this much more useful.
I think this is generative AI and it doesn't know how confident it is. So far it hasn't gotten any of my pictures right and it's made pretty bad guesses; a human could do better on most of them.
The LLM’s logits should be translatable into probabilities although I’m not too sure how meaningful those might be as models can sometimes be quite confident in entirely invalid predictions.
I don't think that's the right way to think about LLM logits. Fundamentally the logits represent probability of similarity with the text it's been trained on, given the current prefix. Mixed in with any correspondence with truth is not only tone, phrasing, dialect, language syntax, but also stuff like the likelihood that specific details are related to general concepts. Even if we're talking about a person with three legs, or a horse riding a man, it'll be hard for the LLM to not assign a fairly high probability to sentences that describe two legs, or a man riding the horse and not the other way around.
I haven't done deep reading on LLM architectures and I don't really know if LLMs have logits in the traditional sense of a CNN or something, but I think the problem with this is that the LLM's logits would have absolutely no bearing on its confidence of the location being correct, only on its confidence that the tokens making up the answer it provides follow from the tokens that were encoded from the provided image, which isn't the same thing.
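To make the distinction in the comments above concrete, here is a minimal sketch of how logits are usually turned into probabilities with a softmax, and how a per-token probability combines into a sequence probability. The logit values and the three-token vocabulary are made up for illustration and do not come from the site being discussed. The resulting number measures how likely the text is under the model, not how likely the stated location is to be correct.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_prob(step_logits, chosen_ids):
    """Sum the log-probabilities of the chosen token at each generation step."""
    return sum(
        math.log(softmax(logits)[token_id])
        for logits, token_id in zip(step_logits, chosen_ids)
    )

# Hypothetical logits over a 3-token vocabulary for two generation steps.
step_logits = [[2.0, 1.0, 0.1], [0.5, 3.0, 0.2]]
chosen = [0, 1]  # index of the token actually emitted at each step

probs = softmax(step_logits[0])
seq_prob = math.exp(sequence_log_prob(step_logits, chosen))
print(probs, seq_prob)
```

A fluent but hallucinated explanation can score a high sequence probability here, which is why this value is a poor stand-in for location confidence.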
I can only imagine the quality of the systems being sold to governments, and people trusting them because "AI." I mean, "intelligence" is right in the name!
Intelligence is also in the name of the CIA, and that's pretty well understood as an oxymoron. Artificial is also in the name which seems much more apropos. It's clearly not real intelligence, it's purely artificial in the use of the word. I guess, Computer's Best Guess Simulating Intelligence To Low Intelligent Humans would be too on the spot and not as sexy of an acronym.
I actually have faith in the intelligence products produced by the modern CIA, much more than AI snake oil.
There's a difference in the products used to produce intelligence vs the analysis/centralization of it that the oxymoron comment goes towards.
Tried it a few times, it's hit or miss.
- Rooftop bar in Viejo San Juan, PR was identified correctly, down to the intersection.
- Beach on the south coast of Vieques, PR was identified as Jamaica, so reasonably close for a non-descript tropical beach.
- Office building in Reston, VA which is fairly obvious (biggest/tallest building in the area) was identified as being in San Jose, CA.
- Train station in Staunton, VA was identified as somewhere in Massachusetts.
The attributes of the photo were mostly accurate, but were matched to an incorrect location.
1/3 was completely wrong, in the sense that the coordinates, country and city had nothing to do with it but THEN the sources were other buildings from the actual country and city.
2/3 got city and coordinates correct, but got the country wrong, which idk how that happened.
3/3 got country, city and coordinates correct.
Pretty cool
Doxxing for dough. The ethics committee is out to lunch.
See this thread: https://news.ycombinator.com/item?id=40233248
In their defense, the AI hype folks have been ignoring the ethics community from jump. I'd leave too.
Neat demo. It seems like there may be a few things happening in tandem.
I uploaded a picture of a forest and it came back with visually similar images. So the first thing it might be doing is some kind of KNN, and if the pictures have location labels associated applying some sort of weighted average to determine GPS coordinates. This is pretty cool.
I also tried flipping the image horizontally, and it came back with the same images. So their embedding isn't based off of exact matches (good) and seems to be invariant to some basic transformations (good). It also seems like it's directly extracting visual features from the image. This can be done with something like BLIP [0].
Then I uploaded a screenshot from Magic School Bus. It still extracted information to guess the "location" of the cartoon (San Francisco, which is wrong). So that's probably how it works.
I also found the text output is similar in some ways with OpenGVLab InternViT [1]. So perhaps this or something like it is being used to extract features.
And of course there may be an LLM on top of these extracted features with some sort of prompt template. But I should add that the text explanation is the least useful part of the result, since it is unreliable and less informative than the "boring" similarity metrics above.
[0] https://huggingface.co/Salesforce/blip-image-captioning-larg...
[1] https://internvl.opengvlab.com/
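The KNN guess in the comment above can be sketched as follows. This is only an illustration of that speculation, not the site's confirmed method: the two-dimensional "embeddings", the coordinates, and the function names are all hypothetical, and the naive averaging of latitude and longitude assumes the neighbours are close together and not near the antimeridian.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_locate(query, index, k=3):
    """Return the similarity-weighted mean (lat, lon) of the k nearest images.

    index is a list of (embedding, (lat, lon)) pairs.
    """
    scored = sorted(
        ((cosine(query, emb), loc) for emb, loc in index),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weights = [max(score, 0.0) for score, _ in scored]  # ignore negative similarity
    total = sum(weights)
    if total == 0:
        return None
    lat = sum(w * loc[0] for w, (_, loc) in zip(weights, scored)) / total
    lon = sum(w * loc[1] for w, (_, loc) in zip(weights, scored)) / total
    return (lat, lon)

# Hypothetical labelled index: toy 2-D embeddings with GPS coordinates.
index = [
    ([1.0, 0.0], (30.27, -97.74)),
    ([0.9, 0.1], (29.76, -95.37)),
    ([0.0, 1.0], (47.60, -122.30)),
]
guess = knn_locate([1.0, 0.0], index, k=2)
print(guess)
```

With k=2 the dissimilar third image is excluded, so the guess lands between the two closest matches. That would also explain results that are "directionally right" but off by a few hundred kilometers.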
It's heavily biased and therefore easily tricked. I uploaded a photo from NYC.
The graffiti on the wall is a clue that the photo was taken in Detroit. The vegetation in the background is also consistent with the climate of Detroit.
It's also a ridiculously hard task.
Yeah maybe just don't do it then? If someone removes the EXIF data from a photo there's probably a reason for that, and assuming that's suspicious in some way is pretty ridiculous in a society that's supposedly all about personal freedom and the right to a fair trial.
I don't mean to be aggressive here but this seems like yet another tool that will be abused to shit by already powerful people to do even sketchier things.
I agree, this project probably shouldn't exist. But oh, well here we are and stuff like this can be built with reasonable effort. Scrape google streetview and every exif tagged Image you can get your hands on and get training.
I have no idea where this is heading, but we aren't turning back.
It is looking for distinct features in the photo and does a probabilistic match against a tagged dataset. The features that match best on the tagged photos in its dataset are used to construct output that looks like a plausible answer. Don't use it to plan your trip.
Yeah, gave it a photo of a beach on Lake Ontario with two Asian friends of mine in it. Guessed.. Japan
I uploaded a photo of a screenshot of a chess game on chess.com
It identified it as the Golden Gate Bridge in San Francisco, saying the buildings in the background are also consistent with the architecture in San Fran.
This correctly identifies South Korean landmarks, like Diamond Bridge in Busan. Since I don't have encyclopedic knowledge of world landmarks -- I wouldn't be able to recognize Diamond Bridge-like landmarks in United States -- and nearly no one does either, that alone is quite useful.
Google reverse image search will do that too, and has been working for 10+ years now?
Pretty cool. Correctly identifies different islands in the Galapagos based on the ground and the plants.
This is close to an actual need I've managed to create for myself.
I do photography and I store those I want to share on nextcloud. In my selection and export process all metadata etc is stripped. But I realized too late that it also stripped out the geo-coordinates. No problem adding that in, but still have a laaarge amount of photos without geolocation data.
I'm too lazy to re-export all the older ones, so being able to run something like this on them would be perfect. I would be satisfied with a general area, roughly hitting the province/state it's taken in. It doesn't have to be accurate at all, it's more for my own geo grouping.
This site though goes bananas on firefox/mac. Flickering and font adjustments..
> I'm too lazy to re-export all the older ones, so being able to run something like this on them would be perfect. I would be satisfied with a general area, roughly hitting the province/state it's taken in. It doesn't have to be accurate at all, it's more for my own geo grouping.
I don't think this is even close to being accurate enough to be used in this way; out of ~10 images I uploaded it got one "correct" (right country, wrong city). Unless you want all your images to be geo-tagged "Somewhere, US", it's probably better to re-export/re-import with your original metadata.
That's fair. I couldn't even get this to work, so I'm not looking at this implementation in particular. I was just wondering whether this would be viable as an approach, so it was fun to see something that tries to fit the bill!
If you still have the original photos, maybe you can write a script to run both the originals and exports against a perceptual hash (so as to easily identify the correct original) and then just update the JPEG EXIF data of the exports?
https://github.com/JohannesBuchner/imagehash
Depending on specific formats, you should be able to read and edit metadata without having to reprocess the images. If the exports are named similarly to the originals, you don't even need to hash them.
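The hash-matching idea in the two comments above can be sketched without any image library. The sketch below uses a toy average hash over tiny grayscale grids; in practice you would load the real files and compute hashes with the library linked above, then copy the GPS tags across with an EXIF tool. The function names, the file names, and the distance threshold are all assumptions for illustration.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def match_exports(originals, exports, max_distance=4):
    """Map each export name to its closest original, or None if too different."""
    orig_hashes = {name: average_hash(px) for name, px in originals.items()}
    matches = {}
    for name, px in exports.items():
        h = average_hash(px)
        best = min(orig_hashes, key=lambda n: hamming(h, orig_hashes[n]))
        close_enough = hamming(h, orig_hashes[best]) <= max_distance
        matches[name] = best if close_enough else None
    return matches

# Toy 2x2 grayscale "images"; exports are slightly altered copies.
originals = {
    "IMG_001": [[10, 200], [220, 30]],
    "IMG_002": [[200, 10], [30, 220]],
}
exports = {
    "export_a.jpg": [[12, 190], [210, 35]],
    "export_b.jpg": [[190, 15], [40, 215]],
    "unrelated.jpg": [[50, 50], [50, 50]],
}
matches = match_exports(originals, exports, max_distance=1)
print(matches)
```

Once each export is paired with its original, only the location tags need copying, so the exported pixels are never reprocessed. As the second comment notes, if the file names already line up the hashing step can be skipped entirely.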
Seems to be about as accurate as a good geoguessr player on a time limit. Recognizable vistas are generally right down to the city, and even if there's only general architecture to go off it's often right to within a couple hundred kilometers.
The explanations are a bit hit and miss. Some are great and correctly describe the names of buildings in the picture, some are only vaguely related to the picture.
Ethically this is very questionable. Of course with enough dedication humans can do the same (e.g. Rainbolt has made a Youtube career out of this), but commoditizing this for every stalker around the world has some troubling implications.
Ethics is absent in the minds of the people building and financing this. AI is about wholesale value extraction and destruction of competition done by the ecosystem of small startups repackaging AI APIs. Those APIs will be turned off once ad revenue starts flowing into the bank accounts of AI API providers.
My backyard: Germany because there are trees and a fence (? Also, no). A picture of the farmer's market of my town: correctly assume France but confidently incorrect on the town and landmarks (off by 200-300km I'd say).
The question is, how does this do at GeoGuessr, where users are given a picture from Google street view, and are asked where it is on the world by clicking on a map of the world. Users get points based on how close it is, user with the most points after N rounds, wins.
The best player in the world, Rainbolt, played against an AI out of Stanford, so I wonder how this one would do.
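Scoring "based on how close it is" comes down to a great-circle distance between the guess and the true location. Here is a minimal sketch using the haversine formula. It only computes and ranks distances; the way GeoGuessr turns distance into points is not given in this thread, so it is left out. The coordinates and names are placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def rank_guesses(truth, guesses):
    """Sort named guesses by distance to the true location, closest first."""
    return sorted(
        ((name, haversine_km(*truth, *loc)) for name, loc in guesses.items()),
        key=lambda pair: pair[1],
    )

# Placeholder data: the truth at (0, 0) and two guesses.
truth = (0.0, 0.0)
guesses = {"near": (1.0, 0.0), "far": (0.0, 90.0)}
ranked = rank_guesses(truth, guesses)
print(ranked)
```

The same function gives a consistent way to compare the "off by a couple hundred kilometers" reports elsewhere in the thread.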
I put in an image of Villa Isnard, Cascina, Pisa, and it was recognized as being from France. It is a villa with French architecture and olive trees. I then tried to upload an image, also from Pisa, of a building in Venetian Gothic style and it was recognized as being in Venice. It can be deceived quite easily imho; it looks like it just searches for the corresponding architecture (maybe?) and the details surrounding it, but it doesn't search online. Villa Isnard is quite famous (at least, you have results online) and a Google Lens search would have found it.
Yeah, I would guess it's identifying elements in the photo, qualifying the likelihood of those combined elements in a particular location, then outputting the assumption. I posted a picture of a Texas lake seen from a private residence, and it correctly guessed Texas, but pushed it off by a couple hundred miles into an Austin golf course.
It seems to use many signals, at least according to its own explanation. For example it looks at road signs and license plates to identify countries.
I'm not convinced by the quality of this. I took some screenshots of Street View, not including any icons, and it identified them as completely the wrong city. One of them included the name of the town on a bus stop, which it completely failed to identify, placing the picture across the county and asserting that it contained features that it definitely didn't, such as thatched roofs (all roofs in the image were normal slate). I would maybe trust it to get me in the correct area of the country, but that's about it.
After a bit more testing, it could successfully identify Buckingham Palace and St Michael's Mount (though the location wasn't great), but a street overlooking a beach in Cornwall was marked as Wales, apparently including house numbers in Welsh and English (despite Welsh using English numerals). It seems to work somewhat ok if there is a clear image of an obvious and distinctive monument; otherwise it isn't particularly accurate.
I got similar results, and that would be ok if it didn't sound so confident with the guess.
Assuming this is more of a proof of concept/prototype, that's not bad. It didn't get it[1] right, but at its core, the guess is not terrible: shift it 560km[2] south-east and you'd be bang on. I'll admit, I did set the bar a bit high.
It told me the correct country, but the completely wrong city, and then began to describe a typical place in the style of the country - nothing of which was visible on the image.
> Sweden
Good
> Rural area
good
> [pin in the center of Stockholm, the most urban area in Sweden]
ouch.. not so good.
to be fair they say they provide the coordinates for the city or the town, so if it's not too far from stockholm I would count it right
It wasn't. Not even close.
It recognized 2/7 of the pictures I used. The two successes are a really well known place in Rome (the Roman Forum, although it got the arch wrong) and a small but very touristic city. It guessed the country right but was far from the place in 4 cases: landmarks were visible but they are not hugely touristic. In the last case the country was wrong, but it was a picture from my office with no landmark.
I've been learning deep learning, and I built a very toy version of this recently. It's really just a classifier that can (maybe) tell you if a photo was taken in one of the 5 cities I trained it on.
https://huggingface.co/spaces/itslenny/fastai-lesson-2-big-m...
I had ChatGPT generate some selfies taken in various places, then ran it through this app. My assumption was that this app would do really well, since one model would identify the stereotypical features generated by the other model. It got 1/3. It nailed Minneapolis, it got Damascus, Syria wrong (said Amman, Jordan), and it got the Ballard neighborhood of Seattle wrong (said San Francisco).
> Explanation: The image is of a residential street in London. The architecture of the buildings is典型的英国风格, and the vegetation is consistent with that of London. The street is also relatively narrow, which is common in London.
Not sure about the mix of languages here. It’s correct, but not specific.
It was correctly able to identify several photos of my vacation to NC, down to the exact location where the photo was taken on the hiking trail. Pretty scary. Additionally, just to be sure, I used an EXIF data wiper to make sure it wasn’t pulling data from there and tried each photo in a separate Incognito instance. Still got it correct, all 3 times. Mind boggling.
how?
It's not very accurate, but it seems consistent. However it quite often tells me 'this is in X because the language on the sign' when there are no signs at all. Or just now I got 'The house in the background is made of wood, which is a common building material in Finland.' with a photo of a lake. There is no house, there are trees though :)
I gave it a snippet of Montevideo city skyline, and it responded with:
The photo was taken from a rooftop in Buenos Aires, Argentina. The photo shows a clear view of the city's skyline, including the iconic Obelisk of Buenos Aires. The buildings in the photo are characteristic of Buenos Aires' architecture and the vegetation is typical of the region.
I took images directly from Google Images search and it got them wrong. But it was sort of directionally right. My local city hall it said was the courthouse in the same county. The local bridge was put into a wrong state.
Interestingly, it provided reference images and the images I posted were basically in the reference images.
Googled 'vacation photo' and picked some off of Flickr. The locations matched the captions, State and Country (FLA and Cancun) correctly.
Obviously uploading a picture of a hot dog will waste compute on trying to figure out what kind of traffic the ketchup is, but it works with snapshots great (not stock photos)
Childish AI. I gave it two photos; both were totally wrong and hundreds of thousands of miles from the actual locations.
Don't recommend.
Funny enough it was accurate all while citing items that were not in the picture (not even cropped out), like tall buildings and signs in a specific language. I'm sure there will be refined versions that are scarily accurate. Another OSINT tool for better or worse.
Claude Vision can do that for you if you are building something similar. Had similar results with OpenAI.
I uploaded a photo of a helicopter cockpit, flying up the Talkeetna River in alaska, which was very out of focus in the background. It knew exactly where I was, and even mentioned I was in a helicopter. And that it was fall!
I uploaded a couple of pics of a beach in Turks and Caicos. It came back with a beach in the Bahamas. Not even close. But I suppose close enough geographically. Also a pic taken from a stationary train in Chicago came back as NYC.
FirebaseError: Installations: Create Installation request failed with error "400 INVALID_ARGUMENT: API key expired. Please renew the API key." (installations/request-failed).
I uploaded a drone shot of New Taipei City and it gave me Taipei. Close enough. I don’t know if it was cheating though because the image had exif gps coords embedded…
The site worked fine on Firefox on iOS.
Hmm interesting; one hit, one probably in vaguely the right area; both from scans of ~40 year old photos. (As someone else noted, site is rather broken on Firefox/Linux but does work).
Why use a LLM for this? You'd definitely want a large model, but this seems like a more straightforward classification problem that doesn't require understanding of language.
I'm blown away; it correctly identified a photo taken inside my house - just a picture of my kitchen - as being in eastern Massachusetts, just based on the architecture.
I didn't have the same luck.
I gave it a photo from inside a house, you can see a person on the bed, and the white wall behind - that's it.
Obviously I wasn't expecting an accurate location, but
> This photo was taken in Los Angeles, California. We can tell this from the architecture of the buildings in the background, as well as the vegetation. The palm trees are a dead giveaway that this is Los Angeles.
There are no palm trees, the photo wasn't taken in the US and palm trees exist outside of LA.
I also fed a photo of some quite distinctive castle ruins. It mislocated that by 100s of miles.
Maybe it does a lot better with photos related to the US. The training set probably contained mostly US-related images, as only one image out of ~10 taken in various European places was correctly guessed for me. Most of the guesses were places in the US while none of the images I tried were from the US.
There's a certain training set bias. Most pictures from post-Soviet states land in Moscow for me.
I wonder if it looks at EXIF data at all.
I stripped date and geo info.
It's funny that with a direct match to a precisely located photo in its database, it got the country right by comparing the architectural style, but still got the city wrong.
I put in a photo of a small lake near Truckee, CA, several miles from Lake Tahoe, and it reported it was Lake Tahoe. It was wrong, but I was impressed that it was geographically very close.
Now try with a photo of a lake that's on the other side of the world compared to Lake Tahoe and see if it doesn't also report it as Lake Tahoe.
> The photo was taken from a tall structure, possibly a fire tower.
It was way off on the state, but I am still impressed with that spot on description. It was taken from a fire tower.
This is what happens when I try to scroll to read the results...?
https://i.imgur.com/ywc1Hn0.png
(Chrome on Windows)
Reminds me of a research paper that used AI to accurately pinpoint where a picture was taken. I had hoped this was it. But it’s better than nothing.
I uploaded a nondescript scenery photo with no non-natural cues from the Dominican Republic, it got the general area right within 50km.
Interesting concept, and it works somehow. But they definitely need better web developers. Very strange flickering, what the hell is that?
Even though it missed the town by few kilometers it also recognized my wife's dress and linked to the webstore for it
I think it just uses EXIF data, then makes guess of other photos from same IP. Fake it till you make it
I’m not sure how this works under the hood. My initial observation is it does not work.
Uploaded a photo of the Bell Centre. Easy Habs logo on it. City location: Toronto.
it seems to be a rule of thumb that you can pick a subreddit that operated as some kind of service, ie r/whereisthis, and replace that entire apparatus with an ai of some kind
Is the website glitchy on firefox?
Very - never seen anything explode quite like it
> I'm sorry but GeoSpy is not allowed to process this image. Please try again with a different image or contact support at info@graylark.io
That was with an image I took in London on my phone.
Entirely wrong result.
scary accurate
This seems like another Hotdog/Not Hotdog business model.
The page is very broken for me (Firefox in Linux), locking up, flickering.
I did manage to get it to place a picture of a praying mantis I took in Japan to be from California...
I get the same flickering in Firefox on MacOS, but it managed to recognise my picture.
Same, Firefox in linux, also tried with all extensions disabled and same thing. Just flickering with an "error" modal.
I got "an error has occurred"
Me too - maybe it's overloaded