Map of 2000+ lemmy communities

Danterious@lemmy.dbzer0.com · edited 2 months ago


aasatru@kbin.earth · 2 months ago

!dadvice@lemmy.world is there right next to !bellyexpansion@lemmynsfw.com at [9.6, 38.3]. I guess someone is excited about their wives being pregnant. :)

m_f@midwest.social · 2 months ago

What is !steamdeck@lemmy.world doing over there with the red dots? 🤔

Blaze (he/him)@feddit.org · 2 months ago

That’s great, thanks!

Reposting to !dataisbeautiful@mander.xyz

cabbage@piefed.social · edited 2 months ago

Very cool!

Do you have any idea how taxing scraping this data is for the servers?

If this is something you want to keep working on, maybe it could be combined with a sort of Threadiverse fundraiser: we collectively gather funds to cover the cost of scraping (plus some for supporting the threadiverse, ideally), and once we reach the target you release the map based on the newest data and the money is distributed proportionally to the different instances.

Maybe it’s a stupid idea, or maybe it would add too much pressure into the equation. But I think it could be fun! :)

Danterious@lemmy.dbzer0.com · 2 months ago

I had to try scraping the websites multiple times because of stupid bugs I put in the code beforehand, so I might have put more strain on the instances than I meant to. If I did this again, it would hopefully be much less taxing on the servers.

As for the cost of scraping, it actually isn’t that hard. I just had it running in the background most of the time.

Anti Commercial-AI license (CC BY-NC-SA 4.0)

PugJesus@lemmy.world · 2 months ago

What do the X and Y axes represent?

Danterious@lemmy.dbzer0.com · edited 2 months ago

Well, I used dimensionality reduction to make it 2D, so the axes are just how the algorithm chose to compress it.

In the original data, each data point is a community, and each feature is how frequently a given user posts in that community.
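For anyone who wants to picture that layout, here is a minimal sketch of building such a community-by-user matrix. It is not the code used for the map: the `(user, community)` input format, the function name `build_frequency_matrix`, and the use of raw post counts as the "frequency" are all assumptions made for illustration.

```python
from collections import Counter


def build_frequency_matrix(posts):
    """Build a community-by-user post-count matrix.

    posts: iterable of (user, community) pairs, one pair per post
    (a hypothetical input format).
    Returns (communities, users, matrix), where matrix[i][j] is the
    number of posts users[j] made in communities[i].
    """
    counts = Counter(posts)  # (user, community) -> number of posts
    communities = sorted({community for _, community in counts})
    users = sorted({user for user, _ in counts})
    community_index = {c: i for i, c in enumerate(communities)}
    user_index = {u: j for j, u in enumerate(users)}

    matrix = [[0] * len(users) for _ in communities]
    for (user, community), n in counts.items():
        matrix[community_index[community]][user_index[user]] = n
    return communities, users, matrix


# Made-up sample data: one row per community, one column per user.
posts = [
    ("alice", "steamdeck@lemmy.world"),
    ("alice", "steamdeck@lemmy.world"),
    ("alice", "dataisbeautiful@mander.xyz"),
    ("bob", "dataisbeautiful@mander.xyz"),
]
communities, users, matrix = build_frequency_matrix(posts)
```

With thousands of communities and tens of thousands of users, most entries would be zero, so a sparse matrix would likely be a better fit than nested lists.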


threelonmusketeers@sh.itjust.works · 2 months ago

I used dimensionality reduction to make it 2D

Huh, interesting. So is the idea to spread the data out as much as possible, while keeping “similar” communities near each other? What was the dimensionality of the original set?

Danterious@lemmy.dbzer0.com · edited 2 months ago

Total communities: 2986

Total users: 21934

So the dimensions were reduced from (2986, 21934) to (2986, 2)

Edit: Also, yeah, it is using UMAP for the algorithm, and it does do something pretty similar to what you described.
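As a rough illustration of that reduction step, here is a sketch using the third-party `umap-learn` package. The thread does not say which parameters were used, so the function name `embed_2d`, the `n_neighbors` and `random_state` values, and the default Euclidean metric are assumptions, not the actual settings behind the map.

```python
import numpy as np


def embed_2d(matrix, n_neighbors=15, random_state=0):
    """Reduce an (n_communities, n_users) matrix to (n_communities, 2).

    Parameter values are illustrative assumptions, not the settings
    used for the map in this thread.
    """
    # Imported here so this module still loads without umap-learn installed.
    import umap

    reducer = umap.UMAP(
        n_components=2,            # target dimensionality: a 2D map
        n_neighbors=n_neighbors,   # size of the local neighbourhood UMAP preserves
        random_state=random_state, # fixed seed for a repeatable layout
    )
    return reducer.fit_transform(np.asarray(matrix, dtype=float))
```

Calling `embed_2d` on a matrix of shape (2986, 21934) would give back an array of shape (2986, 2), one (x, y) point per community.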


keepthepace@slrpnk.net · 2 months ago

That’s really interesting! It shows which communities share users. I am part of jlai.lu, a French-speaking instance that is relatively isolated, but also of slrpnk.net, which seems very spread out!

Would it make sense to compute the standard deviation of each instance’s communities? It would give an idea of which are islands and which are more extended. Not sure whether it makes more sense to compute it on the 2 dimensions or on the original 21934, though.

Danterious@lemmy.dbzer0.com · 2 months ago

Yeah, that sounds like a good idea, so you can see how connected local communities are. It probably makes more sense to use the original dimensions so that no extra information is lost.
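One possible way to sketch that measure: group the community rows by instance and take the root-mean-square distance of each instance's communities from their centroid, computed in the original user dimensions. The function name `instance_spread`, the `name@instance` naming assumption, and the choice of RMS distance as the multi-dimensional "standard deviation" are all illustrative assumptions.

```python
import math
from collections import defaultdict


def instance_spread(communities, matrix):
    """Return {instance: RMS distance of its communities from their centroid}.

    communities: names like "steamdeck@lemmy.world" (assumed format).
    matrix: one row of per-user post counts for each community.
    """
    rows_by_instance = defaultdict(list)
    for name, row in zip(communities, matrix):
        instance = name.rsplit("@", 1)[1]  # "steamdeck@lemmy.world" -> "lemmy.world"
        rows_by_instance[instance].append(row)

    spread = {}
    for instance, rows in rows_by_instance.items():
        n = len(rows)
        centroid = [sum(column) / n for column in zip(*rows)]
        mean_squared = sum(math.dist(row, centroid) ** 2 for row in rows) / n
        spread[instance] = math.sqrt(mean_squared)
    return spread


# Made-up example with two instances and two users.
names = ["a@one.example", "b@one.example", "c@two.example", "d@two.example"]
rows = [[1, 0], [1, 0], [0, 0], [0, 4]]
spread = instance_spread(names, rows)
```

Here `one.example` has two communities with identical user activity, so its spread is zero, while the communities on `two.example` differ and get a larger value. Raw counts are used as-is; normalising each row first might be worth trying so that very active communities do not dominate.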
