(Now looking for a generic interface to split compounds, split off suffixes, etc., or even split sentences into words, for various languages. Even in English, I'd prefer to split off parts like -ed and -s and add them separately to the board.)

: is there something in that allows me to parse word forms into their parts, where possible?

So words would get broken down into a tree. Like pos_tag and chunk, but without stopping at tokens or word forms.

Like "stemming", but without throwing away all the other parts.
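
For English, the "stemming but keep the parts" idea can be sketched with a few hand-written suffix rules that peel suffixes off the end of a word and keep them. This is a minimal sketch, not any library's API: the function name `splitSuffixes`, the suffix list and the three-letter minimum stem are all hypothetical choices for illustration. A rule list this small will mis-split words (for example, "stopped" comes out as "stopp" + "-ed"), and it returns a flat list rather than a tree.

```javascript
// Hypothetical sketch: peel English inflectional suffixes off a word while
// keeping them, instead of discarding them the way a stemmer would.
const SUFFIXES = ["ing", "ed", "s"]; // illustrative list, checked in this order
const MIN_STEM = 3; // never leave a stem shorter than this ("bus" stays whole)

function splitSuffixes(word) {
  let stem = word.toLowerCase();
  const suffixes = [];
  let changed = true;
  // Repeatedly strip the outermost suffix, so "paintings" -> paint + ing + s.
  while (changed) {
    changed = false;
    for (const suf of SUFFIXES) {
      if (!stem.endsWith(suf)) continue;
      const rest = stem.slice(0, -suf.length);
      // Skip stems that would be too short, and "-s" after "s" ("glass").
      if (rest.length < MIN_STEM || (suf === "s" && rest.endsWith("s"))) continue;
      suffixes.unshift("-" + suf); // inner suffixes end up before outer ones
      stem = rest;
      changed = true;
      break;
    }
  }
  return [stem, ...suffixes];
}
```

With these rules, `splitSuffixes("paintings")` returns `["paint", "-ing", "-s"]` and `splitSuffixes("Walked")` returns `["walk", "-ed"]`, while "glass" and "bus" are left whole.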


Quick question

Where can I find a list of most frequent words, not just ranked but with (rough) frequencies?

(Without downloading a huge corpus and compiling it myself?)

edit: fully answered - see replies :D


Hi everyone! :blobcatwave: I'm Jodie and I love all things !

I'm currently working as a developer advocate in at , and in my previous life I worked as a , mostly in . I write tutorials and blog posts, present at conferences and host webinars about a range of , data science and topics. 📊

I'm hoping we can build up an amazing machine learning and data science community here on Mastodon!


I'm a researcher at the Institut Urban Landscape at the Zürich University of Applied Sciences (ZHAW). I'm working on the governance and narratives of sustainable digitalization and urban sustainability transformations.

Here for friendly exchanges and critical takes on urban digitalization and sustainability, governance and , + 🐍, , , Bayesian stats, and finding little nuggets of inspiration from your research lives. Also cycling.


We're the LIPN, a joint laboratory between the and the University Sorbonne Paris Nord.

We're approximately 150 researchers in five research teams:
- Machine Learning
- Combinatorial and High Performance Computing
- Design and analysis of combinatorial models at the interface of physics, geometry and algorithmics
- and
- Automatic natural language processing and knowledge representation.


I might as well do another specifically for the side of this here fediverse:

Coming from (with applications in ) to doing (computational ), I've now landed in . Specifically, I'm interested in exploring , both sharpening existing critiques of current AI practice by confronting capital and exploring the inherent politics of technologies, and finding better ones for a socialist world.


Browser-Native Translation and Language Detection APIs Coming Soon

洪 民憙 (Hong Minhee) @hongminhee@hackers.pub

Just reviewed the W3C draft for the Translator and Language Detector APIs. This is a genuinely exciting development for web developers.

The proposal would add native browser support for:

  • Text translation between languages
  • Language detection of arbitrary text
  • Both with streaming capabilities

No more relying on third-party translation services or embedding external APIs for basic language operations. All processing happens locally in the browser.

The API design is clean and straightforward:

// Translation example
const translator = await Translator.create({
  sourceLanguage: "en",
  targetLanguage: "fr"
});

const translatedText = await translator.translate("Hello world");

// Language detection example
const detector = await LanguageDetector.create();
const results = await detector.detect("Hello world");
// Resolves to an array of { detectedLanguage, confidence } results
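
Because `detect()` hands back several candidate languages, a caller usually wants a single answer. A minimal sketch, assuming results shaped as `{ detectedLanguage, confidence }` as in the current draft; the helper name `pickLanguage` and the 0.5 cut-off are hypothetical choices for illustration, not part of the spec:

```javascript
// Hypothetical helper: return the most confident language tag, or null when
// even the best guess falls below `minConfidence`.
// `detector` is assumed to come from LanguageDetector.create().
async function pickLanguage(detector, text, minConfidence = 0.5) {
  const results = await detector.detect(text);
  let best = null;
  // Scan every result instead of relying on the array order.
  for (const result of results) {
    if (best === null || result.confidence > best.confidence) best = result;
  }
  return best !== null && best.confidence >= minConfidence
    ? best.detectedLanguage
    : null;
}
```

Returning null rather than a low-confidence guess lets the caller fall back to something else, such as the page's declared `lang`.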

This will be a game-changer for multilingual sites and applications. The browser handles downloading appropriate language models and manages usage quotas.
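
In the draft as it currently reads, the model-download and quota handling show up as a static `Translator.availability()`, a `monitor` option on `create()`, and `measureInputUsage()` / `inputQuota` on the translator instance. The sketch below is based on that reading, and the names may still change before the spec settles. `translateIfSupported` is a hypothetical helper, and it takes the API object as a parameter only so it can be exercised outside a browser.

```javascript
// Sketch: translate only when the browser reports the language pair as usable.
// `api` defaults to the global `Translator` from the draft spec.
async function translateIfSupported(text, sourceLanguage, targetLanguage,
                                    api = globalThis.Translator) {
  if (!api) return null; // This browser doesn't expose the Translator API.

  // Draft values: "unavailable", "downloadable", "downloading", "available".
  const availability = await api.availability({ sourceLanguage, targetLanguage });
  if (availability === "unavailable") return null;

  const translator = await api.create({
    sourceLanguage,
    targetLanguage,
    // Receives a monitor that fires progress events while the model downloads;
    // current drafts report `loaded` as a fraction between 0 and 1.
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        console.log(`Model download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });

  // Check the input against the translator's quota before sending it.
  const usage = await translator.measureInputUsage(text);
  if (usage > translator.inputQuota) return null;

  return translator.translate(text);
}
```

Creating a translator on every call keeps the sketch simple. A real page would more likely create one per language pair and reuse it.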

The spec is still in draft form but shows promising progress toward standardizing these capabilities across browsers. Looking forward to seeing this implemented.
