What is Hackers' Pub?

Hackers' Pub is a place for software engineers to share their knowledge and experience with each other. It's also an ActivityPub-enabled social network, so you can follow your favorite hackers in the fediverse and get their latest posts in your feed.

Apache Arrow is 10 years old 🎉

The Apache Arrow project was officially established and had its first git commit on February 5th 2016, and we are therefore enthusiastic to announce its 10-year anniversary!

Looking back over these 10 years, the project has developed in many unforeseen ways and we believe we have delivered on our objective of providing agnostic, efficient, durable standards for the exchange of columnar data.

How it started

From the start, Arrow has been a joint effort between practitioners of various horizons looking to build common ground to efficiently exchange columnar data between different libraries and systems. In this blog post, Julien Le Dem recalls how some of the founders of the Apache Parquet project participated in the early days of the Arrow design phase. The idea of Arrow as an in-memory format was meant to address the other half of the interoperability problem, the natural complement to Parquet as a persistent storage format.

Apache Arrow 0.1.0

The first Arrow release, numbered 0.1.0, was tagged on October 7th 2016. It already featured the main data types that are still the bread-and-butter of most Arrow datasets, as evidenced in this Flatbuffers declaration:

    /// ----------------------------------------------------------------------
    /// Top-level Type value, enabling extensible type-specific metadata. We can
    /// add new logical types to Type without breaking backwards compatibility
    union Type {
      Null,
      Int,
      FloatingPoint,
      Binary,
      Utf8,
      Bool,
      Decimal,
      Date,
      Time,
      Timestamp,
      Interval,
      List,
      Struct_,
      Union
    }

The release announcement made the bold claim that "the metadata and physical data representation should be fairly stable as we have spent time finalizing the details". Does that promise hold? The short answer is: yes, almost! But let us analyse that in a bit more detail:

- The Columnar format, for the most part, has only seen additions of new datatypes since 2016. One single breaking change occurred: Union types cannot have a top-level validity bitmap anymore.
- The IPC format has seen several minor evolutions of its framing and metadata format; these evolutions are encoded in the MetadataVersion field, which ensures that new readers can read data produced by old writers. The single breaking change is related to the same Union validity change mentioned above.

First cross-language integration tests

Arrow 0.1.0 had two implementations: C++ and Java, with bindings of the former to Python. There were also no integration tests to speak of, that is, no automated assessment that the two implementations were in sync (what could go wrong?).

Integration tests had to wait for November 2016 to be designed, and the first automated CI run probably occurred in December of the same year. Its results cannot be fetched anymore, so we can only assume the tests passed successfully. 🙂

From that moment, integration tests have grown to follow additions to the Arrow format, while ensuring that older data can still be read successfully. For example, the integration tests that are routinely checked against multiple implementations of Arrow have data files generated in 2019 by Arrow 0.14.1.

No breaking changes... almost

As mentioned above, at some point the Union type lost its top-level validity bitmap, breaking compatibility for the workloads that made use of this feature. This change was proposed back in June 2020 and enacted shortly thereafter. It elicited no controversy and doesn't seem to have caused any significant discontent among users, signaling that the feature was probably not widely used (if at all).

Since then, there have been precisely zero breaking changes in the Arrow Columnar and IPC formats.

Apache Arrow 1.0.0

We have been extremely cautious with version numbering and waited until July 2020 before finally switching away from 0.x version numbers. This signalled to the world that Arrow had reached its "adult phase" of making formal compatibility promises, and that the Arrow formats were ready for wide consumption across the data ecosystem.

Apache Arrow, today

Describing the breadth of the Arrow ecosystem today would take a full-fledged article of its own, or perhaps even multiple Wikipedia pages. Our "powered by" page can give a small taste.

As for the Arrow project, we will merely refer you to our official documentation:

- The various specifications that cater to multiple aspects of sharing Arrow data, such as in-process zero-copy sharing between producers and consumers that know nothing about each other, or executing database queries that efficiently return their results in the Arrow format.
- The implementation status page that lists the implementations developed officially under the Apache Arrow umbrella (native software libraries for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust). But keep in mind that multiple third-party implementations exist in non-Apache projects, either open source or proprietary.

However, that is only a small part of the landscape. The Arrow project hosts several official subprojects, such as ADBC and nanoarrow. A notable success story is Apache DataFusion, which began as an Arrow subproject and later graduated to become an independent top-level project in the Apache Software Foundation, reflecting the maturity and impact of the technology.

Beyond these subprojects, many third-party efforts have adopted the Arrow formats for efficient interoperability. GeoArrow is an impressive example of how building on top of existing Arrow formats and implementations can enable groundbreaking efficiency improvements in a very non-trivial problem space.

It should also be noted that Arrow, as an in-memory columnar format, is often used hand in hand with Parquet for persistent storage; as a matter of fact, most official Parquet implementations are nowadays being developed within Arrow repositories (C++, Rust, Go).

Tomorrow

The Apache Arrow community is primarily driven by consensus, and the project does not have a formal roadmap. We will continue to welcome everyone who wishes to participate constructively. While the specifications are stable, they still welcome additions to cater for new use cases, as they have done in the past.

The Arrow implementations are actively maintained, gaining new features, bug fixes, and performance improvements. We encourage people to contribute to their implementation of choice, and to engage with us and the community.

Now and going forward, a large amount of Arrow-related progress is happening in the broader ecosystem of third-party tools and libraries. It is no longer possible for us to keep track of all the work being done in those areas, but we are proud to see that they are building on the same stable foundations that were laid 10 years ago.
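
For readers unfamiliar with the "validity bitmap" mentioned in the post, here is a minimal sketch in plain Python. It is not the Arrow library's API: `pack_validity` and `is_valid` are hypothetical helper names chosen for illustration. The sketch assumes the Arrow columnar convention of one bit per slot, least-significant bit first, where 1 marks a non-null value.

```python
def pack_validity(values):
    """Build a validity bitmap: bit i is set when values[i] is not None."""
    # One bit per slot, rounded up to whole bytes.
    bitmap = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is not None:
            # Least-significant-bit-first numbering inside each byte.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_valid(bitmap, i):
    """Return True when slot i is marked as non-null in the bitmap."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))


# Hypothetical column with two null slots (indices 1 and 4).
column = [1, None, 3, 4, None]
bitmap = pack_validity(column)
nulls = [i for i in range(len(column)) if not is_valid(bitmap, i)]
```

Keeping nullness in a separate bitmap leaves the value buffer densely packed. The Union change described in the post removed only the top-level bitmap for that one type.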

arrow.apache.org · Apache Arrow

In the book Rural Versus Urban: The Growing Divide That Threatens Democracy (Sept 2025), Mettler and Brown argue that today’s U.S. political polarization is increasingly organized around place (a hardened rural-versus-urban identity divide) rather than simply "red states vs. blue states" or "coastal elites vs. flyover country." In their account, Americans' sense of political conflict has come to feel like a fight between us and them based on where people live, and this rural-urban sorting now cuts across essentially every region and state.

Interesting! 🤔 Who could have possibly seen that coming? Maybe we could form a committee, have a lot of meetings, write a whitepaper or two, get some real solid podcasts going (everyone must know what Ezra Klein's opinion is on this), and get the think tanks to think even more hardererer?

Oh yeah! I almost forgot! We already started working on this nine months ago and we're underway in three rural counties already. My bad, y'all. rgmii.org

Bird flu has been confirmed in dead birds at the Prague zoo, the State Veterinary Administration (Státní veterinární správa) reported.
ℹ️ Because of the outbreak, the zoo will restrict visitor access to its walk-through aviaries and temporarily close the Rákos Pavilion (Rákosův pavilon) and the Sečuán pavilion, a spokesperson said.
🔵 Inspectors detected the infection after the zoo reported the gradual deaths of several birds in some of its exhibits.
🔗 https://czch.tv/WT6Zbv

Open source is built by maintainers. Testing is built on trust.

In the latest Push to Talk | Meet the Maintainers, we sit down with Ned Batchelder (@nedbat), creator and long-time maintainer of coverage.py, to talk about the story behind one of Python’s best-known testing tools.

In this episode:

➡️ How coverage.py started, and what kept it going

➡️ What code coverage can (and can’t) tell you

➡️ Why “94% measured” is a choice, not an accident

Watch the video: youtu.be/xjWjfRVTUHo?si=-pS184

🌟 Happy International Day of Women & Girls in Science! 🌟

Today we celebrate the curiosity, creativity, and achievements of women and girls in STEM.

At the FSFE we have projects like the illustrated book "Ada & Zangemann". It shows how Free Software contributes to making technology more inclusive, giving everyone the tools to explore, create, and collaborate.

ada.fsfe.org

RE: mastodon.social/@404mediaco/11

This is definitely why I’m cautious about photos I share & WHERE.

AI is not sophisticated about a lot of things but it can recognize patterns.

Whether it’s photos or anything else that identifies you or your location, consider these questions before posting…

👨‍👩‍👧‍👦 Who is this for?
🛟 What app am I sharing on?
👀 Who can see my content now and in the future?
🔐 Who has access to this data? (Apps, orgs, data brokers)
🤔 Why am I posting?

rootschangemedia.com/digital-s

Today a customer wanted to "help" me and gave me a list of Copilot's answers to the configuration I intended. The five pages contained different ways not to solve the problem, because they were wrong, outdated, lacking details that made the suggestions impossible, and some didn't even apply to the problem...

Despite Mastodon having real quote posting now, I still see a lot of folks just posting links to toots.

Please, start using real quotes.

I'm guessing the reason lots of folks are still just pasting links is that some instances running modified software started displaying those as quotes already before Mastodon got quotes. But well-behaved fedi software doesn't do that, because it bypasses notification and consent of the quoted post author. So when you do that, the rest of us are left with the bad old UX of having to copy-and-paste the URL to the search box to actually see what you were trying to quote.
