Add bigdata-analysis skill
bigdata-analysis is an AI coding skill that prevents silent data bugs in Hive/Impala/Spark ETL pipelines.
When AI assistants generate Spark/Hive code, they frequently produce code with invisible bugs: row counts look correct and no errors are thrown, but the data is silently wrong. This skill encodes 10 battle-tested rules from real production incidents, including:
- GROUP BY causing silent row explosion
- SELECT * in INSERT causing column position shifts
- foreachPartition as a no-op for cache materialization

Intended for data engineers writing Hive/Impala SQL or Spark Scala ETL jobs on HDFS/YARN, especially those using AI coding assistants.
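As an illustration of the SELECT * in INSERT item above, here is a minimal sketch of how a positional column shift can go unnoticed. It uses Python's built-in `sqlite3` only as a stand-in for Hive/Impala (engine details may differ), and the table names `staging_orders` and `fact_orders` are hypothetical, not taken from the skill itself.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a Hive/Impala warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: same column names and types, but a different column order.
cur.execute("CREATE TABLE staging_orders (order_id INTEGER, amount INTEGER, quantity INTEGER)")
cur.execute("CREATE TABLE fact_orders (order_id INTEGER, quantity INTEGER, amount INTEGER)")
cur.execute("INSERT INTO staging_orders VALUES (1, 500, 2)")

# INSERT ... SELECT * matches columns by position, not by name.
# No error is raised and the row count is correct, but amount and quantity are swapped.
cur.execute("INSERT INTO fact_orders SELECT * FROM staging_orders")
shifted = cur.execute("SELECT order_id, quantity, amount FROM fact_orders").fetchone()
shifted_count = cur.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]

# Safer variant: name the columns explicitly on both sides of the INSERT.
cur.execute("DELETE FROM fact_orders")
cur.execute(
    "INSERT INTO fact_orders (order_id, quantity, amount) "
    "SELECT order_id, quantity, amount FROM staging_orders"
)
correct = cur.execute("SELECT order_id, quantity, amount FROM fact_orders").fetchone()

print("SELECT * insert:      ", shifted)
print("explicit column list: ", correct)
conn.close()
```

Both inserts succeed and both leave exactly one row in the target, so a row-count check alone would not catch the swapped values in the first case.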
One rule alone (Rule 4: Spark SQL vs DataFrame API) reduced a production job from 367 lines / 45 Jobs / 3 hours to 50 lines / 1 Job / 15 minutes.
Added to Data & Analysis (alphabetical order).
npx skills add Oak-B/bigdata-analysis-skill@bigdata-analysis