PySpark uses Java regular-expression syntax (ablog)
https://yohei-a.hatenablog.jp/entry/20210612/1623470162
by yohei-a, Hatena Blog, published 2021-06-12, category: Spark

Regex in PySpark internally uses Java regex. One common issue is escaping the backslash: Spark uses Java regex, but we pass a raw Python string to spark.sql. We can see it with a small example; \d represents a digit in regex. Let us use Spark regexp_extract to matc…
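The description breaks off mid-example, but the escaping layers it describes can be seen with plain Python strings, no Spark cluster required. A minimal sketch (the query text and table name `t` are hypothetical; Spark SQL's default string-literal parsing, which consumes one level of backslash, is assumed):

```python
import re

# The Java regex engine behind Spark must ultimately receive:  \d+
java_regex = r"\d+"

# \d means "digit" in both Java and Python regex, so plain `re` can
# stand in here to show what the pattern matches:
assert re.search(java_regex, "abc123").group(0) == "123"

# Layer 1: a Spark SQL string literal consumes one backslash,
# so the SQL text must contain '\\d+' to yield the regex \d+.
sql_literal = r"'\\d+'"
assert sql_literal[1:-1].replace("\\\\", "\\") == java_regex

# Layer 2: from Python, a raw string is the easiest way to write
# that doubled backslash into the query passed to spark.sql(...).
# (Hypothetical query; with pyspark installed you would run
#  spark.sql(query) against a registered table `t`.)
query = r"SELECT regexp_extract(value, '\\d+', 0) AS digits FROM t"
assert "\\\\d+" in query  # the query string really holds two backslashes
```

By contrast, the DataFrame API (`pyspark.sql.functions.regexp_extract(col, r"\d+", 0)`) hands the Python string straight to the Java regex engine, so only the single backslash of `r"\d+"` is needed; the doubling arises only when the pattern additionally passes through a SQL string literal.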